Preparing ESRI shapefiles for online transformation
ESRI Shapefile format
The ESRI shapefile is a vector data storage format comprised of three files with the same file name prefix:
- shp: stores the geometry as a list of vertices
- shx: index of the geometry to enable fast, index-based searches
- dbf: dBase file used to store attribute information
Shapefiles may have additional files, such as .prj
which stores coordinate system information (used by ArcGIS), or .cpg
which specifies the code page for identifying the character set to be used. By definition, a vector is a Cartesian-based (i.e. X,Y) data structure used to store spatial data. Most geographic information systems, including ESRI's ArcGIS, store geospatial data in vector format. Other common spatial data formats include raster and TIN.
For more information, read the ESRI Shapefile Technical Description: An ESRI White Paper-July 1988
ESRI prj files and WKT files
A shapefile is typically accompanied by a .prj
file that contains a Well known text (WKT) string which stores coordinate reference system information. Well known text (WKT) is a text markup language for representing vector geometry objects on a map. The format was originally defined by the Open Geospatial Consortium (OGC) and described in their Simple Feature Access specification, to which ESRI was a contributing member. The current version of the WKT standard was published on August 13, 2019.
Not all .prj
files contain the same parameters. There are also syntactical differences between ESRI WKT files and OGC WKT files. Visit the EPSG.io to view examples of different WKT files. Here are examples of WKT files for EPSG:25832:
Example of Well Known Text as HTML
PROJCS["ETRS89 / UTM zone 32N",
GEOGCS["ETRS89",
DATUM["European_Terrestrial_Reference_System_1989",
SPHEROID["GRS 1980",6378137,298.257222101,
AUTHORITY["EPSG","7019"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6258"]],
PRIMEM["Greenwich",0,
AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,
AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4258"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",9],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,
AUTHORITY["EPSG","9001"]],
AXIS["Easting",EAST],
AXIS["Northing",NORTH],
AUTHORITY["EPSG","25832"]]
Example of OGC WKT
PROJCS["ETRS89 / UTM zone 32N",
GEOGCS["ETRS89",
DATUM["European_Terrestrial_Reference_System_1989",
SPHEROID["GRS 1980",6378137,298.257222101,AUTHORITY["EPSG","7019"]],
TOWGS84[0,0,0,0,0,0,0],
AUTHORITY["EPSG","6258"]],
PRIMEM["Greenwich",0,AUTHORITY["EPSG","8901"]],
UNIT["degree",0.0174532925199433,AUTHORITY["EPSG","9122"]],
AUTHORITY["EPSG","4258"]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",9],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["metre",1,AUTHORITY["EPSG","9001"]],
AXIS["Easting",EAST],
AXIS["Northing",NORTH],
AUTHORITY["EPSG","25832"]]
Example of ESRI WKT
PROJCS["ETRS89_UTM_zone_32N",
GEOGCS["GCS_ETRS_1989",
DATUM["D_ETRS_1989",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.017453292519943295]],
PROJECTION["Transverse_Mercator"],
PARAMETER["latitude_of_origin",0],
PARAMETER["central_meridian",9],
PARAMETER["scale_factor",0.9996],
PARAMETER["false_easting",500000],
PARAMETER["false_northing",0],
UNIT["Meter",1]]
Datum transformations
Datum transformations are required when your transformation from one coordinate system to another includes a geographic coordinate system. Datum transformations should be performed with caution because they can result in significant data shifts of 1 meter or more. There are several different methods used to perform datum transformation- Bursa-Wolf is one method.
The WKT parameter TOWGS84
is used to approximate a transformation from the horizontal datum to the WGS84 datum.
The original Simple Features specification of WKT does not specify TOWGS84
as a valid keyword. ESRI does not support the TOWGS84
parameter in WKT. Instead of using a fixed transformation, ESRI software asks the user to choose an appropriate transformation method when necessary, as does QGIS.
The current version of WKT has no backward compatibility with TOWGS84
. WKT descriptions of geodetic datums written to the OGC 01-009 specification are not readable by implementations of the current WKT specification if the optional TOWGS84
object is provided.
Coordinate reference system identification in hale»connect
hale»connect uses the open source Java GIS toolkit GeoTools to process spatial data. GeoTools, in turn, contains a copy of the EPSG database. When a shapefile is uploaded to the platform, hale»connect uses the GeoTools library to attempt to match the WKT contained in the .prj
file to the CRS definitions in the EPSG database. If a match is found, the data is subsequently processed using the matched EPSG CRS code.
In some cases, GeoTools cannot determine the CRS definition based on the information contained in the WKT. To support GeoTools in finding the correct match, the user can add an AUTHORITY
parameter as the final parameter to the WKT file. GeoTools then uses the user-supplied EPSG code in the AUTHORITY
parameter to apply the CRS definition defined in the EPSG database to the shapefile. These definitions also contain the Bursa-Wolf parameters required for datum transformation from the CRS definitions in the EPSG database.
Manually adding the AUTHORITY parameter to an ESRI prj file
You can manually edit an ESRI .prj
file to include the optional AUTHORITY
parameter. The parameter must be added as the final, terminating parameter in the file using the following pattern: AUTHORITY["EPSG","25832"]
. Although the use of the AUTHORITY
parameter is officially deprecated in the current version of WKT, for purposes of backward compatibility its use is still accepted.
In the current version of WKT, the AUTHORITY
object has been replaced by the ID
object. The identifier object is not as narrowly defined as AUTHORITY
was in previous versions.
Axis flip
The root cause for many of the axis-flip problems users experience, is that the axis order of the geometries in the shapefile is longitude, latitude/Easting, Northing but the WKT definition in the .prj
file describes a CRS that has latitude, longitude/Northing, Easting axis order. For example, WGS 84 = EPSG:4326.
This is problematic if no EPSG definition for the CRS with the axis order used in the shapefile exists, because no matching code can be provided for the AUTHORITY
parameter. In these cases it is necessary to use the "Flip coordinate order" toggle to flip the axis order after publishing the dataset.
In the end, the only way to handle axis order is to analyze the axis order of the shapefile geometries and determine if the axis order fits the CRS definition in the .prj
file and act accordingly, i.e. either add an AUTHORITY
parameter to the .prj
file or utilize the "Flip coordinate order" toggle in hale»connect.