Tuesday, October 9, 2007

Shapefiles and PRJ - Tying them together

ESRI has developed a de facto standard for Shapefiles, but sometimes we as users of Shapefiles forget something: a Shapefile is actually a minimum of three files.

The three required files are:
  • .shp - the file that stores the feature geometry.
  • .shx - the file that stores the index of the feature geometry.
  • .dbf - the dBASE file that stores the attribute information of features.
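
If you want to check that a Shapefile really has all three parts, here is a minimal sketch in Python. The function name missing_shapefile_parts is my own invention, and the sketch assumes the extensions are lowercase, which is not guaranteed on every system.

```python
from pathlib import Path

# The three required parts of a Shapefile, as listed above.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_shapefile_parts(shp_path):
    """Return the required extensions that do not exist next to shp_path."""
    base = Path(shp_path)
    return [ext for ext in REQUIRED_EXTENSIONS
            if not base.with_suffix(ext).exists()]
```

An empty list means all three required files are present. The .prj file is deliberately not in the list, because it is not one of the required files.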

A recent thread on the GDAL Mailing List led to the first link about Shapefiles. The key to the thread was a discussion about where projections are stored in the Shapefile definition. As the list above shows, none of the required files has a place to store the projection.

Therefore ESRI adopted a new file type (PRJ) with the extension .prj.

So how are projections defined in this file?

As we know, coordinate systems in mapping can be either geographic (longitude, latitude) or projected (X, Y). In the PRJ definition, the coordinate system is composed of several objects, with every object having a keyword in uppercase. Objects can be composed of other objects. ESRI calls the string in this file a Projection Engine (PE) string (not for Physical Education!). The sole purpose of the PE string is to store the metadata for a coordinate system, either as a string or in a .prj file. This string must be continuous and not broken.
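
Since the examples in this post are wrapped over several lines for readability, here is a small Python sketch that joins a wrapped definition back into one continuous string. I am assuming that "not broken" means no line breaks, and that no quoted name relies on leading or trailing spaces; the function name to_continuous_pe_string is hypothetical.

```python
def to_continuous_pe_string(text):
    """Join a PE string that was wrapped over several lines into one line.

    Assumption: whitespace at the start and end of each line is only
    layout and can be dropped safely.
    """
    return "".join(line.strip() for line in text.splitlines())
```

The result is what you would write into the .prj file.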

Now the scary part: You can define your own units, datums, and spheroids!

An example, taken from the ESRI website, shows how a projected or geographic definition is written in a .prj file.

As we know, projected coordinate systems (which is what maps use) are based upon a geographic coordinate system (latitude, longitude), so in their sample file a projected coordinate system is defined first.

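For example, UTM zone 10N on the NAD83 datum is defined as follows, where GEOGCS[...] stands in for the geographic coordinate system string given further down:

PROJCS["NAD_1983_UTM_Zone_10N",
GEOGCS[...],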
PROJECTION["Transverse_Mercator"],
PARAMETER["False_Easting",500000.0],
PARAMETER["False_Northing",0.0],
PARAMETER["Central_Meridian",-123.0],
PARAMETER["Scale_Factor",0.9996],
PARAMETER["Latitude_of_Origin",0.0],
UNIT["Meter",1.0]]
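
The Central_Meridian value of -123.0 is not arbitrary: UTM zones are 6 degrees wide, with zone 1 centred on 177 degrees west. A minimal Python sketch of that arithmetic (the function name utm_central_meridian is my own):

```python
def utm_central_meridian(zone):
    """Return the central meridian, in degrees, of a UTM zone (1 to 60)."""
    if not 1 <= zone <= 60:
        raise ValueError("UTM zones run from 1 to 60")
    # Zone 1 is centred on -177 and each zone is 6 degrees wide.
    return -183.0 + 6.0 * zone


print(utm_central_meridian(10))  # -123.0
```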

The geographic coordinate system name is followed by the datum, the prime meridian, and the angular unit of measure.

The geographic coordinate system string for UTM zone 10N on NAD 1983 is:

GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.0174532925199433]]
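
The numbers in this string can be checked with a little arithmetic. The SPHEROID object holds the semi-major axis in metres and the inverse flattening, and the UNIT factor is the size of one degree in radians. A short Python sketch, with variable names of my own choosing:

```python
import math

# UNIT["Degree",0.0174532925199433]: one degree expressed in radians.
degree_in_radians = math.pi / 180.0

# SPHEROID["GRS_1980",6378137,298.257222101]
semi_major = 6378137.0              # semi-major axis in metres
inverse_flattening = 298.257222101  # 1/f

# Semi-minor axis follows from b = a * (1 - f).
semi_minor = semi_major * (1.0 - 1.0 / inverse_flattening)

print(degree_in_radians)
print(round(semi_minor, 4))  # about 6356752.3141 metres
```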

The full string representation of NAD 1983 UTM zone 10N is:

PROJCS["NAD_1983_UTM_Zone_10N",
GEOGCS["GCS_North_American_1983",
DATUM["D_North_American_1983",
SPHEROID["GRS_1980",6378137,298.257222101]],
PRIMEM["Greenwich",0],
UNIT["Degree",0.0174532925199433]],
PROJECTION["Transverse_Mercator"],
PARAMETER["False_Easting",500000.0],
PARAMETER["False_Northing",0.0],
PARAMETER["Central_Meridian",-123.0],
PARAMETER["Scale_Factor",0.9996],
PARAMETER["Latitude_of_Origin",0.0],
UNIT["Meter",1.0]]
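
To see the "objects composed of other objects" structure for yourself, here is a minimal Python sketch that parses a string like the one above into nested (keyword, arguments) tuples. It is not ESRI's parser and it only handles what appears in this post: uppercase keywords, quoted names, and plain numbers. The name parse_pe_string is hypothetical.

```python
import re

# One token: KEYWORD[  or  "name"  or  number  or  ]  or  ,
TOKEN = re.compile(
    r'\s*(?:([A-Z_]+)\[|"([^"]*)"|'
    r'([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)|(\])|(,))'
)


def parse_pe_string(text):
    """Parse KEYWORD[arg, ...] into a nested (keyword, [args]) tuple."""
    pos = 0

    def next_token():
        nonlocal pos
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError("unexpected text at position %d" % pos)
        pos = match.end()
        return match.groups()

    def parse_object(keyword):
        args = []
        while True:
            kw, name, number, close, _comma = next_token()
            if kw:
                args.append(parse_object(kw))   # nested object
            elif name is not None:
                args.append(name)               # quoted name
            elif number:
                args.append(float(number))      # numeric parameter
            elif close:
                return (keyword, args)          # end of this object
            # a comma only separates arguments, so nothing to do

    first_keyword = next_token()[0]
    if not first_keyword:
        raise ValueError("string must start with KEYWORD[")
    return parse_object(first_keyword)


sample = 'GEOGCS["GCS_North_American_1983",PRIMEM["Greenwich",0]]'
print(parse_pe_string(sample))
```

For the full PROJCS string, the first argument is the name, the second is the whole GEOGCS object, and the last is the linear UNIT.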

As I stated earlier, you can define your own units, datums, and spheroids, but you need to use the predefined names for the map projection and parameter objects.

I hope the above information aids in your understanding of Shapefiles and how projections and coordinate systems are defined.

My best advice for dealing with PRJ files: copy one that you already have and use it as a base, then modify the parameters as you need.
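
As a rough illustration of that advice, here is a Python sketch that takes the text of an existing PRJ file and changes one PARAMETER value. The helper name set_parameter is my own, and it assumes the parameter appears exactly once, written as PARAMETER["Name",value].

```python
import re


def set_parameter(pe_string, name, value):
    """Return pe_string with PARAMETER["name",...] set to value."""
    pattern = re.compile(r'(PARAMETER\["%s",)[^\]]*(\])' % re.escape(name))
    new_string, count = pattern.subn(
        lambda m: m.group(1) + repr(float(value)) + m.group(2), pe_string)
    if count != 1:
        raise ValueError("expected exactly one PARAMETER named %r" % name)
    return new_string


# Turn a UTM zone 10N definition into zone 11N (central meridian -117).
base = ('PROJCS["NAD_1983_UTM_Zone_10N",'
        'PARAMETER["Central_Meridian",-123.0],UNIT["Meter",1.0]]')
zone_11 = set_parameter(base, "Central_Meridian", -117)
zone_11 = zone_11.replace("Zone_10N", "Zone_11N")
print(zone_11)
```

The base string here is shortened to keep the example readable; with a real PRJ file you would read the whole file, apply the change, and write the result out under the new Shapefile's name.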

Good luck and have fun!