Geo-Rails Part 4: Coordinate Systems and Projections

2011-12-12 georails

When people speak of a learning curve in geospatial programming, they're usually referring to handling coordinate systems. It's true that many spatial applications require close attention to the coordinate system, and it's true that there are some difficult concepts involved. However, it's been my experience that once the light bulb turns on, it opens up a lot of the power and potential of geodata.

In this article, we'll take a first look at coordinate systems and geographic projections. We will:

Examine the importance and effect of coordinate system differences
Survey the various coordinate systems used for geospatial data
Become familiar with coordinate system representations and SRIDs
Specify coordinate systems in RGeo factories
Use RGeo to convert data between coordinate systems
Learn how to handle coordinate systems in Rails

This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit http://daniel-azuma.com/articles/georails.

Why coordinate systems are important: a cautionary fable

Once upon a time, a plane took off from San Francisco, California on a routine flight to Athens, Greece. The captain, being both valiant and diligent with his passengers' safety, sent a Twitter message to air traffic control. He wanted to ensure that his flight route would not be disrupted, for he had read on Hacker News that the diabolical EvilVolcano in Iceland was erupting, sending a deadly ash cloud high into the air lanes.

Meanwhile, an air traffic technician, having just finished installing a brand new Rails-based flight planning application, received the tweet. "@air_traffic_control pls chk flt path SFO-ATH. Far enuf fr #EvilVolcano?"

Excited to use his new tool, the technician got busy with his analysis. He looked up the latitude-longitude coordinates of San Francisco and Athens, and plotted a straight line between them, using Google Maps to verify that his path looked correct. Then he looked up the latitude-longitude coordinates of the diabolical EvilVolcano, and calculated the distance between it and the plane's expected path.

"Lo!" he exclaimed. "Surely the flight path shall follow a straight line between two points, right along the 38-degree latitude line. But look-- Iceland is nowhere near that path. The safety of our valiant pilot is assured!"

The flight path as seen by the air traffic technician

The technician proudly tweeted back his findings: "@valiant_pilot Flt path safe dist fr #EvilVolcano. Have pleasant journey."

Having received this response on Twitter, the pilot proceeded to take off on his planned route. The last tweet received from the ill-fated flight, some seven hours later, read as follows: "Oceanic Air flt 815 encountering ash #SmokeMonster from #EvilVolcano. #Mayday #Mayday". And the rest is history.

What went wrong?

For the most part, the air traffic technician did the right thing. He looked up the latitude-longitude coordinates of the departure and arrival cities, drew a straight line path between them, and then computed the distance bewteen that path and the EvilVolcano in Iceland. That is, he computed:

Distance(LineString(-122.4 37.8, 23.7 37.9), Point(-19.6 63.6))

However, he missed one thing. The shape of a "straight" line drawn between two points, and thus the distance calculated, may be vastly different depending on what coordinate system you are using, even though the latitude and longitude coordinates are the same!

Google Maps uses a Mercator Projection to display a world map. In this flat coordinate system, a straight line between San Francisco and Athens follows the 38-degree latitude line, passing through muggy Virginia and sunny southern Spain on its way to balmy (albeit bankrupt) Athens.

But the earth itself is not flat. Using a spherical coordinate system, the straight line shortest path between the two cities across the surface of the globe passes further north, directly over chilly and volcano-infested Iceland. This is the actual flight path of the ill-fated valiant pilot.

The actual shortest straight-line path from San Francisco to Athens

Simply by using the wrong coordinate system, the air traffic technician got a grossly innacurate answer, resulting in dozens of innocent passengers becoming instant prime-time celebrities.

Coordinate systems for geo data

At a basic level, a coordinate system can be thought of as providing "meaning" to a set of coordinates. When you see the location "Point(-19.6 63.6)", how do you interpret those numbers? They could be the latitude and longitude of the Iceland volcano, but they could equally be measurements in feet from your front door, or light years from Alpha Centauri. The coordinate system is what differentiates these cases.

Location applications generally work with coordinate systems related to the earth's surface, and these coordinate systems fall into three types.

Geocentric coordinate systems are three-dimensional coordinate systems with the origin located at the earth's center. You won't generally see much data in a geocentric coordinate system, but it is sometimes a convenient coordinate system to use for computational geometry and analysis algorithms.

Geocentric coordinates measure X, Y, and Z distances from the center of the earth. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)

Geographic coordinate systems are the familiar latitude-longitude systems identifying points on the earth's surface in terms of degrees. The most common geographic coordinate system, the one used by GPS and expected by most mapping applications, is known as "WGS 84".

Geographic coordinates measure our familiar latitude and longitude. (Credit: http://kartoweb.itc.nl/geometrics/Coordinate%20systems/coordsys.html)

Projected coordinate systems involve taking a portion of the earth's surface and "flattening" it. These coordinate systems are planar and generally Cartesian for easy display and computation; however, they introduce various kinds and amounts of distortion. Whenever you see a map displayed on a piece of paper, a computer screen, or other flat medium (basically anything other than a globe), you are looking at a projection.

There are hundreds of different projections in use, from common projections used for world maps, to special-purpose projections used for specific regions. When you view a Google Map, you are looking at a Mercator Projection. This is a projection designed to preserve shapes and straight-line directions, at the expense of distorting sizes and distances away from the Equator. A Google Map, for instance, implies that Greenland is larger than Africa, when in fact it is much, much smaller.

Google Maps uses a Mercator Projection

The United Nations logo includes a polar projection putting the north pole at the center. In this projection the pole is the least distorted region of the world, and everything else revolves around it, symbolizing the ideal of a world community privileging no particular country (except possibly the northern hemisphere, but we'll ignore the politics for our purposes...)

The United Nations logo includes a north polar projection

Finally, whenever you look at a folding map-- whether a street map for a city, a state map, or any other map that shows a limited area-- you are usually looking at a local projection, one specifically tailored to that particular limited area. Such projections define a particular area in which they make sense. Objects inside that boundary can usually be displayed with minimal distortion, while objects outside that boundary often cannot be described at all.

Representing and specifying coordinate systems

Whenever you receive location data-- whether from geocoding, user input, an external database, or any other source-- the data should come with a coordinate system. Currently, there are two common ways a coordinate system can be defined:

Proj4 Syntax. One common way to specify a coordinate system is through a syntax defined by the Proj library. We won't go into detail on the syntax here, but it is intended to describe how to convert data to and from that coordinate system. That is, if you receive data in a projected coordinate system, if you have the Proj4 syntax for that projection, the Proj library can convert the data back into latitude and longitude, and vice versa. Because of this useful property, Proj4 syntax is quite ubiquitous. Below is an example of the Proj4 syntax for "NAD83 / Washington North", a local projection commonly used for topgraphic mapping in northern Washington state. For now, don't worry if you don't understand every field. This is just an example so that you can recognize Proj4 syntax when you see it.

+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47
  +lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0
  +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs

OGC well-known-text (or WKT). The Open Geospatial Consortium, the standards body that developed the Simple Feature Access specification described in part 3, also developed a syntax for representing coordinate systems and transformations. Below is the well-known-text for "NAD83 / Washington North". Again, for now you don't need to understand every field here, but you should be able to recognize WKT format when you see it.

PROJCS["NAD83 / Washington North (ftUS)",
  GEOGCS["NAD83",
    DATUM["North_American_Datum_1983",
      SPHEROID["GRS 1980",6378137,298.257222101,
        AUTHORITY["EPSG","7019"]],
      AUTHORITY["EPSG","6269"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4269"]],
  UNIT["US survey foot",0.3048006096012192,
    AUTHORITY["EPSG","9003"]],
  PROJECTION["Lambert_Conformal_Conic_2SP"],
  PARAMETER["standard_parallel_1",48.73333333333333],
  PARAMETER["standard_parallel_2",47.5],
  PARAMETER["latitude_of_origin",47],
  PARAMETER["central_meridian",-120.8333333333333],
  PARAMETER["false_easting",1640416.667],
  PARAMETER["false_northing",0],
  AUTHORITY["EPSG","2285"],
  AXIS["X",EAST],
  AXIS["Y",NORTH]]

As we saw above, each piece of geometry needs a corresponding coordinate system in order to specify the meaning of its coordinates and thus how to handle its data. Instead of attaching an entire Proj4 or WKT formatted string to every latitude-longitude point in a system, most geospatial systems provide a database of coordinate systems, each identified by an ID known as the Spatial Reference ID (or SRID). Each geometric data object then includes an SRID field referencing an entry in that database.

Technically, a geospatial system can provide its own spatial reference database and set its own SRIDs. However, in practice, many systems use a de facto standard dataset known as the EPSG dataset. This is a database of several thousand coordinate systems managed by the International Association of Oil & Gas Producers. The EPSG dataset is ubiquitous enough that most spatial database tools include a copy of it. A spatially-enabled PostGIS database, for example, automatically includes a table called spatial_ref_sys that is typically prepopulated with the EPSG dataset. You can look up SRIDs and get the coordinate system name, the WKT representation, and sometimes the Proj4 representation. The "NAD83 / Washington North" coordinate system example above has SRID 2285 in the EPSG database.

One important EPSG-specified SRID that you will encounter often is 4326. 4326 refers to the "WGS 84" geographic (latitude-longitude) coordinate system we mentioned earlier-- the coordinate system used by Global Positioning System (GPS). Typically, when you get a latitude-longitude coordinate from a GPS system, from a geocoder, from a Google map input, or most other common sources, it will implicitly be in the EPSG 4326 coordinate system. This is so universally true that PostGIS currently mandates that the SRID must be set to 4326 when you create a column of type "geography" (that is, a latitude-longitude column), as we saw in part 2.

You can browse the EPSG database at http://www.spatialreference.org/.

Coordinate systems for RGeo factories

In part 3, we discussed how RGeo manages coordinate systems through factories. Now that we understand coordinate systems in more detail, we can take a closer look at how to handle coordinate systems in Ruby.

RGeo supports two main types of factories: Cartesian factories and geographic factories. Cartesian factories are ideal for handling projected coordinate systems, in which the domain is flat. Geographic factories are useful for geographic coordinate systems representing latitude and longitude, in which the domain is curved like the surface of the earth.

When you create an RGeo factory, you may specify the exact coordinate system by passing arguments to the factory constructor. In most cases, you should provide an SRID using the :srid parameter. You may also provide Proj4 and/or WKT representations for the coordinate system using the :proj4 and :coord_sys parmeters, respectively. For example, here's how you could create a factory designed to handle the "NAD83 / Washington North" projection we discussed earlier.

north_wa_proj4 = '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 ' +
  '+lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 ' +
  '+y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 ' +
  '+no_defs'
north_wa_wkt = <<WKT
  PROJCS["NAD83 / Washington North (ftUS)",
    GEOGCS["NAD83",
      DATUM["North_American_Datum_1983",
        SPHEROID["GRS 1980",6378137,298.257222101,
          AUTHORITY["EPSG","7019"]],
        AUTHORITY["EPSG","6269"]],
      PRIMEM["Greenwich",0,
        AUTHORITY["EPSG","8901"]],
      UNIT["degree",0.01745329251994328,
        AUTHORITY["EPSG","9122"]],
      AUTHORITY["EPSG","4269"]],
    UNIT["US survey foot",0.3048006096012192,
      AUTHORITY["EPSG","9003"]],
    PROJECTION["Lambert_Conformal_Conic_2SP"],
    PARAMETER["standard_parallel_1",48.73333333333333],
    PARAMETER["standard_parallel_2",47.5],
    PARAMETER["latitude_of_origin",47],
    PARAMETER["central_meridian",-120.8333333333333],
    PARAMETER["false_easting",1640416.667],
    PARAMETER["false_northing",0],
    AUTHORITY["EPSG","2285"],
    AXIS["X",EAST],
    AXIS["Y",NORTH]]
WKT
north_wa_factory = RGeo::Cartesian.factory(:srid => 2285,
  :proj4 => north_wa_proj4, :coord_sys => north_wa_wkt)

Notice that, since the coordinate system is a projection, we're using a Cartesian factory that will perform computations in a flat domain.

When you work with a projected coordinate system like this one, the coordinates themselves are expressed in the projection rather than in latitude and longitude. For example, the location of the Space Needle in Seattle, which is at latitude 47.620578, longitude -122.34978, is expressed as (1266457.58, 230052.50) in this coordinate system.

space_needle = north_wa_factory.point(1266457.58, 230052.50)

Let's consider another example. If you want to work with latitudes and longitudes, say from a GPS system, then you should use the "WGS 84" coordinate system, which has SRID 4326. Looking up this coordinate system in spatialreference.org, we can find its Proj4 and WKT forms:

wgs84_proj4 = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs'
wgs84_wkt = <<WKT
  GEOGCS["WGS 84",
    DATUM["WGS_1984",
      SPHEROID["WGS 84",6378137,298.257223563,
        AUTHORITY["EPSG","7030"]],
      AUTHORITY["EPSG","6326"]],
    PRIMEM["Greenwich",0,
      AUTHORITY["EPSG","8901"]],
    UNIT["degree",0.01745329251994328,
      AUTHORITY["EPSG","9122"]],
    AUTHORITY["EPSG","4326"]]
WKT

To create a factory for this coordinate system, we should use one of the geographic factories provided by RGeo. These factories perform computations over a curved earth rather than a flat earth. For example, they will correctly draw the line between San Francisco and Athens so that it passes through Iceland. You can use RGeo::Geographic.spherical_factory to create a factory that performs computations on a spherical earth:

wgs84_factory = RGeo::Geographic.spherical_factory(:srid => 4326,
  :proj4 => wgs84_proj4, :coord_sys => wgs84_wkt)

We now have a factory that will correctly manage features using latitude/longitude.

space_needle = wgs84_factory.point(-122.34978, 47.620578)

Technically, the earth is not a perfect sphere, but is slightly flattened. The WGS84 coordinate system actually uses a flattened ellipsoid to more closely model this shape. However, RGeo does not yet support ellipsoidal computations, so we used a spherical factory instead as an approximation. In many cases, this will be good enough, but if you need accuracy down to the millimeter, this is something to be aware of.

Converting data between coordinate systems

The above examples may have sparked an important question. In the examples in parts 2 and 3, we created factories without the benefit of the proj4 and wkt strings, and in some cases we also omitted the SRID. Under what circumstances are proj4, wkt, and/or SRID needed when you create an RGeo factory, and under what circumstances can you leave them out?

SRID is generally needed when you want to store your data in a PostGIS database. This is because PostGIS generally puts an SRID constraint on spatial columns that you create, in order to make sure you don't mismatch coordinate systems. So, if you are storing data in or pulling data from PostGIS, you should always specify the SRID in the factory.

The proj4 string has a different purpose: it is used by the Proj library to describe how to convert coordinates between coordinate systems. Here is how this works.

Suppose you are reading a set of points from a data source that uses the "NAD83 / Washington North" coordinate system, and you wanted to convert them to latitude and longitude. To perform this conversion, you need two factories, one in the source coordinate system and one in the destination coordinate system. Both factories need to have their :proj4 string set. This lets the Proj library understand both coordinate systems so it can figure out how to convert from one to the other.

In the above examples when we created our north_wa_factory and wgs84_factory, we provided the needed proj4 strings. So these factories are ready to be used.

Now, to actually convert the data, load it using the source factory, and then use RGeo's "cast" mechanism to cast it to the other factory. Call RGeo::Feature.cast, pass it the original object and the destination factory, and set the :project argument to true to tell it to transform coordinates.

space_needle = north_wa_factory.point(1266457.58, 230052.50)
space_needle_latlon = RGeo::Feature.cast(space_needle,
  :factory => wgs84_factory, :project => true)

This code tells RGeo to take the space_needle object (which is in the projected coordinate system) and convert it to the WGS84 factory, while transforming (projecting) its coordinates so they are correct in the new factory's coordinate system. As a result, space_needle_latlon is created, containing latitude and longitude coordinates, and with its factory set to wgs84_factory.

What about the WKT string? Currently, RGeo does not have any functional use for the WKT string; it is just an informational field. So you can omit it if you want. However, it turns out that in many cases you can theoretically use the WKT to transform coordinates in the same way as the Proj4 string. RGeo will likely provide this capability in the future-- you will be able to pass WKT instead of Proj4 to allow factories to transform coordinates.

Using Coordinate Systems in Rails

In part 2, we looked at an example in which we created a PostGIS column that stored point data in geographic (latitude-longitude) coordinates. We did this by installing the activerecord-postgis-adapter, which extends ActiveRecord migrations to create spatial columns, and which exposes spatial attributes as RGeo data objects.

Now let's consider another example. Suppose we obtained a set of polygons (say, zip code boundaries) from a data source that uses the "NAD83 / Washington North" projection. We could convert the polygons to latitude-longitude, but remember that doing so can actually change the shape of the polygon. So let's suppose we decided this was unacceptable, and we need to store the polygons in the database in the projected coordinate system. Here's the migration for this case:

class CreateZipCodes < ActiveRecord::Migration
  def change
    create_table :zip_codes do |t|
      t.integer :zip
      t.polygon :boundary, :srid => 2285
    end
  end
end

The name of our column is "boundary", and we set its type to "polygon". Now, in the example in part 2, we had set the :geographic property of the column, indicating that it was storing latitude and longitude and that it should use the PostGIS features designed for that case. In this new example, we are storing projected coordinates, so we do not set :geographic. Instead, we just set the SRID to match the coordinate system that we are using. Setting a SRID on the database column actually sets up a database constraint: it allows only geometries with a matching SRID to be stored in the column. This is PostGIS's way of helping you avoid the mistake of our unfortunate air traffic control technician: that of mismatching our coordinate systems.

As we did in part 2, we also need to set the factory for this column so that Rails knows what factory to use when it reads geometries from the database.

class ZipCode < ActiveRecord::Base
  north_wa_factory = ... # use the factory we created earlier
  set_rgeo_factory_for_column(:boundary, north_wa_factory)
end

In general, it is important that the factory you set in your ActiveRecord class matches the constraints in the database column, so that both sides handle the data in the same way. In particular, the SRIDs should match, and either both should be Cartesian or both should be geographic.

Now we can read and write the polygon data. We just need to remember that the data is not in latitude-longitude, but in the projected coordinate system. So when we write polygons to the database and read from the database, we will receive projected coordinates.

This example, of course, brings up an important question. We need to decide up front whether the database should contain projected data or latitude-longitude data. How do we choose? This can be a somewhat complicated question, and I will dive more deeply into the pros and cons of different strategies in a later article. However, for now we know enough to understand a few of the issues.

In many simple cases-- if you are working only with points, or with small features that do not cover a large part of the globe, or if extreme accuracy is not important-- you may find it easiest to think in latitude and longitude. In those cases, you can just create a :geographic column in the database and convert everything to a geographic coordinate system. Just be aware that there are potential issues whenever you have to convert data from one coordinate system to another. As we saw in our story, lines and polygons that span a large area can change their shape dramatically when switching coordinate systems. So if accuracy is essential, it may be desirable for your database to use the same coordinate system as your data source. You also should consider what type of spatial queries you are likely to run against your data. Remember that coordinates in a query must match the SRID and coordinate system of the data in your database.

Where to go from here

Congratulations on making it through this article! Understanding coordinate systems can be tricky, but it is very necessary for doing nontrivial applications. In this discussion, I've deliberately left out a number of more advanced topics that I'll probably cover in a later article. But you should now have enough information so you won't get lost when people start talking about projections and SRIDs.

If you'd like to explore more about map projections and geospatial coordinate systems, the articles on Wikipedia are not too bad. I'm not aware of any good books on this material geared towards web developers, but if you know of any, please send me a line. For reference, you'll probably find yourself going to spatialreference.org frequently when you need information on the coordinate system referenced by a particular SRID.

For the next article, I'm currently planning on covering common file and data interchange formats used for geospatial data. Stay tuned, and let's bring Rails down to earth!

This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit http://daniel-azuma.com/articles/georails.