Originally published Dec 12, 2011
When people speak of a learning curve in geospatial programming, they're usually referring to handling coordinate systems. It's true that many spatial applications require close attention to the coordinate system, and it's true that there are some difficult concepts involved. However, it's been my experience that once the light bulb turns on, it opens up a lot of the power and potential of geodata.
In this article, we'll take a first look at coordinate systems and geographic projections. We will:
This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit http://daniel-azuma.com/articles/georails.
Once upon a time, a plane took off from San Francisco, California on a routine flight to Athens, Greece. The captain, being both valiant and diligent with his passengers' safety, sent a Twitter message to air traffic control. He wanted to ensure that his flight route would not be disrupted, for he had read on Hacker News that the diabolical EvilVolcano in Iceland was erupting, sending a deadly ash cloud high into the air lanes.
Meanwhile, an air traffic technician, having just finished installing a brand new Rails-based flight planning application, received the tweet. "
Excited to use his new tool, the technician got busy with his analysis. He looked up the latitude-longitude coordinates of San Francisco and Athens, and plotted a straight line between them, using Google Maps to verify that his path looked correct. Then he looked up the latitude-longitude coordinates of the diabolical EvilVolcano, and calculated the distance between it and the plane's expected path.
"Lo!" he exclaimed. "Surely the flight path shall follow a straight line between two points, right along the 38-degree latitude line. But look-- Iceland is nowhere near that path. The safety of our valiant pilot is assured!"
The technician proudly tweeted back his findings: "
Having received this response on Twitter, the pilot proceeded to take off on his planned route. The last tweet received from the ill-fated flight, some seven hours later, read as follows: "
What went wrong?
For the most part, the air traffic technician did the right thing. He looked up the latitude-longitude coordinates of the departure and arrival cities, drew a straight line path between them, and then computed the distance bewteen that path and the EvilVolcano in Iceland. That is, he computed:
Distance(LineString(-122.4 37.8, 23.7 37.9), Point(-19.6 63.6))
However, he missed one thing. The shape of a "straight" line drawn between two points, and thus the distance calculated, may be vastly different depending on what coordinate system you are using, even though the latitude and longitude coordinates are the same!
Google Maps uses a Mercator Projection to display a world map. In this flat coordinate system, a straight line between San Francisco and Athens follows the 38-degree latitude line, passing through muggy Virginia and sunny southern Spain on its way to balmy (albeit bankrupt) Athens.
But the earth itself is not flat. Using a spherical coordinate system, the straight line shortest path between the two cities across the surface of the globe passes further north, directly over chilly and volcano-infested Iceland. This is the actual flight path of the ill-fated valiant pilot.
Simply by using the wrong coordinate system, the air traffic technician got a grossly innacurate answer, resulting in dozens of innocent passengers becoming instant prime-time celebrities.
At a basic level, a coordinate system can be thought of as providing "meaning" to a set of coordinates. When you see the location "
Location applications generally work with coordinate systems related to the earth's surface, and these coordinate systems fall into three types.
Geocentric coordinate systems are three-dimensional coordinate systems with the origin located at the earth's center. You won't generally see much data in a geocentric coordinate system, but it is sometimes a convenient coordinate system to use for computational geometry and analysis algorithms.
Geographic coordinate systems are the familiar latitude-longitude systems identifying points on the earth's surface in terms of degrees. The most common geographic coordinate system, the one used by GPS and expected by most mapping applications, is known as "WGS 84".
Projected coordinate systems involve taking a portion of the earth's surface and "flattening" it. These coordinate systems are planar and generally Cartesian for easy display and computation; however, they introduce various kinds and amounts of distortion. Whenever you see a map displayed on a piece of paper, a computer screen, or other flat medium (basically anything other than a globe), you are looking at a projection.
There are hundreds of different projections in use, from common projections used for world maps, to special-purpose projections used for specific regions. When you view a Google Map, you are looking at a Mercator Projection. This is a projection designed to preserve shapes and straight-line directions, at the expense of distorting sizes and distances away from the Equator. A Google Map, for instance, implies that Greenland is larger than Africa, when in fact it is much, much smaller.
The United Nations logo includes a polar projection putting the north pole at the center. In this projection the pole is the least distorted region of the world, and everything else revolves around it, symbolizing the ideal of a world community privileging no particular country (except possibly the northern hemisphere, but we'll ignore the politics for our purposes...)
Finally, whenever you look at a folding map-- whether a street map for a city, a state map, or any other map that shows a limited area-- you are usually looking at a local projection, one specifically tailored to that particular limited area. Such projections define a particular area in which they make sense. Objects inside that boundary can usually be displayed with minimal distortion, while objects outside that boundary often cannot be described at all.
Whenever you receive location data-- whether from geocoding, user input, an external database, or any other source-- the data should come with a coordinate system. Currently, there are two common ways a coordinate system can be defined:
Proj4 Syntax. One common way to specify a coordinate system is through a syntax defined by the Proj library. We won't go into detail on the syntax here, but it is intended to describe how to convert data to and from that coordinate system. That is, if you receive data in a projected coordinate system, if you have the Proj4 syntax for that projection, the Proj library can convert the data back into latitude and longitude, and vice versa. Because of this useful property, Proj4 syntax is quite ubiquitous. Below is an example of the Proj4 syntax for "NAD83 / Washington North", a local projection commonly used for topgraphic mapping in northern Washington state. For now, don't worry if you don't understand every field. This is just an example so that you can recognize Proj4 syntax when you see it.
+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 +lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 +y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 +no_defs
OGC well-known-text (or WKT). The Open Geospatial Consortium, the standards body that developed the Simple Feature Access specification described in part 3, also developed a syntax for representing coordinate systems and transformations. Below is the well-known-text for "NAD83 / Washington North". Again, for now you don't need to understand every field here, but you should be able to recognize WKT format when you see it.
PROJCS["NAD83 / Washington North (ftUS)", GEOGCS["NAD83", DATUM["North_American_Datum_1983", SPHEROID["GRS 1980",6378137,298.257222101, AUTHORITY["EPSG","7019"]], AUTHORITY["EPSG","6269"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4269"]], UNIT["US survey foot",0.3048006096012192, AUTHORITY["EPSG","9003"]], PROJECTION["Lambert_Conformal_Conic_2SP"], PARAMETER["standard_parallel_1",48.73333333333333], PARAMETER["standard_parallel_2",47.5], PARAMETER["latitude_of_origin",47], PARAMETER["central_meridian",-120.8333333333333], PARAMETER["false_easting",1640416.667], PARAMETER["false_northing",0], AUTHORITY["EPSG","2285"], AXIS["X",EAST], AXIS["Y",NORTH]]
As we saw above, each piece of geometry needs a corresponding coordinate system in order to specify the meaning of its coordinates and thus how to handle its data. Instead of attaching an entire Proj4 or WKT formatted string to every latitude-longitude point in a system, most geospatial systems provide a database of coordinate systems, each identified by an ID known as the Spatial Reference ID (or SRID). Each geometric data object then includes an SRID field referencing an entry in that database.
Technically, a geospatial system can provide its own spatial reference database and set its own SRIDs. However, in practice, many systems use a de facto standard dataset known as the EPSG dataset. This is a database of several thousand coordinate systems managed by the International Association of Oil & Gas Producers. The EPSG dataset is ubiquitous enough that most spatial database tools include a copy of it. A spatially-enabled PostGIS database, for example, automatically includes a table called
One important EPSG-specified SRID that you will encounter often is 4326. 4326 refers to the "WGS 84" geographic (latitude-longitude) coordinate system we mentioned earlier-- the coordinate system used by Global Positioning System (GPS). Typically, when you get a latitude-longitude coordinate from a GPS system, from a geocoder, from a Google map input, or most other common sources, it will implicitly be in the EPSG 4326 coordinate system. This is so universally true that PostGIS currently mandates that the SRID must be set to 4326 when you create a column of type "geography" (that is, a latitude-longitude column), as we saw in part 2.
You can browse the EPSG database at http://www.spatialreference.org/.
In part 3, we discussed how RGeo manages coordinate systems through factories. Now that we understand coordinate systems in more detail, we can take a closer look at how to handle coordinate systems in Ruby.
RGeo supports two main types of factories: Cartesian factories and geographic factories. Cartesian factories are ideal for handling projected coordinate systems, in which the domain is flat. Geographic factories are useful for geographic coordinate systems representing latitude and longitude, in which the domain is curved like the surface of the earth.
When you create an RGeo factory, you may specify the exact coordinate system by passing arguments to the factory constructor. In most cases, you should provide an SRID using the
north_wa_proj4 = '+proj=lcc +lat_1=48.73333333333333 +lat_2=47.5 ' + '+lat_0=47 +lon_0=-120.8333333333333 +x_0=500000.0001016001 ' + '+y_0=0 +ellps=GRS80 +datum=NAD83 +to_meter=0.3048006096012192 ' + '+no_defs' north_wa_wkt = <<WKT PROJCS["NAD83 / Washington North (ftUS)", GEOGCS["NAD83", DATUM["North_American_Datum_1983", SPHEROID["GRS 1980",6378137,298.257222101, AUTHORITY["EPSG","7019"]], AUTHORITY["EPSG","6269"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4269"]], UNIT["US survey foot",0.3048006096012192, AUTHORITY["EPSG","9003"]], PROJECTION["Lambert_Conformal_Conic_2SP"], PARAMETER["standard_parallel_1",48.73333333333333], PARAMETER["standard_parallel_2",47.5], PARAMETER["latitude_of_origin",47], PARAMETER["central_meridian",-120.8333333333333], PARAMETER["false_easting",1640416.667], PARAMETER["false_northing",0], AUTHORITY["EPSG","2285"], AXIS["X",EAST], AXIS["Y",NORTH]] WKT north_wa_factory = RGeo::Cartesian.factory(:srid => 2285, :proj4 => north_wa_proj4, :coord_sys => north_wa_wkt)
Notice that, since the coordinate system is a projection, we're using a Cartesian factory that will perform computations in a flat domain.
When you work with a projected coordinate system like this one, the coordinates themselves are expressed in the projection rather than in latitude and longitude. For example, the location of the Space Needle in Seattle, which is at latitude 47.620578, longitude -122.34978, is expressed as
space_needle = north_wa_factory.point(1266457.58, 230052.50)
Let's consider another example. If you want to work with latitudes and longitudes, say from a GPS system, then you should use the "WGS 84" coordinate system, which has SRID 4326. Looking up this coordinate system in spatialreference.org, we can find its Proj4 and WKT forms:
wgs84_proj4 = '+proj=longlat +ellps=WGS84 +datum=WGS84 +no_defs' wgs84_wkt = <<WKT GEOGCS["WGS 84", DATUM["WGS_1984", SPHEROID["WGS 84",6378137,298.257223563, AUTHORITY["EPSG","7030"]], AUTHORITY["EPSG","6326"]], PRIMEM["Greenwich",0, AUTHORITY["EPSG","8901"]], UNIT["degree",0.01745329251994328, AUTHORITY["EPSG","9122"]], AUTHORITY["EPSG","4326"]] WKT
To create a factory for this coordinate system, we should use one of the geographic factories provided by RGeo. These factories perform computations over a curved earth rather than a flat earth. For example, they will correctly draw the line between San Francisco and Athens so that it passes through Iceland. You can use
wgs84_factory = RGeo::Geographic.spherical_factory(:srid => 4326, :proj4 => wgs84_proj4, :coord_sys => wgs84_wkt)
We now have a factory that will correctly manage features using latitude/longitude.
space_needle = wgs84_factory.point(-122.34978, 47.620578)
Technically, the earth is not a perfect sphere, but is slightly flattened. The WGS84 coordinate system actually uses a flattened ellipsoid to more closely model this shape. However, RGeo does not yet support ellipsoidal computations, so we used a spherical factory instead as an approximation. In many cases, this will be good enough, but if you need accuracy down to the millimeter, this is something to be aware of.
The above examples may have sparked an important question. In the examples in parts 2 and 3, we created factories without the benefit of the proj4 and wkt strings, and in some cases we also omitted the SRID. Under what circumstances are proj4, wkt, and/or SRID needed when you create an RGeo factory, and under what circumstances can you leave them out?
SRID is generally needed when you want to store your data in a PostGIS database. This is because PostGIS generally puts an SRID constraint on spatial columns that you create, in order to make sure you don't mismatch coordinate systems. So, if you are storing data in or pulling data from PostGIS, you should always specify the SRID in the factory.
The proj4 string has a different purpose: it is used by the Proj library to describe how to convert coordinates between coordinate systems. Here is how this works.
Suppose you are reading a set of points from a data source that uses the "NAD83 / Washington North" coordinate system, and you wanted to convert them to latitude and longitude. To perform this conversion, you need two factories, one in the source coordinate system and one in the destination coordinate system. Both factories need to have their
In the above examples when we created our
Now, to actually convert the data, load it using the source factory, and then use RGeo's "cast" mechanism to cast it to the other factory. Call
space_needle = north_wa_factory.point(1266457.58, 230052.50) space_needle_latlon = RGeo::Feature.cast(space_needle, :factory => wgs84_factory, :project => true)
This code tells RGeo to take the
What about the WKT string? Currently, RGeo does not have any functional use for the WKT string; it is just an informational field. So you can omit it if you want. However, it turns out that in many cases you can theoretically use the WKT to transform coordinates in the same way as the Proj4 string. RGeo will likely provide this capability in the future-- you will be able to pass WKT instead of Proj4 to allow factories to transform coordinates.
In part 2, we looked at an example in which we created a PostGIS column that stored point data in geographic (latitude-longitude) coordinates. We did this by installing the activerecord-postgis-adapter, which extends ActiveRecord migrations to create spatial columns, and which exposes spatial attributes as RGeo data objects.
Now let's consider another example. Suppose we obtained a set of polygons (say, zip code boundaries) from a data source that uses the "NAD83 / Washington North" projection. We could convert the polygons to latitude-longitude, but remember that doing so can actually change the shape of the polygon. So let's suppose we decided this was unacceptable, and we need to store the polygons in the database in the projected coordinate system. Here's the migration for this case:
class CreateZipCodes < ActiveRecord::Migration def change create_table :zip_codes do |t| t.integer :zip t.polygon :boundary, :srid => 2285 end end end
The name of our column is "
As we did in part 2, we also need to set the factory for this column so that Rails knows what factory to use when it reads geometries from the database.
class ZipCode < ActiveRecord::Base north_wa_factory = ... # use the factory we created earlier set_rgeo_factory_for_column(:boundary, north_wa_factory) end
In general, it is important that the factory you set in your ActiveRecord class matches the constraints in the database column, so that both sides handle the data in the same way. In particular, the SRIDs should match, and either both should be Cartesian or both should be geographic.
Now we can read and write the polygon data. We just need to remember that the data is not in latitude-longitude, but in the projected coordinate system. So when we write polygons to the database and read from the database, we will receive projected coordinates.
This example, of course, brings up an important question. We need to decide up front whether the database should contain projected data or latitude-longitude data. How do we choose? This can be a somewhat complicated question, and I will dive more deeply into the pros and cons of different strategies in a later article. However, for now we know enough to understand a few of the issues.
In many simple cases-- if you are working only with points, or with small features that do not cover a large part of the globe, or if extreme accuracy is not important-- you may find it easiest to think in latitude and longitude. In those cases, you can just create a
Congratulations on making it through this article! Understanding coordinate systems can be tricky, but it is very necessary for doing nontrivial applications. In this discussion, I've deliberately left out a number of more advanced topics that I'll probably cover in a later article. But you should now have enough information so you won't get lost when people start talking about projections and SRIDs.
If you'd like to explore more about map projections and geospatial coordinate systems, the articles on Wikipedia are not too bad. I'm not aware of any good books on this material geared towards web developers, but if you know of any, please send me a line. For reference, you'll probably find yourself going to spatialreference.org frequently when you need information on the coordinate system referenced by a particular SRID.
For the next article, I'm currently planning on covering common file and data interchange formats used for geospatial data. Stay tuned, and let's bring Rails down to earth!
This is part 4 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit http://daniel-azuma.com/articles/georails.