Geo-Rails Part 7: Geometry vs. Geography, or, How I Learned To Stop Worrying And Love Projections

2012-01-09 georails

This week we're going to look at how to choose a coordinate system for your database. In PostGIS, this includes the choice of geometry vs geography columns, as well as which projection (if any) to use, and how to interact with it from Rails.

In this article, we'll:

Review geographic and projected coordinate systems
Discuss the pros and cons of using the PostGIS geographic type
See why I typically store data in a projection
Look at some specific projections I recommend using (or avoiding)
Learn how to handle projected data in Rails

My original series plan for this week called for a worked example of a location-based web service, bringing together much of the material that we've covered so far. But as I was writing it, I realized there was one more topic we probably ought to cover first. So I'll publish the example next week.

This is part 7 of my continuing series of articles on geospatial programming in Ruby and Rails. For a list of the other installments, please visit http://daniel-azuma.com/articles/georails.

A tale of two coordinate systems

In part 4, we took a first look at coordinate systems. We saw that coordinate systems are different ways of assigning meaning to coordinate values. Or, put another way, any particular meaning (such as a location) can be described in multiple ways. Each of those ways would use a different set of values, according to a different coordinate system.

Locations on the earth's surface are typically specified using one of two general types of coordinate systems: geographic coordinate systems and projected coordinate systems. Geographic coordinate systems usually use some notion of latitude and longitude, measuring angles along the surface of the earth. They are also embedded in a curved domain. What this means is, you can't technically show latitude and longitude on a flat piece of paper or computer screen. Objects described in latitude and longitude are always curved like the surface of the earth; distances measured between latitudes and longitudes are always measured along a curved surface.

Projected coordinate systems are formed by "flattening" the earth's surface into a flat domain. Coordinates in a projected system are not in latitude and longitude. They do not measure angles. Instead, they measure distance and position along that flattened surface. Because of this, the actual coordinate values in a projection may not be immediately recognizable. However, the benefit is that objects in a projected coordinate system are flat, so you can draw them on a flat piece of paper or computer screen, and you can perform analysis and calculations the way you are used to from your high school geometry class.

Here are two sets of coordinates for the Space Needle in Seattle. The first uses a geographic coordinate system, and the values are the familiar longitude and latitude. The second, called "NAD83 / Washington North", is the state plane projected coordinate system for northern Washington state. The coordinates in this projection may not be immediately recognizable, but they point to the same location.

POINT(-122.34978 47.620578)  -- geographic
POINT(1266457.58 230052.50)  -- projected

In the beginning of part 4, we looked at some of the ramifications of using different coordinate systems. They can drastically change the way that objects are shaped or computations are done. Now we'll look at some practical advice regarding choosing coordinate systems to use.

The PostGIS geographic type

The PostGIS database provides two different types of spatial columns: geometric and geographic. We saw in part 2 that we can specify which type to use in our Rails migrations, through the use of the :geographic modifier:

class CreateLocations < ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :latlon, :geographic => true
      t.timestamps
    end
  end
end

Geographic columns use a geographic coordinate system (latitude and longitude on a curved domain). Geometric columns use a projected coordinate system (on a flat domain). But which should you use for your application? To answer this question, we need to unpack what the coordinate system differences mean in the context of PostGIS.

Let's start with the obvious. Geographic types use units of latitude and longitude. Since these are familiar concepts, we can put them directly into the database and pull them out for display without having to perform any transformations on the values. This makes the geographic type very convenient for many simple applications.

Second, the shape of lines and polygons in geographic columns will follow the curvature of the earth. We saw a dramatic demonstration of this in the beginning of part 4: a "straight line" from San Francisco to Athens passes over Iceland in a geographic coordinate system, even though Iceland is far to the north of either endpoint.

Third, as a corollary to the previous point, geographic coordinates for the most part let you ignore seams and singularities. Take a short line segment from POINT(179 0) to POINT(-179 0). On the globe, in a geographic coordinate system, this is a short line that crosses the International Date Line. Projections, in contrast, have to flatten the earth, and in order to do so, they have to "cut" the globe someplace. This cut becomes the edge of the map. Many projections perform this cut along the Date Line. Hence, if we take our two points on either side of the Date Line, and draw a line segment between them in such a projection, that line would run the other way, crossing most of the world.

A line segment connecting two points on either side of the Date Line, in a geographic coordinate system

A line segment connecting the same endpoints, in some projections, may cross the entire world.

Similarly, the north and south poles also cause problems for many projections. As a result, if you deal with objects that cross the Date Line or live near or especially surrounding the poles, you may have to deal with these (literal) edge cases specially. Generally, the geographic type lets you avoid having to think about these special cases because a globe has no edges.
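
The Date Line seam can be seen with a little arithmetic. Here is a minimal sketch in plain Ruby (the method names are mine for illustration, not part of PostGIS or RGeo) that compares the longitude span of the segment from POINT(179 0) to POINT(-179 0) when longitudes are treated as flat x values against the span measured the short way around the globe.

```ruby
# Longitude span between two points when longitude is treated as a
# plain x coordinate on a flat map that is cut at the Date Line.
def flat_lon_span(lon1, lon2)
  (lon2 - lon1).abs
end

# Longitude span measured the short way around the globe.
def wrapped_lon_span(lon1, lon2)
  span = (lon2 - lon1).abs % 360.0
  span > 180.0 ? 360.0 - span : span
end

puts flat_lon_span(179.0, -179.0)     # 358.0
puts wrapped_lon_span(179.0, -179.0)  # 2.0
```

The flat version sees a segment 358 degrees wide, which is the line that crosses most of the world. The wrapped version sees the 2 degree segment you would expect on a globe.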

Now the bad news. Computations across a curved surface are more complex than across a flat surface. Distance calculation, intersections, and so forth, will be slower on geographic types than on projections. In fact, some computations will not be available at all. In part 6, we considered an example "counties" table, in which we chose to use a projected coordinate system to store polygons. The reason I did that is that I wanted to cover ST_Relate, a function that PostGIS supports for geometric types but not geographic types.

Finally, geographic types are also subject to the model of the earth that you are using. The earth is actually not a perfect sphere, but is slightly flattened along its axis of rotation. In order to perform computations across a large area with a high degree of accuracy, you need to take that flattening into account. Unfortunately, the flattening makes the already complex computations maddeningly complex (and correspondingly slower). Because of this, PostGIS gives you the option of choosing whether to perform computations using the spherical or flattened shape, trading off speed for accuracy. Measurement functions that accept geographic inputs, such as ST_Distance, perform the more accurate computation by default, but you can switch to the faster spherical formulas by passing FALSE as an optional final parameter.

ST_Distance(pt1, pt2)         -- Uses more accurate computation
ST_Distance(pt1, pt2, FALSE)  -- Uses faster spherical computation
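
To give a feel for what a spherical distance computation involves, here is a sketch of the haversine great-circle formula in plain Ruby. This is only an illustration of the spherical model, not the code PostGIS actually runs; the method name and the mean radius value are my own assumptions.

```ruby
# Great-circle distance in meters between two lon/lat points (degrees),
# using the haversine formula on a perfect sphere. The default radius is
# an assumed mean earth radius, not a value taken from PostGIS.
def spherical_distance(lon1, lat1, lon2, lat2, radius = 6_371_008.8)
  to_rad = Math::PI / 180.0
  phi1 = lat1 * to_rad
  phi2 = lat2 * to_rad
  dphi = (lat2 - lat1) * to_rad
  dlam = (lon2 - lon1) * to_rad
  a = Math.sin(dphi / 2)**2 +
      Math.cos(phi1) * Math.cos(phi2) * Math.sin(dlam / 2)**2
  2 * radius * Math.asin(Math.sqrt(a))
end

# One degree of longitude along the equator is roughly 111 km.
puts spherical_distance(0.0, 0.0, 1.0, 0.0).round
```

The flattened (spheroidal) computation has no closed form this short, which is where the extra cost comes from.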

A case for projections

So which type should you use? There will be some cases when the decision is clear. If you need to perform computations across large sections of the globe, for example, you will usually want to use the geographic type. However, my experience has been that, for most use cases that you're likely to encounter in a Rails application, you'll get better results by choosing a reasonable projection.

Why do I say that?

Spatial data storage should match its usage. This is, I think, the most important but most overlooked consideration. Often, your application will lend itself to a particular projection based on what it does with the data, and it is almost always beneficial to structure your data storage accordingly. I know as engineers we often want to abstract our data representation from our application functionality. But you don't always have that luxury with big data---whether you like it or not, you have to accommodate the resource and performance needs of your database. This goes double with geospatial data, because the queries and analysis can get quite expensive.

One very common application is simply the display of your database objects on a Google Map or similar visualization tool. In such an application, most of your queries might be of the form: Give me all the objects that appear within this rectangle on a Google Map. If your data is stored and queried in the same coordinate system as that used by Google Maps, then those rectangular map areas will translate directly into simple rectangular queries in your database. If, however, your database uses a geographic coordinate system or a different projection, your query may map into a distorted or non-rectangular area in your database's coordinate system, resulting in more complex code and/or decreased performance.

Many shapes are best represented in a (particular) projection. Let's take a look at a shape that should be familiar to most readers, the outline of the United States:

The United States, in a Lambert Conformal Conic projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)

Now, much of the northern border with Canada follows a line of latitude, the "49th parallel". A straight line. Except, in the above image, it's not straight; it's curved slightly. This map is in a Lambert Conformal Conic projection, very commonly used for US national and state maps. To represent the northern border of the country in this projection, you would need a curved line (or, in practice, a bunch of short straight lines that together approximate a curved line). But in some other projections---for example a Mercator projection---lines of latitude are straight, making the shape much easier and more efficient to represent.

The United States, in a Mercator projection. (credit: http://csanet.org/newsletter/winter03/nlw0303.html)

East-west and north-south lines in most political boundaries tend to follow lines of latitude and longitude, respectively, and so are best represented in a projection (such as Mercator) that preserves those lines as straight. Remember, most lines of latitude are not straight in a geographic coordinate system, so a geographic latitude-longitude coordinate system is not particularly well-suited for large political boundaries such as states and countries.

Most data is hyperlocal. The geographic type's advantages come to the foreground when you're dealing with data spread over the entire globe, or with individual objects that span significant portions of it. However, in practice I've found there are very few applications like that. In most cases, you'll be dealing with primarily point data, or if you do have line or polygonal data, the individual objects are small: streets, parcel boundaries, municipal and statistical boundaries, and so forth. Furthermore, in most cases, your data will be limited to a particular part of the world, or at least you'll seldom need to handle data that crosses seams such as poles or the Date Line. So in practice, you seldom actually run into the problems that would be solved by using the geographic type.

Performance does matter. Many operations gain a substantial performance improvement from using the PostGIS geometry type rather than the geography type. Furthermore, using geometry saves you from having to think about which functions are available and which are not.

A projection to avoid and a projection to consider

You might be tempted to store latitude and longitude in a geometry type column. That is, to set up your PostGIS column with a geometry type, but use SRID=4326 (which is the EPSG number for WGS 84 latitude and longitude).

Don't do this.

I did this a few times in my naive youth, and it came back to bite me. What you're really doing here is employing a particular projection called Plate Carree, which simply maps latitude and longitude directly to x and y on the plane. Remember, any time you use geometry rather than geography, you are working with a flat coordinate system, and thus a projection. You might think you're working with latitude and longitude, but you're actually not.

The Plate Carree projection. (Credit: http://kartoweb.itc.nl/geometrics/Map%20projections/body.htm)

Plate Carree is not a particularly useful projection (except that it is trivial to compute). It doesn't preserve angles, shapes, or areas, distances are true only along meridians and the Equator, and its distortion in polar regions is severe. In almost all cases, you can do much better with a different projection.
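
To get a rough sense of that distortion, consider a step of one degree of longitude. In Plate Carree it is always drawn one unit wide, but on the ground its length shrinks with the cosine of the latitude. The sketch below, in plain Ruby, assumes a spherical earth with a 6371 km radius, and the method name is hypothetical.

```ruby
# Approximate ground length, in km, of one degree of longitude at a given
# latitude, assuming a spherical earth (default radius 6371 km).
def km_per_degree_lon(lat_deg, radius_km = 6371.0)
  radius_km * Math.cos(lat_deg * Math::PI / 180.0) * Math::PI / 180.0
end

[0, 30, 60, 80].each do |lat|
  # In Plate Carree, each of these steps is drawn exactly 1 unit wide.
  puts format("lat %2d: 1 degree of longitude is about %.1f km",
              lat, km_per_degree_lon(lat))
end
```

At 60 degrees of latitude the real distance is about half of what it is at the Equator, yet a planar distance computed on raw longitude and latitude values treats the two steps as identical.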

The projection I tend to recommend for many applications is Mercator. In particular, a minor variation on Mercator that is used by Google and Bing Maps:

The Google world map, a slight variation on a Mercator projection

This coordinate system has EPSG number 3785, and has a number of helpful properties.

It's used by Google Maps and Bing Maps (and possibly other mapping systems as well), so if you use those systems for visualization, you have a good match between your data storage and application.
It preserves angles and shapes locally. (In cartographic terms, it is conformal.) This means if you zoom into any part of the map, the shapes and aspect ratio will closely match the real shapes on the globe. This is, I think, the primary reason it is popular with mapping visualizations.
Lines of latitude and longitude are straight, so political boundaries tend to work well.
It's relatively simple to compute.
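
To back up the "simple to compute" claim, here is a sketch of the forward and inverse formulas for the spherical Mercator variant in plain Ruby. The method names are mine, and the default radius of 6378137 meters is the sphere radius I am assuming for this projection; in a real application you would let RGeo or PostGIS do the conversion.

```ruby
# Forward projection: lon/lat in degrees -> [x, y] in meters.
def mercator_project(lon, lat, radius = 6_378_137.0)
  x = radius * lon * Math::PI / 180.0
  y = radius * Math.log(Math.tan(Math::PI / 4 + lat * Math::PI / 360.0))
  [x, y]
end

# Inverse projection: [x, y] in meters -> lon/lat in degrees.
def mercator_unproject(x, y, radius = 6_378_137.0)
  lon = x / radius * 180.0 / Math::PI
  lat = (2 * Math.atan(Math.exp(y / radius)) - Math::PI / 2) * 180.0 / Math::PI
  [lon, lat]
end

# Points on the same line of latitude share a y value, which is why
# east-west boundaries such as the 49th parallel stay straight.
p mercator_project(-123.0, 49.0)[1] == mercator_project(-95.0, 49.0)[1]

# Round trip for the Space Needle coordinates used earlier.
x, y = mercator_project(-122.34978, 47.620578)
p mercator_unproject(x, y).map { |v| v.round(6) }
```

Note that x depends only on longitude and y only on latitude, so a latitude-longitude rectangle maps to a rectangle in the projection.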

As with any projection, there will be times when this one is not appropriate. By now, you should have enough understanding to identify many of these cases. However, a few of the common objections you might encounter are not as important as they sound, and I think I should say a few words about them.

You might hear people object to using EPSG 3785 on the grounds that it contains a simplification that introduces cartographic inaccuracies. (Specifically, it treats its underlying geography as a sphere rather than a flattened ellipsoid.) In most cases, this argument makes too much of too little. All projections rely on simplifications that introduce inaccuracies in one form or another. If your application is to bounce a laser across a continent, then by all means dig deep into the corrective factors. But for most web applications, 3785 should be more than sufficient. Indeed, the inaccuracies in most of the data you will gather, including GPS and geocoded data, will far outweigh most of what can be introduced by the projection.

You also might hear people object to using the Mercator projection at all, on the grounds that it gives a distorted picture of the nature of the world. Because the projection magnifies areas further from the Equator, it generates map images that appear to privilege richer countries in higher latitudes while downgrading the importance of poorer countries closer to the Equator. In 1989, a well-publicized resolution, signed by a number of prominent geographers, was published in American Cartographer, decrying the use of Mercator and similar rectangular projections for these and other reasons. This point is well-taken, and if you are displaying a full world map, I generally do not recommend Mercator if you can help it. However, here we are talking specifically about database structure, not visualization, so for our purposes I think the point is moot.

Working with projected data in Rails

So let's see some code! I'll demonstrate how to set up your PostGIS database to store data using EPSG 3785, and how to read and write data using ActiveRecord.

We'll use our code from part 2 as a starting point. But now, in our migration, we no longer set :geographic, but instead use a geometric (flat) coordinate system with SRID = 3785, as follows. (We'll also set up a spatial index, as we saw in part 6.)

class CreateLocations < ActiveRecord::Migration
  def change
    create_table :locations do |t|
      t.string :name
      t.point :loc, :srid => 3785
      t.timestamps
    end
    change_table :locations do |t|
      t.index :loc, :spatial => true
    end
  end
end

We also need to specify a corresponding factory in our ActiveRecord class. Here I'm going to introduce a rather dirty little feature of RGeo: "projected geographic" factories. Now, if you cringed a little at that description, then you're getting the hang of coordinate systems. Geographic coordinate systems are by definition not projected! However, sometimes when you're working with a projection, you'll want a quick way to interact with the data in latitude and longitude---a quick way to transform individual points to geographic coordinates and back again. This is where RGeo's projected geographic factories come in handy.

These factories really use a projected coordinate system under the hood. In fact, they reference a full Cartesian factory internally, and you can gain access to that "real" projected factory by calling the projection_factory method. However, they provide you with a convenience interface that lets you look at the data as latitudes and longitudes, as if it were a geographic factory.

The "simple_mercator" factory is a useful example. Its "real" internal factory has SRID 3785, indicating the Google Maps style Mercator projection, but the wrapper factory reports latitudes and longitudes. In this way, it mirrors the Google Maps JavaScript API. It talks latitudes and longitudes on the outside, but converts them internally to the projection for use with the map.

In our ActiveRecord class, we'll set up the factory so it correctly interacts with the database in projected coordinates.

class Location < ActiveRecord::Base

  # Create a simple mercator factory. This factory itself is
  # geographic (latitude-longitude) but it also contains a
  # companion projection factory that uses EPSG 3785.
  FACTORY = RGeo::Geographic.simple_mercator_factory

  # We're storing data in the database in the projection.
  # So data gotten straight from the "loc" attribute will be in
  # projected coordinates.
  set_rgeo_factory_for_column(:loc, FACTORY.projection_factory)

  # To interact in projected coordinates, just use the "loc"
  # attribute directly.
  def loc_projected
    self.loc
  end
  def loc_projected=(value)
    self.loc = value
  end

  # To use geographic (lat/lon) coordinates, convert them using
  # the wrapper factory.
  def loc_geographic
    FACTORY.unproject(self.loc)
  end
  def loc_geographic=(value)
    self.loc = FACTORY.project(value)
  end

end

Now let's do an example query. Suppose our basic query is a simple map search where we want to return all the locations in a given rectangle on our map visualization. Since our data is in the same projection as the original map, a rectangular query in the map translates into a rectangular query in our database. So we'll take the latitudes and longitudes of the rectangle edges as parameters, and convert them to projected coordinates. Once there, we can use a simple PostGIS box intersection to run the query itself. It's a simple query that can be accelerated using the spatial index.

We'll add a scope to our class as follows:

class Location < ActiveRecord::Base

  # ...

  # w,s,e,n are in latitude-longitude
  def self.in_rect(w, s, e, n)
    # Create lat-lon points, and then get the projections.
    sw = FACTORY.point(w, s).projection
    ne = FACTORY.point(e, n).projection
    # Now we can create a scope for this query.
    where("loc && '#{sw.x},#{sw.y},#{ne.x},#{ne.y}'::box")
  end

end

Now rectangle searches are simple:

locations = Location.in_rect(-122, 47, -121, 48).all
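
The in_rect method above interpolates the projected coordinates directly into the SQL string. One possible variation, sketched here under a few assumptions, passes them as bind parameters instead. It assumes your PostGIS version provides ST_MakeEnvelope, and it uses a hand-written spherical Mercator formula in place of RGeo so that the snippet stands alone. The names to_mercator and mercator_bbox_condition are hypothetical.

```ruby
# Project lon/lat (degrees) to spherical Mercator meters. In the model
# above you would use FACTORY.point(lon, lat).projection instead.
def to_mercator(lon, lat, radius = 6_378_137.0)
  [radius * lon * Math::PI / 180.0,
   radius * Math.log(Math.tan(Math::PI / 4 + lat * Math::PI / 360.0))]
end

# Build the arguments for a parameterized bounding-box query.
# Returns [sql_fragment, xmin, ymin, xmax, ymax].
def mercator_bbox_condition(w, s, e, n)
  xmin, ymin = to_mercator(w, s)
  xmax, ymax = to_mercator(e, n)
  ["loc && ST_MakeEnvelope(?, ?, ?, ?, 3785)", xmin, ymin, xmax, ymax]
end

condition = mercator_bbox_condition(-122, 47, -121, 48)
puts condition.first
# In a Rails model, this could be used as: Location.where(*condition)
```

Because the envelope carries the same SRID as the column, the bounding box comparison can still use the spatial index.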

Where to go from here

In this article, we saw some of the pros and cons of using different coordinate systems for your database. The right coordinate system will depend on your application, but I've found that for many applications, using a projection---often the specific projection EPSG 3785---produces good results.

It may be useful at this point to gain a general feel for the different types of projections, how they work, and what their pros and cons are. A very good online resource for this is provided by the USGS.

The RGeo::Geographic.simple_mercator_factory is useful for storing data in EPSG 3785. However, if you want to use a different projection under the hood, you can use a more powerful method, RGeo::Geographic.projected_factory, which lets you specify arbitrary projections using Proj4. Read about it in the RGeo documentation.

Next time, I will get to the worked example I promised last week. Stay tuned, and let's bring Rails down to earth!
