Historical Railroad data

As I continue to pull together historical information for Creating Data, I’m thinking about where and when to integrate some of the most important historic national-scale datasets about human activity and population in the United States. One of these is Jeremy Atack’s collection of historical railroad shapefiles.1

I want to post a short essay about this now, in part because I’m posting a longer piece soon and want to scare up any problems with the web technologies driving it. Let me know (bmschmidt via gmail.com) if you encounter problems reading this.

I’ve got four fairly simple things I want to do in this essay.

  1. Describe the general outlines of the dataset with an interactive narrative visualization. (See the Interactive Mode button in the upper right-hand corner; click there if you want to leave the walkthrough text and scroll or pinch to look more closely at a particular region.)
  2. Describe its limitations, omissions, and places where further work might be useful.
  3. Sketch a few intersections between this and other datasets.
  4. Give a link to a lower-quality but smaller version of the dataset if you, like me, want to use it for online mapping rather than research.

Overview

Let’s start with a basic overview of the set. Atack’s collection contains a large number of individual rail lines as vector files. Each line contains some information about the date that it was built. In general, this is represented by an InOpBy field. Here you see rail lines in operation by 1868; this includes a dense network of railways across the North and a single transcontinental, the Union Pacific, reaching through Nebraska but not yet to the Pacific.

{
   "year": 1868,
   "drawing": [ "ExternalLines", "Rail"],
   "getters.annotation.ModernCounties": "d => d.ID",
   "getters.fill.ModernCounties": "density",
   "filters.Rail": "extant",
   "zoom.States": ["CA","ME","WA","FL"],
   "scales.fill.ModernCounties": "better"
  }
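As a rough illustration of what an "in operation by" filter amounts to, here is a minimal JavaScript sketch. It assumes GeoJSON-style features that carry an `InOpBy` year in their `properties`; the sample features, the `name` property, and the `extantIn` helper are hypothetical and are not the code that drives this page.

```javascript
// Hypothetical sample features; only the InOpBy field mirrors the dataset.
const rails = [
  { properties: { name: "line A", InOpBy: 1850 } },
  { properties: { name: "line B", InOpBy: 1868 } },
  { properties: { name: "line C", InOpBy: 1893 } },
];

// Keep the lines recorded as being in operation by the given year.
function extantIn(features, year) {
  return features.filter((f) => f.properties.InOpBy <= year);
}

const in1868 = extantIn(rails, 1868).map((f) => f.properties.name);
console.log(in1868);
```

Because `InOpBy` is an upper bound (the line existed by that date), a filter like this can only show a line too late, never too early.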

The level of detail is very high. Around St. Louis, for example, Atack has entered dozens of lines; you can also easily see the places where lines track both sides of major rivers.

Atack’s documentation describes his methods precisely, so I’ll just summarize here. The work proceeded in two stages:

  • First, the creation of a baseline map of railroads in 1911 using the New Century Atlas to determine their existence. Crucially, he used modern government shapefiles from the National Atlas to get precise locations for current roads, and older topographical maps for extinct ones; this means the resolution is much higher than you would get from a national- or state-scale map.

  • Second, the consultation of historical maps to determine the precise year of origin.

{
  "zoom.States": ["MO"]
  ,"year": 1900
  ,"duration": 2000
  }

The metadata about these lines is essentially three fields. The first is the date, generally expressed as the first map on which the line appears but occasionally as a precise date of opening. The second is the name of the rail company (e.g., “Milwaukee and Prairie du Chien”); the third is the track gauge (the width, in inches, between the rails).

{
  "getters.fill.Rail": "d => d.properties.Gauge",
  "scales.fill.Rail": "<cat>",
  "zoom.States": ["IL", "PA"],
  "filters.Rail": "d => d.properties.Gauge != 0"
  }
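To make the gauge field concrete, here is a small JavaScript sketch that tallies lines by gauge and labels each gauge in feet and inches. It assumes that `Gauge` holds inches and that a value of 0 means "not recorded", which is how I read the filter above; the sample data and the `gaugeLabel` helper are made up for illustration.

```javascript
// Hypothetical sample lines; Gauge is assumed to be in inches, 0 = unknown.
const lines = [
  { properties: { Gauge: 56.5 } },
  { properties: { Gauge: 58 } },
  { properties: { Gauge: 0 } },
  { properties: { Gauge: 56.5 } },
];

// Turn a width in inches into a feet-and-inches label.
function gaugeLabel(inches) {
  const feet = Math.floor(inches / 12);
  const rest = inches - feet * 12;
  return `${feet}' ${rest}"`;
}

// Count lines per gauge category, skipping lines with no recorded gauge.
const counts = new Map();
for (const f of lines) {
  const g = f.properties.Gauge;
  if (g === 0) continue;
  const label = gaugeLabel(g);
  counts.set(label, (counts.get(label) || 0) + 1);
}
console.log([...counts.entries()]);
```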

Here’s an animation of the spread of railroads from 1830 to 1865, compressed into 20 seconds.

You can see here the gradual westward spread of the various railroad systems. The colors highlight the diversity of gauge systems in the South, Ohio, and the Northeast. (Ohio gauge tracks are 4’ 10” apart; standard gauge is 4’ 8½”.) Conflicts could be extreme: one author describes the Erie Gauge War, when the citizens of Erie, PA repeatedly tore up Ohio gauge tracks because the existing 70” tracks forced trains to stop in the town, as “one of the most fantastic episodes in Pennsylvania history.” As Atack points out, these determinations are subject to various types of ambiguity; in certain places, for example, tracks were laid with three rails so that trains of multiple widths could run effectively. But in general, the colors show the growth of systems and standards before the Civil War.

{
  "zoom.States": ["GA", "ME", "IN"]
  , "year": 1830
  , "duration": 2000
  , "filters.Rail": "extant"
  , "animate": [{
    "key": "year",
    "value": 1865,
    "duration": 15000
    },
    {
     "key": "zoom.States"
  , "value": ["ME","FL","TX","CA","WA"]
  , "duration": 15000
  }]
  }
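The `animate` entries in these configurations each name a key, a target value, and a duration. One plausible reading, sketched here in JavaScript, is a linear interpolation from the current value to the target over that duration. The `tween` helper is hypothetical, and linear easing is an assumption; the page may well ease differently.

```javascript
// Build a function mapping elapsed milliseconds to an interpolated value.
// Elapsed time is clamped so the value never leaves the [from, to] range.
function tween(from, to, durationMs) {
  return (elapsedMs) => {
    const t = Math.min(1, Math.max(0, elapsedMs / durationMs));
    return from + (to - from) * t;
  };
}

// Year runs from 1830 to 1865 over 15000 ms, as in the configuration above.
const yearAt = tween(1830, 1865, 15000);
console.log(yearAt(0), yearAt(7500), yearAt(15000));
```

Feeding the interpolated year into a year filter on each frame would then redraw only the lines in operation by that moment.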
  

The next forty years have less data in two senses. First, the time granularity is considerably coarser. Atack consulted maps only at 5-year intervals, generally staying within a single publisher’s series. (He used Rand McNally’s Business Atlas from 1877 to 1903.) Second, two of the three metadata fields are mostly unpopulated; there is no gauge or company information after 1865. Railroad history is abundantly documented, both online in places like Wikipedia and in scanned late 19th-century publications; it would be possible, if difficult, to bring this into maps like the one here, although changes in ownership especially would probably require a more complicated data model.

{
  "zoom.States": ["WA","CA","ME","AK","FL","TX"]
  ,"year": 1865
  ,"duration": 2000
  , "filters.Rail": "extant"
  , "scales.fill.Rail": "undefined"
  , "animate": [{
    "key": "year",
  "value": 1911,
  "duration": 15000
  }]
  }
  

This year-resolution is appropriate for the kind of work that Atack is interested in doing, which is primarily economic history of the way the transportation network affected–for instance–agricultural production in counties on rail lines as opposed to those left out.2 But both for animation and for a variety of other queries individuals might have, it would be nice to be able to see year-by-year data. Work underway on railway stations might help to solve this, but for much of the country there are some heuristics an enterprising researcher could apply now.

Post offices

My colleague Cameron Blevins, for example, has worked extensively with a dataset of post offices created by Richard Helbock.3 (He geolocated the post offices and, with Jason Heppler, created a more comprehensive visualization of this set at Geography of the Post.4) As he has argued, post offices can provide a higher-resolution representation of settlement than almost any other dataset out there; they also tend to have been placed along rail lines whenever possible. (Mail cars could pick up and drop off mail using hooks without even stopping.) The dots here show post offices established in any given year, alongside rail lines: the correlations are obvious. Some lines of post offices are rivers or stage lines, but many are on rail tracks.

{
  "drawing": [ "ExternalLines", "Rail", "PostOffices"]
  , "filters.PostOffices": "extant"
  , "zoom.States": ["MT", "WA", "CO"]
  , "getters.fill.PostOffices": "d => d.properties.years[0]"
  , "scales.fill.PostOffices": "homesteadYear"
  , "changeOffset": 1
  , "year": 1870
  , "animate": [{
  "key": "year"
  , "value": 1911
  , "duration": 5000
  }]
  }

In certain places, the Helbock/Blevins data clearly provides a better level of resolution than the existing railroad maps. Here are just the offices that opened in 1892. You can see, for example, a string of post offices that opened in 1892 in northern Montana and Idaho, in a nice linear pattern.

{
  "drawing": [ "ExternalLines", "StateLines", "Rail", "PostOffices"],
  "rendering.strokeStyle.StateLines": "rgba(10, 10 , 255, 0.1)",
   "filters.PostOffices":"novel",
   "year": 1892,
   "changeOffset": 1,
   "duration": 500,
   "scales.fill.PostOffices":"undefined",
   "filters.Rail": "extant",
   "zoom.PostOffices": ["ELMIRA-1892", "SHELBY-1892", "BONNERS FERRY-1892","COMBINATION-1892", "MARION-1892", "CAMDEN-1892"],
   "annotate.PostOffices":["ELMIRA-1892", "SHELBY-1892", "BONNERS FERRY-1892", "MILAN-1892", "MARION-1892", "CAMDEN-1892","SCOTIA-1892", "NAPLES-1892", "CRESTON-1892","JENNINGS-1892","LEONAI-1892","MIDVALE-1892","CUT BANK-1892","KIPP-1892"]
  }

Flashing forward a year: in 1893, the Great Northern Railway arrives on the map, precisely on the line carved by these stations. As the post offices suggest, though, it was almost certainly in operation before then. According to Wikipedia, the last spike on the line was placed in January 1893 in Washington State; these Montana sections probably opened in 1892 along with the post offices.

You can also see a clear gap in the line at the 1892 Richland post office, even though the 1893 spike marked the completion of the Great Northern as a transcontinental.

{
   "drawing": [ "ExternalLines", "StateLines", "Rail", "PostOffices"],
   "filters.PostOffices":"novel",
   "year": 1893,
   "changeOffset": 3,
   "duration": 50,
    "annotate.PostOffices":["RICHLAND-1892", "ELMIRA-1892", "SHELBY-1892", "BONNERS FERRY-1892", "MILAN-1892", "MARION-1892", "CAMDEN-1892","SCOTIA-1892", "NAPLES-1892", "CRESTON-1892","JENNINGS-1892","LEONAI-1892","MIDVALE-1892","CUT BANK-1892","KIPP-1892"]
  }

As best I can tell, this is because the Great Northern crossed the Rockies at Haskell Pass before 1900, but afterwards switched to a more circuitous route to the north. You can see the more northerly route when I eliminate all time filters on the railroad; Atack did not include the old route.

{
   "drawing": [ "ExternalLines", "StateLines", "Rail", "PostOffices"],
   "filters.PostOffices":"novel",
   "filters.Rail": "none",
   "year": 1893,
   "changeOffset": 3,
   "duration": 50,
    "annotate.PostOffices":["RICHLAND-1892"]
  }

The slider below lets you adjust the plotted year. At any point it shows you railroads that are extant in that year, and post offices that were founded within 3 years of that year.

{"zoom.States":["MT","CO","CA"],
  "filters.Rail": "extant",
   "filters.PostOffices":"novel",
     "year": 1893,
     "changeOffset": 3
     , "getters.fill.Rail": "d => d.properties.InOpBy"
     , "scales.fill.Rail": "homesteadYear"   
  }
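For readers who want to reproduce the post office side of this, here is a minimal JavaScript sketch of a "recently founded" filter. It assumes each office stores its opening year as the first entry of `properties.years`, and it assumes the window runs back `changeOffset` years from the plotted year and includes that year. Both are my guesses from the configurations shown here, and the `novel` helper, the `id` field, and the sample offices are hypothetical.

```javascript
// Hypothetical offices; years[0] is assumed to be the year of establishment.
const offices = [
  { id: "EXAMPLE-1885", properties: { years: [1885] } },
  { id: "EXAMPLE-1891", properties: { years: [1891] } },
  { id: "SHELBY-1892", properties: { years: [1892] } },
  { id: "EXAMPLE-1893", properties: { years: [1893] } },
];

// Keep offices opened in the window (year - changeOffset, year].
function novel(features, year, changeOffset) {
  return features.filter((f) => {
    const opened = f.properties.years[0];
    return opened > year - changeOffset && opened <= year;
  });
}

const only1892 = novel(offices, 1892, 1).map((f) => f.id);
const around1893 = novel(offices, 1893, 3).map((f) => f.id);
console.log(only1892, around1893);
```

With an offset of 1 this reduces to "opened in exactly that year", which matches the 1892 view earlier in this essay.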
  

It can be interesting to look at the national rail network in concert with the Wikipedia-derived US city population data I’ve looked at here before. Both new rail lines, and any rail lines, tend to drive urban growth. This patch of northern New England here, for example, in 1893 shows widespread population loss (the small pink dots), while the scattered green areas of growth tend to be concentrated at railway junctions. Try zooming out and trying this on a different state.

{
   "drawing": ["StateLines", "Rail", "ExternalLines", "Cities"],
   "zoom.States":["NH"],
   "duration": 2000,
   "filters.Rail": "extant",
   "filters.PostOffices":"novel",
   "year": 1893,
   "getters.size.Cities": "absolutePopulationDiff",
   "changeOffset": 10
   , "getters.fill.Rail": "d => d.properties.InOpBy"
   , "scales.fill.Rail": "homesteadYear"   
  }
  

Online-optimized shapefile

I’ve altered Atack’s file in a few ways to make it small enough to serve and quickly render online. First, I bind short segments that share metadata and endpoints into single features. Then I combine all lines with the same metadata into vector features. Finally, I use the toposimplify D3 library to round and simplify the line vectors slightly. This significantly reduces the size of the shapefile (from 60M for Atack’s files as geojson to 12M for reduced geojson, and just 4M for topojson), while retaining almost all the visual elements. Others may find this useful for web mapping applications.
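The second of those steps, combining lines that share metadata, can be sketched in a few lines of JavaScript. This is an illustration under assumptions rather than the script I actually ran: it assumes GeoJSON `LineString` features, uses a hypothetical `combineByMetadata` helper, and treats two features as sharing metadata only when their `properties` serialize identically (so key order matters). Endpoint joining and the simplification itself are not shown.

```javascript
// Hypothetical input: three LineString features, two with identical metadata.
const segments = [
  { type: "Feature", properties: { InOpBy: 1850, Gauge: 56.5 },
    geometry: { type: "LineString", coordinates: [[0, 0], [1, 0]] } },
  { type: "Feature", properties: { InOpBy: 1850, Gauge: 56.5 },
    geometry: { type: "LineString", coordinates: [[5, 5], [6, 5]] } },
  { type: "Feature", properties: { InOpBy: 1868, Gauge: 58 },
    geometry: { type: "LineString", coordinates: [[2, 2], [3, 3]] } },
];

// Collect every line with the same properties into one MultiLineString.
function combineByMetadata(features) {
  const groups = new Map();
  for (const f of features) {
    const key = JSON.stringify(f.properties);
    if (!groups.has(key)) {
      groups.set(key, {
        type: "Feature",
        properties: f.properties,
        geometry: { type: "MultiLineString", coordinates: [] },
      });
    }
    groups.get(key).geometry.coordinates.push(f.geometry.coordinates);
  }
  return [...groups.values()];
}

const combined = combineByMetadata(segments);
console.log(combined.length);
```

Fewer features means the metadata is stored once per group instead of once per segment, which is where part of the size saving would come from.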

To get a sense of the changes, the image below shows part of Chicago. Atack’s original files are in black, and the simplified lines are in red. The simplification introduced by topojson literally cuts corners by eliminating points when doing so does not radically change the course of the full line. Where you can’t see red lines, they follow exactly the same course as Atack’s black ones. At a large scale like the one displayed here, or for any GIS work, you would want the original data; for county- or state-resolution online mapping, the lower-resolution features might be convenient. You can download these simplified files as geojson, and as topojson. If you use either, you should cite Atack’s original file.

Simplified vs original shapefiles

Atack, Jeremy. “Historical Geographic Information Systems (GIS) Database of U.S. Railroads. Jeremy Atack.” Accessed July 11, 2018. https://my.vanderbilt.edu/jeremyatack/data-downloads/.

Atack, Jeremy, and Robert A Margo. “The Impact of Access to Rail Transportation on Agricultural Improvement: The American Midwest as a Test Case, 1850–1860.” Journal of Transport and Land Use 4, no. 2 (August 18, 2011). https://doi.org/10.5198/jtlu.v4i2.188.

Blevins, Cameron. The Postal West: Spatial Integration and the American West, 1865–1902, 2015.

Blevins, Cameron, and Jason Heppler. “Geography of the Post.” Stanford University, 2015. http://cameronblevins.org/gotp/.


  1. Jeremy Atack, “Historical Geographic Information Systems (GIS) Database of U.S. Railroads. Jeremy Atack,” accessed July 11, 2018, https://my.vanderbilt.edu/jeremyatack/data-downloads/.

  2. Jeremy Atack and Robert A Margo, “The Impact of Access to Rail Transportation on Agricultural Improvement: The American Midwest as a Test Case, 1850–1860,” Journal of Transport and Land Use 4, no. 2 (August 18, 2011), https://doi.org/10.5198/jtlu.v4i2.188.

  3. Cameron Blevins, The Postal West: Spatial Integration and the American West, 1865–1902, 2015.

  4. Cameron Blevins and Jason Heppler, “Geography of the Post” (Stanford University, 2015), http://cameronblevins.org/gotp/.