Mapping Changes to Job Access

By Brandon Martin-Anderson 26 May 2014

How long does it take to get to your job? Is it practical to go by public transit? Which routes will you use? These days it’s easy to find an answer using Google Maps, or apps like The Transit App. Public transit planners are faced with a similar question, but instead of picking the shortest route for one person to one job, they have to create the shortest route for all people to all possible jobs.

One tool for grappling with this task is the concept of job access: the total number of jobs that one can reach from a given point. Job access is conditioned on some criteria, so every point has a number of different kinds of job access. A place could have an automobile job access, pedestrian job access, wheelchair job access, leaving-for-the-bus-at-7am job access, or leaving-for-the-bus-at-midnight job access. Additionally, a place will have different levels of access to different kinds of jobs - access to manufacturing jobs may be different than access to retail jobs. (For a more complete description of job access, check out Access to Destination: Annual Accessibility Measure for Twin Cities Metropolitan Area (pdf) by Andrew Owen and David Levinson.)

One of the jobs of a transit planner, therefore, is to maximize the aggregate job access for the system’s constituents. Here at Conveyal, we’ve been working on tools to inform transit planners and policy makers as they grapple with difficult decisions relating to transit level of service and job access.

In this post I’ll be showing the process for mapping the impact of the 2014 King County Metro service cuts on job access in the Seattle area.

Let’s start by looking at a normal day. The following map shows total jobs accessible within one hour via public transit, departing at 7am on a weekday.

fig1

This map was made by repeating a procedure for every block:

  1. Using OpenTripPlanner, find the transit route, beginning at 7am, from that block to every block in the county.
  2. Throw away every route that takes more than an hour. This leaves every block that’s accessible within an hour.
  3. For every accessible block, look up the number of jobs at that location in the LODES data set.
  4. Add up all figures from step 3. This is the total number of jobs accessible from the origin block.

Easier said than done, but a 32-core number-crunching monster running OpenTripPlanner Analyst can finish the task in a few minutes.
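For a single origin block, steps 2 through 4 boil down to a filtered sum. The sketch below assumes the travel times from step 1 have already been computed and are sitting in a plain dictionary. The function name and the sample numbers are invented for illustration and are not taken from OpenTripPlanner Analyst.

```python
def job_access(travel_minutes, jobs_by_block, cutoff_minutes=60.0):
    """Sum the jobs in every block reachable within the cutoff."""
    total = 0
    for block, minutes in travel_minutes.items():
        # Step 2: throw away destinations that take longer than the cutoff.
        if minutes > cutoff_minutes:
            continue
        # Steps 3 and 4: look up the job count and add it to the running total.
        total += jobs_by_block.get(block, 0)
    return total

# Hypothetical travel times (minutes) from one origin block, departing at 7am.
travel = {"A": 12.0, "B": 45.0, "C": 75.0}
# Hypothetical job counts per block, standing in for the LODES lookup.
jobs = {"A": 100, "B": 250, "C": 900}

print(job_access(travel, jobs))  # 350
```

Block C is dropped because it takes 75 minutes to reach, so only the jobs in A and B are counted. Repeating this for every origin block gives the values behind the map.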

A simple example

We can use maps like this to ask questions. Like: how essential is a route for job access? What happens if we just delete the #255, an important commuting link between the suburbs and downtown Seattle?

fig2

This map shows the areas most impacted by the removal of route 255. The darkest red areas lose more than half of jobs accessible within an hour. Areas colored light yellow lose between five and ten percent of accessible jobs. Blue areas are unaffected.

From this map it’s clear that the 255 is an essential job connector along its suburban segment. In particular, some regions in northern and southern Kirkland seem especially dependent on the 255. It’s interesting to note that some regions along the 255 do not depend upon it for job access. Also, note that the removal of the 255 affects job access to a small but relatively uniform extent all across Seattle proper. If you were to remove it, it would be a little harder for everyone in Seattle to get out to the suburbs.

By way of contrast, let’s see what happens if we were to remove the #2 linking Queen Anne and Capitol Hill via Downtown.

fig3

The story here is very different; the impact much smaller. Though a region on the eastern shore of Seattle would see dramatically reduced morning job access, the rest of the city would be relatively unaffected.

A real-world example

Let’s look at the impact of a more complicated and realistic service change - service cuts proposed in response to budget shortfalls for the King County Metro system.

Analysing job access impacts from the proposed King County service cuts involves the same basic steps as the simple examples above. First, we modify the GTFS file to reflect the service change. Then we find the job access of every block in the service area before and after the change. Finally we diff the two results and map the difference.

In the above examples, modifying the existing King County GTFS was simple - I wrote a small script to delete all trips and stoptimes corresponding to the routes I wanted to cut. The proposed cuts to King County, on the other hand, are wide-reaching and nontrivial. The proposal involves deleting 72 routes, and revising the alignments, frequencies, or service hours of 84 routes. Of those revised routes, 51 have been partially or entirely re-routed to revised alignments.
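The route-deleting script isn’t reproduced in this post, but the idea is simple enough to sketch. The version below is a guess at the general shape, not the original script: it works on the text of trips.txt and stop_times.txt rather than on a zipped feed, the function name is made up, and it assumes the route_id in the feed is the same string as the route number, which is not guaranteed in a real feed.

```python
import csv
import io

def drop_routes(trips_txt, stop_times_txt, cut_route_ids):
    """Return (trips, stop_times) rows with every trip of the cut routes removed."""
    trips = list(csv.DictReader(io.StringIO(trips_txt)))
    stop_times = list(csv.DictReader(io.StringIO(stop_times_txt)))
    # trips.txt ties each trip_id to a route_id, so collect the doomed trips first.
    cut_trip_ids = {t["trip_id"] for t in trips if t["route_id"] in cut_route_ids}
    kept_trips = [t for t in trips if t["trip_id"] not in cut_trip_ids]
    # stop_times.txt references trips by trip_id, so filter it the same way.
    kept_stop_times = [s for s in stop_times if s["trip_id"] not in cut_trip_ids]
    return kept_trips, kept_stop_times

# Tiny made-up feed fragments; a real feed has many more columns and rows.
TRIPS = "route_id,service_id,trip_id\n255,WEEKDAY,t1\n2,WEEKDAY,t2\n"
STOP_TIMES = (
    "trip_id,arrival_time,departure_time,stop_id,stop_sequence\n"
    "t1,07:00:00,07:00:00,s1,1\n"
    "t1,07:10:00,07:10:00,s2,2\n"
    "t2,07:05:00,07:05:00,s3,1\n"
)

trips, stop_times = drop_routes(TRIPS, STOP_TIMES, {"255"})
print([t["trip_id"] for t in trips])       # ['t2']
print([s["stop_id"] for s in stop_times])  # ['s3']
```

Leaving the routes.txt row in place is harmless for this purpose, since a route with no trips contributes no service.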

The proposed revisions come in the form of a detailed spreadsheet, listing the precut and postcut service frequencies for each route during a number of time windows, along with a written description of the details of the service change.

fig4

King County Metro also published maps of proposed route realignments.

fig5

Point being, it’s a lot more difficult than deleting a single route.

To cope with this we’ve built two tools to generate a close GTFS approximation of the proposed postcut service. The first, geom2gtfs, is a command line tool to generate a GTFS from a shapefile. We used this to generate a GTFS to approximate routes with new alignments. The second, resample_gtfs, is a command line tool for changing trip frequency of an existing GTFS feed. This we used to change the service frequency of all the rest.

Using geom2gtfs

geom2gtfs is a command line tool that converts each feature in a shapefile into a route in a GTFS file. Using the geom2gtfs tool is simple, but it requires a carefully prepared shapefile and configuration file. To run it from the command line:

$ geom2gtfs shapefile_filename config_filename output_filename

The input shapefile must be unprojected, and contain only lines, with no multigeometry features. If the features in the shapefile have properties, such as “route” or “mode” or “speed”, those properties can be used to modulate the speed of the routes written to the GTFS. It’s possible to use multiple lines to represent the same route if they share a route id property (the name of the property is defined in the config file) and sequential lines run in the same direction and have successive “segment” properties. It’s possible to join a CSV to the shapefile using the geom2gtfs config file, which is how we associated service frequencies with lines. The CSV of service frequencies was compiled from the King County spreadsheet, and looks like this:

route,peak_am,midday,peak_pm,night,sat,sun
1,15.0,30.0,15.0,45.0,None,None
10,10.0,15.0,10.0,30.0,15.0,30.0
101,15.0,30.0,15.0,30.0,30.0,30.0
102,30.0,None,25.0,None,None,None
105,30.0,30.0,30.0,30.0,30.0,60.0
106,15.0,15.0,15.0,45.0,30.0,30.0
107,30.0,30.0,30.0,45.0,30.0,30.0
11,15.0,30.0,15.0,45.0,30.0,30.0
110,None,None,None,None,None,None
111,22.0,None,25.0,None,None,None
113,36.0,None,45.0,None,None,None
114,60.0,None,60.0,None,None,None
116EX,22.0,None,25.0,None,None,None
...

Finally, it’s possible to have the geom2gtfs tool place stops at a regular spacing along the lines, or to use a shapefile of existing stops. In the case of our King County analysis, I produced a shapefile from the stops.txt of the original GTFS and used that.

An example geom2gtfs configuration file

The config file is a JSON file. It follows, interleaved with annotations that aren’t part of the config file itself.

{

First, some basic information for the GTFS feed.

  "agency_name":"King County Metro",
  "agency_url":"http://metro.kingcounty.gov/",
  "agency_timezone":"America/Los_Angeles",

Specify which mode each route will be.

  "gtfs_mode":3,

Set the speed of the GTFS trips, in meters per second.

  "speed":[

The speed can be a constant, or a list. If it’s a list, each item in the list must have two items. The first is the filter, and the second is the speed. The filter ["ROUTE","12"] will match if the property “ROUTE” takes the value “12”. geom2gtfs scans down the list and uses the first match. So if a shapefile feature had property “ROUTE” with value “12” and “express” with value “1”, that feature’s trips would run at 4.0 meters per second. “*” matches everything, which means that 5.4 is the default speed.

        [["ROUTE","12"],4.0],
        [["ROUTE","2"],4.0],
        [["ROUTE","3"],4.0],
        [["express","1"],13.4],
        [["ROUTE","193EX"],4.0],
        [["express","*"],5.4]
  ],

The ‘stops’ section specifies a strategy, which can be either “shapefile” or “picket”, and the arguments required by the chosen strategy. The “shapefile” strategy requires an unprojected point shapefile and a threshold around each linear feature within which to look for stops in that shapefile. The “picket” strategy takes one named argument ‘spacing’, either a scalar or a list of filters like the speed argument.

  "stops":{
         "strategy":"shapefile",
         "filename":"data/kingco/kingco_stops.shp",
         "threshold":0.0002
  },

Specify the name of the shapefile property where the route id is kept.

  "route_id_prop_name":"ROUTE",

Specify the name of the shapefile property where the route name is kept.

  "route_name_prop_name":"ROUTE",

Specify the ‘service windows’. Each entry in the list specifies the service window name, its starting time, and its ending time, both in hours since midnight. For example “peak_am” runs from 6am to 9am. The shapefile must contain a property with the same name as each service window, filled out with the service level for that window. For example the route “1” has a property “peak_am” with value “15.0”.

  "service_windows":[
         ["peak_am",6,9],
         ["midday",9,15],
         ["peak_pm",15,18],
         ["night",18,24]
  ],

Optionally, join a CSV to the shapefile features. In the case of our shapefile, none of the service windows are actually properties of the shapefile features; they are joined in from this CSV.

  "csv_join":{
    "filename":"data/kingco/prop_freqs.csv",
    "csv_col":"route",
    "shp_col":"ROUTE"
  },

Optionally, set a filter that must pass for a feature to be converted to a GTFS route.

  "filters":[
    ["CATEGORY","topo"]
  ],

Set the start and end date of the service calendar. Our hypothetical GTFS will be valid from the start of 2014 to the start of 2015.

  "start_date":"20140101",
  "end_date":"20150101",

Specify whether the service level values are periods, the amount of time between departures; or frequencies, the number of departures in an hour.

  "use_periods":true,

Set whether or not service should run in both directions of a shapefile line. If “is_bidirectional” is false, you’ll need to make a shapefile feature for each route direction.

  "is_bidirectional":true,

Set whether the GTFS should contain precise times, or whether it should be frequency-based.

  "exact":true
}

This config file, combined with a shapefile that I manually drew for every one of the fifty revised routes in King County, produced a GTFS representing a reasonable approximation of the realigned routes. That’s only a part of the puzzle though. For the next part, we’ll need resample_gtfs.
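The first-match rule used by the “speed” list (and by a “picket” spacing list) is easy to get wrong when ordering filters, so here is a small Python sketch of it. This is an illustration of the rule as described in the annotations, not geom2gtfs’s actual code. The function names are invented, and letting “*” match even when the property is missing from the feature is an assumption.

```python
def matches(filter_pair, props):
    """A filter [name, value] matches when the feature's property equals value."""
    name, value = filter_pair
    if value == "*":
        return True  # wildcard; assumed to match even if the property is absent
    return name in props and str(props[name]) == value

def resolve(setting, props):
    """Return a constant setting as-is, or the value of the first matching filter."""
    if not isinstance(setting, list):
        return setting
    for filter_pair, value in setting:
        if matches(filter_pair, props):
            return value
    return None  # nothing matched and there was no wildcard entry

SPEED = [
    [["ROUTE", "12"], 4.0],
    [["ROUTE", "2"], 4.0],
    [["ROUTE", "3"], 4.0],
    [["express", "1"], 13.4],
    [["ROUTE", "193EX"], 4.0],
    [["express", "*"], 5.4],
]

print(resolve(SPEED, {"ROUTE": "12", "express": "1"}))   # 4.0
print(resolve(SPEED, {"ROUTE": "255", "express": "1"}))  # 13.4
print(resolve(SPEED, {"ROUTE": "255", "express": "0"}))  # 5.4
```

Order matters: the route “12” entry has to come before the generic express entry, otherwise an express feature on route 12 would pick up 13.4 instead of 4.0.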

Using resample_gtfs

resample_gtfs is a tool used to change the level of service in an existing GTFS feed. This is accomplished by using a configuration file to identify a set of existing GTFS trips, and a target level of service to replace those trips. For example: we may specify that we want to resample all trips for route 255 between 6am and 9am to run every half hour, whereas before it may have operated more frequently. We do this by composing a configuration file “config.json” that looks like this:

{
  "windows": [
    {
      "start": 6,
      "end": 9,
      "name": "peak_am",
      "service_ids": [
        "WEEKDAY"
      ]
    }
  ],
  "routes": [
    {
      "route": "255",
      "headways": {
        "peak_am": 30.0
      }
    }
  ]
}

And then run the resample_gtfs tool:

$ resample_gtfs original_gtfs.zip config.json resampled_gtfs.zip

This will go through the original_gtfs.zip file, and look for all trips running on service_id “WEEKDAY” associated with a route with the short name “255”. It groups them by direction, and then picks an exemplar trip for each direction. Then it copies that trip at 30 minute intervals between 6am and 9am, setting the service_id for each copy to “WEEKDAY” but otherwise leaving the rest of the GTFS alone.
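The copy-at-intervals step can be sketched in a few lines of Python. This is a reading of the description above, not resample_gtfs’s code: the function names are invented, times are kept as minutes since midnight, and treating the window as half-open (a departure at 6:00 but none at 9:00) is an assumption.

```python
def resample_departures(start_hour, end_hour, headway_minutes):
    """Departure times, in minutes since midnight, filling [start, end) at a fixed headway."""
    departures = []
    t = start_hour * 60.0
    while t < end_hour * 60.0:
        departures.append(t)
        t += headway_minutes
    return departures

def shift_trip(exemplar_stop_times, departure):
    """Copy an exemplar trip so that its first stop time falls at `departure`."""
    offset = departure - exemplar_stop_times[0]
    return [t + offset for t in exemplar_stop_times]

# Hypothetical exemplar trip: stop times at 6:12, 6:20 and 6:35.
exemplar = [372.0, 380.0, 395.0]

departures = resample_departures(6, 9, 30.0)
new_trips = [shift_trip(exemplar, d) for d in departures]

print(departures)    # [360.0, 390.0, 420.0, 450.0, 480.0, 510.0]
print(new_trips[0])  # [360.0, 368.0, 383.0]
```

Every copy keeps the exemplar’s stop-to-stop running times; only the start time moves.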

The config.json file for resampling King County is somewhat more complicated. I wrote a small script for converting the service cut spreadsheet to a config.json file. Here are some example “routes” entries:

{
      "route": "1",
      "headways": {
        "peak_am": 15.0,
        "sun": null,
        "peak_pm": 15.0,
        "night": 45.0,
        "midday": 30.0,
        "sat": null
      },
      "trips": {
        "peak_am": null,
        "peak_pm": null
      }
}

Note there are two ways of specifying how resample_gtfs fills out a service window: ‘headways’ or ‘trips’. If there’s an entry in the ‘headways’ section, resample_gtfs will fill the service window with trips at that headway. Using ‘trips’, resample_gtfs will place the specified number of trips in the middle of the service window with a headway deduced from the original GTFS.

{
      "route": "9EX",
      "trip_filters":[["trip_short_name","EXPRESS"]],
      "headways": {
        "peak_am": null,
        "sun": null,
        "peak_pm": null,
        "night": null,
        "midday": null,
        "sat": null
      },
      "trips": {
        "peak_am": 9,
        "peak_pm": 8
      }
}

This routes entry has a property “trip_filters”, which specifies that resample_gtfs should only resample trips of route “9” that have the ‘trip_short_name’ of “EXPRESS”. This will leave all non-express trips unaffected.
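To picture what the ‘trips’ option does with an entry like this one, here is a hedged Python sketch of placing a fixed number of trips in the middle of a service window. How resample_gtfs actually deduces the headway from the original GTFS isn’t covered here, so the sketch simply takes the headway as an argument. The function name and the 15-minute headway are made up.

```python
def centered_departures(start_hour, end_hour, n_trips, headway_minutes):
    """Place n_trips at a fixed headway, centered on the middle of the window."""
    if n_trips <= 0:
        return []
    midpoint = (start_hour + end_hour) * 60.0 / 2.0
    span = (n_trips - 1) * headway_minutes  # time from first to last departure
    first = midpoint - span / 2.0
    return [first + i * headway_minutes for i in range(n_trips)]

# Nine peak_am trips (6am to 9am) at a hypothetical 15-minute headway.
departures = centered_departures(6, 9, 9, 15.0)
print(departures[0], departures[-1])  # 390.0 510.0
```

With these numbers the nine trips run from 6:30am to 8:30am, leaving the edges of the window empty.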

Finally, there are routes entries like:

{
      "suppress": true,
      "route": "2"
}

All trips for route 2 will simply be left out of the resampled GTFS. Remember, we created a fake GTFS out of a shapefile with geom2gtfs. We need some way to keep all those routes out of the resampled GTFS, so that when we piece them together as an OpenTripPlanner graph, we don’t end up with two colliding routes wherever we produced a route in geom2gtfs.
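Going back to the spreadsheet-to-config conversion mentioned above: the script itself isn’t shown, but a minimal Python sketch of the headway part might look like the following. It assumes the input is a CSV in the same layout as the frequency table from the geom2gtfs section, and that a “None” cell should become a JSON null. The function name is invented, and the “windows”, “trips”, “trip_filters” and “suppress” parts of the real config are left out.

```python
import csv
import io
import json

CSV_TEXT = """route,peak_am,midday,peak_pm,night,sat,sun
1,15.0,30.0,15.0,45.0,None,None
10,10.0,15.0,10.0,30.0,15.0,30.0
"""

def routes_from_csv(csv_text):
    """Build one "routes" entry per CSV row, mapping "None" cells to JSON null."""
    routes = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        route = row.pop("route")
        # Every remaining column is a service window holding a headway in minutes.
        headways = {w: None if v == "None" else float(v) for w, v in row.items()}
        routes.append({"route": route, "headways": headways})
    return routes

config = {"routes": routes_from_csv(CSV_TEXT)}
print(json.dumps(config, indent=2))
```

Python’s None is written out as null by json.dumps, which matches the null headways in the example entries.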

Putting it all together

By this point we’ve used geom2gtfs to build a GTFS of realigned routes, and resample_gtfs to edit the service level of the existing King County GTFS. The next step is to build an OpenTripPlanner graph file out of those two feeds and an OSM street file, and then subject the resulting graph to the analysis procedure detailed in the first section.

Building the postcut GTFS

We’ll start by building a partial GTFS of revised alignments. Grab this shapefile of revised alignments, and this shapefile of stops. Finally you’ll need this config file. Unzip the shapefiles into a handy directory, edit the config file to reflect the location of the stops shapefile, and run:

$ java -jar /path/to/geom2gtfs/build/libs/geom2gtfs.jar /path/to/alignment.shp /path/to/config.json /path/to/alignment_gtfs.zip

Next we need to resample an existing GTFS. Grab the King County GTFS, and this configuration file. Then run:

$ java -jar /path/to/resample_gtfs/build/libs/resample_gtfs.jar /path/to/kingco_gtfs.zip /path/to/config.json /path/to/resampled_gtfs.zip

Place both GTFS files into a directory along with an OpenStreetMap file (available here). Go to your OpenTripPlanner directory, and run:

$ otp --build /path/to/gtfs_and_osm_files/

OTP will build a graph file suitable for analysis. Actually running the analysis is out of the scope of this document.

Analyzing the results

First of all, in order to make sense of the job access analysis we need to take a look at the service cuts themselves. The service cuts are pretty deep. Here’s a map of every route that will be completely cut:

cutroutes

And here’s every route that will be reduced.

revisedroutes

To further gain some intuition, here’s a map of every deleted route on top of a map of job density. As a rule of thumb, every route that touches a major job center will result in some degree of job access reduction once it’s gone.

cutroutesjobdensity

Finally, on to the analysis. Here’s a map showing the ratio of postcut job access to precut job access. “Job access” here is defined as the number of jobs reachable by transit within an hour starting at 7am.

fig6

The deepest red corresponds to areas which have between 0% and 13% of precut job access. Yellow corresponds to areas with 100% of precut job access - that is, areas where job access is unchanged. The deepest blue corresponds to areas which have as much as fifteen times original job access - places where the job access increased, sometimes by dramatic amounts.

This map is a little too complicated to analyze with the naked eye. We can simplify the color scheme to four categories: dark green (>200% of original), light green (130% to 200%), light red (70% to 90% of original) and dark red (<70% of original job access).
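As a rough illustration, the four-way classification can be written as a small Python function. The thresholds come straight from the paragraph above. The function name, the “uncolored” label for the band between 90% and 130% (which the color scheme doesn’t name), and the handling of blocks with no precut access are assumptions.

```python
def impact_category(precut_jobs, postcut_jobs):
    """Classify a block by postcut job access as a share of precut job access."""
    if precut_jobs == 0:
        return "no data"  # assumption: the ratio is undefined with no precut access
    ratio = postcut_jobs / precut_jobs
    if ratio > 2.0:
        return "dark green"   # more than 200% of original
    if ratio >= 1.3:
        return "light green"  # 130% to 200% of original
    if ratio < 0.7:
        return "dark red"     # less than 70% of original
    if ratio <= 0.9:
        return "light red"    # 70% to 90% of original
    return "uncolored"        # between 90% and 130%: not one of the four categories

print(impact_category(1000, 650))   # dark red
print(impact_category(1000, 800))   # light red
print(impact_category(1000, 1500))  # light green
```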

fig7

This map looks very scattered. This analysis shows a number of regions where job access increased as a result of service cuts. This is a result of changes in departure times eliminating boarding waits in a number of places. For example imagine a bus leaves every 15 minutes and the next departure from Stop A is 7:05am. After the service cut, the bus leaves every half an hour and as a result of random trip-shuffling now leaves at 7:00am, effectively increasing job access at 7:00am despite service cuts. This type of thing isn’t too uncommon; you see it all over the place.

This map is also “grainy”. One block will lose job access, while the block next to it will remain unchanged or see increased job access. This is a symptom of the all-or-nothing nature of transit travel. If you’re one minute early to a train departure the whole city is your oyster, but if you’re one minute late you’re playing Angry Birds on the platform for half an hour.

In response to this, we could look at the job access at several points during the hour from 7am to 8am and compare the average precut job access with the average postcut job access. A map of job access ratio sampled every 15 minutes from 7am to 8am (inclusive) looks like this:

fig8

The graininess is reduced somewhat. With smaller sample periods the graininess would reduce while the computational time would increase. This last map, for instance, took nearly an hour of time on a 32-core machine. We’re working on a new “Profile Routing” feature in OpenTripPlanner that will dramatically reduce this computational burden by allowing a single query to calculate the minimum, maximum and average journey times for a given time range.
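The averaging itself is straightforward; the cost is in computing job access once per departure time. Here is a minimal Python sketch, assuming job access has already been computed for one block at each sampled departure, before and after the cuts. Taking the ratio of the two averages (rather than averaging the per-sample ratios) is an assumption, and the sample values are invented.

```python
def averaged_access_ratio(precut_samples, postcut_samples):
    """Ratio of mean postcut to mean precut job access over matching departure times."""
    if not precut_samples or len(precut_samples) != len(postcut_samples):
        raise ValueError("need the same, non-zero number of samples before and after")
    pre_mean = sum(precut_samples) / len(precut_samples)
    post_mean = sum(postcut_samples) / len(postcut_samples)
    return post_mean / pre_mean if pre_mean else None

# Departure times every 15 minutes from 7am to 8am inclusive, in minutes since midnight.
sample_minutes = list(range(7 * 60, 8 * 60 + 1, 15))

# Hypothetical job access for one block at each of those five departure times.
pre = [1000, 500, 1000, 500, 1000]
post = [500, 1000, 500, 1000, 200]

print(post[0] / pre[0])                  # 0.5
print(averaged_access_ratio(pre, post))  # 0.8
```

Looking only at 7:00am, this block appears to lose half its job access; averaged over the hour the loss is 20%, because at some departure times the new schedule happens to line up better than the old one.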

Andrew Owen of the University of Minnesota Accessibility Observatory produced a similar accessibility analysis of Minneapolis using OpenTripPlanner by averaging job accessibility every minute of every hour of a day, for a total of 1440 samples. The results are much less grainy.

Access impact and population density

The population density of King County is of course not uniform. Here’s a map of the population density of Seattle and immediately adjoining areas. The lightest areas have a population density up to 120/km^2. The darkest blue areas have a population density of about 10000/km^2. The darkest red areas have a population density as high as 180000/km^2.

popdensity

It’s disingenuous to imply that a deep service cut on a sparsely populated area is the same as a similar service cut on a dense urban area. In the next map I’ve used population density to modulate the darkness of a map of service impacts. The red areas have under 70% of original morning job access. The purple areas have 70% to 90% of original morning job access.

accessimpactwithdensity

This map is more useful for identifying places where lots of people will be affected by service cuts. In King County, the Overlake region in Bellevue and Rainier Valley in Seattle both seem significantly impacted in densely settled areas.

Clamped job access

We can ask a slightly different question - how many jobs can we get to within one hour of transit travel, starting at 7:00am, not counting the initial wait? Because the initial wait is squished out of the hour reserved for commuting, we call this the clamped job access. This metric has advantages and disadvantages. The main advantage of clamping job access is that it’s relatively insensitive to small changes in departure time resulting from the resampler moving a trip. If your link to the central business district was moved from 7:05 to 7:15 but on-vehicle travel time stayed the same, the clamped job accessibility contributed by the link to downtown wouldn’t change, whereas the unclamped job accessibility would suffer somewhat for having to incorporate an extra ten minute wait. On the other hand, average wait times do affect job accessibility. If a link to a job center increases its headway from 15 minutes to 30 minutes, the jobs at the job center are actually less accessible. As a result, clamped job accessibility primarily identifies areas where job access was changed by the wholesale addition or subtraction of a whole route.
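The difference between the two metrics comes down to whether the initial wait counts against the hour. A small Python sketch, using the 7:05 versus 7:15 example from the paragraph above, makes this concrete. It assumes each destination’s trip has already been split into an initial wait and the remaining travel time. The function name and the job count are made up.

```python
def accessible_jobs(trips, jobs_by_block, cutoff=60.0, clamp=False):
    """trips maps block -> (initial_wait_minutes, travel_minutes_after_boarding)."""
    total = 0
    for block, (wait, ride) in trips.items():
        # Clamped access ignores the initial wait; unclamped access counts it.
        elapsed = ride if clamp else wait + ride
        if elapsed <= cutoff:
            total += jobs_by_block.get(block, 0)
    return total

jobs = {"downtown": 5000}          # hypothetical job count
precut = {"downtown": (5, 50)}     # bus at 7:05, then 50 minutes of travel
postcut = {"downtown": (15, 50)}   # bus moved to 7:15, same travel time

print(accessible_jobs(precut, jobs), accessible_jobs(postcut, jobs))  # 5000 0
print(accessible_jobs(precut, jobs, clamp=True),
      accessible_jobs(postcut, jobs, clamp=True))                     # 5000 5000
```

Unclamped, the ten-minute shift pushes downtown past the one-hour cutoff and all 5000 jobs vanish; clamped, nothing changes.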

The following map shows clamped one-hour public transit job access at 7am, with no averaging over the 7-8 morning hour.

fig9

You can see that it’s a little less grainy than the unclamped job access maps. This isn’t necessarily better, though. It’s showing a slightly different thing.

Conclusions

Though King County Metro has done a lot of work to avoid cutting access while cutting service hours by sixteen percent, several areas were left without morning job access.

  • Western slope of Finn Hill: loss of the 260 and 935 has a serious effect on the morning commute.
  • Southern tip of Mercer Island: loss of the 201 has a serious impact on morning job access.
  • South of Alki Point: deletion of the 57 leaves a small stretch without convenient service.
  • Woodinville: loss of DART service to central Woodinville cuts off the link to downtown.
  • Northern Rose Hill/South Juanita: loss of the 260 means a long walk to the 255 in order to get downtown.
  • Eastern Overlake: loss of the 250 affects job access to downtown.
  • Lake Kathleen: realignment of route 111 leaves Lake Kathleen without any morning commute service.

For all this detail, this is only one of the many questions that can be asked, and should be asked, about accessibility impacts of service changes.

Some future things to investigate:

  • How can we investigate school access? Childcare access? Food access? Recreation access?
  • What is the level of job access and job access change at different times of day?
  • What is the socioeconomic profile of the regions most hard hit by service changes?
  • The LODES dataset provides origin/destination pairs for every commute on record. We can ask, for a given origin, how many commutes out of that origin are transit-accessible? How is it affected by a service change?