OpenTripPlanner and other trip planning systems typically calculate itineraries using arrival and departure times from published timetables. An increasing number of agencies are providing these data openly via GTFS and other formats. However, because of delays, cancellations, and other unforeseen changes in service, the true optimal path on any given day might be very different from the one implied by scheduled service. New data sources are making it possible to know and account for the current state of traffic and plan trips accordingly. With these real-time data sources now opening up, we have revised how OpenTripPlanner stores and manipulates timetables to allow continuous updates and thus more accurate results.
In many cities, equipment on buses and other vehicles relays position information to an operations center that monitors and coordinates transit traffic. Arrival times at subsequent stops are predicted based on deviation from the published schedule, possibly incorporating additional real-time or historical data on traffic congestion and punctuality. Many of these systems were originally intended to drive arrival time information displays provided for passengers at stops. When agencies distribute this data via an API, it can provide additional value by improving passenger itinerary planning.
In 2011, OTP developer Brian Ferris interfaced OTP with the One Bus Away real-time transit data system and demonstrated the feasibility of real-time trip planning as part of his doctoral research work. Building on that experience, we have redesigned OTP data structures to make routing with real-time arrival updates a standard feature.
In a large public transit system with many vehicles continually reporting their locations, new arrival time predictions may be produced on a trip-by-trip basis in a continuous stream. Aggregating these time-sensitive data and refreshing the OTP server only occasionally could reduce the value of these predictions to passengers. Periodic fetches also impose a brief heavy load on server resources rather than a continuous lighter load. To address these concerns, we have designed our real-time update components from the outset to allow both periodic polling of real-time data providers and continuous streaming of updates via publisher/subscriber feeds.
One major challenge presented by continuous real-time updates is concurrency control. At any given moment, an OpenTripPlanner server may be responding to several trip planning requests simultaneously, and each of those requests needs a coherent random-access view of both the updated and scheduled timetables. Until now, with very few exceptions the OpenTripPlanner graph was treated as a read-only data structure. Each request-handling thread might add and remove temporary edges using a simple copy-on-write strategy to avoid concurrent modification conflicts, but there was no mechanism for managing and reverting larger-scale write operations.
The simplest option would be to keep only one copy of the timetable data and make reading and writing mutually exclusive operations. In such a stop-the-world update strategy, there is only one copy of the timetable data and a single readers/writer lock on the entire data set. This lock may be held simultaneously by any number of reading threads (e.g. incoming trip planning API requests) but only one writer. If the writing thread must wait for all read operations to finish, we risk write starvation during peak periods when the system is under load. If on the other hand we favor write operations by delaying reads until at least one write has finished, we degrade OTP’s ability to handle multiple requests simultaneously, making it for all practical purposes single-threaded.
For ideas and inspiration on concurrent access to tabular data, we look to the large body of theory and experience surrounding database management systems since a DBMS must provide industrial-strength support for simultaneous transactions, eliminating race conditions and concerns about reversibility or data loss. The alternative we have adopted is a variety of multi-version concurrency control (MVCC). This approach is streamlined by the fact that in OTP, timetable reading and writing transactions (i.e. request or update handling threads) are entirely separate from one another, and while there may be a number of simultaneous reading threads, we can limit the number of writing threads to one by design. Each reader continues to see a stable snapshot of the entire data set for the duration of the search, even while the writing thread is updating its working copy. Updates are handled at a level of granularity that corresponds to an individual vehicle, avoiding excessive memory use by sharing unchanged timetable objects with successive snapshots.
Our initial implementation of these features is included in OTP release version 0.9.0, and will be updated as we pair it with various emerging data sources. Our next steps will be in determining how to expose this new information to users via our web or mobile clients. A person using a trip planning system is often not aware of the scheduled timetables. When a vehicle deviates from its schedule (i.e. is late) but the itinerary accounts for that lateness, he or she might not perceive that vehicle as being late at all. From the user’s perspective, it may be more significant that the expected arrival time of the vehicle has been confirmed or updated based on recent location information. Research on displaying real-time data to passengers indicates that color is usually the best way to signify delays and cancellations. Accordingly, times included in itineraries could be color-coded to indicate whether they are scheduled or predicted. Perhaps another color variation could indicate how recent the position data used is.
We look forward to hearing any suggestions on the OpenTripPlanner users’ and developers’ mailing lists!