Effectiveness of trip planner data in predicting short-term bus ridership

Abstract

Predictions of Public Transport (PT) ridership are beneficial because they allow for sufficient and cost-efficient deployment of vehicles. At the operational level, this calls for short-term predictions with lead times of less than an hour. Conventional data sources on ridership, such as Automatic Fare Collection (AFC) data, may have longer lag times and contain no information on travel intentions. In contrast, trip planner data are often available in (near) real time and are generated before traveling. In this paper, we investigate how such data from a trip planner app can be utilized for short-term bus ridership predictions. These data are combined with AFC data (in this case smart card data), which serve as ground truth for actual ridership. Using informative variables selected from the trip planner dataset through correlation analysis, we develop three supervised Machine Learning (ML) models: k-nearest neighbors, random forest, and gradient boosting. The best-performing model relies on random forest regression with trip planner requests. Compared with a baseline model that depends on the weekly trend, it reduces the mean absolute error by approximately half. Moreover, comparing the same model with and without trip planner data, we demonstrate the usefulness of trip planner data: the mean absolute error improves by 8.9% and 21.7%, and the coefficient of determination from a 5-fold cross-validation increases by 7.8% and 18.5%, for the two case study lines, respectively. Lastly, we show that this model performance is maintained for trip planner requests with prediction lead times of up to 30 min, and for different periods of the day. We expect our methodology to be useful for PT operators seeking to improve their daily operations and level of service, as well as for trip planner companies seeking to facilitate passenger replanning, in particular during peak hours.
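
The evaluation quantities named in the abstract (mean absolute error, coefficient of determination, and the relative error reduction over a baseline) can be illustrated with a minimal sketch. The function names and the ridership values below are hypothetical and chosen only for illustration; they are not taken from the study, and the sketch does not reproduce the paper's models or cross-validation setup.

```python
from statistics import mean

def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted ridership."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1 - ss_res / ss_tot

def relative_reduction(baseline_error, model_error):
    """Fraction by which the model lowers the baseline error."""
    return (baseline_error - model_error) / baseline_error

# Made-up boardings per time slot (assumption, not study data).
actual = [10.0, 20.0, 30.0, 40.0]
# Hypothetical baseline, e.g. the value observed in the same slot one week earlier.
baseline_pred = [14.0, 16.0, 34.0, 36.0]
# Hypothetical output of a learned model.
model_pred = [12.0, 18.0, 32.0, 38.0]

baseline_mae = mean_absolute_error(actual, baseline_pred)
model_mae = mean_absolute_error(actual, model_pred)

print("baseline MAE:", baseline_mae)
print("model MAE:", model_mae)
print("relative MAE reduction:", relative_reduction(baseline_mae, model_mae))
print("model R^2:", r_squared(actual, model_pred))
```

In a 5-fold cross-validation, metrics of this kind would be computed on each held-out fold and then averaged; that step is omitted here for brevity.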