Travel behavior analysis using Artificial Neural Networks
Striking the balance between model complexity and data requirements
Abstract
Despite having been known for a long time (e.g., McCulloch & Pitts, 1943; Rosenblatt, 1958), and despite having occasionally been used for the analysis of travel behavior for more than a decade (Hensher & Ton, 2000; Mohammadian & Miller, 2002), Artificial Neural Networks (ANNs) have only lately become – by some distance – the most prominent and promising Artificial Intelligence (AI) model for the analysis of travel behavior in the context of large, emerging data sources (e.g., Karlaftis & Vlahogianni, 2011; Chen et al., 2016; van Cranenburgh & Alwosheel, 2017). This sharp increase in the popularity of ANNs as a tool for travel behavior analysis has resulted from a range of improvements in ANNs’ capabilities, increases in computational power, and the rapidly increasing size and diversity of the data at the disposal of choice modelers.

This paper aims to help pave the way for further and effective deployment of ANNs for travel behavior analysis. It does so by highlighting and articulating an easily overlooked aspect of the ANN methodology which is of crucial importance for its successful use in a travel choice modeling context. More specifically, we study the relation between i) the assumed characteristics of the Data Generating Process (DGP; in this case the assumed model of travel choice behavior, or decision rule), and ii) the size of the data required for meaningful, reliable travel choice analysis using ANNs. The core idea behind this relation is intuitive: if the DGP is relatively complex – e.g., highly non-linear – then a given ANN needs more data to generate a reliable representation of the DGP, and hence accurate predictions. Despite, or perhaps because of, this straightforward intuition, choice modelers employing ANNs so far seem to have ignored important results from the AI literature which rigorously define this relation between the complexity of the DGP and the resulting data requirements in the context of empirical analysis using ANNs. Concepts such as the Universal Approximation Theorem (Cybenko, 1989; Hornik et al., 1989), the notion of Probably Approximately Correct learning (Valiant, 1984) and the so-called Vapnik–Chervonenkis (VC) dimension (Vapnik & Chervonenkis, 1971) have helped AI researchers in various fields of application determine the required size of their dataset as a function of the assumed characteristics of the DGP. This paper aims to introduce these theoretical concepts and notions from the AI literature to the travel behavior research community, and to translate them in such a way that they can be readily used by travel choice modelers. By doing so, we aim to help travel behavior researchers who wish to use ANNs for discrete choice analysis in selecting or collecting suitable datasets.

To focus our attention, we limit our discussion to the context of two particular travel choice models as DGPs. The first is the well-known linear-additive MNL model based on utility maximization premises, which is the workhorse of discrete choice analysis and in many ways the least complex choice model available (Ben-Akiva & Lerman, 1985; Train, 2009). The second is the Random Regret Minimization (RRM) model (in MNL form), which is one of the most widely used behavioral alternatives to the canonical linear-in-parameters, utility-based MNL model (Chorus et al., 2008; van Cranenburgh et al., 2015). The regret function embedded in most RRM models is highly non-linear and includes attributes of all alternatives in the choice set, as sketched below.
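For concreteness, the two decision rules can be sketched as follows. This is a minimal illustration with a generic taste parameter β_m for attribute m of alternative i; the regret specification shown is one common form in the RRM literature and is not the only variant in use.

```latex
% Linear-additive RUM: the utility of alternative i depends only on its own attributes
V_i = \sum_{m} \beta_m \, x_{im}, \qquad
P_i = \frac{\exp(V_i)}{\sum_{j} \exp(V_j)}

% RRM (one common specification): the regret of alternative i involves non-linear,
% pairwise comparisons with the attributes of every competing alternative j
R_i = \sum_{j \neq i} \sum_{m} \ln\!\bigl(1 + \exp\!\bigl[\beta_m \,(x_{jm} - x_{im})\bigr]\bigr), \qquad
P_i = \frac{\exp(-R_i)}{\sum_{j} \exp(-R_j)}
```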
As such, it is a considerably more ‘complex’ choice model than its utility-based counterpart, something which shows, for example, in considerably higher runtimes (Guevara et al., 2016). The comparison between these two models (i.e., DGPs) therefore serves well to highlight how, in the context of discrete choice analysis based on ANNs, data requirements follow from the characteristics – i.e., the level of complexity – of the DGP.

Our study consists of two parts. In Part 1 we will introduce the relevant concepts, notions and theorems that have been developed in the ANN literature to determine minimum sample sizes as a function of model complexity. We will present these ideas in a notation and framework that connect directly with conventional modeling practice in the travel behavior research community. In Part 2 we will use these ideas in a concrete example, for illustration purposes and to establish face validity. More specifically, we will show how Random Utility and Random Regret DGPs differ in terms of their data requirements in the context of model estimation with ANNs. We conclude our study with the derivation and discussion of implications for researchers and practitioners in the field of travel behavior analysis.

To give a flavor of the analyses performed in Part 2, we present some first results here. Our ‘empirical’ setting is a simple travel mode choice between three alternatives (car, bus, train) based on two attributes (travel time, travel cost). We generate two synthetic datasets containing mode choices: one dataset uses a Random Utility DGP (in MNL form) and the other uses a Random Regret DGP (also in MNL form). Subsequently, we derive – using the concepts introduced from the ANN literature – the theoretically expected minimum (training) sample size needed to achieve a reliable representation of the DGP by an appropriately specified ANN. We do this for the RUM and RRM DGPs, and show how – in line with expectations – the theoretically required minimum (training) sample size is larger for the latter. Finally, we verify this theoretical result by training ANNs, for each DGP, on increasingly large subsets of the synthetic data. As Figure 1 (RUM) and Figure 2 (RRM) show, the out-of-sample predictive ability of the corresponding ANNs – measured in terms of out-of-sample log-likelihood – is found to increase sharply up to the theoretically identified minimum (training) sample size, after which marginal increments in model fit become notably smaller. This suggests that the theoretically established minimum sample size provides a reasonable indication of practical (training) sample size requirements for the two different DGPs.
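As an illustration of how such an experiment can be set up, the sketch below mirrors the Part 2 procedure under stated assumptions: the taste parameters, attribute ranges, network size, sample sizes, and the use of scikit-learn’s MLPClassifier are illustrative choices of ours, not the settings used in the paper. It generates synthetic mode choices from the RUM and RRM DGPs sketched above, trains an ANN on increasingly large training subsets, and records the out-of-sample log-likelihood on a fixed hold-out set.

```python
# Hypothetical sketch of the Part 2 experiment. Coefficient values, attribute
# ranges, network size and sample sizes are illustrative assumptions only.
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import log_loss

rng = np.random.default_rng(42)
N_TOTAL, N_TEST = 20_000, 5_000
BETA_TT, BETA_TC = -0.06, -0.25               # assumed tastes for time (min) and cost (eur)

# Attributes of the three alternatives (car, bus, train)
tt = rng.uniform(10, 60, size=(N_TOTAL, 3))   # travel times
tc = rng.uniform(1, 10, size=(N_TOTAL, 3))    # travel costs

def rum_choices(tt, tc):
    """Linear-additive RUM MNL DGP: V_i = b_tt*tt_i + b_tc*tc_i, P_i = softmax(V)_i."""
    v = BETA_TT * tt + BETA_TC * tc
    p = np.exp(v) / np.exp(v).sum(axis=1, keepdims=True)
    return np.array([rng.choice(3, p=row) for row in p])

def rrm_choices(tt, tc):
    """RRM MNL DGP: R_i sums pairwise ln(1+exp(.)) regrets, P_i = softmax(-R)_i."""
    r = np.zeros_like(tt)
    for i in range(3):
        for j in range(3):
            if j != i:
                r[:, i] += np.log1p(np.exp(BETA_TT * (tt[:, j] - tt[:, i])))
                r[:, i] += np.log1p(np.exp(BETA_TC * (tc[:, j] - tc[:, i])))
    p = np.exp(-r) / np.exp(-r).sum(axis=1, keepdims=True)
    return np.array([rng.choice(3, p=row) for row in p])

X = np.hstack([tt, tc])                       # six attributes per observation
for dgp_name, dgp in [("RUM", rum_choices), ("RRM", rrm_choices)]:
    y = dgp(tt, tc)
    X_te, y_te = X[:N_TEST], y[:N_TEST]       # fixed hold-out set
    X_tr, y_tr = X[N_TEST:], y[N_TEST:]
    for n in [250, 500, 1_000, 2_000, 5_000, 10_000]:
        ann = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2_000, random_state=0)
        ann.fit(X_tr[:n], y_tr[:n])
        # Out-of-sample LL = -N_TEST * mean cross-entropy on the hold-out set
        ll = -N_TEST * log_loss(y_te, ann.predict_proba(X_te), labels=[0, 1, 2])
        print(f"{dgp_name}  n = {n:6d}   out-of-sample LL = {ll:10.1f}")
```

Plotting the printed log-likelihoods against n would reproduce, in spirit, the curves of Figures 1 and 2: fit improves steeply at small n and flattens once the training sample exceeds the theoretically derived minimum, which should occur at a larger n for the RRM DGP than for the RUM DGP.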