Data-driven Methods to Study Individual Choice Behaviour
with Applications to Discrete Choice Experiments and Participatory Value Evaluation Experiments
Abstract
Since its origins in the 1970s, choice modelling has become an important field of study in diverse areas, including transportation, health economics, environmental economics and marketing. Choice modellers have developed several methods to collect and model individual choices. Researchers and policymakers use such methods to understand individual preferences in diverse contexts, derive economic values or predict behaviour.
Over the years, the field of choice modelling has developed in two key areas. Firstly, choice modellers have developed new data collection tools to account for more realistic forms of decision-making. While discrete choice experiments (DCEs) are still popular and highly customisable, they force respondents to choose among mutually exclusive alternatives, which may not reflect how individuals choose in real life. In response, new stated choice (SC) experiments, such as Participatory Value Evaluation (PVE), have been proposed to incorporate more realistic forms of decision-making. In a PVE experiment, respondents select a combination of alternatives without surpassing resource constraints. Secondly, while theory-driven models based on utility theory, e.g., random utility maximisation (RUM) or Kuhn-Tucker models, are still the norm for modelling choice behaviour, there is a broader recognition that individuals' behaviour is ultimately unknown from the analyst's perspective and that data-driven methods can help to uncover such behaviour.
Despite these developments, to the author’s knowledge, three methodological and practical challenges remain unresolved in the literature. Firstly, no research has explored the potential of data-driven methods to analyse data from SC experiments other than DCEs, and from PVE experiments in particular, either as complements to improve the specification of choice models or as standalone data analysis methods. Secondly, while data-driven methods for discrete choices (and DCEs) are available in the literature, such methods either sacrifice their flexibility to learn from the data in order to satisfy consistency assumptions, or vice versa. Thus, a method that balances flexibility and consistency assumptions is lacking. Thirdly, there is a lack of software tools to estimate and compare data-driven methods easily and conveniently, which hinders their widespread use.
Considering these challenges, this thesis investigates how data-driven methods can be used to analyse individual choice behaviour from SC experiments, either as complements to theory-driven choice models or as alternatives to them. It also develops methodological tools for such purposes, i.e., new models and software. The thesis limits its scope to two specific SC experiments: PVE experiments and DCEs.
To reach the goals of this thesis, five novel studies are proposed. The first study (Chapter 2) introduces the reader to how PVE experiments are conducted in real life and how they are conventionally analysed with theory-driven choice models. The second study (Chapter 3) proposes three procedures based on association rules (AR) learning and random forests (RF) to assist the specification of theory-driven choice models for PVE experiments and to test the validity of their assumptions. The third study (Chapter 4) shows how XGBoost and SHAP (a machine learning model and an explainable artificial intelligence method, respectively) can be used to analyse data from PVE experiments as an alternative to theory-driven analysis. The fourth study (Chapter 5) proposes a new discrete choice model based on artificial neural networks that balances the flexibility to learn utility functions from the data with consistency with RUM and economic theory. The fifth study (Chapter 6) introduces NP4VTT, a new software tool that provides five nonparametric models to uncover the distribution of the value of travel time (VTT) from two-attribute-two-alternative DCEs. Together, these studies provide further evidence that supports the use of data-driven methods to analyse individual choice behaviour, and they provide specific methodological tools for such purposes.
This thesis concludes by highlighting that, while the primary research goal and sub-goals were achieved, the relevance of its findings and conclusions should be put into perspective. Firstly, using data-driven methods, either to assist choice models or as an alternative to them, leads to “moderate-to-modest” improvements in model fit. Consequently, researchers or policymakers interested in using the methods proposed in this thesis for prediction should not expect considerable differences compared with conventional choice models. Secondly, the methods proposed in this thesis provide a considerable number of new insights of behavioural interest. Choice modellers could use these insights to contrast or further assist the development of choice models, while policymakers gain a wide range of new information for targeting decisions to specific policies or individuals. However, researchers should consider how to synthesise all these new insights effectively. Thirdly, this thesis made efforts to make data-driven methods more widely available by, for instance, publishing the studies in open-access journals and, when possible, making code and data publicly available. Nevertheless, there are still conceptual challenges in making these methods more approachable for researchers accustomed to the concepts and structure of the choice modelling community. As a final reflection, while data-driven methods have the potential to help choice modellers increase their understanding of individual choice behaviour, they still require more development (and need to become more easily accessible) to serve as a real alternative to choice models.