Evaluating students' study habits using Bayesian Multinomial Logistic Regression
Abstract
The Bayesian approach is an important framework for tackling problems in statistics. Unlike the frequentist approach, it requires choosing a prior distribution that encodes the knowledge available before the data are observed, so that this knowledge is taken into account in the analysis. It also treats the parameters (here, the regression coefficients) as random variables rather than fixed constants: combining the prior with the likelihood of the data gives their posterior distribution. Because the prior directly influences the posterior distribution of the regression coefficients, any specific choice of prior needs to be justified. It is also possible to use priors that carry little information, and such priors are compared in this project.
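To make the role of the prior concrete, the following Python sketch (an illustration, not the R code used in the thesis) writes the unnormalized log-posterior of a multinomial logistic regression as log-likelihood plus log-prior, with either a Gaussian or a Cauchy prior on the coefficients. The toy data, the coefficient values, the prior scale of 2.5 and all function names are hypothetical, and category 0 is assumed to be the reference category with its coefficients fixed at zero. UPG and BRMS do not evaluate the posterior by hand like this; they draw samples from it with MCMC.

```python
import math

def log_softmax(scores):
    """Numerically stable log of the softmax of a list of scores."""
    m = max(scores)
    log_norm = m + math.log(sum(math.exp(s - m) for s in scores))
    return [s - log_norm for s in scores]

def log_likelihood(beta, X, y):
    """Multinomial logit log-likelihood.

    beta[k] is the coefficient vector of category k; beta[0] belongs to the
    reference category and is assumed (in this sketch) to be all zeros.
    """
    total = 0.0
    for x, label in zip(X, y):
        scores = [sum(b * xi for b, xi in zip(beta_k, x)) for beta_k in beta]
        total += log_softmax(scores)[label]
    return total

def log_prior_gaussian(beta, scale):
    """Independent Normal(0, scale) log-density over the non-reference coefficients."""
    return sum(-0.5 * (b / scale) ** 2 - math.log(scale * math.sqrt(2 * math.pi))
               for beta_k in beta[1:] for b in beta_k)

def log_prior_cauchy(beta, scale):
    """Independent Cauchy(0, scale) log-density over the non-reference coefficients."""
    return sum(-math.log(math.pi * scale * (1 + (b / scale) ** 2))
               for beta_k in beta[1:] for b in beta_k)

def log_posterior(beta, X, y, log_prior, scale):
    """Unnormalized log-posterior: log-likelihood + log-prior."""
    return log_likelihood(beta, X, y) + log_prior(beta, scale)

# Hypothetical toy data: an intercept column plus one predictor, three categories.
X = [[1.0, 0.5], [1.0, -1.0], [1.0, 2.0]]
y = [0, 1, 2]
beta = [[0.0, 0.0], [0.3, -0.8], [-0.5, 1.2]]

lp_gauss = log_posterior(beta, X, y, log_prior_gaussian, 2.5)
lp_cauchy = log_posterior(beta, X, y, log_prior_cauchy, 2.5)
print(f"Gaussian prior: {lp_gauss:.3f}, Cauchy prior: {lp_cauchy:.3f}")
```

The same data and coefficients give different posterior values under the two priors, which is the sense in which the prior "directly influences" the posterior. Because the Cauchy density has much heavier tails, it penalizes a large coefficient far less than a Gaussian with the same scale.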
In this thesis, the Bayesian approach is used to fit a multinomial logistic regression model to data on students' study habits and beliefs. The data were provided by PRIME, a research group that focuses on mathematics education at TU Delft. Multinomial logistic regression predicts each student's choice as a probability for every possible option. Bayesian statistics is useful not only because it allows prior knowledge to be specified, but also because the Bayesian way of thinking can be incorporated into the evaluation of results. This is done by constructing credible intervals for the predicted probabilities; the overlap between these intervals then gives insight into the quality of the predictions.
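The credible-interval idea can be sketched as follows, again in Python as an illustration rather than the thesis' R code. Each posterior draw of the coefficients yields one vector of predicted probabilities for a student (via the softmax), and the 2.5% and 97.5% quantiles across draws form an equal-tailed 95% credible interval per category. The draws below are synthetic (sampled from hypothetical normal distributions as a stand-in for MCMC output, not fitted to the PRIME data), the student's covariates are made up, and reading non-overlapping intervals as a clear-cut prediction is one possible interpretation, not necessarily the exact criterion used in the thesis.

```python
import math
import random

def softmax(scores):
    """Numerically stable softmax: turns linear predictors into probabilities."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def probability_draws(draws, x):
    """One predicted probability vector per posterior draw, for covariates x."""
    return [softmax([sum(b * xi for b, xi in zip(beta_k, x)) for beta_k in beta])
            for beta in draws]

def credible_interval(samples, level=0.95):
    """Equal-tailed credible interval from posterior samples (simple order statistics)."""
    s = sorted(samples)
    n = len(s)
    lower = s[math.floor((1 - level) / 2 * (n - 1))]
    upper = s[math.ceil((1 + level) / 2 * (n - 1))]
    return lower, upper

def overlaps(a, b):
    """True if the intervals a = (lo, hi) and b = (lo, hi) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

# Synthetic stand-in for 4000 MCMC draws; category 0 is the reference (zeros).
random.seed(1)
draws = [[[0.0, 0.0],
          [random.gauss(0.3, 0.2), random.gauss(-0.8, 0.2)],
          [random.gauss(-0.5, 0.2), random.gauss(1.2, 0.2)]]
         for _ in range(4000)]

x_student = [1.0, 2.0]  # hypothetical student: intercept + predictor value 2
probs = probability_draws(draws, x_student)
intervals = [credible_interval([p[k] for p in probs]) for k in range(3)]
for k, (lo, hi) in enumerate(intervals):
    print(f"category {k}: 95% CI [{lo:.3f}, {hi:.3f}]")
print("category 2 vs 0 overlap:", overlaps(intervals[2], intervals[0]))
```

Here the interval for the most probable category lies well above the others, so the prediction is unambiguous; heavily overlapping intervals would instead signal that the model cannot separate the options for this student.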
In this project, the models are coded in R using two packages: UPG and BRMS. The priors that are compared are the Gaussian and Cauchy distributions; in addition, the default priors of the packages are used and compared with the Gaussian and Cauchy priors. The performance of each model is assessed by its prediction accuracy. With default priors, the BRMS package achieves higher accuracy than the UPG package, and overall the default priors give more accurate results than the specified priors. However, the accuracy of the BRMS model is not significantly higher than that of the UPG model, and the running time of the BRMS package is much longer. Among the models with a specified prior, the model with the Cauchy prior performed better than the model with the Gaussian prior.
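Prediction accuracy, the criterion used to compare the models, can be computed along these lines. This is a hedged Python sketch: it assumes that each student's predicted choice is the category with the highest posterior-mean probability, which is one common rule but not necessarily the one used in the thesis, and the probability draws and observed choices are invented for illustration. Computing the same accuracy for every model (UPG or BRMS, with default, Gaussian or Cauchy priors) on the same students gives a like-for-like comparison.

```python
def predict_category(prob_draws):
    """Category with the highest posterior-mean predicted probability."""
    n_cat = len(prob_draws[0])
    means = [sum(p[k] for p in prob_draws) / len(prob_draws) for k in range(n_cat)]
    return max(range(n_cat), key=lambda k: means[k])

def accuracy(predicted, observed):
    """Fraction of students whose observed choice matches the prediction."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)

# Hypothetical posterior probability draws (2 draws x 3 categories) per student.
students = [
    [[0.2, 0.7, 0.1], [0.3, 0.6, 0.1]],
    [[0.5, 0.3, 0.2], [0.6, 0.2, 0.2]],
    [[0.1, 0.2, 0.7], [0.2, 0.1, 0.7]],
]
observed = [1, 2, 2]  # hypothetical observed choices

predicted = [predict_category(d) for d in students]
print(predicted, accuracy(predicted, observed))  # [1, 0, 2] 0.6666666666666666
```

In practice accuracy would be computed on held-out students, and a difference in accuracy between two models should be checked for significance before declaring one better, as the comparison between BRMS and UPG above illustrates.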