Acting in the Face of Uncertainty

Pessimism in Offline Model-Based Reinforcement Learning


Abstract

Offline model-based reinforcement learning uses a model of the environment, learned from a static dataset of interactions, to guide policy generation. When the agent visits out-of-distribution states, the world model becomes more uncertain, and the planner can make sub-optimal decisions. This paper explores the use of pessimism, the tendency to avoid uncertain states, in the planning procedure. We evaluate the Lower Confidence Bound (LCB), ensembles, and Monte Carlo (MC) dropout in the MinAtar Breakout environment. Results indicate that ensemble methods yield the highest performance, with a significant gain over the baseline, while LCB shows varying degrees of improvement. MC dropout generally does not yield a performance improvement.
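As a rough illustration of the pessimism idea (not the paper's actual implementation), a planner can score candidate actions by an ensemble's mean predicted return minus a multiple of the ensemble's disagreement, so actions whose value the models disagree on (typically out-of-distribution) are avoided. The function name lcb_value, the beta coefficient, and the example numbers below are hypothetical.

```python
import numpy as np

def lcb_value(ensemble_predictions: np.ndarray, beta: float = 1.0) -> float:
    """Lower-confidence-bound score over an ensemble of predicted returns.

    ensemble_predictions: shape (n_models,), each ensemble member's
        predicted return for a candidate action.
    beta: pessimism coefficient; larger values penalize disagreement more.
    """
    mean = ensemble_predictions.mean()
    std = ensemble_predictions.std()
    # Penalize the mean by the ensemble's disagreement: the planner prefers
    # actions whose value the ensemble agrees on (in-distribution states).
    return mean - beta * std

# Illustrative usage: pick the action with the highest pessimistic value.
predicted_returns = {
    "left": np.array([1.2, 1.1, 1.3]),   # low disagreement
    "right": np.array([0.5, 2.4, 1.9]),  # high disagreement (likely OOD)
}
best_action = max(predicted_returns, key=lambda a: lcb_value(predicted_returns[a]))
```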
