Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control

Abstract

Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not trivial, yet it can be a critical step in developing reinforcement learning solutions that effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy, but the value of more sophisticated exploration approaches in this setting has not yet been established. This paper addresses that gap by comparing the performance of the popular deep Q-learning algorithm under one baseline and two state-of-the-art exploration approaches, as well as their combination. Specifically, ε-greedy serves as the baseline and is compared against Bootstrapped DQN, randomized prior functions, and their combination. The comparison is carried out in three traffic scenarios capturing different traffic profiles. The results suggest that the more complex the traffic scenario and the larger the agent's observation space, the greater the gain from efficient exploration: agents using efficient exploration and a larger observation space show improved performance in the complex traffic scenarios.

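As an illustration of the baseline dithering method named in the abstract, the following is a minimal sketch of ε-greedy action selection for a traffic-light agent. It assumes a Q-network has already produced one value estimate per signal phase; the function and variable names (epsilon_greedy_action, q_estimates) are illustrative and do not come from the paper.

```python
import random

import numpy as np


def epsilon_greedy_action(q_values: np.ndarray, epsilon: float) -> int:
    """Pick a signal-phase index: random with probability epsilon,
    otherwise the phase with the highest estimated Q-value."""
    if random.random() < epsilon:
        # Explore: choose any available signal phase uniformly at random.
        return random.randrange(len(q_values))
    # Exploit: choose the phase the Q-network currently rates best.
    return int(np.argmax(q_values))


# Example with four signal phases and made-up Q-value estimates.
q_estimates = np.array([0.1, 0.7, 0.3, 0.5])
action = epsilon_greedy_action(q_estimates, epsilon=0.1)
```

The more elaborate approaches compared in the paper (Bootstrapped DQN and randomized prior functions) replace this uniform dithering with ensemble-based, state-dependent exploration.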