Multi-agent hierarchical reinforcement learning with dynamic termination

Abstract

In a multi-agent system, an agent's optimal policy typically depends on the policies of the other agents. Predicting the behaviours of others, and responding promptly to changes in those behaviours, is therefore a key issue in multi-agent systems research. One obvious possibility is for each agent to broadcast its current intention, for example, the option it is currently executing in a hierarchical reinforcement learning (RL) framework. However, this approach results in inflexible agents when options have an extended duration. While adjusting the executed option at every step improves flexibility from a single-agent perspective, frequent changes of option can make an agent's actual behaviour inconsistent with its broadcast intention. To balance flexibility against predictability, we propose a dynamic termination Bellman equation that allows agents to flexibly terminate their options.
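
To make the trade-off concrete, the following is a minimal sketch of what such a dynamic termination backup could look like. The abstract does not state the equation, so the specific form, the option set $\Omega$, and the switching penalty $\delta$ are assumptions introduced here for illustration, not the paper's exact formulation. At each step the agent compares the value of continuing its current option $\omega$ against the best value obtainable by terminating and switching to a new option $\omega'$, with $\delta$ penalising a switch (and hence a change of broadcast intention):

$$
(\mathcal{T}^{\delta} Q)(s,\omega) \;=\; r(s,\omega) \;+\; \gamma\, \mathbb{E}_{s' \sim P(\cdot \mid s,\omega)}\!\left[ \max\!\left( Q(s',\omega),\; \max_{\omega' \in \Omega} Q(s',\omega') - \delta \right) \right]
$$

Under this sketch, $\delta = 0$ recovers step-wise re-selection of options (maximal flexibility, minimal predictability), while $\delta \to \infty$ forces every option to run to completion; intermediate values trade the two objectives off against each other.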