F.A. Oliehoek | TU Delft Repository

An Analysis of Model-Based Reinforcement Learning From Abstracted Observations

Journal article (2024) - R.A.N. Starre (author) , M. Loog (author) , E. Congeduti (author) , F.A. Oliehoek (author)

Many methods for Model-based Reinforcement learning (MBRL) in Markov decision processes (MDPs) provide guarantees for both the accuracy of the model they can deliver and the learning efficiency. At the same time, state abstraction techniques allow for a reduction of the size of a ...

Policy Space Response Oracles

A Survey

Conference paper (2024) - A. Bighashdel (author) , Yongzhao Wang (author) , Stephen McAleer (author) , Rahul Savani (author) , F.A. Oliehoek (author)

Game theory provides a mathematical way to study the interaction between multiple decision makers. However, classical game-theoretic analysis is limited in scalability due to the large number of strategies, precluding direct application to more complex scenarios. This survey prov ...

Safety Guarantees in Multi-agent Learning via Trapping Regions

Journal article (2023) - A.T. Czechowski (author) , F.A. Oliehoek (author)

One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy, when learning concurrently. This is in stark contrast to m ...

Safe Multi-agent Learning via Trapping Regions

Conference paper (2023) - A.T. Czechowski (author) , F.A. Oliehoek (author)

One of the main challenges of multi-agent learning lies in establishing convergence of the algorithms, as, in general, a collection of individual, self-serving agents is not guaranteed to converge with their joint policy, when learning concurrently. This is in stark contrast to m ...

Bad Habits: Policy Confounding and Out-of-Trajectory Generalization in RL

Conference paper (2023) - M. Suau (author) , M.T.J. Spaan (author) , F.A. Oliehoek (author)

Reinforcement learning agents may sometimes develop habits that are effective only when specific policies are followed. After an initial exploration phase in which agents try out different actions, they eventually converge toward a particular policy. When this occurs, ...

What Lies beyond the Pareto Front? A Survey on Decision-Support Methods for Multi-Objective Optimization

Conference paper (2023) - Z. MS Osika (author) , J. Zatarain Salazar (author) , Diederik M. Roijers (author) , F.A. Oliehoek (author) , P.K. Murukannaiah (author)

We present a review that unifies decision-support methods for exploring the solutions produced by multi-objective optimization (MOO) algorithms. As MOO is applied to solve diverse problems, approaches for analyzing the trade-offs offered by MOO algorithms are scattered across fie ...

Teacher-apprentices RL (TARL)

Leveraging complex policy distribution through generative adversarial hypernetwork in reinforcement learning

Journal article (2023) - Shi Yuan Tang (author) , Athirai A. Irissappane (author) , F.A. Oliehoek (author) , Jie Zhang (author)

Typically, a Reinforcement Learning (RL) algorithm focuses on learning a single deployable policy as the end product. Depending on the initialization methods and seed randomization, learning a single policy could possibly lead to convergence to different local optima across diff ...

A Survey on Scenario Theory, Complexity, and Compression-Based Learning and Generalization

Journal article (2023) - Roberto Rocchetta (author) , Alexander Mey (author) , F.A. Oliehoek (author)

This work investigates formal generalization error bounds that apply to support vector machines (SVMs) in realizable and agnostic learning problems. We focus on recently observed parallels between probably approximately correct (PAC)-learning bounds, such as compression and compl ...

Distributed Influence-Augmented Local Simulators for Parallel MARL in Large Networked Systems

Conference paper (2022) - M. Suau (author) , J. He (author) , Mustafa Mert Çelikok (author) , M.T.J. Spaan (author) , F.A. Oliehoek (author)

Due to its high sample complexity, simulation is, as of today, critical for the successful application of reinforcement learning. Many real-world problems, however, exhibit overly complex dynamics, which makes their full-scale simulation computationally slow. In this paper, we sh ...

Overcoming Traffic Sensors Malfunctions with Deep Learning

Conference paper (2022) - V. Catalán Pastor (author) , E. Congeduti (author) , F.A. Oliehoek (author)

Constant growth of cities and their rapid urbanization contribute significantly to an increase in traffic congestion, leading to high costs both in terms of time and fuel consumption. Intelligent Transportation Systems (ITSs) play an important role in managing traffic in urban ar ...

Online Planning in POMDPs with Self-Improving Simulators

Conference paper (2022) - J. He (author) , M. Suau (author) , Hendrik Baier (author) , Michael Kaisers (author) , F.A. Oliehoek (author)

How can we plan efficiently in a large and complex environment when the time budget is limited? Given the original simulator of the environment, which may be computationally very demanding, we propose to learn online an approximate but much faster simulator that improves over tim ...

Back to the Future

Solving Hidden Parameter MDPs with Hindsight

Conference paper (2022) - C.T. Ponnambalam (author) , Danial Kamran (author) , T. D. Simão (author) , F.A. Oliehoek (author) , M.T.J. Spaan (author)

MORAL: Aligning AI with Human Norms through Multi-Objective Reinforced Active Learning

Conference paper (2022) - M. Peschl (author) , A. Zgonnikov (author) , F.A. Oliehoek (author) , L. Cavalcante Siebert (author)

Inferring reward functions from demonstrations and pairwise preferences are auspicious approaches for aligning Reinforcement Learning (RL) agents with human intentions. However, state-of-the-art methods typically focus on learning a single reward model, thus rendering it difficul ...

BADDr

Bayes-Adaptive Deep Dropout RL for POMDPs

Conference paper (2022) - Sammie Katt (author) , Hai Nguyen (author) , F.A. Oliehoek (author) , Christopher Amato (author)

While reinforcement learning (RL) has made great advances in scalability, exploration and partial observability are still active research topics. In contrast, Bayesian RL (BRL) provides a principled answer to both state estimation and the exploration-exploitation trade-off, but s ...

Best-Response Bayesian Reinforcement Learning with Bayes-adaptive POMDPs for Centaurs

Conference paper (2022) - M.M. Celikok (author) , F.A. Oliehoek (author) , Samuel Kaski (author)

Centaurs are half-human, half-AI decision-makers where the AI's goal is to complement the human. To do so, the AI must be able to recognize the goals and constraints of the human and have the means to help them. We present a novel formulation of the interaction between the human ...

Speeding up Deep Reinforcement Learning through Influence-Augmented Local Simulators

Conference paper (2022) - M. Suau (author) , J. He (author) , M.T.J. Spaan (author) , F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

A Cross-Field Review of State Abstraction for Markov Decision Processes

Conference paper (2022) - E. Congeduti (author) , F.A. Oliehoek (author)

Complex real-world systems pose a significant challenge to decision making: an agent needs to explore a large environment, deal with incomplete or noisy information, generalize the experience and learn from feedback to act optimally. These processes demand vast representation cap ...

Influence-Augmented Local Simulators

A Scalable Solution for Fast Deep RL in Large Networked Systems

Conference paper (2022) - M. Suau (author) , J. He (author) , M.T.J. Spaan (author) , F.A. Oliehoek (author)

Learning effective policies for real-world problems is still an open challenge for the field of reinforcement learning (RL). The main limitation is the amount of data needed and the pace at which that data can be obtained. In this paper, we study how to build lightweight simul ...

Model-Based Reinforcement Learning with State Abstraction: A Survey

Conference paper (2022) - R.A.N. Starre (author) , M. Loog (author) , F.A. Oliehoek (author)

Model-based reinforcement learning methods are promising since they can increase sample efficiency while simultaneously improving generalizability. Learning can also be made more efficient through state abstraction, which delivers more compact models. Model-based reinforcement le ...

Multi Robot Surveillance and Planning in Limited Communication Environments

Conference paper (2022) - V. Inna Kedege (author) , A.T. Czechowski (author) , Ludo Stellingwerff (author) , F.A. Oliehoek (author)

Distributed robots that survey and assist with search & rescue operations usually deal with unknown environments with limited communication. This paper focuses on distributed & cooperative multi-robot area coverage strategies of unknown environments, having constrained co ...