Q-value reuse between state abstractions for traffic light control
Abstract
Previous research in reinforcement learning for traffic light control has used various state abstractions. Some use feature vectors, while others use matrices of car positions. This paper first compares a simple feature vector consisting of only queue sizes per incoming lane to a matrix of car positions. It then investigates whether knowledge can be transferred from a simple agent using the feature vector abstraction to a more complex agent that uses the position matrix abstraction. We find that training cannot be sped up by first training an agent with the feature vector abstraction and then reusing its Q-function to train an agent with the position matrix abstraction. The simple agent does not take considerably fewer samples to converge, and the total time needed to first train the simple agent and then transfer its knowledge exceeds the time needed to train the complex agent from scratch.
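To make the two abstractions concrete, the sketch below (not the paper's code) builds the queue-size feature vector and the car-position matrix for a single intersection, plus a simple mapping from the matrix down to approximate queue counts, which is the kind of correspondence that would make Q-value reuse between the two representations possible. The lane count, grid resolution, and all function names are illustrative assumptions.

```python
# Minimal sketch of the two state abstractions; sizes and names are assumptions.
import numpy as np

N_LANES = 4          # assumed number of incoming lanes at the intersection
CELLS_PER_LANE = 10  # assumed number of discrete cells per lane

def queue_vector(queue_lengths):
    """Simple abstraction: a feature vector with one queue size per incoming lane."""
    return np.asarray(queue_lengths, dtype=np.float32)  # shape (N_LANES,)

def position_matrix(occupied_cells):
    """Complex abstraction: a binary matrix marking which lane cells contain a car.

    occupied_cells: iterable of (lane_index, cell_index) pairs.
    """
    grid = np.zeros((N_LANES, CELLS_PER_LANE), dtype=np.float32)
    for lane, cell in occupied_cells:
        grid[lane, cell] = 1.0
    return grid  # shape (N_LANES, CELLS_PER_LANE)

def matrix_to_queues(grid):
    """Collapse the complex state to the simple one by counting cars per lane.

    This only approximates queue length (it ignores whether cars are halted),
    but a mapping of this kind is what lets a Q-function trained on queue
    vectors be evaluated for states observed as position matrices.
    """
    return grid.sum(axis=1)

# Example with made-up observations:
simple_state = queue_vector([3, 0, 5, 1])
complex_state = position_matrix([(0, 0), (0, 1), (0, 2), (2, 0), (2, 3)])
print(simple_state, matrix_to_queues(complex_state))  # [3. 0. 5. 1.] [3. 0. 2. 0.]
```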