B. Bakker (The Netherlands) and J. Schmidhuber (Switzerland)
Reinforcement learning, hierarchical reinforcement learning, feedforward neural networks, recurrent neural networks, MDPs, POMDPs, short-term memory
This paper describes a method for hierarchical reinforcement learning in which high-level policies automatically discover subgoals, and low-level policies learn to specialize for different subgoals. Subgoals are represented as desired abstract observations which cluster raw input data. High-level value functions cover the state space at a coarse level; low-level value functions cover only parts of the state space at a fine-grained level. An experiment shows that this method outperforms several flat reinforcement learning methods. A second experiment shows how problems of partial observability due to observation abstraction can be overcome using high-level policies with memory.
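The abstract's core idea, subgoals represented as abstract observations obtained by clustering raw input data, with a coarse high-level value function over clusters and a fine-grained low-level table per subgoal, can be sketched as follows. This is an illustrative toy, not the authors' implementation: the blob data, the tiny k-means routine, and all names (`abstract_obs`, `high_V`, `low_Q`) are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw observations: 2-D points from three well-separated blobs.
centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
obs = np.concatenate([c + 0.3 * rng.standard_normal((50, 2)) for c in centers])

def kmeans(x, init_idx, iters=20):
    """Tiny k-means: cluster raw observations into abstract observations."""
    cents = x[init_idx]
    for _ in range(iters):
        # Assign each observation to its nearest centroid.
        labels = np.argmin(((x[:, None] - cents[None]) ** 2).sum(-1), axis=1)
        # Move each centroid to the mean of its assigned observations.
        cents = np.array([x[labels == j].mean(axis=0)
                          for j in range(len(cents))])
    return cents, labels

# Deterministic init: one seed point per blob (obs is ordered by blob).
cents, labels = kmeans(obs, init_idx=[0, 50, 100])

def abstract_obs(o):
    """Map a raw observation to its cluster index (the abstract observation)."""
    return int(np.argmin(((cents - np.asarray(o)) ** 2).sum(-1)))

# Coarse high-level value table: one entry per abstract observation.
# Fine-grained low-level tables: one (sparse) table per subgoal, covering
# only the part of the state space relevant to reaching that subgoal.
n_abstract = len(cents)
high_V = np.zeros(n_abstract)
low_Q = {g: {} for g in range(n_abstract)}  # subgoal -> local state table
```

A high-level policy would then choose a target cluster index as a subgoal, and the corresponding low-level policy would be trained (and its `low_Q[g]` populated) only on raw states near that subgoal, which is what gives the coarse/fine division of labor described above.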