M. Asadi and M. Huber (USA)
Markov Decision Process, Reinforcement Learning, Stochastic Systems
To operate effectively in complex environments, learning agents have to selectively ignore irrelevant details by forming useful abstractions. In this paper we outline a formulation of abstraction for reinforcement learning approaches to stochastic decision problems by extending one of the recent minimization models, known as ε-reduction. The technique presented here extends ε-reduction to SMDPs by considering the execution of a policy rather than a single action, and by grouping all states whose transition probabilities and reward functions differ only slightly under a given policy. When the reward structure is not known, or multiple tasks need to be learned in the same environment, a two-phase method for state aggregation is introduced, and a theorem in this paper shows that tasks remain solvable using the partitions produced by the two-phase method. Simulations on different state spaces show that the policies learned in the original MDP and in this reduced representation achieve similar results, and that the total learning time in the partition space is much smaller than the total time spent learning in the original state space.
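As a rough illustration of the aggregation criterion described above, the following minimal sketch groups states whose one-step models differ by at most ε. It assumes a hypothetical tabular interface (P[s][a][s2] for transition probabilities, R[s][a] for expected rewards) and compares raw per-action models; the paper's actual ε-reduction operates over policies/SMDP options and refines blocks iteratively, so this is only a simplified approximation of the idea, not the authors' algorithm.

```python
def epsilon_reduce(states, actions, P, R, eps):
    """Greedy sketch of epsilon-style state aggregation.

    Groups states whose expected rewards and transition probabilities
    differ by at most eps for every action. Hypothetical tabular
    interface: P[s][a][s2] = transition probability, R[s][a] = reward.
    """
    def similar(s1, s2):
        for a in actions:
            if abs(R[s1][a] - R[s2][a]) > eps:
                return False
            if any(abs(P[s1][a][t] - P[s2][a][t]) > eps for t in states):
                return False
        return True

    blocks = []  # each block is a list of mutually similar states
    for s in states:
        for block in blocks:
            if all(similar(s, member) for member in block):
                block.append(s)
                break
        else:
            blocks.append([s])  # start a new block for this state
    return blocks
```

Learning can then proceed over the returned blocks instead of the original states, which is the source of the reduced learning time reported in the abstract.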