K. Saito, S. Masuda, and T. Yamaguchi (Japan)
reinforcement learning, State Partition method, rewards, POMDPs, stochastic policy
Reinforcement learning(RL) is a kind of machine learn ing. Its goal is that an autonomous agent opti mizes its behavior by progressively improving its per formance based on given rewards from an unknown environment. Partially Observable Markov Decision Process(POMDP) is a representative class of non Markovian environments, where agents sense different environmental states as the same input. If we ap ply the traditional exploitation-oriented methods to POMDPs, these methods cannot acquire a reasonable policy due to two deceptive problems. Therefore, in this paper, we propose a new approach called State Partition method for POMDPs where stochastic pol icy is necessary.
Important Links:
Go Back