Modeling User Behavior by using a POMDP Reinforcement Learning Algorithm

Fangju Wang


Human Activity and Behavior Understanding, Statistical and Probabilistic Modelling, Agent-based Modelling


In building a spoken dialogue system (SDS), disambiguating user speech input is an important and challenging task. In our research, we develop a new disambiguation technique. Its core component is a user behavior model that predicts user dialogue actions. When ambiguity occurs, the predicted user actions can be used to resolve it. We apply a reinforcement learning algorithm to create the user behavior model and update it online. In the previous stage of our research, the algorithm was based on the Markov Decision Process (MDP), which was limited in handling uncertainty in perceiving states. In this paper, we present a new reinforcement learning algorithm for user behavior modeling, based on the Partially Observable Markov Decision Process (POMDP). We describe how a learning agent creates and updates the user behavior model when it is uncertain about the current state, and how the agent applies the model to disambiguate user speech input. We also present an experimental system and initial experimental results.
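
The abstract does not state the update rule the agent uses, so the following is only an illustrative sketch of the standard POMDP belief update, which is how an agent that is uncertain about the current state typically maintains a probability distribution over states. The state names, the action name, the observation name, and all probabilities are hypothetical values chosen for the example and are not taken from the paper.

```python
def update_belief(belief, action, observation, T, O):
    """Standard POMDP belief update (illustrative, not the paper's algorithm).

    belief: dict mapping state -> probability
    T: dict mapping (state, action) -> {next_state: probability}
    O: dict mapping (next_state, action) -> {observation: probability}
    """
    states = list(belief)
    unnormalized = {}
    for s_next in states:
        # Probability of reaching s_next, averaged over the current belief.
        predicted = sum(
            belief[s] * T[(s, action)].get(s_next, 0.0) for s in states
        )
        # Weight by how likely the observation is in s_next.
        unnormalized[s_next] = (
            O[(s_next, action)].get(observation, 0.0) * predicted
        )
    total = sum(unnormalized.values())
    if total == 0.0:
        raise ValueError("observation has zero probability under the model")
    return {s: p / total for s, p in unnormalized.items()}


# Hypothetical two-state example: the user's goal does not change,
# but the recognizer's output is ambiguous between the two goals.
T = {
    ("wants_flight", "ask"): {"wants_flight": 1.0},
    ("wants_hotel", "ask"): {"wants_hotel": 1.0},
}
O = {
    ("wants_flight", "ask"): {"heard_flight": 0.7, "heard_hotel": 0.3},
    ("wants_hotel", "ask"): {"heard_flight": 0.4, "heard_hotel": 0.6},
}
prior = {"wants_flight": 0.5, "wants_hotel": 0.5}
posterior = update_belief(prior, "ask", "heard_flight", T, O)
```

In this sketch, an ambiguous recognition result shifts the belief toward one state without committing to it, which is the kind of uncertainty an MDP-based model cannot represent.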
