CPG-BASED RL ALGORITHM LEARNS TO CONTROL A HUMANOID ROBOT LEG

Önder Tutsoy

References

[1] A.G. Barto, R.S. Sutton, and C.W. Anderson, Neuronlike adaptive elements that can solve difficult learning control problems, Artificial Neural Networks, 13, 1983, 81–93.
[2] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge: The MIT Press, 1998), 1–300.
[3] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, 9–44.
[4] K. Doya, Reinforcement learning in continuous time and space, Neural Computation, 12, 2000, 219–245.
[5] M.R. Lagoudakis, R. Parr, and M. Littman, Least-squares methods in reinforcement learning for control, Methods and Applications of Artificial Intelligence, 2308, 2002, 249–260.
[6] L. Busoniu, B. De Schutter, and R. Babuska, Decentralized reinforcement learning control of a robotic manipulator, 9th International Conf. on Control, Automation, Robotics and Vision, 2006, 1–6.
[7] H. Benbrahim and J.A. Franklin, Biped dynamic walking using reinforcement learning, Robotics and Autonomous Systems, 22, 1996, 283–302.
[8] G. Endo, J. Morimoto, T. Matsubara, J. Nakanishi, and G. Cheng, Learning CPG-based biped locomotion with a policy gradient method: Application to a humanoid robot, International Journal of Robotics Research, 27, 2008, 213–228.
[9] G.L. Liu, M.K. Habib, K. Watanabe, and K. Izumi, Central pattern generators based on Matsuoka oscillators for the locomotion of biped robots, Artificial Life and Robotics, 12, 2008, 263–269.
[10] N. Zeitlin, Reinforcement learning methods to enable automatic tuning of legged robots, Technical report, University of California, Berkeley, 2012.
[11] A. Kralj, R.J. Jaeger, and M. Munih, Analysis of standing up and sitting down in humans: Definitions and normative data presentation, Journal of Biomechanics, 23, 1990, 1123–1138.
[12] C. Rougier, J. Meunier, A. St-Arnaud, and J. Rousseau, Fall detection from human shape and motion history using video surveillance, 21st International Conf. on Advanced Information Networking and Applications Workshops (AINAW '07), 2007, 875–880.
[13] J. Morimoto and K. Doya, Reinforcement learning of dynamic motor sequence: Learning to stand up, Proc. IEEE/RSJ International Conf. on Intelligent Robots and Systems, 1998, 1721–1726.
[14] J.N. Tsitsiklis, On the convergence of optimistic policy iteration, The Journal of Machine Learning Research, 3, 2002, 59–72.
[15] M. Geist and O. Pietquin, Parametric value function approximation: A unified view, 2011 IEEE Symposium on Adaptive Dynamic Programming and Reinforcement Learning (ADPRL), 9–16.
[16] F.L. Lewis and D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9, 2009, 32–50.
[17] J.N. Tsitsiklis and B. Van Roy, An analysis of temporal-difference learning with function approximation, IEEE Transactions on Automatic Control, 42, 1997, 674–690.
[18] L. Baird, Residual algorithms: Reinforcement learning with function approximation, Proc. of the Twelfth International Conf. on Machine Learning, 1995, 30–37.
[19] H. Kimura and S. Kobayashi, An analysis of actor/critic algorithms using eligibility traces: Reinforcement learning with imperfect value function, Proc. of the Fifteenth International Conf. on Machine Learning, 1998, 278–286.
[20] K. Matsuoka, N. Ohyama, A. Watanabe, and M. Ooshima, Control of a giant swing robot using a neural oscillator, Advances in Natural Computation, Lecture Notes in Computer Science, 3611, Springer, 2005, 274–282.
[21] M. Tokic, Adaptive ε-greedy exploration in reinforcement learning based on value differences, Proc. of the 33rd Annual German Conf. on Advances in Artificial Intelligence, 2010, 203–210.
[22] O. Tutsoy, M. Brown, and H. Wang, Reinforcement learning algorithm application and multi-body system design by using MapleSim and Modelica, International Conf. on Advanced Mechatronic Systems (ICAMechS), 2012, 650–655.
[23] Y. Nakamura, T. Mori, M.A. Sato, and S. Ishii, Reinforcement learning for a biped robot based on a CPG-actor-critic method, Neural Networks, 20, 2007, 723–735.
[24] MapleSim used in the creation of breakthrough vehicle driving simulator technology, Maplesoft [Online], 2012. Available: http://www.maplesoft.com/company/publications/articles/.
[25] P. Goossens and T. Richard, Using symbolic technology to derive inverse kinematic solutions for actuator control development, Maplesoft, Waterloo, Canada, Nov. 2011 [Online]. Available: http://www.maplesoft.com/Whitepapers/Mathmod2012_pgoossens_trichard_preprint.
