SOFT ACTOR-CRITIC REINFORCEMENT LEARNING FOR ROBOTIC MANIPULATOR WITH HINDSIGHT EXPERIENCE REPLAY

Li Yu, Tao Yan, Wen-An Zhang and Simon X. Yang

References

[1] D. Silver, A. Huang, C. J. Maddison, A. Guez, L. Sifre et al., Mastering the game of Go with deep neural networks and tree search, Nature, 529(7587), 2016, 484–489.
[2] A. Y. Ng, A. Coates, M. Diel, V. Ganapathi et al., Autonomous inverted helicopter flight via reinforcement learning, Experimental Robotics IX, 2006, 363–372.
[3] S. Levine, C. Finn, T. Darrell, P. Abbeel, End-to-end training of deep visuomotor policies, The Journal of Machine Learning Research, 17(1), 2016, 1334–1373.
[4] S. Gu, T. Lillicrap, I. Sutskever, S. Levine, Continuous deep Q-learning with model-based acceleration, Proc. International Conference on Machine Learning, 2016, 2829–2838.
[5] M. P. Deisenroth, C. E. Rasmussen, D. Fox, Learning to control a low-cost manipulator using data-efficient reinforcement learning, Proc. Robotics: Science and Systems, 2011, 57–64.
[6] J. Gläscher, N. Daw, P. Dayan, J. P. O’Doherty, States versus rewards: dissociable neural prediction error signals underlying model-based and model-free reinforcement learning, Neuron, 66(4), 2010, 585–595.
[7] V. Mnih, A. P. Badia, M. Mirza, A. Graves et al., Asynchronous methods for deep reinforcement learning, Proc. International Conference on Machine Learning, 2016, 1928–1937.
[8] J. Schulman, S. Levine, P. Abbeel, M. Jordan et al., Trust region policy optimization, Proc. International Conference on Machine Learning, 2015, 1889–1897.
[9] V. Mnih, K. Kavukcuoglu, D. Silver, A. A. Rusu et al., Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[10] S. Bhatnagar, D. Precup, D. Silver, R. S. Sutton et al., Convergent temporal-difference learning with arbitrary smooth function approximation, Proc. Advances in Neural Information Processing Systems, 2009, 1204–1212.
[11] T. P. Lillicrap, J. J. Hunt, A. Pritzel, N. Heess et al., Continuous control with deep reinforcement learning, arXiv preprint arXiv:1509.02971, 2015.
[12] Y. Duan, X. Chen, R. Houthooft, J. Schulman et al., Benchmarking deep reinforcement learning for continuous control, Proc. International Conference on Machine Learning, 2016, 1329–1338.
[13] B. D. Ziebart, A. L. Maas, J. A. Bagnell, A. K. Dey, Maximum entropy inverse reinforcement learning, Proc. AAAI, 8, 2008, 1433–1438.
[14] T. Haarnoja, H. Tang, P. Abbeel, S. Levine, Reinforcement learning with deep energy-based policies, arXiv preprint arXiv:1702.08165, 2017.
[15] B. D. Ziebart, Modeling purposeful adaptive behavior with the principle of maximum causal entropy, PhD thesis, Carnegie Mellon University, 2010.
[16] M. Andrychowicz, F. Wolski, A. Ray, J. Schneider et al., Hindsight experience replay, Proc. Advances in Neural Information Processing Systems, 2017, 5048–5058.
[17] I. Popov, N. Heess, T. Lillicrap, R. Hafner et al., Data-efficient deep reinforcement learning for dexterous manipulation, arXiv preprint arXiv:1704.03073, 2017.
[18] B. Sallans, G. E. Hinton, Reinforcement learning with factored states and actions, Journal of Machine Learning Research, 5(Aug), 2004, 1063–1088.
[19] B. O’Donoghue, R. Munos, K. Kavukcuoglu, V. Mnih, PGQ: Combining policy gradient and Q-learning, arXiv preprint arXiv:1611.01626, 2016.
[20] T. Haarnoja, A. Zhou, P. Abbeel, S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, arXiv preprint arXiv:1801.01290, 2018.
[21] T. Schaul, D. Horgan, K. Gregor, D. Silver, Universal value function approximators, Proc. International Conference on Machine Learning, 2015, 1312–1320.
[22] E. Todorov, T. Erez, Y. Tassa, MuJoCo: A physics engine for model-based control, Proc. IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2012, 5026–5033.
[23] G. E. Uhlenbeck, L. S. Ornstein, On the theory of the Brownian motion, Physical Review, 36(5), 1930, 823–841.
[24] L. Wang, C. Luo, A hybrid genetic tabu search algorithm for mobile robot to solve AS/RS path planning, International Journal of Robotics and Automation, 33(2), 2018.
[25] G. Ascioglu, Y. Senol, Prediction of lower extremity joint angles using neural networks for exoskeleton robotic leg, International Journal of Robotics and Automation, 33(2), 2018.
