Yuchi Zhang and Peter Xiaoping Liu
[1] W. Wang, L. Li, F. Ye, Y. Peng, and Y. Ma, A large-scale path planning algorithm for underwater robots based on deep reinforcement learning, International Journal of Robotics and Automation, 39, 2024, 204–210.
[2] J. Hwangbo, J. Lee, A. Dosovitskiy, D. Bellicoso, V. Tsounis, V. Koltun, and M. Hutter, Learning agile and dynamic motor skills for legged robots, Science Robotics, 4(26), 2019, eaau5872.
[3] L. Yi, M. Cong, H. Dong, and D. Liu, Reinforcement learning and EGA-based trajectory planning for dual robots, International Journal of Robotics and Automation, 33(4), 2018, 206–5084.
[4] V. Mnih, K. Kavukcuoglu, D. Silver, A. Graves, I. Antonoglou, D. Wierstra, and M. Riedmiller, Playing Atari with deep reinforcement learning, 2013, arXiv:1312.5602.
[5] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness, M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland, G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou, H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis, Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[6] Y. Wang, Mastering the game of Gomoku without human knowledge, Ph.D. dissertation, California Polytechnic State University, San Luis Obispo, CA, 2018.
[7] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre, G. van den Driessche, J. Schrittwieser, I. Antonoglou, V. Panneershelvam, M. Lanctot, S. Dieleman, D. Grewe, J. Nham, N. Kalchbrenner, I. Sutskever, T. Lillicrap, M. Leach, K. Kavukcuoglu, T. Graepel, and D. Hassabis, Mastering the game of Go with deep neural networks and tree search, Nature, 529(7587), 2016, 484–489.
[8] J. Perolat, B. De Vylder, D. Hennes, E. Tarassov, F. Strub, V. de Boer, P. Muller, J.T. Connor, N. Burch, T. Anthony, et al., Mastering the game of Stratego with model-free multiagent reinforcement learning, Science, 378(6623), 2022, 990–996.
[9] D. Zou, L. Lu, W. Zhang, and J. Guo, A navigation method for UUVs under ocean current disturbance based on deep reinforcement learning, in Proceedings of the 7th International Conference on Advanced Algorithms and Control Engineering (ICAACE), Shanghai, 2024, 1165–1168.
[10] B. Li, H. Zhang, and X. Shi, A novel path planning for AUV based on dung beetle optimization algorithm with deep Q-network, International Journal of Robotics and Automation, 40, 2024, 65–73.
[11] C. Watkins, Learning from delayed rewards, Ph.D. dissertation, King's College, London, 1989.
[12] G.A. Rummery and M. Niranjan, On-line Q-learning using connectionist systems, Technical Report 166, Department of Engineering, University of Cambridge, Cambridge, 1994.
[13] R.S. Sutton, Learning to predict by the methods of temporal differences, Machine Learning, 3, 1988, 9–44.
[14] R.S. Sutton, D. McAllester, S. Singh, and Y. Mansour, Policy gradient methods for reinforcement learning with function approximation, in Proceedings of Neural Information Processing Systems, Denver, CO, 1999, 1057–1063.
[15] D. Silver, G. Lever, N. Heess, T. Degris, D. Wierstra, and M. Riedmiller, Deterministic policy gradient algorithms, in Proceedings of the International Conference on Machine Learning, Beijing, 2014, 387–395.
[16] T.P. Lillicrap, J.J. Hunt, A. Pritzel, N.M.O. Heess, T. Erez, Y. Tassa, D. Silver, and D. Wierstra, Continuous control with deep reinforcement learning, 2015, arXiv:1509.02971.
[17] R. Fox, A. Pakman, and N. Tishby, Taming the noise in reinforcement learning via soft updates, 2015, arXiv:1512.08562.
[18] O. Nachum, M. Norouzi, G. Tucker, and D. Schuurmans, Smoothed action value functions for learning Gaussian policies, in Proceedings of the International Conference on Machine Learning, Stockholm, 2018, 3692–3700.
[19] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, in Proceedings of the AAAI Conference on Artificial Intelligence, Phoenix, AZ, 2016, 2094–2100.
[20] O. Anschel, N. Baram, and N. Shimkin, Averaged-DQN: Variance reduction and stabilization for deep reinforcement learning, in Proceedings of the International Conference on Machine Learning, Sydney, NSW, 2017, 176–185.
[21] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, in Proceedings of the International Conference on Machine Learning, Stockholm, 2018, 1861–1870.
[22] S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, in Proceedings of the International Conference on Machine Learning, Stockholm, 2018, 1587–1596.
[23] E. Todorov, T. Erez, and Y. Tassa, MuJoCo: A physics engine for model-based control, in Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems, Vilamoura, 2012, 5026–5033.
[24] J. Schulman, F. Wolski, P. Dhariwal, A. Radford, and O. Klimov, Proximal policy optimization algorithms, 2017, arXiv:1707.06347.
[25] S. Thrun and A. Schwartz, Issues in using function approximation for reinforcement learning, in Proceedings of the Connectionist Models Summer School, Hillsdale, NJ, 1993, 255–263.
[26] R. Bellman, Dynamic programming, Science, 153(3731), 1966, 34–37.
[27] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge, MA: MIT Press, 2018).