OVERCOMING VALUE OVERESTIMATION FOR DISTRIBUTIONAL REINFORCEMENT LEARNING-BASED PATH PLANNING WITH CONSERVATIVE CONSTRAINTS

Yuwan Gu, Yongtao Chu, Fang Meng, Yan Chen, Jidong Lv, and Shoukun Xu

References

[1] Md. A.K. Niloy, A. Shama, R.K. Chakrabortty, M.J. Ryan, F.R. Badal, Z. Tasneem, Md. H. Ahamed, S.I. Moyeen, S.K. Das, Md. F. Ali, Md. R. Islam, and D.K. Saha, Critical design and control issues of indoor autonomous mobile robots: A review, IEEE Access, 9, 2021, 35338–35370.
[2] L.A. Nguyen, T.D. Ngo, T.D. Pham, and X.T. Truong, An efficient navigation system for autonomous mobile robots in dynamic social environments, International Journal of Robotics and Automation, 37(1), 2022, 97–106.
[3] R.H. Abiyev, N. Akkaya, E. Aytac, and D. Ibrahim, Behaviour tree based control for efficient navigation of holonomic robots, International Journal of Robotics and Automation, 29(6), 2014.
[4] X. Yang, W. Yang, and H. Zhang, A new method for robot path planning based artificial potential field, Proc. 2016 IEEE 11th Conf. on Industrial Electronics and Applications (ICIEA), Hefei, 2016, 1294–1299.
[5] S. Sedighi, D.V. Nguyen, and K.D. Kuhnert, Guided hybrid A-star path planning algorithm for valet parking applications, Proc. 2019 5th International Conf. on Control, Automation and Robotics (ICCAR), Beijing, 2019, 570–575.
[6] K. Ming, Solving path planning problem based on ant colony algorithm, Proc. 2017 29th Chinese Control and Decision Conf. (CCDC), Chongqing, 2017, 5391–5395.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, and A. Rusu, Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[8] J. Degrave, F. Felici, J. Buchli, M. Neunert, and B. Tracey, Magnetic control of tokamak plasmas through deep reinforcement learning, Nature, 602(7897), 2022, 414–419.
[9] M. Tang and V.W.S. Wong, Deep reinforcement learning for task offloading in mobile edge computing systems, IEEE Transactions on Mobile Computing, 21(6), 2022, 1985–1997.
[10] D. Silver, T. Hubert, J. Schrittwieser, and I. Antonoglou, A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362(6419), 2018, 1140–1144.
[11] T. Morimura, M. Sugiyama, H. Kashima, H. Hachiya, and T. Tanaka, Nonparametric return distribution approximation for reinforcement learning, Proc. of the 27th International Conf. on Machine Learning, Haifa, 2010, 799–806.
[12] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), 2016.
[13] Y. Liu, M. Cong, H. Dong, and D. Liu, Reinforcement learning and EGA-based trajectory planning for dual robots, International Journal of Robotics and Automation, 33, 2018.
[14] X. Tang, Y. Yang, T. Liu, X. Lin, K. Yang, and S. Li, Path planning and tracking control for parking via soft actor-critic under non-ideal scenarios, IEEE/CAA Journal of Automatica Sinica, 11(1), 2023, 181–195.
[15] D. Hong, S. Lee, Y.H. Cho, D. Baek, J. Kim, and N. Chang, Energy-efficient online path planning of multiple drones using reinforcement learning, IEEE Transactions on Vehicular Technology, 70(10), 2021, 9725–9740.
[16] S. Thrun and A. Schwartz, Issues in using function approximation for reinforcement learning, Proc. of the Fourth Connectionist Models Summer School, Hillsdale, NJ, 1993.
[17] S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, Proc. of the 35th International Conf. on Machine Learning, Stockholm, 2018, 1587–1596.
[18] J. Lyu, X. Ma, J. Yan, and X. Li, Efficient continuous control with double actors and regularized critics, Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 2022.
[19] M.G. Bellemare, W. Dabney, and R. Munos, A distributional perspective on reinforcement learning, Proc. of the 34th International Conf. on Machine Learning, Sydney, NSW, 2017, 449–458.
[20] T. Nguyen-Tang, S. Gupta, and S. Venkatesh, Distributional reinforcement learning via moment matching, Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 9144–9152.
[21] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, Distributional reinforcement learning with quantile regression, Proceedings of the AAAI Conference on Artificial Intelligence, 32, 2018.
[22] A.S. Lowet, Q. Zheng, S. Matias, J. Drugowitsch, and N. Uchida, Distributional reinforcement learning in the brain, Trends in Neurosciences, 43(12), 2020, 980–997.
[23] C. Banerjee, Z. Chen, and N. Noman, Improved soft actor-critic: Mixing prioritized off-policy samples with on-policy experiences, IEEE Transactions on Neural Networks and Learning Systems, 35(3), 2022, 3121–3129.
[24] A. Kumar, A. Zhou, G. Tucker, and S. Levine, Conservative Q-learning for offline reinforcement learning, Proc. Advances in Neural Information Processing Systems, Virtual, 2020, 1179–1191.
[25] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, Proc. of the 35th International Conf. on Machine Learning, Stockholm, 2018, 1861–1870.
[26] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, Proc. International Conf. on Machine Learning, Sydney, 2017, 214–223.
[27] T. Schaul, D. Horgan, K. Gregor, and D. Silver, Universal value function approximators, Proc. of the 32nd International Conf. on Machine Learning, Lille, 2015, 1312–1320.
[28] N. Tishby and N. Zaslavsky, Deep learning and the information bottleneck principle, Proc. 2015 IEEE Information Theory Workshop (ITW), Jerusalem, 2015, 1–5.
[29] Y. Mo, L. Peng, J. Xu, X. Shi, and X. Zhu, Simple unsupervised graph representation learning, Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 7797–7805.
[30] D.P. Kingma and M. Welling, Auto-encoding variational Bayes, 2014, arXiv:1312.6114.
[31] D.P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014, arXiv:1412.6980.
[32] J. Duan, Y. Guan, S.E. Li, Y. Ren, Q. Sun, and B. Cheng, Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors, IEEE Transactions on Neural Networks and Learning Systems, 33(11), 2022, 6584–6598.
