OVERCOMING VALUE OVERESTIMATION FOR DISTRIBUTIONAL REINFORCEMENT LEARNING-BASED PATH PLANNING WITH CONSERVATIVE CONSTRAINTS

Yuwan Gu, Yongtao Chu, Fang Meng, Yan Chen, Jidong Lv, and Shoukun Xu

References

[1] Md. A.K. Niloy, A. Shama, R.K. Chakrabortty, M.J. Ryan, F.R. Badal, Z. Tasneem, Md. H. Ahamed, S.I. Moyeen, S.K. Das, Md. F. Ali, Md. R. Islam, and D.K. Saha, Critical design and control issues of indoor autonomous mobile robots: A review, IEEE Access, 9, 2021, 35338–35370.
[2] L.A. Nguyen, T.D. Ngo, T.D. Pham, and X.T. Truong, An efficient navigation system for autonomous mobile robots in dynamic social environments, International Journal of Robotics and Automation, 37(1), 2022, 97–106.
[3] R.H. Abiyev, N. Akkaya, E. Aytac, and D. Ibrahim, Behaviour tree based control for efficient navigation of holonomic robots, International Journal of Robotics and Automation, 29(6), 2014.
[4] X. Yang, W. Yang, and H. Zhang, A new method for robot path planning based artificial potential field, Proc. 2016 IEEE 11th Conf. on Industrial Electronics and Applications (ICIEA), Hefei, 2016, 1294–1299.
[5] S. Sedighi, D.V. Nguyen, and K.D. Kuhnert, Guided hybrid A-star path planning algorithm for valet parking applications, Proc. 2019 5th International Conf. on Control, Automation and Robotics (ICCAR), Beijing, 2019, 570–575.
[6] K. Ming, Solving path planning problem based on ant colony algorithm, Proc. 2017 29th Chinese Control and Decision Conf. (CCDC), Chongqing, 2017, 5391–5395.
[7] V. Mnih, K. Kavukcuoglu, D. Silver, and A. Rusu, Human-level control through deep reinforcement learning, Nature, 518(7540), 2015, 529–533.
[8] J. Degrave, F. Felici, J. Buchli, M. Neunert, and B. Tracey, Magnetic control of tokamak plasmas through deep reinforcement learning, Nature, 602(7897), 2022, 414–419.
[9] M. Tang and V.W.S. Wong, Deep reinforcement learning for task offloading in mobile edge computing systems, IEEE Transactions on Mobile Computing, 21(6), 2022, 1985–1997.
[10] D. Silver, T. Hubert, J. Schrittwieser, and I. Antonoglou, A general reinforcement learning algorithm that masters chess, shogi, and Go through self-play, Science, 362(6419), 2018, 1140–1144.
[11] T. Morimura, M. Sugiyama, H. Kashima, H. Hachiya, and T. Tanaka, Nonparametric return distribution approximation for reinforcement learning, Proc. of the 27th International Conf. on Machine Learning, Haifa, 2010, 799–806.
[12] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcement learning with double Q-learning, Proceedings of the AAAI Conference on Artificial Intelligence, 30(1), 2016.
[13] Y. Liu, M. Cong, H. Dong, and D. Liu, Reinforcement learning and EGA-based trajectory planning for dual robots, International Journal of Robotics and Automation, 33, 2018.
[14] X. Tang, Y. Yang, T. Liu, X. Lin, K. Yang, and S. Li, Path planning and tracking control for parking via soft actor-critic under non-ideal scenarios, IEEE/CAA Journal of Automatica Sinica, 11(1), 2023, 181–195.
[15] D. Hong, S. Lee, Y.H. Cho, D. Baek, J. Kim, and N. Chang, Energy-efficient online path planning of multiple drones using reinforcement learning, IEEE Transactions on Vehicular Technology, 70(10), 2021, 9725–9740.
[16] S. Thrun and A. Schwartz, Issues in using function approximation for reinforcement learning, Proc. of the Fourth Connectionist Models Summer School, Hillsdale, NJ, 1993.
[17] S. Fujimoto, H. van Hoof, and D. Meger, Addressing function approximation error in actor-critic methods, Proc. of the 35th International Conf. on Machine Learning, Stockholm, 2018, 1587–1596.
[18] J. Lyu, X. Ma, J. Yan, and X. Li, Efficient continuous control with double actors and regularized critics, Proceedings of the AAAI Conference on Artificial Intelligence, 36(7), 2022.
[19] M.G. Bellemare, W. Dabney, and R. Munos, A distributional perspective on reinforcement learning, Proc. of the 34th International Conf. on Machine Learning, Sydney, NSW, 2017, 449–458.
[20] T. Nguyen-Tang, S. Gupta, and S. Venkatesh, Distributional reinforcement learning via moment matching, Proceedings of the AAAI Conference on Artificial Intelligence, 2021, 9144–9152.
[21] W. Dabney, M. Rowland, M. Bellemare, and R. Munos, Distributional reinforcement learning with quantile regression, Proceedings of the AAAI Conference on Artificial Intelligence, 32, 2018.
[22] A.S. Lowet, Q. Zheng, S. Matias, J. Drugowitsch, and N. Uchida, Distributional reinforcement learning in the brain, Trends in Neurosciences, 43(12), 2020, 980–997.
[23] C. Banerjee, Z. Chen, and N. Noman, Improved soft actor-critic: Mixing prioritized off-policy samples with on-policy experiences, IEEE Transactions on Neural Networks and Learning Systems, 35(3), 2022, 3121–3129.
[24] A. Kumar, A. Zhou, G. Tucker, and S. Levine, Conservative Q-learning for offline reinforcement learning, Proc. Advances in Neural Information Processing Systems, Virtual, 2020, 1179–1191.
[25] T. Haarnoja, A. Zhou, P. Abbeel, and S. Levine, Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor, Proc. of the 35th International Conf. on Machine Learning, Stockholm, 2018, 1861–1870.
[26] M. Arjovsky, S. Chintala, and L. Bottou, Wasserstein generative adversarial networks, Proc. International Conf. on Machine Learning, Sydney, 2017, 214–223.
[27] T. Schaul, D. Horgan, K. Gregor, and D. Silver, Universal value function approximators, Proc. of the 32nd International Conf. on Machine Learning, Lille, 2015, 1312–1320.
[28] N. Tishby and N. Zaslavsky, Deep learning and the information bottleneck principle, Proc. 2015 IEEE Information Theory Workshop (ITW), Jerusalem, 2015, 1–5.
[29] Y. Mo, L. Peng, J. Xu, X. Shi, and X. Zhu, Simple unsupervised graph representation learning, Proceedings of the AAAI Conference on Artificial Intelligence, 2022, 7797–7805.
[30] D.P. Kingma and M. Welling, Auto-encoding variational Bayes, 2014, arXiv:1312.6114.
[31] D.P. Kingma and J. Ba, Adam: A method for stochastic optimization, 2014, arXiv:1412.6980.
[32] J. Duan, Y. Guan, S.E. Li, Y. Ren, Q. Sun, and B. Cheng, Distributional soft actor-critic: Off-policy reinforcement learning for addressing value estimation errors, IEEE Transactions on Neural Networks and Learning Systems, 33(11), 2022, 6584–6598.
