RESEARCH PROGRESS ABOUT DEEP REINFORCEMENT LEARNING, 210-217. SI

doi:10.2316/J.2023.201-0371

RESEARCH PROGRESS ABOUT DEEP REINFORCEMENT LEARNING, 210-217. SI

Liu Liu and Lin-hui Chen

References

[1] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness,M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland,G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou,H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis,Human-level control through deep reinforcement learning,Nature, 518(7540), 2015, 529–533.
[2] L. Sergey, F. Chelsea, D. Trevor, and P. Abbeel, End-to-end training of deep visuomotor policies, Journal of MachineLearning Research, 17(39), 2016, 1–40.
[3] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre,G. Van Den Driessche, J. Schrittwieser, I. Antonoglou,V. Panneershelvam, M. Lanctot, and S. Dieleman, Masteringthe game of Go with deep neural networks and tree search,Nature, 529(7587), 2016, 484–489.
[4] C. David, Using machine learning in communication networks[Invited], IEEE/OSA Journal of Optical Communications andNetworking, 10(10), 2018, 100–109.
[5] Y. Kok-Lim Alvin, K. Peter, and T.D. Paul, Reinforcementlearning for context awareness and intelligence in wirelessnetworks: Review, new features and open issues, Journal ofNetwork and Computer Applications, 35(1), 2012, 253–267.
[6] A. Kai, D. Marc Peter, B. Miles, and A.A. Bharath, Deepreinforcement learning: A brief survey, IEEE Signal ProcessingMagazine, 34(6), 2017, 26–38.
[7] Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature,521, 2015, 436–444.
[8] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learningrepresentations by back-propagating errors, Nature, 323, 1986,533–536.
[9] H.E. Geoﬀrey, O. Simon, and T. Yee-Whye, A fast learningalgorithm for deep belief nets, Neural Computation, 18(7),2006, 1527–1554.
[10] G.E. Hinton and R.R. Salakhutdinov, Reducing the dimension-ality of data with neural networks, Science, 313(5786), 2006,504–507.
[11] F. Zubair Md., T. Fengxiao, M. Bomin, N. Kato, O. Akashi,T. Inoue, and K. Mizutani, State-of-the-art deep learning:evolving machine intelligence toward tomorrow’s intelligentnetwork traﬃc control systems, IEEE Communications Surveys& Tutorials, 19(4), 2017, 2432–2455.
[12] R.S. Sutton and A.G. Barto, Reinforcement learning: Anintroduction, (Cambridge: Cambridge Univ. Press, 1998), 1–13.215
[13] M. Hamidreza, R. Isura, L.L. Frank, and D.O. Popa, Optimizedassistive human-robot interaction using reinforcement learning,IEEE Transactions on Cybernetics, 46(3), 2016, 655–667.
[14] H.-T.L. Chiang, J. Hsu, M. Fiser, L. Tapia, and A. Faust,RL-RRT: Kinodynamic motion planning via learning reachabil-ity estimators from RL policies, IEEE Robotics and AutomationLetters, 4(4), 2019, 4298–4305.
[15] N. Sadeghianpourhamami, J. Deleu, and C. Develder, Deﬁnitionand evaluation of model-free coordination of electrical vehiclecharging with reinforcement learning, IEEE Transactions onSmart Grid, 11(1), 2020, 203–214.
[16] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra, Planningand acting in partially observable stochastic domains, ArtiﬁcialIntelligence, 101(1/2), 1998, 99–134.
[17] B. Jang, M. Kim, G. Harerimana, and J.W. Kim, Q-learningalgorithms: A comprehensive classiﬁcation and applications,IEEE Access, 7, 2019, 133653–133667.
[18] Y. Wang, T.-H.S. Li, and C. Lin, Backward Q-learning: Thecombination of Sarsa algorithm and Q-learning, EngineeringApplications of Artiﬁcial Intelligence, 26(9), 2013, 2184–2193.
[19] S. Parisi, V. Tangkaratt, J. Peters, and M.E. Khan,TD-regularized actor-critic methods, Machine Learning,108(8–9), 2019, 1467–1501.
[20] H. Liu, Y. Wu, and F. Sun, Extreme trust region policyoptimization for active object recognition, IEEE Transactionson Neural Networks and Learning Systems, 29(6), 2018,2253–2258.
[21] N. Vanvuchelen, J. Gijsbrechts, and R. Boute, Use of proximalpolicy optimization for the joint replenishment problem,Computers in Industry, 119, 2020, 103239.
[22] V. Mnih, K. Kavukcuoglu, D. Silver, G. Alex, A. Ioannis,W. Daan, and R. Martin, Playing Atari with deep reinforcementlearning, Proc. of the Workshops at the 26th Neural InformationProcessing Systems, New York: ACM, 2013, 201–220.
[23] G. Konidaris, S. Osentoski, and P. Thomas, Value functionapproximation in reinforcement learning using the Fourierbasis, Proc. of 2011 AAAI Conf. on Artiﬁcial Intelligence,Palo Alto, USA: AAAI Press, 2011, 1–17.
[24] M.E. Connell, E. Connell, and P.E. Utgoﬀ, Learning to controla dynamic physical system, Computational Intelligence, 3(1),1987, 330–337.
[25] C.G. Atkeson, A.W. Moore, and S. Schaal, Locally weightedlearning for control, (Berlin, Germany: Springer, 1997).
[26] M. Yogeswaran and S.G. Ponnambalam, Reinforcementlearning: Exploration-exploitation dilemma in multi-agentforaging task, Opsearch, 49(3), 2012, 223–236.
[27] L.J. Lin, Self-improving reactive agents based on reinforcementlearning, planning and teaching, Machine Langnage, 8(3/4),1992, 293–321.
[28] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcementlearning with double Q-learning. Proc. Thirtieth AAAI Conf.on Artiﬁcial Intelligence, New York: ACM, 2016, 2094–2100.
[29] J. Liu, F. Gao, and X. Luo, Survey of deep reinforcementlearning based on value function and policy gradient, ChineseJournal of Computers, 42(6), 2019, 1406–1438.
[30] T.-W. Ban, an autonomous transmission scheme using duelingDQN for D2D communication networks, IEEE Transactionson Vehicular Technology, 69(12), 2020, 16348–16352.
[31] L. Huang, H. Fu, A. Rao, A.A. Irissappane, J. Zhang, andC. Xu, A distributional perspective on multiagent cooperationwith deep reinforcement learning, IEEE Transactions on NeuralNetworks and Learning Systems, 2022, 36121959.
[32] P. Jan and S. Stefan, Natural actor-critic, Neurocomputing,71(7–9), 2008, 1180–1190.
[33] Q. Wei, L. Wang, Y. Liu, and M.M. Polycarpou, Optimalelevator group control via deep asynchronous actor-criticlearning, IEEE Transactions on Neural Networks and LearningSystems, 31(12), 2021, 5245–5256.
[34] N. Tasﬁ and M.A.M. Capretz, Noisy importance samplingactorcritic: An oﬀ-policy actor-critic with experience replay,Proc. 2020 International Joint Conf. on Neural Networks,Piscataway: IEEE, 2020, 1–8.
[35] H. Johannes, L. Marc, and S. David, Fictitious self-play inextensive-form games, Proc. of International Conf. on MachineLearning Research, 37(37), 2015, 805–813.
[36] K. Li, B. Jiu, W. Pu, H. Liu, and X. Peng, Neural ﬁctitiousself-play for radar antijamming dynamic game with imperfectinformation, IEEE Transactions on Aerospace and ElectronicSystems, 58(6), 2022, 5533–5547.
[37] H. Cuayahuitl, S. Keizer, and O. Lemon, Strategic dia-logue management via deep reinforcement learning, 2015,arXiv:1511.08099.
[38] Z. Liu, Q. Liu, L. Tang, K. Jin, H. Wang, M. Liu, and H. Wang,Visuomotor reinforcement learning for multirobot cooperativenavigation, IEEE Transactions on Automation Science andEngineering, 19(4), 2021, 3234–3245.
[39] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen,Learning hand-eye coordination for robotic grasping with deeplearning and large-scale data collection, International Journalof Robotics Research, 37(4–5), 2018, 421–436.
[40] I. Lenz, R.A. Knepper, and A. Saxena, DeepMPC: Learningdeep latent features for model predictive control, Proc. of theRobotics Science and Systems, Rome, Italy, 2015, 201–209.
[41] D.B. Tim, K. Jens, and T. Karl, Deep reinforcement learning forrobotic manipulation, Proc. IEEE/RSJ International Conf. onIntelligent Robots and Systems (IROS 2016), 2016, 3947–3952.
[42] S. B. Remman, I. Str¨umke, and A. M. Lekkas, Causalversus marginal shapley values for robotic lever manipulationcontrolled using deep reinforcement learning, Proc. AmericanControl Conf. (ACC), 2022, 2683–2690.
[43] A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, and S. Levine,Collective robot reinforcement learning with distributedasynchronous guided policy search, Proc. IEEE InternationalConf. on Intelligent Robots and Systems, 2017, 79–86.
[44] W. Liu, J. Zhong, R. Wu, B. L. Fylstra, J. Si, andH. H. Huang, Inferring human-robot performance objectivesduring locomotion using inverse reinforcement learning andinverse optimal control, IEEE Robotics and Automation Letters,7(2), 2022, 2549–2556.
[45] N. Bredeche and N. Fontbonne, Social learning in swarmrobotics, Philosophical Transactions of the Royal SocietyB-Biological Sciences, 377(1843), 2020, 20200309.
[46] Q. Fang, X. Xu, X. Wang, and Y Zeng, Target-drivenvisual navigation in indoor scenes using reinforcement learningand imitation learning, CAAI Transactions on IntelligenceTechnology, 7(2), 2022, 167–176.
[47] Y. Lyu, Y. Shi, and X. Zhang, improving target-driven visualnavigation with attention on 3D spatial relationships, NeuralProcessing Letters, 54(5), 2022, 3979–3998.
[48] T. Dong, F. Xue, C. Xiao, and J. Li, Task scheduling basedon deep reinforcement learning in a cloud manufacturingenvironment, Concurrency and Computation-Practice &Experience, 32(11), 2020, e5654.
[49] Y. Shang, J. Li, M. Qin, and Q. Yang, Deep reinforce-ment learning-based task scheduling in heterogeneous MECnetworks, Proc. IEEE 95th Vehicular Technology Conf.:(VTC-Spring), Helsinki, Finland, 2022, 1–6.

Important Links:

Abstract
DOI: 10.2316/J.2023.201-0371
From Journal (201) Mechatronic Systems and Control - 2023

Go Back