Liu Liu and Lin-hui Chen
[1] V. Mnih, K. Kavukcuoglu, D. Silver, A.A. Rusu, J. Veness,M.G. Bellemare, A. Graves, M. Riedmiller, A.K. Fidjeland,G. Ostrovski, S. Petersen, C. Beattie, A. Sadik, I. Antonoglou,H. King, D. Kumaran, D. Wierstra, S. Legg, and D. Hassabis,Human-level control through deep reinforcement learning,Nature, 518(7540), 2015, 529–533. [2] L. Sergey, F. Chelsea, D. Trevor, and P. Abbeel, End-to-end training of deep visuomotor policies, Journal of MachineLearning Research, 17(39), 2016, 1–40. [3] D. Silver, A. Huang, C.J. Maddison, A. Guez, L. Sifre,G. Van Den Driessche, J. Schrittwieser, I. Antonoglou,V. Panneershelvam, M. Lanctot, and S. Dieleman, Masteringthe game of Go with deep neural networks and tree search,Nature, 529(7587), 2016, 484–489. [4] C. David, Using machine learning in communication networks[Invited], IEEE/OSA Journal of Optical Communications andNetworking, 10(10), 2018, 100–109. [5] Y. Kok-Lim Alvin, K. Peter, and T.D. Paul, Reinforcementlearning for context awareness and intelligence in wirelessnetworks: Review, new features and open issues, Journal ofNetwork and Computer Applications, 35(1), 2012, 253–267. [6] A. Kai, D. Marc Peter, B. Miles, and A.A. Bharath, Deepreinforcement learning: A brief survey, IEEE Signal ProcessingMagazine, 34(6), 2017, 26–38. [7] Y. Lecun, Y. Bengio, and G. Hinton, Deep learning, Nature,521, 2015, 436–444. [8] D.E. Rumelhart, G.E. Hinton, and R.J. Williams, Learningrepresentations by back-propagating errors, Nature, 323, 1986,533–536. [9] H.E. Geoffrey, O. Simon, and T. Yee-Whye, A fast learningalgorithm for deep belief nets, Neural Computation, 18(7),2006, 1527–1554. [10] G.E. Hinton and R.R. Salakhutdinov, Reducing the dimension-ality of data with neural networks, Science, 313(5786), 2006,504–507. [11] F. Zubair Md., T. Fengxiao, M. Bomin, N. Kato, O. Akashi,T. Inoue, and K. Mizutani, State-of-the-art deep learning:evolving machine intelligence toward tomorrow’s intelligentnetwork traffic control systems, IEEE Communications Surveys& Tutorials, 19(4), 2017, 2432–2455. [12] R.S. Sutton and A.G. Barto, Reinforcement learning: Anintroduction, (Cambridge: Cambridge Univ. Press, 1998), 1–13.215 [13] M. Hamidreza, R. Isura, L.L. Frank, and D.O. Popa, Optimizedassistive human-robot interaction using reinforcement learning,IEEE Transactions on Cybernetics, 46(3), 2016, 655–667. [14] H.-T.L. Chiang, J. Hsu, M. Fiser, L. Tapia, and A. Faust,RL-RRT: Kinodynamic motion planning via learning reachabil-ity estimators from RL policies, IEEE Robotics and AutomationLetters, 4(4), 2019, 4298–4305. [15] N. Sadeghianpourhamami, J. Deleu, and C. Develder, Definitionand evaluation of model-free coordination of electrical vehiclecharging with reinforcement learning, IEEE Transactions onSmart Grid, 11(1), 2020, 203–214. [16] L.P. Kaelbling, M.L. Littman, and A.R. Cassandra, Planningand acting in partially observable stochastic domains, ArtificialIntelligence, 101(1/2), 1998, 99–134. [17] B. Jang, M. Kim, G. Harerimana, and J.W. Kim, Q-learningalgorithms: A comprehensive classification and applications,IEEE Access, 7, 2019, 133653–133667. [18] Y. Wang, T.-H.S. Li, and C. Lin, Backward Q-learning: Thecombination of Sarsa algorithm and Q-learning, EngineeringApplications of Artificial Intelligence, 26(9), 2013, 2184–2193. [19] S. Parisi, V. Tangkaratt, J. Peters, and M.E. Khan,TD-regularized actor-critic methods, Machine Learning,108(8–9), 2019, 1467–1501. [20] H. Liu, Y. Wu, and F. Sun, Extreme trust region policyoptimization for active object recognition, IEEE Transactionson Neural Networks and Learning Systems, 29(6), 2018,2253–2258. [21] N. Vanvuchelen, J. Gijsbrechts, and R. Boute, Use of proximalpolicy optimization for the joint replenishment problem,Computers in Industry, 119, 2020, 103239. [22] V. Mnih, K. Kavukcuoglu, D. Silver, G. Alex, A. Ioannis,W. Daan, and R. Martin, Playing Atari with deep reinforcementlearning, Proc. of the Workshops at the 26th Neural InformationProcessing Systems, New York: ACM, 2013, 201–220. [23] G. Konidaris, S. Osentoski, and P. Thomas, Value functionapproximation in reinforcement learning using the Fourierbasis, Proc. of 2011 AAAI Conf. on Artificial Intelligence,Palo Alto, USA: AAAI Press, 2011, 1–17. [24] M.E. Connell, E. Connell, and P.E. Utgoff, Learning to controla dynamic physical system, Computational Intelligence, 3(1),1987, 330–337. [25] C.G. Atkeson, A.W. Moore, and S. Schaal, Locally weightedlearning for control, (Berlin, Germany: Springer, 1997). [26] M. Yogeswaran and S.G. Ponnambalam, Reinforcementlearning: Exploration-exploitation dilemma in multi-agentforaging task, Opsearch, 49(3), 2012, 223–236. [27] L.J. Lin, Self-improving reactive agents based on reinforcementlearning, planning and teaching, Machine Langnage, 8(3/4),1992, 293–321. [28] H. van Hasselt, A. Guez, and D. Silver, Deep reinforcementlearning with double Q-learning. Proc. Thirtieth AAAI Conf.on Artificial Intelligence, New York: ACM, 2016, 2094–2100. [29] J. Liu, F. Gao, and X. Luo, Survey of deep reinforcementlearning based on value function and policy gradient, ChineseJournal of Computers, 42(6), 2019, 1406–1438. [30] T.-W. Ban, an autonomous transmission scheme using duelingDQN for D2D communication networks, IEEE Transactionson Vehicular Technology, 69(12), 2020, 16348–16352. [31] L. Huang, H. Fu, A. Rao, A.A. Irissappane, J. Zhang, andC. Xu, A distributional perspective on multiagent cooperationwith deep reinforcement learning, IEEE Transactions on NeuralNetworks and Learning Systems, 2022, 36121959. [32] P. Jan and S. Stefan, Natural actor-critic, Neurocomputing,71(7–9), 2008, 1180–1190. [33] Q. Wei, L. Wang, Y. Liu, and M.M. Polycarpou, Optimalelevator group control via deep asynchronous actor-criticlearning, IEEE Transactions on Neural Networks and LearningSystems, 31(12), 2021, 5245–5256. [34] N. Tasfi and M.A.M. Capretz, Noisy importance samplingactorcritic: An off-policy actor-critic with experience replay,Proc. 2020 International Joint Conf. on Neural Networks,Piscataway: IEEE, 2020, 1–8. [35] H. Johannes, L. Marc, and S. David, Fictitious self-play inextensive-form games, Proc. of International Conf. on MachineLearning Research, 37(37), 2015, 805–813. [36] K. Li, B. Jiu, W. Pu, H. Liu, and X. Peng, Neural fictitiousself-play for radar antijamming dynamic game with imperfectinformation, IEEE Transactions on Aerospace and ElectronicSystems, 58(6), 2022, 5533–5547. [37] H. Cuayahuitl, S. Keizer, and O. Lemon, Strategic dia-logue management via deep reinforcement learning, 2015,arXiv:1511.08099. [38] Z. Liu, Q. Liu, L. Tang, K. Jin, H. Wang, M. Liu, and H. Wang,Visuomotor reinforcement learning for multirobot cooperativenavigation, IEEE Transactions on Automation Science andEngineering, 19(4), 2021, 3234–3245. [39] S. Levine, P. Pastor, A. Krizhevsky, J. Ibarz, and D. Quillen,Learning hand-eye coordination for robotic grasping with deeplearning and large-scale data collection, International Journalof Robotics Research, 37(4–5), 2018, 421–436. [40] I. Lenz, R.A. Knepper, and A. Saxena, DeepMPC: Learningdeep latent features for model predictive control, Proc. of theRobotics Science and Systems, Rome, Italy, 2015, 201–209. [41] D.B. Tim, K. Jens, and T. Karl, Deep reinforcement learning forrobotic manipulation, Proc. IEEE/RSJ International Conf. onIntelligent Robots and Systems (IROS 2016), 2016, 3947–3952. [42] S. B. Remman, I. Str¨umke, and A. M. Lekkas, Causalversus marginal shapley values for robotic lever manipulationcontrolled using deep reinforcement learning, Proc. AmericanControl Conf. (ACC), 2022, 2683–2690. [43] A. Yahya, A. Li, M. Kalakrishnan, Y. Chebotar, and S. Levine,Collective robot reinforcement learning with distributedasynchronous guided policy search, Proc. IEEE InternationalConf. on Intelligent Robots and Systems, 2017, 79–86. [44] W. Liu, J. Zhong, R. Wu, B. L. Fylstra, J. Si, andH. H. Huang, Inferring human-robot performance objectivesduring locomotion using inverse reinforcement learning andinverse optimal control, IEEE Robotics and Automation Letters,7(2), 2022, 2549–2556. [45] N. Bredeche and N. Fontbonne, Social learning in swarmrobotics, Philosophical Transactions of the Royal SocietyB-Biological Sciences, 377(1843), 2020, 20200309. [46] Q. Fang, X. Xu, X. Wang, and Y Zeng, Target-drivenvisual navigation in indoor scenes using reinforcement learningand imitation learning, CAAI Transactions on IntelligenceTechnology, 7(2), 2022, 167–176. [47] Y. Lyu, Y. Shi, and X. Zhang, improving target-driven visualnavigation with attention on 3D spatial relationships, NeuralProcessing Letters, 54(5), 2022, 3979–3998. [48] T. Dong, F. Xue, C. Xiao, and J. Li, Task scheduling basedon deep reinforcement learning in a cloud manufacturingenvironment, Concurrency and Computation-Practice &Experience, 32(11), 2020, e5654. [49] Y. Shang, J. Li, M. Qin, and Q. Yang, Deep reinforce-ment learning-based task scheduling in heterogeneous MECnetworks, Proc. IEEE 95th Vehicular Technology Conf.:(VTC-Spring), Helsinki, Finland, 2022, 1–6.
Important Links:
Go Back