MODEL-FREE MULTI-KERNEL LEARNING CONTROL FOR NONLINEAR DISCRETE-TIME SYSTEMS

Jiahang Liu, Xin Xu, Zhenhua Huang, and Chuanqiang Lian

References

  1. [1] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge, MA: MIT Press, 1998).
  2. [2] W.B. Powell, Approximate dynamic programming: Solving the curses of dimensionality, vol. 703 (New York: John Wiley & Sons, 2007).
  3. [3] F.L. Lewis and D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9 (3), 2009, 32–50.
  4. [4] F.Y. Wang, H. Zhang, and D. Liu, Adaptive dynamic programming: An introduction, IEEE Computational Intelligence Magazine, 4 (2), 2009, 39–47.
  5. [5] M.L. Littman, Reinforcement learning improves behaviour from evaluative feedback, Nature, 521 (7553), 2015, 445–451.
  6. [6] J. Ni, X. Li, M. Hua, and S.X. Yang, Bioinspired neural network-based Q-learning approach for robot path planning in unknown environments, International Journal of Robotics & Automation, 31 (6), 2016. DOI: 10.2316/Journal.206.2016.6.206-4526
  7. [7] M.L. Puterman, Markov decision processes: Discrete stochastic dynamic programming (New York: Wiley, 2014).
  8. [8] C.J. Watkins and P. Dayan, Q-learning, Machine Learning, 8 (3–4), 1992, 279–292.
  9. [9] J. Baxter and P.L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, 15, 2001, 319–350.
  10. [10] R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation, in Proc. 26th Annu. Int. Conf. on Machine Learning, Montreal, Canada (New York: ACM, 2009), 993–1000.
  11. [11] M.G. Lagoudakis, Value function approximation, in C. Sammut and G.I. Webb (eds.) Encyclopedia of Machine Learning (Berlin: Springer, 2011), 1011–1021.
  12. [12] M. Geist and O. Pietquin, Algorithmic survey of parametric value function approximation, IEEE Transactions on Neural Networks and Learning Systems, 24 (6), 2013, 845–867.
  13. [13] Z. Huang, C. Lian, X. Xu, and J. Wang, Lateral control for autonomous land vehicles via dual heuristic programming, International Journal of Robotics & Automation, 31 (6), 2016. DOI: 10.2316/Journal.206.2016.6.206-4878
  14. [14] V.R. Konda and J.N. Tsitsiklis, Actor–critic algorithms, Proc. Neural Information Processing Systems, vol. 13, Denver, America, 1999, 1008–1014.
  15. [15] C.J.C.H. Watkins, Learning from delayed rewards, Ph.D. dissertation, University of Cambridge, England, 1989.
  16. [16] A.G. Barto, D.A. White and D.A. Sofge, Reinforcement learning and adaptive critic methods, in D.A. White and D.A. Sofge (eds.), Handbook of intelligent control, vol. 469, 1992, 491.
  17. [17] S. Bhasin, N. Sharma, P. Patre, and W. Dixon, Asymptotic tracking by a reinforcement learning-based adaptive critic controller, Journal of Control Theory and Applications, 9 (3), 2011, 400–409.
  18. [18] K.G. Vamvoudakis and F.L. Lewis, Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46 (5), 2010, 878–888.
  19. [19] H. Zhang, L. Cui, X. Zhang, and Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks, 22 (12), 2011, 2226–2236.
  20. [20] T. Dierks and S. Jagannathan, Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update, IEEE Transactions on Neural Networks and Learning Systems, 23 (7), 2012, 1118–1129.
  21. [21] D. Ormoneit and Ś. Sen, Kernel-based reinforcement learning, Machine Learning, 49 (2–3), 2002, 161–178.
  22. [22] X. Xu, D.W. Hu, and X.C. Lu, Kernel-based least-squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks, 18 (4), 2007, 973–992.
  23. [23] J.C. Santamaría, R.S. Sutton, and A. Ram, Experiments with reinforcement learning in problems with continuous state and action spaces, Adaptive Behavior, 6 (2), 1997, 163–217.
  24. [24] X. Xu, Z.S. Hou, C.Q. Lian, and H.B. He, Online learning control using adaptive critic designs with sparse kernel machines, IEEE Transactions on Neural Networks and Learning Systems, 24 (5), 2013, 762–775.
  25. [25] X. Xu, C.Q. Lian, L. Zuo, and H.B. He, Kernel-based approximate dynamic programming for real-time online learning control: An experimental study, IEEE Transactions on Control Systems Technology, 22 (1), 2014, 146–156.
  26. [26] F.R. Bach, G.R. Lanckriet, and M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proc. 21st Int. Conf. Machine Learning, Banff, Alberta, Canada (New York: ACM, 2004), 6.
  27. [27] M. Gönen and E. Alpaydın, Multiple kernel learning algorithms, Journal of Machine Learning Research, 12 (July), 2011, 2211–2268.
  28. [28] G.R. Lanckriet, N. Cristianini, P. Bartlett, L.E. Ghaoui, and M.I. Jordan, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research, 5 (January), 2004, 27–72.
  29. [29] B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel methods in computational biology (Cambridge, MA: MIT Press, 2004).
  30. [30] S.S. Bucak, R. Jin, and A.K. Jain, Multiple kernel learning for visual object recognition: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (7), 2014, 1354–1369.
  31. [31] H. Xia, S.C. Hoi, R. Jin, and P. Zhao, Online multiple kernel similarity learning for visual search, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (3), 2014, 536–549.
  32. [32] C. Longworth and M.J. Gales, Multiple kernel learning for speaker verification, 2008 IEEE Int. Conf. Acoustics, Speech and Signal Processing, Las Vegas, America (Piscataway, NJ: IEEE, 2008), 1581–1584.
  33. [33] Y.-Y. Lin, T.-L. Liu, and C.-S. Fuh, Multiple kernel learning for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (6), 2011, 1147–1160.
  34. [34] A.F. Martins, M.A. Figueiredo, P.M. Aguiar, N.A. Smith, and E.P. Xing, Online multiple kernel learning for structured prediction, arXiv preprint arXiv:1010.2770, 2010.
  35. [35] W. Samek, A. Binder, and K.-R. Müller, Multiple kernel learning for brain–computer interfacing, 2013 35th Annu. Int. Conf. IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan (Piscataway, NJ: IEEE, 2013), 7048–7051.
  36. [36] B. Schölkopf and A.J. Smola, Learning with kernels: Support vector machines, regularization, optimization, and beyond (Cambridge, MA: MIT Press, 2002).
  37. [37] M. Hu, Y. Chen, and J.T.-Y. Kwok, Building sparse multiple-kernel SVM classifiers, IEEE Transactions on Neural Networks, 20 (5), 2009, 827–839.
  38. [38] Z.-Y. Chen and Z.-P. Fan, Dynamic customer lifetime value prediction using longitudinal data: An improved multiple kernel SVR approach, Knowledge-Based Systems, 43, 2013, 123–134.
  39. [39] J.A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, 49 (2–3), 2002, 233–246.
  40. [40] F. Piltan, N. Sulaiman, A. Gavahian, S. Soltani, and S. Roosta, Design mathematical tunable gain PID-like sliding mode fuzzy controller with minimum rule base, International Journal of Robotics & Automation, 2 (2), 2011, 146–156.
  41. [41] H. Šiljak, Inverse matching-based mobile robot following algorithm using fuzzy logic, International Journal of Robotics & Automation, 29 (4), 2014, 369–377.
  42. [42] S. Islam, P.X. Liu, and A.E. Saddik, Adaptive sliding mode control of unmanned four rotor flying vehicle, International Journal of Robotics & Automation, 30 (2), 2014, 140–148.
  43. [43] S. Jadlovská and J. Sarnovský, Classical double inverted pendulum: A complex overview of a system, 2012 IEEE 10th Int. Symp. on Applied Machine Intelligence and Informatics (SAMI) (Herľany, Slovakia: IEEE, 2012), 103–108.
