MODEL-FREE MULTI-KERNEL LEARNING CONTROL FOR NONLINEAR DISCRETE-TIME SYSTEMS

Jiahang Liu, Xin Xu, Zhenhua Huang, and Chuanqiang Lian

References

  [1] R.S. Sutton and A.G. Barto, Reinforcement learning: An introduction (Cambridge, MA: MIT Press, 1998).
  [2] W.B. Powell, Approximate dynamic programming: Solving the curses of dimensionality, vol. 703 (New York: John Wiley & Sons, 2007).
  [3] F.L. Lewis and D. Vrabie, Reinforcement learning and adaptive dynamic programming for feedback control, IEEE Circuits and Systems Magazine, 9 (3), 2009, 32–50.
  [4] F.Y. Wang, H. Zhang, and D. Liu, Adaptive dynamic programming: An introduction, IEEE Computational Intelligence Magazine, 4 (2), 2009, 39–47.
  [5] M.L. Littman, Reinforcement learning improves behaviour from evaluative feedback, Nature, 521 (7553), 2015, 445–451.
  [6] J. Ni, X. Li, M. Hua, and S.X. Yang, Bioinspired neural network-based Q-learning approach for robot path planning in unknown environments, International Journal of Robotics & Automation, 31 (6), 2016. DOI: 10.2316/Journal.206.2016.6.206-4526
  [7] M.L. Puterman, Markov decision processes: Discrete stochastic dynamic programming (New York: Wiley, 2014).
  [8] C.J. Watkins and P. Dayan, Q-learning, Machine Learning, 8 (3–4), 1992, 279–292.
  [9] J. Baxter and P.L. Bartlett, Infinite-horizon policy-gradient estimation, Journal of Artificial Intelligence Research, 15, 2001, 319–350.
  [10] R.S. Sutton, H.R. Maei, D. Precup, S. Bhatnagar, D. Silver, C. Szepesvári, and E. Wiewiora, Fast gradient-descent methods for temporal-difference learning with linear function approximation, Proc. 26th Annu. Int. Conf. on Machine Learning, Montreal, Canada (New York: ACM, 2009), 993–1000.
  [11] M.G. Lagoudakis, Value function approximation, in C. Sammut and G.I. Webb (eds.), Encyclopedia of machine learning (Berlin: Springer, 2011), 1011–1021.
  [12] M. Geist and O. Pietquin, Algorithmic survey of parametric value function approximation, IEEE Transactions on Neural Networks and Learning Systems, 24 (6), 2013, 845–867.
  [13] Z. Huang, C. Lian, X. Xu, and J. Wang, Lateral control for autonomous land vehicles via dual heuristic programming, International Journal of Robotics & Automation, 31 (6), 2016. DOI: 10.2316/Journal.206.2016.6.206-4878
  [14] V.R. Konda and J.N. Tsitsiklis, Actor–critic algorithms, Proc. Neural Information Processing Systems, vol. 13, Denver, CO, USA, 1999, 1008–1014.
  [15] C.J.C.H. Watkins, Learning from delayed rewards, Ph.D. dissertation, University of Cambridge, Cambridge, UK, 1989.
  [16] A.G. Barto, Reinforcement learning and adaptive critic methods, in D.A. White and D.A. Sofge (eds.), Handbook of intelligent control (New York: Van Nostrand Reinhold, 1992), 469–491.
  [17] S. Bhasin, N. Sharma, P. Patre, and W. Dixon, Asymptotic tracking by a reinforcement learning-based adaptive critic controller, Journal of Control Theory and Applications, 9 (3), 2011, 400–409.
  [18] K.G. Vamvoudakis and F.L. Lewis, Online actor–critic algorithm to solve the continuous-time infinite horizon optimal control problem, Automatica, 46 (5), 2010, 878–888.
  [19] H. Zhang, L. Cui, X. Zhang, and Y. Luo, Data-driven robust approximate optimal tracking control for unknown general nonlinear systems using adaptive dynamic programming method, IEEE Transactions on Neural Networks, 22 (12), 2011, 2226–2236.
  [20] T. Dierks and S. Jagannathan, Online optimal control of affine nonlinear discrete-time systems with unknown internal dynamics by using time-based policy update, IEEE Transactions on Neural Networks and Learning Systems, 23 (7), 2012, 1118–1129.
  [21] D. Ormoneit and Ś. Sen, Kernel-based reinforcement learning, Machine Learning, 49 (2–3), 2002, 161–178.
  [22] X. Xu, D.W. Hu, and X.C. Lu, Kernel-based least-squares policy iteration for reinforcement learning, IEEE Transactions on Neural Networks, 18 (4), 2007, 973–992.
  [23] J.C. Santamaría, R.S. Sutton, and A. Ram, Experiments with reinforcement learning in problems with continuous state and action spaces, Adaptive Behavior, 6 (2), 1997, 163–217.
  [24] X. Xu, Z.S. Hou, C.Q. Lian, and H.B. He, Online learning control using adaptive critic designs with sparse kernel machines, IEEE Transactions on Neural Networks and Learning Systems, 24 (5), 2013, 762–775.
  [25] X. Xu, C.Q. Lian, L. Zuo, and H.B. He, Kernel-based approximate dynamic programming for real-time online learning control: An experimental study, IEEE Transactions on Control Systems Technology, 22 (1), 2014, 146–156.
  [26] F.R. Bach, G.R. Lanckriet, and M.I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Proc. 21st Int. Conf. on Machine Learning, Banff, AB, Canada (New York: ACM, 2004), 6.
  [27] M. Gönen and E. Alpaydın, Multiple kernel learning algorithms, Journal of Machine Learning Research, 12 (July), 2011, 2211–2268.
  [28] G.R. Lanckriet, N. Cristianini, P. Bartlett, L.E. Ghaoui, and M.I. Jordan, Learning the kernel matrix with semidefinite programming, Journal of Machine Learning Research, 5 (January), 2004, 27–72.
  [29] B. Schölkopf, K. Tsuda, and J.-P. Vert, Kernel methods in computational biology (Cambridge, MA: MIT Press, 2004).
  [30] S.S. Bucak, R. Jin, and A.K. Jain, Multiple kernel learning for visual object recognition: A review, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (7), 2014, 1354–1369.
  [31] H. Xia, S.C. Hoi, R. Jin, and P. Zhao, Online multiple kernel similarity learning for visual search, IEEE Transactions on Pattern Analysis and Machine Intelligence, 36 (3), 2014, 536–549.
  [32] C. Longworth and M.J. Gales, Multiple kernel learning for speaker verification, 2008 IEEE Int. Conf. on Acoustics, Speech and Signal Processing, Las Vegas, NV, USA (Piscataway, NJ: IEEE, 2008), 1581–1584.
  [33] Y.-Y. Lin, T.-L. Liu, and C.-S. Fuh, Multiple kernel learning for dimensionality reduction, IEEE Transactions on Pattern Analysis and Machine Intelligence, 33 (6), 2011, 1147–1160.
  [34] A.F. Martins, M.A. Figueiredo, P.M. Aguiar, N.A. Smith, and E.P. Xing, Online multiple kernel learning for structured prediction, arXiv preprint arXiv:1010.2770, 2010.
  [35] W. Samek, A. Binder, and K.-R. Müller, Multiple kernel learning for brain–computer interfacing, 2013 35th Annu. Int. Conf. of the IEEE Engineering in Medicine and Biology Society (EMBC), Osaka, Japan (Piscataway, NJ: IEEE, 2013), 7048–7051.
  [36] B. Schölkopf and A.J. Smola, Learning with kernels: Support vector machines, regularization, optimization, and beyond (Cambridge, MA: MIT Press, 2002).
  [37] M. Hu, Y. Chen, and J.T.-Y. Kwok, Building sparse multiple-kernel SVM classifiers, IEEE Transactions on Neural Networks, 20 (5), 2009, 827–839.
  [38] Z.-Y. Chen and Z.-P. Fan, Dynamic customer lifetime value prediction using longitudinal data: An improved multiple kernel SVR approach, Knowledge-Based Systems, 43, 2013, 123–134.
  [39] J.A. Boyan, Technical update: Least-squares temporal difference learning, Machine Learning, 49 (2–3), 2002, 233–246.
  [40] F. Piltan, N. Sulaiman, A. Gavahian, S. Soltani, and S. Roosta, Design mathematical tunable gain PID-like sliding mode fuzzy controller with minimum rule base, International Journal of Robotics & Automation, 2 (2), 2011, 146–156.
  [41] H. Šiljak, Inverse matching-based mobile robot following algorithm using fuzzy logic, International Journal of Robotics & Automation, 29 (4), 2014, 369–377.
  [42] S. Islam, P.X. Liu, and A.E. Saddik, Adaptive sliding mode control of unmanned four rotor flying vehicle, International Journal of Robotics & Automation, 30 (2), 2014, 140–148.
  [43] S. Jadlovská and J. Sarnovský, Classical double inverted pendulum: A complex overview of a system, 2012 IEEE 10th Int. Symp. on Applied Machine Intelligence and Informatics (SAMI), Herľany, Slovakia (Piscataway, NJ: IEEE, 2012), 103–108.
