MODEL-FREE ONLINE REINFORCEMENT LEARNING OF A ROBOTIC MANIPULATOR

Jerry Sweafford Jr. and Farbod Fahimi

References

  [1] R. Bellman, Dynamic Programming (Princeton, NJ, USA: Princeton University Press, 1957).
  [2] W.T. Miller, P.J. Werbos, and R.S. Sutton, Neural Networks for Control (Cambridge, MA, USA: MIT Press, 1990).
  [3] D.V. Prokhorov and D.C. Wunsch, Adaptive critic designs, IEEE Transactions on Neural Networks, 8(5), 1997, 997–1007.
  [4] D. Liu, X. Xiong, and Y. Zhang, Action-dependent adaptive critic designs, Proc. of Int. Joint Conf. on Neural Networks (IJCNN), Washington, DC, USA, Vol. 2, 2001, 990–995.
  [5] P. He and S. Jagannathan, Reinforcement learning-based output feedback control of nonlinear systems with input constraints, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 35(1), 2005, 150–154.
  [6] L. Yang, J. Si, K.S. Tsakalis, and A.A. Rodriguez, Direct heuristic dynamic programming for nonlinear tracking control with filtered tracking error, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 39(6), 2009, 1617–1622.
  [7] Y.-J. Liu, L. Tang, S. Tong, C.P. Chen, and D.-J. Li, Reinforcement learning design-based adaptive tracking control with less learning parameters for nonlinear discrete-time MIMO systems, IEEE Transactions on Neural Networks and Learning Systems, 26(1), 2015, 165–176.
  [8] Q. Yang and S. Jagannathan, Reinforcement learning controller design for affine nonlinear discrete-time systems using online approximators, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 42(2), 2012, 377–390.
  [9] H. Deng, H.-X. Li, and Y.-H. Wu, Feedback-linearization-based neural adaptive control for unknown nonaffine nonlinear discrete-time systems, IEEE Transactions on Neural Networks, 19(9), 2008, 1615–1625.
  [10] Q. Yang, J.B. Vance, and S. Jagannathan, Control of nonaffine nonlinear discrete-time systems using reinforcement-learning-based linearly parameterized neural networks, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics), 38(4), 2008, 994–1001.
  [11] Z. Chen, Y. Zhang, and Z. Wang, Output feedback stabilization of a class of uncertain non-affine nonlinear systems in discrete time, Control and Intelligent Systems, 44(2), 2016.
  [12] J.-H. Park, S.-H. Huh, S.-H. Kim, S.-J. Seo, and G.-T. Park, Direct adaptive controller for nonaffine nonlinear systems using self-structuring neural networks, IEEE Transactions on Neural Networks, 16(2), 2005, 414–422.
  [13] X. Yang, D. Liu, D. Wang, and Q. Wei, Discrete-time online learning control for a class of unknown nonaffine nonlinear systems using reinforcement learning, Neural Networks, 55, 2014, 30–41.
  [14] Z. Wang, P. Goldsmith, and J. Gu, Adaptive trajectory tracking control for Euler-Lagrange systems with application to robot manipulators, Control and Intelligent Systems, 37(1), 2009, 46–56.
  [15] F. Fahimi and S. Praneeth, A universal trajectory tracking controller for mobile robots via model-free online reinforcement learning, Control and Intelligent Systems, 43(1), 2015.
  [16] W. Gao, Y. Wang, and A. Homaifa, Discrete-time variable structure control systems, IEEE Transactions on Industrial Electronics, 42(2), 1995, 117–122.
  [17] R. Bellman, Dynamic programming and stochastic control processes, Information and Control, 1(3), 1958, 228–239.
