AN ADVICE MECHANISM FOR HETEROGENEOUS ROBOT TEAMS, 53-68.

doi:10.2316/J.2020.206-0166

Steven Daniluk and M. Reza Emami

References

[1] C.J.C.H. Watkins, Learning from delayed rewards, Ph.D. dissertation, King’s College, Cambridge, UK, May 1989. Available: http://www.cs.rhul.ac.uk/~chrisw/new_thesis.pdf
[2] M.J. Matarić, Reinforcement learning in the multi-robot domain, Autonomous Robots, 4(1), 1997, 73–83.
[3] Y. Wang, P.G. Siriwardana, and C.W. de Silva, Multi-robot cooperative transportation of objects using machine learning, International Journal of Robotics and Automation, 26(4), 2011, 369–375.
[4] J. Girard and M.R. Emami, Concurrent Markov decision processes for robot team learning, Engineering Applications of Artificial Intelligence, 39, 2015, 223–234.
[5] Y. Zhang and C.W. de Silva, RSMDP-based robust Q-learning for optimal path planning in a dynamic environment, International Journal of Robotics and Automation, 31(4), 2016, 290–300.
[6] C. Finn, X.Y. Tan, Y. Duan, T. Darrell, S. Levine, and P. Abbeel, Learning visual feature spaces for robotic manipulation with deep spatial autoencoders, http://arxiv.org/abs/1509.06113
[7] H. Yang and J. Liu, Minimum parameter learning method for an n-link manipulator with nonlinear disturbance observer, International Journal of Robotics and Automation, 31(3), 2016.
[8] C. Boutilier, Planning, learning and coordination in multiagent decision processes, Proc. of the 6th Conf. on Theoretical Aspects of Rationality and Knowledge, Amsterdam (San Francisco, CA, USA: Morgan Kaufmann Publishers Inc. 1996) 195–210.
[9] C.E. Rasmussen and C.K.I. Williams, Gaussian processes for machine learning (Adaptive computation and machine learning) (Cambridge, MA: The MIT Press, 2005).
[10] A.Y. Ng, D. Harada, and S.J. Russell, Policy invariance under reward transformations: Theory and application to reward shaping, Proc. of the Sixteenth Int. Conf. on Machine Learning, Bled, Slovenia (San Francisco, CA, USA: Morgan Kaufmann Publishers Inc., 1999) 278–287.
[11] J. Kober and J. Peters, Reinforcement learning in robotics: A survey (Berlin, Heidelberg: Springer, 2012) 579–610.
[12] A.S. Polydoros and L. Nalpantidis, Survey of model-based reinforcement learning: Applications on robotics, Journal of Intelligent and Robotic Systems, 86 (2), 2017, 153–173.
[13] E. Wiewiora, G. Cottrell, and C. Elkan, Principled methods for advising reinforcement learning agents, Proc. of the Twentieth Int. Conf. on Machine Learning, Washington, DC (Palo Alto, CA: AAAI Press, 2003) 792–799.
[14] Y. Zhan, H. Bou-Ammar, and M.E. Taylor, Theoretically-grounded policy advice from multiple teachers in reinforcement learning settings with applications to negative transfer, Proc. of the Twenty-Fifth Int. Joint Conf. on Artificial Intelligence, IJCAI 2016, New York, NY, USA, 2016, 2315–2321.
[15] L. Ng and M.R. Emami, Concurrent individual and social learning in robot teams, Computational Intelligence, 32 (3), 2016, 420–438.
[16] R. Bellman, Dynamic Programming, 1st edn (Princeton, NJ, USA: Princeton University Press, 1957).
[17] C.J. Watkins and P. Dayan, Technical note: Q-learning, Machine Learning, 8 (3), 1992, 279–292.
[18] G.A. Rummery, Problem solving with reinforcement learning, Ph.D. dissertation, University of Cambridge, 1995.
[19] J. McCarthy, Programs with common sense, Semantic Information Processing (Cambridge, MA: MIT Press, 1968) 403–418.
[20] M.E. Taylor and P. Stone, Transfer learning for reinforcement learning domains: A survey, Journal of Machine Learning Research, 10 (1), 2009, 1633–1685.
[21] G. Boutsioukis, I. Partalas, and I. Vlahavas, Transfer learning in multi-agent reinforcement learning domains (Berlin, Heidelberg: Springer, 2012) 249–260.
[22] L. Torrey, J. Shavlik, T. Walker, and R. Maclin, Skill acquisition via transfer learning and advice taking (Berlin, Heidelberg: Springer, 2006) 425–436.
[23] S.D. Whitehead, A complexity analysis of cooperative mechanisms in reinforcement learning, Proc. of the Ninth National Conf. on Artificial Intelligence (AAAI-91), Anaheim, 1991, 607–613.
[24] L.-J. Lin, Programming robots using reinforcement learning and teaching, Proc. of the Ninth National Conf. on Artificial Intelligence - Volume 2, Anaheim, CA (Palo Alto, CA: AAAI Press, 1991) 781–786.
[25] R. Maclin, J.W. Shavlik, and P. Kaelbling, Creating advice-taking reinforcement learners, Machine Learning, 1996, 251–281.
[26] R.J. Malak and P.K. Khosla, A framework for the adaptive transfer of robot skill knowledge using reinforcement learning agents, Proc. 2001 ICRA, IEEE Int. Conf. on Robotics and Automation, Seoul, South Korea, vol. 2, 2001, 1994–2001.
[27] R. Maclin, J. Shavlik, L. Torrey, T. Walker, and E. Wild, Giving advice about preferred actions to reinforcement learners via knowledge-based kernel regression, Proc. of the 20th National Conf. on Artificial Intelligence - Volume 2, Pittsburgh, Pennsylvania (Palo Alto, CA: AAAI Press, 2005) 819–824.
[28] L. Nunes and E. Oliveira, Cooperative learning using advice exchange (Berlin, Heidelberg: Springer, 2003) 33–48.
[29] L. Nunes and E. Oliveira, Exchanging advice and learning to trust, Lecture notes in computer science vol. 2782 (Berlin: Springer-Verlag, 2003) 250–265.
[30] S. Singh, T. Jaakkola, M.L. Littman, and C. Szepesvari, Convergence results for single-step on-policy reinforcement-learning algorithms, Machine Learning, 38 (3), 2000, 287–308. Available: http://dx.doi.org/10.1023/A:1007678930559
[31] T. Jaakkola, M.I. Jordan, and S.P. Singh, On the convergence of stochastic iterative dynamic programming algorithms, Neural Computation, 6(6), 1994, 1185–1201.

From Journal (206) International Journal of Robotics and Automation - 2020