GRASPING TASK PLANNING ALGORITHM FOR DEXTEROUS HAND BASED ON SCENE UNDERSTANDING AND SEMANTIC INFORMATION

Zhekai Zhang, Baojiang Li, Bin Wang, Liang Li, Haiyan Wang, Chenhan Zhang

References

[1] Y. Ma, Z. Song, Y. Zhuang, J. Hao, and I. King, A survey on vision-language-action models for embodied AI, arXiv preprint arXiv:2405.14093, 2024.
[2] F. Soljacic, T. Law, M. Chita-Tegmark, and M. Scheutz, Robots in healthcare as envisioned by care professionals, Intelligent Service Robotics, 17(3), 2024, 685–701.
[3] J. Ye, J. Wang, B. Huang, Y. Qin, and X. Wang, Learning continuous grasping function with a dexterous hand from human demonstrations, IEEE Robotics and Automation Letters, 8(5), 2023, 2882–2889.
[4] Y. Xu, W. Wan, J. Zhang, H. Liu, Z. Shan, H. Shen, R. Wang, et al., UniDexGrasp: Universal robotic dexterous grasping via learning diverse proposal generation and goal-conditioned policy, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, Canada, arXiv:2303.00938, 2023, 4737–4746.
[5] J. Wang, Y. Qin, K. Kuang, Y. Korkmaz, A. Gurumoorthy, H. Su, and X. Wang, CyberDemo: Augmenting simulated human demonstration for real-world dexterous manipulation, arXiv preprint arXiv:2402.14795, 2024.
[6] Y. Ding, X. Zhang, C. Paxton, and S. Zhang, Task and motion planning with large language models for object rearrangement, 2023 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), IEEE, arXiv:2303.06247, 2023.
[7] Y. He, et al., Design and implementation of the visual detection system for amphibious robots, International Journal of Robotics and Automation, 34, 2019, 417–430.
[8] H. Han, W. Wang, X. Han, and X. Yang, 6-DoF grasp pose estimation based on instance reconstruction, Intelligent Service Robotics, 17(2), 2024, 251–264.
[9] B. Wang, N. Sridhar, C. Feng, M. Van der Merwe, A. Fishman, N. Fazeli, and J. Joon Park, This&That: Language-gesture controlled video generation for robot planning, arXiv preprint arXiv:2407.05530, 2024.
[10] M. Maity, S. Banerjee, and S. Sinha Chaudhuri, Faster R-CNN and YOLO based vehicle detection: A survey, 2021 5th International Conference on Computing Methodologies and Communication (ICCMC), IEEE, 2021.
[11] M. Hussain, YOLOv1 to v8: Unveiling each variant – a comprehensive review of YOLO, IEEE Access, 12, 2024, 42816–42833.
[12] X. Zhu, S. Lyu, X. Wang, and Q. Zhao, TPH-YOLOv5: Improved YOLOv5 based on transformer prediction head for object detection on drone-captured scenarios, Proceedings of the IEEE/CVF International Conference on Computer Vision, 2021, 2778–2788.
[13] A. Wang, H. Chen, L. Liu, K. Chen, Z. Lin, J. Han, and G. Ding, YOLOv10: Real-time end-to-end object detection, 38th Conference on Neural Information Processing Systems (NeurIPS), arXiv:2405.14458, 2024.
[14] B. An, Y. Geng, K. Chen, X. Li, Q. Dou, and H. Dong, RGBManip: Monocular image-based robotic manipulation through active object pose estimation, arXiv preprint arXiv:2310.03478, 2023.
[15] B. Wen, W. Yang, J. Kautz, and S. Birchfield, FoundationPose: Unified 6D pose estimation and tracking of novel objects, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, arXiv:2312.08344, 2024.
[16] J. Jiao, S. Kan, S.-W. Lin, D. Sanan, Y. Liu, and J. Sun, Semantic understanding of smart contracts: Executable operational semantics of Solidity, 2020 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, IEEE, 2020, 1695–1712.
[17] C. Wen, D. Jayaraman, and Y. Gao, Can transformers capture spatial relations between objects? arXiv preprint arXiv:2403.00729, 2024.
[18] I. Armeni, Z.-Y. He, J.Y. Gwak, A.R. Zamir, M. Fischer, J. Malik, and S. Savarese, 3D scene graph: A structure for unified semantics, 3D space, and camera, Proceedings of the IEEE/CVF International Conference on Computer Vision, arXiv:1910.02527, 2019.
[19] S. Peng, K. Genova, C. "Max" Jiang, A. Tagliasacchi, M. Pollefeys, and T. Funkhouser, OpenScene: 3D scene understanding with open vocabularies, Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, arXiv:2211.15654, 2023.
[20] J. Chen, Q. Wang, H.H. Cheng, W. Peng, and W. Xu, A review of vision-based traffic semantic understanding in ITSs, IEEE Transactions on Intelligent Transportation Systems, 23(11), 2022, 19954–19979.
[21] A. Nguyen, Scene understanding for autonomous manipulation with deep learning, arXiv preprint arXiv:1903.09761, 2019.
[22] E. Jang, S. Vijayanarasimhan, P. Pastor, J. Ibarz, and S. Levine, End-to-end learning of semantic grasping, arXiv preprint arXiv:1707.01932, 2017.
[23] M. Sivachitra and V. Priya, Solar powered dexterous robot controlled by mobile phone, Annals of the Romanian Society for Cell Biology, 2021, 15770–15778.
[24] M. Kiatos, S. Malassiotis, and I. Sarantopoulos, A geometric approach for grasping unknown objects with multifingered hands, IEEE Transactions on Robotics, 37(3), 2021, 735–746.
[25] W. Shang, F. Song, Z. Zhao, H. Gao, S. Cong, and Z. Li, Deep learning method for grasping novel objects using dexterous hands, IEEE Transactions on Cybernetics, 52(5), 2022, 2750–2762.
[26] H. Zhu, A. Gupta, A. Rajeswaran, S. Levine, and V. Kumar, Dexterous manipulation with deep reinforcement learning: Efficient, general, and low-cost, 2019 International Conference on Robotics and Automation (ICRA), IEEE, arXiv:1810.06045, 2019.
[27] S. Dasari, A. Gupta, and V. Kumar, Learning dexterous manipulation from exemplar object trajectories and pre-grasps, 2023 IEEE International Conference on Robotics and Automation (ICRA), IEEE, arXiv:2209.11221, 2023.
[28] Y. Wang, Z. Xian, F. Chen, T.-H. Wang, Y. Wang, K. Fragkiadaki, Z. Erickson, D. Held, and C. Gan, RoboGen: Towards unleashing infinite data for automated robot learning via generative simulation, arXiv preprint arXiv:2311.01455, 2023.
[29] Y. Ding, X. Zhang, S. Amiri, N. Cao, H. Yang, A. Kaminski, C. Esselink, and S. Zhang, Integrating action knowledge and LLMs for task planning and situation handling in open worlds, Autonomous Robots, 47(8), 2023, 981–997.
[30] N. Wake, A. Kanehira, K. Sasabuchi, J. Takamatsu, and K. Ikeuchi, GPT-4V(ision) for robotics: Multimodal task planning from human demonstration, IEEE Robotics and Automation Letters, 9(11), 2024, 10567–10574.
[31] N. Wake, I. Yanokura, K. Sasabuchi, and K. Ikeuchi, Verbal focus-of-attention system for learning-from-observation, 2021 IEEE International Conference on Robotics and Automation (ICRA), Xi'an, China, 2021, 10377–10384.
[32] Q. Fang, X. Xu, Y. Lan, Y. Zhang, Y. Zeng, and T. Tang, Data-efficient deep reinforcement learning with convolution-based state encoder networks, International Journal of Robotics and Automation, 36(10), 2021.
[33] C. Son, Intelligent rule-based sequence planning algorithm with fuzzy optimization for robot manipulation tasks in partially dynamic environments, Information Sciences, 342, 2016, 209–221.
[34] Z. Liu, J. Wang, J. Li, P. Liu, and K. Ren, A novel multiple targets detection method for service robots in the indoor complex scenes, Intelligent Service Robotics, 16(4), 2023, 453–469.
[35] J. Ren and Y. Wang, Overview of object detection algorithms using convolutional neural networks, Journal of Computer and Communications, 10(1), 2022, 115–132.
[36] N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, End-to-end object detection with transformers (Cham: Springer International Publishing, 2020), 213–229.
[37] M. Kirtas, K. Tsampazis, N. Passalis, and A. Tefas, Deepbots: A Webots-based deep reinforcement learning framework for robotics (Cham: Springer International Publishing, 2020), 64–75.
[38] J.K. Jaiswal and R. Samikannu, Application of random forest algorithm on feature subset selection and classification and regression, 2017 World Congress on Computing and Communication Technologies (WCCCT), Tiruchirappalli, India, IEEE, 2017, 65–68.
[39] N. Ingelhag, J. Munkeby, J. van Haastregt, A. Varava, M.C. Welle, and D. Kragic, A robotic skill learning system built upon diffusion policies and foundation models, arXiv preprint arXiv:2403.16730, 2024.
[40] Y. Jin, D. Li, Y. A, J. Shi, P. Hao, F. Sun, J. Zhang, and B. Fang, RobotGPT: Robot manipulation learning from ChatGPT, IEEE Robotics and Automation Letters, 9(3), 2024, 2543–2550.
[41] A. Brohan, N. Brown, J. Carbajal, Y. Chebotar, X. Chen, K. Choromanski, T. Ding, et al., RT-2: Vision-language-action models transfer web knowledge to robotic control, arXiv preprint arXiv:2307.15818, 2023.
[42] S. Belkhale, T. Ding, T. Xiao, P. Sermanet, Q. Vuong, J. Tompson, Y. Chebotar, D. Dwibedi, and D. Sadigh, RT-H: Action hierarchies using language, arXiv preprint arXiv:2403.01823, 2024.
