DDETR-SLAM: A TRANSFORMER-BASED APPROACH TO POSE OPTIMISATION IN DYNAMIC ENVIRONMENTS

Feng Li, Yuanyuan Liu, Kelong Zhang, Zhengpeng Hu, and Guozheng Zhang

Keywords

Simultaneous localisation and mapping, deformable DETR, object detection, dynamic environments

Abstract

Simultaneous localisation and mapping (SLAM) is a critical technology for accurate robot localisation and path planning, and improving its localisation accuracy remains an important area of research. In this paper, we propose a transformer-based visual semantic SLAM algorithm (DDETR-SLAM) to address shortcomings of traditional visual SLAM frameworks, such as large localisation errors in dynamic scenes. First, the deformable Detection Transformer (DETR) network is incorporated as an object detection thread, which improves the pose estimation accuracy of the system compared to ORB-SLAM2. Second, an algorithm that uses the semantic information is designed to eliminate outlier points generated by dynamic objects, thereby improving the accuracy and robustness of SLAM localisation and mapping. Experiments on the public TUM datasets verify the localisation accuracy, computational efficiency, and point cloud map readability of DDETR-SLAM. The results show that, in highly dynamic environments, the absolute trajectory error (ATE), translation error, and rotation error are reduced by 98.45%, 95.34%, and 92.67%, respectively, compared to ORB-SLAM2. In most cases, the proposed system outperforms DS-SLAM, DynaSLAM, Detect-SLAM, RGB-D SLAM, and YOLOv5+ORB-SLAM2. The relative pose error (RPE) is only 0.0076 m, the ATE is only 0.0063 m, and the dense map is also more readable.
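
The idea of eliminating outlier points generated by dynamic objects can be illustrated with a minimal sketch. It is not the method of the paper, whose details the abstract does not give. The sketch assumes that the detection thread returns labelled axis-aligned bounding boxes and that any feature point inside a box of a dynamic class is discarded before pose estimation. The function name filter_dynamic_keypoints, the label set DYNAMIC_LABELS, and the box format are all hypothetical.

```python
from typing import Iterable, List, Set, Tuple

Point = Tuple[float, float]               # (x, y) pixel coordinates
Box = Tuple[float, float, float, float]   # (x_min, y_min, x_max, y_max)
Detection = Tuple[str, Box]               # (class label, bounding box)

# Assumption: which classes count as dynamic is a design choice;
# the abstract does not list the classes used by DDETR-SLAM.
DYNAMIC_LABELS: Set[str] = {"person"}


def point_in_box(point: Point, box: Box) -> bool:
    """Return True if the point lies inside the box (borders included)."""
    x, y = point
    x_min, y_min, x_max, y_max = box
    return x_min <= x <= x_max and y_min <= y <= y_max


def filter_dynamic_keypoints(
    keypoints: Iterable[Point],
    detections: Iterable[Detection],
    dynamic_labels: Set[str] = DYNAMIC_LABELS,
) -> List[Point]:
    """Keep only keypoints that fall outside every dynamic-object box."""
    dynamic_boxes = [box for label, box in detections if label in dynamic_labels]
    return [
        point
        for point in keypoints
        if not any(point_in_box(point, box) for box in dynamic_boxes)
    ]


if __name__ == "__main__":
    keypoints = [(10.0, 10.0), (50.0, 50.0), (200.0, 120.0)]
    detections = [
        ("person", (40.0, 40.0, 100.0, 100.0)),    # dynamic: points inside are dropped
        ("chair", (180.0, 100.0, 220.0, 140.0)),   # static: points inside are kept
    ]
    print(filter_dynamic_keypoints(keypoints, detections))
```

A purely box-based rule like this also discards static background points that happen to fall inside a dynamic box, so a practical system would likely refine the decision, for example with a geometric consistency check. That refinement is likewise an assumption rather than a description of DDETR-SLAM.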
