Mingnan Hu and Bo Chen
Cooperative encirclement, deep reinforcement learning, robot failure, GRU
To address multi-robot encirclement failure caused by the breakdown of individual robots, a gated recurrent unit (GRU) enhanced MADDPG with a decoupled value network (DR-MADDPG) is proposed for cooperative encirclement. First, the value network is decoupled into global and local tiers, which guide the training of the policy network to focus on both the whole formation and the individual robot, improving encirclement performance. Then, potential robot failures are identified by leveraging the GRU, which effectively captures dynamic features in time-series data. Additionally, to overcome the limitations of conventional experience replay buffers in capturing temporal dependencies, a novel experience replay buffer that incorporates network memory information is developed to enhance data utilisation efficiency. Finally, experiments demonstrate that the proposed DR-MADDPG achieves higher initial and steady-state rewards under robot malfunctions. Meanwhile, the pursuit robots maintain decision-making continuity during robot failure, leading to notably enhanced encirclement success rates.
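As an illustration of a replay buffer that incorporates network memory information, the following minimal sketch stores the recurrent hidden state alongside each transition, so that a sampled transition can be replayed with the memory the network held at that step. It is not the authors' implementation: the names `RecurrentTransition` and `RecurrentReplayBuffer`, the field layout, and the uniform sampling scheme are all assumptions made for illustration.

```python
import random
from collections import deque
from typing import List, NamedTuple, Optional, Sequence


class RecurrentTransition(NamedTuple):
    """One step of experience plus the recurrent memory around it (hypothetical layout)."""
    obs: Sequence[float]
    hidden: Sequence[float]       # GRU hidden state before the action was chosen
    action: Sequence[float]
    reward: float
    next_obs: Sequence[float]
    next_hidden: Sequence[float]  # GRU hidden state after processing obs
    done: bool


class RecurrentReplayBuffer:
    """Fixed-capacity FIFO buffer with uniform sampling (an assumed design)."""

    def __init__(self, capacity: int, seed: Optional[int] = None) -> None:
        # A deque with maxlen silently evicts the oldest transition when full.
        self._storage: deque = deque(maxlen=capacity)
        self._rng = random.Random(seed)

    def __len__(self) -> int:
        return len(self._storage)

    def add(self, transition: RecurrentTransition) -> None:
        self._storage.append(transition)

    def sample(self, batch_size: int) -> List[RecurrentTransition]:
        if batch_size > len(self._storage):
            raise ValueError("not enough transitions stored to sample a batch")
        # Sample without replacement; each item carries its own hidden states,
        # so the recurrent network can be re-initialised per transition.
        return self._rng.sample(list(self._storage), batch_size)


# Usage: fill past capacity, then draw a batch.
buffer = RecurrentReplayBuffer(capacity=3, seed=0)
for step in range(5):
    buffer.add(RecurrentTransition(
        obs=[float(step)], hidden=[0.1 * step], action=[0.0],
        reward=float(step), next_obs=[float(step + 1)],
        next_hidden=[0.1 * (step + 1)], done=False,
    ))
batch = buffer.sample(2)
```

Storing the hidden state with each transition is one common way to let a recurrent policy or value network train from randomly sampled steps without replaying whole episodes. Whether DR-MADDPG stores the memory per step or per sequence is not specified in the abstract.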