
Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (2): 360-373. doi: 10.23919/JSEE.2023.000056


  


Deep reinforcement learning for UAV swarm rendezvous behavior

Yaozhong ZHANG, Yike LI, Zhuoran WU, Jialin XU

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710129, China
  • Received: 2021-01-08 Online: 2023-04-18 Published: 2023-04-18
  • Contact: Yaozhong ZHANG E-mail: zhang_y_z@nwpu.edu.cn; liyike@mail.nwpu.edu.cn
  • About author:
    ZHANG Yaozhong was born in 1974. He received his B.S., M.S., and Ph.D. degrees in systems engineering from Northwestern Polytechnical University, Xi’an, China, in 1997, 2000, and 2006, respectively. He is an associate professor at the School of Electronics and Information, Northwestern Polytechnical University. His research interests include modeling, simulation, and effectiveness evaluation of complex systems, and reinforcement learning and its application to complex systems. E-mail: zhang_y_z@nwpu.edu.cn

    LI Yike was born in 1997. He received his bachelor’s degree in welding technology and engineering from Xi’an Shiyou University in 2019. In 2022, he graduated from Northwestern Polytechnical University with a master’s degree in control engineering. His research interest is UAV path planning based on reinforcement learning. E-mail: liyike@mail.nwpu.edu.cn

    WU Zhuoran was born in 1999. He graduated from Northwestern Polytechnical University with a bachelor’s degree in detection guidance and control technology in 2021. He is currently pursuing his master’s degree in control science and engineering at Northwestern Polytechnical University. His research interest is deep reinforcement learning for multi-agent systems. E-mail: 542391943@mail.nwpu.edu.cn

    XU Jialin was born in 1995. He graduated from Northwestern Polytechnical University with a bachelor’s degree in detection guidance and control technology in 2018. In 2021, he graduated from Northwestern Polytechnical University with a master’s degree in control engineering. His research interest is UAV swarm decision-making based on deep reinforcement learning. E-mail: xjl@mail.nwpu.edu.cn
  • Supported by:
    This work was supported by the Aeronautical Science Foundation (2017ZC53033).

Abstract:

Unmanned aerial vehicle (UAV) swarm technology has become a research hotspot in recent years. As the autonomous intelligence of UAVs continues to improve, swarm technology is expected to become one of the main trends in future UAV development. This paper studies the behavior decision-making process of the UAV swarm rendezvous task based on the double deep Q network (DDQN) algorithm. We design a guided reward function to address the convergence difficulty caused by sparse returns in deep reinforcement learning (DRL) for long-period tasks. We also propose the concept of a temporary storage area, which optimizes the memory replay unit of the traditional DDQN algorithm, improves the convergence speed, and shortens the training process. Unlike traditional task environments, this paper establishes a continuous state-space task environment model to make the UAV task environment more realistic. Based on the DDQN algorithm, the collaborative tasks of the UAV swarm are trained in different task scenarios. The experimental results show that the DDQN algorithm efficiently trains the UAV swarm to complete the given collaborative tasks while meeting the swarm requirements for centralization and autonomy and improving the intelligence of collaborative task execution. The simulation results show that, after training, the UAV swarm carries out the rendezvous task well, with a mission success rate of 90%.

Key words: double deep Q network (DDQN) algorithms, unmanned aerial vehicle (UAV) swarm, task decision, deep reinforcement learning (DRL), sparse returns
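
For readers unfamiliar with the DDQN algorithm named in the abstract, the following is a minimal sketch of the standard double DQN target for a single transition. It illustrates the general technique only and is not the authors' implementation: the function name `ddqn_target`, its parameters, and the default discount factor are hypothetical, and the guided reward function and temporary storage area proposed in the paper are not modeled here.

```python
def ddqn_target(reward, next_q_online, next_q_target, done, gamma=0.99):
    """Standard double DQN target for one transition (illustrative sketch).

    reward        -- immediate reward of the transition
    next_q_online -- Q-values of the next state from the online network
    next_q_target -- Q-values of the next state from the target network
    done          -- True if the next state is terminal
    gamma         -- discount factor (assumed value, not from the paper)
    """
    if done:
        # No bootstrapping from a terminal state.
        return reward
    # The online network selects the greedy next action ...
    best_action = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    # ... and the target network evaluates that action.
    return reward + gamma * next_q_target[best_action]


# Made-up numbers: the online network prefers action 1,
# so the target network's value for action 1 is used.
example = ddqn_target(1.0, [0.2, 0.8], [1.0, 2.0], done=False, gamma=0.9)
```

Separating action selection (online network) from action evaluation (target network) is what distinguishes double DQN from plain DQN, which takes the maximum over the target network's own estimates and tends to overestimate action values.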