Journal of Systems Engineering and Electronics ›› 2023, Vol. 34 ›› Issue (5): 1211-1224. doi: 10.23919/JSEE.2023.000128

• Systems Engineering •

A UAV collaborative defense scheme driven by DDPG algorithm

Yaozhong ZHANG1,*, Zhuoran WU1, Zhenkai XIONG2, Long CHEN3

  • 1 School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
    2 College of New Energy and Intelligent Connected Vehicle, Anhui University of Science & Technology, Hefei 231131, China
    3 China Research and Development Academy of Machinery Equipment, Beijing 100089, China
  • Received: 2021-11-12 Online: 2023-10-18 Published: 2023-10-30
  • Contact: Yaozhong ZHANG E-mail: zhang_y_z@nwpu.edu.cn; 542391943@qq.com; 1223959392@qq.com; dragon-cl@sohu.com
  • About author:
    ZHANG Yaozhong was born in 1974. He is an associate professor at the School of Electronics and Information, Northwestern Polytechnical University. His research interests include modeling, simulation, and effectiveness evaluation of complex systems, and reinforcement learning and its application in complex systems. E-mail: zhang_y_z@nwpu.edu.cn

    WU Zhuoran was born in 1999. He received his bachelor’s degree in detection guidance and control technology and his master’s degree in control science and engineering, both from the School of Electronics and Information, Northwestern Polytechnical University. His research interest is reinforcement learning. E-mail: 542391943@qq.com

    XIONG Zhenkai was born in 1979. He received his Ph.D. degree from the College of Mechanical and Electrical Engineering, Harbin Engineering University, in 2012. His main research interests include intelligent control, high-precision control, embedded systems, optimal estimation theory and its applications, and filtering algorithms. E-mail: 1223959392@qq.com

    CHEN Long was born in 1967. He received his bachelor’s degree from Changchun Institute of Optics and Precision Mechanics and his master’s degree from Beijing Institute of Technology. He is a researcher at the China Research and Development Academy of Machinery Equipment, specializing in intelligent systems. He has been extensively involved in equipment system research and has served as the chief designer for the development of multiple types of equipment. He has received numerous provincial and ministerial-level scientific and technological advancement awards. E-mail: dragon-cl@sohu.com
  • Supported by:
    This work was supported by the Key Research and Development Program of Shaanxi (2022GY-089) and the Natural Science Basic Research Program of Shaanxi (2022JQ-593).

Abstract:

The deep deterministic policy gradient (DDPG) algorithm is an off-policy method that combines the two mainstream families of reinforcement learning methods, those based on value iteration and those based on policy iteration. Using the DDPG algorithm, agents can explore and summarize the environment to make autonomous decisions in continuous state and action spaces. In this paper, a DDPG-based cooperative defense scheme for swarms of unmanned aerial vehicles (UAVs) is developed and validated, and it shows promising practical value in defense effectiveness. We solve the sparse reward problem that reinforcement learning faces in long-term tasks by building a reward function for the UAV swarm, and we optimize the learning process of the artificial neural network based on the DDPG algorithm to reduce oscillation during learning. The experimental results show that the DDPG algorithm can guide the UAV swarm to perform the defense task efficiently, meeting the swarm’s requirements for decentralization and autonomy and promoting the intelligent development of UAV swarms and their decision-making processes.

Key words: deep deterministic policy gradient (DDPG) algorithm, unmanned aerial vehicle (UAV) swarm, task decision making, deep reinforcement learning, sparse reward problem
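
Illustrative sketch (not taken from the paper): the abstract names three ingredients, namely a value-based critic, a reward function that counters sparse rewards, and a learning process tuned to reduce oscillation. The Python fragment below shows how such pieces commonly look in a generic DDPG setup. The critic target and the soft (Polyak) target update are the standard DDPG formulas; the `shaped_reward` function, its arguments, and all numeric constants are hypothetical placeholders, since the abstract does not specify the authors' reward design or hyperparameters.

```python
import numpy as np


def td_target(reward, next_q, done, gamma=0.99):
    """Standard DDPG critic target: y = r + gamma * (1 - done) * Q'(s', mu'(s')).

    `next_q` is the target critic's value for the target actor's action
    in the next state; `done` is 1.0 for terminal transitions, else 0.0.
    """
    return reward + gamma * (1.0 - done) * next_q


def soft_update(target_params, online_params, tau=0.005):
    """Polyak averaging of target weights: theta' <- tau*theta + (1-tau)*theta'.

    A small tau makes the target networks change slowly, which is the usual
    DDPG mechanism for damping oscillation in the learning targets.
    """
    return [tau * w + (1.0 - tau) * w_t
            for w, w_t in zip(online_params, target_params)]


def shaped_reward(dist_prev, dist_now, intercepted,
                  step_scale=0.1, bonus=10.0):
    """HYPOTHETICAL dense reward for one defending UAV (an assumption).

    Gives a small reward for closing the distance to an intruder at every
    step, plus a terminal bonus on interception, so the agent receives a
    learning signal before the sparse terminal event occurs.
    """
    progress = step_scale * (dist_prev - dist_now)
    return progress + (bonus if intercepted else 0.0)


if __name__ == "__main__":
    rewards = np.array([1.0, 0.0])
    next_q = np.array([2.0, 5.0])
    done = np.array([0.0, 1.0])
    print(td_target(rewards, next_q, done, gamma=0.5))
    print(soft_update([np.zeros(2)], [np.ones(2)], tau=0.1)[0])
    print(shaped_reward(10.0, 8.0, intercepted=False))
```

The per-step progress term is only one possible way to densify a sparse terminal reward; a swarm-level reward, as the abstract describes, would typically also include cooperative terms that this sketch omits.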