Journal of Systems Engineering and Electronics, 2021, Vol. 32, Issue 6: 1490-1508. doi: 10.23919/JSEE.2021.000126

• CONTROL THEORY AND APPLICATION •

A learning-based flexible autonomous motion control method for UAV in dynamic unknown environments

Kaifang WAN*, Bo LI, Xiaoguang GAO, Zijian HU, Zhipeng YANG

  1. School of Electronics and Information, Northwestern Polytechnical University, Xi’an 710072, China
  • Received: 2020-12-02 Accepted: 2021-11-09 Online: 2022-01-05 Published: 2022-01-05
  • Contact: Kaifang WAN. E-mail: wankaifang@nwpu.edu.cn; Libo803@nwpu.edu.cn; cxg2012@nwpu.edu.cn; huzijian@mail.nwpu.edu.cn; yzp@mail.nwpu.edu.cn
  • About author:
    WAN Kaifang was born in 1987. He received his B.E. degree in detection homing and control technology from Northwestern Polytechnical University (NWPU), Xi’an, in 2010, and his Ph.D. degree in system engineering from NWPU in 2016. He is now an assistant researcher at the Key Laboratory of Aerospace Information Perception and Photoelectric Control of the Ministry of Education, NWPU. His current research interests include sensor management application, multi-agent theory, approximate dynamic programming, and reinforcement learning theory. E-mail: wankaifang@nwpu.edu.cn
    LI Bo was born in 1978. He received his B.S. degree in electronic information technology and his M.S. and Ph.D. degrees in systems engineering from Northwestern Polytechnical University (NWPU), Xi’an, in 2000, 2003, and 2008, respectively. He was a postdoctoral fellow with NWPU from 2008 to 2010. He is currently an associate professor with the School of Electronics and Information, NWPU. His current research interests include intelligent command and control, deep reinforcement learning, and uncertain information processing. E-mail: Libo803@nwpu.edu.cn
    GAO Xiaoguang was born in 1957. She received her B.E. degree in detection homing and control technology from Northwestern Polytechnical University (NWPU) in 1982, her master's degree in system engineering from NWPU in 1986, and her Ph.D. degree from NWPU in 1989. She is currently a professor and the head of the Key Laboratory of Aerospace Information Perception and Photoelectric Control of the Ministry of Education, NWPU. Her research interests are machine learning theory, Bayesian network theory, and multi-agent control application. E-mail: cxg2012@nwpu.edu.cn
    HU Zijian was born in 1996. He received his B.E. degree in detection guidance and control technology from Northwestern Polytechnical University (NWPU), Xi’an, in 2018. He is currently pursuing his Ph.D. degree in the College of Electronic and Information, NWPU. His current research interests include reinforcement learning theory and the applications of reinforcement learning in UAV control and fire control systems. E-mail: huzijian@mail.nwpu.edu.cn
    YANG Zhipeng was born in 1995. He received his B.S. degree in electrical engineering and automation from Hubei University of Technology, Wuhan, in 2016. He is currently a postgraduate student with the School of Electronics and Information, Northwestern Polytechnical University. His current research interest is intelligent maneuver decision for unmanned systems. E-mail: yzp@mail.nwpu.edu.cn
  • Supported by:
    This work was supported by the National Natural Science Foundation of China (62003267), the Natural Science Foundation of Shaanxi Province (2020JQ-220), and the Open Project of Science and Technology on Electronic Information Control Laboratory (JS20201100339).

Abstract:

This paper presents a deep reinforcement learning (DRL)-based motion control method to provide unmanned aerial vehicles (UAVs) with additional flexibility while flying across dynamic unknown environments autonomously. This method is applicable in both military and civilian fields such as penetration and rescue. The autonomous motion control problem is addressed through motion planning, action interpretation, trajectory tracking, and vehicle movement within the DRL framework. Novel DRL algorithms are presented by combining two difference-amplifying approaches with traditional DRL methods and are used for solving the motion planning problem. An improved Lyapunov guidance vector field (LGVF) method is used to handle the trajectory-tracking problem and provide guidance control commands for the UAV. In contrast to conventional motion-control approaches, the proposed methods directly map the sensor-based detections and measurements into control signals for the inner loop of the UAV, i.e., an end-to-end control. The training experiment results show that the novel DRL algorithms provide more than a 20% performance improvement over the state-of-the-art DRL algorithms. The testing experiment results demonstrate that the controller based on the novel DRL and LGVF, which is only trained once in a static environment, enables the UAV to fly autonomously in various dynamic unknown environments. Thus, the proposed technique provides strong flexibility for the controller.

Key words: autonomous motion control (AMC), deep reinforcement learning (DRL), difference amplify, reward shaping
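
The abstract names an improved Lyapunov guidance vector field (LGVF) for trajectory tracking but does not give its form. As a rough illustration only, the sketch below implements the classic, unimproved LGVF for circling a point at a standoff radius; it is not the authors' method. The function names, the standoff radius, the speed, and the Euler integration step are all assumptions chosen for the example.

```python
import math


def lgvf_velocity(x, y, standoff_radius, speed):
    """Classic LGVF: desired planar velocity at (x, y), relative to the loiter center.

    The field has constant magnitude `speed` and drives the distance r toward
    `standoff_radius`, circling counterclockwise once on the circle.
    Hypothetical helper, not taken from the paper.
    """
    r = math.hypot(x, y)
    if r == 0.0:
        # The field is undefined at the center; pick an arbitrary heading.
        return speed, 0.0
    big_r = standoff_radius
    scale = -speed / (r * (r * r + big_r * big_r))
    vx = scale * (x * (r * r - big_r * big_r) + y * (2.0 * r * big_r))
    vy = scale * (y * (r * r - big_r * big_r) - x * (2.0 * r * big_r))
    return vx, vy


def simulate(x, y, standoff_radius, speed, dt=0.05, steps=4000):
    """Follow the field with forward-Euler steps (assumed point-mass kinematics)."""
    for _ in range(steps):
        vx, vy = lgvf_velocity(x, y, standoff_radius, speed)
        x += vx * dt
        y += vy * dt
    return x, y


if __name__ == "__main__":
    final_x, final_y = simulate(300.0, 0.0, standoff_radius=100.0, speed=20.0)
    print("final distance to center:", math.hypot(final_x, final_y))
```

In this basic form the radial rate is -v(r² - R²)/(r² + R²), so a vehicle outside the circle spirals inward and one inside spirals outward until it settles on the standoff radius. A real controller would convert the desired velocity into heading-rate commands for the UAV's inner loop, which this sketch leaves out.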