
Journal of Systems Engineering and Electronics, 2022, Vol. 33, Issue (5): 1186-1194. doi: 10.23919/JSEE.2022.000114

Hybrid Q-learning for data-based optimal control of non-linear switching system

Xiaofeng LI (1, 2), Lu DONG (3), Changyin SUN (1, 2, *)

  1 School of Automation, Southeast University, Nanjing 210096, China
  2 School of Artificial Intelligence, Anhui University, Hefei 230601, China
  3 School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
  • Received:2021-01-13 Accepted:2022-07-22 Online:2022-10-27 Published:2022-10-27
  • Contact: Changyin SUN. E-mail: 230169413@seu.edu.cn; ldong90@seu.edu.cn; cysun@seu.edu.cn
  • About the authors:
    LI Xiaofeng was born in 1990. He received his B.S. and M.S. degrees in engineering from Nanjing University of Aeronautics and Astronautics, Nanjing, China, in 2012 and 2016, respectively, and his Ph.D. degree in control science and engineering from Southeast University, Nanjing, China, in 2021. He is currently a postdoctoral researcher with the School of Artificial Intelligence, Anhui University, Hefei, China. From 2018 to 2019, he was a joint Ph.D. student with the Department of Electrical, Computer, and Biomedical Engineering, University of Rhode Island, Kingston, RI, USA. His current research interests include reinforcement learning, adaptive dynamic programming, robotic systems, and optimal control. E-mail: 230169413@seu.edu.cn
    DONG Lu was born in 1990. She received her B.S. degree in physics and her Ph.D. degree in electrical engineering from Southeast University, Nanjing, China, in 2012 and 2017, respectively. She is currently an associate professor with the School of Cyber Science and Engineering, Southeast University, Nanjing, China. Her current research interests include adaptive dynamic programming, event-triggered control, nonlinear system control, and optimization. E-mail: ldong90@seu.edu.cn
    SUN Changyin was born in 1975. He received his B.S. degree in applied mathematics from the College of Mathematics, Sichuan University, Chengdu, China, in 1996, and his M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is currently a professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include intelligent control, flight control, and optimal theory. E-mail: cysun@seu.edu.cn
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018AAA0101400), the Natural Science Foundation of Jiangsu Province of China (BK20202006), and the National Natural Science Foundation of China (61921004; 62173251).

Abstract:

In this paper, the optimal control of non-linear switching systems is investigated without knowledge of the system dynamics. First, the Hamilton-Jacobi-Bellman (HJB) equation is derived for a hybrid action space, which combines the discrete switching mode with the continuous control input. Then, a novel data-based hybrid Q-learning (HQL) algorithm is proposed to find the optimal solution iteratively. In addition, a theoretical analysis is provided to show the convergence and optimality of the proposed algorithm. Finally, the algorithm is implemented with an actor-critic (AC) structure, in which two linear-in-parameter neural networks approximate the critic and the actor. Simulation results validate the effectiveness of the proposed data-driven method.
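The idea of iterating a Q-function over a hybrid (mode, input) action space using only data can be illustrated on a toy problem. The sketch below is not the authors' HQL algorithm. It assumes a discrete-time setting, a hypothetical two-mode scalar *linear* switched system (`MODES`, `Q_COST` and `R_COST` are made-up values), a value-iteration form of the recursion Q_{j+1}(x, i, u) = r(x, u) + min over (i', u') of Q_j(f_i(x, u), i', u'), and a quadratic feature basis as the linear-in-parameter approximator. The learner only queries the black-box `step` function for data and fits one weight vector per mode by least squares. The paper's actor network is replaced here by a closed-form minimization: the best `u` is computed for each mode, then the cheapest mode is selected.

```python
import random

# Hypothetical switched system (NOT the paper's benchmark): two scalar linear
# modes x_{k+1} = a_i * x_k + b_i * u_k with stage cost q*x^2 + r*u^2.
MODES = [(1.2, 1.0), (0.8, 0.5)]
Q_COST, R_COST = 1.0, 1.0


def step(x, mode, u):
    """Black-box simulator; the learner only uses its input/output data."""
    a, b = MODES[mode]
    return a * x + b * u


def features(x, u):
    """Quadratic basis, so Q_i(x, u) = h_i . phi(x, u) is linear in h_i."""
    return [x * x, x * u, u * u]


def solve3(m, v):
    """Solve the 3x3 system m h = v (Gaussian elimination, partial pivoting)."""
    aug = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, 3):
            f = aug[r][col] / aug[col][col]
            for c in range(col, 4):
                aug[r][c] -= f * aug[col][c]
    h = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        h[r] = (aug[r][3] - sum(aug[r][c] * h[c] for c in range(r + 1, 3))) / aug[r][r]
    return h


def greedy(weights, x):
    """Hybrid greedy action: best u for each mode, then the cheapest mode.

    Returns (value, mode, u). Assumes h3 > 0 so each Q_i is convex in u.
    """
    best = None
    for mode, (h1, h2, h3) in enumerate(weights):
        u = -h2 * x / (2.0 * h3)
        q = h1 * x * x + h2 * x * u + h3 * u * u
        if best is None or q < best[0]:
            best = (q, mode, u)
    return best


def hybrid_q_learning(iters=100, samples=50, seed=0):
    """Value-iteration-style Q-learning over the hybrid (mode, u) action space."""
    rng = random.Random(seed)
    # Q_0(x, i, u) = u^2 for every mode, so the initial value V_0(x) is 0.
    weights = [[0.0, 0.0, 1.0] for _ in MODES]
    for _ in range(iters):
        new_weights = []
        for mode in range(len(MODES)):
            gram = [[0.0] * 3 for _ in range(3)]
            rhs = [0.0] * 3
            for _ in range(samples):
                x, u = rng.uniform(-1, 1), rng.uniform(-1, 1)
                x_next = step(x, mode, u)
                # Bellman target: stage cost + min over next (mode, input).
                target = Q_COST * x * x + R_COST * u * u + greedy(weights, x_next)[0]
                phi = features(x, u)
                # Accumulate the least-squares normal equations.
                for r in range(3):
                    rhs[r] += phi[r] * target
                    for c in range(3):
                        gram[r][c] += phi[r] * phi[c]
            new_weights.append(solve3(gram, rhs))
        weights = new_weights
    return weights


def model_based_value(iters=100):
    """Reference: the same iteration with known (a_i, b_i), V_j(x) = p_j x^2."""
    p = 0.0
    for _ in range(iters):
        p = min(Q_COST + R_COST * p * a * a / (R_COST + p * b * b) for a, b in MODES)
    return p


if __name__ == "__main__":
    w = hybrid_q_learning()
    value, mode, u = greedy(w, 1.0)
    print(f"learned V(1) = {value:.6f}, model-based = {model_based_value():.6f}")
    print(f"greedy mode at x=1: {mode}, u = {u:.4f}")
```

Because the true Q-function of this toy problem is exactly quadratic, the fitted weights reproduce the model-based recursion p_{j+1} = min_i (q + r p a_i^2 / (r + p b_i^2)), which gives a simple check that the data-based iteration reaches the same value without ever reading `MODES` directly.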

Key words: switching system, hybrid action space, optimal control, reinforcement learning, hybrid Q-learning (HQL)