Journal of Systems Engineering and Electronics ›› 2022, Vol. 33 ›› Issue (3): 693-705.doi: 10.23919/JSEE.2022.000064

• CONTROL THEORY AND APPLICATION •

Day-ahead scheduling based on reinforcement learning with hybrid action space

Jingyu CAO1, Lu DONG2, Changyin SUN1,*

  1. School of Automation, Southeast University, Nanjing 210096, China
  2. School of Cyber Science and Engineering, Southeast University, Nanjing 211189, China
  • Received: 2021-07-19 Online: 2022-06-18 Published: 2022-06-24
  • Contact: Changyin SUN E-mail: cjy0564@seu.edu.cn; ldong90@seu.edu.cn; cysun@seu.edu.cn
  • About the authors:
    CAO Jingyu was born in 1999. She received her B.S. degree from the School of Control and Computer Engineering, North China Electric Power University, Beijing, China, in 2019. She is currently pursuing her Ph.D. degree with the School of Automation, Southeast University, Nanjing, China. Her research interests include machine learning, deep reinforcement learning, optimal control, and multi-agent cooperative control. E-mail: cjy0564@seu.edu.cn
    DONG Lu was born in 1990. She received her B.S. degree from the School of Physics and her Ph.D. degree from the School of Automation, Southeast University, Nanjing, China, in 2012 and 2017, respectively. She is currently an associate professor with the School of Cyber Science and Engineering, Southeast University, Nanjing, China. Her research interests include adaptive dynamic programming, event-triggered control, and multi-agent reinforcement learning. E-mail: ldong90@seu.edu.cn
    SUN Changyin was born in 1975. He received his B.S. degree in applied mathematics from the College of Mathematics, Sichuan University, Chengdu, China, in 1996, and his M.S. and Ph.D. degrees in electrical engineering from Southeast University, Nanjing, China, in 2001 and 2004, respectively. He is currently a professor with the School of Automation, Southeast University, Nanjing, China. His current research interests include intelligent control, flight control, and optimal theory. E-mail: cysun@seu.edu.cn
  • Supported by:
    This work was supported by the National Key R&D Program of China (2018AAA0101400), the National Natural Science Foundation of China (62173251; 61921004; U1713209), and the Natural Science Foundation of Jiangsu Province of China (BK20202006)

Abstract:

Driven by the development of the smart grid, the active distribution network (ADN) has attracted much attention due to its capability for active management. By making full use of electricity price signals for optimal scheduling, the total cost of the ADN can be reduced. However, the optimal day-ahead scheduling problem is challenging because future electricity prices are unknown. Moreover, in the ADN some schedulable variables are continuous while others are discrete, which increases the difficulty of determining the optimal scheduling scheme. In this paper, the day-ahead scheduling problem of the ADN is formulated as a Markov decision process (MDP) with a continuous-discrete hybrid action space. An algorithm based on multi-agent hybrid reinforcement learning (HRL) is then proposed to obtain the optimal scheduling scheme. The proposed algorithm adopts a centralized-training, decentralized-execution structure, and different methods are applied to determine the selection policies for continuous and discrete scheduling variables. Simulation results demonstrate the effectiveness of the algorithm.
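The abstract does not spell out the algorithm's implementation, but the core idea of a continuous-discrete hybrid action space can be illustrated with a minimal sketch. The example below is purely hypothetical (the weight matrices, dimensions, and the interpretation of actions as a switch position plus power setpoints are assumptions, not the paper's actual policy networks): a discrete head selects one of several scheduling options via a softmax, while a continuous head outputs bounded setpoints.

```python
import numpy as np

rng = np.random.default_rng(0)

def select_hybrid_action(state, w_disc, w_cont, n_discrete):
    """Illustrative hybrid action selection.

    The discrete head picks one of n_discrete options (e.g., a switch or
    tap position) by sampling from a softmax over linear logits; the
    continuous head outputs setpoints (e.g., power levels) squashed to
    [-1, 1] with tanh. Linear heads stand in for the policy networks.
    """
    logits = w_disc @ state
    probs = np.exp(logits - logits.max())      # numerically stable softmax
    probs /= probs.sum()
    discrete_action = int(rng.choice(n_discrete, p=probs))
    continuous_action = np.tanh(w_cont @ state)  # bounded setpoints
    return discrete_action, continuous_action

# Toy usage: 4-dim state, 3 discrete options, 2 continuous setpoints.
state = rng.standard_normal(4)
w_disc = rng.standard_normal((3, 4))
w_cont = rng.standard_normal((2, 4))
d, c = select_hybrid_action(state, w_disc, w_cont, 3)
```

In a multi-agent centralized-training, decentralized-execution setting, each agent would hold its own such policy and act on local observations, while a shared critic sees the joint state during training.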

Key words: day-ahead scheduling, active distribution network (ADN), reinforcement learning, hybrid action space