Journal of Systems Engineering and Electronics ›› 2022, Vol. 33 ›› Issue (3): 693-705. doi: 10.23919/JSEE.2022.000064
• CONTROL THEORY AND APPLICATION •
Jingyu CAO1, Lu DONG2, Changyin SUN1,*
Received: 2021-07-19
Online: 2022-06-18
Published: 2022-06-24
Contact: Changyin SUN
E-mail: cjy0564@seu.edu.cn; ldong90@seu.edu.cn; cysun@seu.edu.cn
Jingyu CAO, Lu DONG, Changyin SUN. Day-ahead scheduling based on reinforcement learning with hybrid action space[J]. Journal of Systems Engineering and Electronics, 2022, 33(3): 693-705.