
Journal of Systems Engineering and Electronics, 2024, Vol. 35, Issue 6: 1516-1529. doi: 10.23919/JSEE.2024.000062



Tactical reward shaping for large-scale combat by multi-agent reinforcement learning

Nanxun DUO1,*, Qinzhao WANG1, Qiang LYU2, Wei WANG3

  • 1 Department of Weapon and Control, Academy of Army Armored Forces, Beijing 100072, China
    2 Beijing South Technology Co., Ltd., Beijing 100176, China
    3 Beijing Special Vehicle Institute, Beijing 100072, China
  • Received: 2022-12-29; Online: 2024-12-18; Published: 2025-01-14
  • Contact: Nanxun DUO. E-mail: nanxunduo@outlook.com; 13641331602@163.com; rokyou@live.cn; wangwei5zjs@163.com
  • About author:
    DUO Nanxun was born in 1994. She received her B.S. degree from Northwestern Polytechnical University in 2016 and her M.S. degree from the Academy of Army Armored Forces in 2018. She is now a Ph.D. candidate at the Academy of Army Armored Forces. Her research interests are multi-agent systems and reinforcement learning. E-mail: nanxunduo@outlook.com

    WANG Qinzhao was born in 1973. He received his Ph.D. degree from the Academy of Army Armored Forces in 2010, where he is now a professor. His research interests are unmanned multi-agent systems and intelligent decision making. E-mail: 13641331602@163.com

    LYU Qiang was born in 1962. He received his Ph.D. degree from Harbin Institute of Technology in 1994. He is now a professor at Beijing South Technology. His research interests include autonomous robots and reinforcement learning. E-mail: rokyou@live.cn

    WANG Wei was born in 1989. He received his Ph.D. degree from the Academy of Army Armored Forces in 2021. He is an engineer at the Beijing Special Vehicle Institute. His research interests are electromagnetic spectrum operations and countering unmanned ground systems. E-mail: wangwei5zjs@163.com

Abstract:

Future unmanned battles urgently require intelligent combat policies, and multi-agent reinforcement learning offers a promising solution. However, because combat operations are complex and combat groups are large, this task suffers from the credit assignment problem more than other reinforcement learning tasks do. This study uses reward shaping to relieve the credit assignment problem and improve policy training for the new generation of large-scale unmanned combat operations. We first prove that multiple reward shaping functions do not change the Nash equilibrium in stochastic games, providing theoretical support for their use. Based on the characteristics of combat operations, we propose tactical reward shaping (TRS), which comprises maneuver shaping advice and threat assessment-based attack shaping advice. We then investigate through experiments how different types and combinations of shaping advice affect combat policies. The results show that TRS improves both the efficiency and the attack accuracy of combat policies, with the combination of maneuver reward shaping advice and ally-focused attack shaping advice achieving the best performance relative to the baseline strategy.

Key words: deep reinforcement learning, multi-agent reinforcement learning, multi-agent combat, unmanned battle, reward shaping
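
The abstract does not state the form of the shaping functions. One way to picture how several shaping functions can be combined without changing which policies are best is potential-based shaping, in which each advice term adds gamma * Phi(s') - Phi(s) to the environment reward. The sketch below assumes that form; the two potentials (`maneuver_potential`, `cover_potential`), the 2-D position state, and the target location are hypothetical illustrations, not the paper's TRS definitions.

```python
import math
from typing import Callable, Sequence, Tuple

Position = Tuple[float, float]          # hypothetical state: a 2-D position
Potential = Callable[[Position], float]


def maneuver_potential(pos: Position, target: Position = (10.0, 0.0)) -> float:
    """Hypothetical maneuver advice: higher potential closer to the target."""
    return -math.dist(pos, target)


def cover_potential(pos: Position) -> float:
    """Hypothetical second advice term: prefer staying near the line y = 0."""
    return -abs(pos[1])


def shaping_bonus(state: Position, next_state: Position,
                  potentials: Sequence[Potential],
                  gamma: float = 0.99, terminal: bool = False) -> float:
    """Sum of potential-based terms gamma * Phi_k(s') - Phi_k(s).

    The potential of a terminal state is taken to be zero (an assumption
    commonly made so that the bonus telescopes cleanly).
    """
    total = 0.0
    for phi in potentials:
        next_value = 0.0 if terminal else phi(next_state)
        total += gamma * next_value - phi(state)
    return total


def discounted_bonus_sum(trajectory: Sequence[Position],
                         potentials: Sequence[Potential],
                         gamma: float = 0.99) -> float:
    """Discounted sum of shaping bonuses along one episode."""
    last = len(trajectory) - 2
    total = 0.0
    for t in range(len(trajectory) - 1):
        total += gamma ** t * shaping_bonus(
            trajectory[t], trajectory[t + 1], potentials, gamma,
            terminal=(t == last))
    return total
```

Under these assumptions the discounted bonus telescopes to minus the summed potential of the start state, so two episodes that begin in the same state receive the same total bonus whatever path they take. That path independence is the intuition for why shaping of this form leaves the ranking of policies unchanged.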