中国电力 (Electric Power) ›› 2023, Vol. 56 ›› Issue (7): 85-94. DOI: 10.11930/j.issn.1004-9649.202210086

• Power Grid •

Strategy for DC Microgrid Energy Management Based on RG-DDPG

LI Jianbiao1, CHEN Jianfu1, GAO Ying2, PEI Xingyu1, WU Hongyuan1, LU Zikai2, ZHOU Shaoxiong3, ZENG Jie2   

  1. DC Power Distribution and Consumption Technology Research Centre of Guangdong Power Grid Co., Ltd., Zhuhai 519000, China;
    2. China Southern Power Grid Technology Co., Ltd., Guangzhou 510000, China;
    3. Qingke Youneng (Shenzhen) Technology Co., Ltd., Shenzhen 518000, China
  • Received: 2022-10-21; Revised: 2023-05-18; Published: 2023-07-28
  • About the authors: LI Jianbiao (1984-), male, Ph.D., senior engineer, engaged in research on DC power distribution and consumption technology and smart grid technology, E-mail: zhlijianbiao@126.com; GAO Ying (1994-), female, M.S., corresponding author, engaged in research on integrated energy and DC power distribution and consumption technology, E-mail: 1643221691@qq.com
  • Supported by:
    This work is supported by the Science and Technology Project of China Southern Power Grid Co., Ltd. (No. GDKJXM20212062).

Abstract: The randomness and intermittency of distributed energy pose great challenges to the energy management of direct current (DC) microgrids. To address this challenge, a DC microgrid energy management strategy based on the reward guidance deep deterministic policy gradient (RG-DDPG) is proposed in this paper. The strategy formulates the optimal operation of the DC microgrid as a Markov decision process and uses continuous interaction between the agent and the DC microgrid environment to adaptively learn energy management decisions, thereby realizing optimal management of DC microgrid energy. During training, a prioritized experience replay mechanism based on the temporal difference error (TD-error) is used to reduce the randomness and blindness of RG-DDPG's learning and exploration in the DC microgrid operating environment and to accelerate the convergence of the proposed strategy. Meanwhile, cumulative rewards are used across training episodes to construct an excellent episode set for DC microgrid energy management, strengthening the links of the RG-DDPG agent across training episodes and maximizing the training value of excellent episodes. Simulation results show that the proposed strategy achieves a reasonable distribution of energy within the DC microgrid. Compared with energy management strategies based on the deep Q-network (DQN) and particle swarm optimization (PSO), the proposed strategy reduces the average daily operating cost of the DC microgrid by 11.16% and 7.10%, respectively.
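For readers unfamiliar with the two training aids named in the abstract, the sketch below illustrates how a TD-error-based prioritized replay buffer and an excellent (elite) episode set are commonly implemented in DDPG-style training. In DDPG, the TD-error of a transition (s, a, r, s') is δ = r + γQ'(s', μ'(s')) − Q(s, a), and prioritized replay samples transition i with probability proportional to (|δ_i| + ε)^α. This is a minimal Python sketch of those generic mechanisms, not the authors' implementation; the names PrioritizedReplayBuffer, EliteEpisodeSet, alpha, eps, and elite_size are illustrative assumptions.

```python
# Minimal sketch (assumed, not the paper's code) of TD-error prioritized
# replay and an elite episode set, the two aids described in the abstract.
import random
from collections import namedtuple

Transition = namedtuple("Transition", "state action reward next_state done")

class PrioritizedReplayBuffer:
    """Ring buffer that samples transitions in proportion to |TD-error|."""
    def __init__(self, capacity, alpha=0.6, eps=1e-6):
        self.capacity = capacity
        self.alpha = alpha          # how strongly priorities skew sampling
        self.eps = eps              # keeps every priority strictly positive
        self.data, self.priorities = [], []
        self.pos = 0

    def push(self, transition, td_error):
        priority = (abs(td_error) + self.eps) ** self.alpha
        if len(self.data) < self.capacity:
            self.data.append(transition)
            self.priorities.append(priority)
        else:                       # overwrite the oldest entry
            self.data[self.pos] = transition
            self.priorities[self.pos] = priority
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size):
        # P(i) = p_i / sum_k p_k, with alpha already applied in push().
        total = sum(self.priorities)
        weights = [p / total for p in self.priorities]
        idx = random.choices(range(len(self.data)), weights=weights, k=batch_size)
        return [self.data[i] for i in idx], idx

    def update_priorities(self, indices, td_errors):
        # Refresh priorities after the critic recomputes TD-errors.
        for i, err in zip(indices, td_errors):
            self.priorities[i] = (abs(err) + self.eps) ** self.alpha

class EliteEpisodeSet:
    """Keeps the episodes with the highest cumulative reward."""
    def __init__(self, elite_size=10):
        self.elite_size = elite_size
        self.episodes = []          # list of (cumulative_reward, transitions)

    def consider(self, episode_transitions):
        ret = sum(t.reward for t in episode_transitions)
        self.episodes.append((ret, episode_transitions))
        self.episodes.sort(key=lambda e: e[0], reverse=True)
        del self.episodes[self.elite_size:]   # keep only the best episodes

    def replay_into(self, buffer, td_error=1.0):
        # Re-inject elite transitions with a high initial priority so the
        # agent revisits its most successful behaviour in later episodes.
        for _, transitions in self.episodes:
            for t in transitions:
                buffer.push(t, td_error)
```

Under this reading, consider() is called at the end of each training episode with that episode's cumulative reward deciding membership in the elite set, and replay_into() periodically re-inserts elite transitions with a high initial priority so prioritized sampling favours them; how often to re-inject, and the values of alpha, eps, and elite_size, are tuning choices not specified by the abstract.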

Key words: DC microgrid, energy management, RG-DDPG, prioritized experience replay, excellent episode set