Electric Power (中国电力) ›› 2025, Vol. 58 ›› Issue (2): 111-117. DOI: 10.11930/j.issn.1004-9649.202404047

• Data-Driven Security and Stability Analysis and Control of Power Systems •

Voltage Control Based on Multi-Agent Safe Deep Reinforcement Learning

Yi ZENG1, Yi ZHOU2, Jixiang LU1,3, Liangcai ZHOU2, Ningkai TANG1, Hong LI1

  1. State Grid Electric Power Research Institute (NARI Group Corporation), Nanjing 211106, China
  2. East China Branch of State Grid Corporation of China, Shanghai 200120, China
  3. State Key Laboratory of Technology and Equipment for Defense against Power System Operational Risks, Nanjing 211106, China
  • Received: 2024-04-10 Online: 2025-02-28 Published: 2025-02-25
  • About the authors: Yi ZENG (born 1999), female, master's student, engaged in research on artificial intelligence and power system automation, E-mail: eezengyi@foxmail.com
    Yi ZHOU (born 1982), male, senior engineer, engaged in research on power grid dispatching and power system automation, E-mail: joryie@163.com
    Jixiang LU (born 1973), male, corresponding author, professor-level senior engineer, master's supervisor, engaged in research on artificial intelligence and power system automation, E-mail: lujixiang@sgepri.sgcc.com.cn
    Liangcai ZHOU (born 1984), male, senior engineer, engaged in research on power grid dispatching and power system automation, E-mail: liangcaizhou@163.com
  • Supported by:
    This work is supported by the Science and Technology Project of SGCC (No. 5108-20233058A-1-1-ZN).

Abstract:

To address the voltage limit violations and fluctuations caused by the high penetration of distributed photovoltaic (PV) generation in distribution networks, a voltage control method based on multi-agent safe deep reinforcement learning is proposed. Voltage control with PV is modeled as a decentralized partially observable Markov decision process. Each agent is designed with a safety layer in its deep policy network, and its reward function uses a voltage barrier function derived from the voltage constraints of the traditional optimization model. Test results on the IEEE 33-bus system show that the proposed method generates voltage control strategies that satisfy the safety constraints under high PV penetration, and that it can be used online to assist dispatchers in real-time decision-making.

Key words: volt-var control, safe deep reinforcement learning, multi-agent
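
The abstract names two safety mechanisms, a voltage barrier function in the reward and a safety layer in the policy network, but does not give their form. The following is only a minimal sketch of how such components are often built, not the authors' formulation. The 0.95 to 1.05 p.u. band, the linear barrier shape, the reactive-power cost weight, and the function names `voltage_barrier`, `shared_reward`, and `safety_layer` are all assumptions made for illustration.

```python
import numpy as np

# Assumed safe voltage band in per unit; the abstract does not state the limits.
V_MIN, V_MAX = 0.95, 1.05


def voltage_barrier(v):
    """Hypothetical barrier: zero inside [V_MIN, V_MAX], linear in the violation outside."""
    v = np.asarray(v, dtype=float)
    below = np.maximum(V_MIN - v, 0.0)  # amount by which a bus is under the lower limit
    above = np.maximum(v - V_MAX, 0.0)  # amount by which a bus is over the upper limit
    return below + above


def shared_reward(v, q, q_weight=0.1):
    """Illustrative team reward: negative mean barrier minus a small reactive-power cost."""
    q = np.asarray(q, dtype=float)
    return -float(np.mean(voltage_barrier(v))) - q_weight * float(np.mean(q ** 2))


def safety_layer(q_raw, p_pv, s_rated):
    """Illustrative safety layer: clip each inverter's reactive-power action to the
    headroom left by its rated apparent power, sqrt(S^2 - P^2)."""
    q_raw, p_pv, s_rated = (np.asarray(x, dtype=float) for x in (q_raw, p_pv, s_rated))
    q_cap = np.sqrt(np.maximum(s_rated ** 2 - p_pv ** 2, 0.0))
    return np.clip(q_raw, -q_cap, q_cap)
```

In this sketch the barrier only shapes the learning signal, while the clipping step enforces a hard limit on every action before it reaches the grid; combining a soft penalty with a hard projection is one common way to pair the two ideas mentioned in the abstract.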

CLC Number: