Electric Power (中国电力) ›› 2024, Vol. 57 ›› Issue (3): 43-50. DOI: 10.11930/j.issn.1004-9649.202311065

• New Distribution Networks Driven by Digital Technology •

Coordinated Optimization of Active and Reactive Power of Active Distribution Network Based on Safety Reinforcement Learning

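Hao JIAO¹, Yanyan YIN², Chen WU³, Jian LIU¹, Chunlei XU³, Xian XU³, Guoqiang SUN²

  1. State Grid Jiangsu Electric Power Science Research Institute, Nanjing 211103, China
  2. College of Electrical and Power Engineering, Hohai University, Nanjing 211100, China
  3. State Grid Jiangsu Electric Power Co., Ltd., Nanjing 210024, China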
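  • Received: 2023-11-15 Online: 2024-03-28 Published: 2024-03-26
  • About the authors: Hao JIAO (b. 1991), male, engineer, works on applications of artificial intelligence in power grids. E-mail: jiaohaopeter@163.com
    Yanyan YIN (b. 1997), male, corresponding author, master's student, works on distribution network optimization. E-mail: 1109851959@qq.com
    Chen WU (b. 1989), female, senior engineer, works on power grid planning and electricity markets. E-mail: cwusgcc@gmail.com
  • Supported by:
    National Natural Science Foundation of China (No. U1966205); Science and Technology Project of State Grid Jiangsu Electric Power Co., Ltd. (No. J2023121).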

Abstract:

A safe reinforcement learning method based on an offline strategy is proposed. By training offline on a large volume of historical operating data from the distribution network, the method removes the dependence of traditional optimization methods on complete and accurate models. First, using the distribution network parameter information, an active-reactive power optimization model is formulated as a constrained Markov decision process (CMDP). Second, a new safe reinforcement learning method is designed based on primal-dual optimization; it maximizes the discounted future reward while minimizing the cost function. Finally, simulations are carried out on a distribution system. The results show that the proposed method can generate, online and from real-time observations of the distribution network, dispatching strategies that satisfy complex constraints and are economically beneficial.

Key words: active distribution network, active and reactive power coordination optimization, safety reinforcement learning
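
The abstract's idea of maximizing reward while keeping a cost function in check is the classic primal-dual (Lagrangian) treatment of a constrained problem. The following is a minimal sketch of that idea on a one-step toy problem, not the authors' algorithm. The reward and cost functions, the cost limit, the learning rates, and the names `reward`, `cost`, `COST_LIMIT`, and `primal_dual` are all assumptions made for illustration; none of them come from the paper.

```python
# Toy primal-dual (Lagrangian) iteration for a constrained problem.
# Every function and number below is hypothetical, not taken from the paper.


def reward(a: float) -> float:
    """Hypothetical economic reward, largest at a = 0.8."""
    return -(a - 0.8) ** 2


def cost(a: float) -> float:
    """Hypothetical safety cost (for example, a voltage-violation proxy)."""
    return a


COST_LIMIT = 0.5  # assumed safety budget d: we require cost(a) <= d


def primal_dual(steps: int = 2000, lr_a: float = 0.05, lr_lam: float = 0.05):
    """Alternate gradient ascent on the action and on the multiplier.

    Lagrangian: L(a, lam) = reward(a) - lam * (cost(a) - COST_LIMIT)
    """
    a, lam = 0.0, 0.0
    for _ in range(steps):
        # Primal step: move a uphill on the Lagrangian.
        # d reward / da = -2 (a - 0.8), d cost / da = 1.
        grad_a = -2.0 * (a - 0.8) - lam * 1.0
        a += lr_a * grad_a
        # Dual step: raise lam while the constraint is violated,
        # lower it otherwise, and project back onto lam >= 0.
        lam = max(0.0, lam + lr_lam * (cost(a) - COST_LIMIT))
    return a, lam


a_star, lam_star = primal_dual()
print(f"action={a_star:.4f}, multiplier={lam_star:.4f}, cost={cost(a_star):.4f}")
```

In this toy setting the unconstrained optimum a = 0.8 would violate the budget, so the multiplier grows until the action settles on the constraint boundary a = 0.5, where the optimality condition gives a multiplier of 0.6. A reinforcement learning version would replace the analytic gradient with policy-gradient estimates of the discounted reward and discounted cost.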