Electric Power (中国电力), 2023, Vol. 56, Issue 2: 68-76. DOI: 10.11930/j.issn.1004-9649.202107065

• Power Grid •

Joint Dispatch Strategy for Wind-Photovoltaic-Storage Systems Based on the Deep Deterministic Policy Gradient Algorithm

ZHANG Shuxing1, MA Chi2, YANG Zhixue3, WANG Yao1, WU Hao1, REN Zhouyang3

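  1. China Nuclear Power Technology Research Institute Co., Ltd., Shenzhen 518000, Guangdong, China;
    2. CGN New Energy Holdings Co., Ltd., Beijing 100084, China;
    3. State Key Laboratory of Power Transmission Equipment & System Security and New Technology (Chongqing University), Chongqing 400044, China
  • Received: 2021-08-10 Revised: 2022-12-16 Publication date: 2023-02-28 Release date: 2023-02-23
  • About the authors: ZHANG Shuxing (born 1985), male, master's degree, senior engineer, works on nuclear energy and new-energy generation and transmission technology, E-mail: zhangshuxing@cgnpc.com.cn; YANG Zhixue (born 1996), male, corresponding author, master's student, works on applications of deep reinforcement learning in power systems, E-mail: 1836581865@qq.com
  • Funding:
    Supported by the National Natural Science Foundation of China (51677012).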

Deep Deterministic Policy Gradient Algorithm Based Wind-Photovoltaic-Storage Hybrid System Joint Dispatch

ZHANG Shuxing1, MA Chi2, YANG Zhixue3, WANG Yao1, WU Hao1, REN Zhouyang3   

  1. China Nuclear Power Technology Research Institute Co., Ltd., Shenzhen 518000, China;
    2. CGN New Energy Holdings Co., Ltd., Beijing 100084, China;
    3. State Key Laboratory of Power Transmission Equipment & System Security and New Technology (Chongqing University), Chongqing 400044, China
  • Received: 2021-08-10 Revised: 2022-12-16 Online: 2023-02-28 Published: 2023-02-23
  • Supported by:
    This work is supported by the National Natural Science Foundation of China (No. 51677012).

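Abstract: To address the dispatch problem of combined wind-photovoltaic-storage systems, a joint dispatch model based on deep reinforcement learning is proposed. First, with the objective of minimizing the costs of plan tracking, wind and solar curtailment, and energy storage operation, a joint dispatch model is built that fully accounts for the constraints of each wind, photovoltaic, and storage station. Next, the system state variables, action variables, and reward function of this dispatch model are defined within the reinforcement learning framework, and the deep deterministic policy gradient algorithm is introduced. Its environment-interaction and policy-exploration mechanisms are used to learn the joint dispatch strategy, so that the combined system tracks its power plan while curtailment and storage charging and discharging are reduced. Finally, historical wind power, photovoltaic, and tracking-plan data from a region in northwestern China are used to train the model and carry out case studies. The results show that the proposed method adapts well to wind and solar variations in different periods and yields a dispatch strategy for the combined system under given wind and solar conditions.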

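Keywords: wind-photovoltaic-storage hybrid system, joint dispatch strategy, uncertainty, deep reinforcement learning, deep deterministic policy gradient algorithm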

Abstract: A joint dispatch model for wind-photovoltaic-storage systems based on deep reinforcement learning is proposed. First, a joint dispatch model is established that fully accounts for the constraints of the wind, photovoltaic, and storage stations, with an objective function covering dispatch-plan tracking, wind and solar curtailment, and energy storage operation costs. Then, the state variables, action variables, and reward function of the model are defined under the reinforcement learning framework, and a deep deterministic policy gradient algorithm is introduced. Its environment-interaction and policy-exploration mechanisms are used to learn the joint dispatch strategy, so that the system tracks the dispatch plan while wind and solar curtailment and energy storage charging and discharging are reduced. Finally, historical wind power, photovoltaic, and dispatch-plan data from a region in northwestern China are used to train and analyze the model. The case studies show that the proposed method adapts well to changes in wind and photovoltaic power across different periods and that a joint dispatch strategy can be obtained for given wind and photovoltaic data.

Key words: wind-photovoltaic-storage hybrid system, joint scheduling strategy, uncertainty, deep reinforcement learning, deep deterministic policy gradient algorithm
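
The abstract names three cost terms (plan tracking, curtailment, and storage operation) and a state/action/reward formulation, but it does not give the formulas. The following Python sketch shows one way such a single-step reward could be assembled. Everything in it is an assumption made for illustration: the function name `dispatch_step`, the action encoding, the weights, the storage ratings, the 15-minute step, and the lossless storage model are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class StepResult:
    delivered_mw: float   # power sent to the grid in this step
    curtailed_mw: float   # available wind + PV power that was not used
    storage_mw: float     # signed storage power, positive = discharge
    soc_mwh: float        # stored energy after the step
    reward: float         # negative weighted cost


def clamp(x: float, lo: float, hi: float) -> float:
    return min(max(x, lo), hi)


def dispatch_step(wind_avail_mw, pv_avail_mw, plan_mw, soc_mwh, action,
                  capacity_mwh=40.0, p_max_mw=10.0, dt_h=0.25,
                  w_track=1.0, w_curtail=0.5, w_storage=0.1):
    """One dispatch step of a toy wind-PV-storage environment.

    action = (wind_frac, pv_frac, storage_cmd), a hypothetical encoding:
    wind_frac and pv_frac in [0, 1] scale the available power, and
    storage_cmd in [-1, 1] scales the storage power rating.
    All default parameters are illustrative assumptions.
    """
    wind_frac = clamp(action[0], 0.0, 1.0)
    pv_frac = clamp(action[1], 0.0, 1.0)
    storage_cmd = clamp(action[2], -1.0, 1.0)

    wind_mw = wind_frac * wind_avail_mw
    pv_mw = pv_frac * pv_avail_mw

    # Storage limits: power rating plus the energy that can still be
    # discharged or absorbed within one step (losses are ignored here).
    max_discharge = min(p_max_mw, soc_mwh / dt_h)
    max_charge = min(p_max_mw, (capacity_mwh - soc_mwh) / dt_h)
    storage_mw = clamp(storage_cmd * p_max_mw, -max_charge, max_discharge)
    new_soc = soc_mwh - storage_mw * dt_h

    delivered = wind_mw + pv_mw + storage_mw
    curtailed = (wind_avail_mw - wind_mw) + (pv_avail_mw - pv_mw)

    # Weighted sum of the three cost terms named in the abstract.
    cost = (w_track * abs(delivered - plan_mw)
            + w_curtail * curtailed
            + w_storage * abs(storage_mw))
    return StepResult(delivered, curtailed, storage_mw, new_soc, -cost)


if __name__ == "__main__":
    # 50 MW available against a 45 MW plan: charge the surplus 5 MW.
    print(dispatch_step(30.0, 20.0, 45.0, 20.0, (1.0, 1.0, -0.5)))
```

In this toy setting, absorbing the surplus in storage costs less than curtailing it, because the assumed storage weight is smaller than the assumed curtailment weight. A DDPG agent would learn that kind of trade-off from the reward signal; the actual weights and constraints used by the authors may differ.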