Electric Power ›› 2021, Vol. 54 ›› Issue (11): 47-58. DOI: 10.11930/j.issn.1004-9649.202103163

• Special Column: Power Market Construction and Operation Mechanism •

• About the authors: YANG Pengpeng (1996-), male, master's student, engaged in power market research, E-mail: 564544091@qq.com; WANG Beibei (1979-), female, corresponding author, Ph.D., associate professor, engaged in research on smart power utilization, demand-side management and demand response, power system operation and control, and power markets, E-mail: wangbeibei@seu.edu.cn

Three-Stage Bidding Strategy of Generation Company Based on Double Deep Q-Network under Incomplete Information Condition

YANG Pengpeng1, WANG Beibei1, XU Peng1, WANG Gaoqin2, ZHENG Yaxian2   

  1. School of Electrical Engineering, Southeast University, Nanjing 210096, China;
    2. China Electric Power Research Institute Co., Ltd., Nanjing 210003, China
  • Received: 2021-03-31 Revised: 2021-09-29 Online: 2021-11-05 Published: 2021-11-16
  • Supported by:
    This work is supported by the Science and Technology Project of SGCC (Research and Development of Clearing Technology for the Provincial Day-Ahead Spot Electricity Market Supporting Bilateral Bidding).


Abstract: In a power market with incomplete information, a generation company knows only its own information, while the bids of other market participants and the market environment may affect the market clearing result and hence the company's revenue, so its bidding strategy should take multi-dimensional market information into account. Based on deep reinforcement learning, this paper proposes a multi-agent DDQN (double deep Q-network) algorithm to simulate the three-stage bidding strategy of generation companies in the day-ahead spot market. Firstly, the elements of the Markov decision process and the action-value function of the generation company model are defined. Secondly, the framework of the company's double deep Q-network is established, and an experience replay memory together with a dynamic ε-greedy algorithm is introduced to train the neural network; the resulting decision model can bid on the basis of multi-dimensional continuous states such as the market clearing price and the load level. Finally, a PJM 5-bus test case is used to compare the revenues obtained with the DDQN and traditional Q-learning algorithms. The results show that the DDQN algorithm makes appropriate decisions in the complex market environment faced by the generation company, whereas the decision-making ability of Q-learning deteriorates as the environment grows more complex. The effectiveness and superiority of DDQN-based market strategy generation are further analyzed with respect to the selection of state variables, network generalization ability, and adaptability to larger-scale test cases.

Key words: deep reinforcement learning, bidding strategy of generation company, three-stage bidding, DDQN
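The double-Q target and the dynamic ε-greedy exploration mentioned in the abstract can be sketched as follows. This is a minimal illustrative sketch only: the tabular Q-value containers, discount factor, and decay constants are assumptions for exposition, not the authors' implementation, which uses neural networks over continuous market states.

```python
import random

GAMMA = 0.95  # assumed discount factor for illustration

def ddqn_target(reward, next_state, q_online, q_target):
    """Double-Q target: the online network selects the next action,
    the target network evaluates it, which reduces the overestimation
    bias of standard Q-learning."""
    actions = range(len(q_online[next_state]))
    best_action = max(actions, key=lambda a: q_online[next_state][a])
    return reward + GAMMA * q_target[next_state][best_action]

def dynamic_epsilon(episode, eps_start=1.0, eps_min=0.05, decay=0.99):
    """Exponentially decaying exploration rate (dynamic epsilon-greedy)."""
    return max(eps_min, eps_start * decay ** episode)

def select_action(q_online, state, episode, rng=random):
    """Epsilon-greedy choice over the discrete bidding actions:
    explore with probability epsilon, otherwise act greedily."""
    actions = range(len(q_online[state]))
    if rng.random() < dynamic_epsilon(episode):
        return rng.randrange(len(q_online[state]))
    return max(actions, key=lambda a: q_online[state][a])
```

In a full agent, transitions would be stored in an experience replay memory and sampled in mini-batches to train the online network, with the target network's weights updated periodically from the online one.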