Three-Stage Bidding Strategy of Generation Company Based on Double Deep Q-Network under Incomplete Information Condition

doi:10.11930/j.issn.1004-9649.202103163

Abstract

In a power market with incomplete information, a generation company knows only its own information, yet the bids of other market members and the market environment affect the market clearing result and therefore the company’s revenue. Its bidding strategy should therefore take multi-dimensional market information into account. Building on deep reinforcement learning, this paper proposes a framework based on the multi-agent Double Deep Q-Network (DDQN) algorithm to simulate the bidding strategy of a generation company in the spot market. First, the elements of the Markov decision process and the action-value function of the model are defined. Second, the double deep Q-network framework of the generator is established, and the ε-greedy algorithm and experience replay memory are adopted to train the neural network. The proposed model can make decisions based on multi-dimensional continuous states such as the market clearing price and the load level. Finally, a PJM 5-bus test case is used to compare the rewards obtained by DDQN and by the traditional Q-learning algorithm. The results show that the DDQN algorithm makes appropriate decisions in complex states, whereas the Q-learning algorithm performs poorly. The paper also analyzes how effective the DDQN algorithm is for generating the company’s market strategy in terms of the choice of state vector, the generalization ability of the network, and the adaptability to larger-scale test cases.

Key words: deep reinforcement learning, generator bidding strategy, three-stage quotation rules, DDQN
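
The three training ingredients named in the abstract (ε-greedy exploration, experience replay memory, and the double Q-network target) can be illustrated with a minimal sketch. It is not the authors' implementation: the function and class names, the use of plain Python lists in place of neural-network outputs, and the discount factor `gamma` are all assumptions made for illustration. The only point it carries is the standard Double DQN rule, in which the online network selects the next action and the target network evaluates it.

```python
import random
from collections import deque


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon explore at random, otherwise act greedily.

    q_values is a hypothetical list of Q-values, one per discrete bid action.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def ddqn_target(reward, next_q_online, next_q_target, gamma, done):
    """Double DQN learning target for one transition.

    The online network's Q-values choose the next action; the target
    network's Q-values score that action. This decoupling is what
    reduces the overestimation of plain Q-learning targets.
    """
    if done:
        return reward
    best = max(range(len(next_q_online)), key=lambda a: next_q_online[a])
    return reward + gamma * next_q_target[best]


class ReplayMemory:
    """Fixed-size buffer of transitions, sampled uniformly for training."""

    def __init__(self, capacity):
        # deque with maxlen silently discards the oldest transition when full
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size, rng=random):
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)
```

In a bidding setting, `state` would hold continuous quantities such as the market clearing price and load level mentioned in the abstract, and each action index would stand for one candidate three-stage offer; how the paper encodes either is not stated here, so both are left abstract.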

YANG Pengpeng, WANG Beibei, XU Peng, WANG Gaoqin, ZHENG Yaxian. Three-Stage Bidding Strategy of Generation Company Based on Double Deep Q-Network under Incomplete Information Condition[J]. Electric Power, 2021, 54(11): 47-58.


URL: https://www.electricpower.com.cn/EN/10.11930/j.issn.1004-9649.202103163

https://www.electricpower.com.cn/EN/Y2021/V54/I11/47
