中国电力 (Electric Power) ›› 2023, Vol. 56 ›› Issue (7): 146-155. DOI: 10.11930/j.issn.1004-9649.202302001

• Power Grid •

The Construction of the Professional Dictionary of Relay Protection Defect Text in a Regional Power Grid and Its Natural Language Characteristics Analysis

LIU Zhongshuo¹, ZHENG Shaoming², TAO Chang¹, LIU Yimin², CHEN Qian¹, WANG Shuhong¹, YU Yiting¹, XUE Ancheng¹

  1. State Key Laboratory of Alternate Electrical Power System with Renewable Energy Source (North China Electric Power University), Beijing 102206, China;
  2. North China Branch of State Grid Corporation of China, Beijing 100053, China
  • Received: 2023-02-01 Revised: 2023-03-13 Published: 2023-07-28
  • About the authors: LIU Zhongshuo (1996-), male, master's degree, engaged in research on the evaluation of power system secondary equipment, E-mail: lzs20122015@163.com; ZHENG Shaoming (1983-), male, senior engineer, engaged in research on relay protection operation management, E-mail: sky2001mh@163.com; XUE Ancheng (1979-), male, corresponding author, Ph.D., professor, engaged in research on model- and data-driven stability analysis of renewable energy power systems, data security, and secondary equipment evaluation, E-mail: acxue@ncepu.edu.cn
  • Supported by:
    This work is supported by the Technical Service Project of the North China Branch of State Grid Corporation of China (No. SGNC0000DKJS2100369).

Abstract: The massive volume of defect text data from relay protection devices has not been mined with the aid of a professional dictionary. As a result, it provides insufficient support for grading, diagnosing, and eliminating relay protection defects, and cannot meet the needs of efficient operation and maintenance. Using the relay protection defect data of a regional power grid, a method for constructing a professional dictionary for relay protection device defects is proposed, and the corresponding dictionary is built. Firstly, relevant defect logs and management protocols are aggregated to form a defect text corpus. Secondly, a regular-expression-based stop word identification method is applied to remove irrelevant characters and words from the defect text. Then, a combined machine and manual approach is used to build a word segmentation dictionary for the defect text, and latent semantic analysis together with decision tree classification is used to merge synonyms. Next, the stop word list, the word segmentation dictionary, and the synonym list are integrated into a professional dictionary of protection device defects for the regional power grid. Finally, the Zipf distribution of the professional vocabulary and the information entropy of the corpus are compared before and after applying the dictionary, which verifies the effectiveness of the constructed dictionary.

Key words: relay protection, corpus, defect record, text mining, professional dictionary
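
The pipeline summarized in the abstract can be sketched roughly as follows. This is a minimal illustration in Python under stated assumptions, not the authors' implementation: the stop-word pattern, the segmentation dictionary, the synonym table, and the sample records are hypothetical toy data; forward maximum matching stands in for whatever segmenter the paper uses; the latent semantic analysis and decision tree steps that would produce the synonym table are not reproduced; and the "before dictionary" case is approximated by character-level tokens.

```python
import math
import re
from collections import Counter

# All resources below are hypothetical toy data, not the paper's actual lists.
STOP_PATTERN = re.compile(r"\d{4}-\d{2}-\d{2}|[\s,。;、]+")  # dates, whitespace, punctuation
SEG_DICT = {"保护装置", "通道异常", "告警", "报警", "重启", "恢复"}
SYNONYMS = {"报警": "告警"}  # variant term -> canonical term


def strip_stop_words(text):
    """Delete every span matched by the stop-word regular expression."""
    return STOP_PATTERN.sub("", text)


def segment(text, vocab):
    """Forward maximum matching; unmatched characters become one-character tokens."""
    max_len = max((len(w) for w in vocab), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def merge_synonyms(tokens, synonyms):
    """Replace each token by its canonical form when one is listed."""
    return [synonyms.get(t, t) for t in tokens]


def tokenize_corpus(records, vocab, synonyms):
    """Apply stop-word removal, segmentation, and synonym merging to every record."""
    tokens = []
    for record in records:
        cleaned = strip_stop_words(record)
        tokens.extend(merge_synonyms(segment(cleaned, vocab), synonyms))
    return tokens


def entropy_bits(tokens):
    """Shannon entropy (in bits) of the unigram token distribution."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return -sum(c / total * math.log2(c / total) for c in counts.values())


def zipf_slope(tokens):
    """Least-squares slope of log(frequency) against log(rank); needs >= 2 token types."""
    freqs = sorted(Counter(tokens).values(), reverse=True)
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return cov / var


RECORDS = [
    "2023-01-05 保护装置通道异常告警",
    "保护装置通道异常报警,重启后恢复",
]

before = tokenize_corpus(RECORDS, set(), {})          # character-level baseline
after = tokenize_corpus(RECORDS, SEG_DICT, SYNONYMS)  # with the toy dictionary

print("entropy before:", round(entropy_bits(before), 4))
print("entropy after: ", round(entropy_bits(after), 4))
print("Zipf slope after:", round(zipf_slope(after), 4))
```

On a real corpus, the same two statistics would be computed once with a generic segmenter and once with the constructed dictionary, and the rank-frequency slope and entropy values compared between the two runs.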