Journal of Shanghai University (Natural Science Edition) ›› 2014, Vol. 20 ›› Issue (2): 190-198. doi: 10.3969/j.issn.1007-2861.2013.07.003

• Computer Engineering and Science •

Document Clustering Method Based on Association Link Network

HE Xiang, LUO Xiang-feng

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  • Online: 2014-04-26 Published: 2014-04-26
  • Corresponding author: LUO Xiang-feng (b. 1970), male, research fellow, Ph.D.; research interests include massive web information processing, cognitive informatics, and artificial intelligence. E-mail: luoxf@shu.edu.cn
  • Supported by: National Natural Science Foundation of China (61071110)

Abstract: This paper proposes a document clustering method with adaptive splitting based on the association link network. Rather than requiring the number of clusters to be specified in advance, as traditional document clustering algorithms do, the method detects the community structures in the association link network and takes each community as a category of the document set. In addition, because high-dimensional, sparse word vectors yield low similarity between documents and between documents and clusters, the word-to-word association relations in the association link network are mapped onto the relations between documents and clusters, which strengthens those relations. Experimental comparison with other major clustering methods shows that the proposed method not only clusters document sets accurately, but also determines the number of cluster centers fairly accurately and identifies the topic information in the document set.

Key words: association link network, community detection, document clustering
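
The two ideas in the abstract (taking communities of a word network as clusters, and scoring a document against a cluster through word-to-word associations rather than exact word overlap) can be illustrated with a minimal Python sketch. It is not the authors' algorithm: co-occurrence counts stand in for the association link network, connected components over strong edges stand in for community detection, and the function names, the `min_weight` threshold, and the toy documents are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import combinations


def build_word_network(docs):
    """Weight each word pair by the number of documents containing both words.

    Assumption: plain co-occurrence counts replace the paper's association weights.
    """
    weights = defaultdict(int)
    for tokens in docs:
        for a, b in combinations(sorted(set(tokens)), 2):
            weights[(a, b)] += 1  # keys are ordered pairs with a < b
    return weights


def detect_communities(weights, min_weight=2):
    """Return connected components over edges with weight >= min_weight.

    Assumption: this is a simple stand-in for a real community detection method.
    The number of communities is not given in advance; it falls out of the graph.
    """
    neighbors = defaultdict(set)
    for (a, b), w in weights.items():
        if w >= min_weight:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, communities = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            stack.extend(neighbors[node] - component)
        seen |= component
        communities.append(component)
    return communities


def assign_documents(docs, communities, weights):
    """Label each document with the community it is most strongly associated with."""
    def assoc(a, b):
        if a == b:
            return 1  # direct overlap counts as a unit association (sketch assumption)
        return weights.get((a, b) if a < b else (b, a), 0)

    labels = []
    for tokens in docs:
        scores = [
            sum(assoc(w, c) for w in set(tokens) for c in community)
            for community in communities
        ]
        labels.append(max(range(len(scores)), key=scores.__getitem__))
    return labels


# Toy corpus (hypothetical); the last document shares no word with either community.
docs = [
    ["graph", "community", "network"],
    ["graph", "community", "cluster"],
    ["protein", "gene", "cell"],
    ["protein", "gene", "network"],
    ["cluster", "network"],
]
weights = build_word_network(docs)
communities = detect_communities(weights)
print([sorted(c) for c in communities])  # [['community', 'graph'], ['gene', 'protein']]
print(assign_documents(docs, communities, weights))  # [0, 0, 1, 1, 0]
```

In this toy run the last document contains neither "community" nor "graph", so a pure word-overlap score against the first cluster would be zero; it is still assigned there because "cluster" and "network" co-occur with those words elsewhere in the corpus. That is the effect the abstract attributes to mapping word associations onto document-to-cluster relations.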
