上海大学学报(自然科学版)

• 计算机工程与科学 •

用主成份提取进行数据库聚类预处理

徐俊,夏骄雄,李青   

  1. 上海大学 计算机工程与科学学院,上海 200072
  • 收稿日期:2007-04-07 出版日期:2007-12-20 发布日期:2007-12-20
  • 通讯作者: 徐俊

Database Cluster Preprocessing with Principal Component Extraction

XU Jun, XIA Jiao-xiong, LI Qing

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200072, China
  • Received: 2007-04-07 Online: 2007-12-20 Published: 2007-12-20
  • Contact: XU Jun

摘要:

按照相关性最小原则提出数据库主成份提取的聚类预处理方法(DCPPCE)对高维数据进行降维,以数据对象变异最大方向的投影作为特定数据对象集的主成份,实现分层次主成份聚类提取.用DCPPCE方法验证主成份对于原有信息全面覆盖的特性,同步解决了综合变量覆盖和降维问题,降低了数据对象集合的相异度和维度,实现了数据对象集合的聚类归约.将聚类分析引入高校数据资源的预处理环节,给出应用实例,为深入探索相关模式提供有效的分析方法.

关键词: 聚类预处理, 数据库主成份提取, 数据资源, 主成份分析

Abstract:

Following the principle of minimum correlation among data objects, a database cluster preprocessing method with principal component extraction (DCP-PCE) is proposed to reduce the dimensionality of high-dimensional data. The projection of the data objects onto their direction of greatest variation is taken as the principal component of a given data object set, and principal components are extracted by clustering in a hierarchical manner. DCP-PCE is used to verify that the principal components fully cover the original information. The method addresses comprehensive variable coverage and dimension reduction at the same time, lowers the dissimilarity and dimension of the data object sets, and achieves clustering reduction of those sets. Cluster analysis is introduced into the preprocessing of data resources in colleges and universities, and an application example is given, providing an effective analysis method for further exploration of related patterns.

Key words: cluster preprocessing, data resource, database principal component extraction, principal component analysis
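
The abstract defines a principal component as the projection of the data objects onto their direction of greatest variation. The sketch below illustrates only that idea with ordinary principal component analysis, computed by power iteration in plain Python; it is not the DCP-PCE procedure itself, which this abstract does not specify. The function names, the `records` values, and the iteration count are all assumptions made up for illustration.

```python
import math

def center(rows):
    """Subtract the per-column mean from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix (d x d) of already-centered rows."""
    n, d = len(centered), len(centered[0])
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def first_component(cov, iters=500):
    """Power iteration: unit vector along the direction of largest variance."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # degenerate case: no variance at all
            break
        v = [x / norm for x in w]
    return v

def project(centered, direction):
    """Score of each centered row along the given direction."""
    return [sum(a * b for a, b in zip(r, direction)) for r in centered]

# Hypothetical records with three attributes; the first two are strongly
# correlated, so a single component should carry almost all of the variance.
records = [[2.0, 4.1, 1.0], [4.0, 7.9, 1.1], [6.0, 12.2, 0.9], [8.0, 15.8, 1.0]]

centered = center(records)
cov = covariance(centered)
pc1 = first_component(cov)
scores = project(centered, pc1)

# Share of the total variance retained by the one-dimensional projection.
total_variance = sum(cov[i][i] for i in range(len(cov)))
explained = sum(s * s for s in scores) / (len(scores) - 1) / total_variance
```

Here `scores` is a one-dimensional stand-in for the three original attributes, and `explained` indicates how much of the original variation that reduced representation keeps. A clustering step would then operate on `scores` instead of the full records; how DCP-PCE layers its hierarchical extraction on top of this is left to the full paper.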