KNN-based even sampling preprocessing algorithm for big dataset

doi:10.3969/j.issn.1007-2861.2015.04.020

Abstract

To address the low efficiency and high storage overhead of density-based clustering algorithms, this paper proposes an even data sampling algorithm based on K nearest neighbors (KNN) as a data preprocessing step for clustering applications. The algorithm slices the dataset and draws samples evenly across it. After slicing, it visits samples in descending order of density and, for a subset of them, removes each sample's K nearest neighbors. The samples that remain form the sample dataset. Experimental results show that, as the data size grows and with clustering accuracy maintained, the sampling algorithm effectively improves clustering efficiency by reducing the amount of data that must be clustered.

Key words: K nearest neighbors (KNN), clustering, density descending order, spatial even sampling

JI Chengheng, LEI Yongmei. KNN-based even sampling preprocessing algorithm for big dataset[J]. Journal of Shanghai University（Natural Science Edition）, 2016, 22(1): 28-35.
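
The abstract gives only an outline of the sampling procedure, so the following is a minimal sketch of one possible reading, not the authors' implementation. Several choices here are assumptions not stated in the source: the function names `knn_even_sample` and `_thin_slice`, slicing into equal-width strips along the first coordinate, estimating density as the inverse of the distance to the K-th nearest neighbor, and never removing a sample that has already been kept.

```python
import math


def knn_even_sample(points, k=3, n_slices=4):
    """Thin a list of coordinate tuples slice by slice (illustrative sketch)."""
    if not points:
        return []
    xs = [p[0] for p in points]
    lo, hi = min(xs), max(xs)
    # Assumption: slices are equal-width strips along the first coordinate.
    width = (hi - lo) / n_slices or 1.0
    slices = [[] for _ in range(n_slices)]
    for p in points:
        idx = min(int((p[0] - lo) / width), n_slices - 1)
        slices[idx].append(p)

    sample = []
    for chunk in slices:
        sample.extend(_thin_slice(chunk, k))
    return sample


def _thin_slice(chunk, k):
    """Keep dense samples first and drop their K nearest neighbors."""
    n = len(chunk)
    if n <= 1:
        return list(chunk)
    k = min(k, n - 1)

    neighbors, density = [], []
    for i in range(n):
        # Brute-force neighbor search; fine for a sketch, O(n^2 log n) per slice.
        order = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(chunk[i], chunk[j]),
        )
        knn = order[:k]
        neighbors.append(knn)
        # Assumption: density is the inverse distance to the K-th neighbor.
        kth = math.dist(chunk[i], chunk[knn[-1]])
        density.append(1.0 / (kth + 1e-12))

    kept = [False] * n
    removed = [False] * n
    # Visit samples in descending order of density.
    for i in sorted(range(n), key=lambda i: density[i], reverse=True):
        if removed[i]:
            continue  # already dropped as a neighbor of a denser sample
        kept[i] = True
        for j in neighbors[i]:
            if not kept[j]:
                removed[j] = True
    return [chunk[i] for i in range(n) if kept[i]]


if __name__ == "__main__":
    grid = [(i * 0.1, j * 0.1) for i in range(10) for j in range(10)]
    sample = knn_even_sample(grid, k=3, n_slices=4)
    print(len(grid), "->", len(sample))
```

Because each kept sample removes up to K of its neighbors, dense regions are thinned more aggressively than sparse ones, which is the sense in which the remaining samples are spread more evenly. The reduced list can then be passed to a density-based clustering algorithm in place of the full dataset.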

