Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue 3: 440-450. doi: 10.12066/j.issn.1007-2861.2382

• Data Acquisition, Databases, and Data Processing •

Ensemble learning of polypropylene-composite aging data

WU Xing1,2,4, GAO Jin1, DING Peng3,4

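  1. School of Computer Science and Engineering, Shanghai University, Shanghai 200444, China
  2. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  3. College of Sciences, Shanghai University, Shanghai 200444, China
  4. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China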
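  • Received: 2022-03-26 Published: 2022-06-30 Released online: 2022-05-27
  • Corresponding author: WU Xing E-mail: xingwu@shu.edu.cn
  • About the author: WU Xing (1980-), male, Ph.D., professor, doctoral supervisor; research interests include multimodal data mining and machine learning. E-mail: xingwu@shu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB0704400); National Natural Science Foundation of China (52073168); Major Science and Technology Project of Yunnan Province (202102AB080019-3); Major Science and Technology Project of Yunnan Province (202002AB080001-2); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Innovation Demonstration Zone (ZJ2021-ZD-006)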

Ensemble learning of polypropylene-composite aging data

WU Xing1,2,4, GAO Jin1, DING Peng3,4

  1. School of Computer Science and Engineering, Shanghai University, Shanghai 200444, China
  2. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  3. Research Center of Nanoscience and Nanotechnology, College of Sciences, Shanghai University, Shanghai 200444, China
  4. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
  • Received: 2022-03-26 Published: 2022-06-30 Online: 2022-05-27
  • Contact: WU Xing E-mail: xingwu@shu.edu.cn

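Abstract:

Aging experiments on polypropylene composites have long cycles, and a single experiment yields few data samples, so predictions made with traditional machine-learning methods have low accuracy. To address the small sample size and low prediction accuracy of polypropylene-composite aging data, an ensemble learning prediction method based on virtual sample generation (VSG) is proposed. First, a Gaussian mixture model (GMM) virtual sample generation method is applied to the aging data to smoothly generate virtual samples that have been verified as valid. Then, the augmented data set is used to build an ensemble learning prediction model comprising the random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) algorithms. Experiments show that LightGBM and CatBoost perform best within the ensemble learning model, with mean squared errors of 0.0013 and 0.0001 on the test data, improving on the RF and XGBoost algorithms by 0.4 and 0.2, respectively. Virtual sample generation combined with ensemble learning can effectively mitigate the long experimental cycle and the small number of samples collected per experiment, and can achieve better performance than a single machine-learning algorithm.

Key words: polypropylene composites, material aging, ensemble learning, Gaussian mixture model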

Abstract:

Aging experiments on polypropylene composites take a long time, and only a limited number of samples can be collected in a single experiment. As a result, traditional machine-learning approaches achieve low prediction accuracy. To address these issues, we present an ensemble learning prediction method based on virtual sample generation (VSG). We first use a Gaussian mixture model (GMM) to generate validated virtual samples of the aging data, and then use the augmented data set to build an ensemble learning prediction model comprising the random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) algorithms. Among these, LightGBM and CatBoost perform best on the test data, with mean squared errors of 0.0013 and 0.0001, respectively, improving on the RF and XGBoost algorithms by 0.4 and 0.2, respectively. The proposed VSG and ensemble learning approach not only mitigates the long experimental times and the small number of samples acquired in a single experiment but also outperforms a single machine-learning algorithm.

Key words: polypropylene composites, material aging, ensemble learning, Gaussian mixture model
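
The GMM-based virtual sample generation step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes one-dimensional data, two mixture components, a plain expectation-maximization (EM) fit, and made-up measurement values. The function names `fit_gmm_1d` and `generate_virtual_samples` are hypothetical and chosen only for this illustration.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-((x - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def fit_gmm_1d(data, n_components=2, n_iter=100, min_var=1e-6):
    """Fit a 1-D Gaussian mixture with plain EM (illustrative only)."""
    ordered = sorted(data)
    n = len(ordered)
    # Initialise means at evenly spaced quantiles and share the overall variance.
    means = [ordered[(2 * k + 1) * n // (2 * n_components)] for k in range(n_components)]
    overall_mean = sum(data) / n
    overall_var = max(sum((x - overall_mean) ** 2 for x in data) / n, min_var)
    variances = [overall_var] * n_components
    weights = [1.0 / n_components] * n_components

    for _ in range(n_iter):
        # E-step: responsibility of each component for each data point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300
            resp.append([d / total for d in dens])
        # M-step: re-estimate weights, means, and variances.
        for k in range(n_components):
            nk = sum(r[k] for r in resp) or 1e-300
            weights[k] = nk / n
            means[k] = sum(r[k] * x for r, x in zip(resp, data)) / nk
            var_k = sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, data)) / nk
            variances[k] = max(var_k, min_var)  # floor avoids a collapsed component
    return weights, means, variances


def generate_virtual_samples(weights, means, variances, n_samples, rng):
    """Draw virtual samples: pick a component by weight, then sample from it."""
    samples = []
    for _ in range(n_samples):
        k = rng.choices(range(len(weights)), weights=weights)[0]
        samples.append(rng.gauss(means[k], math.sqrt(variances[k])))
    return samples


# Hypothetical measurements (e.g. a normalised property before and after aging).
measured = [0.95, 0.97, 0.93, 0.96, 0.94, 0.98, 0.62, 0.60, 0.65, 0.58, 0.63, 0.61]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights, means, variances = fit_gmm_1d(measured)
virtual = generate_virtual_samples(weights, means, variances, 200, rng)
augmented = measured + virtual  # small real set plus virtual samples

print("component means:", [round(m, 3) for m in means])
print("augmented set size:", len(augmented))
```

For multivariate aging data, a library implementation such as scikit-learn's `GaussianMixture` (with its `fit` and `sample` methods) would be the more likely choice. The augmented set would then be used to train the RF, XGBoost, LightGBM, and CatBoost regressors, which are then compared by mean squared error on held-out test data.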

CLC number: