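Received: 2022-03-26
Published online: 2022-05-27
Funding
National Key Research and Development Program of China (2018YFB0704400); National Natural Science Foundation of China (52073168); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Zhejiang Lab Key Research Project (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)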
Ensemble learning of polypropylene-composite aging data
Wu Xing, Gao Jin, Ding Peng. Ensemble learning of polypropylene-composite aging data[J]. Journal of Shanghai University (Natural Science Edition), 2022, 28(3): 440-450. DOI: 10.12066/j.issn.1007-2861.2382
Aging experiments on polypropylene composites take a long time, and a single experiment yields only a small number of data samples, so traditional machine-learning methods predict with low accuracy. To address the small sample size and the low prediction accuracy, we propose an ensemble-learning prediction method based on virtual sample generation (VSG). First, a Gaussian mixture model (GMM) is used to smoothly generate virtual samples from the polypropylene-composite aging data, and the generated samples are validated. The augmented data set is then used to build an ensemble-learning prediction model comprising the random forest (RF), extreme gradient boosting (XGBoost), light gradient boosting machine (LightGBM), and categorical boosting (CatBoost) algorithms. Experiments show that the LightGBM and CatBoost algorithms perform best within the ensemble-learning model, with mean square errors on the test data of 0.0013 and 0.0001, respectively; their performance exceeds that of the RF and XGBoost algorithms by 0.4 and 0.2, respectively. The proposed VSG and ensemble-learning approach mitigates the long experimental period and the small number of samples collected per experiment, and it outperforms a single machine-learning algorithm.
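The VSG step described in the abstract can be sketched roughly as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: it assumes a diagonal-covariance GMM fitted by plain expectation-maximization, and the toy rows (aging time, property retention) are made-up placeholders, not data from the study. The names `fit_diag_gmm`, `sample_virtual`, and `mse` are hypothetical. Training the RF, XGBoost, LightGBM, or CatBoost regressors on the augmented set is omitted here.

```python
import math
import random


def fit_diag_gmm(rows, k, iters=50, seed=0):
    """Fit a k-component diagonal-covariance GMM with plain EM (illustrative)."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    # Initialise means from k distinct data rows, variances from the global spread.
    means = [list(r) for r in rng.sample(rows, k)]
    gmean = [sum(r[j] for r in rows) / n for j in range(d)]
    gvar = [max(sum((r[j] - gmean[j]) ** 2 for r in rows) / n, 1e-6)
            for j in range(d)]
    variances = [list(gvar) for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: responsibility of each component for each row (log-space).
        resp = []
        for r in rows:
            logp = []
            for c in range(k):
                lp = math.log(weights[c])
                for j in range(d):
                    v = variances[c][j]
                    lp -= 0.5 * (math.log(2 * math.pi * v)
                                 + (r[j] - means[c][j]) ** 2 / v)
                logp.append(lp)
            top = max(logp)
            p = [math.exp(x - top) for x in logp]
            total = sum(p)
            resp.append([x / total for x in p])
        # M-step: re-estimate weights, means, and floored variances.
        for c in range(k):
            nc = sum(resp[i][c] for i in range(n)) + 1e-12
            weights[c] = nc / n
            for j in range(d):
                mu = sum(resp[i][c] * rows[i][j] for i in range(n)) / nc
                var = sum(resp[i][c] * (rows[i][j] - mu) ** 2
                          for i in range(n)) / nc
                means[c][j] = mu
                variances[c][j] = max(var, 1e-6)
    return weights, means, variances


def sample_virtual(gmm, n_samples, seed=1):
    """Draw virtual rows: pick a component by weight, then sample each dimension."""
    weights, means, variances = gmm
    rng = random.Random(seed)
    out = []
    for _ in range(n_samples):
        c = rng.choices(range(len(weights)), weights=weights)[0]
        out.append([rng.gauss(means[c][j], math.sqrt(variances[c][j]))
                    for j in range(len(means[c]))])
    return out


def mse(y_true, y_pred):
    """Mean square error, the metric quoted in the abstract."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy data: [aging time, property retention]; not from the paper.
real = [[float(t), 1.0 - 0.02 * t] for t in range(10)]
gmm = fit_diag_gmm(real, k=2)
virtual = sample_virtual(gmm, n_samples=200)
augmented = real + virtual  # the set a downstream regressor would be trained on
```

A diagonal covariance keeps the sketch short but ignores within-component correlation between features and target; a full-covariance GMM would usually be a better fit for regression data.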