Journal of Shanghai University (Natural Science Edition) ›› 2022, Vol. 28 ›› Issue (3): 451-462. doi: 10.12066/j.issn.1007-2861.2387

• Data Acquisition, Databases and Data Processing •

Regression modeling and multi-objective optimization for small sample scattered data

YAO Yu1, HU Tao2, FU Jianxun2, HU Shunbo3,4

  1. School of Computer Engineering & Science, Shanghai University, Shanghai 200444, China
  2. Center for Advanced Solidification Technology (CAST), State Key Laboratory of Advanced Special Steel, School of Materials Science and Engineering, Shanghai University, Shanghai 200444, China
  3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
  4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-03-18 Published: 2022-06-30 Online: 2022-05-27
  • Contact: FU Jianxun E-mail: fujianxun@shu.edu.cn
  • About the author: FU Jianxun (born 1960), male, professor, doctoral supervisor, Ph.D.; research interest: high-quality special steel. E-mail: fujianxun@shu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Key Scientific Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)

Abstract:

Regression on small-sample scattered data is challenging to model. In this study, a Gaussian process is used to model the regression: the hyperparameters of the kernel function are learned by maximum likelihood estimation, and the posterior is used to compute the regression result, i.e., the predicted mean and variance of the objective function. Combining this model with multi-objective optimization that includes the variance makes it possible to estimate the uncertainty of the design results while carrying out inverse design of materials. Experimental verification is conducted on a 1215MS non-quenched and tempered steel dataset and a three-point bending concrete dataset. The results show that, for the three-point bending concrete dataset, on average 50% of the experimental data fall within the predicted 95% confidence interval, and that the Gaussian process regression (GPR) model can measure the uncertainty of scattered small-sample data reasonably well and make reasonable predictions. For the 1215MS dataset, the elitist non-dominated sorting genetic algorithm (NSGA-Ⅱ) is applied on top of the GPR model to perform multi-objective optimization. The mechanical properties of the material and their corresponding variances are used as the optimization objectives, so that the search for optimal mechanical properties also accounts for the effect of uncertainty on the experimental results. The resulting Pareto-optimal solution set provides candidate points for the next experiment, assisting material design and preparation optimization.

Key words: small sample scattered data, Gaussian process regression, multi-objective optimization, elitist non-dominated sorting genetic algorithm (NSGA-Ⅱ)
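
The workflow summarized in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it makes several assumptions that do not come from the paper: the data are synthetic (a noisy sine curve), the kernel is a squared-exponential kernel, maximum likelihood estimation is approximated by a coarse grid search over two hyperparameters, and a plain non-dominated filter stands in for NSGA-Ⅱ. All function names (`rbf_kernel`, `fit_by_grid`, `gp_posterior`, `pareto_mask`) are hypothetical.

```python
import numpy as np

def rbf_kernel(a, b, length_scale, signal_var):
    """Squared-exponential kernel between 1-D input arrays a and b."""
    d2 = (a[:, None] - b[None, :]) ** 2
    return signal_var * np.exp(-0.5 * d2 / length_scale**2)

def log_marginal_likelihood(x, y, length_scale, noise_var, signal_var=1.0):
    """Log evidence of a zero-mean GP with Gaussian observation noise."""
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(len(x))
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    return (-0.5 * y @ alpha - np.log(np.diag(L)).sum()
            - 0.5 * len(x) * np.log(2.0 * np.pi))

def fit_by_grid(x, y, length_scales, noise_vars):
    """Crude maximum likelihood: keep the grid point with the highest evidence."""
    grid = [(ls, nv) for ls in length_scales for nv in noise_vars]
    return max(grid, key=lambda p: log_marginal_likelihood(x, y, p[0], p[1]))

def gp_posterior(x, y, x_new, length_scale, noise_var, signal_var=1.0):
    """Predictive mean and variance (noise included) at the points x_new."""
    K = rbf_kernel(x, x, length_scale, signal_var) + noise_var * np.eye(len(x))
    Ks = rbf_kernel(x, x_new, length_scale, signal_var)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))
    v = np.linalg.solve(L, Ks)
    mean = Ks.T @ alpha
    var = signal_var - (v**2).sum(axis=0) + noise_var
    return mean, var

def pareto_mask(mean, var):
    """True for candidates not dominated when maximizing mean and minimizing var."""
    mask = np.ones(len(mean), dtype=bool)
    for i in range(len(mean)):
        dominated = ((mean >= mean[i]) & (var <= var[i])
                     & ((mean > mean[i]) | (var < var[i])))
        mask[i] = not dominated.any()
    return mask

# Synthetic small, scattered sample (assumption: not the paper's data).
rng = np.random.default_rng(0)
x_train = rng.uniform(0.0, 5.0, size=25)
y_train = np.sin(x_train) + rng.normal(0.0, 0.3, size=25)
x_test = rng.uniform(0.0, 5.0, size=200)
y_test = np.sin(x_test) + rng.normal(0.0, 0.3, size=200)

# Hyperparameter selection, then posterior prediction on held-out points.
ls, nv = fit_by_grid(x_train, y_train, [0.3, 1.0, 3.0], [0.01, 0.09, 0.5])
mean, var = gp_posterior(x_train, y_train, x_test, ls, nv)

# Fraction of held-out observations inside the 95% predictive interval.
inside = np.abs(y_test - mean) <= 1.96 * np.sqrt(var)
coverage = inside.mean()

# Candidate designs scored by (predicted mean, predicted variance).
candidates = np.linspace(0.0, 5.0, 101)
c_mean, c_var = gp_posterior(x_train, y_train, candidates, ls, nv)
front = candidates[pareto_mask(c_mean, c_var)]

print(f"length_scale={ls}, noise_var={nv}, coverage={coverage:.2f}")
print(f"{len(front)} non-dominated candidates")
```

The exhaustive filter is only practical here because the candidate set is a small one-dimensional grid. For a continuous, multi-dimensional design space, an evolutionary search such as NSGA-Ⅱ would take its place, with the same two quantities (predicted property and predicted variance) as objectives.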

CLC Number: