Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue (3): 451-462. doi: 10.12066/j.issn.1007-2861.2387

• Data Collection, Database and Data Processing •

Regression modeling and multi-objective optimization for small sample scattered data

YAO Yu1, HU Tao2, FU Jianxun2, HU Shunbo3,4

    1. School of Computer Engineering & Science, Shanghai University, Shanghai 200444, China
    2. Center for Advanced Solidification Technology (CAST), State Key Laboratory of Advanced Special Steel, School of Materials Science and Engineering, Shanghai University, Shanghai 200444, China
    3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-03-18  Online: 2022-06-30  Published: 2022-05-27
  • Contact: FU Jianxun  E-mail: fujianxun@shu.edu.cn

Abstract:

Building regression models from small-sample, scattered data is challenging. In this study, Gaussian process regression (GPR) is used to build the regression model, and the hyperparameters of the kernel function are learned by maximum likelihood estimation. The posterior distribution then gives the predicted mean and variance of the objective function. By including the predicted variance as an objective in multi-objective optimization, the uncertainty of inverse material design can be estimated. The approach is validated on two datasets: 1215MS non-quenched and tempered steel, and three-point bending tests of concrete. For the three-point bending concrete dataset, 50% of the experimental data fall within the 95% confidence interval of the prediction, showing that the GPR model can quantify the uncertainty of scattered small-sample data more effectively and yield reasonable predictions. For the 1215MS dataset, the elitist non-dominated sorting genetic algorithm (NSGA-II) is applied to the GPR model to perform multi-objective optimization. The mechanical properties of the material and their predicted variances are taken as the optimization objectives, so that the mechanical properties are optimized while the effect of uncertainty on the experimental results is taken into account. The resulting Pareto-optimal solution set serves as candidate points for the next round of experiments, assisting material design and the optimization of the preparation process.
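The GPR steps summarized in the abstract (kernel hyperparameters chosen by maximizing the marginal likelihood, posterior mean and variance, and a check of how many observations fall inside the 95% interval) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not name the kernel or the optimizer, so the sketch assumes a squared-exponential (RBF) kernel plus a white-noise term, assumes standardized inputs, and replaces gradient-based maximum likelihood with a coarse grid search. All function names are hypothetical.

```python
import math
from itertools import product

# Hypothetical helpers; the RBF kernel, noise term and grid search are assumptions.

def rbf(a, b, length_scale, signal_var):
    """Squared-exponential (RBF) kernel between two feature vectors."""
    sq = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return signal_var * math.exp(-0.5 * sq / length_scale ** 2)

def cholesky(A):
    """Lower-triangular L with A = L L^T (A symmetric positive definite)."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                d = A[i][i] - s
                if d <= 0.0:
                    raise ValueError("matrix is not positive definite")
                L[i][j] = math.sqrt(d)
            else:
                L[i][j] = (A[i][j] - s) / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L z = b."""
    z = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z

def solve_upper_t(L, z):
    """Back substitution: solve L^T x = z."""
    n = len(z)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (z[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def log_marginal_likelihood(X, y, ls, sf2, sn2):
    """log p(y | X) of a zero-mean GP; also returns L and alpha = K^-1 y."""
    n = len(X)
    K = [[rbf(X[i], X[j], ls, sf2) + (sn2 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    L = cholesky(K)
    alpha = solve_upper_t(L, solve_lower(L, y))
    lml = (-0.5 * sum(yi * ai for yi, ai in zip(y, alpha))
           - sum(math.log(L[i][i]) for i in range(n))
           - 0.5 * n * math.log(2.0 * math.pi))
    return lml, L, alpha

def fit_gpr(X, y, length_scales=(0.1, 0.3, 1.0, 3.0)):
    """Choose (ls, sf2, sn2) maximising the log marginal likelihood over a grid."""
    y_mean = sum(y) / len(y)
    yc = [v - y_mean for v in y]  # centre targets so a zero prior mean is reasonable
    var_y = max(sum(v * v for v in yc) / len(yc), 1e-12)
    best = None
    for ls, sf_s, sn_s in product(length_scales, (0.3, 1.0, 3.0), (0.01, 0.1, 0.5)):
        sf2, sn2 = sf_s * var_y, sn_s * var_y
        lml, L, alpha = log_marginal_likelihood(X, yc, ls, sf2, sn2)
        if best is None or lml > best["lml"]:
            best = {"lml": lml, "L": L, "alpha": alpha, "ls": ls, "sf2": sf2, "sn2": sn2}
    best.update(X=X, y_mean=y_mean)
    return best

def predict(model, x):
    """Posterior predictive mean and variance (observation noise included) at x."""
    ks = [rbf(xi, x, model["ls"], model["sf2"]) for xi in model["X"]]
    mean = model["y_mean"] + sum(k * a for k, a in zip(ks, model["alpha"]))
    v = solve_lower(model["L"], ks)
    var_f = max(model["sf2"] - sum(vi * vi for vi in v), 0.0)  # latent variance
    return mean, var_f + model["sn2"]

def coverage_95(model, X_test, y_test):
    """Fraction of observations inside mean +/- 1.96 * sd (the 95% interval)."""
    hits = 0
    for x, y_obs in zip(X_test, y_test):
        m, var = predict(model, x)
        if abs(y_obs - m) <= 1.96 * math.sqrt(var):
            hits += 1
    return hits / len(y_test)
```

Evaluated on held-out measurements, `coverage_95` returns the kind of quantity the abstract reports for the concrete dataset. Beyond a sketch, a gradient-based optimizer of the log marginal likelihood is the usual choice; scikit-learn's `GaussianProcessRegressor`, for example, optimizes kernel hyperparameters this way by default.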
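NSGA-II ranks candidate designs by fast non-dominated sorting and breaks ties inside a front by crowding distance. The sketch below implements only these two ranking steps, not the full genetic loop of selection, crossover and mutation. It assumes each candidate already has a GPR-predicted property mean and variance, and it assumes the property is maximized while the variance is minimized; the abstract does not state the direction of the variance objective. Function names are illustrative.

```python
from typing import Dict, List, Sequence

def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """True if a Pareto-dominates b when every objective is minimised."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def fast_non_dominated_sort(objs: List[Sequence[float]]) -> List[List[int]]:
    """Split candidate indices into fronts F1, F2, ... (F1 is the Pareto set)."""
    n = len(objs)
    dominated = [[] for _ in range(n)]  # indices that candidate p dominates
    dom_count = [0] * n                 # number of candidates dominating p
    fronts: List[List[int]] = [[]]
    for p in range(n):
        for q in range(n):
            if dominates(objs[p], objs[q]):
                dominated[p].append(q)
            elif dominates(objs[q], objs[p]):
                dom_count[p] += 1
        if dom_count[p] == 0:
            fronts[0].append(p)
    i = 0
    while fronts[i]:
        nxt = []
        for p in fronts[i]:
            for q in dominated[p]:
                dom_count[q] -= 1
                if dom_count[q] == 0:
                    nxt.append(q)
        fronts.append(nxt)
        i += 1
    return fronts[:-1]  # the loop always ends with one empty front

def crowding_distance(objs: List[Sequence[float]], front: List[int]) -> Dict[int, float]:
    """Crowding distance within one front; boundary candidates get infinity."""
    dist = {p: 0.0 for p in front}
    for m in range(len(objs[front[0]])):
        ordered = sorted(front, key=lambda p: objs[p][m])
        lo, hi = objs[ordered[0]][m], objs[ordered[-1]][m]
        dist[ordered[0]] = dist[ordered[-1]] = float("inf")
        if hi == lo:
            continue
        for k in range(1, len(ordered) - 1):
            gap = objs[ordered[k + 1]][m] - objs[ordered[k - 1]][m]
            dist[ordered[k]] += gap / (hi - lo)
    return dist
```

To use it with GPR predictions, negate the predicted property so both objectives are minimized, e.g. `objs = [(-mean, var) for mean, var in preds]`; `fast_non_dominated_sort(objs)[0]` then lists the Pareto-optimal candidates. If higher uncertainty should instead be favoured for exploration, negate the variance term as well.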

Key words: small-sample scattered data, Gaussian process regression, multi-objective optimization, elitist non-dominated sorting genetic algorithm (NSGA-II)
