Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue (3): 504-511. doi: 10.12066/j.issn.1007-2861.2389

• Machine Learning •

Two-stage ensemble learning model for predicting band gaps of composites

XU Yan1, HU Hongqing2, LIU Xi2, ZHANG Yufeng1, DING Guangtai2, ZHANG Huiran2,3,4

    1. College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai 201306, China
    2. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-04-09 Online: 2022-06-30 Published: 2022-05-27
  • Contact: XU Yan E-mail: xuyan@shiep.edu.cn

Abstract:

The band gap is an important parameter that affects the physical and chemical properties of perovskite oxide composites, such as their conductivity and photoelectric properties. Predicting the band gap via machine learning can help identify new perovskites for different applications. Herein, a two-stage ensemble learning model for predicting the band gap of perovskite oxide composites is proposed, which combines multiple individual base learners through a combination strategy. In the first stage, multiple regression learners each produce an individual test function. In the second stage, all individual base learners and some specific descriptors are aggregated into an ensemble model. A dataset of 210 ABX$_3$-type perovskites is then used to evaluate the proposed ensemble learning model. The results show that the proposed two-stage ensemble methodology can improve generalization performance. Its successful application to ABX$_3$-type perovskites indicates the effectiveness and practicality of ensemble learning in materials research.
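
The two-stage idea summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the abstract does not name the base learners, the combination strategy, or the descriptors, so the sketch uses a ridge regressor and a k-nearest-neighbour regressor as stand-in base learners, a ridge meta-learner as the second stage, column 0 as the hypothetical "specific descriptor" passed through to the second stage, and synthetic data in place of the perovskite dataset.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalised intercept."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    penalty = alpha * np.eye(Xb.shape[1])
    penalty[0, 0] = 0.0  # do not shrink the intercept
    w = np.linalg.solve(Xb.T @ Xb + penalty, Xb.T @ y)
    return lambda Z: np.hstack([np.ones((Z.shape[0], 1)), Z]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regressor: mean target of the k closest samples."""
    def predict(Z):
        dist = ((Z[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        nearest = np.argsort(dist, axis=1)[:, :k]
        return y[nearest].mean(axis=1)
    return predict

# Stand-in base learners (assumption: the abstract does not list the real ones).
BASE_LEARNERS = [fit_ridge, fit_knn]

def fit_two_stage(X, y, passthrough_cols, n_folds=5, seed=0):
    """Stage 1: base learners; stage 2: meta-learner on their outputs
    plus selected descriptor columns."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)

    # Stage 1: out-of-fold predictions, so the meta-learner never sees
    # a base prediction made on a sample that base model was trained on.
    oof = np.zeros((len(y), len(BASE_LEARNERS)))
    for held_out in folds:
        train = np.setdiff1d(np.arange(len(y)), held_out)
        for j, fit in enumerate(BASE_LEARNERS):
            oof[held_out, j] = fit(X[train], y[train])(X[held_out])

    # Refit every base learner on all training data for use at prediction time.
    base_models = [fit(X, y) for fit in BASE_LEARNERS]

    # Stage 2: aggregate base-learner outputs and the chosen descriptors.
    meta_X = np.hstack([oof, X[:, passthrough_cols]])
    meta_model = fit_ridge(meta_X, y, alpha=1e-3)

    def predict(Z):
        stage1 = np.column_stack([m(Z) for m in base_models])
        return meta_model(np.hstack([stage1, Z[:, passthrough_cols]]))
    return predict

# Synthetic regression data (NOT the perovskite band-gap dataset).
rng = np.random.default_rng(42)
X = rng.normal(size=(210, 4))
y = 1.5 * X[:, 0] + np.sin(2 * X[:, 1]) + 0.1 * rng.normal(size=210)
X_train, y_train, X_test, y_test = X[:160], y[:160], X[160:], y[160:]

model = fit_two_stage(X_train, y_train, passthrough_cols=[0])
pred = model(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"two-stage RMSE on synthetic hold-out: {rmse:.3f}")
```

Training the second stage on out-of-fold predictions is one common design choice for this kind of stacked ensemble; whether the paper uses it is not stated in the abstract.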

Key words: ensemble learning model, combination strategy, band gap prediction, perovskite oxide composite, generalization performance
