Journal of Shanghai University (Natural Science Edition), 2022, Vol. 28, Issue (3): 504-511. doi: 10.12066/j.issn.1007-2861.2389

• Machine Learning •

Two-stage ensemble learning model for predicting band gaps of composites

XU Yan1, HU Hongqing2, LIU Xi2, ZHANG Yufeng1, DING Guangtai2, ZHANG Huiran2,3,4

  1. College of Mathematics and Physics, Shanghai University of Electric Power, Shanghai 201306, China
  2. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
  4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-04-09 Online: 2022-06-30 Published: 2022-05-27
  • Contact: XU Yan E-mail: xuyan@shiep.edu.cn
  • About the author: XU Yan (born 1982), female, Ph.D.; research interests include perovskite materials and machine learning. E-mail: xuyan@shiep.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Special Project of Yunnan Province (202002AB080001-2); Major Science and Technology Special Project of Yunnan Province (202102AB080019-3); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)

Abstract:

The band gap is a key characteristic parameter of perovskite-type composite oxide materials and plays a decisive role in physical and chemical properties such as electrical conductivity and photoelectric behavior. Predicting band gaps with machine learning is therefore an important way to find perovskite-type materials suited to different applications. This paper constructs a two-stage heterogeneous ensemble learning model. In the first stage, several different base learners (regression models) each produce predictions. In the second stage, the base learners are aggregated, together with the descriptors that most strongly influence the prediction, into a single ensemble model. The model is used to predict the band gaps of 210 ABX$_3$-type perovskite composite oxides, and its performance is evaluated against several standalone machine learning algorithms and against ensembles built with other combination strategies. The results show that the two-stage ensemble is better able to learn the underlying relationships in the materials data, and that it achieves good predictive performance and strong generalization ability. Its successful application to ABX$_3$-type perovskites indicates that ensemble learning is effective and practical in materials research.

Key words: ensemble learning model, combination strategy, band gap prediction, perovskite-type composite oxide materials, generalization ability
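
The two-stage idea described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the base learners (a ridge regressor and a k-nearest-neighbour regressor written in NumPy), the synthetic 210-sample dataset, the use of out-of-fold predictions, and the hypothetical `keep_cols` argument that stands in for "descriptors with a large influence on the prediction" are all choices made here for illustration. The abstract does not specify which learners, descriptors, or combination rule the paper uses.

```python
import numpy as np

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression (intercept penalized too, for brevity)."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append intercept column
    A = Xb.T @ Xb + alpha * np.eye(Xb.shape[1])
    w = np.linalg.solve(A, Xb.T @ y)
    return lambda Xn: np.hstack([Xn, np.ones((Xn.shape[0], 1))]) @ w

def fit_knn(X, y, k=5):
    """k-nearest-neighbour regression using squared Euclidean distance."""
    def predict(Xn):
        d = ((Xn[:, None, :] - X[None, :, :]) ** 2).sum(axis=2)
        idx = np.argsort(d, axis=1)[:, :k]  # indices of the k closest samples
        return y[idx].mean(axis=1)
    return predict

def out_of_fold(fitters, X, y, n_folds=5, seed=0):
    """Stage 1: each base learner predicts only samples it was not trained on."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    oof = np.zeros((len(y), len(fitters)))
    for val in folds:
        train = np.setdiff1d(np.arange(len(y)), val)
        for j, fit in enumerate(fitters):
            oof[val, j] = fit(X[train], y[train])(X[val])
    return oof

def fit_two_stage(fitters, X, y, keep_cols):
    """Stage 2: combine base predictions with selected descriptors.

    keep_cols is a hypothetical, hand-picked list of descriptor columns.
    """
    oof = out_of_fold(fitters, X, y)
    meta = fit_ridge(np.hstack([oof, X[:, keep_cols]]), y, alpha=1e-3)
    bases = [fit(X, y) for fit in fitters]  # refit on all training data
    def predict(Xn):
        stage1 = np.column_stack([b(Xn) for b in bases])
        return meta(np.hstack([stage1, Xn[:, keep_cols]]))
    return predict

# Synthetic stand-in data: 210 samples, 6 made-up descriptors.
rng = np.random.default_rng(42)
X = rng.normal(size=(210, 6))
y = 2.0 + X[:, 0] - 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=210)
X_train, X_test, y_train, y_test = X[:168], X[168:], y[:168], y[168:]

model = fit_two_stage([fit_ridge, fit_knn], X_train, y_train, keep_cols=[0, 1])
pred = model(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print(f"two-stage RMSE on held-out synthetic data: {rmse:.3f}")
```

Training the second stage on out-of-fold predictions, rather than on predictions for samples the base learners have already seen, is one common way to keep the combiner from simply trusting whichever base learner overfits most. Whether the paper follows this scheme is not stated in the abstract.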
