Journal of Shanghai University (Natural Science Edition) ›› 2022, Vol. 28 ›› Issue (3): 399-412. doi: 10.12066/j.issn.1007-2861.2388

• Data Acquisition, Databases, and Data Processing •

Database for materials genome engineering

YUE Xichao1, FENG Yan1, LIU Jian1, YU Yeyong1, XI Kangjie2, QIAN Quan1,3,4

  1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
  2. National Supercomputing Center in Wuxi, Wuxi 214072, Jiangsu, China
  3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
  4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China
  • Received: 2022-03-30 Published: 2022-06-30 Released online: 2022-05-27
  • Contact: QIAN Quan E-mail: qqian@shu.edu.cn
  • About the author: QIAN Quan (b. 1972), male, research professor, doctoral supervisor, Ph.D.; research interests include materials informatics, machine learning, and network security. E-mail: qqian@shu.edu.cn
  • Funding:
    National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Project of Yunnan Province (202102AB080019-3); Major Science and Technology Project of Yunnan Province (202002AB080001-2); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)

Abstract:

Materials data are multi-source, heterogeneous, and high-dimensional. Collecting such diverse and complex data and establishing a dedicated database for materials genome engineering (MGE) is the foundation of data-driven development of new materials. This paper introduces the system architecture and implementation of the MGE database platform, together with its deployment and operation on a supercomputer. The platform builds on several core technologies: standardized representation of materials data, machine learning modeling and cross-domain model deployment, machine learning under materials data privacy protection, and the use of knowledge graphs to turn a materials database into a knowledge base. Finally, taking an anti-perovskite negative-expansion material as an example, the paper walks through the complete workflow of the platform, from data curation and machine learning modeling to inverse design and final experimental validation.

Key words: materials genome engineering, database, machine learning, knowledge graph

CLC number: