Data acquisition, databases, and data processing

Database for materials genome engineering

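  • 1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
    2. National Supercomputing Center in Wuxi, Wuxi 214072, Jiangsu, China
    3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
    4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China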
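QIAN Quan (born 1972), male, Ph.D., research professor and doctoral supervisor. His research interests include materials informatics, machine learning, and cybersecurity. E-mail: qqian@shu.edu.cn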

Received: 2022-03-30

Published online: 2022-05-27

Funding

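National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Projects of Yunnan Province (202102AB080019-3; 202002AB080001-2); Key Research Project of Zhejiang Laboratory (2021PE0AC02); Major Project of the Special Development Fund of the Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)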

Abstract

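Materials data are multi-source, heterogeneous, and high-dimensional. Collecting these diverse and complex data and building a dedicated database for materials genome engineering (MGE) is the foundation of data-driven research and development of new materials. This paper introduces the system architecture and implementation of an MGE database platform, together with its deployment and operation on a supercomputer. The platform builds on several core technologies for MGE-specific databases:

- normalized representation of materials data;
- machine learning modeling and cross-domain model deployment;
- machine learning under materials data privacy protection;
- the use of knowledge graphs to turn a materials database into a knowledge base.

Finally, the paper takes anti-perovskite negative thermal expansion materials as an example. It walks through the complete workflow on the MGE database platform, from data archiving to machine learning modeling, then inverse design, and finally experimental validation.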
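The first core technology listed above is normalized representation of multi-source, heterogeneous materials data. The Python below is a minimal sketch of what that normalization involves. It is not the platform's actual schema or code, which the abstract does not describe. It maps records from two sources onto one common record type, where each source uses its own field names and units. The source names (`lab_a`, `lab_b`), field names, unit tables and sample values are all hypothetical illustrations.

```python
from dataclasses import dataclass

# Hypothetical per-source field names, mapped onto the common schema keys.
FIELD_MAPS = {
    "lab_a": {"composition": "formula", "a": "lattice_a", "alpha": "cte"},
    "lab_b": {"formula": "formula", "lattice_constant": "lattice_a", "cte": "cte"},
}

# Conversion factors into the schema's units: angstrom for the lattice
# parameter, 1/K for the linear coefficient of thermal expansion (CTE).
LATTICE_TO_ANGSTROM = {"angstrom": 1.0, "nm": 10.0, "pm": 0.01}
CTE_TO_PER_K = {"1/K": 1.0, "ppm/K": 1e-6}


@dataclass(frozen=True)
class MaterialRecord:
    """One normalized record; this field set is illustrative only."""
    formula: str
    lattice_a_angstrom: float
    cte_per_k: float
    source: str


def normalize(raw: dict, source: str) -> MaterialRecord:
    """Rename a raw record's fields, convert its units, and tag its source."""
    field_map = FIELD_MAPS[source]
    # Keep only fields this source is known to provide, under schema names.
    renamed = {field_map[key]: value for key, value in raw.items() if key in field_map}
    missing = {"formula", "lattice_a", "cte"} - set(renamed)
    if missing:
        raise ValueError(f"{source}: missing fields {sorted(missing)}")
    lattice_value, lattice_unit = renamed["lattice_a"]
    cte_value, cte_unit = renamed["cte"]
    if lattice_unit not in LATTICE_TO_ANGSTROM:
        raise ValueError(f"{source}: unsupported lattice unit {lattice_unit!r}")
    if cte_unit not in CTE_TO_PER_K:
        raise ValueError(f"{source}: unsupported CTE unit {cte_unit!r}")
    return MaterialRecord(
        formula=renamed["formula"].strip(),
        lattice_a_angstrom=lattice_value * LATTICE_TO_ANGSTROM[lattice_unit],
        cte_per_k=cte_value * CTE_TO_PER_K[cte_unit],
        source=source,
    )


if __name__ == "__main__":
    # Placeholder samples with made-up values, not measured data.
    raw_a = {"composition": " sample-1 ", "a": (3.9, "angstrom"), "alpha": (-20.0, "ppm/K")}
    raw_b = {"formula": "sample-2", "lattice_constant": (0.39, "nm"), "cte": (1.5e-5, "1/K")}
    records = [normalize(raw_a, "lab_a"), normalize(raw_b, "lab_b")]
    # Once units agree, a single query (negative CTE) works across both sources.
    print([r.formula for r in records if r.cte_per_k < 0])
```

The field maps and unit tables are kept as data rather than code. A new source can therefore be added by registering its mapping, without changing `normalize`.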

Cite this article

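YUE Xichao, FENG Yan, LIU Jian, YU Yeyong, XI Kangjie, QIAN Quan. Database for materials genome engineering[J]. Journal of Shanghai University (Natural Science Edition), 2022, 28(3): 399-412. DOI: 10.12066/j.issn.1007-2861.2388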