Received date: 2022-03-30
Online published: 2022-05-27
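Funding
National Key Research and Development Program of China (2018YFB0704400); Major Science and Technology Project of Yunnan Province (202102AB080019-3); Major Science and Technology Project of Yunnan Province (202002AB080001-2); Key Research Project of Zhejiang Lab (2021PE0AC02); Major Project of the Special Development Fund of Shanghai Zhangjiang National Independent Innovation Demonstration Zone (ZJ2021-ZD-006)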
Database for materials genome engineering
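Yue Xichao, Feng Yan, Liu Jian, Yu Yeyong, Xi Kangjie, Qian Quan. Database for materials genome engineering[J]. Journal of Shanghai University (Natural Science Edition), 2022, 28(3): 399-412. DOI: 10.12066/j.issn.1007-2861.2388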
Materials data are multi-source, heterogeneous, and high-dimensional. Acquiring diverse and complex materials data and building a dedicated database for materials genome engineering (MGE) are the foundation of data-driven design of new materials. Herein, the materials genome database platform is introduced in terms of its system architecture, implementation, and deployment on a supercomputer. The platform rests on several core technologies: normalized representation of materials data, machine-learning modeling with cross-domain model deployment, machine learning under data privacy protection, and conversion of a materials database into a knowledge base by means of a knowledge graph. Finally, using an anti-perovskite negative thermal expansion material as an example, the complete application workflow of the platform is discussed, from data curation and machine-learning modeling through inverse design to final experimental validation.
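The abstract mentions turning a materials database into a knowledge base by means of a knowledge graph. The sketch below illustrates only the general idea of that step, mapping database rows to subject-predicate-object triples. It rests on assumptions not taken from the paper: the flat record layout (`MaterialRecord` and its field names), the predicate vocabulary (`hasFormula`, `hasStructureType`, `prop:<name>`), and the node-naming scheme are all hypothetical, and the example record is a placeholder rather than real data.

```python
from dataclasses import dataclass, field

# Hypothetical flat record layout; the platform's real schema is not described here.
@dataclass
class MaterialRecord:
    record_id: str
    formula: str
    structure_type: str
    properties: dict = field(default_factory=dict)  # property name -> value

Triple = tuple[str, str, str]

def record_to_triples(rec: MaterialRecord) -> list[Triple]:
    """Map one database row to (subject, predicate, object) triples.

    The record becomes a material node; its structure type becomes a shared
    node (so materials of the same type are linked through it), and each
    property becomes a literal-valued edge with an assumed 'prop:' prefix.
    """
    node = f"material:{rec.record_id}"
    triples = [
        (node, "hasFormula", rec.formula),
        (node, "hasStructureType", f"structure:{rec.structure_type}"),
    ]
    # Sort property names so the output order is deterministic.
    for name, value in sorted(rec.properties.items()):
        triples.append((node, f"prop:{name}", str(value)))
    return triples

def build_graph(records: list[MaterialRecord]) -> dict[str, list[tuple[str, str]]]:
    """Collect triples from all records into an adjacency list keyed by subject."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for rec in records:
        for subj, pred, obj in record_to_triples(rec):
            graph.setdefault(subj, []).append((pred, obj))
    return graph

if __name__ == "__main__":
    # Placeholder record: "Mn3AN" is the generic anti-perovskite formula
    # (A = metal site); no measured values are implied.
    demo = MaterialRecord("demo-001", "Mn3AN", "anti-perovskite",
                          {"thermal_expansion": "negative"})
    for triple in record_to_triples(demo):
        print(triple)
```

Keeping the structure type as a node rather than a literal is a deliberate choice in this sketch: it lets graph queries reach every material that shares a structure type, which a plain column value in a relational table does not expose as a relationship.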
Key words: materials genome engineering; database; machine learning; knowledge graph