Data Collection, Database and Data Processing

Database for materials genome engineering

1. School of Computer Engineering and Science, Shanghai University, Shanghai 200444, China
2. National Supercomputing Center in Wuxi, Wuxi 214072, Jiangsu, China
3. Center of Materials Informatics and Data Science, Materials Genome Institute, Shanghai University, Shanghai 200444, China
4. Zhejiang Laboratory, Hangzhou 311100, Zhejiang, China

Received: 2022-03-30

Published online: 2022-05-27

Abstract

Materials data are multi-source, heterogeneous, and high-dimensional. Acquiring diverse and complex materials data and establishing a dedicated database for materials genome engineering (MGE) are the foundation for data-driven design of new materials. Herein, the MGE database platform is introduced in terms of its system architecture, implementation, and deployment on a supercomputer. The platform is built on several core technologies: normalized representation of materials data, machine-learning modeling with cross-domain model deployment, machine learning under data privacy protection, and conversion of a materials database into a knowledge base using a knowledge graph. Finally, using an anti-perovskite negative thermal expansion material as an example, the complete application workflow of the MGE database platform is discussed in detail, from data curation through machine-learning modeling and inverse design to final experimental validation.
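One core technology named above is turning a materials database into a knowledge base via a knowledge graph. The following minimal Python sketch illustrates only the basic idea and is not the platform's actual implementation: each normalized database record is flattened into (subject, predicate, object) triples, which can then be queried as graph edges. The schema (`id`, `formula`, `structure_type`, `elements`) and the helpers `record_to_triples` and `objects_of` are hypothetical, and the record values are illustrative placeholders rather than measured data (`Mn3AN` is a generic anti-perovskite template with `A` left unspecified).

```python
from typing import Any, Dict, List, Tuple

# A triple (subject, predicate, object) is the basic unit of a knowledge graph.
Triple = Tuple[str, str, str]


def record_to_triples(record: Dict[str, Any], id_field: str = "id") -> List[Triple]:
    """Flatten one normalized materials record into knowledge-graph triples.

    Assumed (hypothetical) schema: the value of `id_field` becomes the subject,
    every other key becomes a predicate, list values produce one triple per
    element, and missing (None) values are skipped.
    """
    subject = str(record[id_field])
    triples: List[Triple] = []
    for key, value in record.items():
        if key == id_field or value is None:
            continue
        values = value if isinstance(value, list) else [value]
        for v in values:
            triples.append((subject, key, str(v)))
    return triples


def objects_of(triples: List[Triple], subject: str, predicate: str) -> List[str]:
    """Return all objects linked to `subject` by `predicate` (a one-hop query)."""
    return [o for s, p, o in triples if s == subject and p == predicate]


if __name__ == "__main__":
    # Placeholder record; field names and values are illustrative only.
    record = {
        "id": "sample_001",
        "formula": "Mn3AN",
        "structure_type": "anti-perovskite",
        "elements": ["Mn", "N"],
        "note": None,
    }
    triples = record_to_triples(record)
    for t in triples:
        print(t)
    print(objects_of(triples, "sample_001", "elements"))
```

A real knowledge base would typically map predicates to terms of a materials ontology and store the triples in a graph store rather than a Python list; the list here simply keeps the sketch self-contained.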

Cite this article

YUE Xichao, FENG Yan, LIU Jian, YU Yeyong, XI Kangjie, QIAN Quan. Database for materials genome engineering[J]. Journal of Shanghai University, 2022, 28(3): 399-412. DOI: 10.12066/j.issn.1007-2861.2388
