Abstract
In current in situ X-ray diffraction (XRD) techniques, data generation surpasses human analytical capabilities, potentially leading to the loss of insights. Existing automated techniques require human intervention and lack the performance and adaptability needed for materials exploration. Given the critical need for high-throughput automated XRD pattern analysis, we present a generalized deep learning model to classify the crystal systems and space groups of a diverse set of materials. In our approach, we generate training data that holistically represent the patterns arising from varying experimental conditions and crystal properties. We also employ an expedited learning technique to adapt our model's expertise to specific experimental conditions. In addition, we optimize the model architecture to elicit classification based on Bragg's Law, and we use evaluation data to interpret our model's decision-making. We evaluate our models on experimental data, materials unseen in training, and altered cubic crystals, where we observe state-of-the-art performance and even greater advances in space group classification.
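To make the training-data idea concrete, below is a minimal Python sketch of simulating powder XRD peak positions from Bragg's law and rendering them with randomized peak widths, intensities, background, and noise to mimic varying experimental conditions. The function names, peak-shape model, and parameter ranges are illustrative assumptions and do not reproduce the authors' pipeline.

```python
import numpy as np

WAVELENGTH = 1.5406  # Cu K-alpha wavelength in angstroms (common lab source)

def cubic_d_spacings(a, hkl_list):
    """d-spacings of a cubic lattice with parameter a for the given (h, k, l) indices."""
    hkl = np.asarray(hkl_list, dtype=float)
    return a / np.sqrt((hkl ** 2).sum(axis=1))

def bragg_two_theta(d, wavelength=WAVELENGTH):
    """Peak positions (2-theta, degrees) from Bragg's law: lambda = 2 d sin(theta)."""
    s = wavelength / (2.0 * d)
    s = s[s <= 1.0]                       # discard reflections outside the measurable range
    return 2.0 * np.degrees(np.arcsin(s))

def simulate_pattern(two_theta_peaks, grid=np.linspace(10, 90, 4501), rng=None):
    """Render peaks as Gaussians with randomized width, intensity, background and noise."""
    rng = rng or np.random.default_rng()
    width = rng.uniform(0.05, 0.4)        # instrument/size broadening
    pattern = np.zeros_like(grid)
    for peak in two_theta_peaks:
        intensity = rng.uniform(0.2, 1.0)
        pattern += intensity * np.exp(-0.5 * ((grid - peak) / width) ** 2)
    background = rng.uniform(0.0, 0.3) * np.exp(-grid / rng.uniform(20, 80))
    noise = rng.normal(0.0, 0.01, grid.shape)
    return pattern + background + noise

# Example: a face-centered-cubic crystal (a = 4.05 A, roughly aluminium)
hkl = [(1, 1, 1), (2, 0, 0), (2, 2, 0), (3, 1, 1), (2, 2, 2)]
peaks = bragg_two_theta(cubic_d_spacings(4.05, hkl))
pattern = simulate_pattern(peaks)
```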
opXRD: Open Experimental Powder X‐Ray Diffraction Database
Powder X‐ray diffraction (pXRD) experiments are a cornerstone of materials structure characterization. Despite their widespread application, analyzing pXRD diffractograms still presents a significant challenge to automation and remains a bottleneck for high‐throughput discovery in self‐driving labs. Machine learning promises to resolve this bottleneck by enabling automated powder diffraction analysis. A notable difficulty in applying machine learning to this domain is the lack of sufficiently sized experimental datasets, which has constrained researchers to train primarily on simulated data. However, models trained on simulated pXRD patterns have shown limited generalization to experimental patterns, particularly low‐quality experimental patterns with high noise levels and elevated backgrounds. With the Open Experimental Powder X‐ray Diffraction Database (opXRD), we provide an openly available and easily accessible dataset of labeled and unlabeled experimental powder diffractograms. The labeled opXRD data can be used to evaluate model performance on experimental data, while the unlabeled data can help improve that performance, for example through transfer learning methods. We collected 92,552 diffractograms, 2179 of them labeled, from a wide spectrum of material classes. We hope this ongoing effort can guide machine learning research toward fully automated analysis of pXRD data and thus enable future self‐driving materials labs.
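As an example of how labeled experimental diffractograms might be used for transfer learning, the sketch below freezes the convolutional features of a 1D-CNN assumed to be pretrained on simulated patterns and fine-tunes only the classification head on experimental data. The model class, layer sizes, and data loader are hypothetical; the snippet does not show the actual opXRD loading interface.

```python
import torch
from torch import nn, optim

# Hypothetical 1D-CNN pretrained on simulated diffractograms; any pattern classifier works here.
class PatternCNN(nn.Module):
    def __init__(self, n_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(1, 32, kernel_size=15, padding=7), nn.ReLU(), nn.MaxPool1d(4),
            nn.Conv1d(32, 64, kernel_size=15, padding=7), nn.ReLU(), nn.AdaptiveAvgPool1d(1),
        )
        self.head = nn.Linear(64, n_classes)

    def forward(self, x):                      # x: (batch, 1, n_points)
        return self.head(self.features(x).squeeze(-1))

def finetune(model, loader, epochs=5, lr=1e-4):
    """Transfer learning step: freeze features learned on simulated data and
    retrain only the classification head on labeled experimental patterns."""
    for p in model.features.parameters():
        p.requires_grad = False
    opt = optim.Adam(model.head.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for patterns, labels in loader:        # labeled experimental diffractograms
            opt.zero_grad()
            loss = loss_fn(model(patterns), labels)
            loss.backward()
            opt.step()
    return model

# Usage (assuming hypothetical pretrained weights and an experimental DataLoader exist):
# model = PatternCNN(n_classes=7)                          # e.g., seven crystal systems
# model.load_state_dict(torch.load("simulated_pretrained.pt"))
# model = finetune(model, experimental_loader)
```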
- Award ID(s):
- 2227178
- PAR ID:
- 10644529
- Author(s) / Creator(s):
- ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ; ;
- Publisher / Repository:
- Wiley
- Date Published:
- Journal Name:
- Advanced Intelligent Discovery
- ISSN:
- 2943-9981
- Format(s):
- Medium: X
- Sponsoring Org:
- National Science Foundation
More Like this
A bottleneck in high-throughput nanomaterials discovery is the pace at which new materials can be structurally characterized. Although current machine learning (ML) methods show promise for the automated processing of electron diffraction patterns (DPs), they fail in high-throughput experiments where DPs are collected from crystals with random orientations. Inspired by the human decision-making process, a framework for automated crystal system classification from DPs with arbitrary orientations was developed. A convolutional neural network was trained using evidential deep learning, and the predictive uncertainties were quantified and leveraged to fuse multiview predictions. Using vector map representations of DPs, the framework achieves a testing accuracy of 0.94 in the examples considered, is robust to noise, and retains remarkable accuracy on experimental data. This work highlights the ability of ML to accelerate experimental high-throughput materials data analytics.
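A minimal sketch of this kind of uncertainty-aware multiview fusion, assuming the Dirichlet-evidence formulation of evidential deep learning (alpha_k = e_k + 1, uncertainty u = K / S): confident views contribute more to the fused prediction. The weighting scheme and toy evidence values below are illustrative assumptions, not the paper's exact fusion rule.

```python
import numpy as np

def dirichlet_from_evidence(evidence):
    """Evidential deep learning: non-negative per-class evidence e_k parameterizes
    a Dirichlet distribution with alpha_k = e_k + 1."""
    return np.asarray(evidence, dtype=float) + 1.0

def belief_and_uncertainty(alpha):
    """Belief mass b_k = e_k / S and uncertainty u = K / S, with S = sum(alpha)."""
    S = alpha.sum()
    K = alpha.size
    return (alpha - 1.0) / S, K / S

def fuse_views(evidences):
    """Uncertainty-weighted fusion of per-view predictions: views with low
    uncertainty u contribute more to the combined class score."""
    scores = np.zeros(len(evidences[0]))
    for ev in evidences:
        alpha = dirichlet_from_evidence(ev)
        prob = alpha / alpha.sum()           # expected class probabilities
        _, u = belief_and_uncertainty(alpha)
        scores += (1.0 - u) * prob
    return scores / scores.sum()

# Three diffraction patterns of the same crystal in different orientations (toy evidence values)
views = [[9.0, 1.0, 0.2], [0.5, 0.4, 0.3], [7.0, 0.8, 0.1]]
print(fuse_views(views))                      # the uncertain second view is down-weighted
```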
Diffraction techniques can powerfully and nondestructively probe materials while maintaining high resolution in both space and time. Unfortunately, these characterizations have been limited, and sometimes even erroneous, due to the difficulty of decoding the desired material information from features of the diffractograms. Currently, these features are identified non-comprehensively via human intuition, so the resulting models can only predict a subset of the available structural information. In the present work we show (i) how to compute machine-identified features that fully summarize a diffractogram and (ii) how to employ machine learning to reliably connect these features to an expanded set of structural statistics. To exemplify this framework, we assessed virtual electron diffractograms generated from atomistic simulations of irradiated copper. When based on machine-identified features rather than human-identified features, our machine-learning model predicted not only one-point statistics (i.e., density) but also a two-point statistic (i.e., spatial distribution) of the defect population. Hence, this work demonstrates that machine-learning models that input machine-identified features significantly advance the state of the art for accurately and robustly decoding diffractograms.
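The pipeline below loosely illustrates the idea of machine-identified features: principal components summarize each diffractogram and a regressor maps them to a structural statistic. PCA, ridge regression, the array shapes, and the random placeholder data are assumptions for illustration; the study's actual feature extractor and targets may differ.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Ridge
from sklearn.pipeline import make_pipeline

# Placeholder data: rows are flattened virtual diffractograms, targets are a structural
# statistic of the defect population (e.g., defect density). Shapes are illustrative.
rng = np.random.default_rng(0)
diffractograms = rng.random((500, 2048))
defect_density = rng.random(500)

# Machine-identified features: principal components that compactly summarize each
# diffractogram, learned from the data rather than hand-picked peak descriptors.
model = make_pipeline(PCA(n_components=32), Ridge(alpha=1.0))
model.fit(diffractograms, defect_density)
predicted = model.predict(diffractograms[:5])
```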
Published research highlights the presence of demographic bias in automated facial attribute classification. The proposed bias mitigation techniques are mostly based on supervised learning, which requires a large amount of labeled training data for generalizability and scalability. However, labeled data is limited, requires laborious annotation, poses privacy risks, and can perpetuate human bias. In contrast, self-supervised learning (SSL) capitalizes on freely available unlabeled data, rendering trained models more scalable and generalizable. However, these label-free SSL models may also introduce biases by sampling false negative pairs, especially in low-data regimes (< 200K images) under low compute settings. Further, SSL-based models may suffer from performance degradation due to a lack of quality assurance of the unlabeled data sourced from the web. This paper proposes a fully self-supervised pipeline for demographically fair facial attribute classifiers. Leveraging completely unlabeled data pseudo-labeled via pre-trained encoders, diverse data curation techniques, and meta-learning-based weighted contrastive learning, our method significantly outperforms existing SSL approaches proposed for downstream image classification tasks. Extensive evaluations on the FairFace and CelebA datasets demonstrate the efficacy of our pipeline in obtaining fair performance over existing baselines, setting a new benchmark for SSL in the fairness of facial attribute classification.
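One way to realize a weighted contrastive objective like the one mentioned above is a SimCLR-style NT-Xent loss in which each sample's term is scaled by a learned weight; a minimal sketch follows. The per-sample weights are taken as inputs here, whereas in the paper they would come from the meta-learning component; everything in the snippet is an illustrative assumption.

```python
import torch
import torch.nn.functional as F

def weighted_nt_xent(z1, z2, weights, temperature=0.5):
    """SimCLR-style contrastive loss where each sample's loss term is scaled by a
    per-sample weight (e.g., produced by a meta-learner to balance demographic groups).
    z1, z2: (N, d) embeddings of two augmented views; weights: (N,) non-negative."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    z = torch.cat([z1, z2], dim=0)                        # (2N, d)
    sim = z @ z.t() / temperature                         # scaled cosine similarities
    n = z1.size(0)
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, float('-inf'))            # exclude self-similarity
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(0, n)]).to(z.device)
    per_sample = F.cross_entropy(sim, targets, reduction='none')
    w = torch.cat([weights, weights]).to(z.device)
    return (w * per_sample).sum() / w.sum()
```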
Deep generative models have enabled the automated synthesis of high-quality data for diverse applications. However, the most effective generative models are specialized to data from a single domain (e.g., images or text). Real-world applications such as healthcare require multi-modal data from multiple domains (e.g., both images and corresponding text), which are difficult to acquire due to limited availability and privacy concerns and are much harder to synthesize. To tackle this joint synthesis challenge, we propose an End-to-end MultImodal X-ray genERative model (EMIXER) for jointly synthesizing X-ray images and corresponding free-text reports, all conditioned on diagnosis labels. EMIXER is a conditional generative adversarial model that 1) generates an image based on a label, 2) encodes the image into a hidden embedding, 3) produces the corresponding text via a hierarchical decoder from the image embedding, and 4) uses a joint discriminator to assess both the image and the corresponding text. EMIXER also enables self-supervision to leverage the vast amount of unlabeled data. Extensive experiments with real X-ray report data illustrate how data augmentation using synthesized multimodal samples can improve the performance of a variety of supervised tasks, including COVID-19 X-ray classification with very limited samples. The quality of the generated images and reports is also confirmed by radiologists. We quantitatively show that EMIXER-generated synthetic datasets can augment X-ray image classification and report generation models to achieve 5.94% and 6.9% improvements over models trained only on real data samples. Taken together, our results highlight the promise of state-of-the-art generative models to advance clinical machine learning.
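For orientation, the skeleton below mirrors the four components enumerated in the abstract as PyTorch modules; all layer sizes, the flattened 64×64 image, and the single-layer GRU text decoder are placeholders rather than the published EMIXER architecture, and adversarial training details are omitted.

```python
import torch
from torch import nn

class EmixerSketch(nn.Module):
    """Structural sketch of the four components described in the abstract.
    Sub-module designs and dimensions are placeholders, not the published model."""
    def __init__(self, n_labels, z_dim=128, emb_dim=256, vocab=5000):
        super().__init__()
        self.image_generator = nn.Sequential(                # 1) label (+ noise) -> image
            nn.Linear(n_labels + z_dim, 64 * 64), nn.Tanh())
        self.image_encoder = nn.Sequential(                   # 2) image -> hidden embedding
            nn.Linear(64 * 64, emb_dim), nn.ReLU())
        self.text_decoder = nn.GRU(emb_dim, emb_dim, batch_first=True)  # 3) embedding -> report tokens
        self.token_head = nn.Linear(emb_dim, vocab)
        self.joint_discriminator = nn.Sequential(              # 4) (image, text summary) -> real/fake score
            nn.Linear(64 * 64 + emb_dim, 1))

    def forward(self, label_onehot, z, report_len=32):
        image = self.image_generator(torch.cat([label_onehot, z], dim=1))
        emb = self.image_encoder(image)
        steps = emb.unsqueeze(1).repeat(1, report_len, 1)       # feed embedding at every step
        hidden, _ = self.text_decoder(steps)
        report_logits = self.token_head(hidden)
        score = self.joint_discriminator(torch.cat([image, hidden.mean(dim=1)], dim=1))
        return image, report_logits, score
```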