Efficient Dataset Generation for Machine Learning Halide Perovskite Alloys

MCML Authors

Patrick Rinke

Prof. Dr.

Principal Investigator

Abstract

Lead-based perovskite solar cells have reached high efficiencies, but toxicity and lack of stability hinder their wide-scale adoption. These issues have been partially addressed through compositional engineering of perovskite materials, but the vast complexity of the perovskite materials space poses a significant obstacle to exploration. We previously demonstrated how machine learning (ML) can accelerate property predictions for the CsPb(Cl/Br)₃ perovskite alloy. However, the substantial computational demand of density functional theory (DFT) calculations required for model training prevents applications to more complex materials. Here, we introduce a data-efficient scheme to facilitate model training, validated initially on CsPb(Cl/Br)₃ data and extended to the ternary alloy CsSn(Cl/Br/I)₃. Our approach employs clustering to construct a compact yet diverse initial dataset of atomic structures. We then apply a two-stage active learning approach to first improve the reliability of the ML-based structure relaxations and then refine accuracy near equilibrium structures. Tests for CsPb(Cl/Br)₃ demonstrate that our scheme reduces the number of required DFT calculations during the different parts of our proposed model training method by up to 20% and 50%. The fitted model for CsSn(Cl/Br/I)₃ is robust and highly accurate, evidenced by the convergence of all ML-based structure relaxations in our tests and an average relaxation error of only 0.5 meV/atom.
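The clustering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not name the clustering algorithm or the structure descriptors, so plain k-means is used here as a stand-in, and `select_diverse_subset`, its parameters, and the toy two-dimensional descriptors are all hypothetical. The idea shown is only the general one: group candidate structures by similarity and keep one representative per group, so that a small set still covers the space.

```python
import math
import random


def _dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_diverse_subset(descriptors, n_clusters, n_iter=50, seed=0):
    """Return sorted indices of at most n_clusters representative samples.

    Hypothetical helper: runs plain k-means on the descriptor vectors and
    picks the sample closest to each final centroid. K-means is an
    assumption here, not the method stated in the paper.
    """
    rng = random.Random(seed)
    # Initialise centroids from distinct randomly chosen samples.
    centroids = [list(d) for d in rng.sample(descriptors, n_clusters)]
    for _ in range(n_iter):
        # Assign every sample to its nearest centroid.
        clusters = [[] for _ in range(n_clusters)]
        for d in descriptors:
            k = min(range(n_clusters), key=lambda j: _dist(d, centroids[j]))
            clusters[k].append(d)
        # Move each centroid to the mean of its members (unchanged if empty).
        new_centroids = [
            [sum(col) / len(members) for col in zip(*members)]
            if members else centroids[k]
            for k, members in enumerate(clusters)
        ]
        if new_centroids == centroids:
            break
        centroids = new_centroids
    # One representative per centroid; duplicates collapse in the set.
    chosen = {
        min(range(len(descriptors)), key=lambda i: _dist(descriptors[i], c))
        for c in centroids
    }
    return sorted(chosen)


# Toy descriptors (made-up numbers, three loose groups of two points each).
toy = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0], [10.0, 0.0], [10.1, 0.0]]
picked = select_diverse_subset(toy, n_clusters=3)
```

In a workflow like the one the abstract describes, only the selected structures would be passed to DFT for the initial training set, with the active learning stages adding further structures afterwards.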

article


Physical Review Materials

9, 053802. May 2025.
Top Journal

Authors

H. Homm • J. Laakso • P. Rinke

Research Area

 C3 | Physics and Geo Sciences

BibTeXKey: HLR25a
