Data imbalance in the protected attributes can lead to machine learning models that perform better on the majority than on the minority group, giving rise to unfairness issues. While baseline methods like undersampling or SMOTE can balance datasets, we investigate how methods of generative artificial intelligence compare concerning classical fairness metrics. Using generated fake data, we propose different balancing methods and investigate the behavior of classification models in thorough benchmark studies using German credit and Berkeley admission data. While our experiments suggest that such methods may improve fairness metrics, further investigations are necessary to derive clear practical recommendations.
inproceedings
BibTeXKey: RNB24