Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias

Katharina Dost; Hamish Duncanson; Ioannis Ziogas; Patricia Riddle; Jörg Wicker

Conference Proceedings

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (2022) 13281 LNAI 149-160

DOI: 10.1007/978-3-031-05936-0_12

Abstract

Machine Learning can help overcome human biases in decision making by focussing on purely logical conclusions based on the training data. If the training data is biased, however, that bias will be transferred to the model and remain undetected, because performance is validated on a test set drawn from the same biased distribution. Existing strategies for selection bias identification and mitigation generally rely on some knowledge of the bias or of the ground truth. An exception is the Imitate algorithm, which assumes no such knowledge but comes with a strong limitation: it can only model datasets with one normally distributed cluster per class. In this paper, we introduce a novel algorithm, Mimic, which uses Imitate as a building block but relaxes this limitation. By allowing mixtures of multivariate Gaussians, our technique is able to model multi-cluster datasets and provide solutions for a substantially wider set of problems. Experiments confirm that Mimic not only identifies potential biases in multi-cluster datasets, which can then be corrected early on, but also improves classifier performance.
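
The abstract describes Mimic only at a high level: each class is modelled as a mixture of multivariate Gaussians, and Imitate (which handles a single Gaussian cluster) is used as a building block, which the title summarises as "divide and imitate". The Python sketch below illustrates that divide-then-check idea under loudly stated assumptions. The function names `divide`, `underrepresented_bins` and `divide_and_check` are hypothetical. Choosing the number of clusters by BIC is an assumption, not necessarily the paper's rule. The per-cluster check is a simple histogram heuristic standing in for Imitate, not the published algorithm. It uses NumPy and scikit-learn's `GaussianMixture`.

```python
import math

import numpy as np
from sklearn.mixture import GaussianMixture


def divide(X, max_k=5, seed=0):
    """Split the samples of ONE class into Gaussian clusters.

    Assumption: the number of clusters is chosen by lowest BIC; the
    paper's own model-selection rule may differ.
    """
    best, best_bic = None, np.inf
    for k in range(1, min(max_k, len(X)) + 1):
        gmm = GaussianMixture(n_components=k, covariance_type="full",
                              random_state=seed).fit(X)
        bic = gmm.bic(X)
        if bic < best_bic:
            best, best_bic = gmm, bic
    labels = best.predict(X)
    # Keep only components that actually received points.
    return [X[labels == c] for c in range(best.n_components) if np.any(labels == c)]


def normal_cdf(x, sd):
    """CDF of a zero-mean normal distribution with standard deviation sd."""
    return 0.5 * (1.0 + math.erf(x / (sd * math.sqrt(2.0))))


def underrepresented_bins(points, n_bins=10, ratio=0.5, min_expected=5.0):
    """Hypothetical stand-in for Imitate's per-cluster check (NOT the real
    algorithm): along each principal axis of the cluster, flag histogram
    bins whose observed count is below `ratio` times the count expected
    under a fitted 1-D normal. Returns (axis, lo, hi, observed, expected)."""
    n = len(points)
    centred = points - points.mean(axis=0)
    cov = np.atleast_2d(np.cov(centred, rowvar=False))
    _, eigvecs = np.linalg.eigh(cov)  # columns are the principal axes
    flagged = []
    for axis in range(eigvecs.shape[1]):
        proj = centred @ eigvecs[:, axis]
        sd = proj.std()
        if sd == 0:
            continue
        counts, edges = np.histogram(proj, bins=n_bins, range=(-3 * sd, 3 * sd))
        for i, observed in enumerate(counts):
            lo, hi = edges[i], edges[i + 1]
            expected = n * (normal_cdf(hi, sd) - normal_cdf(lo, sd))
            if expected >= min_expected and observed < ratio * expected:
                flagged.append((axis, lo, hi, int(observed), expected))
    return flagged


def divide_and_check(X, y, **kwargs):
    """Per class: divide into clusters, then check each cluster separately."""
    report = {}
    for label in np.unique(y):
        for idx, cluster in enumerate(divide(X[y == label])):
            report[(label, idx)] = underrepresented_bins(cluster, **kwargs)
    return report


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    a = rng.normal([0.0, 0.0], 1.0, size=(500, 2))
    a = a[a[:, 0] <= 1.0]  # simulate selection bias: drop part of one cluster
    b = rng.normal([10.0, 10.0], 1.0, size=(500, 2))
    X = np.vstack([a, b])
    y = np.zeros(len(X), dtype=int)  # one class made of two clusters
    for key, bins in divide_and_check(X, y).items():
        print(key, len(bins), "underrepresented bin(s)")
```

Splitting each class first means the single-Gaussian check only ever sees one cluster at a time, which is the limitation of Imitate that the abstract says Mimic relaxes. A real implementation would also generate points to fill the flagged regions (mitigation); this sketch only reports them.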

Citation (APA)

Dost, K., Duncanson, H., Ziogas, I., Riddle, P., & Wicker, J. (2022). Divide and Imitate: Multi-cluster Identification and Mitigation of Selection Bias. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 13281 LNAI, pp. 149–160). Springer Science and Business Media Deutschland GmbH. https://doi.org/10.1007/978-3-031-05936-0_12