Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products

Citations: 343
Mendeley readers: 251

Abstract

Although algorithmic auditing has emerged as a key strategy for exposing systematic biases embedded in software platforms, the real-world impact of these audits remains poorly understood, as scholarship on whether algorithmic audits increase fairness and transparency in commercial systems is nascent. To analyze the impact of publicly naming and disclosing performance results of biased AI systems, we investigate the commercial impact of Gender Shades, the first algorithmic audit of gender and skin-type performance disparities in commercial facial analysis models. This paper 1) outlines the audit design and structured disclosure procedure used in the Gender Shades study, 2) presents new performance metrics from targeted companies IBM, Microsoft, and Megvii (Face++) on the Pilot Parliaments Benchmark (PPB) as of August 2018, 3) provides performance results on PPB for non-target companies Amazon and Kairos, and 4) explores differences in company responses, as shared through corporate communications, that contextualize differences in performance on PPB. Within 7 months of the original audit, we find that all three targets released new API versions. All targets reduced accuracy disparities between males and females and between darker- and lighter-skinned subgroups, with the most significant update occurring for the darker-skinned female subgroup, which underwent a 17.7%–30.4% reduction in error between audit periods. Minimizing these disparities led to a 5.72%–8.3% reduction in overall error on PPB for the target corporations' APIs. The overall performance of non-targets Amazon and Kairos lags significantly behind that of the targets, with overall error rates of 8.66% and 6.60%, and error rates of 31.37% and 22.50% on the darker female subgroup, respectively.
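The analysis described above reduces to computing per-subgroup error rates on PPB for each commercial API and comparing them across audit periods. Below is a minimal sketch of that computation, assuming a hypothetical results file and illustrative column names (`skin_type`, `gender`, `correct`); this is not the authors' actual pipeline, only an illustration of the subgroup-disparity metric the abstract reports.

```python
# Minimal sketch of a subgroup error analysis in the style of Gender Shades.
# Assumes a hypothetical CSV with one row per benchmark image:
#   skin_type : "lighter" or "darker" (binarized Fitzpatrick type)
#   gender    : "female" or "male"
#   correct   : 1 if the API's predicted gender matched ground truth, else 0
# Column names and file path are illustrative assumptions.
import pandas as pd

df = pd.read_csv("ppb_audit_results.csv")  # hypothetical audit output

# Overall error rate for one commercial API on the benchmark.
overall_error = 1.0 - df["correct"].mean()

# Error rate for each intersectional subgroup (e.g. darker-skinned female),
# the unit of analysis in the audit design.
subgroup_error = 1.0 - df.groupby(["skin_type", "gender"])["correct"].mean()

# Disparity: gap between the worst- and best-served subgroups.
disparity = subgroup_error.max() - subgroup_error.min()

print(f"overall error: {overall_error:.2%}")
print(subgroup_error.map("{:.2%}".format))
print(f"max subgroup disparity: {disparity:.2%}")
```

Re-running this analysis on an updated API version and differencing the two `subgroup_error` series yields the per-subgroup error reductions reported in the abstract.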

Citation (APA)

Raji, I. D., & Buolamwini, J. (2019). Actionable auditing: Investigating the impact of publicly naming biased performance results of commercial AI products. In AIES 2019 - Proceedings of the 2019 AAAI/ACM Conference on AI, Ethics, and Society (pp. 429–435). Association for Computing Machinery, Inc. https://doi.org/10.1145/3306618.3314244

Readers' Seniority

PhD / Post grad / Masters / Doc: 65 (63%)
Researcher: 17 (16%)
Professor / Associate Prof.: 14 (13%)
Lecturer / Post doc: 8 (8%)

Readers' Discipline

Computer Science: 38 (44%)
Business, Management and Accounting: 18 (21%)
Engineering: 16 (19%)
Social Sciences: 14 (16%)

Article Metrics

Blog Mentions: 1
News Mentions: 13
