Evaluating Fairness of Algorithmic Risk Assessment Instruments: The Problem With Forcing Dichotomies
Evaluating Fairness of Algorithmic Risk Assessment Instruments: The Problem With Forcing Dichotomies
Criminal Justice and Behavior, Ahead of Print.
Researchers and stakeholders have developed many definitions to evaluate whether algorithmic pretrial risk assessment instruments are fair in terms of their error and accuracy. Error and accuracy are often operationalized using three sets of indicators: false-positive and false-negative percentages, false-positive and false-negative rates, and positive and negative predictive value. To calculate these indicators, a threshold must be set, and continuous risk scores must be dichotomized. We provide a data-driven examination of these three sets of indicators using data from three studies on the most widely used algorithmic pretrial risk assessment instruments: the Public Safety Assessment, the Virginia Pretrial Risk Assessment Instrument, and the Federal Pretrial Risk Assessment. Overall, our findings highlight how conclusions regarding fairness are affected by the limitations of these indicators. Future work should move toward examining whether there are biases in how the risk assessment scores are used to inform decision-making.
Researchers and stakeholders have developed many definitions to evaluate whether algorithmic pretrial risk assessment instruments are fair in terms of their error and accuracy. Error and accuracy are often operationalized using three sets of indicators: false-positive and false-negative percentages, false-positive and false-negative rates, and positive and negative predictive value. To calculate these indicators, a threshold must be set, and continuous risk scores must be dichotomized. We provide a data-driven examination of these three sets of indicators using data from three studies on the most widely used algorithmic pretrial risk assessment instruments: the Public Safety Assessment, the Virginia Pretrial Risk Assessment Instrument, and the Federal Pretrial Risk Assessment. Overall, our findings highlight how conclusions regarding fairness are affected by the limitations of these indicators. Future work should move toward examining whether there are biases in how the risk assessment scores are used to inform decision-making.
Samantha A. Zottola