Judges use algorithms to justify doing what they already want

When Northwestern University student Sino Esthappan began researching how algorithms decide who stays in jail, he expected a “people vs. technology” story. On one side would be the judges, whom Esthappan interviewed at length. On the other would be the risk assessment algorithms used in hundreds of US counties to help determine whether accused criminals can safely be released on bail. What he found was more complicated, and it suggests these tools may be obscuring bigger problems with the bail system itself.

Algorithmic risk assessments aim to estimate the risk that a defendant in a criminal case will fail to return to court or, worse, will harm others if released. By comparing a defendant’s background against an extensive database of past cases, they aim to help judges gauge how risky it would be to release a person from jail. Along with other algorithm-based tools, they play an increasingly large role in an often overburdened criminal justice system. In theory, they are meant to help reduce bias on the part of human judges.

But Esthappan’s work, published in the journal Social Problems, shows that judges neither wholly accept nor wholly reject the advice of these algorithms. Instead, they report using them selectively, motivated by deeply human concerns, to accept or ignore their results.

Pretrial risk assessment tools estimate the likelihood that accused criminals will return to court if released from jail. The tools draw on detailed information supplied by pretrial officers, including criminal histories and family profiles. They compare that information against a database of hundreds of thousands of previous case records to see how defendants with similar histories behaved. They then produce a rating, which may take the form of a “low,” “medium,” or “high” risk label or a number on a scale. Judges receive the scores to use during pretrial hearings: short proceedings held soon after a defendant is arrested to determine whether, and under what conditions, they will be released.
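To make those mechanics concrete, here is a minimal, hypothetical sketch in Python of how such a tool might map a defendant’s history to a raw score and a label. The features, weights, and cut points are invented for illustration only; real pretrial tools derive theirs from hundreds of thousands of past cases, and each tool has its own factors and thresholds.

```python
# Illustrative sketch only: the feature names, weights, and thresholds below
# are made up and do not correspond to any real pretrial risk assessment.

from dataclasses import dataclass


@dataclass
class DefendantProfile:
    prior_felonies: int
    prior_failures_to_appear: int
    age_at_arrest: int
    pending_charges: int


def risk_score(profile: DefendantProfile) -> int:
    """Sum weighted risk factors into a raw score (weights are hypothetical)."""
    score = 0
    score += 2 * profile.prior_felonies
    score += 3 * profile.prior_failures_to_appear
    score += 1 * profile.pending_charges
    if profile.age_at_arrest < 23:  # youth is often treated as a risk factor
        score += 2
    return score


def risk_label(score: int) -> str:
    """Bucket the raw score into the low/medium/high labels judges see."""
    if score <= 3:
        return "low"
    if score <= 7:
        return "medium"
    return "high"


profile = DefendantProfile(prior_felonies=1, prior_failures_to_appear=0,
                           age_at_arrest=30, pending_charges=1)
print(risk_label(risk_score(profile)))  # -> "low"
```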

As with other algorithmic criminal justice tools, proponents see them as neutral, data-driven correctives to human capriciousness and bias. Critics raise concerns such as the risk of entrenching racial bias. “Because many of these tools are based on criminal history, the argument is that criminal history is also racially coded based on law enforcement surveillance practices,” Esthappan says. “So there’s already an argument that these tools reproduce biases from the past and encode them into the future.”

It’s also unclear how well they work. A 2016 ProPublica investigation found that a risk assessment algorithm used in Broward County, Florida, was “extremely unreliable at predicting violent crimes.” Only 20 percent of the people the algorithm predicted would commit violent crimes actually did so in the two years after their arrest. The program also flagged Black defendants as future criminals, or as higher risk, more often than white defendants, ProPublica found.
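Those figures come from comparing the algorithm’s flags against what defendants actually did in the following two years, overall and by race. The Python sketch below shows roughly how such an audit calculation works; the records and field names are fabricated placeholders and do not represent ProPublica’s data, methodology, or code.

```python
# Rough sketch of an accuracy audit on predicted risk vs. actual outcomes.
# All records below are fabricated; a real audit would use real case data.

from typing import List, NamedTuple


class Record(NamedTuple):
    group: str             # e.g. "black" or "white"
    predicted_high: bool   # flagged as likely to commit a violent crime
    reoffended: bool       # committed a violent crime within two years


def precision(records: List[Record]) -> float:
    """Share of defendants flagged high risk who actually reoffended."""
    flagged = [r for r in records if r.predicted_high]
    return sum(r.reoffended for r in flagged) / len(flagged) if flagged else 0.0


def false_positive_rate(records: List[Record]) -> float:
    """Share of defendants who did NOT reoffend but were still flagged high risk."""
    did_not_reoffend = [r for r in records if not r.reoffended]
    return (sum(r.predicted_high for r in did_not_reoffend) / len(did_not_reoffend)
            if did_not_reoffend else 0.0)


# Tiny fabricated sample, just to show the computation.
records = [
    Record("black", True, False), Record("black", True, True),
    Record("black", False, False), Record("white", True, True),
    Record("white", False, False), Record("white", False, False),
]

print("overall precision:", round(precision(records), 2))
for group in ("black", "white"):
    subset = [r for r in records if r.group == group]
    print(group, "false positive rate:", round(false_positive_rate(subset), 2))
```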

Both concerns and promises about algorithms in the courtroom assume that judges consistently use them

Still, University of Pennsylvania criminology professor Richard Berk says humans can make just as many mistakes. “These criminal justice systems are made up of institutions and people, each of which is imperfect, and it’s no surprise that they don’t do a very good job of identifying or predicting people’s behavior,” Berk says. “So the bar is really quite low, and the question is: can algorithms raise it? The answer is yes, if the right information is provided.”

Both the concerns and the promises about courtroom algorithms, however, assume that judges use them consistently. Esthappan’s study suggests that assumption is shaky at best.

Esthappan interviewed 27 judges from four criminal courts in different regions of the country over one year in 2022–2023, asking questions such as: “When do you find risk assessments more or less useful?” and “How and with whom do you discuss risk assessment during pretrial hearings?” He also analyzed local news and case files, observed 50 hours of bail hearings and interviewed others working in the criminal justice system to help contextualize the findings.

Judges told Esthappan they used the algorithmic tools to move quickly through lower-stakes cases, deferring to the scores even when they were unsure of their validity. Overall, they were wary of trusting low risk scores for defendants accused of crimes such as sexual assault and intimate partner violence, sometimes because they believed the algorithms underweighted or overweighted certain risk factors, but also because their own reputations were on the line. Conversely, some described using the systems to explain why they had made an unpopular decision, believing that the risk assessments lent it credibility.

“Many judges have used their own moral views on specific allegations as yardsticks in deciding when risk assessments are and are not warranted under the law.”

Interviews revealed recurring patterns in judges’ decisions to use risk assessment scores, often based on defendants’ criminal history or social background. Some judges believed the systems underestimated the importance of certain red flags – such as extensive juvenile records or certain types of gun charges – or overestimated factors such as a previous criminal record or low educational attainment. “Many judges have used their own moral views about specific allegations as a yardstick in deciding when risk assessments are and are not warranted under the law,” Esthappan writes.

Some judges also said they used the results for efficiency reasons. These pretrial hearings are short – often less than five minutes – and require quick decisions based on limited information. The algorithmic score provides at least one more factor to consider.

But the judges were also acutely aware of how a decision could reflect on them, and according to Esthappan, this was a huge factor in whether they trusted the risk scores. When judges perceived that a charge was less a threat to public safety than a product of poverty or addiction, they often deferred to the risk scores, seeing little danger to their own reputations if they turned out to be wrong and viewing their role, as one judge put it, as calling “balls and strikes” rather than acting as a “social engineer.”

For higher-stakes charges that carried moral weight, such as rape or domestic violence, judges said they were more skeptical. That was partly because they saw problems with how the system weighed information about specific crimes: in intimate partner violence cases, for example, they believed even defendants without long criminal histories could be dangerous. But they also recognized that the stakes, for themselves and for others, were higher. “Your worst nightmare is that you let someone out on lower bail and then they go and hurt someone. I mean all of us, when I see these stories on the news, I think it could have been any of us,” said one judge quoted in the study.

Keeping a genuinely low-risk defendant in jail also carries costs: it keeps a person who is unlikely to harm anyone away from work, school, or family before they have been convicted of a crime. But the reputational risk to judges is small, and adding a risk score does not change that calculus.

For judges, the deciding factor was often not whether the algorithm seemed trustworthy but whether it would help them justify the decision they wanted to make. Judges who released a defendant because of a low risk score, for example, could “shift some of that responsibility from themselves to the score,” Esthappan said. If the alleged victim “wants someone to be locked up,” one respondent said, “what do you do when the judge says, ‘We’re guided by a risk assessment that takes into account the likelihood of the defendant showing up and being arrested again. Based on the statute and the score, my job is to set a bond that protects other members of the community’?”

“In practice, risk assessments expand the discretion of judges, who strategically use it to justify criminal sanctions.”

Esthappan’s study pokes holes in the notion that algorithmic tools produce fairer and more consistent decisions. If judges choose when to rely on the scores based on factors such as reputational risk, Esthappan notes, the tools may not reduce human bias; they may instead legitimize that bias and make it harder to see. “While policymakers tout their ability to limit judicial discretion, in practice, risk assessments expand the exercise of that discretion by judges, who strategically use it to justify criminal sanctions,” Esthappan writes in the study.

Megan Stevenson, an economist and criminal justice researcher at the University of Virginia School of Law, says risk assessments are something of a “technocratic toy for policymakers and scientists.” She says they seem like an attractive tool to “remove randomness and uncertainty from the process,” but research on their effects shows they often do little to change outcomes.

The bigger problem is that judges are forced to work with very limited time and information. Berk, the University of Pennsylvania professor, says collecting more and better information could help the algorithms make better predictions. But that would require time and resources that court systems may not have.

But when Esthappan spoke with public defenders, an even more fundamental question came up: should pretrial detention in its current form exist at all? Judges aren’t just working with inconsistent data. They are deciding someone’s freedom before that person has had a chance to fight the charges, often based on predictions that rely largely on guesswork. “In this context, I think it is reasonable for judges to rely on a risk assessment tool because they have such limited information,” Esthappan tells The Verge. “But then again, I see it as a bit of a distraction.”

Algorithmic tools aim to solve a real problem with imperfect human decision-making. “My question is, is that really the problem?” Esthappan tells The Verge. “Is it that judges are acting in a biased way, or is there something more structurally problematic about the way we question people in pretrial hearings?” His answer: it is a problem that risk assessments cannot necessarily solve, one that points to a deeper cultural problem in criminal courts.