A story of both a wrongful conviction and scientific fraud
We’ve talked about many of the ways police investigations can go wrong, including mistaken eyewitness identifications, memory errors, and false confessions. Often, when people imagine police investigations going awry, they imagine egregious cases in which police plant evidence or physically torture suspects to extract confessions they know are false. Although situations like that do occur, mistakes in investigations do not require any intentional wrongdoing. A detective doesn’t need to be trying to get a false confession, for instance, in order to get one (as our guest writer Fabi Alceste has written about). Errors often happen without the investigators realizing anything has gone wrong. Similarly, when people imagine bad scientific research, they often imagine scientists fabricating data or committing outright fraud. Scientific fraud is a problem, but it’s quite rare. However, there are many questionable research practices (sometimes shortened to “QRPs”) that can make science go wrong, often for fundamentally the same reasons police investigations go wrong.
The confirmation bias refers to the general tendency for people to look for or interpret evidence in a way that suits their expectations[1]. That is, people often ignore evidence that contradicts what they believe and interpret ambiguous evidence as support for their beliefs. This bias can be problematic when forensic examiners are testing physical evidence. For example, imagine a fingerprint examiner has been led to believe that the prints they are comparing came from the same person, perhaps by learning that the suspect confessed to the crime. This examiner is more likely to conclude the prints are a match, even if the comparison is actually quite difficult to call[2]. Expectations that a suspect is guilty can also lead interrogators to question suspects more aggressively[3].
Just like police and forensic examiners, scientists can fall prey to the confirmation bias. Strong belief in a theory can lead us to interpret ambiguous data in a way that supports our predictions. This can happen, in part, because there are lots of ways to analyze data, and there is rarely (if ever) a single best way. When researchers use statistical analyses to test whether there are differences between groups in a study (for example, comparing the memory accuracy of witnesses who had a clear view of a perpetrator with that of witnesses who did not), there are many statistical techniques to choose from, and in any given study there are often several possible comparisons to make. This means researchers have to make judgment calls about which analyses to use, and readers have to keep those decisions in mind when deciding which results to trust. If some statistics seem to say that our predictions were wrong and others that they were right, we might (mistakenly) trust the ones that say we were right. It’s shockingly easy to do this without even realizing that we are ignoring signs that we’re wrong.
In many statistical techniques, we look at numbers called p-values to decide whether a test supports our predictions. The p-value we end up with depends on judgment calls about which data to analyze and which analysis method to use, and these choices can introduce bias into the analysis. In short, researchers can make small choices that make it more likely they’ll find the results they’re looking for. When we abuse statistics in ways that might give us results we like, regardless of our intentions, it’s called p-hacking[4]. Several years ago, researchers Simmons, Nelson, and Simonsohn demonstrated how, with a little p-hacking, it was possible to present nearly any data as if it supported your predictions. The trouble is that researchers often don’t report all the statistical analyses they performed before arriving at the ones that seem to support their theories, so it can be very hard to tell when research has been p-hacked. Imagine if your friend bragged about predicting the winner of a baseball game, without telling you that they had also tried predicting 10 other games and got them all wrong. Such selective reporting paints an incomplete picture of a study’s results. Just as false confessions can be chock full of compelling details that make it seem the only way they could have been produced is if the suspect committed the crime, a p-hacked paper can look like a great piece of trustworthy science.
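To make the baseball analogy concrete, here is a small Python simulation. It is a simplified sketch, not the procedure Simmons, Nelson, and Simonsohn used: it models only one kind of flexibility, checking several independent outcome measures when no real difference exists and counting the study as a “success” if any one of them reaches p < .05. The function names, the sample size of 30 per group, and the z approximation to the p-value are illustrative assumptions. The analytic line uses the standard formula for the chance of at least one false positive among k independent tests, 1 - (1 - alpha)^k.

```python
import math
import random


def familywise_false_positive_rate(alpha, k):
    """Chance that at least one of k independent tests of true-null effects
    comes out 'significant' at level alpha: 1 - (1 - alpha)^k."""
    return 1 - (1 - alpha) ** k


def mean_and_variance(xs):
    """Sample mean and unbiased sample variance."""
    n = len(xs)
    m = sum(xs) / n
    v = sum((x - m) ** 2 for x in xs) / (n - 1)
    return m, v


def two_group_p_value(a, b):
    """Two-sided p-value for a difference in means, using a normal (z)
    approximation; a t-test would be more exact for small samples."""
    ma, va = mean_and_variance(a)
    mb, vb = mean_and_variance(b)
    z = (ma - mb) / math.sqrt(va / len(a) + vb / len(b))
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


def simulate_flexible_studies(k_outcomes, n_per_group=30, n_studies=2000,
                              alpha=0.05, seed=1):
    """Fraction of simulated studies that report a 'significant' result when
    there is NO true difference and the researcher checks k_outcomes measures,
    declaring success if any one of them has p < alpha."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_studies):
        for _ in range(k_outcomes):
            # Both groups come from the same population: any difference is noise.
            a = [rng.gauss(0, 1) for _ in range(n_per_group)]
            b = [rng.gauss(0, 1) for _ in range(n_per_group)]
            if two_group_p_value(a, b) < alpha:
                hits += 1
                break  # stop at the first "finding", as selective reporting would
    return hits / n_studies


if __name__ == "__main__":
    for k in (1, 10):
        print(f"k={k:2d}  analytic={familywise_false_positive_rate(0.05, k):.3f}  "
              f"simulated={simulate_flexible_studies(k):.3f}")
```

With a single outcome the false-positive rate stays near the nominal 5%; with ten outcomes the analytic rate is about 40% (1 - 0.95^10 ≈ 0.401), and the simulated rate should land in the same range. The z approximation runs slightly liberal at this sample size, so simulated rates may sit a little above the analytic ones.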
Three of the Central Park Five: Raymond Santana, Yusef Salaam, and Kevin Richardson
Another related QRP occurs when investigators – whether they’re working in science or the legal system – change their predictions to fit the evidence. You might be thinking, wait, isn’t that what we’re supposed to do? Shouldn’t we change what we think to fit the facts? Yes, we should – but problems can happen when we don’t admit that we were probably wrong and instead twist our interpretations of the facts so it appears we were right all along. In the legal system, we see this sometimes when police or prosecutors concoct unlikely theories to explain evidence that exonerates the defendant. For example, when the Central Park Five – five teenagers who were wrongfully convicted of a rape and attempted murder they didn’t commit – were exonerated, the NYPD offered an improbable, complicated theory rather than admit they were wrong the first time. Matias Reyes, a serial rapist whose DNA matched samples from the victim’s body, had admitted to being the sole perpetrator of the crime. The NYPD speculated that the boys indeed committed the assault while Reyes waited nearby until the five teenagers had left the scene before committing his own rape. This is very unlikely, given that the physical evidence did not indicate more than one perpetrator and there was no credible evidence linking the five boys to the scene*. Tortured reasoning by prosecutors and investigators has appeared in other cases as well.
In a similar fashion, scientists sometimes inappropriately benefit from hindsight and act as if they had predicted unexpected results. We call this hypothesizing after the results are known, or HARKing for short[5]. HARKing can be a serious problem because results that seem to provide statistical support for some conclusion are often just flukes. Sometimes, when we find an unexpected result, it is a real, surprising discovery. Other times, however, it’s just a statistical accident: a random error that looks like something real. A researcher who HARKs runs the risk of first mistaking a fluke for something true and then acting as if they had predicted it in advance. Claiming to have predicted it ahead of time makes the conclusions seem all the more credible, even if the result turns out to have been a false positive. This added apparent credibility can let false claims linger in the research literature for a long time before later research corrects them. Additionally, the scientific method is built on testing hypotheses: taking what we already know, applying it to a new situation, and making a prediction about what we’ll find. The better our ability to make accurate predictions based on our existing knowledge, the stronger the science is. HARKing violates this feature of science. It makes science seem stronger than it really is.
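A similar toy simulation shows why a HARKed “discovery” deserves suspicion. Assuming that none of 20 comparisons reflects a real effect, the sketch below keeps whichever comparison happens to look most striking, then tests that same comparison on fresh data, which is what a genuine advance prediction would have to survive. All names and parameter values are hypothetical choices for illustration, and the p-value again uses a z approximation.

```python
import math
import random


def diff_and_p(rng, n_per_group):
    """Simulate one comparison between two groups drawn from the SAME
    population (no real effect). Returns the observed mean difference and a
    two-sided p-value from a z approximation."""
    a = [rng.gauss(0, 1) for _ in range(n_per_group)]
    b = [rng.gauss(0, 1) for _ in range(n_per_group)]
    ma, mb = sum(a) / n_per_group, sum(b) / n_per_group
    va = sum((x - ma) ** 2 for x in a) / (n_per_group - 1)
    vb = sum((x - mb) ** 2 for x in b) / (n_per_group - 1)
    diff = ma - mb
    z = diff / math.sqrt((va + vb) / n_per_group)
    return diff, math.erfc(abs(z) / math.sqrt(2))


def harking_demo(n_comparisons=20, n_per_group=30, n_datasets=500,
                 alpha=0.05, seed=3):
    """Return (share of datasets with a 'surprising' p < alpha,
    share of those surprises that replicate, same direction, on new data)."""
    rng = random.Random(seed)
    surprises, confirmed = 0, 0
    for _ in range(n_datasets):
        results = [diff_and_p(rng, n_per_group) for _ in range(n_comparisons)]
        best_diff, best_p = min(results, key=lambda r: r[1])
        if best_p >= alpha:
            continue
        surprises += 1  # an "unexpected result" that could be written up as predicted
        # Fresh data for the chosen comparison. Since no effect is real,
        # this is just another null draw, whichever comparison was picked.
        new_diff, new_p = diff_and_p(rng, n_per_group)
        if new_p < alpha and (new_diff > 0) == (best_diff > 0):
            confirmed += 1
    return surprises / n_datasets, confirmed / max(surprises, 1)


if __name__ == "__main__":
    found, confirmed = harking_demo()
    print(f"datasets with a 'surprising' p < .05: {found:.2f}")
    print(f"of those, confirmed on fresh data:    {confirmed:.2f}")
```

In most simulated datasets at least one comparison crosses p < .05 (roughly two in three with 20 comparisons), yet only a few percent of those “findings” come out significant again, in the same direction, on new data. Presenting such a result as if it had been predicted in advance would hide exactly that fragility.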
Yet another problem that can undermine scientific conclusions occurs when we don’t see all the data and evidence that have actually been found. In the legal system, prosecutors sometimes fail to disclose exonerating evidence, and this can influence the outcomes of cases. For instance, Joe Buffey was wrongfully convicted of rape and robbery after prosecutors withheld the fact that DNA from the victim’s body did not match him. Without knowing that there was evidence that could have led to his acquittal, Buffey pled guilty on the advice of his attorney to avoid a longer prison sentence. In the United States, it is the prosecutor’s responsibility to inform the defendant of relevant evidence before trial, but it is the prosecutors themselves who decide what is relevant. Suppression of exonerating evidence doesn’t have to be the result of intentional misconduct. Perhaps otherwise well-intentioned prosecutors succumb to the confirmation bias and do not realize how important some pieces of evidence are. Nevertheless, not being able to see all the evidence in a case can distort people’s conclusions and decisions in obvious ways.
Are the records complete?
In science, not reporting all data or studies is usually less dramatic, but it is quite problematic when results that conflict with theories are never published[6]. This can happen for a variety of reasons. First, sometimes researchers run many studies to test new procedures and new ideas, but they only stick with (and only report) the methods that appear to work. It’s easy to explain away failures when trying something new (“Perhaps we’re not measuring the variables correctly…” “Perhaps the experimental manipulation isn’t strong enough…”), so researchers can mislead themselves into thinking they are doing the right thing by ignoring their failures until they get results that seem to fit their expectations (which may turn out to have been just a fluke). Instead, they should have been learning from those failures that their predictions may simply have been wrong**. Second, even when researchers want to publish the results of their “failed” studies, academic journals often don’t want to publish them. More specifically, the other researchers who read and critique papers as part of the peer review process might be more skeptical of a failure (again, in part because of the confirmation bias) than they would be of an apparent success. This means that many studies that provide evidence against theories may never be published. This publication bias leaves mostly positive results in the available research literature and can distort people’s view of how viable a scientific idea is.
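The file drawer problem can also be sketched in a few lines. This toy model is in the spirit of Rosenthal’s file drawer argument rather than taken from it: it assumes a small true effect (0.2 standard deviations), 30 participants per group, and a literature in which only studies with p < .05 in the predicted direction get published. Every number and name here is hypothetical. It compares the average effect across all studies with the average among the published ones.

```python
import math
import random


def run_study(rng, true_effect, n_per_group):
    """Simulate one two-group study with unit-variance outcomes.
    Returns (observed mean difference, two-sided p-value via a z approximation)."""
    treatment = [rng.gauss(true_effect, 1) for _ in range(n_per_group)]
    control = [rng.gauss(0, 1) for _ in range(n_per_group)]

    def mean_var(xs):
        m = sum(xs) / len(xs)
        return m, sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

    mt, vt = mean_var(treatment)
    mc, vc = mean_var(control)
    diff = mt - mc
    z = diff / math.sqrt((vt + vc) / n_per_group)
    return diff, math.erfc(abs(z) / math.sqrt(2))


def file_drawer(true_effect=0.2, n_per_group=30, n_studies=2000,
                alpha=0.05, seed=2):
    """Run many studies, 'publish' only significant results in the predicted
    (positive) direction, and compare published vs. all effect estimates."""
    rng = random.Random(seed)
    all_diffs, published = [], []
    for _ in range(n_studies):
        diff, p = run_study(rng, true_effect, n_per_group)
        all_diffs.append(diff)
        if p < alpha and diff > 0:  # the hypothetical journal's filter
            published.append(diff)
    return {
        "share_published": len(published) / n_studies,
        "mean_effect_all": sum(all_diffs) / n_studies,
        "mean_effect_published": (sum(published) / len(published)
                                  if published else float("nan")),
    }


if __name__ == "__main__":
    for name, value in file_drawer().items():
        print(f"{name}: {value:.3f}")
```

Under these assumptions only a minority of studies (roughly one in eight) reach publication, and their average effect comes out around three times the true value, while the average over all studies stays close to 0.2. Anyone reading only the published record would overestimate both how often the idea works and how strong the effect is.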
Thankfully, there are ways of improving both police investigations and scientific investigations to protect ourselves from these pitfalls. In previous posts and chats, we’ve talked about many possible criminal justice reforms that may be useful. Recently, there have been major efforts by many researchers – part of the “open science” movement – to improve transparency in science, to ensure that more scientific evidence is available for closer scrutiny. This movement encourages researchers, for example, to make their data and research materials available for others to scrutinize, to publicly register their predictions and analysis plans in advance (to avoid p-hacking and HARKing), and to publicize their “failures” (to mitigate publication bias). The open science movement and criminal justice reform both have huge tasks ahead of them, but if we are serious about getting to the truth – whether it’s solving a criminal case or making scientific discoveries – it’s well worth the effort.
This post was written by Timothy Luke/edited by Will Crozier
* The NYPD notes that there was physical evidence on the five boys. For example, there was hair found on them that was said to be consistent with the victim’s (but hair evidence may have been compromised and was later reanalyzed and found to be inconclusive). There was semen on some of their clothes (but they were a bunch of adolescent boys, so this might not be so surprising). By the NYPD’s own admission, none of the evidence on them decisively links them to the crime. It’s possible that viewing this ambiguous evidence as incriminating is another instance of the confirmation bias.
** There are proper ways to test out new methods (often called “pilot testing”), and frequently this kind of work is essential to the development of new scientific tools. Proper pilot testing is a complicated issue, however. When done correctly, it is very useful, but it can easily go wrong when researchers draw improper conclusions from their pilot tests.
[1] Nickerson, R. S. (1998). Confirmation bias: A ubiquitous phenomenon in many guises. Review of General Psychology, 2(2), 175-220.
[2] Kassin, S. M., Dror, I. E., & Kukucka, J. (2013). The forensic confirmation bias: Problems, perspectives, and proposed solutions. Journal of Applied Research in Memory and Cognition, 2(1), 42-52.
[3] Kassin, S. M., Goldstein, C. C., & Savitsky, K. (2003). Behavioral confirmation in the interrogation room: On the dangers of presuming guilt. Law and Human Behavior, 27(2), 187-203.
[4] Simmons, J. P., Nelson, L. D., & Simonsohn, U. (2011). False-positive psychology: Undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychological Science, 22(11), 1359-1366.
[5] Kerr, N. L. (1998). HARKing: Hypothesizing after the results are known. Personality and Social Psychology Review, 2(3), 196-217.
[6] Rosenthal, R. (1979). The “file drawer problem” and tolerance for null results. Psychological Bulletin, 86(3), 638-641.