Interaction of Error Rate and Likelihood of Guilt
In the paper “How the Probability of a False Positive Affects the Value of DNA Evidence” (Prof. William C. Thompson, Chair, Department of Criminology, Law & Society, University of California, Irvine and others) it is noted that courts require the random match probability (RMP) of DNA profiles when DNA evidence is used, but the frequency of false positives (lab errors) is neither known nor required. This is of concern because when the RMP is very low, false positives would be the significant source of errors. The paper also shows that the importance of the error rate is not the same across all cases.
“Particularly in cases in which there is little other evidence against the subject, ignorance of the true probability of error creates a disturbing element of uncertainty about the value of DNA evidence”
This is similar to the point I have been making about the McKie case. I thought it would be interesting to see how varying the error rate altered the likelihood of guilt in different situations. The paper of course gives a Bayesian proof, but I am more comfortable with a mechanistic/frequency view, so I did an imaginary exercise using fingerprints:
-=-=-=-=
Consider 3 fingerprint cases, in each a house is robbed. People in the neighbourhood provide their fingerprints to help with the investigation and one of them is identified from a fingerprint in the house. This person denies ever having been in the house.
Case 1 - The police call on the identified person at home and find the stolen goods there. The suspect refuses to explain how the goods got into his possession.
Case 2 – No other evidence is found to link the identified person to the crime.
Case 3 – The identified person can prove that at the time of the robbery he was thousands of miles away attending a conference of forensic statisticians.
We need some simplifying assumptions and fixed values to make the principle clear. All crime scenes contain 100 identifiable latent prints. When someone enters a house without the knowledge of the householder they will deposit one identifiable latent print and we will have that person’s prints on file. This person will deny having been in the location, as will someone who has been misidentified.
Let’s try the exercise with 3 after-verification error rates: (a) zero, (b) two errors per million identifiable latent prints and (c) two errors per 10,000 identifiable latent prints. Half the people misidentified will be charged with the crime under investigation, but for the other half this is not feasible (as in case 3, where the person can prove he was elsewhere). The question is - what is the likelihood that the identified person is lying about visiting the house? For cases 1 and 2 this would be in connection with the robbery, for case 3 this would be for some unspecified reason. Case 3 cannot be answered without more information.
Additional Information for Case 3
A survey has recently been published. It found that in one house in every 10,000, a neighbour creeps in unseen for no obvious reason, leaves no trace except a fingerprint and will lie about it (this was a very difficult research project).
-=-=-=-=
MY ANSWERS
For case 1, I think that no matter what the error rate is, it is certain or nearly certain that the identified person has visited the house. The fingerprint ID led the police to his home but after the goods were found we have additional evidence that he is implicated in the crime. Even if the error rate was very high, finding the goods suggests that this identification is not one of the errors. A statistician could give a Bayesian explanation but a normal person could figure this out for themselves.
In case 2 with an error rate of zero he is lying; infallible is infallible. For the other error rates, only half of the misidentifications can lead to a charge, so the rates that matter here are (b) 1 in 1,000,000 and (c) 1 in 10,000 per latent print. The crime scene contains 100 latents, so the chances of an error occurring that could incriminate someone in the robbery are (b) 1 in 1,000,000 times 100 = 1 in 10,000 and (c) 1 in 10,000 times 100 = 1 in 100. This will have to be compared with the chances of identifying the perpetrator of the crime. We have simplified this to certainty by saying that he will leave a print and we have his prints on file. The odds that an identified person is telling the truth because of misidentification in these circumstances are therefore approximately (b) 1 in 10,000 and (c) 1 in 100 (if it were not certain that we would identify the perpetrator, the proportion of errors among identifications would rise, because the number of misidentifications would stay the same).
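The case 2 count can be checked mechanically. The following is only a sketch of the frequency argument above under the stated simplifications; the function and variable names are mine, and the choice of 10,000 crime scenes is just a convenient number to count over.

```python
from fractions import Fraction

PRINTS_PER_SCENE = 100             # simplification from the text
CHARGEABLE_SHARE = Fraction(1, 2)  # half of misidentified people can be charged


def case2_innocent_share(errors_per_print, scenes=10_000):
    """Share of identified, chargeable people who were in fact misidentified."""
    # The perpetrator always leaves one print and is always on file,
    # so every scene yields exactly one true identification.
    true_ids = scenes
    # Misidentifications that can be connected with the robbery.
    false_ids = scenes * PRINTS_PER_SCENE * errors_per_print * CHARGEABLE_SHARE
    return false_ids / (true_ids + false_ids)


# Error rates (a), (b) and (c) as errors per identifiable latent print.
rates = {
    "a": Fraction(0),
    "b": Fraction(2, 1_000_000),
    "c": Fraction(2, 10_000),
}

for label, rate in rates.items():
    print(label, case2_innocent_share(rate))
```

Counted exactly, the shares come out as 1 in 10,001 and 1 in 101, which is the same as the rounded 1 in 10,000 and 1 in 100 for all practical purposes.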
In case 3 the identified person cannot be accused of the robbery, so what are the chances that this person is trying to conceal an alternative act of wrongdoing? The starting point is the knowledge that one house out of every 10,000 has had an uninvited visitor for reasons unknown. So in 10,000 crime scenes one latent print will have been deposited by this type of visitor. The same 10,000 crime scenes contain 1,000,000 latent prints, which produce (b) 2 or (c) 200 misidentifications, of which half, (b) 1 or (c) 100, cannot be connected with the crimes. So the odds that an identified person is telling the truth when they deny depositing a print in these circumstances are (b) 1 in 2 (evens) and (c) 100 in 101.
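The same kind of count works for case 3. Again this is a sketch under the simplifications already given, with names of my own choosing; the only new input is the survey figure of one secret visitor per 10,000 houses.

```python
from fractions import Fraction


def case3_innocent_share(errors_per_print,
                         scenes=10_000,
                         prints_per_scene=100,
                         creeper_rate=Fraction(1, 10_000),
                         unchargeable_share=Fraction(1, 2)):
    """Share of identified people with an alibi who were in fact misidentified."""
    # Genuine prints left by secret visitors who will lie about it.
    creepers = scenes * creeper_rate
    # Misidentifications that cannot be connected with the robbery.
    false_ids = scenes * prints_per_scene * errors_per_print * unchargeable_share
    return false_ids / (creepers + false_ids)


for label, rate in [("a", Fraction(0)),
                    ("b", Fraction(2, 1_000_000)),
                    ("c", Fraction(2, 10_000))]:
    print(label, case3_innocent_share(rate))
```

With these inputs the shares are 0, 1/2 and 100/101, matching the figures in the text. Changing `creeper_rate` shows how strongly the answer depends on a number that is very hard to measure.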
So for the three cases and three error rates the chances of guilt (lying about entering the house) are:
Case 1: (a) guilty, (b) guilty, (c) guilty
Case 2: (a) guilty, (b) 1 in 10,000 chance of innocence, (c) 1 in 100 chance of innocence
Case 3: (a) guilty, (b) evens, (c) 100 in 101 chance of innocence
The trend can be seen but the absolute values have no meaning. I just chose the input values to make the calculations easy. I see a scale which starts at case 1, goes through case 2 and ends with case 3 (and Shirley McKie). That scale is the strength of all the other evidence in the case (the probability of guilt prior to considering the effect of the fingerprint evidence). If there is other evidence against the accused the fingerprint match adds another element but whether the error rate is low or very low does not have a big impact on overall likelihood of guilt. As we move down the scale, knowing the approximate error rate becomes more important.
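The scale can also be put in Bayesian terms. This is not the derivation from the paper, only a small sketch of Bayes' rule under two assumptions of mine: a true source is always reported as a match, and a false positive is the only way an innocent person gets matched. The prior values are invented purely to stand in for "strong other evidence", "no other evidence" and "strong alibi".

```python
def posterior_guilt(prior, false_positive_prob):
    """Probability the identified person left the print, given a reported match.

    Assumes a true source is always reported as a match, so only a false
    positive can produce a match for someone who did not leave the print.
    The prior must be greater than zero.
    """
    innocent_match = (1.0 - prior) * false_positive_prob
    return prior / (prior + innocent_match)


# Illustrative priors only (assumptions, not values from the exercise).
priors = {
    "strong other evidence": 0.99,
    "no other evidence": 0.5,
    "strong alibi": 0.0001,
}

for name, prior in priors.items():
    low_error = posterior_guilt(prior, 1e-6)
    high_error = posterior_guilt(prior, 1e-4)
    print(f"{name}: error 1e-6 -> {low_error:.4f}, error 1e-4 -> {high_error:.4f}")
```

With a high prior the two error rates give almost the same answer. With a very low prior, a hundredfold change in the error rate moves the result from near certainty to about evens.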
I am not sure where database searching fits into this picture. It would seem that there would be an increased risk of error because the database will contain a number of non-matches that are very similar to the real match. But if the closest match from 50 million records hits on someone with a very close connection to the crime, that must be significant.
If someone denies depositing a fingerprint and they cannot be linked to the crime and there is no other evidence to suggest that another act of wrongdoing has occurred, this should be treated as a very special case. It is essential that both the error rate and the frequency of the proposed act of wrongdoing are known, or at least the approximate ratio of the two. To assume dishonesty without knowing this would be dangerous and irresponsible (a discreet but thorough investigation might be appropriate). It might seem that an independent out-of-department verification would settle the matter but the Cowans, Mayfield and McKie cases all show that independent verification cannot be trusted to catch every, or indeed any, misidentification.
The point about the survey of secret house-creepers being a difficult research project is not a flippant remark. For case 3 we need to know how often people enter places and leave no trace or any reason to believe that someone has been there - and lie about it. Did the research team employ a clairvoyant? I am wondering if a similar difficulty might apply to other cases where expert testimony is the only evidence that a crime has occurred. The court needs to know how often there is an innocent explanation for the observed facts compared with a criminal explanation. The hypothesis that certain observed facts can only be explained by criminal action or dishonesty is untestable.
Paper:
http://www.bioforensics.com/conference/Examiner%20Bias/JFS%20False%20Pos.pdf