The Painted Ping Pong Ball Problem

Imagine that several cartons, each containing thousands of ping pong balls, are emptied onto the floor. The balls should all be white, but occasionally a coloured (painted) ball is included by mistake. The manufacturer takes every care to avoid this and such balls are very rare, but with so many balls on the floor we cannot rule out the possibility that one or two coloured ones are among them.

As long as the manufacturer’s error rate is low, if we close our eyes and pick a few balls from the floor (random selection), it is extremely unlikely that a coloured ball will be among those picked. With our eyes open, however, we could pick a coloured ball precisely because it has drawn our attention. Statisticians call this self-selection: the item is selected because it possesses the property we are interested in.
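To put rough numbers on the difference between picking with eyes closed and searching the whole floor, here is a minimal Python sketch. The error rate, the number of balls on the floor and the number of balls picked are hypothetical values chosen only for illustration, each ball is treated as an independent draw, and the function name is invented for this example.

```python
# Minimal sketch: all numbers are hypothetical, chosen only to contrast
# random (eyes-closed) selection with searching the whole floor by eye.


def prob_at_least_one_coloured(error_rate: float, n_balls: int) -> float:
    """Chance that n_balls balls include at least one factory-coloured ball,
    treating each ball as an independent draw with the given error rate."""
    return 1.0 - (1.0 - error_rate) ** n_balls


if __name__ == "__main__":
    error_rate = 1e-5     # hypothetical: 1 coloured ball per 100,000
    picks = 5             # hypothetical: a few balls picked with eyes closed
    floor = 50_000        # hypothetical: several cartons' worth of balls

    # Eyes closed: only the handful of balls we happen to pick matter.
    print(f"blind pick of {picks}: {prob_at_least_one_coloured(error_rate, picks):.6f}")
    # Eyes open: any coloured ball anywhere on the floor can catch our eye.
    print(f"whole floor of {floor}: {prob_at_least_one_coloured(error_rate, floor):.3f}")
```

With these assumed figures a blind pick of five balls contains a coloured ball only about 5 times in 100,000, while there is roughly a 39% chance that at least one coloured ball is somewhere on the floor for an open-eyed search to find.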

Someone enters the room unseen and maliciously paints some of the ping pong balls. We must find these balls while avoiding the factory-painted ones. Unfortunately the two kinds are indistinguishable from one another, but when a ball is painted in the room some paint is nearly always left on the floor nearby.

How can we tell whether a coloured ball was painted in the room or is a manufacturing error? The answer is to look at the context. If there is paint on the floor nearby, a coloured ball is extremely unlikely to be there because of an error. We do not need to know the manufacturer’s error rate exactly to have this confidence; we only need to know that it is low. Selecting balls for their proximity to paint on the floor is random selection with respect to the colour of the ball, like picking with our eyes closed. We can then go on to select a ball by colour from within this small group with very little risk of it being an error.

What if we notice a coloured ball and there is no paint on the floor nearby? This ball has self-selected, because of its colour, out of the large population of balls on the floor. The odds that it was painted in the room are the ratio of the number of balls painted in the room without any paint being dropped (which we have said is very unusual) to the number of coloured balls supplied by the manufacturer (errors are also very unusual). Unless we know both numbers precisely, and one of them is much bigger than the other, uncertainty is the only rational conclusion. It would be wrong to say that because the manufacturer’s error rate is low the ball must have been painted in the room, and it would be wrong to say that because there is no paint nearby it must be an error. Both are errors of logic.
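The ratio described above can be written as a short calculation. In this minimal sketch every count and rate is a hypothetical assumption (the number of balls painted in the room, the fraction of those that leave no paint on the floor, the number of balls on the floor and the manufacturer’s error rate), and the function names are invented for illustration. The two scenarios differ only in one number we do not know well.

```python
# Minimal sketch: every number is a hypothetical assumption, used only to
# show how sensitive the odds are to quantities we do not know precisely.


def odds_painted_in_room(n_painted: float, p_no_trace: float,
                         n_balls: float, error_rate: float) -> float:
    """Odds that a coloured ball with no paint nearby was painted in the room
    rather than supplied coloured by the manufacturer.

    Numerator: balls painted in the room that left no paint on the floor.
    Denominator: expected number of factory-coloured balls on the floor
    (treated here as all lying away from any paint).
    """
    painted_without_trace = n_painted * p_no_trace
    factory_coloured = n_balls * error_rate
    return painted_without_trace / factory_coloured


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour into a probability."""
    return odds / (1.0 + odds)


if __name__ == "__main__":
    n_balls, error_rate, n_painted = 50_000, 1e-5, 10   # hypothetical
    for p_no_trace in (0.02, 0.10):   # both "very unusual"
        odds = odds_painted_in_room(n_painted, p_no_trace, n_balls, error_rate)
        print(f"p_no_trace={p_no_trace:.2f}: odds={odds:.2f}, "
              f"P(painted in room)={odds_to_probability(odds):.2f}")
```

Under these assumptions, a 2% chance of leaving no trace gives odds of about 0.4 (the ball is more likely a factory error), while 10% gives odds of about 2 (more likely painted in the room). Both rates would count as very unusual, which is why neither conclusion is safe without precise figures.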

The analogy in Bayesian terms:

What are the chances that a ball has been maliciously painted in the room? Paint on the floor gives every ball nearby a high prior probability of having been painted there, while a ball far from any paint has a low prior. After we take the colour of the ball into account (allowing for the chance that a coloured ball is a factory error), the posterior probability is very high for a coloured ball near paint on the floor, but not high otherwise.
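A minimal sketch of this update by Bayes’ rule follows. It assumes that a ball painted in the room is always coloured and that any other ball is coloured with probability equal to the manufacturer’s error rate; the two priors (one for a ball beside paint on the floor, one for a ball with no paint nearby), the error rate and the function name are hypothetical.

```python
# Minimal sketch of Bayes' rule for the painted-ball analogy.
# Assumptions (hypothetical): a ball painted in the room is always coloured;
# any other ball is coloured with probability equal to the factory error rate.


def posterior_painted(prior: float, error_rate: float) -> float:
    """P(painted in room | ball is coloured), given P(painted in room) = prior."""
    p_coloured_if_painted = 1.0
    p_coloured_if_not = error_rate
    evidence = prior * p_coloured_if_painted + (1.0 - prior) * p_coloured_if_not
    return prior * p_coloured_if_painted / evidence


if __name__ == "__main__":
    error_rate = 1e-5        # hypothetical factory error rate
    prior_near_paint = 0.05  # hypothetical: paint on the floor nearby
    prior_no_paint = 4e-6    # hypothetical: e.g. 10 painted x 2% traceless / 50,000 balls
    for label, prior in (("near paint", prior_near_paint),
                         ("no paint nearby", prior_no_paint)):
        print(f"{label}: posterior = {posterior_painted(prior, error_rate):.4f}")
```

With these assumed values a coloured ball beside paint comes out at about 0.9998, while one with no paint nearby comes out at roughly 0.29: the colour alone cannot lift a low prior to a safe conclusion either way.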

To connect the analogy to fingerprinting and the McKie case:

All the balls are verified fingerprint identifications. A coloured ball is an identification that is denied by the subject, and the balls painted in the room are dishonest denials.

Most balls are white, and most identifications (including eliminations) are not denied.

Painting a ball in the room is the crime, and paint on the floor is a sign of it. Where there is no paint on the floor, it is unlikely that a ball was painted nearby; likewise, if we are unaware that wrongdoing has occurred in a location, it probably has not.

Balls chosen because they are close to paint on the floor are a small group that has been randomly selected with regard to the colour of the ball. Latent marks at a crime scene are a small group chosen because they are in a location the criminal has attended; this is random selection with regard to whether an identification will later be denied. The initial random selection of a small group makes the subsequent self-selection (by colour or by denial) safe.

Paint on the floor is a fact, and it provides additional evidence independent of the colour of the ball. Likewise, a crime is a fact: somebody who has a reason to lie was in the location, and this is additional evidence independent of a denied fingerprint identification.

Shirley McKie’s case is like noticing a coloured ball near paint on the floor (the Ross murder), but the colours are different. As a result of McKie’s identification, latent mark Y7 was excluded from the Ross murder inquiry, so, like all other elimination identifications, it has no incriminating connection with a known crime. The identification self-selected from this very large population because of the denial, and it gave rise to the idea that wrongdoing had occurred (lying about entering the murder house, and perjury), but there is no other evidence of this (no paint of the right colour nearby).

Any erroneous identification, anywhere, that does not incriminate the identified person in the crime under investigation would create a Shirley McKie-type case if it were subsequently handled in the same way. Any police officer, anywhere, who did what Shirley McKie was accused of doing, did it unobserved, left no trace except a fingerprint, and then lied about it would also create a McKie-type case. Either case would be highly irregular, because our attention would have been drawn to it by a denied fingerprint identification rather than by a crime. The identification self-selected from a practically unlimited population, so it would be wrong to say that the identified person must be lying because fingerprint errors are very rare (the prosecutor’s fallacy), and it would be wrong to say that the identified person must be innocent because there is no other evidence of a crime (the defence attorney’s fallacy)#. Very little is known about the frequencies and probabilities needed to make a judgement in the McKie case, so, given the circumstances in which the case originated, I think that uncertainty is, and always was, the only rational and safe conclusion.*

In early 1997 there was no reason to question the competence of the SCRO, but the context of the identification was peculiar, indicating an abnormally high risk of misidentification. Investigators would have been aware of this if they had understood how selection affects certainty, or if they had adopted a Bayesian approach. I think that all forensic evidence should be evaluated in its context and with the knowledge that errors are possible. If investigators had understood the principles outlined here, they would have concluded that there was no good reason to believe that Shirley McKie was dishonest.

This analysis considers the effects of errors in which the victim is selected at random, such as a failure to individualize that leads to a random match. Bias is likely to have non-random effects. The causes of errors are a separate issue not addressed here, and whether a competent fingerprint identification has an error rate of zero is another separate matter.

Steve Horn.

West Lothian

sz@hornsc.clara.co.uk

Computer programmer working in the field of statistics for industry. Background in engineering (electronics) and quality assurance.

# Thompson, W.C. & Schumann, E.L. (1987). Interpretation of statistical evidence in criminal trials: The prosecutor’s fallacy and the defence attorney’s fallacy. Law and Human Behavior, 11, 167-187.

* “Particularly in cases in which there is little other evidence against the suspect, ignorance of the true probability of error creates a disturbing element of uncertainty about the value of the (DNA) evidence.” Thompson, W.C., Taroni, F. & Aitken, C.G.G. (2003). How the probability of a false positive affects the value of DNA evidence. Journal of Forensic Sciences, 48(1), 47-54. http://www.bioforensics.com/conference/Examiner%20Bias/JFS%20False%20Pos.pdf