AI could revolutionise DNA evidence – but right now we can’t trust the machines

DNA evidence often isn’t as watertight as many people think. Sensitive techniques developed over the past 20 years mean that police can now detect minute traces of DNA at a crime scene or on a piece of evidence. But traces from a perpetrator are often mixed with those from many other people that have been transferred to the sample site, for example via a handshake. And this problem has led to people being wrongly convicted.

Scientists have developed algorithms to separate this DNA soup and to measure the relative amounts of each person’s DNA in a sample. These “probabilistic genotyping” methods have enabled forensic investigators to indicate how likely it is that an individual’s DNA was included in a mixed sample found at the crime scene.
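To give a rough sense of what “how likely” means here, such methods are usually described in terms of a likelihood ratio: the probability of the observed mixture if the suspect contributed to it, divided by the probability if only unknown people did. The sketch below is a deliberately simplified illustration, not how any real forensic package works. It looks at a single genetic marker, assumes exactly two contributors, ignores the signal strengths and missing or stray alleles that real probabilistic genotyping models, and uses made-up allele frequencies and names.

```python
from itertools import combinations_with_replacement


def genotype_probs(freqs):
    """Hardy-Weinberg probability of each unordered genotype (pair of alleles)."""
    probs = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        probs[(a, b)] = freqs[a] ** 2 if a == b else 2 * freqs[a] * freqs[b]
    return probs


def prob_mixture(observed, known, n_unknown, freqs):
    """P(exactly the observed alleles | known genotypes plus n_unknown random people)."""
    observed = frozenset(observed)
    gp = genotype_probs(freqs)

    def recurse(alleles, remaining):
        if remaining == 0:
            return 1.0 if alleles == observed else 0.0
        total = 0.0
        for genotype, p in gp.items():
            # An unknown contributor cannot carry an allele that was not observed.
            if set(genotype) <= observed:
                total += p * recurse(alleles | set(genotype), remaining - 1)
        return total

    start = frozenset(a for genotype in known for a in genotype)
    if not start <= observed:
        return 0.0
    return recurse(start, n_unknown)


# Hypothetical population frequencies for three alleles at one marker.
freqs = {"A": 0.1, "B": 0.2, "C": 0.3}
observed = {"A", "B", "C"}   # alleles seen in the mixed sample
suspect = ("A", "B")         # the suspect's genotype at this marker

# Prosecution hypothesis: suspect plus one unknown person.
p_prosecution = prob_mixture(observed, [suspect], 1, freqs)
# Defence hypothesis: two unknown people.
p_defence = prob_mixture(observed, [], 2, freqs)

likelihood_ratio = p_prosecution / p_defence
print(f"Likelihood ratio: {likelihood_ratio:.2f}")
```

With these invented numbers the mixture is about six times more probable if the suspect contributed than if he or she did not. Change the assumed frequencies or the number of contributors and the answer changes too, which is one reason defence teams want to see exactly what a program assumes.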

And now, more sophisticated artificial intelligence (AI) techniques are being developed in an attempt to extract DNA profiles and try to work out whether a DNA sample came directly from someone who was at the crime scene, or whether it had just been innocently transferred.

But if this technology is successful, it could introduce a new problem, because it’s currently impossible to understand exactly how this AI reaches its conclusions. And how can we trust technology to provide vital evidence if we can’t interrogate how it produced that evidence in the first place? It has the potential to open the way to even more miscarriages of justice and so this lack of transparency may be a barrier to the technology’s use in forensic investigations.

Similar challenges emerged when DNA analysis software was first developed a decade ago. Evidence derived from DNA mixture software very quickly ran into challenges from defence teams (including that of OJ Simpson), who were concerned that the prosecution should demonstrate that the software was correctly validated.

How accurate were the results, and what was the known error rate? How exactly did the software work and could it accommodate defence hypotheses? Were the results really so dependable that a jury could safely convict?

It is a fundamental tenet of the law that evidence must be open to scrutiny. The jury cannot rely on bald assertions (claims made without evidence), no matter who makes them and what expertise they have. But the owners of the software argued it was their protected intellectual property and how it worked shouldn’t be made public.

A battle ensued that involved the use of novel court procedures to allow defence teams to privately examine how the software worked. Finally, the courts were persuaded that full access to the source code was needed, not least to test hypotheses other than those put forward by the prosecution.

*AI can predict whether someone was actually at the site of a DNA sample.* *Gorodenkoff/Shutterstock*

But the software hasn’t completely solved the issues of DNA mixtures and small, degraded samples. We still don’t know definitively if the DNA in a sample came directly from a person or was transferred there. This is complicated by the fact that different people shed DNA at different rates – a phenomenon known as their “shedder status”.

For example, a sample taken from a murder weapon could contain more DNA from someone who hasn’t touched it than from the person who actually committed the murder. People have been charged with serious offences because of this.

Add the fact that DNA is transferred at different rates across different surfaces and in different environmental conditions and it may become almost impossible to know exactly where DNA in a sample came from. This problem of “transfer and persistence” threatens to seriously undermine forensic DNA.

As a result, experiments are underway to find ways of more accurately quantifying DNA transfer in different circumstances. And AI has the potential to analyse the data from these experiments and use it to indicate the origin of DNA in a sample.

But AI-based software has an even greater transparency problem than probabilistic genotyping software did, and one that’s currently fundamental to the way it works. The exact way the software works isn’t just a commercial secret – it’s unclear even to the software developers.

Transparency issues

AI uses mathematical algorithms to complete tasks such as matching a facial expression to a particular set of emotions. But, crucially, it is able to learn through a process of trial and error and gradually manipulates its underlying algorithms in order to become more efficient.

It’s this process of manipulation and change that isn’t always transparent. The software makes its changes incredibly rapidly according to its own indecipherable logic. It can derive fantastically efficient results but we can’t say how it did so. It acts like a black box that takes inputs and gives outputs, but whose inner workings are invisible. Programmers can go through a clearer development process but it is slower and less efficient.
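The trial-and-error learning described above can be sketched in a few lines. This is a toy illustration with invented data, not the method used by any forensic tool: a program starts with a guess for a single number (a “weight”), measures how wrong its predictions are, and nudges the weight to reduce the error, over and over.

```python
def train(samples, steps=200, learning_rate=0.05):
    """Fit y = w * x by repeatedly nudging w to shrink the squared error."""
    w = 0.0  # initial guess
    for _ in range(steps):
        # Average gradient of the squared error (w * x - y) ** 2 with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in samples) / len(samples)
        w -= learning_rate * gradient  # small correction in the direction that reduces error
    return w


# Invented training data in which the output is always twice the input.
samples = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
learned_weight = train(samples)
print(f"Learned weight: {learned_weight:.4f}")
```

With one weight, anyone can read off what the program has learned. Modern AI systems adjust millions of such numbers at once, and no single one of them corresponds to a rule a human could inspect, which is where the black box comes from.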

This transparency issue affects many broader applications of AI. For example, it makes it very difficult to correct AI systems whose decisions display a racial or gender bias, such as those used to sift through employee resumes, or to target police resources.

And the advent of AI-driven DNA analysis will add a further dimension to the problems already encountered. Defence lawyers could rightly challenge the use of this technology, even if its use is limited to intelligence gathering rather than providing prosecution evidence. Unless transparency problems are addressed at an early stage, the obstacles to AI use in the forensic field could prove insurmountable.

How might we go about tackling these challenges? One option may be to opt for the less efficient, constrained forms of AI. But if the purpose of AI is to do the tasks we are less capable of or less willing to do ourselves, then reducing efficiency may be a poor solution. Whichever form of AI we opt to use, within an adversarial system of criminal justice it must be possible to review and reverse-engineer all automated decisions, and for third parties to provide unambiguous validation.

Ultimately, this is not merely a technical issue, but an urgent ethical problem that goes to the heart of our criminal justice systems. At stake is the right to a fair, open and transparent trial. This is a fundamental requirement that must be addressed before the headlong rush of technological advancement carries us past the point of no return.