Artificial intelligence (AI) has now closely matched or even surpassed humans in what were previously considered unattainable areas. These include chess, arcade games, Go, self-driving cars, protein folding and much more. This rapid technological progress has also had a huge impact on the financial services industry. More and more CEOs in the sector declare (explicitly or implicitly) that they run “technology companies with a banking license”.

Meanwhile, the financial technology (fintech) industry has emerged and grown rapidly, with technology startups increasingly challenging established financial institutions in areas such as retail banking, pensions and personal investments. Across the sector, AI often appears in behind-the-scenes processes such as cybersecurity, anti-money laundering and know-your-client checks, or in customer-facing chatbots.

Among so many success stories, one seems conspicuously absent: AI making money in financial markets. While simple algorithms are commonly used by traders, machine learning or AI algorithms are far less common in investment decision-making. Yet machine learning is based on analysing huge data sets and finding patterns in them, and financial markets generate enormous amounts of data, so the two would seem an obvious match. In a new study, published in the International Journal of Data Science and Analytics, we have shed some light on whether AI is any better than humans at making money.

Some specialist investment companies, known as quant (short for ‘quantitative’) hedge funds, state that they employ AI in their investment decision-making process. However, they do not release official performance information. Also, although some of them manage billions of dollars, they remain niche and small relative to the wider investment industry.

On the other hand, academic research has repeatedly reported highly accurate financial forecasts based on machine-learning algorithms. In theory, these could translate into highly successful mainstream investment strategies for the financial industry. And yet, that doesn’t seem to be happening.

What is the reason for this discrepancy? Is it entrenched manager culture, or is it something related to the practicalities of real-world investing?

AI’s financial forecasts

We analysed 27 peer-reviewed studies by academic researchers published between 2000 and 2018. These describe different kinds of stock market forecasting experiments using machine-learning algorithms. We wanted to determine whether these forecasting techniques could be replicated in the real world.

Our immediate observation was that most of the experiments ran multiple versions (in extreme cases, hundreds) of their investment model in parallel. In almost all cases, the authors presented their highest-performing model as the primary product of their experiment, meaning the best result was cherry-picked and all the sub-optimal results were ignored. This approach would not work in real-world investment management, where any given strategy can be executed only once and its result is an unambiguous profit or loss: there is no undoing of results.

Running multiple variants and then presenting the most successful one as representative would be misleading in the finance sector, and could possibly be regarded as illegal. For example, if we ran three variants of the same strategy, with one losing 40%, another losing 20% and the third gaining 20%, and then showcased only the 20% gain, this single result would clearly misrepresent the fund's performance. Only one version of an algorithm should be tested, as this reflects a real-world investment setup and is therefore more realistic.
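To make the arithmetic of the three-variant example concrete, here is a minimal Python sketch. It assumes, purely for illustration, that capital is split equally across the three variants, and contrasts that outcome with reporting only the best variant. The names variant_returns, cherry_picked and equal_weight are hypothetical and do not come from the study.

```python
# Returns of the three hypothetical variants from the example, as fractions.
variant_returns = {"variant_a": -0.40, "variant_b": -0.20, "variant_c": 0.20}


def cherry_picked(returns):
    """Report only the best variant, as a cherry-picked write-up would."""
    return max(returns.values())


def equal_weight(returns):
    """Return of splitting capital equally across all variants (illustrative baseline)."""
    return sum(returns.values()) / len(returns)


print(f"Reported (best only):  {cherry_picked(variant_returns):+.1%}")  # +20.0%
print(f"Equal-weight average:  {equal_weight(variant_returns):+.1%}")   # -13.3%
```

Averaged over all three variants, the strategy loses about 13.3%, while the cherry-picked figure shows a 20% gain.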

Models in the papers we reviewed achieved a very high level of accuracy, about 95%, a mark of tremendous success in many areas of life. In market forecasting, however, an algorithm that is wrong 5% of the time can still be a real problem, because its errors may be catastrophic rather than marginal, wiping out not only the profit but the entire underlying capital.
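A short Python sketch shows how a 95% hit rate can coexist with a net loss. The trade sequence is hypothetical (19 correct calls gaining 3% each and one wrong call losing 60%), and "accuracy" is approximated here as the share of profitable trades, which simplifies how forecasting accuracy is usually measured. The function names are illustrative only.

```python
import math

# Hypothetical sequence of 20 trades: 19 correct calls earning +3% each,
# and a single wrong call losing 60% (e.g. a large or leveraged position).
trade_returns = [0.03] * 19 + [-0.60]


def hit_rate(returns):
    """Fraction of trades that made money (a stand-in for 'accuracy')."""
    return sum(r > 0 for r in returns) / len(returns)


def compound(returns, start=1.0):
    """Grow starting capital through each trade's return in turn."""
    return start * math.prod(1.0 + r for r in returns)


print(f"Accuracy: {hit_rate(trade_returns):.0%}")
print(f"Capital after 20 trades: {compound(trade_returns):.2f}x the starting amount")
```

Because returns compound multiplicatively, the order of the trades does not change the final figure, and a single loss of 100% would leave nothing, whatever the hit rate.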

We also noted that most AI algorithms appeared to be “black boxes”, with no transparency on how they worked. In the real world, this isn’t likely to inspire investors’ confidence. It is also likely to be an issue from a regulatory perspective. What’s more, most experiments did not account for trading costs. Though these have been decreasing for years, they’re not zero, and could make the difference between profit and loss.
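The effect of trading costs can be sketched in the same way. All figures below are assumptions chosen for illustration (250 trades a year, an average gross gain of 0.10% per trade and a round-trip cost of 0.15% per trade); they are not taken from the reviewed papers. The cost is simply subtracted from each trade's return, which ignores real-world details such as market impact.

```python
# Hypothetical strategy parameters (illustrative assumptions, not real data).
TRADES_PER_YEAR = 250
GROSS_RETURN_PER_TRADE = 0.0010  # average gain per trade before costs
COST_PER_TRADE = 0.0015          # assumed round-trip cost: commission plus spread


def annual_return(per_trade_return, n_trades):
    """Compound a constant per-trade return over n_trades."""
    return (1.0 + per_trade_return) ** n_trades - 1.0


gross = annual_return(GROSS_RETURN_PER_TRADE, TRADES_PER_YEAR)
# Simplification: costs are deducted additively from each trade's return.
net = annual_return(GROSS_RETURN_PER_TRADE - COST_PER_TRADE, TRADES_PER_YEAR)

print(f"Annual return before costs: {gross:+.1%}")
print(f"Annual return after costs:  {net:+.1%}")
```

Under these assumed numbers, a strategy that looks profitable on paper becomes a loss-maker once costs are included.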

None of the experiments we looked at gave any consideration to current financial regulations, such as the EU's MiFID II directive, or to business ethics. The experiments themselves did not engage in any unethical activities (they did not seek to manipulate the market, for instance), but they lacked any design feature explicitly ensuring that they were ethical. In our view, machine learning and AI algorithms used in investment decision-making should meet two sets of ethical standards: the AI itself should be ethical, and the investment decisions it makes should be ethical too, taking environmental, social and governance (ESG) considerations into account. This would, for example, stop the AI from investing in companies that may harm society.

All this means that the AIs described in the academic experiments were unfeasible in the real world of the financial industry.

Are humans better?

We also wanted to compare the AI’s achievements with those of human investment professionals. If AI could invest as well as or better than humans, then that could herald a huge reduction in jobs.

We discovered that the handful of AI-powered funds whose performance data were disclosed in publicly available market data sources generally underperformed the market. We therefore concluded that there is currently a very strong case in favour of human analysts and managers. The empirical evidence strongly suggests that, despite all their imperfections, humans are currently ahead of AI. This may be partly because of the efficient mental shortcuts humans take when they have to make rapid decisions under uncertainty.

This may change in the future, but we would still need evidence before switching to AI. In the immediate future, we believe that instead of pitting humans against AI, we should combine the two. This would mean embedding AI in decision-support and analytical tools, while leaving the ultimate investment decision to a human team.

Barbara Jacquelyn Sahakian is a Professor of Clinical Neuropsychology, University of Cambridge
Fabio Cuzzolin is a Professor of Artificial Intelligence, Oxford Brookes University
Wojtek Buczynski is a PhD candidate/consultant, University of Cambridge

This article first appeared on The Conversation