We might soon see artificial intelligence (AI) step up to the next level, with impending upgrades to systems developed by OpenAI and Meta. OpenAI’s GPT-5 will be the new “engine” within the AI chatbot ChatGPT, while Meta’s upgrade will be named Llama 3. Among other things, the current version of Llama powers chatbots on Meta’s social media platforms.

Statements to the media by executives at both OpenAI and Meta suggest that some ability to plan ahead will be incorporated into these upgraded systems. But how exactly will this innovation change the capabilities of AI chatbots?

Imagine you are driving from home to work and want to select the best route – that is, the sequence of choices that is optimal in some sense, based on cost or timing, for example. An AI system would be perfectly capable of choosing the better of two existing routes. But it would be a far more difficult task for it to generate the optimal route from scratch.

A route ultimately consists of a sequence of different choices. However, making individual decisions in isolation is not likely to lead to an optimal overall solution.

For instance, sometimes you have to make a little sacrifice at the start, to reap some benefit later on: maybe joining a slow queue to enter the motorway, in order to move faster later on. This is the essence of a planning problem, a classic topic in artificial intelligence.
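To make the motorway example concrete, here is a minimal sketch in Python. The road network and travel times are invented for illustration, and the function names `greedy_route` and `planned_route` are hypothetical. The first function picks the quickest next road at every junction, while the second uses Dijkstra's shortest-path algorithm, a textbook planning method, to weigh up whole routes.

```python
import heapq

# Hypothetical road network: travel times in minutes between junctions.
ROADS = {
    "home": {"side_street": 2, "motorway_queue": 6},
    "side_street": {"town_centre": 15},
    "town_centre": {"work": 10},
    "motorway_queue": {"motorway": 4},
    "motorway": {"work": 5},
    "work": {},
}

def greedy_route(roads, start, goal):
    """Take the quickest next road at each junction, ignoring what follows.

    Assumes every road eventually leads to the goal, as in ROADS above.
    """
    route, total, node = [start], 0, start
    while node != goal:
        next_node, cost = min(roads[node].items(), key=lambda item: item[1])
        route.append(next_node)
        total += cost
        node = next_node
    return route, total

def planned_route(roads, start, goal):
    """Dijkstra's algorithm: expand the cheapest partial route first."""
    frontier = [(0, [start])]  # (minutes so far, junctions visited)
    settled = set()
    while frontier:
        total, route = heapq.heappop(frontier)
        node = route[-1]
        if node == goal:
            return route, total
        if node in settled:
            continue
        settled.add(node)
        for next_node, cost in roads[node].items():
            heapq.heappush(frontier, (total + cost, route + [next_node]))
    return None, float("inf")

print(greedy_route(ROADS, "home", "work"))
print(planned_route(ROADS, "home", "work"))
```

In this made-up network, the greedy driver avoids the six-minute queue and ends up with a 27-minute journey through town, while the planner accepts the queue and arrives in 15 minutes.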

Image: Generating an optimal driving route from all possible ones is still beyond current AI systems. Jevanto Productions / Shutterstock

There are parallels here with board games such as Go: the outcome of a match depends on the overall sequence of moves, and some moves are aimed at unlocking opportunities that can be exploited later on.

The AI company Google DeepMind developed a powerful AI called AlphaGo to play this game, based on an innovative approach to planning. It was not only able to explore a tree of available options, but also to improve on that ability with experience.

Of course, the real point is not about finding optimal routes for driving or playing games. The systems that power products such as ChatGPT and Llama 3 are called large language models (LLMs). What is at stake here is providing these AI systems with the ability to consider the long-term consequences of their actions. This skill is also necessary to solve mathematical problems, so it potentially unlocks other capabilities for LLMs.

Large language models are designed to predict the next word in a given sequence of words. But in practice, they are used to predict long series of words, such as the answers to questions from human users.

This is currently done by adding one word to the answer, then another word and so on, thereby extending the initial sequence. This is known in the jargon as “autoregressive” prediction. However, LLMs can sometimes paint themselves into corners that are impossible to get out of.
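The word-by-word process can be sketched with a toy example in Python. Everything here is an assumption made for illustration: the tiny probability table `NEXT_WORD` stands in for a real model, and the exhaustive search in `generate_with_lookahead` is a crude stand-in for planning, not a description of what any lab has built.

```python
# Hypothetical toy "language model": for each word, the probability of each next word.
NEXT_WORD = {
    "<start>": {"a": 0.6, "the": 0.4},
    "a": {"route": 0.3, "queue": 0.3, "road": 0.4},
    "the": {"motorway": 0.9, "road": 0.1},
    "route": {"<end>": 1.0},
    "queue": {"<end>": 1.0},
    "road": {"<end>": 1.0},
    "motorway": {"<end>": 1.0},
}

def generate_greedy(model, max_words=10):
    """Autoregressive generation: append the single most likely next word each time."""
    words, probability, current = [], 1.0, "<start>"
    for _ in range(max_words):
        next_word, p = max(model[current].items(), key=lambda item: item[1])
        probability *= p
        if next_word == "<end>":
            break
        words.append(next_word)
        current = next_word
    return words, probability

def generate_with_lookahead(model, current="<start>", max_words=10):
    """Score every complete continuation and return the most probable one overall."""
    if current == "<end>" or max_words == 0:
        return [], 1.0
    best_words, best_probability = [], 0.0
    for next_word, p in model[current].items():
        tail, tail_probability = generate_with_lookahead(model, next_word, max_words - 1)
        if p * tail_probability > best_probability:
            head = [] if next_word == "<end>" else [next_word]
            best_words, best_probability = head + tail, p * tail_probability
    return best_words, best_probability

print(generate_greedy(NEXT_WORD))          # "a road", overall probability about 0.24
print(generate_with_lookahead(NEXT_WORD))  # "the motorway", overall probability about 0.36
```

In this toy table, the greedy generator commits to “a” because it is the likelier first word, and is then stuck with weak continuations. Looking at whole sequences finds a more probable sentence that starts with the less likely word. Checking every continuation like this is only feasible for tiny examples, which is part of why efficient planning is hard.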

Expected development

An important goal for LLM designers has been to combine planning with deep neural networks, the type of algorithms – or set of rules – that sit behind the models. Deep neural networks were originally inspired by the nervous system. They can improve at what they do through a process called training, where they are exposed to large sets of data.

The wait for LLMs that can plan might be over, according to the comments by OpenAI and Meta executives. However, this comes as no surprise to AI researchers, who have been expecting such a development for some time.

Late last year, OpenAI’s CEO Sam Altman was fired and then rehired by the company. At the time, the drama was rumoured to have involved the company’s development of an advanced algorithm called Q*, although this explanation has since been superseded. It’s not clear what Q* does, but the name rang bells with AI researchers because it echoed the names of existing planning methods.

Commenting on those rumours, Meta’s head of AI, Yann LeCun, wrote on X (formerly Twitter) that replacing the process of autoregression with planning in LLMs was challenging, but that almost every top lab was working on it. He also thought it was likely that Q* was OpenAI’s attempt to incorporate planning into its LLMs.

Please ignore the deluge of complete nonsense about Q*.
One of the main challenges to improve LLM reliability is to replace Auto-Regressive token prediction with planning.

Pretty much every top lab (FAIR, DeepMind, OpenAI etc) is working on that and some have already published…

— Yann LeCun (@ylecun) November 24, 2023

LeCun was onto something in what he said about the top labs, because recently, Google DeepMind published a patent application that hinted at planning capabilities.

Intriguingly, the listed inventors were members of the AlphaGo team. The method described in the application looks much like the one that guides AlphaGo towards its goals. It would also be compatible with the current neural network architectures used by large language models.