OpenAI, the forerunner of this marginally terrifying Skynet-type world we're currently living in, just dropped a new flagship AI model going by the name 'GPT-4o'. The "o" stands for 'omni', and fittingly so, considering the model's ability to take in text, audio and imagery, while regurgitating replies in equally impressive ways. Did we mention that it's free?
In a blog post detailing GPT-4o, OpenAI says it will roll out "iteratively" over the coming weeks, though its text and image capabilities have already hit ChatGPT*. Users on the free tier can use GPT-4o, though those with a Plus subscription get "up to 5x higher message limits." The upgraded, GPT-4o-powered Voice Mode will first debut in alpha within ChatGPT Plus soon, though no exact date has been given.
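For the developers in the room, the same model landed in OpenAI's API at launch under the identifier 'gpt-4o'. As a minimal sketch (assuming the official openai Python package is installed and an OPENAI_API_KEY is set in your environment), a plain text request looks something like this:

```python
# Minimal sketch: querying GPT-4o through OpenAI's Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new omni model
    messages=[
        {"role": "user",
         "content": "In one sentence, what does the 'o' in GPT-4o stand for?"},
    ],
)

print(response.choices[0].message.content)
```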
Where is Scarlett Johansson when you need Her?
ChatGPT being able to understand its user's voice isn't anything new, but the old Voice Mode was elementary at best, mimicking a true understanding of your voice without picking up intricacies such as tone, background noise or multiple speakers. The model's responses drove that home, delivered without any sort of emotion, laughter or singing.
But with the power of GPT-4o, the new Voice Mode turns the AI into more of an assistant, capable of responding in near real-time: in as little as 232 milliseconds, with an average of 320 milliseconds. That's a massive jump from GPT-3.5 and GPT-4's average voice response times of 2.8 and 5.4 seconds respectively. It's also better at understanding intent, picking up on a range of emotions and coping with being interrupted by new questions mid-response.
GPT-4o is seeing new things
ChatGPT's vision gets a boost as well. It'll more quickly identify whatever you're trying to show it, whether through your device's camera or by sharing something on your desktop. OpenAI demoed the enhanced feature set on stage, asking ChatGPT to understand what it was being shown (a maths equation) and, rather than simply blurting out the answer, to help the user reach it themselves. It went off (mostly) without a hitch.
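Image understanding is exposed through the API too. Here's a sketch of how you might recreate that on-stage demo, again assuming the openai Python package; the image URL below is a hypothetical placeholder:

```python
# Sketch: showing GPT-4o an image and asking for tutoring, not answers.
# Assumes `openai` is installed and OPENAI_API_KEY is set; the URL below
# is a placeholder standing in for a photo of a maths equation.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Don't give me the answer to this equation. "
                         "Help me work through it step by step."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/equation.png"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```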
Read More: What to expect from the next generation of chatbots: OpenAI’s GPT-5 and Meta’s Llama-3
OpenAI is also bettering ChatGPT's language support. The company claims the model is now proficient in around 50 languages, expanding its presence globally in a big way.
As OpenAI CTO Mira Murati put it in the GPT-4o live stream: "We know that these models are getting more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with ChatGPT. …This is the first time that we are really making a huge step forward when it comes to the ease of use."
This is only the beginning for GPT-4o, too. OpenAI will continue to update the service to deliver more well-thought-out responses and, hopefully, cut down on the few 'hallucinations' still plaguing ChatGPT during the company's live demos.
*We weren't yet able to access GPT-4o on ChatGPT via Android or through the service's web app, and we're seeing reports of other users experiencing the same issue, while others are already using GPT-4o. Be patient. It'll be along soon.