OpenAI, the forerunner of this marginally terrifying Skynet-type world we're currently living in, just dropped a new flagship AI model going by the name 'GPT-4o'. The "o" stands for 'omni', and fittingly so, considering the model's ability to take in text, audio and imagery, while regurgitating replies in equally impressive ways. Did we mention that it's free?
In a blog post detailing GPT-4o, OpenAI says it will roll out "iteratively" over the coming weeks, though its text and image capabilities have already hit ChatGPT*. Users on the free tier can use GPT-4o, though those with a Plus subscription get "up to 5x higher message limits." The upgraded, GPT-4o-powered Voice Mode will first debut in alpha within ChatGPT Plus soon, though no exact date has been given.
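For the developers in the room, the same model landed in OpenAI's API at launch under the identifier 'gpt-4o'. As a minimal sketch (assuming the official openai Python package is installed and an OPENAI_API_KEY is set in your environment), a plain text request looks something like this:

```python
# Minimal sketch: querying GPT-4o through OpenAI's Python SDK.
# Assumes the `openai` package is installed and OPENAI_API_KEY is set.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",  # the new omni model
    messages=[
        {"role": "user",
         "content": "In one sentence, what does the 'o' in GPT-4o stand for?"},
    ],
)

print(response.choices[0].message.content)
```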
Where is Scarlett Johansson when you need Her?
ChatGPT being able to understand its user's voice isn't anything new, but the old Voice Mode was elementary at best, mimicking a true understanding of your voice without picking up intricacies such as tone, background noise or multiple speakers. The model's responses drove that home, delivered without any sort of emotion, laughter or singing.
But with the power of GPT-4o, the new Voice Mode turns the AI into more of an assistant, capable of responding in near real-time: in as little as 232 milliseconds, with an average of 320 milliseconds. That's a massive jump from GPT-3.5 and GPT-4's average voice response times of 2.8 and 5.4 seconds respectively. It's also better at understanding intent, picking up on a range of emotions and coping with being interrupted by new questions mid-response.
GPT-4o is seeing new things
ChatGPT's vision gets a boost as well. It'll more quickly identify whatever you're trying to show it, whether through your device's camera or by sharing something on your desktop. OpenAI demoed the enhanced feature set on stage, asking ChatGPT to understand what it was being shown (a maths equation) and, rather than simply blurting out the answer, to help the user reach it themselves. It went off (mostly) without a hitch.
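Image understanding is exposed through the API too. Here's a sketch of how you might recreate that on-stage demo, again assuming the openai Python package; the image URL below is a hypothetical placeholder:

```python
# Sketch: showing GPT-4o an image and asking for tutoring, not answers.
# Assumes `openai` is installed and OPENAI_API_KEY is set; the URL below
# is a placeholder standing in for a photo of a maths equation.
from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Don't give me the answer to this equation. "
                         "Help me work through it step by step."},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/equation.png"}},
            ],
        },
    ],
)

print(response.choices[0].message.content)
```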
Read More: What to expect from the next generation of chatbots: OpenAI’s GPT-5 and Meta’s Llama-3
OpenAI is also bettering ChatGPT's language support. The company claims the model is now proficient in around 50 languages, expanding its presence globally in a big way.
As OpenAI CTO Mira Murati put it in the GPT-4o live stream: "We know that these models are getting more and more complex, but we want the experience of interaction to actually become more natural, easy, and for you not to focus on the UI at all, but just focus on the collaboration with ChatGPT. …This is the first time that we are really making a huge step forward when it comes to the ease of use."
This is only the beginning for GPT-4o, too. OpenAI will continue to update the service to deliver more well-thought-out responses and, hopefully, cut down on the few 'hallucinations' still plaguing ChatGPT during the company's live demos.
*We weren't yet able to access GPT-4o on ChatGPT via Android or through the service's web app, and we're seeing reports of other users experiencing the same issue, while others are already using GPT-4o. Be patient. It'll be along soon.