OpenAI recently announced Sora, its new generative AI model designed to create shockingly impressive videos from text prompts. It builds on the company’s earlier research into DALL-E and GPT models, including the recaptioning technique used in DALL-E 3, its image-generation model.
This isn’t the first text-to-video tool to emerge from the generative AI boom but, based on the examples shown, it generates the most realistic videos we’ve seen so far.
Sora hasn’t received a full release yet, but it is already capable of creating “complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background.” OpenAI says it also understands the context of the user’s prompt and how the things it describes exist in the physical world.
It can also create multiple shots within a single generated video while keeping the visual style and any recurring subjects or characters from the prompt consistent.
Sora still makes mistakes, but how long will that last?
When Sora eventually becomes available, it won’t be without limitations. Generated videos will be capped at 60 seconds, at least initially, so we probably won’t see any feature-length Sora-generated films in a hurry.
As with any generative AI model, Sora is still prone to mistakes. OpenAI says Sora struggles to accurately simulate a complex scene’s physics and has trouble with “specific instances of cause and effect.” For example, someone might take a bite out of a cookie, but the cookie may not show a bite mark afterward.
Sora is currently only available to OpenAI’s ‘Red Team’. These are folks who look for ways it could be abused or exploited, like prompting it with malicious material, to learn how the model reacts. OpenAI can then make adjustments to prevent the same reaction when it launches. It’s kinda like breaking into your own house to find your security weak points.
OpenAI is also working with visual artists and filmmakers who will hopefully provide constructive feedback to improve the model before a wider release.
There are already examples of short films made with AI-generated content, like Sunspring, which was written by an AI model trained on existing movie scripts. That was released in 2016, a good few years before ChatGPT was introduced, and it shows. It also still required humans to act and shoot the film.
With Sora, it may soon be possible to remove humans from the process altogether. We’re sure Martin Scorsese is thrilled about that.