OpenAI Releases Its Text-To-Video Tool Sora & Sparks Debate


Editor's note: The story was updated on March 14. OpenAI's comments regarding the data training sources were added.

It is a historic day for AI.

OpenAI just dropped the bomb on video generation.

Sora 🤯

That’s how everyone’s X (formerly Twitter) feed has looked for the last few days. Nearly every post on social media is about Sora, OpenAI’s first text-to-video tool, which creates videos so impressive it’s scary.

According to OpenAI's announcement, "Sora is capable of generating complex scenes featuring multiple characters, dynamic motion, and detailed backgrounds. Sora can generate videos up to a minute long while maintaining visual quality and adherence to the user’s prompt."

The model is designed to understand how objects interact in the physical world. It can accurately interpret props and generate expressive characters, and it can also create videos based on still images, fill in missing frames, or extend existing videos.

"Sora is able to generate complex scenes with multiple characters, specific types of motion, and accurate details of the subject and background. The model understands not only what the user has asked for in the prompt, but also how those things exist in the physical world."

And we must say that the videos really do look impressive.

Introducing Sora, our text-to-video model.

Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. https://t.co/7j2JN27M3W

Prompt: “Beautiful, snowy… pic.twitter.com/ruTEWn87vf
— OpenAI (@OpenAI) February 15, 2024

Prompt: “A movie trailer featuring the adventures of the 30 year old space man wearing a red wool knitted motorcycle helmet, blue sky, salt desert, cinematic style, shot on 35mm film, vivid colors.” pic.twitter.com/0JzpwPUGPB
— OpenAI (@OpenAI) February 15, 2024

Prompt: “Animated scene features a close-up of a short fluffy monster kneeling beside a melting red candle. the art style is 3d and realistic, with a focus on lighting and texture. the mood of the painting is one of wonder and curiosity, as the monster gazes at the flame with… pic.twitter.com/aLMgJPI0y6
— OpenAI (@OpenAI) February 15, 2024

According to OpenAI, though, the model has its weaknesses. It may struggle to accurately simulate the physics of intricate scenes and may fail to grasp particular cause-and-effect scenarios. For instance, while a person might be depicted taking a bite out of a cookie, the cookie might not show a bite mark afterwards.

Spatial details within prompts may confuse the model, leading it to occasionally mix up left and right. Additionally, it might struggle with precise descriptions of events that unfold over time, such as following a specific camera trajectory.

"You're literally hurting jobs with this."

Despite the surge of excitement, doubts and concerns are taking centre stage anyway. Sora creates videos so realistic they might (will) actually take the jobs of lots of people, and those who are only now considering careers in motion design and animation will have to think twice before applying to a related programme.

"The entire stock footage industry just died with this one tweet. RIP," says one user on X. "It’s so over I’m going to lose my job," comments another.

The whole comment section under the Sora launch announcement on X is filled with replies like this: "You scientists are so preoccupied with whether or not you can, you don't stop to think if you should." "OpenAI just can’t stop killing startups." "I don’t think y’all realize how many artists you’re fucking over right now." "If you see this @OpenAI please answer, in what way does this more good than bad? Like I am legit curious, yes I can see it's cool but long term this will do so much damage is how I see it." And this goes on and on.

Ethical data training concerns

After the launch, reporters and AI ethics advocate Ed Newton-Rex (a former Stability AI executive and the founder of the nonprofit organisation Fairly Trained) rightly asked how Sora was trained: whose works were used to train it and whether the original creators of those works gave explicit consent for their art to be fed to AI. Some have speculated, though, that the model was trained on 3D simulations (rather than copyrighted material), as "all the generations look 3D rendered."

March 14 update: In an interview with a Wall Street Journal reporter, OpenAI CTO Mira Murati said the company used "publicly available videos" to train Sora. Murati did not go into detail about which kinds of public videos were used or whether they came from YouTube, Instagram, or Facebook.

Unfortunately the additional details in the Sora research blog don’t mention the source of the training data.

If OpenAI have managed this without using copyrighted work without consent, I will be the first to tip my hat to them.

Given their public position on fair use and their… https://t.co/Nl5rBxA6Pv
— Ed Newton-Rex (@ednewtonrex) February 16, 2024

“Absolutely zero ways in which this might be abused”

Another concern that has arisen since the launch is how Sora might be used. Clearly, not all use cases will be innocent. In the age of deepfakes, when anyone can become the victim of a harmful image or video, the concern is entirely valid.

According to OpenAI, "We’ll be taking several important safety steps ahead of making Sora available in OpenAI’s products. We are working with red teamers — domain experts in areas like misinformation, hateful content, and bias — who will be adversarially testing the model.

"We're also building tools to help detect misleading content such as a detection classifier that can tell when a video was generated by Sora. We plan to include C2PA metadata in the future if we deploy the model in an OpenAI product."

As per OpenAI, the team is not only developing new techniques but also leveraging existing safety methods implemented for products utilising DALL·E 3. These methods are applicable to Sora as well. For instance, a text classifier scrutinises and rejects input prompts that violate OpenAI's usage policies, such as those soliciting extreme violence, sexual content, hateful imagery, celebrity likenesses, or intellectual property belonging to others. Additionally, robust image classifiers are employed to review every generated video frame, ensuring compliance with the company's usage policies before presentation to the user.

OpenAI says it is also actively engaging with policymakers, educators, and artists globally to understand their concerns and to explore positive use cases for the technology. "Despite extensive research and testing, we cannot predict all of the beneficial ways people will use our technology, nor all the ways people will abuse it. That’s why we believe that learning from real-world use is a critical component of creating and releasing increasingly safe AI systems over time," OpenAI's announcement says.

How to try Sora

For better or worse, trying out OpenAI's new AI video generator isn't something most of us can do straight away.

Despite the public announcement, Sora is currently undergoing a red-teaming phase, where it's tested to ensure it doesn't produce harmful or inappropriate content. While OpenAI is granting access to a select group of visual artists, designers, and filmmakers to provide feedback, access for the general public is still pending. The aim is to refine the model to be most beneficial for creative professionals, rather than replacing their expertise entirely. The true impact of Sora on industries will only become clear once it's widely available and adopted by businesses and individuals.
