The Rapid Evolution of Generative AI Video Footage

January 17, 2024

2023 was the year of generative AI, but more specifically, the year we witnessed the power and potential of LLMs, large language models. A lot of the world of work is based around text: documents, email, content, media. Both startups and large tech companies leaned in hard, incorporating automation tools and generative AI applications across verticals.

Visual generative AI made strides as well. Midjourney V6, launched in December 2023, and OpenAI’s DALL-E 3 both delivered a step change in image creation.

But the next frontier is video. Progress in generative AI technologies for video has also been moving very fast, but it’s generally less talked about than text and images, which already have products with wide consumer adoption.

Generative AI in video consists of several buckets:

Automatic video editing (includes Descript)
Talking avatars – text to video (includes companies like HourOne, Synthesia, HeyGen)
Video footage generation (i.e. moving pictures) from a prompt

This post focuses on video footage generation.

Timeline of Generative AI for video progress in 2023

A16Z partner Justine Moore posted an excellent X thread on the advances of generative AI for video right before the end of the year.

As Justine’s timeline shows, the big players in this space are the large tech platforms: Google, Meta and Nvidia in the US, and ByteDance, Alibaba and Baidu in China. While Google and Meta have shared that they are working on AI video generation, they have yet to release their products to the public.

The large tech players are well positioned to lead in this space given their access to deep learning talent, unlimited cloud resources and deep pockets. Google Brain recently open-sourced Phenaki, a video diffusion model that points towards YouTube’s internal capabilities. It can generate a two-minute AI-generated video from a series of prompts. Meta’s Make-A-Video builds on recent progress in text-to-image generation to enable text-to-video generation. Many other papers in this space were published in 2023.

On the startup front, up-and-coming players like Pika AI and RunwayML offer tools that create very short but high-quality videos. There are also open-source solutions like Stability.ai’s Stable Video Diffusion, launched in November 2023.

Pika AI 1.0 – idea to video

RunwayML is targeting Hollywood and AI filmmaking

Another tool worth calling out is FinalFrame, which generates videos from images. Here’s my video for “Panda bear surfing in Hawaii”

AI that makes everybody dance, using a picture

Justine Moore tracked 21 publicly available products that enable users to generate AI video footage (you can check them out in this Google doc created by Justine). Note that the majority of tools generate very short videos (up to 16 seconds).

With sufficient data and compute, photorealistic, interactive video generation seems within reach. As an investor in generative AI and interactive entertainment, I find this an incredibly exciting time for the generative AI video field, as these models begin crossing the threshold of usefulness. However, significant challenges remain around bias, misinformation, and intellectual property, in addition to the as-yet-unknown impact of incoming regulation. Investors also have a tough question to ask: is generative AI a real platform shift, or are we in a bubble?

Addition (Jan 24th): Google presented Lumiere, a space-time diffusion model for video generation. The authors demonstrate state-of-the-art text-to-video generation results and show that their design easily facilitates a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting, and stylised generation.

Update (Feb 17th): OpenAI launched Sora, a new text-to-video diffusion model that will enable the creation of videos from a prompt at 1080p quality. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. The model is not yet open for public use, but the demo videos released so far look high quality and coherent.

Example Prompt: “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.”

Author

Eze Vidra

Co Founder and Managing Partner at Remagine Ventures

Eze is managing partner of Remagine Ventures, a seed fund investing in ambitious founders at the intersection of tech, entertainment, gaming and commerce with a spotlight on Israel.

I'm a former general partner at Google Ventures, head of Google for Entrepreneurs in Europe and founding head of Campus London, Google's first physical hub for startups.

I'm also the founder of Techbikers, a non-profit bringing together the startup ecosystem on cycling challenges in support of Room to Read. Since inception in 2012 we've built 11 schools and 50 libraries in the developing world.
