2023 was the year of generative AI, and more specifically, the year we witnessed the power and potential of large language models (LLMs). Much of the world of work revolves around text: documents, email, content, media. Both startups and large tech companies leaned in hard, incorporating automation tools and generative AI applications across verticals.
Visual generative AI made strides as well. Midjourney V6, launched in December 2023, and OpenAI's DALL-E 3 both delivered a step change in image creation.
But the next frontier is video. Generative AI for video has also been moving very fast, but it is generally less talked about than text and images, which already have products with wide consumer adoption.
Generative AI in video consists of several buckets:
- Automatic video editing (includes Descript)
- Talking avatars – text to video (includes companies like HourOne, Synthesia, HeyGen)
- Video footage generation (i.e. moving pictures) from prompt
This post focuses on video footage generation.
Timeline of generative AI video progress in 2023
a16z partner Justine Moore posted an excellent X thread on advances in generative AI for video right before the end of the year.
As Justine's timeline shows, the big players in this space are the large tech platforms: Google, Meta and Nvidia in the US, and ByteDance, Alibaba and Baidu in China. While Google and Meta have shared that they are working on AI video generation, neither has yet released a product to the public.
The large tech players are well positioned to lead in this space given their access to deep learning talent, vast cloud resources and deep pockets. Google Brain unveiled Phenaki, a text-to-video model that points towards YouTube's internal capabilities; it can generate a two-minute video from a series of prompts. Meta's Make-A-Video builds on recent progress in text-to-image generation to enable text-to-video generation. Many other papers in this space were published in 2023.
On the startup front, up-and-coming players like PikaAI and RunwayML offer tools for creating very short but high-quality videos. There are also open-source options such as Stability.ai's Stable Video Diffusion, launched in November 2023.
RunwayML is targeting Hollywood and AI filmmaking.
Another tool worth calling out for generating videos from images is FinalFrame. Here's my video for the prompt "Panda bear surfing in Hawaii":
AI that makes everybody dance, using a picture
Justine Moore tracked 21 publicly available products that let users generate AI video footage (you can browse them in this Google doc she created). Note that most of these tools generate very short videos (up to 16 seconds).
With sufficient data and compute, photorealistic, interactive video generation seems within reach. As an investor in generative AI and interactive entertainment, I find this an incredibly exciting time for the generative AI video field, as these models begin to cross the threshold of usefulness. However, significant challenges remain around bias, misinformation and intellectual property, as well as the still-unknown impact of incoming regulation. Investors also face a tough question: is generative AI a real platform shift, or are we in a bubble?
Update (Jan 24th): Google presented Lumiere, a space-time diffusion model for video generation. According to Google, Lumiere achieves state-of-the-art text-to-video generation results, and its design readily supports a wide range of content creation tasks and video editing applications, including image-to-video, video inpainting and stylised generation.
Update (Feb 17th): OpenAI announced Sora, a new text-to-video diffusion model that creates videos from a prompt at 1080p quality. Sora can create videos of up to 60 seconds featuring highly detailed scenes, complex camera motion, and multiple characters with vibrant emotions. The model is not yet open to the public, but the released demo videos look high quality and coherent.
Example Prompt: “Beautiful, snowy Tokyo city is bustling. The camera moves through the bustling city street, following several people enjoying the beautiful snowy weather and shopping at nearby stalls. Gorgeous sakura petals are flying through the wind along with snowflakes.”