Creative Automation: How Generative AI is reshaping creativity

The introduction of ChatGPT in November 2022 marked a transformative moment in content creation. I’ve been writing about this since 2021 on VC Cafe. As an investor in entertainment tech, gaming, and next-gen consumer tech at Remagine Ventures, this topic is of particular interest to me. In this post, I’ll explore some of the best-in-class examples of AI content creation and attempt to forecast future trends in this rapidly evolving field.

While the seemingly overnight success of ChatGPT was the culmination of more than a decade of iterations in ‘creative automation’, the progress we’ve witnessed in AI over the last couple of years is poised to forever change the landscape of content creation. McKinsey predicts that Generative AI could add between $2.6 trillion and $4.4 trillion annually to the economy. But Generative AI has yet to live up to that hype. As venture firm NEA puts it, creativity might be Generative AI’s first ‘killer’ use case.

The technology, media and telecom (TMT) industry stands at the forefront of this disruption, so it’s not surprising that it’s also the industry that has seen the highest adoption rate of generative AI technology (see graph below).

As LLMs become more multimodal, the same model should be able to help with all tasks, but for the time being, models dedicated to a specific type of output perform better. For a more detailed model comparison, including open-source models, I recommend leaderboards like Chatbot Arena. This post is not meant to be comprehensive, as it would require a separate post on each modality to do it justice. It is rather a sampling of how far and how fast this space is moving, with an important caveat on enterprise adoption towards the end of the post.

Text generation

Writing

Foundational models continue to dominate this space. While GPT-4 (via ChatGPT) maintains its position as a leading text generator, newer models like OpenAI’s o1 (code-named ‘Strawberry’) excel in reasoning tasks but may not surpass their predecessors in pure writing ability. In my experience, models like Grok and Google’s Gemini tend to produce more robotic-sounding text. For the most human-sounding, natural writing, Anthropic’s Claude has consistently impressed me.

Limitations: Model hallucinations remain a significant challenge in AI text generation. For instance, when searching for a ‘quote of the week’ for my newsletter and requesting quotes from famous rappers, the model sometimes repurposed previous quotes by simply changing the attributed names to rappers. Another limitation is language diversity. While English remains the dominant language for AI text generation, new models are gradually emerging that are trained on other languages, such as Hebrew.

AI Research

Perplexity deserves special mention as an AI-enhanced search tool. Unlike traditional search engines that provide links to relevant information, Perplexity generates direct answers to queries and cites its sources. However, Google is not far behind, having begun to integrate AI-generated answers (powered by Gemini) into its search results.

Projects like NotebookLM (currently in beta) offer powerful, in-depth research capabilities for text-based information. Users can ask questions, create summaries, generate realistic podcasts for learning purposes, and add notes on the fly to expand available information.

Emerging startups like Unriddle AI (part of the current Y Combinator batch) are also making strides in this space, gaining early traction with their innovative approaches to AI-powered research.

It turns out that AI can be more creative than humans when coming up with research ideas!

Can AI generate novel research ideas? (source)

AI Video

The video domain represents perhaps the most exciting frontier in AI content creation. As the following timeline by A16Z shows, the progress in AI video over the past year has been significant.

To give you a sense of how far generative AI video has come, take a look at the video below and some of these examples.

Text to video footage

OpenAI’s Sora was the first head turner in this category. Kling AI is another text-to-video AI model developed by Kuaishou, a prominent Chinese technology company known for its short-video platform similar to TikTok. Alibaba also recently released a new text-to-video AI model as part of its broader initiative in AI technology. So far, the output tends to be short clips (below 10 seconds) and looks more like stock footage.

Another example from the latest model, Kling v1.5, which added realistic human emotion (from prompt)

AI video startups Runway and Luma recently released APIs, signalling a major step forward in accessibility for video generation tools. Meanwhile, Google has integrated its flagship video model, Veo, into YouTube so you can create and post 6-second clips directly on the platform.

Soon, ordering a pizza will be a fully automated multimodal AI experience:

Text-to-Video Avatars

We are used to receiving most information from other people, and specifically, faces. That’s why hyper-realistic synthetic video based on human avatars has become so popular as a text replacement – for education, communication, advertising, sales and more. These virtual characters can speak any language and videos can be tailored for their audience.

Hour One, a pioneer in AI-powered video creation, has made significant strides in generating realistic human presenters for various applications (disclosure: Remagine Ventures portfolio company). Their technology allows businesses to create professional-looking videos with virtual hosts, dramatically reducing the time and cost associated with traditional video production. To see how far AI generated avatars have come, watch how Reid Hoffman created an AI clone of himself in the form of Reid AI, using Hour One.

Other notable examples in the AI video space include:

  1. HeyGen / Synthesia: Allow users to create AI-powered videos with virtual presenters speaking in multiple languages.
  2. D-ID: Specialises in creating talking head videos from still images, enabling the animation of historical figures or the creation of personalised video messages.
  3. Fliki: Combines text-to-speech and AI-generated visuals to create engaging video content from written scripts.

And the list goes on. For more exploration, check out this GitHub repo with a list of text-to-video companies.

These advancements in AI-generated video are not only democratising video creation but also opening up new possibilities for personalised content at scale. However, ethical considerations surrounding deepfakes and the potential for misinformation remain important challenges that the industry must address.

Short form videos and video editing

Munch focuses on automatically creating short-form video content from longer videos (disclosure: Remagine Ventures portfolio company). Their AI-powered tool analyzes long-form content to identify the most engaging moments, then generates short clips optimized for social media platforms. This technology is particularly valuable for content creators and marketers looking to repurpose existing video content for platforms like TikTok, Instagram Reels, and YouTube Shorts.

Descript lets users automatically transcribe a video and then edit the video by editing the transcript.

Image generation

In the realm of AI-generated images, Midjourney continues to lead the pack with its impressive quality and versatility. Users can now skip Discord to generate images, and the quality of the output still very much depends on the quality of the prompt. However, competition in this space has intensified significantly.

Stable Diffusion, an open-source image generation model commercialised by Stability AI, has gained substantial traction due to its flexibility and the ability for developers to fine-tune it for specific use cases. Stability has very powerful editing capabilities and is used by developers via APIs.

Adobe has also entered the fray with Firefly, integrating AI image generation capabilities directly into its suite of creative tools. Firefly is trained on licensed content such as Adobe Stock and on public-domain material, which Adobe says makes its output safe for commercial use.

I’m a fan of Ideogram, which initially got recognition for its ability to generate text in images.

DALL-E 3, developed by OpenAI, has made significant strides in generating images that more accurately reflect detailed text prompts. Google’s Imagen and Meta’s Make-A-Scene are also pushing the boundaries of what’s possible in AI image generation.

AI Animation

The realm of AI-powered animation has seen remarkable advancements, with several companies pushing the boundaries of what’s possible in automated animation creation. A16Z said that the next Pixar could be a generative AI company. I’ve seen generative AI animation companies that have scaled YouTube channels to over 100K subscribers with AI-generated content. And much of the output of multi-billion-dollar animation companies, like Moonbug’s Cocomelon, is data-driven, AI-assisted production.

  1. Cascadeur: This AI-assisted animation software uses physics-based algorithms to create realistic character movements. It recently raised $7.6 million in Series A funding led by IOLA Venture Capital in 2023.
  2. Kinetix: A no-code AI-powered 3D animation platform that allows users to create animations from video. They raised $11 million in Series A funding in 2022, led by Adam Ghobarah, founder of Top Harvest Capital.
  3. Rokoko: Specializing in motion capture technology, Rokoko’s AI-enhanced tools make high-quality animation accessible to indie creators. They secured $3 million in seed funding in 2021 from The Danish Growth Fund and Vækstfonden.
  4. Wonder Dynamics: Founded by actor Tye Sheridan and VFX expert Nikola Todorovic, this startup uses AI to automate CGI and 3D animation for film production. They raised $9 million in Series A funding in 2022, backed by Horizons Ventures, Epic Games, and Samsung Next Ventures.
  5. Cartoon Animator (formerly CrazyTalk Animator): While not a recent startup, this software by Reallusion has integrated AI to enhance its 2D animation capabilities, making it easier for creators to produce high-quality animations quickly.

These advancements in AI-assisted animation are democratising the animation process, allowing creators with limited technical expertise to produce professional-quality animations. This trend is particularly impactful in the fields of entertainment, education, and marketing, where animated content can significantly boost engagement and understanding.

3D Content and AI gaming

The creation of 3D content is another area where AI is making significant strides, with applications ranging from game development to architectural visualisation, digital twins and virtual reality experiences.

Recent landscape of AI gaming startups (source)

  1. Playo.ai has announced the launch of the world’s first foundation model specifically designed for game development. Playo’s technology simplifies the complex process of game development by enabling users to generate entire games from a single prompt.
  2. Promethean AI: This AI-powered tool assists in the rapid creation of 3D environments for games and virtual worlds. They raised $6 million in a seed round in 2021, led by Andreessen Horowitz.
  3. Scenario: Specializing in AI-generated 3D assets for game development, Scenario raised $6 million in seed funding in 2022, with Play Ventures and Anorak Ventures among the investors.
  4. Luma AI: While mentioned earlier for video, Luma AI also excels in creating 3D models from 2D images. They raised $20 million in Series A funding in 2023, led by Andreessen Horowitz.
  5. Hypothetic: This startup uses AI to generate 3D game assets from text descriptions. They raised $3.6 million in seed funding in 2023, with backers including South Park Commons and NEA.
  6. InstaLOD: Offering AI-powered 3D optimization and automation solutions, InstaLOD raised €3.4 million in Series A funding in 2022 from High-Tech Gründerfonds and Capnamic Ventures.
  7. Inworld AI: A platform for creating AI-powered virtual characters and interactive experiences for games, metaverse applications, and customer service.

The impact of AI on 3D content creation is transformative, significantly reducing the time and expertise required to produce complex 3D models and environments. This democratization of 3D content creation is opening new possibilities in fields such as:

  • Game Development: Faster creation of detailed game worlds and assets.
  • Architecture and Real Estate: Quick generation of 3D models for buildings and interiors.
  • E-commerce: Easy production of 3D product models for enhanced online shopping experiences.
  • Virtual and Augmented Reality: Rapid development of immersive environments and objects.

As AI continues to evolve in this space, we can expect to see even more sophisticated tools that blur the line between human-created and AI-generated 3D content. This progression will likely lead to more immersive and detailed digital experiences across various industries.

A few big caveats/concerns for the future of generative AI startups

The future of these amazing creative tech startups is still uncertain and I would be remiss not to mention the limitations plaguing these companies. As Deloitte’s Q3 2024 State of GenAI in Enterprise Report reveals, enterprise interest in generative AI remains high, but actual adoption levels are still low. For example, 75% of respondents have increased investments in data life cycle management, a critical factor in enabling large-scale deployments, but the majority of organisations (70%) have moved 30% or fewer of their GenAI experiments into production.

Reasons for limited Enterprise AI adoption

1. Copyright/training data – there are several ongoing lawsuits against companies that are suspected (or confirmed) to have used training data scraped from the Internet, YouTube or publishers without permission. Large companies (like OpenAI) have deep pockets and can handle the heat, but for startups, that can be fatal. So far, rights holders have adopted two primary strategies: either negotiating compensation from the providers of Large Language Models (LLMs) and foundational models for their data, or pursuing legal action when they can prove their content has been used for training purposes without permission. Notable examples include Getty Images vs Stability AI, as well as record labels’ lawsuits against AI music generators like Suno and Udio.

2. It’s safer to wait and see. As Deloitte and others report, enterprise clients are very interested in AI, but aren’t in a rush to deploy solutions, partly due to compliance, security and privacy concerns, and partly due to the issue with copyright mentioned above. Until that changes, much of the adoption is coming from SMBs or prosumers, which could in aggregate generate a large business if you’re the market leader, but it’s tougher when there’s a lot of competition.

3. The large tech companies are leaning in on generative AI, hard. In the past, startups benefitted from the FAANG companies being slow to adopt new trends, but in the case of generative AI, it’s different. Google, Microsoft, Amazon, Nvidia, Meta and others are pouring billions of dollars into GenAI – from foundational models, to tooling and infrastructure. They also see the potential in enterprise adoption and will compete head to head with startups.

4. It’s still very expensive to train a new model. It requires training data (either legally obtained or scraped), expensive GPUs, and expensive talent (data scientists, ML engineers). That gives a huge advantage to the large players, who already have many of these resources.

5. Commoditisation and platform risk are real. The generative AI space is moving so fast that a new technology, say text to animation, that looks novel today might become commoditised tomorrow by one of the large tech companies or Generative AI scale-ups. This makes it very difficult for investors to allocate into the space, as they prefer to wait until the dust settles. Every major announcement by OpenAI, Google, Meta, etc. sends aftershocks to startups working in the same space. Startups can only find relative safety in niches, where there’s less risk that the incumbents will enter.

Conclusion

The rapid advancements in AI-powered creative automation are reshaping the content creation landscape across text, video, and images. While these technologies offer unprecedented opportunities for efficiency and scalability, they also raise important questions about the future of human creativity, copyright, and the authenticity of digital content.

Competition and go-to-market remain a big issue for Generative AI startups in the creative space. Startups in the application layer, i.e. those that have built wrappers around APIs from foundational model providers like OpenAI, Anthropic, Stability AI, etc., are vulnerable to being disrupted by the model providers themselves, unless they are able to find a niche with relative safety. As models become more multimodal, it’s safe to expect more consolidation in the number of tools.

For creators, it’s like eating from a fancy buffet right now. As startups compete to reach prosumers who are willing to pay a few dollars for access, amateurs and talented entry-level creatives are able to create content that would previously cost much more and require a team of professionals to produce. This can contribute to the flourishing of the creator economy, if they are able to monetise their work either via the content platforms (YouTube, Instagram, TikTok, LinkedIn, X, etc.) or directly from their audience.

The coming years will likely see even more integration of AI tools into existing workflows, further blurring the lines between human-generated and AI-assisted content. For investors, content creators, and technology enthusiasts alike, staying informed about these developments will be key to understanding and shaping the future of creative industries.

Co Founder and Managing Partner at Remagine Ventures
Eze is managing partner of Remagine Ventures, a seed fund investing in ambitious founders at the intersection of tech, entertainment, gaming and commerce with a spotlight on Israel.

I'm a former general partner at Google Ventures, head of Google for Entrepreneurs in Europe and founding head of Campus London, Google's first physical hub for startups.

I'm also the founder of Techbikers, a non-profit bringing together the startup ecosystem on cycling challenges in support of Room to Read. Since inception in 2012 we've built 11 schools and 50 libraries in the developing world.
Eze Vidra