Text-to-Video AI Generators Are the Future of Movies. Does It Mean the Extinction of Directors?

The paper claims the AI algorithm CogVideo to be superior to all other text-to-video AI generators developed so far

In a considerable addition to AI's toolkit, new research shows how far the text-to-art genre of applications has progressed. Though text-to-video generators have been in use for some time, their potential was limited by the high computational cost of training the machine learning model. The paper claims the AI algorithm CogVideo to be superior to all other text-to-video AI generators developed so far. Glenn Marshall, a computer artist who tried the model, praises it so highly that he warns directors they may be left without a job. His short film The Crow, which he made with the CogVideo generator, has gained eligibility for submission to the prestigious BAFTA Awards. Talking to TNW, he said, "I haven't got a speech prepared, but I fantasize about collecting an award, in the role of a herald of AI, and proclaiming to the star-studded audience that [for] every one of you, actor, director, set designer, costume designer, artist, composer… AI is coming, and you'll find yourself in a very different job soon — or out of a job altogether." Though it may sound like an exaggerated claim, or an overly emotional reaction to a newfound charm, the text-to-video AI generator does seem to have potential.

Looking beyond video prediction:

Unlike image generation, video generation from text has always been a challenge, as it involves extracting both static and dynamic information from the text to train a conditional generative model. One approach is a hybrid framework, i.e., a combination of a Variational Autoencoder (VAE) and a Generative Adversarial Network (GAN). Video generation, which essentially entails video prediction, needs the model to learn a nonlinear transfer function between given frames in order to predict subsequent frames. But just predicting future frames is not enough to generate a complete video. While image generators convert text into corresponding images, video generators employ autoregressive transformer models, which understand text-image relations fairly well but fail to interpret text-action relations in videos. This brings us to the perennial problem of data inadequacy. While we can find zillions of high-quality images and text-image pairs on the internet, text-video pairs are rare. CogVideo, as per the research paper, can generate an entirely new video from a text input, thanks to the pre-trained text-to-image model, CogView2, that it inherits.
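To make the idea of video prediction concrete, here is a minimal sketch in Python of an autoregressive rollout: each predicted frame is appended to the clip and fed back in as context for the next prediction. It is a toy illustration, not CogVideo's code. The names `rollout` and `toy_predictor` are hypothetical, a "frame" is just a short list of pixel intensities, and the predictor is a hand-written stand-in for a learned model.

```python
from typing import Callable, List

Frame = List[float]  # toy "frame": a flat list of pixel intensities in [0, 1]


def rollout(context: List[Frame],
            predict_next: Callable[[List[Frame]], Frame],
            num_new_frames: int) -> List[Frame]:
    """Autoregressively extend a clip.

    Every predicted frame is appended to the clip and becomes part of
    the context used for the next prediction.
    """
    frames = [list(f) for f in context]
    for _ in range(num_new_frames):
        frames.append(predict_next(frames))
    return frames


def toy_predictor(frames: List[Frame]) -> Frame:
    """Hypothetical stand-in for a learned model (assumption).

    Continues the motion implied by the last two frames, then clamps to
    [0, 1]; the clamp is what makes this transfer function nonlinear.
    """
    prev, last = frames[-2], frames[-1]
    return [min(1.0, max(0.0, 2 * b - a)) for a, b in zip(prev, last)]


# Two given frames: the first pixel brightens, the second stays constant.
clip = rollout([[0.0, 0.5], [0.25, 0.5]], toy_predictor, num_new_frames=4)
for index, frame in enumerate(clip):
    print(index, frame)
```

The sketch also shows why prediction alone falls short of full generation: the rollout needs given frames to start from and can only continue motion already present in them, whereas a text-to-video model has to produce the content and the motion from the text.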

A few specs for comparison:

CogVideo promises a high video resolution of 480 x 480 but needs a GPU with 40GB+ of VRAM, though results can be obtained by running only the first step on an RTX 3090. It employs a multi-frame-rate hierarchical training strategy to better align text and video, and its 9 billion parameters make it the largest open-source pre-trained text-to-video AI model.
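The general idea behind a multi-frame-rate, hierarchical approach can be sketched as follows: first produce a few key frames at a low frame rate, then repeatedly fill in the frames between them. This is only a toy illustration under stated assumptions, not CogVideo's implementation. Each "frame" is a single number (think of an object's position), simple averaging stands in for a learned interpolation model, and the function names `key_frames`, `interpolate_once` and `hierarchical_clip` are hypothetical.

```python
from typing import List


def key_frames(num_keys: int, stride: int) -> List[float]:
    """Stage 1 stand-in: sparse key frames at a low frame rate."""
    return [float(i * stride) for i in range(num_keys)]


def interpolate_once(frames: List[float]) -> List[float]:
    """Stage 2 stand-in: insert one in-between frame per neighbouring pair.

    Averaging replaces what would be a learned interpolation model
    (assumption); each pass roughly doubles the frame rate.
    """
    out: List[float] = []
    for a, b in zip(frames, frames[1:]):
        out.extend([a, (a + b) / 2])
    out.append(frames[-1])
    return out


def hierarchical_clip(num_keys: int, stride: int, levels: int) -> List[float]:
    """Generate key frames, then refine the frame rate `levels` times."""
    frames = key_frames(num_keys, stride)
    for _ in range(levels):
        frames = interpolate_once(frames)
    return frames


video = hierarchical_clip(num_keys=3, stride=4, levels=2)
print(video)
```

Starting from three key frames, two interpolation passes yield nine frames. The coarse pass fixes the overall course of the action, and the finer passes only have to fill in short gaps.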

Analytics Insight
www.analyticsinsight.net