Aware of AI Dungeon 2's vision? For the uninitiated, AI Dungeon 2 is an AI-powered text adventure app that lets players explore effectively infinite content, choosing actions beyond anything the developers could have scripted. Built on the GPT-2 model, it had its own limitations: trouble accurately tracking who is who, difficulty following what is happening in the story, and a tendency to respond to anything, however outrageous, reproducing social stereotypes with real moral implications for users. And surprisingly, the pattern is not limited to that model alone. Google's newly announced Imagen Video generator struggles with similar fears. Only a week ago Google unveiled its text-to-video application, Imagen Video, while declining to share the model's source code. The reason is simple and fairly obvious: Imagen Video is vulnerable to generating sexually explicit, violent, and sometimes fake and hateful content. "While our internal testing suggests much of explicit and violent content can be filtered out, there still exists social biases and stereotypes which are challenging to detect and filter. We have decided not to release the Imagen Video model or its source code until these concerns are mitigated," Google's researchers wrote in the paper.
Imagen Video is built on Google's text-to-image generator, Imagen, which is comparable to OpenAI's DALL-E. Trained on 14 million video-text pairs, 60 million image-text pairs, and the LAION-400M image-text dataset, it was designed to mitigate long-standing difficulties in high-quality video generation. Thanks to these higher-quality datasets, the model can generalize across a wide range of aesthetics. Beyond that, the cascaded approach the research team adopted lets each diffusion model in the pipeline be trained independently. The model understands depth and the 3D structure of objects, making it capable of producing something like a drone flythrough video, rotating around a scene and capturing frames without distortion. It can also render text, a capability that puts it ahead of the Stable Diffusion and DALL-E models.

Google says that it has applied input text-prompt filtering and output video-content filtering, but the model still falls short on many safety and ethical benchmarks. Imagen Video and its frozen T5-XXL text encoder were trained on data containing problematic content, and while filtering caught much of the explicit material, the social biases and prejudices embedded in the data have proven far harder to detect and remove. Google acknowledges that several issues remain before the model can screen out corrupt content thoroughly. The team made numerous attempts but could not deliver a fool-proof model, which ideally should not be the case, and it concedes the problem has been a constant one; this is why the paper declares that the model and its source code will stay unreleased until these concerns are mitigated. Dwelling on the complexity of developing a text-to-video model, the team writes, "Video modeling is computationally demanding, and we found that progressive distillation is a valuable technique for speeding up video diffusion models at sampling time. Given the tremendous recent progress in generative modeling, we believe there is ample scope for further improvements in video generation capabilities in future work."
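The cascaded-generation idea described above (a base video model whose output is refined by separate spatial and temporal super-resolution stages) can be sketched in toy form. This is only an illustration of the pipeline shape, assuming each stage merely tracks the video's dimensions; the function names here are hypothetical and are not Google's actual code or API.

```python
# Toy sketch of a cascaded text-to-video pipeline (hypothetical names).
# Each stage stands in for an independently trained diffusion model;
# a video is represented only by its shape: frames, height, width.

def base_model(prompt):
    # Base stage: generate a low-resolution, low-frame-rate video.
    return {"prompt": prompt, "frames": 16, "height": 24, "width": 48}

def temporal_sr(video):
    # Temporal super-resolution stage: interpolate extra frames.
    return {**video, "frames": video["frames"] * 2}

def spatial_sr(video):
    # Spatial super-resolution stage: upsample height and width.
    return {**video, "height": video["height"] * 4, "width": video["width"] * 4}

def generate(prompt):
    # The cascade applies each stage in sequence; because the stages are
    # separate models, each can be trained on its own sub-task.
    video = base_model(prompt)
    video = temporal_sr(video)
    video = spatial_sr(video)
    return video

clip = generate("a drone flythrough of a coastal city")
print(clip["frames"], clip["height"], clip["width"])  # 32 96 192
```

The point of the cascade is that no single model must handle full resolution and full frame rate at once, which keeps each stage's training tractable.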
This brings us to the question of whether LLMs, hailed as dent makers, can ever escape the metaphoric stochastic parrot cage.