A powerful new form of artificial intelligence has burst onto the scene and captured the public's imagination in recent months: text-to-image AI. Text-to-image AI models generate original images from nothing more than written input. Users can enter any text prompt they like—say, "a cute corgi lives in a house made out of sushi"—and, as if by magic, the AI will produce a corresponding image.
These models produce images that have never existed in the world nor in anyone's imagination. They are not simple manipulations of existing images on the Internet; they are novel creations, breathtaking in their originality and sophistication. The most well-known text-to-image model is OpenAI's DALL-E. OpenAI debuted the original DALL-E model in January 2021. DALL-E 2, its successor, was announced in April 2022. DALL-E 2 has attracted widespread public attention, catapulting text-to-image technology into the mainstream.
In the wake of the excitement around DALL-E 2, it hasn't taken long for competitors to emerge. Within weeks, a lightweight open-source version dubbed "DALL-E Mini" went viral. Unaffiliated with OpenAI or DALL-E, DALL-E Mini has since been rebranded as Craiyon following pressure from OpenAI. In May, Google published its own text-to-image model, named Imagen. (All the images included in this article come from Imagen.)
Soon after that, a startup named Midjourney emerged with a powerful text-to-image model that it has made available for public use. Midjourney has seen astonishing user growth: launched only two months ago, the service has over 1.8 million users in its Discord group as of this writing. Midjourney has recently been featured on the cover of The Economist and on John Oliver's late-night TV show.
Another key entrant in this category is Stability.ai, the startup behind the Stable Diffusion model. Unlike any other competitor, Stability.ai has publicly released all the details of its AI model, publishing the model's weights online for anyone to access and use. This means that, unlike DALL-E or Midjourney, there are no filters or limitations on what Stable Diffusion can be used to generate—including violent, pornographic, racist, or otherwise harmful content.
Stability.ai's completely unrestricted release strategy has been controversial. At the same time, the company's unapologetically open ethos is helping it build a strong community of developers and users around its platform, which may prove to be a valuable competitive advantage.
There is much to be said about the groundbreaking technology that underlies today's generative AI, but one key innovation, in particular, is worth briefly highlighting: diffusion models. Originally inspired by concepts from thermodynamics, diffusion models have seen a surge in popularity over the past year, rapidly displacing generative adversarial networks (GANs) as the go-to method for AI-based image generation. DALL-E 2, Imagen, Midjourney, and Stable Diffusion all use diffusion models.
In a nutshell, diffusion models learn by corrupting their training data with incrementally added noise and then figuring out how to reverse this noising process to recover the original image. Once trained, diffusion models can then apply these denoising methods to synthesize novel "clean" data from random input.