GPT-3, or Generative Pre-trained Transformer 3, is a language model created by OpenAI, an artificial intelligence research laboratory in San Francisco. The 175-billion-parameter deep learning model is capable of producing human-like text and was trained on large text datasets containing hundreds of billions of words. When OpenAI released GPT-3 in June 2020, the neural network's apparent grasp of language was uncanny. It could generate convincing sentences, converse with humans, and even autocomplete code. GPT-3 was also monstrous in scale, larger than any neural network built before it. It kicked off a whole new trend in AI, one in which bigger is better.
GPT-3 is the third generation of the GPT language models created by OpenAI. The main difference that sets GPT-3 apart from its predecessors is its size. With 175 billion parameters, GPT-3 is more than 100 times as large as GPT-2, which has 1.5 billion parameters, and about 10 times as large as Microsoft's Turing-NLG model. In terms of the transformer architecture, GPT-3 stacks 96 attention blocks, each containing 96 attention heads. In other words, GPT-3 is basically a giant transformer model.
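Those published figures are enough for a rough back-of-the-envelope parameter count. The sketch below uses the layer count, head count, and hidden size reported for GPT-3 (96 layers, 96 heads, hidden size 12,288) together with a standard approximation of roughly 12 × d_model² weights per transformer block; it is an estimate rather than OpenAI's own accounting, and it ignores embeddings, biases, and layer norms.

```python
# Back-of-the-envelope parameter count for a GPT-3-scale transformer.
# Layer count, head count, and hidden size are the published GPT-3 values;
# the 12 * d_model^2 per-block formula is a common approximation that
# ignores embeddings, biases, and layer norms.

n_layers = 96       # transformer blocks
n_heads = 96        # attention heads per block
d_model = 12288     # hidden dimension; head size = d_model / n_heads = 128

# Each block holds roughly 4 * d_model^2 attention weights (Q, K, V, and
# output projections) plus 8 * d_model^2 feed-forward weights (two layers
# with a 4x expansion), i.e. about 12 * d_model^2 parameters per block.
params_per_block = 12 * d_model ** 2
total_params = n_layers * params_per_block

print(f"head size: {d_model // n_heads}")
print(f"approx. parameters: {total_params / 1e9:.0f} billion")  # ~174 billion
```

Running this lands within a few billion of the advertised 175 billion parameters, which is why the approximation is popular for quick size comparisons between models.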
However, the impact of GPT-3 became even clearer in 2021. This year brought a proliferation of large AI models built by multiple tech firms and top AI labs, many surpassing GPT-3 itself in size and ability. GPT-3 grabbed the world's attention not only because of what it could do but because of how it did it. The striking jump in performance, especially GPT-3's ability to generalize across language tasks that it had not been specifically trained on, did not come from better algorithms but from sheer size.
The trend is not just in the US. This year the Chinese tech giant Huawei built a 200-billion-parameter language model called PanGu. Inspur, another Chinese firm, built Yuan 1.0, a 245-billion-parameter model. Baidu and Peng Cheng Laboratory, a research institute in Shenzhen, announced PCL-BAIDU Wenxin, a model with 280 billion parameters that Baidu is already using in a variety of applications, including internet search, news feeds, and smart speakers. And the Beijing Academy of AI announced Wu Dao 2.0, which has 1.75 trillion parameters. Meanwhile, South Korean internet search firm Naver announced a model called HyperCLOVA, with 204 billion parameters.
Large language models have become prestige projects that showcase a company's technical prowess. Yet few of these new models move the research forward beyond repeating the demonstration that scaling up gets good results. There are a handful of innovations. Once trained, Google's Switch-Transformer and GLaM use a fraction of their parameters to make predictions, so they save computing power. PCL-BAIDU Wenxin combines a GPT-3-style model with a knowledge graph, a technique that makes it less costly to train than its giant rivals.
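To make the "fraction of their parameters" point concrete, the sketch below illustrates the sparse mixture-of-experts routing idea behind Switch-Transformer and GLaM: a router sends each token to a single expert feed-forward network, so a prediction touches only that expert's weights even though the model stores many experts. This is a minimal illustrative example with tiny, made-up dimensions, not Google's implementation.

```python
import numpy as np

# Illustrative top-1 mixture-of-experts routing. Many expert feed-forward
# networks exist, but each token is sent to only one of them, so a
# prediction uses only a small fraction of the total parameters.
# All sizes here are hypothetical and chosen for readability.

rng = np.random.default_rng(0)
d_model, d_ff, n_experts = 8, 32, 4

# Router and expert weights: total parameters grow with n_experts,
# but per-token compute does not.
router_w = rng.standard_normal((d_model, n_experts))
experts = [
    (rng.standard_normal((d_model, d_ff)), rng.standard_normal((d_ff, d_model)))
    for _ in range(n_experts)
]

def moe_layer(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its single highest-scoring expert."""
    logits = tokens @ router_w                           # (n_tokens, n_experts)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    chosen = probs.argmax(axis=1)                        # top-1 expert per token
    out = np.empty_like(tokens)
    for i, tok in enumerate(tokens):
        w_in, w_out = experts[chosen[i]]
        hidden = np.maximum(tok @ w_in, 0.0)             # ReLU feed-forward
        out[i] = (hidden @ w_out) * probs[i, chosen[i]]  # scale by router prob
    return out

tokens = rng.standard_normal((5, d_model))
print(moe_layer(tokens).shape)  # (5, 8): same output shape, but each token
                                # touched only 1 of the 4 experts' weights
```

The design trade-off is the one the article describes: parameter count (and with it model capacity) can keep growing by adding experts, while the compute spent per prediction stays roughly constant.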