BLOOM (BigScience Large Open-science Open-access Multilingual Language Model) is the new kid on the language model block. In fact, it would be fair to say that it is a game changer in the history of language model development. It was developed through BigScience, a year-long research workshop on large multilingual models and datasets led by Hugging Face, an AI company working towards ethical AI, and it is set to upend our relationship with large language models. Ever since the GPT-3 model made waves, tech companies have been attempting to develop a superior model, but in vain. GPT-3, built with a whopping 175 billion parameters, still lags in many respects. BLOOM has around 176 billion parameters and can generate text in 46 natural languages and 13 programming languages. By and large, language models are designed for the English language and are costly to create, so they are priced heavily, which keeps well-meaning people from gaining access to them and dissecting them. BLOOM, a multilingual model that boasts universal accessibility and transparency, clears many of these barriers, allowing researchers and the general public alike to study and understand LLMs from the right perspective.
As the name suggests, it is an open-source language model designed by a diverse group of more than 1,000 researchers, including Nvidia's Megatron and Microsoft's DeepSpeed teams, with support from CNRS (the French National Centre for Scientific Research), all of whom believe in inclusive and responsible technology. This is not the first time a language model has been open-sourced. Big names in the tech industry such as Google and Meta have open-sourced a few models before, but they have held back their state-of-the-art models in order to make money from them.
As Le Scao, a researcher at Hugging Face, explains, Hugging Face used Nvidia's Megatron and Microsoft's DeepSpeed open-source projects, both of which are built on the open-source PyTorch machine learning framework. The researchers forked the Megatron and DeepSpeed projects for BLOOM so that the model could respond in different languages. As for the most important ethical question, whether the model will end up prejudiced and biased, it is reasonable to expect a degree of fairness, since the model was developed in the open and uses its own open license modeled on the Responsible AI License. "We're trying to define what open source means in the context of large AI models because they don't really work the way software does," Le Scao said. Indeed, when the very purpose of making it open source is to understand language models in their entirety, there is little point in questioning its ethics on that front. The only real concern is whether it can be misused by ill-intentioned actors. Le Scao says, "The goal of the licensing for BLOOM was to make the model as open as possible, while still retaining a degree of control on the use cases that organizations have for the model."
Does that mean BLOOM can dominate the LLM domain? Experts do not hold out much hope of it making any significant changes. "OpenAI and Google and Microsoft are still blazing ahead," Liang says. However, they are hopeful that it will find its place in the LLM space by helping people scrutinize the workings of language models, which come with inherent biases and are monopolized by big players. "BLOOM is also likely to incorporate inaccuracies and biased language, but since everything about the model is out in the open, people will be able to interrogate the model's strengths and weaknesses," says Margaret Mitchell, an AI researcher and ethicist at Hugging Face.
Disclaimer: Analytics Insight does not provide financial advice or guidance. Also note that the cryptocurrencies mentioned/listed on the website could potentially be scams, i.e. designed to induce you to invest financial resources that may be lost forever and not be recoverable once investments are made. You are responsible for conducting your own research (DYOR) before making any investments.