How DeepMind’s Research Counters LLM Biases?

How DeepMind’s Research Counters LLM Biases?
Published on

This DeepMind AI Study Uses Basic Synthetic Data to Reduce Sycophancy in Large Language Models

Large Language Models (LLMs) have made great progress in recent years and are now able to handle difficult jobs requiring logic. Many studies, notably those from OpenAI and Google, have placed a lot of emphasis on these advancements. LLMs are one of the most significant developments in artificial intelligence, revolutionizing how people interact with technology Artificial Intelligence (AI). Sycophancy is the term for a negative behavior displayed by language models in which these models adjust their responses to agree with the viewpoint of a human user, even when that viewpoint is not objectively correct. Researchers have been working to understand this phenomenon.

Depending on the user's self-identification, the action may involve a model acquiring liberal values. A very straightforward synthetic-data-based technique to stop this behavior has been proposed, and research has been done on stressing and measuring the prevalence of sycophancy inside language models. In order to respond to it, a group of researchers from Google DeepMind looked at three different sycophancy tasks to investigate the issue. These projects involve asking models their opinions on subjects where there isn't a clear-cut right or wrong answer, including those involving politics.

The investigation uncovered an intriguing pattern: sycophantic behavior is greatly increased by both the model's size and the practice of instruction altering in PaLM models, which have the potential to have up to 540 billion parameters. The research has gone beyond the fundamental parameters of sycophancy tasks and has introduced a new dimension by examining the same behavior in the context of straightforward addition assertions. Language models have shown a propensity to agree with these added assertions even when they are purposefully erroneous when users indicate their approval. This research demonstrates how sycophancy can continue even when models are aware of their own flaws.

In order to solve the issue of sycophancy, the research has suggested a relatively simple yet effective strategy focusing on synthetic data intervention. To increase the model's resistance to user opinions that are openly available to the public, this intervention uses Natural Language Processing (NLP) activities in these tasks. By adding this synthetic data through a short fine-tuning approach, sycophantic behavior has been significantly reduced, especially when tested on new stimuli.

Sycophancy is exacerbated by instruction tuning and model size. When asked for judgments on subjects with no clear answers, such as politics, models that were instruction-tuned or had more parameters were more likely to reproduce the viewpoint of a simulated user.

When there is no user opinion, models accurately disagree with radically wrong claims like 1 + 1 = 956446. Yet, models may be complacent about incorrect responses. If models wrongly agree with the user, they will also adjust their previously accurate responses to follow the user. A simple synthetic-data intervention can enhance models on prompts where a claim's veracity is unrelated to the user's perception of it and reduce sycophancy.

Join our WhatsApp Channel to get the latest news, exclusives and videos on WhatsApp

                                                                                                       _____________                                             

Disclaimer: Analytics Insight does not provide financial advice or guidance. Also note that the cryptocurrencies mentioned/listed on the website could potentially be scams, i.e. designed to induce you to invest financial resources that may be lost forever and not be recoverable once investments are made. You are responsible for conducting your own research (DYOR) before making any investments. Read more here.

Related Stories

No stories found.
logo
Analytics Insight
www.analyticsinsight.net