Gemini Loses to GPT-3.5 Turbo, According to Study

Study reveals GPT-3.5 Turbo outperforms Gemini in various domains

Less than a month after unveiling its much-anticipated ChatGPT rival, Gemini, to the world with an eye-catching demo video, Google drew criticism for what appeared to be staged interactions between the presenter and the AI. Now, a new study finds that even Gemini Pro, the most capable version currently available, falls short of OpenAI's GPT-3.5 Turbo large language model (LLM) on most tasks.

In other words, Google's newest LLM, which had been in development for months, underperforms OpenAI's older, less advanced free model on the majority of tasks. OpenAI's more capable GPT-4 and GPT-4V LLMs, meanwhile, are already in regular use by paying ChatGPT Plus and Enterprise subscribers, who have had access to the former for most of this year.

The study, "An In-depth Look at Gemini's Language Abilities," was posted yesterday on arXiv.org, the open-access preprint server. Its conclusion, stated plainly near the top: "As of this writing, we found that across all tasks, Gemini's Pro model achieved comparable but slightly inferior accuracy compared to the current version of OpenAI's GPT 3.5 Turbo."

That conclusion must sting for the Google researchers who have put countless hours into Gemini, as well as for their leadership. After this story was published, we contacted Google; a representative replied that the company's internal research indicates Gemini Pro performs better than GPT-3.5, and that an even more potent version, Gemini Ultra, scheduled for release in early 2024, scored higher than GPT-4. This is their complete response: In our technical report, we conduct several text-based academic assessments covering reasoning, reading comprehension, STEM, and coding, and compare Gemini Pro and Ultra to a suite of external LLMs and our previous best model, PaLM 2. The results show that Gemini Pro performs better than inference-optimized models such as GPT-3.5.

As part of the evaluation on a popular benchmark, HellaSwag, we find that an extra hundred fine-tuning steps on particular website extracts corresponding to the HellaSwag training set improve the validation accuracy of Gemini Pro to 89.6% and of Gemini Ultra to 96.0% when measured with 1-shot prompting.

This implies that the composition of the pretraining dataset can influence benchmark results. We therefore decided to report decontaminated HellaSwag results, measured with ten evaluation shots. In our opinion, more sophisticated and nuanced standardized evaluation benchmarks with uncompromised data are needed. Thus, we also assess Gemini models on several recently announced held-out evaluation datasets, such as WMT23 and Math-AMC 2022–2023 problems, or on datasets created internally from non-web sources, such as Natural2Code.
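For context, "evaluation shots" refers to the number of already-solved examples placed in the prompt ahead of the item the model must complete, so 1-shot and 10-shot runs give the model one or ten worked examples, respectively. The Python sketch below illustrates the general idea on a HellaSwag-style multiple-choice task; it is an illustrative example, not code from the study or from Google, and the dataset rows and the score_completion helper (which would return a model's log-probability for a candidate ending) are hypothetical stand-ins for a real model API.

```python
# Minimal sketch of k-shot evaluation on a HellaSwag-style benchmark,
# where each item has a context, several candidate endings, and the
# index of the correct ending. Hypothetical data and scorer throughout.
from typing import Callable, Sequence

def build_kshot_prompt(train_rows: Sequence[dict], test_ctx: str, k: int) -> str:
    """Prepend k solved examples, then the unsolved test context."""
    shots = [f"{row['ctx']} {row['endings'][row['label']]}" for row in train_rows[:k]]
    shots.append(test_ctx)  # the model must complete this final context
    return "\n\n".join(shots)

def kshot_accuracy(test_rows: Sequence[dict], train_rows: Sequence[dict], k: int,
                   score_completion: Callable[[str, str], float]) -> float:
    """Accuracy under k-shot prompting: pick the highest-scoring ending."""
    correct = 0
    for row in test_rows:
        prompt = build_kshot_prompt(train_rows, row["ctx"], k)
        scores = [score_completion(prompt, ending) for ending in row["endings"]]
        predicted = max(range(len(scores)), key=scores.__getitem__)
        correct += int(predicted == row["label"])
    return correct / len(test_rows)

if __name__ == "__main__":
    train = [{"ctx": "She cracked the eggs into the bowl and",
              "endings": ["whisked them.", "planted a tree.", "read a map.", "sang."],
              "label": 0}]
    test = [{"ctx": "He laced up his running shoes and",
             "endings": ["went for a jog.", "baked bread.", "filed taxes.", "slept."],
             "label": 0}]
    # Dummy scorer for demonstration only; a real scorer would return the
    # model's log-probability of `ending` given `prompt`.
    dummy = lambda prompt, ending: float(len(ending))
    print(kshot_accuracy(test, train, k=1, score_completion=dummy))
```

The contamination concern Google raises is that if benchmark items leaked into the pretraining data, a model can score well by recall rather than reasoning, which is why held-out or decontaminated evaluation sets matter.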
