Research Reveals the Flaws and Biases of ChatGPT Detectors


The content you're reading was typed into a Google Doc by a human being, but that may not be true of content you find elsewhere. With free generative AI programs for text and images, such as ChatGPT, now widely available, it is getting harder to distinguish text written by people from text generated by an AI.

Artificial intelligence (AI) has long been employed, largely undetected, in social media, scientific research, advertising, agriculture, and business. The advent of OpenAI's ChatGPT, however, has sparked an arms race in places such as the classroom, where students have used the program to cheat by generating complete, human-sounding essays, and teachers have installed detection software to catch them in the act.

In a new study published Monday in the journal Patterns, Stanford University researchers tested how reliably these generative AI detectors distinguish human-written text from AI-generated text. They were astonished to discover that some of the most popular GPT detectors, designed to flag text written by applications such as ChatGPT, consistently misclassified writing by non-native English speakers as AI-generated, revealing shortcomings and biases that users should be aware of.

The team collected 91 TOEFL essays from a Chinese forum and 88 essays written by eighth-graders in the United States, then ran them through seven commercially available GPT detectors, including OpenAI's own detector and GPTZero. Just 5.1% of the US student essays were classed as AI-generated, but the human-written TOEFL essays were misclassified 61% of the time. One detector flagged 97.8% of the TOEFL essays as AI-produced.

Eighteen of the 91 TOEFL essays were classified as AI-produced by all seven detectors. When the researchers dug deeper into these 18 essays, they found that low "text perplexity" was the most likely cause. Perplexity serves as a proxy for variety or unpredictability in a given text: non-native English writers tend to use a smaller vocabulary and less complex syntax, producing more predictable, lower-perplexity text, which the GPT detectors read as a sign of AI authorship.
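The intuition behind perplexity can be illustrated with a toy example. The detectors in the study score text with large language models, but the sketch below, a simple unigram model fit to the text itself, shows the same principle: repetitive, predictable word choices drive perplexity down. The function name and sample sentences are illustrative, not drawn from the study.

```python
import math
from collections import Counter

def unigram_perplexity(text: str) -> float:
    """Perplexity of a text under a unigram model fit to that text.

    A toy stand-in for the language-model perplexity that GPT
    detectors use: lower values mean more predictable word choices.
    """
    words = text.lower().split()
    counts = Counter(words)
    total = len(words)
    # Sum of log-probabilities of each word under the unigram model.
    log_prob = sum(math.log(counts[w] / total) for w in words)
    # Exponentiate the average negative log-probability per word.
    return math.exp(-log_prob / total)

varied = "the quick brown fox jumps over one lazy dog near a quiet river"
repetitive = "the cat saw the cat and the cat saw the cat again and again"

# More repeated words -> more predictable text -> lower perplexity.
print(unigram_perplexity(varied))      # all 13 words distinct: perplexity 13.0
print(unigram_perplexity(repetitive))  # heavy repetition: well below 13
```

Under this toy model, a text with no repeated words has perplexity equal to its word count, while repetition pulls the score down, mirroring how a limited vocabulary lowers the perplexity that detectors measure.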

ChatGPT and "literary language"

The researchers then repeated their original experiment in reverse: this time they used AI-generated text to see whether the detection programs correctly classified it as AI-created.

They used ChatGPT to produce responses to the 2022-2023 US college entrance essay prompts and ran the generated essays through the same seven detectors, which caught the AI-generated essays 70% of the time on average. They then returned to ChatGPT, however, with a follow-up prompt to rework the essays: "Elevate the provided text by employing literary language."

This prompt produced essays that confounded the GPT detectors: they correctly classified the material as AI-generated just 3.3% of the time. The team got similar results when it asked ChatGPT to produce scientific abstracts.

"We didn't expect these commercial detectors to perform so poorly on text from non-native speakers or to be so easily fooled by GPT," said James Zou, co-author of the current study and a biological data scientist at Stanford University.

Because the detectors are so easily duped, non-native English speakers may start using ChatGPT more often, prompting the service to make their work read as though it were written by a native English speaker.

According to the researchers, the two experiments together pose a critical question: if the detectors are so easy to mislead and human writing is so regularly misclassified, what use are they?

Analytics Insight
www.analyticsinsight.net