
Trained on the dark web, DarkBERT can help combat cybercrime

Shiva Ganesh

The model is intended to help cybersecurity experts gather cyber threat intelligence

A group of South Korean academics has created DarkBERT, a language model pre-trained on text collected from the dark web. They aimed to develop an artificial intelligence tool that outperforms existing language models on dark web content and helps threat researchers, law enforcement, and cybersecurity experts combat cyber threats.

What is DarkBERT?

DarkBERT is a transformer-based encoder model built on the RoBERTa architecture. It was trained on millions of dark web pages, including data from hacker forums, scam websites, and other criminal online sources. The term "dark web" refers to a concealed part of the internet that cannot be reached with standard web browsers. It is known for its anonymous websites and markets, which are notorious for criminal activities such as the trafficking of stolen data, narcotics, and firearms.

The researchers used the Tor network to access the dark web and collect raw data for training DarkBERT. They filtered this data using techniques such as deduplication, category balancing, and pre-processing to produce a refined dark web corpus. The corpus was then used to further train RoBERTa over roughly 15 days, producing DarkBERT.
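The article does not describe how the researchers implemented these filtering steps, so the following is only a minimal sketch of what deduplication and category balancing can look like. It assumes each page is a dictionary with hypothetical `category` and `text` fields, treats pages as duplicates when their normalized text is identical, and balances by downsampling every category to the size of the smallest one. None of these choices is confirmed by the source.

```python
import hashlib
import random
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return " ".join(text.lower().split())


def deduplicate(pages):
    """Keep the first page seen for each distinct normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_by_category(pages, seed=0):
    """Downsample every category to the size of the smallest category."""
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    if not by_category:
        return []
    target = min(len(group) for group in by_category.values())
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    balanced = []
    for category in sorted(by_category):
        balanced.extend(rng.sample(by_category[category], target))
    return balanced


# Hypothetical sample records, invented for illustration only.
pages = [
    {"category": "forum", "text": "Selling  fresh logs"},
    {"category": "forum", "text": "selling fresh logs"},
    {"category": "forum", "text": "New exploit thread"},
    {"category": "market", "text": "Listing: item A"},
]

unique = deduplicate(pages)
balanced = balance_by_category(unique)
print(len(pages), len(unique), len(balanced))  # prints: 4 3 2
```

Exact-match hashing only catches copies that differ in case or spacing. A real crawl would probably also need near-duplicate detection, which this sketch leaves out.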

DarkBERT's Potential Use in Cybersecurity: DarkBERT handles the language used by cybercriminals well and is good at identifying specific potential threats. It can be used to search dark web content and flag cybersecurity threats such as data breaches and ransomware, making it a potentially valuable tool in the fight against cyber threats.

According to the paper published on arXiv, the researchers compared DarkBERT with two well-known NLP models, BERT and RoBERTa, across three cybersecurity-related use cases.

  1. Check Dark Web Forums for Potentially Hazardous Topics: Dark web forums are widely used to exchange unlawful information, so monitoring them is critical for spotting potentially harmful posts. Reviewing them manually is time-consuming, however, so security specialists benefit from automating the process.
  2. Locate Websites That Store Sensitive Information: Hackers and ransomware groups use the dark web to set up leak sites, where they publish confidential data stolen from companies that refuse to pay ransom demands. Other criminals simply post leaked sensitive data, such as passwords and bank details, on the dark web with the intention of selling it.
  3. Detect Threat-Related Keywords on the Dark Web: DarkBERT uses the fill-mask function, a feature of BERT-family language models, to detect terms associated with criminal activity, such as drug sales on the dark web. When the word "MDMA" was masked on a drug sales page, DarkBERT suggested drug-related words, whereas the other models suggested generic words unrelated to drugs, such as various professions. This ability to surface terms associated with illegal activity could help identify and respond to new cyber threats.
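The fill-mask experiment in the third use case can be sketched as follows. This is an illustration and not the researchers' code. It assumes the Hugging Face `transformers` library and uses the publicly available `roberta-base` checkpoint as a stand-in baseline; the helper names `mask_term` and `top_predictions` and the sample sentence are hypothetical. Only `mask_term` runs without extra dependencies, since `top_predictions` needs `transformers` installed and a model download.

```python
import re

ROBERTA_MASK = "<mask>"  # mask token used by RoBERTa-style tokenizers


def mask_term(text, term, mask_token=ROBERTA_MASK):
    """Replace the first whole-word occurrence of `term` with the mask token."""
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", flags=re.IGNORECASE)
    masked, count = pattern.subn(lambda _match: mask_token, text, count=1)
    if count == 0:
        raise ValueError(f"{term!r} not found in text")
    return masked


def top_predictions(masked_text, model_name="roberta-base", top_k=5):
    """Return (token, score) pairs that the model proposes for the masked slot.

    The import is lazy so the rest of this sketch runs without `transformers`.
    `model_name` is an assumption: any fill-mask checkpoint can be passed in.
    """
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model=model_name)
    results = fill_mask(masked_text, top_k=top_k)
    return [(r["token_str"].strip(), r["score"]) for r in results]


# Hypothetical sentence, invented for illustration only.
masked = mask_term("Listing: MDMA, ships in two days", "MDMA")
print(masked)  # prints: Listing: <mask>, ships in two days
```

Calling `top_predictions(masked)` with a general-purpose model and again with a domain-specific one, then comparing the two ranked lists, mirrors the comparison the article describes.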

Use of AI for Threat Detection and Prevention: DarkBERT was pre-trained on dark web data and outperformed existing language models across several cybersecurity use cases, making it a promising tool for advancing dark web research. The model could be used for a range of cybersecurity tasks, such as identifying websites that sell leaked personal data, monitoring dark web forums for illicit information exchange, and finding keywords related to cyber threats. However, DarkBERT, like other LLMs, is a work in progress, and its performance may improve with further training and fine-tuning.
