In an unprecedented step, a group of South Korean academics created DarkBERT, an LLM trained only on dark web information. They aimed to develop an artificial intelligence tool that outperforms existing language models and aids threat researchers, law enforcement, and cybersecurity experts in combating cyber threats.
DarkBERT is a transformer-based encoder model built on the RoBERTa architecture. It was trained on millions of dark web pages, including data from hacker forums, scam websites, and other criminal online sources. The term dark web refers to a concealed part of the internet that cannot be reached with standard web browsers. It is known for its anonymous websites and marketplaces, which are notorious for criminal activities such as the trafficking of stolen data, narcotics, and firearms.
The researchers used the Tor network to access the dark web and collect the raw data used to train DarkBERT. They filtered this data with techniques such as deduplication, category balancing, and pre-processing to produce a refined dark web corpus. The corpus was then fed to RoBERTa over about 15 days to produce DarkBERT.
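To make the filtering steps more concrete, here is a minimal Python sketch of deduplication and category balancing on a handful of made-up pages. It is an illustration only, not the researchers' actual pipeline: the `text` and `category` fields, the hash-based duplicate check, the per-category cap, and the sample pages are all assumptions introduced for this example.

```python
import hashlib
import random
import re
from collections import defaultdict


def normalize(text):
    """Lowercase and collapse whitespace so trivially different copies match."""
    return re.sub(r"\s+", " ", text).strip().lower()


def deduplicate(pages):
    """Keep the first page seen for each distinct normalized text."""
    seen = set()
    unique = []
    for page in pages:
        digest = hashlib.sha256(normalize(page["text"]).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(page)
    return unique


def balance_categories(pages, max_per_category, seed=0):
    """Cap every category at max_per_category pages using random sampling."""
    rng = random.Random(seed)
    by_category = defaultdict(list)
    for page in pages:
        by_category[page["category"]].append(page)
    balanced = []
    for category in sorted(by_category):
        group = by_category[category]
        if len(group) > max_per_category:
            group = rng.sample(group, max_per_category)
        balanced.extend(group)
    return balanced


# Hypothetical sample pages; the categories are invented for illustration.
pages = [
    {"text": "Leaked  database for sale", "category": "marketplace"},
    {"text": "leaked database for sale", "category": "marketplace"},
    {"text": "Fresh card dumps", "category": "marketplace"},
    {"text": "New ransomware build", "category": "forum"},
]

unique_pages = deduplicate(pages)
corpus = balance_categories(unique_pages, max_per_category=1)
print(len(pages), len(unique_pages), len(corpus))  # prints: 4 3 2
```

Exact hashing only catches copies that are identical after normalization. A corpus of millions of pages would likely also need near-duplicate detection, which this sketch leaves out.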
DarkBERT's Potential Use in Cybersecurity: DarkBERT has a strong grasp of the language cybercriminals use and excels at identifying specific potential threats. It can analyze dark web content and flag cybersecurity threats such as data breaches and ransomware, making it a potentially valuable weapon in the battle against cybercrime.
According to the research published on arxiv.org, the researchers compared DarkBERT with two well-known NLP models, BERT and RoBERTa, evaluating their performance across three cybersecurity-related use cases.
Use of AI for Threat Detection and Prevention: DarkBERT was pre-trained on dark web data and outperformed existing language models across many cybersecurity use cases, establishing itself as a useful tool for furthering dark web research. The dark web-trained AI could be used for various cybersecurity tasks, such as identifying websites selling leaked personal data, monitoring dark web forums for illicit information exchange, and finding keywords related to cyber threats. However, DarkBERT, like other LLMs, is a work in progress, and its performance may improve with continued training and fine-tuning.