The Cambridge Analytica[i] scandal, along with other data breaches[ii], has given the data extraction industry a negative reputation. That's a hard reality to face, because (a) I lead a company that provides ethically-sourced proxies for public data extraction, and (b) I believe that web scraping can be a force for good.
I realise that some people will need to be convinced that this is true because positive stories don't get nearly as many clicks as negative ones. But they do exist, and I hope to change some minds with this article.
A wise person in the business told me not long ago that the solution to "bad" or poorly-used technology is not to dispose of it. The solution is to upgrade how it works or improve standards for how it's used.
This reminded me of the issues that arose after the invention of the automobile, when reckless driving and a lack of safety laws led to many injuries and deaths. The car wasn't to blame; the way it was being driven was. The solution, therefore, was to create better driving conditions and training that emphasised safety.
That example comes to mind when I think about the issues affecting our industry. We believe that many of the widespread concerns can be addressed through ethical tools to support web scraping, along with the use of best practices.
There's a lot of great work being done out there by technology firms, startups and independent developers looking to leverage web scraping in positive ways. This list is just a small sample of some of those projects:
Online marketplaces are an obvious example of the power of web scraping because almost all of us use these websites on a regular basis. And while I understand that cheap flights, bargain hotel rooms and rock-bottom gadget prices may not be saving the planet, these sites have made many of these products and services accessible to a wider audience. On top of that, new aggregator websites have brought to light ethical manufacturers[iii] that produce goods of high quality under fair working conditions.
Investigative journalists and "watchdog" monitoring groups use scrapers to source, track and compare information from public sites in their reporting. A notable (and controversial) example is the Reveal Project[iv]. This group compared member lists across several Facebook groups and found overlap between "extremist" organisations and law enforcement groups. Other examples include a Reuters investigation that uncovered an underground market for adopted children[v], and another that tracked elements of the online gun market[vi].
Along with online product marketplaces, the job market has benefited immensely from the power of web scraping. There are dozens of sites like CareerBuilder that list jobs from a wide variety of industries all over the world.
Journalistic integrity is a worldwide concern, wherever an individual sits on the political spectrum. Fake news, whether it comes from a corporate or an independent source, wields immense power and can disrupt the political process.
Some startups are tackling the problem head-on[ix] by leveraging the power of web scraping along with machine learning algorithms to process large amounts of data from thousands of sources. The results, when analysed, provide insights into the credibility of the story according to its source and political slant.
Illegal content is an increasing concern among business leaders and politicians. While there are many systems in place that track complaints, checking each individually and removing the content manually is impossibly time-consuming and inefficient.
Web scraping is fast-tracking efforts to combat this serious problem. A notable example is a project developed by the Oxylabs team in cooperation with the Communications Regulatory Authority of the Republic of Lithuania (RRT) that produced an AI-powered tool for detecting content depicting child abuse[x].
Besides being able to identify prohibited visual material, the solution automatically sends a notification to the RRT hotline where appropriate actions are taken to prosecute the offenders.
It is possible to engage in large-scale data collection while respecting the server infrastructure of public websites and the privacy of users. Our crystal-clear code of ethics for our web scraping customers provides a framework with guidelines that include:
• Scraping only publicly-available web pages.
• Ensuring that the data is requested at a fair rate and that it doesn't compromise the web server.
• Respecting the data obtained and any privacy issues relevant to the source website.
• Studying the website's legal documents, deciding if they will be accepted, and determining if the terms will be breached upon acceptance.
• Making use of proxies that are procured ethically.
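The first two guidelines, scraping only permitted public pages and requesting data at a fair rate, can be expressed directly in code. The sketch below is a minimal illustration of that policy, not Oxylabs' actual tooling: a hypothetical `PoliteScraper` class that checks a site's robots.txt rules before fetching and enforces a minimum delay between requests. The user-agent name and delay value are assumptions for the example.

```python
import time
from urllib.robotparser import RobotFileParser


class PoliteScraper:
    """Minimal sketch of a robots-aware, rate-limited fetch policy."""

    def __init__(self, robots_txt: str, user_agent: str = "ethical-bot",
                 min_delay: float = 1.0):
        # Parse the site's robots.txt rules (passed in as text here,
        # so the example needs no network access).
        self.parser = RobotFileParser()
        self.parser.parse(robots_txt.splitlines())
        self.user_agent = user_agent
        self.min_delay = min_delay          # seconds between requests
        self._last_request = 0.0

    def allowed(self, url: str) -> bool:
        # Scrape only publicly available pages the site permits.
        return self.parser.can_fetch(self.user_agent, url)

    def wait_turn(self) -> None:
        # Request at a fair rate so the web server is never hammered.
        elapsed = time.monotonic() - self._last_request
        if elapsed < self.min_delay:
            time.sleep(self.min_delay - elapsed)
        self._last_request = time.monotonic()


# Example robots.txt that keeps a /private/ section off-limits.
robots = "User-agent: *\nDisallow: /private/\n"
scraper = PoliteScraper(robots)
print(scraper.allowed("https://example.com/products"))   # public path -> True
print(scraper.allowed("https://example.com/private/x"))  # disallowed -> False
```

In a real pipeline, each fetch would call `wait_turn()` before `allowed()`-checked requests; the point of the sketch is simply that respecting a website's rules and infrastructure is a few lines of discipline, not a burden.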
Web scraping, like many things in life, can be used in positive and negative ways. Data breaches and associated scandals may make headlines, but they shouldn't overshadow all the good work taking place in the world of data extraction. The benefits of ethical web scraping extend beyond our industry and into society. The way forward is better technology and better standards, which together will power the next evolution of web scraping.
Julius Cerniauskas, CEO at Oxylabs