Image Credit: Franki Chamaki via Unsplash
11.05.2021

(Data) Scraping – Your data is my data

Cybersecurity | IT Protection | Attack Vectors

Recently, there have been more and more reports of data scraping incidents at Facebook, LinkedIn and Clubhouse. But what exactly is behind this kind of data collection? We’ll tell you.

What is (data) scraping?

The term “scraping” literally means scraping off or scraping together. Data scraping refers to a technique in which a computer program extracts data from the readable output of another program, in other words, scrapes it together.

In the process, different types of data are collected very quickly from websites, platforms or social networks and then stored, mostly so that they can be used later for analysis.
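To make the idea more concrete, here is a minimal sketch in Python that uses only the standard library. Everything in it is an assumption for illustration: the sample page, the CSS class names "product", "name" and "price", and the helper names ProductScraper and scrape_products are hypothetical. A real scraper would first download the page over the network and would have to cope with far messier markup.

```python
from html.parser import HTMLParser

# Hypothetical page content (an assumption for this sketch).
# A real scraper would first fetch this text over HTTP.
SAMPLE_HTML = """
<ul>
  <li class="product"><span class="name">Kettle</span><span class="price">24.99</span></li>
  <li class="product"><span class="name">Toaster</span><span class="price">39.50</span></li>
</ul>
"""


class ProductScraper(HTMLParser):
    """Collects the name and price of every <li class="product"> element."""

    def __init__(self):
        super().__init__()
        self.products = []   # one dict per product found
        self._field = None   # field currently being read: "name", "price" or None

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if tag == "li" and css_class == "product":
            self.products.append({})      # start a new product record
        elif tag == "span" and css_class in ("name", "price"):
            self._field = css_class       # the next text belongs to this field

    def handle_data(self, data):
        if self._field and self.products:
            self.products[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "span":
            self._field = None            # stop reading the current field


def scrape_products(html):
    """Return a list of {"name": str, "price": float} dicts found in the HTML."""
    parser = ProductScraper()
    parser.feed(html)
    return [{"name": p["name"], "price": float(p["price"])} for p in parser.products]


if __name__ == "__main__":
    for product in scrape_products(SAMPLE_HTML):
        print(product["name"], product["price"])
```

The program never talks to a database or an internal interface. It only reads the same output a browser would display and picks out the parts it is interested in, which is exactly what makes scraping so easy to automate at scale.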


Where do we encounter scraping in everyday life?

We encounter scraping very often in everyday life. Search engines or price comparison sites use scraping to collect and display product feeds, images, prices, and other related product details. Information is collected from many websites.

Scraping is also often used in a professional context. For example, scrapers – those who use scraping tools – collect data from different companies to gain insights into marketing information, user behavior, product reviews, or product prices. This allows them to gain insights about competitors or give their own company a competitive advantage.

Data scraping is also carried out to obtain personal information about employees or customers, such as contact details and addresses, which is then sold on to other companies. In isolated cases, cyber criminals can also gain access to this information.


Scraping – Legitimate or Abuse?

Usually, scraping collects publicly available data. The decisive factor, however, is how that data is used. Legitimate uses include, for example, the price comparison sites mentioned above.

However, the data can also be misused, for example by simply copying a company’s professionally created texts and reusing them on other websites, or by sending phishing emails to email addresses collected via scraping. Cyber criminals can also use data scraping to copy websites in detail and use the copies for phishing attempts, for example a fake login page for online banking.


Focus on social media channels

At the moment, there are reports that the personal data of several hundred million users has been published and is in circulation. The data was “scraped”, for example, from the career network LinkedIn and from the audio-based social network app Clubhouse.

Shortly before, Facebook also announced that it was affected by data scraping. That case involves several hundred million profiles.

However, the platforms concerned do not consider themselves at fault, as these scraping incidents are not security incidents caused by hackers. Only personal data was skimmed off that third parties can access anyway through the apps or public APIs, and that the users themselves have published in their profiles. This includes names, profile names or photo URLs.


Data found on the darknet

Scraping can have negative consequences for the people affected – even if it is not a “classic hacker attack” in which cyber criminals gain unauthorized access to systems, servers or networks.

In recent days and weeks, profile data from several million users has been offered for sale in well-known hacker forums. In some cases, the data was even made available free of charge. According to reports, sensitive data such as passwords or credit card information was apparently not affected. However, the published data could be combined with information harvested elsewhere, providing enough material for fraud attempts and attacks (e.g., phishing, brute force).


We advise increased attention

If you have a profile (or several) on the social networks mentioned, your data may be affected. We therefore advise you to be more vigilant in the coming weeks.

Be especially wary of unexpected or unusual messages and emails, which could be attempts to use the collected data for fraud, phishing, or social engineering attacks.


How can you protect yourself?

In general, it is advisable to think carefully about what information you publish on the Internet, and especially in social networks. Only share content that you would be comfortable letting the general public see. In addition, be aware that your data is officially shared with third parties, and that it can be collected and passed on by people or, for example, by scraping tools.

In this blog article, you can find out how to check whether your data has been published and what you can do if it has.