Scraping

The term “scraping” literally means scraping something off or scraping it together. In cyber security, scraping refers to the process of collecting (scraping together) and storing data, usually from websites, platforms or social networks.

What does the term scraping mean in detail?

Scraping – the collection and storage of data – can take place in two ways:
  • Manually, i.e. by hand. This approach becomes very labor-intensive as the amount of data grows.
  • Automatically, for example by computer programs. This allows even large volumes of data to be processed quickly.
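
To illustrate the automatic variant, here is a minimal sketch in Python that uses only the standard library. It reads a small, hard-coded sample page and collects all link targets and the visible text. The class name LinkAndTextCollector, the function scrape and the sample page are assumptions made up for this illustration; real scraping programs additionally download pages over the network and are usually far more elaborate.

```python
from html.parser import HTMLParser


class LinkAndTextCollector(HTMLParser):
    """Hypothetical helper: collects link targets and visible text."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text_parts = []

    def handle_starttag(self, tag, attrs):
        # Remember the target of every <a href="..."> element.
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_data(self, data):
        # Keep every non-empty piece of text between tags.
        stripped = data.strip()
        if stripped:
            self.text_parts.append(stripped)


def scrape(html):
    """Return the links and the text found in an HTML string."""
    collector = LinkAndTextCollector()
    collector.feed(html)
    collector.close()
    return {"links": collector.links, "text": " ".join(collector.text_parts)}


# Made-up sample page; no network access is needed for this sketch.
SAMPLE_PAGE = """
<html><body>
  <h1>Example Shop</h1>
  <p>Contact us via <a href="/contact">our form</a>.</p>
  <a href="https://example.com/products">Products</a>
</body></html>
"""

result = scrape(SAMPLE_PAGE)
print(result["links"])  # ['/contact', 'https://example.com/products']
```

The same few lines work unchanged on one page or on thousands, which is why automatic scraping scales so much better than collecting data by hand.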

Currently, the term scraping is mainly used for the collection of data from websites. In principle, however, it can refer to any text that is displayed on a screen. Different terms are therefore sometimes used, for example web scraping, screen scraping or data scraping. What they all have in common is the collection and storage of data. Scraping can be used for different purposes:

  • For your own analyses, for example for a manual competitor analysis.
  • For the automatic collection and preparation of data from many different websites.
  • For the collection of contact data, for example email addresses published on social media platforms.
  • For the copying and unauthorized publication of content from third-party websites.

Where do I encounter scraping in my day-to-day work?

Scraping is behind every search engine query and every online price comparison. Search engine programs tirelessly “scrape” addresses and information from websites in order to display them as search results. Price comparison services use scraping to collect prices, images and possibly product details. Scraping is also frequently used in a professional context, for example for competitor analysis. However, you may also encounter the abusive side of scraping in your day-to-day work, for example through:
  • A phishing email sent to you after your email address, published on the company website or LinkedIn for example, has been scraped.
  • A company that systematically undercuts your prices, which it collects via scraping.
  • A company that has copied texts and images from your website without your consent.
  • Phishing websites that have copied legitimate pages in detail through scraping, for example a login page for online banking.

What can I do to protect myself from abusive scraping?

  • Be very careful when sharing your data on websites and social media. This data can be collected, stored and passed on via scraping.
  • Publish as little data as possible that is of interest for abusive scraping. For example, set up contact forms on your company website instead of listing email addresses.
  • Follow the instructions in this Perseus blog post to check whether data about you or your company has already been collected and published via scraping.
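
The advice to avoid listing email addresses can be checked with a small script. The following Python sketch searches the HTML of one of your own pages for email addresses that a scraper could pick up just as easily. The function name find_exposed_emails, the sample HTML and the deliberately simplified search pattern are assumptions for this illustration; the pattern does not cover every valid address format.

```python
import re

# Simplified pattern for typical addresses (an assumption, not a full
# implementation of the email address standard).
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_exposed_emails(html):
    """Return all distinct email addresses found in an HTML string."""
    matches = EMAIL_PATTERN.findall(html)
    # Lowercase and deduplicate so each address is reported only once.
    return sorted({address.lower() for address in matches})


# Made-up excerpt of a company website.
SAMPLE_HTML = (
    '<p>Write to Info@example.com or '
    '<a href="mailto:sales@example.com">sales@example.com</a>.</p>'
)

for address in find_exposed_emails(SAMPLE_HTML):
    print("Exposed address:", address)
```

If such a check returns no addresses for a page, for example because the page offers a contact form instead, there is nothing on that page for an email scraper to collect.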