Web crawler

The term “web crawler” literally describes a program that crawls through the Internet. Web crawlers are computer programs that independently search the Internet and analyze web pages.

What does web crawler mean in detail?

Web crawlers are bots, that is, computer programs that automatically and independently perform certain tasks. The figurative term “web crawler” comes from the way these bots work: they “crawl” from link to link through the Internet, which is how they get from website to website. Search engines use web crawlers to analyze websites and add them to their index. Each search engine uses its own web crawlers. Your business website is visited by Googlebot, Bingbot, and other search engine crawlers.
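
The "link to link" behavior described above can be sketched in a few lines of Python. This is a minimal illustration, not how Googlebot or any real search engine crawler is implemented. The names LinkExtractor and crawl are invented for this example, and fetch is a hypothetical function that you supply to return a page's HTML for a given URL.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkExtractor(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl(start_url, fetch, max_pages=10):
    """Visit pages breadth-first, following links, and return the visited URLs.

    fetch is a hypothetical callable (an assumption of this sketch): it takes
    a URL and returns the HTML as a string, or None if the page is unavailable.
    """
    queue = deque([start_url])
    seen = {start_url}          # URLs already queued, to avoid loops
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        visited.append(url)
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

The sketch can be tried without network access by passing a dictionary lookup as fetch, for example crawl("https://example.com/", pages.get), where pages maps URLs to HTML strings. A well-behaved crawler would additionally honor each site's robots.txt, which Python's standard urllib.robotparser module can read.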

Web crawlers can be used to automatically collect and analyze data from websites. Price comparison websites, for example, use them to find the lowest prices for certain products.
However, web crawlers are also used for shady or illegal purposes, for example to automatically harvest email addresses that are then targeted with spam.

Where do I encounter the topic of “web crawlers” in my everyday work?

You encounter it indirectly every time you use a search engine. The search results are the product of the web crawlers' work. Unusual email address notations on websites, such as “info(at)company(dot)com”, also exist because of web crawlers. This format is meant to make the email address unreadable to shady web crawlers. However, email addresses written this way are no longer accessible: people with severely impaired vision, for example, may not be able to use them. Furthermore, many shady web crawlers are now programmed to recognize such alternative spellings.
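
To show why such alternative spellings offer little protection, here is a minimal sketch of how a program could convert them back into ordinary addresses. The function name deobfuscate and the two patterns are assumptions made for this illustration. They cover only the "(at)"/"(dot)" and "[at]"/"[dot]" variants, not every spelling found in practice.

```python
import re

# Matches e.g. "info(at)example(dot)com" or "sales [at] example [dot] org".
OBFUSCATED = re.compile(
    r"([\w.+-]+)\s*[\(\[]\s*at\s*[\)\]]\s*"
    r"((?:[\w-]+\s*[\(\[]\s*dot\s*[\)\]]\s*)+[\w-]+)",
    re.IGNORECASE,
)
# Matches a single "(dot)" or "[dot]" including surrounding spaces.
DOT = re.compile(r"\s*[\(\[]\s*dot\s*[\)\]]\s*", re.IGNORECASE)


def deobfuscate(text):
    """Return all addresses in text that use the (at)/(dot) spelling."""
    found = []
    for local_part, domain in OBFUSCATED.findall(text):
        found.append(local_part + "@" + DOT.sub(".", domain))
    return found


print(deobfuscate("Write to info(at)example(dot)com today."))
# prints ['info@example.com']
```

Two short regular expressions are enough to undo the notation, which is why it mainly inconveniences human readers.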

What can I do to improve my security?

Protect email addresses accessible on your company website from shady web crawlers. There are several ways to do this. Two examples:

  • Replace email addresses with contact forms. A contact form does not reveal the email address in the page, so address-harvesting bots find nothing to collect, while the form can still be made accessible to humans.
  • Replace email addresses with forwarding via an HTTP redirect. In this case, the email address is only reachable via a detour that simple harvesting bots do not follow.

Contact your IT department to identify and implement the best practices for your website.
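
The redirect approach from the list above could look roughly like the following sketch, which uses only Python's standard library. The path /contact, the address info@example.com, and the names redirect_for, ContactRedirectHandler, and serve are all assumptions for illustration. A production website would normally configure this in its web server or content management system instead.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical values for illustration only.
CONTACT_PATH = "/contact"
CONTACT_TARGET = "mailto:info@example.com"


def redirect_for(path):
    """Return (status code, headers) for a requested path."""
    if path == CONTACT_PATH:
        # The address appears only in the response header, not in any page.
        return 302, {"Location": CONTACT_TARGET}
    return 404, {}


class ContactRedirectHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        status, headers = redirect_for(self.path)
        self.send_response(status)
        for name, value in headers.items():
            self.send_header(name, value)
        self.end_headers()


def serve(port=8000):
    """Run the sketch locally; stop it with Ctrl+C."""
    HTTPServer(("127.0.0.1", port), ContactRedirectHandler).serve_forever()
```

The web page then links to /contact instead of showing the address. Note that a bot which requests the link and reads the Location header can still obtain the address, so this is best seen as an additional hurdle rather than complete protection.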