Saturday, November 23, 2024

Google Shows How To Block Bots And Boost Site Performance


Google’s Martin Splitt answered a question about malicious bots that impact site performance, offering suggestions every SEO and site owner should know and put into action.

Malicious Bots Are An SEO Problem

Many SEOs overlook security and bot traffic when they audit a site, largely because few digital marketers understand that security events affect site performance and can explain why a site is inadequately crawled. Improving Core Web Vitals will do nothing for site performance when a poor security posture is what is dragging it down.

Virtually every website is under attack, and excessive crawling can trigger a “500 server error” response code, signaling that the server cannot serve web pages and hindering Google’s ability to crawl them.
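One way to fold bot traffic into a site audit is to check the server’s access log for a handful of IP addresses generating a disproportionate share of requests, alongside the share of 5xx responses. The following is a minimal Python sketch, assuming the log uses the common Apache/Nginx Combined Log Format; the function name `summarize_log` is hypothetical and not part of any tool mentioned in this article.

```python
import re
from collections import Counter

# Combined Log Format (assumed), e.g.:
# 203.0.113.7 - - [23/Nov/2024:10:15:32 +0000] "GET /page HTTP/1.1" 200 5120 "-" "Mozilla/5.0"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (?P<status>\d{3}) '
)


def summarize_log(lines, top_n=5):
    """Return the busiest client IPs and the fraction of 5xx responses."""
    per_ip = Counter()
    total = 0
    server_errors = 0
    for line in lines:
        match = LOG_PATTERN.match(line)
        if not match:
            continue  # skip malformed or differently formatted lines
        total += 1
        per_ip[match.group("ip")] += 1
        if match.group("status").startswith("5"):
            server_errors += 1
    error_rate = server_errors / total if total else 0.0
    return per_ip.most_common(top_n), error_rate
```

Pass it any iterable of lines, such as an open file handle for your server’s access log (the location depends on your hosting setup). A rise in the 5xx share that coincides with a few very busy IPs is worth investigating, though on its own it does not prove an attack.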

How To Defend Against Bot Attacks

The person asking the question wanted Google’s advice on how to fight back against the waves of scraper bots impacting their server performance.

This is the question asked:

“Our website is experiencing significant disruptions due to targeted scraping by automated software, leading to performance issues, increased server load, and potential data security concerns. Despite IP blocking and other preventive measures, the problem persists. What can we do?”

Google’s Martin Splitt suggested identifying the provider whose network is the source of the attacks and notifying it that its services are being abused. He also recommended using the firewall capabilities of a CDN (Content Delivery Network).

Martin answered:

“This sounds like somewhat of a distributed denial-of-service issue if the crawling is so aggressive that it causes performance degradation.

You can try identifying the owner of the network where the traffic is coming from, that is, “their hoster,” and send an abuse notification. You can use WHOIS information for that, usually.
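For the WHOIS route Martin describes, the raw output of a lookup (for example from the `whois` command-line tool found on most Linux and macOS systems) can be long, and registries label the abuse contact differently: ARIN records use `OrgAbuseEmail`, while RIPE NCC records use `abuse-mailbox`. Below is a minimal sketch of a hypothetical helper, `abuse_contacts`, that assumes you have already saved that output as text and pulls out e-mail addresses from any field whose name mentions “abuse.”

```python
import re

# Simple e-mail matcher; good enough for WHOIS fields, not a full RFC 5322 parser.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def abuse_contacts(whois_text):
    """Collect unique e-mail addresses from WHOIS fields whose name mentions 'abuse'."""
    contacts = []
    seen = set()  # lowercased addresses already collected
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep or "abuse" not in key.lower():
            continue  # not a "key: value" line, or not an abuse field
        for email in EMAIL.findall(value):
            if email.lower() not in seen:
                seen.add(email.lower())
                contacts.append(email)
    return contacts
```

Regional registries also publish the same data over RDAP as JSON, which is easier to parse reliably; this text-based approach is only a quick starting point.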

Alternatively, CDNs often have features to detect bot traffic and block it and by definition they take the traffic away from your server and distribute it nicely, so that’s a win. Most CDNs recognize legitimate search engine bots and won’t block them but if that’s a major concern for you, consider asking them before starting to use them.”
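If you maintain your own blocking rules instead of relying on a CDN, Google documents a way to confirm that a visitor claiming to be Googlebot is genuine: run a reverse DNS lookup on the IP, check that the hostname belongs to googlebot.com or google.com, then run a forward DNS lookup and confirm it resolves back to the same IP. This is a minimal Python sketch of that check; the function names are hypothetical, and the domain list is an assumption that should be checked against Google’s current documentation, since other Google crawlers may use additional domains.

```python
import ipaddress
import socket

# Assumed list of Googlebot reverse-DNS domains; verify against Google's docs.
GOOGLE_CRAWLER_DOMAINS = ("googlebot.com", "google.com")


def _forward_lookup(hostname):
    """Return every IP address (IPv4 and IPv6) the hostname resolves to."""
    return {info[4][0] for info in socket.getaddrinfo(hostname, None)}


def is_verified_googlebot(ip, reverse=socket.gethostbyaddr, forward=_forward_lookup):
    """Reverse-resolve the IP, check the domain, then forward-resolve to confirm."""
    try:
        hostname = reverse(ip)[0].rstrip(".").lower()
    except OSError:
        return False  # no PTR record, so it cannot be verified
    if not any(hostname == d or hostname.endswith("." + d) for d in GOOGLE_CRAWLER_DOMAINS):
        return False  # hostname is outside Google's crawler domains
    try:
        resolved = forward(hostname)
    except OSError:
        return False
    # The forward lookup must point back to the original address.
    target = ipaddress.ip_address(ip)
    return any(ipaddress.ip_address(addr) == target for addr in resolved)
```

The resolvers are parameters only so the logic can be tested without network access. Because DNS lookups are slow, a real deployment would cache the result per IP rather than resolving on every request.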

Will Google’s Advice Work?

Identifying the cloud provider or server data center that’s hosting the malicious bots is good advice. But there are many scenarios where that won’t work.

Three Reasons Why Contacting Resource Providers Won’t Work

1. Many Bots Are Hidden

Bots often use VPNs and the open-source Tor network to hide their source, defeating attempts to identify the cloud service or web host providing the bots’ infrastructure. Hackers also hide behind compromised home and business computers, called botnets, to launch their attacks. In these cases there’s usually no practical way to identify who is behind the traffic.

2. Bots Switch IP Addresses

Some bots respond to IP blocking by switching to a different network and immediately resuming the attack. An attack might originate from a server in Germany and, once blocked, switch to a network provider in Asia.

3. Inefficient Use Of Time

Contacting network providers about abusive users is futile when the source of the traffic is obfuscated or spread across hundreds of sources. Many site owners and SEOs might be surprised to discover how intense the attacks on their websites are. Even taking action against a small group of offenders is an inefficient use of time, because countless other bots are ready to replace the ones a cloud provider blocks.

And what about botnets made up of thousands of compromised computers around the world? Think you have time to notify all of those ISPs?

Those are three reasons why notifying infrastructure providers is not a viable approach to stopping bots that impact site performance. Realistically, it’s a futile and inefficient use of time.

Use A WAF To Block Bots

Using a Web Application Firewall (WAF) is a good idea, and that’s the capability Martin Splitt was pointing to when he mentioned using a CDN (content delivery network). A CDN, like Cloudflare, serves browsers and crawlers the requested web page from a server located close to them, speeding up site performance and reducing the load on the site owner’s server.

Many CDNs also include a WAF that can automatically block malicious bots. Martin’s suggestion to use a CDN is a good option, especially because it has the additional benefit of improving site performance.

Another option, which Martin didn’t mention, is a WordPress WAF plugin such as Wordfence. Wordfence’s WAF automatically shuts down bots based on their behavior. For example, if a bot requests an excessive number of pages, Wordfence automatically creates a temporary IP block. If the bot rotates to another IP address, Wordfence identifies the crawling behavior and blocks it again.
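The behavior-based blocking described above can be illustrated with a sliding-window rate limiter. This is a minimal Python sketch of the general idea, not Wordfence’s actual implementation; the class name `TemporaryBlocker` and the default thresholds (120 requests per 60 seconds, a 10-minute block) are hypothetical values chosen for illustration.

```python
import time
from collections import defaultdict, deque


class TemporaryBlocker:
    """Block an IP for block_seconds once it exceeds max_requests within window_seconds.

    Hypothetical illustration only; thresholds are arbitrary defaults.
    """

    def __init__(self, max_requests=120, window_seconds=60, block_seconds=600,
                 clock=time.monotonic):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.block_seconds = block_seconds
        self.clock = clock
        self._hits = defaultdict(deque)  # ip -> timestamps of recent requests
        self._blocked_until = {}         # ip -> time when its block expires

    def allow(self, ip):
        """Record a request from ip; return False if it should be refused."""
        now = self.clock()
        expires = self._blocked_until.get(ip)
        if expires is not None:
            if now < expires:
                return False  # still serving a temporary block
            del self._blocked_until[ip]  # block has lapsed
        hits = self._hits[ip]
        hits.append(now)
        # Drop timestamps that have fallen out of the sliding window.
        while hits and now - hits[0] > self.window_seconds:
            hits.popleft()
        if len(hits) > self.max_requests:
            self._blocked_until[ip] = now + self.block_seconds
            hits.clear()
            return False
        return True
```

Keying the counter on the IP address means a bot that rotates to a new IP starts with a clean slate, but it is blocked again as soon as its request rate crosses the threshold, loosely mirroring the re-blocking behavior described above. Commercial WAFs typically weigh more signals than a single request counter.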

Another solution to consider is a SaaS platform like Sucuri that offers a WAF and a CDN to speed up performance. Both Wordfence and Sucuri are trustworthy providers of WordPress security and they come with limited but effective free versions.

Listen to the question and answer at the 6:36 mark of the Google SEO Office Hours podcast.

Featured Image by Shutterstock/Krakenimages.com
