Wednesday, October 30, 2024

Google On Diagnosing Multi-Domain Crawling Issues

Google’s Search Advocate, John Mueller, shared insights on diagnosing widespread crawling issues.

This guidance was shared in response to a disruption reported by Adrian Schmidt on LinkedIn. Google’s crawler stopped accessing several of his domains at the same time.

Despite the interruption, Schmidt noted that live tests via Search Console continued to function without error messages.

Investigations indicated no increase in 5xx errors or issues with robots.txt requests.

What could the problem be?

Mueller’s Response

Addressing the situation, Mueller pointed to shared infrastructure as the likely cause:

“If it shared across a bunch of domains and focuses on something like crawling, it’s probably an issue with a shared piece of infrastructure. If it’s already recovering, at least it’s not urgent anymore and you have a bit of time to poke at recent changes / infrastructure logs.”

Infrastructure Investigation

All affected sites used Cloudflare as their CDN, which made the shared CDN layer an obvious place to start looking.

When asked about debugging, Mueller recommended checking Search Console data to determine whether DNS or failed requests were causing the problem.

Mueller stated:

“The crawl stats in Search Console will also show a bit more, perhaps help decide between say DNS vs requests failing.”

He also pointed out that the timing was a key clue:

“If it’s all at exactly the same time, it wouldn’t be robots.txt, and probably not DNS.”

Impact on Search Results

On the question of search visibility, Mueller offered reassurance that a short disruption of this kind is unlikely to cause noticeable problems:

“If this is from today, and it just lasted a few hours, I wouldn’t expect any visible issues in search.”

Why This Matters

When Googlebot suddenly stops crawling across numerous sites simultaneously, it can be challenging to identify the root cause.

While temporary crawling pauses might not immediately impact search rankings, they can disrupt Google’s ability to discover and index new content.

The incident highlights a risk that is easy to overlook: when several sites depend on the same piece of infrastructure, a single fault there can interrupt crawling for all of them at once.

How This Can Help You

If Googlebot stops crawling your sites:

  • Check if the problem hits multiple sites at once
  • Look at your shared infrastructure first
  • Use Search Console data to narrow down the cause
  • Don’t rule out DNS just because regular traffic looks fine
  • Keep an eye on your logs
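
One way to act on the first and last points is to count Googlebot requests per domain per hour in your access logs and look for hours where every domain went quiet together. The following is a minimal sketch, not a definitive tool. It assumes a combined-style log format with the hostname prepended to each line; your server or CDN may log differently, so the regular expression would need adjusting. The function names are hypothetical, and matching on the user-agent string alone does not prove a request really came from Google.

```python
import re
from collections import Counter, defaultdict
from datetime import datetime

# Assumed line layout (adjust to your own log format):
# <host> <ip> - - [<timestamp>] "<request>" <status> <bytes> "<referer>" "<user-agent>"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \S+ \[(?P<ts>[^\]]+)\] '
    r'"[^"]*" \d{3} \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def googlebot_hits_per_hour(lines):
    """Count requests whose user-agent mentions Googlebot, per host and hour."""
    hits = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.match(line)
        if not match or "Googlebot" not in match.group("ua"):
            continue  # skip unparseable lines and non-Googlebot traffic
        ts = datetime.strptime(match.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        hour = ts.replace(minute=0, second=0)  # truncate to the hour
        hits[match.group("host")][hour] += 1
    return hits


def shared_gap_hours(hits, domains, hours):
    """Return the hours in which none of the given domains saw Googlebot."""
    return [
        hour
        for hour in hours
        if all(hits.get(domain, {}).get(hour, 0) == 0 for domain in domains)
    ]
```

If `shared_gap_hours` returns the same window for every domain you run, that fits the pattern Mueller described, and the shared layer is the place to start. If only one domain has a gap, the cause is more likely specific to that site.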

For anyone running multiple sites behind a CDN, make sure you:

  • Have good logging set up
  • Watch your crawl rates
  • Know who to call when things go sideways
  • Keep tabs on your infrastructure provider
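
Watching crawl rates can be as simple as comparing each site's latest daily Googlebot request count with its own recent baseline. The sketch below illustrates the idea under stated assumptions: the daily counts come from your own logs or exports, and the function name, the 50% drop ratio, and the three-day minimum history are illustrative choices rather than recommendations from Google.

```python
from statistics import median


def crawl_rate_alerts(daily_counts, drop_ratio=0.5, min_history=3):
    """Flag domains whose latest daily count fell well below their baseline.

    daily_counts maps a domain to a list of daily Googlebot request counts,
    oldest first. Returns {domain: (latest, baseline)} for flagged domains.
    """
    alerts = {}
    for domain, counts in daily_counts.items():
        if len(counts) < min_history + 1:
            continue  # not enough history to form a baseline
        history, latest = counts[:-1], counts[-1]
        baseline = median(history)  # median resists one-off spikes
        if baseline > 0 and latest < drop_ratio * baseline:
            alerts[domain] = (latest, baseline)
    return alerts
```

A single flagged domain points to a site-level problem. Several domains flagged on the same day is the situation described in this article, where the shared infrastructure deserves the first look.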

Featured Image: PeopleImages.com – Yuri A/Shutterstock
