Tuesday, November 5, 2024

The Biggest Takeaways from the Google Algorithm Leaks


For more than a century, people have been trying to steal the formula for Coca-Cola. Perhaps only slightly less legendary is the formula for Google's search ranking algorithm. After all, placement in Google is vital. It drives organic traffic. It gets more eyes on your content. And it helps determine how many people ultimately discover your business.

Recently, internal documentation revealing some of the secrets behind that algorithm leaked, with Mike King over at iPullRank noting:

Internal documentation for Google Search’s Content Warehouse API has leaked. Google’s internal microservices appear to mirror what Google Cloud Platform offers and the internal version of documentation for the deprecated Document AI Warehouse was accidentally published publicly to a code repository for the client library.

Whoops.

As you might imagine, these leaks are reverberating throughout the marketing world. For years, SEO specialists have tried to reverse-engineer what was really going on with Google search results. How much of what Google claimed was true? What wasn't? And what really factored into the algorithm to help drive SEO success?

Now it appears we have some answers. So let's parse through them and see what we can learn about how Google really ranks its results.

What’s in the SEO Algo Leak?

According to iPullRank, the API documentation contains "2,596 modules" with over 14,000 attributes, or features.

In this context, a module might relate to a component like YouTube or video search. Google's code is stored in one large repository, meaning that any machine on Google's network can access and run it.

This gives readers a sense of how Google's overall structure works. But perhaps even more interesting, according to the article, the "API docs reveal some notable Google lies." The article addresses several of them point by point:

  • Domain authority. Domain authority refers to a score that predicts how likely a website is to show up in search results based on the overall strength of its domain. Google has been cagey about domain authority, at times denying that any overall domain-level score affects search results, even though many SEOs believed otherwise. But the leaks revealed a score called "siteAuthority." What it means is open to interpretation, but it appears some sort of authority ranking is baked into the system.
  • Clicks for rankings. Does Google rank search results based on how users click on them? Google certainly has that information on hand, and past testimony from Google employees apparently hints at some sort of click-based ranking system. But others have denied that this is the case. The documentation appears to include variables for "badClicks" and "goodClicks" that could promote or demote a result based on the quality of the clicks it receives.
  • The sandbox. Some people have claimed that if a domain shows poor signals (a lack of trust, or a very young domain), it gets put in a "sandbox," a kind of penalty box that holds back its rankings. Google has denied that any such sandbox exists. However, "hostAge" is an attribute included in the documentation. Maybe it doesn't cause the long, drawn-out "sandboxing" that many have reported, but there appears to be some truth to the claims.
  • Chrome usage. Google, of course, owns Chrome, which could give the search engine a flood of information about how people browse the Internet. So it seems natural that Google would use this information to feed its algorithm, right? Well, that has been contentious as well. iPullRank points to a module "that seems to be related to the generation of sitelinks" that includes a Chrome-related attribute. In other words, Chrome data might feed into Google rankings as well.

Those are some of the key points. However, the post also zooms out. What, after all, is this monolithic thing known as the "Google algorithm"? "It's a series of microservices," says the post, not truly a monolith. Google's results come from the interplay of different systems, such as the crawling system and the ranking system. Understanding that each of these systems shapes a website's overall ranking is key to good SEO.

What Are the Takeaways?

Given all that we’ve learned about the algorithm, what are the revelations that may impact the future of SEO?

  • Links still matter. Over the years, many people have come to believe that links matter less to Google than they once did. But Google still has to prioritize content. "For quick background, Google's index is stratified into tiers where the most important, regularly updated, and accessed content is stored in flash memory," writes Mike King. These tiers suggest that Google may treat some links as higher quality than others, depending on where the linking content sits in the index.
  • Google remembers…a whole lot. Google's file system is enormous and impressive, a sort of Wayback Machine for the Internet, and the documentation seems to bear this out. However, when analyzing a page, Google may only consider the twenty latest versions of it, which suggests that more frequent updates might have a bigger effect for someone doing an SEO revamp.
  • Trust still matters. Homepage trust is still a key factor, and high-quality, relevant links matter more than a high volume of links. This isn't a game-changer for SEOs, but the leaks appear to confirm it.

Jason Barnard, writing for Search Engine Land, also highlighted the key points from this leak, particularly the signals tied to people and entities. For example, the "isAuthor" attribute might indicate whether the entity in question (such as a website) is also the author of the document, which may help news articles rank more highly. Barnard therefore recommends a new three-tiered approach to SEO: optimizing website content (traditional SEO), taking accountability for a website as the "website owner," and taking accountability for pages as the author.

That means doubling down on a "personal brand," which people tend to trust. And if people trust it, the leaked documentation suggests Google's algorithm probably does too.
