What is RankBrain, Machine Learning and Artificial Intelligence?

Google is using a machine-learning artificial intelligence system called “RankBrain” to help sort through its search results. Wondering how that works and fits in with Google’s overall ranking system? Here are the FAQs covering what we know about RankBrain, followed by Gianluca Fiorelli’s “RankBrain Unleashed.”


What Is RankBrain?

RankBrain is Google’s name for a machine-learning artificial intelligence system that’s used to help process its search results, as was reported by Bloomberg and also confirmed to us by Google.

What Is Machine Learning?

Machine learning is where a computer teaches itself how to do something, rather than being taught by humans or following detailed programming.
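To make the distinction concrete, here is a tiny, hypothetical sketch (all data invented): the first function follows a rule a human wrote, while the second derives its own rule from labeled examples.

```python
# A tiny, hypothetical illustration: explicit programming vs. learning
# from examples. All data here is invented.

# Explicit programming: a human writes the rule.
def is_spam_rule(text: str) -> bool:
    return "free money" in text.lower()

# Machine learning (very crudely): the program derives its own rule
# from labeled examples instead of being handed one.
def learn_spam_words(examples: list[tuple[str, bool]]) -> set[str]:
    spam_words, ham_words = set(), set()
    for text, is_spam in examples:
        (spam_words if is_spam else ham_words).update(text.lower().split())
    return spam_words - ham_words  # words that appeared only in spam

training_data = [
    ("claim your free money now", True),
    ("lunch meeting at noon", False),
]
learned = learn_spam_words(training_data)
print("money" in learned)  # True: the rule was learned, not hand-coded
```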

What Is Artificial Intelligence?

True artificial intelligence, or AI for short, is where a computer can be as smart as a human being, at least in the sense of acquiring knowledge both from being taught and from building on what it knows and making new connections.

True AI exists only in science fiction novels, of course. In practice, AI is used to refer to computer systems that are designed to learn and make connections.

How’s AI different from machine learning? In terms of RankBrain, it seems to us they’re fairly synonymous. You may hear them both used interchangeably, or you may hear machine learning used to describe the type of artificial intelligence approach being employed.

So RankBrain Is The New Way Google Ranks Search Results?

No. RankBrain is part of Google’s overall search “algorithm,” a computer program that’s used to sort through the billions of pages it knows about and find the ones deemed most relevant for particular queries.

What’s The Name Of Google’s Search Algorithm?

It’s called Hummingbird, as we reported in the past. For years, the overall algorithm didn’t have a formal name. But in the middle of 2013, Google overhauled that algorithm and gave it a name, Hummingbird.

So RankBrain Is Part Of Google’s Hummingbird Search Algorithm?

That’s our understanding. Hummingbird is the overall search algorithm, just like a car has an overall engine in it. The engine itself may be made up of various parts, such as an oil filter, a fuel pump, a radiator and so on. In the same way, Hummingbird encompasses various parts, with RankBrain being one of the newest.

In particular, we know RankBrain is part of the overall Hummingbird algorithm because the Bloomberg article makes clear that RankBrain doesn’t handle all searches, as only the overall algorithm would.

Hummingbird also contains other parts with names familiar to those in the SEO space, such as Panda, Penguin and Payday, designed to fight spam; Pigeon, designed to improve local results; Top Heavy, designed to demote ad-heavy pages; Mobile Friendly, designed to reward mobile-friendly pages; and Pirate, designed to fight copyright infringement.

I Thought The Google Algorithm Was Called “PageRank”

PageRank is part of the overall Hummingbird algorithm that covers a specific way of giving pages credit based on the links from other pages pointing at them.

PageRank is special because it’s the first name that Google ever gave to one of the parts of its ranking algorithm, way back at the time the search engine began in 1998.

What About These “Signals” That Google Uses For Ranking?

Signals are things Google uses to help determine how to rank Web pages. For example, it will read the words on a Web page, so words are a signal. If some words are in bold, that might be another signal that’s noted. The calculations used as part of PageRank give a page a PageRank score that’s used as a signal. If a page is noted as being mobile-friendly, that’s another signal that’s registered.

All these signals get processed by various parts within the Hummingbird algorithm to ultimately figure out which pages Google shows in response to various searches.
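As a purely illustrative sketch — not Google’s actual formula, and with invented signal names and weights — combining signals into a score might look something like this:

```python
# A toy illustration (not Google's actual formula): many signals about a
# page are combined into a single score used to order results.
SIGNAL_WEIGHTS = {          # invented weights, purely illustrative
    "words_match_query": 3.0,
    "pagerank": 2.0,
    "mobile_friendly": 0.5,
}

def score_page(signals: dict[str, float]) -> float:
    return sum(SIGNAL_WEIGHTS[name] * value
               for name, value in signals.items() if name in SIGNAL_WEIGHTS)

page = {"words_match_query": 0.8, "pagerank": 0.6, "mobile_friendly": 1.0}
print(round(score_page(page), 2))  # 4.1
```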

How Many Signals Are There?

Google has fairly consistently spoken of having more than 200 major ranking signals that, in turn, might have up to 10,000 variations or sub-signals. It more typically just says “hundreds” of factors, as it did in yesterday’s Bloomberg article.

Compiled lists of these factors are a pretty good guide, we think, to the general things that search engines like Google use to help rank Web pages.

And RankBrain Is The Third-Most Important Signal?

That’s right. From out of nowhere, this new system has become what Google says is the third-most important factor for ranking Web pages. From the Bloomberg article:

RankBrain is one of the “hundreds” of signals that go into an algorithm that determines what results appear on a Google search page and where they are ranked, Corrado said. In the few months it has been deployed, RankBrain has become the third-most important signal contributing to the result of a search query, he said.

What Are The First And Second-Most Important Signals?

Google won’t tell us what the first and second-most important signals are. We asked. Twice.

It’s annoying and arguably a bit misleading that Google won’t explain the top two. The Bloomberg article was no accident. Google wants some PR about what it considers to be its machine-learning breakthrough.

But to really assess that breakthrough, it’s helpful to know the other most important factors that Google uses now, as well as what was pushed down in importance by RankBrain. That’s why Google should explain these.

By the way, my personal guess is that links remain the most important signal, with Google counting up those links in the form of votes. It’s also a terribly aging system, as I’ve covered in my past article, Links: The Broken “Ballot Box” Used By Google & Bing.

As for the second-most important signal, I’d guess that would be “words,” where words would encompass everything from the words on the page to how Google’s interpreting the words people enter into the search box outside of RankBrain analysis.

What Exactly Does RankBrain Do?

From emailing with Google, I gather RankBrain is mainly used as a way to interpret the searches that people submit to find pages that might not have the exact words that were searched for.

Didn’t Google Already Have Ways To Find Pages Beyond The Exact Query Entered?

Yes, Google has found pages beyond the exact terms someone enters for a very long time. For example, years and years ago, if you’d entered something like “shoe,” Google might not have found pages that said “shoes,” because those are technically two different words. But “stemming” allowed Google to get smarter, to understand that shoes is a variation of shoe, just like “running” is a variation of “run.”

Google also got synonym smarts, so that if you searched for “sneakers,” it might understand that you also meant “running shoes.” It even gained some conceptual smarts, to understand that there are pages about “Apple” the technology company versus “apple” the fruit.
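Here is a simplified, hypothetical sketch of that pre-RankBrain kind of query expansion, with a crude stemmer and a tiny hand-built synonym list (both invented for illustration):

```python
# A simplified sketch (invented rules, illustrative entries only) of the
# kind of human-maintained query expansion described above.
SYNONYMS = {"sneakers": ["running shoes"]}
SUFFIXES = ("ing", "s")

def stem(word: str) -> str:
    # Crude stemming: strip a known suffix if enough of the word remains.
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def expand_query(query: str) -> set[str]:
    terms = set(query.lower().split())
    terms |= {stem(t) for t in terms}              # "shoes" -> "shoe"
    for term in list(terms):
        terms.update(SYNONYMS.get(term, []))       # "sneakers" -> "running shoes"
    return terms

print(expand_query("shoes sneakers"))
# e.g. {'shoes', 'shoe', 'sneakers', 'sneaker', 'running shoes'}
```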

What About The Knowledge Graph?

The Knowledge Graph, launched in 2012, was a way for Google to grow even smarter about the connections between words. More important, it learned how to search for “things not strings,” as Google has described it.

Strings means searching just for strings of letters, such as pages that match the spelling of “Obama.” Things means that instead, Google understands when someone searches for “Obama,” they probably mean US President Barack Obama, an actual person with connections to other people, places and things.

The Knowledge Graph is a database of facts about things in the world and the relationships between them. It’s why you can do a search like “when was the wife of Obama born” and get an answer about Michelle Obama, without ever using her name.
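As a toy illustration of “things, not strings,” here is a hypothetical sketch of a tiny fact graph and a two-hop lookup that answers that question without ever using Michelle Obama’s name (the structure is invented; only the facts are real):

```python
# A toy sketch of "things, not strings": a tiny fact graph and a two-hop
# lookup answering "when was the wife of Obama born" without her name.
FACTS = {
    ("Barack Obama", "spouse"): "Michelle Obama",
    ("Michelle Obama", "date_of_birth"): "January 17, 1964",
}

def answer_wife_birthdate(person: str) -> str | None:
    spouse = FACTS.get((person, "spouse"))
    if spouse is None:
        return None
    return FACTS.get((spouse, "date_of_birth"))

print(answer_wife_birthdate("Barack Obama"))  # January 17, 1964
```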

How’s RankBrain Helping Refine Queries?

The methods Google already uses to refine queries generally all flow back to some human being somewhere doing work, either having created stemming lists or synonym lists or making database connections between things. Sure, there’s some automation involved. But largely, it depends on human work.

The problem is that Google processes three billion searches per day. In 2007, Google said that 20 percent to 25 percent of those queries had never been seen before. In 2013, it lowered that figure to 15 percent, which was used again in yesterday’s Bloomberg article and which Google reconfirmed to us. But 15 percent of three billion is still a huge number of queries never entered by any human searcher — 450 million per day.

Among those can be complex, multi-word queries, also called “long-tail” queries. RankBrain is designed to help better interpret those queries and effectively translate them, behind the scenes in a way, to find the best pages for the searcher.

As Google told us, it can see patterns between seemingly unconnected complex searches to understand how they’re actually similar to each other. This learning, in turn, allows it to better understand future complex searches and whether they’re related to particular topics. Most important, from what Google told us, it can then associate these groups of searches with results that it thinks searchers will like the most.

Google didn’t provide examples of those groups of searches or give details on how RankBrain guesses at the best pages. But the latter presumably works because, if RankBrain can translate an ambiguous search into something more specific, it can then bring back better answers.

How About An Example?

While Google didn’t give groups of searches, the Bloomberg article did have a single example of a search where RankBrain is supposedly helping. Here it is:

What’s the title of the consumer at the highest level of a food chain

To a layperson like myself, “consumer” sounds like a reference to someone who buys something. However, it’s also a scientific term for something that consumes food. There are also levels of consumers in a food chain. That consumer at the highest level? The title — the name — is “predator.”

Entering that query into Google provides good answers, even though the query itself sounds pretty odd.

Now imagine that RankBrain is connecting that original long and complicated query to a much shorter one, such as “top level of the food chain,” which is probably searched far more often. It understands that the two are very similar. As a result, Google can leverage all it knows about getting answers for the more common query to help improve what it provides for the uncommon one.

Let me stress that I don’t know that RankBrain is connecting these two searches. I only know that Google gave the first example. This is simply an illustration of how RankBrain may be used to connect an uncommon search to a common one as a way of improving things.
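To illustrate the general idea — not Google’s actual method — here is a hypothetical sketch in which each query is represented as the average of toy word vectors and compared with cosine similarity. The vectors below are invented; a real system would learn them from vast amounts of text.

```python
# A hypothetical sketch of mapping an uncommon query to a better-understood
# one: average (toy) word vectors per query, then compare with cosine
# similarity. All vectors are invented for illustration.
import numpy as np

WORD_VECTORS = {                      # toy 3-dimensional embeddings
    "consumer": np.array([0.9, 0.1, 0.0]),
    "food":     np.array([0.8, 0.3, 0.1]),
    "chain":    np.array([0.7, 0.2, 0.2]),
    "top":      np.array([0.6, 0.4, 0.0]),
    "level":    np.array([0.6, 0.3, 0.1]),
    "weather":  np.array([0.0, 0.1, 0.9]),
}

def query_vector(query: str) -> np.ndarray:
    vectors = [WORD_VECTORS[w] for w in query.split() if w in WORD_VECTORS]
    return np.mean(vectors, axis=0)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

rare = query_vector("consumer top level food chain")
common = query_vector("top level food chain")
unrelated = query_vector("weather")
print(cosine(rare, common) > cosine(rare, unrelated))  # True
```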

Can Bing Do This, Too, With RankNet?

Back in 2005, Microsoft started using its own machine-learning system, called RankNet, as part of what became today’s Bing search engine. In fact, the chief researcher and creator of RankNet was recently honored. But over the years, Microsoft has barely talked about RankNet.

You can bet that will likely change. It’s also interesting that when I put the search above — the one given as an example of how great Google’s RankBrain is — into Bing, Bing gave me good results, including one listing that Google also returned.

One query doesn’t mean that Bing’s RankNet is as good as Google’s RankBrain or vice versa. Unfortunately, it’s really difficult to come up with a list to do this type of comparison.

Any More Examples?

Google did give us one fresh example: “How many tablespoons in a cup?” Google said that RankBrain favored different results in Australia versus the United States for that query because the measurements in each country are different, despite the similar names.

I tried to test this by searching at Google.com versus Google Australia. I didn’t see much difference, myself. Even without RankBrain, the results would often be different in this way just because of the “old fashioned” means of favoring pages from known Australian sites for those searchers using Google Australia.

Does RankBrain Really Help?

Despite my two examples above being less than compelling as testimony to the greatness of RankBrain, I really do believe that it probably is making a big impact, as Google is claiming. The company is fairly conservative with what goes into its ranking algorithm. It does small tests all the time. But it only launches big changes when it has a great degree of confidence.

Integrating RankBrain, to the degree that it’s supposedly the third-most important signal, is a huge change. It’s not one that I think Google would do unless it really believed it was helping.

When Did RankBrain Start?

Google told us that there was a gradual rollout of RankBrain in early 2015 and that it’s been fully live and global for a few months now.

What Queries Are Impacted?

Google told Bloomberg that a “very large fraction” of queries are being processed by RankBrain. We asked for a more specific figure but were given only the same “very large fraction” statement.

Is RankBrain Always Learning?

All learning that RankBrain does is offline, Google told us. It’s given batches of historical searches and learns to make predictions from these.

Those predictions are tested and if proven good, then the latest version of RankBrain goes live. Then the learn-offline-and-test cycle is repeated.
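Here is a hedged, purely illustrative sketch of that learn-offline-and-test cycle; the “model” and the “evaluation” below are stand-ins invented for the example.

```python
# An invented, illustrative version of the offline cycle: train on a batch
# of historical searches, evaluate the result, and only promote the new
# model if it scores better than the live one.

def train(historical_searches: list[str]) -> dict:
    # Stand-in for offline learning: the "model" is just a vocabulary count.
    model = {}
    for query in historical_searches:
        for word in query.split():
            model[word] = model.get(word, 0) + 1
    return model

def evaluate(model: dict) -> float:
    # Stand-in for quality testing: richer vocabulary scores higher.
    return float(len(model))

live_model = train(["shoes", "running shoes"])
candidate = train(["shoes", "running shoes", "top level of the food chain"])

if evaluate(candidate) > evaluate(live_model):
    live_model = candidate        # the tested version "goes live"
print(len(live_model))            # 8 distinct words in the promoted model
```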

Does RankBrain Do More Than Query Refinement?

Typically, how a query is refined — be it through stemming, synonyms or now RankBrain — has not been considered a ranking factor or signal.

Signals are typically factors that are tied to content, such as the words on a page, the links pointing at a page, whether a page is on a secure server and so on. They can also be tied to a user, such as where a searcher is located or their search and browsing history.

So when Google talks about RankBrain as the third-most important signal, does it really mean as a ranking signal? Yes. Google reconfirmed to us that there is a component where RankBrain is directly contributing somehow to whether a page ranks.

How exactly? Is there some type of “RankBrain score” that might assess quality? Perhaps, but it seems much more likely that RankBrain is somehow helping Google better classify pages based on the content they contain. RankBrain might be able to better summarize what a page is about than Google’s existing systems have done.

Or not. Google isn’t saying anything other than there’s a ranking component involved.

How Do I Learn More About RankBrain?

Google told us people who want to learn about word “vectors” — the way words and phrases can be mathematically connected — should check out this blog post, which talks about how the system (which wasn’t named RankBrain in the post) learned the concept of capital cities of countries just by scanning news articles.

There’s a longer research paper this is based on here. You can even play with your own machine learning project using Google’s word2vec tool. In addition, Google has an entire area with its AI and machine learning papers, as does Microsoft.
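If you want to experiment with word vectors yourself, here is a minimal sketch of the kind of vector arithmetic the capital-cities example relies on, using the open-source gensim library and small pretrained GloVe vectors (assuming gensim and its downloader are available; this is not Google’s internal system):

```python
# A minimal sketch of word-vector arithmetic like the capital-city example,
# using gensim's downloader and small pretrained GloVe vectors.
import gensim.downloader as api

vectors = api.load("glove-wiki-gigaword-50")   # ~66 MB download

# vector("paris") - vector("france") + vector("germany") ≈ vector("berlin")
result = vectors.most_similar(positive=["paris", "germany"],
                              negative=["france"], topn=1)
print(result)  # typically [('berlin', ...)]
```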

RankBrain Unleashed, by Gianluca Fiorelli at Moz

Introduction

Whenever Google announces something as important as a new algorithm, I always try to hold off on writing about it immediately, to let the dust settle, digest the news and the posts that talk about it, investigate, and then, finally, draw conclusions.

I did so in the case of Hummingbird. I do it now for RankBrain.

In the case of RankBrain, this is even more correct, because — let’s be honest — we know next to nothing about how RankBrain works. The only things that Google has said publicly are in the video Bloomberg published and the few things unnamed Googlers told Danny Sullivan for his article, FAQ: All About The New Google RankBrain Algorithm.

Dissecting the sources

As I said before, the only direct source we have is the video interview published on Bloomberg.

So, let’s dissect what Jack Clark, the Bloomberg reporter, said in that video and what Greg Corrado — senior research scientist at Google and one of the founding members and co-technical lead of Google’s large-scale deep neural networks project — said to Clark.

RankBrain is already worldwide.

I wanted to say this first: If you’re wondering whether or not RankBrain is already affecting the SERPs in your country, now you know — it is.

RankBrain is Artificial Intelligence.

Does this mean that RankBrain is our first evidence of Google as the Star Trek computer? No, it does not.

It’s true that many Googlers — like Peter Norvig, Corinna Cortes, Mehryar Mohri, Yoram Singer, Thomas Dean, Jeff Dean and many others — have been investigating and working on machine/deep learning and AI for a number of years (since 2001, as you can see when scrolling down this page). It’s equally true that much of Google’s work on language, speech, translation, and visual processing relies on machine learning and AI. However, we should consider the topic of ANI (Artificial Narrow Intelligence), which Tim Urban of Wait But Why describes as: “Machine intelligence that equals or exceeds human intelligence or efficiency at a specific thing.”

Considering how Google is still buggy, we could have some fun and call it HANI (Hopefully Artificial Narrow Intelligence).

All jokes aside, Google clearly intends for its search engine to be an ANI in the (near) future.

RankBrain is a learning system.

With the term “learning system,” Greg Corrado surely means “machine learning system.”

Machine learning is not new to Google. We SEOs discovered how Google uses machine learning when Panda rolled out in 2011.

Panda, in fact, is a machine learning-based algorithm able to learn through iterations what a “quality website” is — or isn’t.

In order to train itself, it needs a dataset and yes/no factors. The result is an algorithm that is eventually able to achieve its objective.

Iterations, then, are meant to provide the machine with a constant learning process, in order to refine and optimize the algorithm.
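As a hedged illustration of that kind of iterative training — not Panda itself, just the general technique, with invented factors and labels — consider:

```python
# A hedged illustration of training a quality classifier from a dataset
# with yes/no labels, then refining it with more labeled examples.
# This is not Panda itself, only the general technique.
from sklearn.linear_model import LogisticRegression

# Invented factors: [ads_per_page, words_per_page / 1000]
features = [[8, 0.2], [1, 1.5], [9, 0.1], [2, 2.0]]
labels   = [0, 1, 0, 1]          # 0 = low quality, 1 = quality site

model = LogisticRegression().fit(features, labels)

# A later iteration: add newly labeled examples and retrain.
features += [[7, 0.3], [1, 1.8]]
labels   += [0, 1]
model = LogisticRegression().fit(features, labels)

print(model.predict([[6, 0.4], [1, 1.2]]))  # likely [0 1]
```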

Hundreds of people are working on it, and on building computers that can think by themselves.

Uhhhh… (Sorry, I couldn’t resist.)

RankBrain is a machine learning system, but — from what Greg Corrado said in the video — we can infer that in the future, it will probably be a deep learning one.

We do not know when this transition will happen (if ever), but assuming it does, then RankBrain won’t need any input — it will only need a dataset, over which it will apply its learning process in order to generate and then refine its algorithm.

Rand Fishkin has visualized, in a very simple but correct way, what a deep learning process is.

Remember — and I repeat this so there’s no misunderstanding — RankBrain is not (yet) a deep learning system, because it still needs inputs in order to work. So… how does it work?

It interprets language and queries.

Paraphrasing the Bloomberg interview, Greg Corrado gave this information about how RankBrain works: it kicks in when people make ambiguous searches or use colloquial terms, trying to solve a classic breakdown for computers, which don’t understand those queries or have never seen them before.

We can consider RankBrain to be the first 100% post-Hummingbird algorithm developed by Google.

Even if we had some new algorithms rolling out after the Hummingbird release (e.g. Quality Update), those were based on pre-Hummingbird algos and/or were serving a very different phase of search (the Filter/Clustering and Ranking ones, specifically).

RankBrain seems to be a needed “patch” to the general Hummingbird update. In fact, we should remember that Hummingbird itself was meant to help Google understand “verbose queries.”

However, as Danny Sullivan wrote in the above-mentioned FAQ article at Search Engine Land, RankBrain is not a sort of Hummingbird v.2, but rather a new algorithm that “optimizes” Hummingbird’s work.

Based on Greg Corrado’s words, we can say with a high degree of confidence that RankBrain acts in between the “Understanding” and the “Retrieving” phases of the overall search process.

Evidently, the too-ambiguous queries and the ones based on colloquialisms were too hard for Hummingbird to understand — so much so, in fact, that Google needed to create RankBrain.

RankBrain, like Hummingbird, generalizes and rewrites those kinds of queries, trying to match the intent behind them.

In order to understand a never-before-seen or unclear query, RankBrain uses vectors, which are — to quote the Bloomberg article — “vast amounts of written language embedded into mathematical entities,” and it tries to see if those vectors may have a meaning in relation to the query it’s trying to answer.

Upon discovering web documents that may answer the query, RankBrain retrieves them and lets them proceed, following the steps of the search phase until those documents are presented in a visible SERP.

It is within this context that we must accept the definition of RankBrain as a “ranking factor,” because in regards to the specific set of queries treated by RankBrain, this is substantially the truth.

In other words, the more RankBrain considers a web document to be a potentially correct answer to an unknown or not understandable query, the higher that document will rank in the corresponding SERP — while still taking into account the other applicable ranking factors.

Of course, it will be the choice of the searcher that ultimately informs Google as to what the answer to that unclear or unknown query is.

As a final note, necessary in order to head off the claims I saw when Hummingbird rolled out: No, your site did not lose visibility because of a mysterious RankBrain penalty.

Dismantling the RankBrain gears

Kristine Schachinger, a wonderful SEO geek whom I hold in deep esteem, relates RankBrain to Knowledge Graph and Entity Search in this article on Search Engine Land. However — while I’m in agreement that RankBrain is a patch of Hummingbird and that Hummingbird is not yet the “semantic search” Google announced — our opinions do differ on a few points.

I do not consider Hummingbird and Knowledge Graph to be the same thing. They surely share the same mission (moving from strings to things), and Hummingbird uses some of the technology behind Knowledge Graph, but still — they are two separate things.

This is, IMHO, a common misunderstanding SEOs have. So much so, in fact, that I even tend to not consider the Featured Snippets (aka the answers boxes) part of Knowledge Graph itself, as is commonly believed.

Therefore, if Hummingbird is not the same as Knowledge Graph, then we should think of entities not only as named entities (people, concepts like “love,” planets, landmarks, brands), but also as search entities, which are quite different altogether.

Search entities, as described by Bill Slawski, are as follows:

  • A query a searcher submits
  • Documents responsive to the query
  • The search session during which the searcher submits the query
  • The time at which the query is submitted
  • Advertisements presented in response to the query
  • Anchor text in a link in a document
  • The domain associated with a document

The relationships between these search entities can create a “probability score,” which may determine if a web document is shown in a determined SERP or not.

We cannot exclude the possibility that RankBrain utilizes search entities in order to find the most probable and correct answers to a never-before-seen query, then uses the probability score as a qualitative metric in order to offer reasonable, substantive SERPs to the querying user.
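To make that speculation concrete, here is an invented sketch (not from any Google source) of how relationships between search entities might be turned into a probability-like score:

```python
# A speculative, invented sketch: score how often a document was chosen in
# past sessions for a given query, as a stand-in for a "probability score"
# derived from relationships between search entities.
from collections import Counter

# (query, clicked_document) pairs observed in past search sessions
SESSION_LOG = [
    ("top level of the food chain", "biology-site.example/food-chains"),
    ("top level of the food chain", "biology-site.example/food-chains"),
    ("highest consumer in a food chain", "biology-site.example/food-chains"),
    ("highest consumer in a food chain", "dictionary.example/consumer"),
]

def probability_score(query: str, document: str) -> float:
    clicks = Counter(doc for q, doc in SESSION_LOG if q == query)
    total = sum(clicks.values())
    return clicks[document] / total if total else 0.0

print(probability_score("highest consumer in a food chain",
                        "biology-site.example/food-chains"))  # 0.5
```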

The biggest advancement with RankBrain, though, is in how it deals with the quantity of content it analyzes in order to create the vectors. It seems bigger than the classic “link anchor text and surrounding text” that we always considered when discussing, for instance, how the Link Graph works.

There is a patent filed by Google that cites one of the AI experts cited by Greg Corrado — Thomas Strohmann — as an author.

That patent, very well explained (again) by Bill Slawski in this post on Gofishdigital.com, describes a process through which Google can discover potential meanings for non-understandable queries.

The patent attributes huge importance to context and “concepts,” which fits with the fact that RankBrain uses vectors (again, “vast amounts of written language embedded into mathematical entities”). Those vectors are likely needed to secure a higher probability of understanding context and detecting already-known concepts, thus giving a higher probability of positively matching the unknown concepts RankBrain is trying to understand in the query.

Speculating about RankBrain

As the section title says, I now enter the most speculative part of this post.

What I wrote before, though it may also be considered speculation, has the distinct possibility of being true. What I am going to write now may or may not be true, so please, take it with a grain of salt.

DeepMind and Google Search

In 2014, Google acquired DeepMind, a company specializing in learning systems. I cannot help but think that some of its technology, and the evolutions of that technology, is being used by Google to improve its search algorithm — hence the machine learning process of RankBrain.

This article, published last June on technologyreview.com, explains in detail how the lack of a correctly formatted database is the biggest obstacle to a correct machine and deep learning process. Without one, the neural computing behind machine and deep learning cannot work.

In the case of language, then, having “vast amounts of written language” is not enough if there’s no context, especially if the text is not annotated (with n-grams, for instance) in a way the machine can understand.

However, Karl Moritz Hermann and some of his DeepMind colleagues described in this paper how they were able to discover the kind of annotations they were looking for in classic “news highlights,” which are independent from the main news body.

Allow me to quote the Technology Review article in explaining their experiment:

Hermann and co anonymize the dataset by replacing the actors in sentences with a generic description. An example of some original text from the Daily Mail is this: “The BBC producer allegedly struck by Jeremy Clarkson will not press charges against the “Top Gear” host, his lawyer said Friday. Clarkson, who hosted one of the most-watched television shows in the world, was dropped by the BBC Wednesday after an internal investigation by the British broadcaster found he had subjected producer Oisin Tymon “to an unprovoked physical and verbal attack.”

An anonymized version of this text would be the following:

The ent381 producer allegedly struck by ent212 will not press charges against the “ent153” host, his lawyer said friday. ent212, who hosted one of the most – watched television shows in the world, was dropped by the ent381 wednesday after an internal investigation by the ent180 broadcaster found he had subjected producer ent193 “to an unprovoked physical and verbal attack.”

In this way it is possible to convert the following Cloze-type query to identify X from “Producer X will not press charges against Jeremy Clarkson, his lawyer says” to “Producer X will not press charges against ent212, his lawyer says.”

And the required answer changes from “Oisin Tymon” to “ent212.”

In that way, the anonymized actor can be identified only with some kind of understanding of the grammatical links and causal relationships between the entities in the story.
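Here is a rough sketch of that anonymization step; the entity list and token numbers are invented for illustration.

```python
# A rough sketch of the anonymization step described above: replace named
# entities with neutral tokens so the model must rely on the structure of
# the text rather than memorized names. Entity list and token numbers are
# invented for illustration.
ENTITIES = ["Jeremy Clarkson", "Oisin Tymon", "BBC", "Top Gear"]

def anonymize(text: str) -> tuple[str, dict[str, str]]:
    mapping = {}
    for i, name in enumerate(ENTITIES):
        token = f"ent{100 + i}"
        if name in text:
            text = text.replace(name, token)
            mapping[name] = token
    return text, mapping

sentence = "Producer Oisin Tymon will not press charges against Jeremy Clarkson."
anonymized, mapping = anonymize(sentence)
print(anonymized)
# Producer ent101 will not press charges against ent100.

# The Cloze-style question then asks for a token, not a memorized name:
question = anonymized.replace(mapping["Oisin Tymon"], "X", 1)
print(question)  # Producer X will not press charges against ent100.
```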

Using the Daily Mail, Hermann was able to provide a large, useful dataset to the DeepMind deep learning machine, and thus train it. After the training, the computer was able to correctly answer up to 60% of the questions asked.

Not a great percentage, we might be thinking. Besides, not all documents on the web are presented with the kind of highlights the Daily Mail or CNN sites have.

However, let me speculate: What are the search index and the Knowledge Graph if not a giant, annotated database? Would it be possible for Google to train its neural machine learning computing systems using the same technology DeepMind used with the Daily Mail-based database?

And what if Google were experimenting and using the Quantum Computer it shares with NASA and USRA for these kinds of machine learning tasks?

Or… What if Google were using all the computers in all of its data centers as one unique neural computing system?

I know, science fiction, but…

Ray Kurzweil’s vision

Ray Kurzweil is usually known for the “futurist” facets of his credentials. It’s easy for us to forget that he’s been working at Google since 2012, personally hired by Larry Page “to bring natural language understanding to Google.” Natural language understanding is essential both for RankBrain and for Hummingbird to work properly.

In an interview with The Guardian last year, Ray Kurzweil said:

When you write an article you’re not creating an interesting collection of words. You have something to say and Google is devoted to intelligently organising and processing the world’s information. The message in your article is information, and the computers are not picking up on that. So we would like to actually have the computers read. We want them to read everything on the web and every page of every book, then be able to engage an intelligent dialogue with the user to be able to answer their questions.

The DeepMind technology I cited above seems to be going in that direction, even though it’s still a non-mature technology.

The biggest problem, though, is not really being able to read billions of documents, because Google is already doing that (go read the EULA of Gmail, for instance). The biggest problem is understanding the implicit meaning within the words, so that Google may properly answer users’ questions, or even anticipate the answers before the questions are asked.

We know that Google is hard at work on this, because Kurzweil told us as much in the same interview:

“We are going to actually encode that, really try to teach it to understand the meaning of what these documents are saying.”

The vectors used by RankBrain may be our first glimpse of the technology Google will end up using for understanding all context, which is fundamental for giving a meaning to language.

How can we optimize for RankBrain?

I’m sure you’re asking this question.

My answer? This is a useless question, because RankBrain targets non-understandable queries and those using colloquialisms. Therefore, just as it’s not very useful to create specific pages for every single long-tail keyword, it’s even less useful to try targeting the queries RankBrain targets.

What we should do is insist on optimizing our content using semantic SEO practices, in order to help Google understand the context of our content and the meaning behind the concepts and entities we are writing about.

What we should do is consider the factors of personalized search as priorities, because search entities are strictly related to personalization. Branding, under this perspective, surely is a strategy that may have positive correlation to RankBrain and Hummingbird as they interpret and classify web documents and their content.

RankBrain, then, may not mean that much for our daily SEO activities, but it is offering us a glimpse of the future to come.

The information above is drawn from Moz and Search Engine Land.
