Classifying Search Results – Google Patents SEO

In this series on learning SEO with Google patents, I will cover some of what I have learned reading the Google patent titled “Classifying Search Results” by Tania Bedrax-Weiss et al. This patent is an update to a previous patent named “Classifying search results to determine page elements”.

Components of the architecture used to classify search results

What is the Patent About?

The “Classifying Search Results” patent is about classifying search results and determining which page elements to display in response to a search query.

It classifies both the search query and the search results in order to generate page elements that correspond to the user’s search intent.



Highlights From Classifying search results

This patent explains how Google checks and scores search result categories and shows page elements (e.g. Knowledge box or Knowledge panel) related to the highest-scoring categories.

What is a Page Element?

A page element is a portion of an HTML document including some content. A page element can be anything such as text, image, multimedia content, or programming elements such as scripts (source).

Source: “Controlling browser extensions”

A page element can be visible (e.g. an image) or invisible (e.g. a cookie).

Examples of page elements in search results are a knowledge box, a knowledge panel, an individual search result, or even a portion of a search result (e.g. title or structured snippet).

In this patent, page elements are drawn from news, images, documents, maps and facts. It also mentions the Knowledge panel, individual search results, the search bar and the navigation menu as page elements. They seem to be what we call SERP features.

How Google Decides Which Page Elements to Show

When Google receives a search query, it classifies the query, returns search results and uses the page generator component to classify search results. Finally, it decides which page elements to show or not to show in search results.

Components used by Google to classify search results.
Interactions of components showing how Google classifies search results.

Here is an overview of the steps used by Google to classify search queries and search results and decide which page elements to show in a search result page (see image below). Each step will be described in greater detail later on.

  1. The user sends a search query through their browser
  2. The query classifier generates a query classification (query category / score)
  3. The search engine generates initial search results
  4. The page generator receives the search results
  5. Different search classifiers analyze portions of the search results and return a classification response
  6. The page element trigger manager determines the page elements to be shown based on the classification response
  7. The page element trigger manager generates the page and returns it to the user

Displaying Pages by Classifying Search Results
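
The seven steps above can be read as a simple pipeline. The Python sketch below is only my own illustration of that flow: every name in it (handle_query, classify_query, search, pick_elements, render) is hypothetical and stands in for a component that the patent describes at a high level.

```python
def handle_query(query, classify_query, search, classifiers, pick_elements, render):
    """Wire the steps together; each argument is a stand-in callable."""
    query_classification = classify_query(query)                # step 2
    results = search(query)                                     # step 3
    # Steps 4-5: the page generator hands the results to each search classifier.
    responses = [classifier(results) for classifier in classifiers]
    elements = pick_elements(query_classification, responses)   # step 6
    return render(results, elements)                            # step 7


# Toy stand-ins, invented only to show how data moves between steps.
page = handle_query(
    "burns treatment",
    classify_query=lambda q: ("health", 0.9),
    search=lambda q: [{"position": 1, "title": "Burns: first aid"}],
    classifiers=[lambda results: [("health", 0.8)]],
    pick_elements=lambda qc, responses: ["knowledge panel"] if qc[0] == "health" else [],
    render=lambda results, elements: {"results": results, "elements": elements},
)
```

Passing the components in as arguments keeps the sketch honest: it shows the order of the steps without pretending to know how each one is implemented.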

1. User Sends a Search Query

A search query is composed of one or more query terms.

A search query is data that a user submits to a search engine to satisfy the user’s informational needs. The search queries are usually in the form of text, e.g., one or more query terms, and may include transcriptions of spoken search queries.

Source: “Image Analysis for Results of Textual Image Queries”

I would expand by saying that a search query can also be a combination of data received from the user and from the input documents related to the query (see Google’s Document Locator patent).

A query, in Google’s eyes, can mean the user’s search query, but can also mean the expanded query, such as in the image below.

example expanded query
Source: Paul Haahr

Anyway, the user sends the query to the server, and the queries are labelled by the query classifier.

2. Query Classifier Classifies the Queries

Once the query is received, the query classifier will label the queries using a machine learning model.

What is Google’s Query Classifier?

Google’s query classifier, also known as the query classification system, is the component that classifies search queries based on query features (e.g. characteristics of a query) and a query classification model.

The query classification model is a machine learning model used to classify search queries and trained on query data and search log data (e.g. labelled queries).

The query classifier works by training a query classification model on a set of labeled queries and then using that model on unlabeled queries to assign a label to each one of them.
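
To make the train-then-label idea concrete, here is a deliberately tiny Python sketch. It is not Google's model: the term-counting approach, the function names (train, classify) and the sample queries are all my own assumptions, chosen only to show a model learned from labeled queries being applied to an unlabeled one.

```python
from collections import Counter, defaultdict


def train(labeled_queries):
    """Count how often each term appears under each label."""
    term_counts = defaultdict(Counter)
    for query, label in labeled_queries:
        for term in query.lower().split():
            term_counts[label][term] += 1
    return term_counts


def classify(model, query):
    """Return (label, score) for the label whose training terms overlap most."""
    terms = query.lower().split()
    if not terms or not model:
        return None, 0.0
    scores = {
        label: sum(counts[term] for term in terms) / len(terms)
        for label, counts in model.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical labeled queries (the kind of data search logs could provide).
labeled = [
    ("buy running shoes", "products"),
    ("cheap shoes online", "products"),
    ("flu symptoms", "health"),
    ("treat burns at home", "health"),
]
model = train(labeled)
label, score = classify(model, "burns symptoms")
```

A real query classification model would use far richer query features, but the output has the same shape as the one described in the patent: a category and a score.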

Query strings can be expanded too.

Example process of labelling queries using a query classification model

More information about Google’s query classifier can be found in the patents named “Query classification” and “Query classification based on search engine results”.

Once queries are classified and labeled, the server sends them to the search engine.

3. Search Engine Processes the Queries and Returns Search Results

The query is sent to the information retrieval system where a document processing system and a query processing system will work together to return search results for the query.

The document processing system prepares the inverted index from documents, pages or sites retrieved from the document corpus. It processes the contents using an automated or manual process.

The query processing system processes queries, generates relevancy scores and also generates snippets.

In query processing, a query string is tokenized and parsed by a query parser into query terms (i.e. each distinct term = token). The tokenized query terms are translated by the global-lexicon and can be expanded.
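
As a rough illustration of that tokenize, translate and expand sequence, here is a small Python sketch. The lexicon, the expansion table and the process_query function are hypothetical placeholders of mine; the patent does not publish the real data or interfaces.

```python
# Hypothetical global lexicon: term -> token ID.
LEXICON = {"burn": 101, "burns": 102, "treatment": 205}

# Hypothetical expansion table: term -> related terms to add to the query.
EXPANSIONS = {"burns": ["burn"]}


def process_query(query):
    """Tokenize a query string, expand it, and translate terms to token IDs."""
    # Tokenize and keep each distinct term once, in order of appearance.
    terms = []
    for term in query.lower().split():
        if term not in terms:
            terms.append(term)

    # Expand the query with related terms.
    expanded = list(terms)
    for term in terms:
        for extra in EXPANSIONS.get(term, []):
            if extra not in expanded:
                expanded.append(extra)

    # Translate terms known to the lexicon into token IDs.
    token_ids = [LEXICON[term] for term in expanded if term in LEXICON]
    return terms, expanded, token_ids


terms, expanded, token_ids = process_query("Burns treatment burns")
```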

More information about the document and query processing systems can be found in the patent named “Multi-stage query processing system and method for use with tokenspace repository”.

The search engine can then return search results, including the position, title, snippet, URL and labels of each result, and send them to the Google Page Generator.

Example search result returned by Google Search Engine

4. Page Generator Determines Which Page Element to Display

The page generator works by receiving search results from the search engine and processing search results through various search classification systems (e.g. search classifiers) and then identifying page elements to be displayed to the user.

What is the Google Page Generator?

The Google Page Generator is the component that determines which page elements to display and generates the page containing the selected elements. The page generator uses the Page Element Trigger Manager and multiple search classifiers in its operations.

Example of the components of Google's page generator

By using search results to determine page elements to be displayed, the page generator creates a page that represents the search engine’s predominant interpretation of the user’s query (e.g. search intent).

5. Search Classifiers Return a Classification Response

The Google page generator classifies search results using multiple search classifiers and returns a classification response.

What are Google Search Classifiers?

Google search classifiers are components of the page generator that generate a classification based on search results. According to the “Classifying Search Results” patent, there are five search result classifiers:

  • URL Classifier
  • Label Classifier
  • Label Histogram Classifier
  • Snippet Classifier
  • Title Classifier

What is a Classification Response?

A classification response is an HTTP response that contains a classification set and the number of documents analyzed in the classification. The classification set is a set of individual classifications where each classification contains a category and a score. Here is an example image of a classification response returned by search classifiers.

Example of classification response data structure at Google
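
To picture the structure, here is how I would model a classification response in Python. The class and field names (Classification, ClassificationResponse, classification_set, num_documents) are my own choices based on the description above, not identifiers from the patent, and the top helper is an illustrative extra.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Classification:
    category: str   # e.g. "health/health_conditions"
    score: float


@dataclass
class ClassificationResponse:
    classification_set: List[Classification]
    num_documents: int   # how many search results were analyzed

    def top(self):
        """Return the highest-scoring classification, or None if the set is empty."""
        return max(self.classification_set, key=lambda c: c.score, default=None)


response = ClassificationResponse(
    classification_set=[
        Classification("health/health_conditions", 0.8),
        Classification("news", 0.2),
    ],
    num_documents=10,
)
```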

What is a Category?

A category, also known in the patent as a result category, is a content category that classifies a search result. It probably does so in a similar way to the categories returned by the classifyText method of the Natural Language API.

See a full list of Google’s content categories.

Categories may be structured as a tree that contains a root category. Each root category may have child categories and child categories may be parents of further child categories.

Example of category hierarchies
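
A category tree like this is easy to sketch in Python. The Category class below is a hypothetical illustration of mine; only the health/health_conditions path comes from the examples in this post, and the "burns" leaf is invented to show a deeper level.

```python
class Category:
    """A node in a category tree; a node without a parent is a root category."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)

    def path(self):
        """Build a slash-separated path from the root down to this category."""
        parts = []
        node = self
        while node is not None:
            parts.append(node.name)
            node = node.parent
        return "/".join(reversed(parts))


health = Category("health")                                  # root category
conditions = Category("health_conditions", parent=health)    # child category
burns = Category("burns", parent=conditions)                 # invented grandchild
```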

How Do Google Search Classifiers Work?

Google search classifiers generate a classification based on a portion of the search results generated by the search engine.

They analyze portions of search results and then return a classification response from the analysis. Each classifier analyses a different portion of a search result (e.g. position, title, URL, snippet, label or label histogram).

Search Results Data Structure

Here is an example image of a search results data structure where each search result has a position, title, URL, snippet and optional labels. Each item will be analyzed by a different classifier.

Example of search results portions at Google
Example of search results data structure at Google
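
In code, that data structure could look something like the Python sketch below. The class names, field names and the portion helper are assumptions of mine that mirror the description (position, title, URL, snippet, optional labels, plus a label histogram for the whole result set). The sample results are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SearchResult:
    position: int
    title: str
    url: str
    snippet: str
    labels: List[str] = field(default_factory=list)   # labels are optional


@dataclass
class SearchResults:
    result_set: List[SearchResult]
    label_histogram: Dict[str, int]   # label -> frequency across the results


def portion(results, name):
    """Collect one field (e.g. "title") from every result, in ranked order."""
    ranked = sorted(results.result_set, key=lambda r: r.position)
    return [getattr(r, name) for r in ranked]


results = SearchResults(
    result_set=[
        SearchResult(2, "Burn injuries explained", "https://example.org/burn",
                     "A burn is an injury to the skin...", ["Disease"]),
        SearchResult(1, "Burns: first aid", "https://example.com/burns",
                     "How to treat minor burns..."),
    ],
    label_histogram={"Disease": 1},
)
```

With this shape, the title classifier would read portion(results, "title"), the URL classifier portion(results, "url"), and so on.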

Example Flowchart for Determining Classifications

This flowchart shows how Google performs classification depending on the search classifier.

How the Google Search Classifier determines a classification response

How URL, Title and Snippet Classifiers Work

Google uses the same logic flow for the URL, the title and the snippets classification systems.

Determining URL, Title and Snippet classification starts by finding categories and weights for each search result. Weights are scaled and aggregated by category.

How the Google Search Classifier Determines URL, Title or Snippet Classification
How Google Determines URL, Title or Snippet Classification (fig.6)

Step 602 changes based on the classifier (URL, title or snippet).

Determining URL Classification for a Search Result

The URL classifier tries to determine categories and scores for a URL in order to produce an output similar to the one below.

How URL classifier determines URL classification

To determine categories and weights for a specific search result for a given URL, Google assigns categories and weights for the URL, and for the hostname and then calculates a weighted average.

  • URL: https://example.com/path
  • Hostname: example.com

How Google Determines URL category and Weights when classifying based on the URL (fig.7)

The weighted average is calculated by multiplying the hostname and full URL weights for each category and then averaging the resulting weights across all categories.

w = hostname_weight * full_url_weight
w_avg = avg(cat1_weight, cat2_weight, ..., catn_weight)
How Google calculates the weighted average of the full URL and the hostname
How URL classification weights are assigned
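
Here is my reading of the formula above as a short Python sketch. The function names and the sample weights are hypothetical, and restricting the product to categories present for both the hostname and the full URL is my own assumption.

```python
def url_category_weights(hostname_weights, full_url_weights):
    """Multiply hostname and full-URL weights for categories found in both."""
    return {
        category: hostname_weights[category] * full_url_weights[category]
        for category in hostname_weights
        if category in full_url_weights
    }


def average_weight(weights):
    """Average the per-category weights; 0.0 if there are no categories."""
    return sum(weights.values()) / len(weights) if weights else 0.0


# Invented weights for https://example.com/path and its hostname example.com.
hostname_weights = {"health": 0.8, "news": 0.5}
full_url_weights = {"health": 0.5, "news": 0.2, "sports": 0.9}

weights = url_category_weights(hostname_weights, full_url_weights)
w_avg = average_weight(weights)
```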

Handling Position Bias

The page element trigger manager receives the weights and then takes care of the position bias. Position bias is the tendency of users to click on the top-ranked search results.

Example position bias

In this case, however, position bias is used to infer that top-ranked search results are more relevant to the user, and thus that their categories should weigh more than those of pages ranked at the bottom of the page.

To account for position bias, the page element trigger manager applies a position bias function to the category weights.

Scaling weights according to position bias

Finally, results are aggregated to determine the score for each category.
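
The scaling and aggregation can be sketched in a few lines of Python. The patent summary here does not give the actual position bias function, so the logarithmic decay below is purely an assumption of mine, as are the function names and sample weights.

```python
import math
from collections import defaultdict


def position_bias(position):
    """Hypothetical decay: position 1 -> 1.0, lower-ranked results count less."""
    return 1.0 / math.log2(position + 1)


def aggregate_scores(results):
    """Scale each result's category weights by its position, then sum by category.

    `results` is a list of (position, {category: weight}) pairs.
    """
    totals = defaultdict(float)
    for position, weights in results:
        scale = position_bias(position)
        for category, weight in weights.items():
            totals[category] += weight * scale
    return dict(totals)


scores = aggregate_scores([
    (1, {"health": 0.9}),
    (3, {"health": 0.4, "news": 0.6}),
])
```

Any decreasing function of the position would express the same idea: categories found in top results count for more than the same categories found further down the page.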

Determining Title and Snippet Classification for a Search Result

The Title and Snippet classifiers determine categories and weights for the corresponding text contents by splitting the titles and snippets into sequences of words (e.g. n-grams). Then, they identify a category for each sequence.

Below is a table representation of the output from the title and snippets classifiers where categories and weights have been assigned to each sequence of words for the following title: “National Institute of Health”.

How Google Determines Categories and weights for title and snippet classifiers

How the Weights are Assigned

Weights are assigned so that the longer the matching sequence is, the greater the weight. This is because matching longer sequences is less ambiguous and provides greater confidence that the category is the right one for the sequence.

Determine URL  title or snippet category and Weights
How Google Determines URL category and Weights when classifying based on the title or snippet
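
To see how longer matching sequences can earn more weight, here is a minimal Python sketch. The n-gram lookup table, the categories in it and the choice to use the n-gram length as the weight are all hypothetical; the patent only says that longer matches weigh more.

```python
from collections import defaultdict

# Hypothetical lookup: word sequence -> category.
NGRAM_CATEGORIES = {
    ("national", "institute", "of", "health"): "health",
    ("national", "institute"): "government",
    ("health",): "health",
}


def ngrams(words, max_n):
    """Yield every word sequence of length 1..max_n."""
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            yield tuple(words[i:i + n])


def classify_text(text, max_n=4):
    """Sum a weight per category, where longer matching sequences weigh more."""
    words = text.lower().split()
    weights = defaultdict(float)
    for gram in ngrams(words, max_n):
        category = NGRAM_CATEGORIES.get(gram)
        if category is not None:
            weights[category] += len(gram)   # assumed: weight = sequence length
    return dict(weights)


title_weights = classify_text("National Institute of Health")
```

In this toy run, the full four-word match pushes "health" well ahead of "government", which only matches on a two-word sequence.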

Determining Label Classification for a Search Result

Classifying a search result based on the label from the search results

To determine label categories and weights for a specific search result, Google inspects the top search results returned by the search engine and assigns categories and weights to each label. Then, each category is scored by checking which labels are associated with it.

How Google Determines Scores for Label Classifier

Determining Label Histogram Classification for All Search Results

Classifying search results based on the label distribution of search results

To determine label histogram categories and weights for all search results returned for a query, Google inspects the top label distributions for the query.

Then, each label is mapped to a category (e.g. Disease -> Health).

Then, each category is scored based on the distribution. The score is determined by taking the following ratio:

Score = label frequency of category / Sum of all the label frequencies

How Google Determines Scores for Label Histogram Classifier

After determining the label histogram score, scores are then sent to the Page Element Trigger Manager.
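
The label histogram ratio is simple enough to work through in Python. Only the Disease -> Health mapping comes from the example above. The other labels, the frequencies and the function name are invented for illustration.

```python
from collections import defaultdict

# Label -> category mapping; only "Disease" -> "Health" is from the post.
LABEL_TO_CATEGORY = {"Disease": "Health", "Symptom": "Health", "Recipe": "Food"}


def histogram_scores(label_histogram):
    """Score = label frequency of category / sum of all the label frequencies."""
    total = sum(label_histogram.values())
    if total == 0:
        return {}
    per_category = defaultdict(int)
    for label, frequency in label_histogram.items():
        category = LABEL_TO_CATEGORY.get(label)
        if category is not None:
            per_category[category] += frequency
    return {category: freq / total for category, freq in per_category.items()}


# Invented label histogram for a health-related query.
scores = histogram_scores({"Disease": 6, "Symptom": 2, "Recipe": 2})
```

Here "Health" collects 8 of the 10 label occurrences and "Food" the remaining 2, so a health-related page element would be the stronger candidate.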

6. Page Element Trigger Manager Picks What to Show

The page element trigger manager receives the aggregated classification weights and uses them to determine the page elements to be shown.

What is the Page Element Trigger Manager?

The page element trigger manager is the component of the page generator that generates a page based on a set of results classifications.

Generating a Score for the Classification Set

The page element trigger manager receives the search results classification set (multiple classifications). From the set, it filters individual classifications with low scores. Then it determines a score for the entire classification set.

Using the score of that set, it determines which elements to display.

How page element trigger manager uses classification response and query classification to generate a page

Scoring Categories for a Search Query Based on the Classifier Weights

Once the classifications with low scores are filtered out, the trigger manager determines the scores of the categories based on the weights received from the classifiers.

Determine scores based on classifier weights

Combining All Scores for Categories

To aggregate all scores, each classifier score is given a weight. For example, the title may weigh more than the snippet. Then, the weighted scores are combined to provide a final average category score.

How Google Determines Scores based on classifiers weights
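
Combining the classifier scores can be sketched as a weighted average in Python. The classifier weights below are made up (the patent does not publish real values), and treating a category that a classifier did not report as a zero score is my own assumption.

```python
from collections import defaultdict

# Hypothetical importance of each classifier; title weighs more than snippet.
CLASSIFIER_WEIGHTS = {"title": 2.0, "snippet": 1.0}


def combine(scores_by_classifier):
    """Weighted average of category scores across classifiers."""
    total_weight = sum(
        CLASSIFIER_WEIGHTS.get(name, 1.0) for name in scores_by_classifier
    )
    if total_weight == 0:
        return {}
    combined = defaultdict(float)
    for name, scores in scores_by_classifier.items():
        weight = CLASSIFIER_WEIGHTS.get(name, 1.0)
        for category, score in scores.items():
            combined[category] += weight * score
    return {category: total / total_weight for category, total in combined.items()}


final_scores = combine({
    "title": {"health": 0.8},
    "snippet": {"health": 0.5, "news": 0.5},
})
```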

All the scores together form a result classification set. The result classification set is then used to decide which page element to display.

Determining Page Elements to Display

The page element trigger manager uses the result classification set to determine which page elements to display by generating negative and positive signals.

Page element trigger manager determining page elements to display

Determining Negative Signals

Negative signals are conditions in which page elements may not be displayed.

The trigger manager checks whether the classification of the query is among the top-ranked search result classifications, based on their scores.

How page element trigger manager determines negative signals
How page element trigger manager determines negative signals used to determine page elements to display

For example, if the query “burns medical conditions” is categorized in the health/health_conditions category, are the ranked documents also in the same category?

In the example below, we would say that the health/health_conditions category has a high weighted average score.

Thus the page element related to medical conditions would be displayed.

Determining Positive Signals

A positive signal is simply the condition indicating that the highest-scoring page elements should be displayed.

How page element trigger manager determines positive signals
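
Putting the negative and positive signals together, the decision could look like the Python sketch below. This is my interpretation rather than the patent's algorithm: the top_n and min_score thresholds, the function name and the mapping from categories to page elements are all hypothetical.

```python
def elements_to_show(query_category, category_scores, elements_by_category,
                     top_n=3, min_score=0.5):
    """Pick page elements from the result categories, or none at all."""
    ranked = sorted(category_scores, key=category_scores.get, reverse=True)[:top_n]

    # Negative signal: the query's category is not among the top result
    # categories, so no page element is triggered.
    if query_category not in ranked:
        return []

    # Positive signal: show the elements tied to the highest-scoring categories.
    return [
        elements_by_category[category]
        for category in ranked
        if category_scores[category] >= min_score and category in elements_by_category
    ]


# Invented scores for the query "burns medical conditions".
category_scores = {"health/health_conditions": 0.8, "news": 0.3, "shopping": 0.1}
elements_by_category = {"health/health_conditions": "medical conditions panel"}

shown = elements_to_show("health/health_conditions", category_scores, elements_by_category)
hidden = elements_to_show("travel", category_scores, elements_by_category)
```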

What Categories is the Patent About?

  • SERP enhancements
  • Knowledge panels

Definitions

Patent term | Definition
Page element | Portion of a page including some content
Snippet | Excerpt from a web page
Label | Label describing the content of a web page
Label Histogram | List of labels and corresponding frequencies
Page Element Trigger Manager | Component of the page generator that generates a page based on a set of results classifications
Page Generator | Component that determines which page elements to display and generates the page containing the selected elements
Search Classifier | Component of the page generator that determines the classification based on a search result
URL Classifier | Component of the search classifier that determines the classification based on a URL
Label Classifier | Component of the search classifier that determines the classification based on a label
Label Histogram Classifier | Component of the search classifier that determines the classification based on the label distribution inside search results
Snippet Classifier | Component of the search classifier that determines the classification based on a snippet
Title Classifier | Component of the search classifier that determines the classification based on a title
Query Classifier | Component that determines a query classification based on a query. Also known as the query classification system.
Query Classification | Classification of a search query based on query features and a query classification model. It includes a category and a score. Examples of query classifications include products (e.g., query relates to the purchase of goods), images/videos, geographic location (e.g., query related to a map), news, adult content (e.g., pornography).
Search engine | Software that sorts information from documents stored in an index and presents the most relevant search results for the user query.
Classification response | Classification determined based on search results that includes a classification set (multiple classifications). The classification response also includes the number of documents.
Classification set | Set of multiple classifications assigned to a search result.
Search result classification | Classification of a search result generated from various elements in search results and represented as a category and a score
Search results | Results returned by the search engine in response to a search query. They include hits from the search and information about the hits. Search results include a result set and a label histogram.
Search results set | Set of multiple search results.
Search result | Individual search result returned by the search engine in response to a search query. Each search result represents a hit in search. Each search result contains a position, a URL, a title, a snippet and a label.
Query classification model | Machine learning model used to classify search queries and trained on query data and search log data (e.g. labelled queries).
Number of documents | Number of search results used by the classifier to generate the classification.
Position Bias | Natural bias towards clicking on results in lower positions (closer to the top) rather than higher positions (closer to the bottom). Position bias is also used in search result classification to give higher weights to search results located closer to the top.

Google Search Infrastructure Involved

The Classifying search results patent mentions these elements from the Google Search Infrastructure:

  • Search Engine
  • Page Generator
    • Page Element Trigger Manager
    • Search Classifier
      • URL Classifier
      • Label Classifier
      • Label Histogram Classifier
      • Snippet Classifier
      • Title Classifier
  • Query Classifier

Patent Details

Name | Classifying search results
Assignee | Google LLC
Filed | 2013-11-25
Assigned | 2017-10-02
Status | Active
Expiration | 2027-10-11
Application | 14/088,689
Inventor | Tania Bedrax-Weiss, Ramanathan V. Guha, Patrick F. Riley, Corin Anderson
Patent | US9043322B2

Conclusion

The “Classifying Search Results” patent shows us how Google may decide which page elements to show in search based on search result classifications.

When search results have a high enough score for a given category, Google may display a page element relevant to that category.
