Classifying Search Results – Google Patents SEO

In this series on learning SEO with Google patents, I will cover some of what I have learned reading the Google patent titled “Classifying Search Results” by Tania Bedrax-Weiss et al. This patent is an update to a previous patent named “Classifying search results to determine page elements”.

Components of the architecture used to classify search results

What is the Patent About?

The “Classifying search results” patent is about classifying search results and determining which page elements to display in response to a search query.

It classifies both the search query and the search results in order to generate page elements that correspond to the user’s search intent.


    Highlights From Classifying search results

    This patent explains how Google checks and scores search result categories and shows page elements (e.g. Knowledge box or Knowledge panel) related to the highest-scored categories.

    What is a Page Element?

    A page element is a portion of an HTML document including some content. A page element can be anything such as text, image, multimedia content, or programming elements such as scripts (source).

    Controlling browser extensions

    A page element can be visible (e.g. an image) or invisible (e.g. a cookie).

    Examples of page elements in search results are a knowledge box, a knowledge panel, an individual search result, or even a portion of a search result (e.g. title or structured snippet).

    In this patent, page elements are drawn from news, images, documents, maps and facts. It also mentions the Knowledge panel, individual search results, the search bar and the navigation menu as being page elements. They seem to be what we call SERP features.

    How Google Decides Which Page Elements to Show

    When Google receives a search query, it classifies the query, returns search results and uses the page generator component to classify search results. Finally, it decides which page elements to show or not to show in search results.

    Components used by Google to classify search results.
    Interactions of components showing how Google classifies search results.

    Here is an overview of the steps used by Google to classify search queries and search results and decide which page elements to show in a search result page (see image below). Each step will be described in greater detail later on.

    1. User sends a search query through their browser
    2. Query classifier generates a query classification (query category / score)
    3. Search engine generates initial search results
    4. The page generator receives the search results
    5. Different search classifiers analyze portions of search results and return a classification response
    6. The page element trigger manager determines the page elements to be shown based on the classification response
    7. The triggering manager generates the page and returns it to the user
    Displaying Pages by Classifying Search Results

    1. User Sends a Search Query

    A search query is composed of one or more query terms.

    A search query is data that a user submits to a search engine to satisfy the user’s informational needs. The search queries are usually in the form of text, e.g., one or more query terms, and may include transcriptions of spoken search queries.

    Image Analysis for Results of Textual Image Queries

    I would expand by saying that a search query can also be a combination of data received from the user and from the input documents related to the query (see Google’s Document Locator patent).

    A query, in Google’s eyes, can mean the user’s search query, but can also mean the expanded query, such as in the image below.

    example expanded query
    Source: Paul Haahr

    Anyway, the user sends the query to the server, and the queries are labelled by the query classifier.

    2. Query Classifier Classifies the Queries

    Once the query is received, the query classifier will label the queries using a machine learning model.

    What is Google’s Query Classifier?

    Google’s query classifier, also known as query classification system, is the component that classifies search queries based on query features (e.g. characteristics of a query) and a query classification model.

    The query classification model is a machine learning model used to classify search queries and trained on query data and search log data (e.g. labelled queries).

    The query classifier works by training a query classification model on a set of labeled queries and then using that model on unlabeled queries to assign a label to each one of them.
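The train-then-label loop described above can be sketched with a deliberately tiny model. The patent does not say which model Google uses, so the token-vote approach, the function names `train` and `classify`, and the sample queries are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

def train(labeled_queries):
    """Count how often each token appears under each label (toy model)."""
    model = defaultdict(Counter)
    for query, label in labeled_queries:
        for token in query.lower().split():
            model[token][label] += 1
    return model

def classify(model, query):
    """Return (label, score) for an unlabeled query, or (None, 0.0)."""
    votes = Counter()
    for token in query.lower().split():
        votes.update(model.get(token, {}))
    if not votes:
        return None, 0.0
    label, count = votes.most_common(1)[0]
    return label, count / sum(votes.values())

# Hypothetical labeled queries standing in for search log data.
training_data = [
    ("burn treatment", "health"),
    ("flu symptoms", "health"),
    ("cheap running shoes", "products"),
    ("buy laptop", "products"),
]
model = train(training_data)
label, score = classify(model, "flu treatment")
```

The returned pair mirrors the category and score that make up a query classification. A query with no known tokens comes back unlabeled instead of being forced into a category.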

    Query strings can be expanded too.

    Example process of labelling queries using a query classification model

    More information about Google’s query classifier can be found in the patents named “Query classification” and “Query classification based on search engine results”.

    Once queries are classified and labeled, the server sends them to the search engine.

    3. Search Engine Processes the Queries and Returns Search Results

    The query is sent to the information retrieval system where a document processing system and a query processing system will work together to return search results for the query.

    The document processing system prepares the inverted index from documents, pages or sites retrieved from the document corpus. It processes the contents using an automated or manual process.

    The query processing system processes queries, generates relevancy scores and also generates snippets.

    In query processing, a query string is tokenized and parsed by a query parser into query terms (i.e. each distinct term = token). The tokenized query terms are translated by the global-lexicon and can be expanded.
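To make the two systems more concrete, here is a minimal sketch of an inverted index and a tokenized query lookup. It assumes whitespace tokenization and exact-match terms; the global-lexicon translation and query expansion mentioned above are not modeled, and every function name and sample document is hypothetical.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it on whitespace into tokens."""
    return text.lower().split()

def build_inverted_index(documents):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return ids of documents containing every query term."""
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()

# Made-up documents standing in for the document corpus.
documents = {
    1: "first aid for burns",
    2: "burns and scalds treatment",
    3: "running shoes on sale",
}
index = build_inverted_index(documents)
matches = search(index, "burns treatment")
```

Document processing corresponds to `build_inverted_index`, while query processing corresponds to `tokenize` plus `search`. Relevancy scoring and snippet generation are left out of this sketch.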

    More information about the document and query processing systems can be found in the patent named “Multi-stage query processing system and method for use with tokenspace repository”.

    The search engine can then return search results, including the position, title, snippet, URL and labels, and send them to the Google Page Generator.

    Example search result returned by Google Search Engine

    4. Page Generator Determines Which Page Element to Display

    The page generator works by receiving search results from the search engine and processing search results through various search classification systems (e.g. search classifiers) and then identifying page elements to be displayed to the user.

    What is the Google Page Generator?

    The Google Page Generator is the component that determines which page elements to display and generates the page containing the selected elements. The page generator uses the Page Element Trigger Manager and multiple search classifiers in its operations.

    Example of the components of Google's page generator
    Example of the components of the page generator

    By using search results to determine page elements to be displayed, the page generator creates a page that represents the search engine’s predominant interpretation of the user’s query (e.g. search intent).

    5. Search Classifiers Return a Classification Response

    The Google page generator classifies search results using multiple search classifiers and returns a classification response.

    What are Google Search Classifiers?

    Google search classifiers are components of the page generator that generate a classification based on search results. According to the “Classifying Search Results” patent, there are 5 search results classifiers:

    • URL Classifier
    • Label Classifier
    • Label Histogram Classifier
    • Snippet Classifier
    • Title Classifier

    What is a Classification Response?

    A classification response is an HTTP response that contains a classification set and the number of documents analyzed in the classification. The classification set is a set of individual classifications where each classification contains a category and a score. Here is an example image of a classification response returned by search classifiers.

    Example of classification response data structure at Google
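As a rough sketch, the classification response described above could be modeled with two small data classes. The class and field names are my own assumptions based on the description, not identifiers taken from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Classification:
    category: str  # e.g. "health/health_conditions"
    score: float   # score assigned to that category

@dataclass
class ClassificationResponse:
    num_documents: int  # number of search results analyzed
    classification_set: list[Classification] = field(default_factory=list)

    def top(self):
        """Return the highest-scored classification, or None if empty."""
        return max(self.classification_set, key=lambda c: c.score, default=None)

# Made-up values for illustration only.
response = ClassificationResponse(
    num_documents=10,
    classification_set=[
        Classification("health/health_conditions", 0.8),
        Classification("news", 0.2),
    ],
)
```

Calling `response.top()` returns the classification with the highest score, which is the kind of lookup the page element trigger manager needs later on.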

    What is a Category?

    A category, also known in the patent as a result category, is a content category that classifies a search result. It probably does so in a similar way to the categories returned by the classifyText method of the Natural Language API.

    See a full list of Google’s content categories.

    Categories may be structured as a tree that contains a root category. Each root category may have child categories and child categories may be parents of further child categories.

    Example of category hierarchies
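One simple way to picture the tree is to treat each category as a slash-separated path, as in "health/health_conditions". That path notation and the helper names below are assumptions for illustration; the patent does not prescribe a representation.

```python
def ancestors(category):
    """Return the path from the root category down to the category itself."""
    parts = category.split("/")
    return ["/".join(parts[: i + 1]) for i in range(len(parts))]

def is_descendant(category, parent):
    """True if `category` equals `parent` or sits below it in the tree."""
    return parent in ancestors(category)

path = ancestors("health/health_conditions/burns")
```

With this representation, a score for a child category can be rolled up to its parents by walking the list returned by `ancestors`.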

    How Do Google Search Classifiers Work?

    Google search classifiers generate a classification based on a portion of the search results generated by the search engine.

    They analyze portions of search results and then return a classification response from the analysis. Each classifier analyzes a different portion of a search result (e.g. position, title, URL, snippet, label or label histogram).

    Search Results Data Structure

    Here is an example image of a search results data structure where each search result has a position, title, URL, snippet and optional labels. Each item will be analyzed by a different classifier.

    Example of search results portions at Google
    Example of search results data structure at Google
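The structure above can be sketched as a data class, together with a mapping of which portion each per-result classifier would read. The field names follow the description; the `CLASSIFIER_INPUTS` mapping and the sample values are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class SearchResult:
    position: int
    title: str
    url: str
    snippet: str
    labels: list[str] = field(default_factory=list)  # optional

# Which portion of a result each per-result classifier reads (assumed).
CLASSIFIER_INPUTS = {
    "url": lambda r: r.url,
    "title": lambda r: r.title,
    "snippet": lambda r: r.snippet,
    "label": lambda r: r.labels,
}

# Made-up search result for illustration.
result = SearchResult(
    position=1,
    title="Burns: symptoms and causes",
    url="https://example.com/burns",
    snippet="Burns are tissue damage caused by heat...",
    labels=["Disease"],
)
portions = {name: read(result) for name, read in CLASSIFIER_INPUTS.items()}
```

The label histogram classifier is not in the mapping because it works on the label distribution of the whole result set, not on a single result.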

    Example Flowchart for Determining Classifications

    This flowchart shows how Google performs classification depending on the search classifier.

    How the Google Search Classifier determines a classification response

    How URL, Title and Snippet Classifiers Work

    Google uses the same logic flow for the URL, the title and the snippets classification systems.

    Determining URL, Title and Snippet classification starts by finding categories and weights for each search result. Weights are scaled and aggregated by category.

    How the Google Search Classifier Determines URL, Title or Snippet Classification
    How Google Determines URL, Title or Snippet Classification (fig.6)

    Step 602 changes based on the classifier (URL, title or snippet).

    Determining URL Classification for a Search Result

    The URL classifier tries to determine categories and scores for a URL in order to produce an output similar to the one below.

    How URL classifier determines URL classification

    To determine categories and weights for a specific search result for a given URL, Google assigns categories and weights for the URL, and for the hostname and then calculates a weighted average.

    • URL: https://example.com/path
    • Hostname: example.com
    How Google Determines URL category and Weights when classifying based on the URL (fig.7)

    The weighted average is calculated by multiplying the hostname weight and the full URL weight for each category, then averaging the products across all categories.

    w_cat = hostname_weight_cat * full_url_weight_cat
    w_avg = (w_cat1 + w_cat2 + ... + w_catn) / n
    How Google calculates the weighted average of the full URL and hostname only
    How URL classification weights are assigned
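Here is a small sketch of that calculation as I read it: the hostname and full URL weights are multiplied per category and the products are averaged. The lookup tables, the weights and the choice to keep only categories present on both sides are assumptions, not details confirmed by the patent.

```python
from urllib.parse import urlparse

def combine_url_weights(url, url_weights, host_weights):
    """Multiply full-URL and hostname weights per category, then average."""
    hostname = urlparse(url).hostname
    per_host = host_weights.get(hostname, {})
    per_url = url_weights.get(url, {})
    shared = per_url.keys() & per_host.keys()  # categories known for both
    combined = {cat: per_url[cat] * per_host[cat] for cat in shared}
    average = sum(combined.values()) / len(combined) if combined else 0.0
    return combined, average

# Hypothetical lookup tables with made-up weights.
url_weights = {"https://example.com/path": {"health": 0.8, "news": 0.5}}
host_weights = {"example.com": {"health": 0.5, "news": 0.5}}
combined, average = combine_url_weights(
    "https://example.com/path", url_weights, host_weights
)
```

A URL or hostname that is missing from the tables yields an empty result and an average of 0.0 instead of raising an error.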
    Handling Position Bias

    The Page element triggering manager receives the weights and then takes care of the position bias. The position bias is the tendency of users to click on the top ranked search results.

    Example position bias

    In this case however, the position bias is used to determine that top ranked search results are more relevant to the user, and thus their categories should weigh more than pages ranked at the bottom of the page.

    To take care of the position bias, the page element triggering manager applies a position bias function to the category weights.

    Scaling weights according to position bias

    Finally, results are aggregated to determine the score for each category.
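The scaling and aggregation steps can be sketched as follows. The patent does not give the position bias function, so the `1 / position` decay used here is purely an assumption, as are the sample weights.

```python
from collections import defaultdict

def position_bias(position):
    """Assumed decay: rank 1 counts fully, lower ranks count less."""
    return 1.0 / position

def aggregate(results):
    """Scale each result's category weights by rank, then sum per category."""
    scores = defaultdict(float)
    for position, category_weights in results:
        for category, weight in category_weights.items():
            scores[category] += weight * position_bias(position)
    return dict(scores)

# (position, {category: weight}) pairs with made-up weights.
results = [
    (1, {"health": 0.8}),
    (2, {"health": 0.5, "news": 0.5}),
    (4, {"news": 1.0}),
]
scores = aggregate(results)
```

In this example the result in fourth position has the strongest raw weight, yet it contributes less to its category than the weaker result in first position does.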

    Determining Title and Snippet Classification for a Search Result

    The Title and Snippet classifiers determine categories and weights for the corresponding text contents by splitting the titles and snippets into sequences of words (i.e. n-grams). Then, they identify a category for each sequence.

    Below is a table representation of the output from the title and snippets classifiers where categories and weights have been assigned to each sequence of words for the following title: “National Institute of Health”.

    Determine Categories and weights for title and snippet classifiers
    How Google Determines Categories and weights for title and snippet classifiers
    How the Weights are Assigned

    Weights are assigned so that the longer the matching sequence is, the greater the weight. This is because matching longer sequences is less ambiguous and provides greater confidence that the category is the right one for the sequence.

    Determine URL  title or snippet category and Weights
    How Google Determines URL category and Weights when classifying based on the title or snippet
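The n-gram matching can be sketched like this. Using the number of words in the matched sequence as its weight is my own simplification of "longer match, greater weight", and the phrase-to-category table is hypothetical.

```python
def ngrams(words, n):
    """Return every run of n consecutive words as a single string."""
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def classify_text(text, phrase_categories):
    """Return (phrase, category, weight) for each known word sequence."""
    words = text.lower().split()
    matches = []
    for n in range(len(words), 0, -1):  # longest sequences first
        for phrase in ngrams(words, n):
            if phrase in phrase_categories:
                # Assumed weighting: the weight equals the sequence length.
                matches.append((phrase, phrase_categories[phrase], n))
    return matches

# Hypothetical phrase-to-category table.
phrase_categories = {
    "national institute of health": "health",
    "institute": "education",
    "health": "health",
}
matches = classify_text("National Institute of Health", phrase_categories)
```

The full four-word title matches with a weight of 4, so it outweighs the ambiguous single-word match on "institute", which only gets a weight of 1.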

    Determining Label Classification for a Search Result

    Classifying a search result based on its labels

    To determine label categories and weights for a specific search result, Google inspects the top search results returned by the search engine and assigns categories and weights for each label. Then, each category is scored by checking which labels are associated with it.

    How Google Determines Scores for Label Classifier

    Determining Label Histogram Classification for All Search Results

    Classifying search results based on the label distribution of search results

    To determine label histogram categories and weights for all search results returned for a query, Google inspects the top label distributions for the query.

    Then, each label is mapped to a category (e.g. Disease -> Health).

    Then, each category is scored based on the distribution. The score is determined by taking the following ratio:

    Score = label frequency of category / Sum of all the label frequencies
    How Google Determines Scores for Label Histogram Classifier
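The ratio above translates into a few lines of code. The "Disease" to "Health" mapping comes from the example in this post; the other labels, the frequencies and the function name are made up for illustration.

```python
from collections import defaultdict

def histogram_scores(label_histogram, label_to_category):
    """Score each category by its share of all label occurrences."""
    total = sum(label_histogram.values())
    if total == 0:
        return {}
    scores = defaultdict(float)
    for label, frequency in label_histogram.items():
        category = label_to_category.get(label)
        if category is not None:  # skip labels with no known category
            scores[category] += frequency / total
    return dict(scores)

# Made-up label frequencies across all results for a query.
label_histogram = {"Disease": 6, "Symptom": 2, "Recipe": 2}
label_to_category = {"Disease": "Health", "Symptom": "Health", "Recipe": "Food"}
scores = histogram_scores(label_histogram, label_to_category)
```

Here 8 of the 10 label occurrences map to Health, so Health scores about 0.8 and Food about 0.2.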

    After determining the label histogram score, scores are then sent to the Page Element Trigger Manager.

    6. Page Element Trigger Manager Picks What to Show

    The page element trigger manager receives the aggregated classification weights and uses them to determine the page elements to be shown based on the classification response.

    What is the Page Element Trigger Manager?

    The page element trigger manager is the component of the page generator that generates a page based on a set of results classifications.

    Generating a Score for the Classification Set

    The page element trigger manager receives the search results classification set (multiple classifications). From the set, it filters individual classifications with low scores. Then it determines a score for the entire classification set.

    Using the score of that set, it determines which elements to display.

    How page element trigger manager uses classification response and query classification to generate a page
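A minimal sketch of the filtering step might look like the following. The 0.1 threshold and the choice of the best remaining score as the set-level score are assumptions; the patent text summarized here does not say how either is chosen.

```python
def filter_classifications(classification_set, min_score=0.1):
    """Drop classifications whose score falls below the threshold."""
    return {cat: s for cat, s in classification_set.items() if s >= min_score}

def set_score(classification_set):
    """Assumed set-level score: the best remaining category score."""
    return max(classification_set.values(), default=0.0)

# Made-up category scores for one query.
classification_set = {"health": 0.7, "news": 0.25, "sports": 0.05}
kept = filter_classifications(classification_set)
overall = set_score(kept)
```

If every classification is filtered out, the set-level score falls back to 0.0, which would naturally lead to no category-specific page element being shown.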

    Scoring Categories for a Search Query Based on the Classifier Weights

    Once the classifications with low scores are filtered out, the triggering manager determines the scores of the categories based on the weights received from the classifiers.

    Determine scores based on classifier weights

    Combining All Scores for Categories

    To aggregate all scores, each classifier score is given a weight. For example, the title may weigh more than the snippet. Then, the weighted scores are combined to provide a final average category score.

    How Google Determines Scores based on classifiers weights
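The combination step can be sketched as a weighted average per category. The classifier weights below are invented (the title counts twice as much as the snippet, following the example above), and treating a missing category as a score of 0.0 is an assumption.

```python
def combine_scores(classifier_scores, classifier_weights):
    """Weighted average of per-classifier scores for each category."""
    categories = {c for scores in classifier_scores.values() for c in scores}
    total_weight = sum(classifier_weights[name] for name in classifier_scores)
    combined = {}
    for category in categories:
        weighted = sum(
            classifier_weights[name] * scores.get(category, 0.0)
            for name, scores in classifier_scores.items()
        )
        combined[category] = weighted / total_weight
    return combined

# Made-up weights in which the title counts twice as much as the snippet.
classifier_weights = {"title": 2.0, "snippet": 1.0, "url": 1.0}
classifier_scores = {
    "title": {"health": 0.9},
    "snippet": {"health": 0.6, "news": 0.4},
    "url": {"health": 0.5},
}
final = combine_scores(classifier_scores, classifier_weights)
```

A category that only one classifier reports, like news here, ends up with a low combined score because the other classifiers contribute nothing to it.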

    All the scores together form a result classification set. The result classification set is then used to decide which page element to display.

    Determining Page Elements to Display

    The page element trigger manager uses the result classification set to determine which page elements to display by generating negative and positive signals.

    Page element trigger manager determining page elements to display

    Determining Negative Signals

    Negative signals are conditions in which page elements may not be displayed.

    It checks whether the classification of the query is among the top-ranked search result classifications, based on the scores.

    How page element trigger manager determines negative signals used to determine page elements to display

    For example, if the query “burns medical conditions” is categorized in the health/health_conditions category, are the ranked documents also in the same category?

    In the example below, we would say that the health/health_conditions category has a high weighted average score.

    Thus the page element related to medical conditions would be displayed.
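The check described in this section can be sketched as a simple membership test. The function name, the `top_n` cut-off and the scores are hypothetical; the patent only describes comparing the query classification with the top-scored result classifications.

```python
def should_display(query_category, result_scores, top_n=3):
    """True if the query category is among the top-scored result categories."""
    ranked = sorted(result_scores, key=result_scores.get, reverse=True)
    return query_category in ranked[:top_n]

# Made-up result classification scores for "burns medical conditions".
result_scores = {
    "health/health_conditions": 0.7,
    "news": 0.2,
    "sports": 0.05,
    "shopping": 0.05,
}
show_medical_element = should_display("health/health_conditions", result_scores)
```

When the query category is missing from the top result categories, the function returns False, which corresponds to a negative signal for the related page element.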

    Determining Positive Signals

    Positive signals are simply the conditions indicating that the highest-scored page elements should be displayed.

    How page element trigger manager determines positive signals

    What Categories is the Patent About?

    • SERP enhancements
    • Knowledge panels

    Definitions

    Patent term | Definition
    Page element | Portion of a page including some content
    Snippet | Excerpt from a web page
    Label | Label describing the content of a web page
    Label Histogram | List of labels and corresponding frequencies
    Page Element Trigger Manager | Component of the page generator that generates a page based on a set of results classifications
    Page Generator | Component that determines which page elements to display and generates the page containing the selected elements
    Search Classifier | Component of the page generator that determines the classification based on a search result
    URL Classifier | Component of the search classifier that determines the classification based on a URL
    Label Classifier | Component of the search classifier that determines the classification based on a label
    Label Histogram Classifier | Component of the search classifier that determines the classification based on a label distribution inside search results
    Snippet Classifier | Component of the search classifier that determines the classification based on a snippet
    Title Classifier | Component of the search classifier that determines the classification based on a title
    Query Classifier | Component that determines a query classification based on a query. Also known as query classification system.
    Query Classification | Classification of a search query based on query features of search queries and a query classification model. It includes the category and a score. Examples of query classifications include products (e.g., query relates to the purchase of goods), images/videos, geographic location (e.g., query related to a map), news, adult content (e.g., pornography).
    Search engine | Software that sorts information from documents stored in an index and presents the most relevant search results for the user query.
    Classification response | Classification determined based on search results that includes a classification set (multiple classifications). The classification response also includes the number of documents.
    Classification set | Set of multiple classifications assigned to a search result.
    Search result classification | Classification of a search result generated from various elements in search results and represented as a category and a score
    Search results | Results returned by the search engine in response to a search query. They include hits from the search and information about the hits. Search results include a result set and a label histogram.
    Search results set | Set of multiple search results.
    Search result | Individual search result returned by the search engine in response to a search query. Each search result represents a hit in search. Each search result contains a position, a URL, a title, a snippet and a label.
    Query classification model | Machine learning model used to classify search queries and trained on query data and search log data (e.g. labelled queries).
    Number of documents | Number of search results used by the classifier to generate the classification.
    Position Bias | Natural bias towards clicking on results in a lower-numbered position (closer to the top) rather than a higher-numbered position (closer to the bottom). Position bias is also used in search result classification to give higher weights to search results located closer to the top.

    Google Search Infrastructure Involved

    The Classifying search results patent mentions these elements from the Google Search Infrastructure:

    • Search Engine
    • Page Generator
      • Page Element Trigger Manager
      • Search Classifier
        • URL Classifier
        • Label Classifier
        • Label Histogram Classifier
        • Snippet Classifier
        • Title Classifier
    • Query Classifier

    Patent Details

    Name | Classifying search results
    Assignee | Google LLC
    Filed | 2013-11-25
    Assigned | 2017-10-02
    Status | Active
    Expiration | 2027-10-11
    Application | 14/088,689
    Inventor | Tania Bedrax-Weiss, Ramanathan V. Guha, Patrick F. Riley, Corin Anderson
    Patent | US9043322B2

    Conclusion

    The Classifying Search Results patent showed us how Google may decide which page element to show in search based on search results classifications.

    When search results have a high enough score for a given category, Google may display a page element relevant to that category.
