Query Categorization Based on Image Results

Query categorization can be used by Google to improve image search results by comparing image annotations to query categories.

In this post, I will cover some of what I have learned about the Google image search engine from reading the Google patent titled “Query Categorization Based on Image Results” by Anna Majkowska and Cristian Tapus.

Using query categorization to improve image search results for a query


What is the Patent About?

This patent shows how Google may re-rank image search results by associating categories with queries and comparing those categories to image annotations.

Process of Ranking Images and Where this Patent Takes its Place

Before we dive into the patent that is the core of this article, let’s have an overview of the process of ranking images.

First of all, a user performs a search query.

Then, the image search engine performs an initial ranking of images using text associated with the images, as described in the image retrieval patent.

Next, Google refines the search results based on image features in a process described in the “Ranking of images and image labels” and “How Google Selects the Most Representative Image of a Set” patents.

In addition to re-ranking images based on their features, Google may re-rank images based on query categorization, as described in this article.

How Does Query Categorization Improve Search Results?

By assigning categories to queries and images, Google can re-order the image search results.

Re-ordering can be done by increasing the rank of images that belong to the categories assigned to the query, or decreasing the rank of images that do not.

Process of Assigning Categories to Queries

When the image search engine receives the query, the query and the initial image search results are passed through to the query categorizer engine.

The categorizer engine gathers information from the user behaviour data, the image annotations and the query categories repositories.

It analyzes image results for the query to determine if the query belongs to one or more categories.

It can analyze either all images or images that have been selected often enough for the given query.

Initial ranking of Images for a query
Relevant images are selected based on user behaviour data
Images are annotated based on their surrounding content
Query classifiers use image annotations to predict the query category
Images are re-ordered based on matching image annotations and query categories

Flowchart for categorizing image queries at Google

1. Initial Ranking of Images

When Google receives an image search query from the user, it performs an initial ranking of image search results using typical image retrieval techniques.

For the query, images are ranked and returned along with an IR score.

2. Selecting Images: User Behaviour Data

Google may, or may not, use all the images returned in the initial search results.

User behaviour data is a statistic assigned to each image based on the historical interactions with it.

User behaviour data can be something like:

Total number of selections
Clicks on results after viewing the image
Long click / Short click

Google may filter a subset of images using a threshold on one of the user behaviour metrics.
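To make the filtering step concrete, here is a minimal Python sketch of what a threshold on a behaviour metric could look like. The `ImageResult` class, the `selections` field and the threshold of 10 are my own illustrative assumptions, not values taken from the patent.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    url: str
    ir_score: float   # score from the initial ranking
    selections: int   # hypothetical metric: times the image was selected for this query


def select_images(results, min_selections=10):
    """Keep only images selected at least `min_selections` times (assumed threshold)."""
    return [r for r in results if r.selections >= min_selections]


results = [
    ImageResult("a.jpg", 0.92, 140),
    ImageResult("b.jpg", 0.88, 3),
    ImageResult("c.jpg", 0.75, 27),
]
selected = select_images(results)
print([r.url for r in selected])
```

Only the images that were selected often enough move on to the next steps, which keeps rarely clicked results from influencing the query category.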

3. Generating Image Annotations: Image Annotators

Google first generates image annotations for the image search results of a given query, using techniques similar to those described in this paper.

Example of image annotations at Google.

number of faces in the image
a size of each face
dominant colours of the image
if image contains text or a graph
if image is a screenshot
hash or fingerprint to determine image duplication

The result is an image vector with image annotations.
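As a rough illustration, the annotations listed above could be stored in a small structure and flattened into a numeric vector. This is only a sketch: the field names, the `largest_face_ratio` simplification (one value instead of a size per face) and the three-colour palette are assumptions of mine, not the patent's format.

```python
from dataclasses import dataclass


@dataclass
class ImageAnnotations:
    num_faces: int
    largest_face_ratio: float   # assumed simplification: face area / image area
    dominant_colour: str
    has_text: bool
    has_graph: bool
    is_screenshot: bool
    fingerprint: str            # used for duplicate detection, not part of the vector

    def to_vector(self, colours=("red", "green", "blue")):
        """Flatten the annotations into a list of floats (hypothetical layout)."""
        one_hot = [1.0 if self.dominant_colour == c else 0.0 for c in colours]
        return [
            float(self.num_faces),
            self.largest_face_ratio,
            float(self.has_text),
            float(self.has_graph),
            float(self.is_screenshot),
        ] + one_hot


rose = ImageAnnotations(0, 0.0, "red", False, False, False, "9f2c")
vector = rose.to_vector()
print(vector)
```

A fixed-length vector like this is the kind of input a query classifier can consume for every image in the result set.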

Annotations may also be used to cluster image queries at Google.

4. Classifying Queries: Query Categorizer Engine

The query categorizer engine analyzes image results for a query to determine the categories that can be assigned to the query.

Using image search results, annotations and optionally user behaviour data, the module tries to associate queries with categories.

What are Query Categories and How Google Uses Them?

The categorizer engine tries to assign categories to queries.

Example of a *query-category associations table*

One example of a mentioned category that can be assigned to queries is “red query” assigned to the query “rose”. For this category, the image search engine could re-order search results to increase the rank of images that are annotated with the corresponding dominant colour.

Example of the robustness of the “red query” category. Here “red” is not in the query.

This categorization, or annotation, of queries provides advantages over the parsing of the query:

It works for any language without the need for translation
It does not require the term to be in the query
It does not increase the rank of red images for queries such as “red sea” or “RED the movie”
Labels can be used to further enhance search results (see the “blue” label)

How Google Categorizes Queries Using Query Classifiers

Queries are categorized by the categorizer engine using machine learning, in a series of software components called query classifiers.

For simplicity, as this is not a machine learning tutorial, but a tutorial to learn about SEO, I will not go into the details of the ML algorithm described in the patent.

What the classifier essentially tries to achieve is to look at image search results for a query and classify the intent behind the query using various machine learning algorithms (e.g. the AdaBoost boosting algorithm).

Query classifiers use the image annotations of the initial results to predict the category of the query.

The types of intent are listed below.

Single person classifier. AdaBoost is used to generate a strong predictor that evaluates the probability that the query’s intent is to get images of a single person.
Diverse/Homogeneous classifier. A clustering algorithm groups similar images into clusters to estimate whether the query is specific enough that showing duplicate images is relevant.
Screenshot classifier. Probability that the query calls for images that are screenshots (boosting).
Text/graph classifier. Probability that the query calls for images that contain text or a graph (boosting).
Color classifier. Probability that the query calls for images dominated by a color (boosting).
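To give a feel for how a classifier can infer a query category from its image results, here is a deliberately simplified Python sketch. It does not implement AdaBoost or the patent's clustering: I swapped in a click-weighted share for the colour classifier and a duplicate-group share for the diverse/homogeneous classifier. The function names and the 0.6 cut-off are assumptions for illustration only.

```python
from collections import Counter


def colour_query_score(images, colour):
    """Click-weighted share of results whose dominant colour is `colour`.

    `images` is a list of (dominant_colour, clicks) pairs.
    """
    total = sum(clicks for _, clicks in images)
    if total == 0:
        return 0.0
    matching = sum(clicks for dominant, clicks in images if dominant == colour)
    return matching / total


def homogeneity_score(fingerprints):
    """Share of results that fall in the largest group of identical fingerprints."""
    if not fingerprints:
        return 0.0
    largest_group = Counter(fingerprints).most_common(1)[0][1]
    return largest_group / len(fingerprints)


# Hypothetical results for the query "rose": (dominant colour, clicks)
rose_results = [("red", 120), ("red", 80), ("white", 30), ("pink", 20)]
score = colour_query_score(rose_results, "red")   # 200 / 250 = 0.8
is_red_query = score >= 0.6                       # assumed cut-off
print(score, is_red_query)
```

Notice that the word "red" never appears in the query. The category comes entirely from what the clicked images look like, which is the point of this approach.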

5. Ranking and Re-Ranking Image Search Results Based on Category

Before we describe how image ranks are re-ordered using query categories, let’s lay some groundwork. Before re-ranking, all of this happened:

the ranking engine generated an initial set of search results
user behaviour data was used to create a smaller subset of images to work with
the image annotators generated image annotations for images of the initial search results based on visual content of images
the Query Categorizer Engine categorized queries in a vector that can be used as a measure of relevance by the rank modifier engine.

Afterwards, the rank modifier engine starts to generate modification factors for each of the ranked images.

The rank modifier engine loops through each query category and compares it against the image annotations.

If a query has a specific category, all the images that were annotated with the same category will see their rank increase.

If an image has multiple category annotations (single face and screenshot) and the query is categorized as a single person query, then the image’s score will increase for being annotated as a single face, but will decrease for being a screenshot.
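The single face and screenshot example can be sketched in a few lines of Python. This is a hedged illustration, not the patent's formula: the multiplicative `boost` and `demote` values, the dictionary representation of query categories and the choice to leave images without a wanted annotation unchanged are all my own assumptions.

```python
def modification_factor(query_categories, image_annotations, boost=1.2, demote=0.8):
    """Loop through the query categories and compare each to the image annotations.

    `query_categories` maps a category name to True (the query calls for it)
    or False (the query does not call for it).
    """
    factor = 1.0
    for category, wanted in query_categories.items():
        annotated = category in image_annotations
        if wanted and annotated:
            factor *= boost      # image matches a category of the query
        elif not wanted and annotated:
            factor *= demote     # image carries a category the query does not call for
    return factor


def rerank(images, query_categories):
    """Return (url, adjusted score) pairs sorted by adjusted score, highest first."""
    adjusted = [
        (url, ir_score * modification_factor(query_categories, annotations))
        for url, ir_score, annotations in images
    ]
    return sorted(adjusted, key=lambda pair: pair[1], reverse=True)


# Hypothetical single person query and initial results: (url, IR score, annotations)
query = {"single_person": True, "screenshot": False}
images = [
    ("landscape.jpg", 0.90, set()),
    ("portrait_screenshot.png", 0.85, {"single_person", "screenshot"}),
    ("portrait.jpg", 0.80, {"single_person"}),
]
reranked = rerank(images, query)
print([url for url, _ in reranked])
```

In this toy example the plain portrait moves from last to first, while the screenshot of a portrait gains from the face annotation but loses from the screenshot annotation and ends up last.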

What Categories is the Patent About?

Re-Ranking
Image SEO
Annotations
Query processing
Machine learning at Google

Definitions

Patent term	Definition
Query Categorization	Assigning a query category to a query
Query category	Structured annotation for a query
Query categorizer engine	Software that categorizes queries to improve relevance of search results
Image annotation	Structured annotation of an image
Query classifier	Machine learning model used to categorize queries
User behaviour data	Quality of result statistic assigned to documents
Quality of result statistic	Statistic derived from indicators that describe past user behaviour
IR Score	Information retrieval score used to sort documents for a query

Google Search Infrastructure Involved

The Query Categorization Based on Image Results patent mentions these elements from the Google Search Infrastructure:

Search System
- Search Engine
  - Indexing Engine
  - Scoring Engine
  - Ranking Engine
  - Rank Modifier Engine
  - Categorizer Engine
    - Query Classifiers
      - Single Person Query Classifier
      - Diverse/Homogeneous Query Classifier
      - Screenshot Query Classifier
      - Text Query Classifier
      - Graph Query Classifier
      - Color Query Classifier
    - Image Annotators
      - Face Annotator
      - Fingerprint Annotator
      - Screen Shot Annotator
      - Text Annotator
      - Graph/Chart Annotator
      - Dom. Color Annotator
Result selection logs
Tracking component
Index DB
User behaviour data repository
Image annotations repository
Query categories repository

Query Categorization Based on Image Results FAQ

How are query categories used in the ranking of images?

After initial ranking, image annotations are compared with query categories. Matching and non-matching elements increase and decrease the IR score.

What are the query categories?

According to the patent, Google tries to identify faces, screenshots, text, graphs, colours and whether the query allows duplicate images.

What is the query categorizer engine?

Software used in Google’s search engine to generate categories for queries and to re-order images based on the relevance implied by matching image annotations to query categories.

What is re-ranking in search?

Re-ranking is the process of re-ordering a set of initial search results to improve their relevancy to the user.

Image Ranking Factors Mentioned in the Patent

Below is a list (as mentioned in the patent) of ranking signals used by the scoring function:

Where (and how often) query text appears in document text surrounding an image
Image caption
Alternative text for the image
How common the query terms are in the search results indexed by the search engine
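To show how signals like these could be combined, here is a small TF-IDF-style sketch in Python. The patent lists the signals but this formula is not from it: the field weights, the `ir_score` function and the sample numbers are hypothetical and only meant to illustrate the idea that rarer query terms and more prominent text fields count for more.

```python
import math

# Hypothetical weights: caption counts most, then alt text, then surrounding text
FIELD_WEIGHTS = {"caption": 3.0, "alt": 2.0, "surrounding": 1.0}


def ir_score(query, image_fields, doc_freq, n_docs):
    """Sum weighted term counts per field, scaled by how rare each query term is."""
    score = 0.0
    for term in query.lower().split():
        # Terms that appear in fewer indexed documents get a larger weight
        idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0)))
        for field, weight in FIELD_WEIGHTS.items():
            tf = image_fields.get(field, "").lower().split().count(term)
            score += weight * tf * idf
    return score


# Made-up index statistics: number of documents containing each term
doc_freq = {"red": 500, "rose": 20}
n_docs = 1000

rose_image = {"caption": "a red rose", "alt": "rose", "surrounding": "the rose garden in bloom"}
car_image = {"caption": "red car", "alt": "car", "surrounding": "a red sports car"}

rose_score = ir_score("red rose", rose_image, doc_freq, n_docs)
car_score = ir_score("red rose", car_image, doc_freq, n_docs)
print(rose_score > car_score)
```

Both images mention "red", but the rose image also matches the rarer term "rose" in its caption, alt text and surrounding text, so it ends up with the higher score.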

Patent Details

Name	Query Categorization Based on Image Results
Assignee	Google LLC
Filed	2022-03-28
First filed	2009-12-29
Status	Pending
Expiration	2031-05-27
Application	17/705,478
Inventor	Anna Majkowska, Cristian Tapus
Patent	US20220215049A1

Conclusion

We have now learned one of the many ways Google re-orders initial image search results.

By assigning a query category based on image annotations of an initial set of search results, Google can compare how relevant images are to the query.

This is it for this overview of query categorization at Google for image search results.


Jean-Christophe Chouinard

SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.