Query categorization can be used by Google to improve image search results by comparing image annotations to query categories.
In this post, I will cover some of what I have learned about the Google Image Search Engine from reading the Google patent titled “Query Categorization Based on Image Results” by Anna Majkowska and Cristian Tapus.
What is the Patent About?
This patent shows how Google may re-rank image search results by associating categories to queries and comparing those categories with image annotations.
Process of Ranking Images and Where this Patent Takes its Place
Before we dive into the patent at the core of this article, let’s go over the process of ranking images.
First of all, a user performs a search query.
Then, the image search engine performs an initial ranking of images using the text associated with them, as described in the image retrieval patent.
Next, Google refines the search results based on image features in a process described in the “Ranking of images and image labels” and “How Google Selects the Most Representative Image of a Set” patents.
In addition to re-ranking images based on their features, Google may re-rank images based on query categorization, as described in this article.
How Does Query Categorization Improve Search Results?
By assigning categories to queries and images, Google can re-order the image search results.
Re-ordering can be done by increasing the rank of images that belong to the categories assigned to the query and decreasing the rank of images that do not.
Process of Assigning Categories to Queries
When the image search engine receives the query, the query and the initial image search results are passed through to the query categorizer engine.
The categorizer engine gathers information from the user behaviour data, the image annotations and the query categories repositories.
It analyzes image results for the query to determine if the query belongs to one or more categories.
It can analyze either all images or images that have been selected often enough for the given query.
- Initial ranking of Images for a query
- Relevant images are selected based on user behaviour data
- Images are annotated based on their surrounding content
- Query classifiers use image annotations to predict the query category
- Images are re-ordered based on matching image annotations and query categories
1. Initial Ranking of Images
When Google receives an image search query from the user, it performs an initial ranking of image search results using typical image retrieval techniques.
For the query, images are ranked and returned along with an information retrieval (IR) score.
2. Selecting Images: User Behaviour Data
Google may, or may not, use all the images returned in the initial search results.
User behaviour data is a statistic assigned to each image based on the historical interactions with it.
User behaviour data can be something like:
- Total number of selections
- Clicks on results after viewing the image
- Long click / Short click
Google may filter a subset of images using a threshold on one of the selected user behaviour metrics.
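To make the filtering step concrete, here is a minimal sketch of a threshold filter. The `ImageResult` class, the `selections` metric and the threshold value are assumptions made for illustration; the patent does not prescribe a specific metric or value.

```python
from dataclasses import dataclass


@dataclass
class ImageResult:
    url: str
    ir_score: float   # score from the initial text-based ranking
    selections: int   # hypothetical metric: total selections for this query


def filter_by_selections(results, min_selections):
    """Keep only the images selected at least `min_selections` times."""
    return [r for r in results if r.selections >= min_selections]


results = [
    ImageResult("a.jpg", 0.91, 120),
    ImageResult("b.jpg", 0.85, 3),
    ImageResult("c.jpg", 0.80, 45),
]

# The threshold of 10 is an arbitrary illustrative value.
kept = filter_by_selections(results, min_selections=10)
print([r.url for r in kept])
```

Only the images that pass the threshold would then be handed to the annotators and the query classifiers.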
3. Generating Image Annotations: Image Annotators
Google first generates image annotations on image search results for a given query using techniques similar to those described in this paper.
Examples of image annotations at Google:
- number of faces in the image
- the size of each face
- dominant colours of the image
- whether the image contains text or a graph
- whether the image is a screenshot
- a hash or fingerprint used to detect duplicate images
The result is an image vector with image annotations.
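As a rough illustration of what such a vector could look like, the sketch below packs the annotations listed above into a flat list of numbers. The field names, the vector layout and the choice of a single target colour are assumptions, not the patent's actual representation.

```python
from dataclasses import dataclass


@dataclass
class ImageAnnotations:
    face_count: int
    face_sizes: list        # one relative size (0..1) per detected face
    dominant_colours: list  # e.g. ["red", "beige"]
    has_text: bool
    has_graph: bool
    is_screenshot: bool
    fingerprint: str        # used for duplicate detection, not part of the vector

    def to_vector(self, colour="red"):
        """Flatten the annotations into numbers a classifier can consume."""
        return [
            float(self.face_count),
            max(self.face_sizes, default=0.0),       # size of the largest face
            float(colour in self.dominant_colours),  # 1.0 if `colour` dominates
            float(self.has_text),
            float(self.has_graph),
            float(self.is_screenshot),
        ]


portrait = ImageAnnotations(1, [0.35], ["red", "beige"], False, False, False, "9f2c")
print(portrait.to_vector())  # [1.0, 0.35, 1.0, 0.0, 0.0, 0.0]
```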
Annotations may also be used to cluster image queries at Google.
4. Classifying Queries: Query Categorizer Engine
The query categorizer engine analyzes image results for a query to determine the categories that can be assigned to the query.
Using image search results, annotations and optionally user behaviour data, the module tries to associate queries to categories.
What Are Query Categories and How Does Google Use Them?
The categorizer engine tries to assign categories to queries.
One example of a mentioned category that can be assigned to queries is “red query” assigned to the query “rose”. For this category, the image search engine could re-order search results to increase the rank of images that are annotated with the corresponding dominant colour.
This categorization, or annotation, of queries provides advantages over parsing the query text:
- It works for any language without the need for translation
- It does not require the term to be in the query
- It does not increase the rank of red images for queries such as “red sea” or “RED the movie”
- Labels can be used to further enhance search results (see the “blue” label)
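The sketch below illustrates why looking at the results is more robust than parsing the query text. It uses a simple share-of-results threshold as a stand-in for the boosting-based colour classifier mentioned in the patent; the function name, the sample data and the 0.6 threshold are all invented for illustration.

```python
from collections import Counter


def colour_category(dominant_colours_per_image, min_share=0.6):
    """Return the colour that dominates most results, or None if no colour does."""
    counts = Counter(
        colour
        for colours in dominant_colours_per_image
        for colour in set(colours)  # count each colour once per image
    )
    if not counts:
        return None
    colour, n = counts.most_common(1)[0]
    share = n / len(dominant_colours_per_image)
    return colour if share >= min_share else None


# Hypothetical dominant colours of the top results for two queries.
rose_results = [["red", "green"], ["red"], ["red", "white"], ["pink"]]
red_sea_results = [["blue"], ["blue", "yellow"], ["blue"], ["red"]]

print(colour_category(rose_results))     # red
print(colour_category(red_sea_results))  # blue
```

The word “red” never appears in the query “rose”, yet the results point to a red query, while “red sea” ends up with a blue label despite containing the word “red”.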
How Google Categorizes Queries Using Query Classifiers
Queries are categorized by the categorizer engine using machine learning in a series of software components called query classifiers.
For simplicity, as this is an SEO tutorial and not a machine learning tutorial, I will not go into the details of the ML algorithm described in the patent.
What the classifier essentially tries to achieve is to look at image search results for a query and classify the intent behind the query using various machine learning algorithms (e.g. the AdaBoost boosting algorithm).
Query classifiers use the image annotations of the initial results to predict the category of the query.
The patent describes one classifier per type of intent:
- Single person classifier. AdaBoost is used to build a strong predictor that estimates the probability that the query’s intent is to get images of a single person.
- Diverse/Homogeneous classifier. A clustering algorithm groups similar images into clusters to estimate whether the query is specific enough that showing near-duplicate images is relevant.
- Screenshot classifier. Probability that the query calls for images that are screenshots (boosting)
- Text/graph classifier. Probability that the query calls for images that contain text or a graph (boosting)
- Color classifier. Probability that the query calls for images dominated by a single color (boosting)
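For readers curious about what boosting looks like in practice, here is a toy AdaBoost with decision stumps written in plain Python. It is only a sketch: the two query-level features (share of results with exactly one face, share of results that are screenshots), the training data and the number of rounds are invented, and Google's actual features and training setup are not described in this article.

```python
import math


def stump_predict(x, feature, threshold, sign):
    """A decision stump: vote `sign` if the feature reaches the threshold."""
    return sign if x[feature] >= threshold else -sign


def train_stump(X, y, weights):
    """Pick the stump with the lowest weighted error on the training set."""
    best = None
    for feature in range(len(X[0])):
        for threshold in sorted({x[feature] for x in X}):
            for sign in (1, -1):
                error = sum(
                    w for x, label, w in zip(X, y, weights)
                    if stump_predict(x, feature, threshold, sign) != label
                )
                if best is None or error < best[0]:
                    best = (error, feature, threshold, sign)
    return best


def adaboost(X, y, rounds=5):
    """Combine weak stumps into a weighted vote (labels must be +1 or -1)."""
    weights = [1.0 / len(X)] * len(X)
    model = []
    for _ in range(rounds):
        error, feature, threshold, sign = train_stump(X, y, weights)
        error = min(max(error, 1e-10), 1 - 1e-10)  # avoid log(0) and division by zero
        alpha = 0.5 * math.log((1 - error) / error)
        model.append((alpha, feature, threshold, sign))
        # Increase the weight of misclassified examples, then normalise.
        weights = [
            w * math.exp(-alpha * label * stump_predict(x, feature, threshold, sign))
            for x, label, w in zip(X, y, weights)
        ]
        total = sum(weights)
        weights = [w / total for w in weights]
    return model


def predict(model, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in model)
    return 1 if score >= 0 else -1


# Hypothetical features per query: [share of one-face images, share of screenshots]
X = [[0.9, 0.0], [0.8, 0.1], [0.7, 0.0], [0.2, 0.1], [0.1, 0.6], [0.3, 0.0]]
y = [1, 1, 1, -1, -1, -1]  # +1 = single person query, -1 = anything else

model = adaboost(X, y)
print(predict(model, [0.85, 0.0]), predict(model, [0.15, 0.3]))
```

A query whose results are mostly one-face images is classified as a single person query, and the same pattern could be repeated with other features for the screenshot, text/graph and color classifiers.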
5. Ranking and Re-Ranking Image Search Results Based on Category
Before we describe how image ranks are re-ordered using query categories, let’s lay some groundwork. Before re-ranking, all of this happened:
- the ranking engine generated an initial set of search results
- user behaviour data was used to create a smaller subset of images to work with
- the image annotators generated image annotations for images of the initial search results based on visual content of images
- the Query Categorizer Engine stored the query categories in a vector that the rank modifier engine can use as a measure of relevance.
Afterwards, the rank modifier engine starts to generate modification factors for each of the ranked images.
The rank modifier engine will loop through each query category and compare it against the image annotations.
If a query has a specific category, all the images that were annotated with the same category will see their rank increase.
If an image has multiple category annotations (single face and screenshot) and the query is categorized as a single person query, then the image’s score will increase for being annotated as a single face, but will decrease for being a screenshot.
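The loop described above can be sketched as follows. The multiplicative `BOOST` and `DEMOTE` factors, the category names and the sample scores are assumptions chosen for illustration; the patent does not give concrete values.

```python
BOOST = 1.2   # assumed factor when an image annotation matches a query category
DEMOTE = 0.8  # assumed factor when an image has an annotation the query does not call for


def modification_factor(query_categories, image_categories):
    """Loop through the query categories and compare them to the image annotations."""
    factor = 1.0
    for category, query_has_it in query_categories.items():
        if category in image_categories:
            factor *= BOOST if query_has_it else DEMOTE
    return factor


def rerank(results, query_categories):
    """Apply the modification factor to each IR score and sort again."""
    rescored = [
        (url, ir_score * modification_factor(query_categories, categories))
        for url, ir_score, categories in results
    ]
    return sorted(rescored, key=lambda item: item[1], reverse=True)


# Query categorized as a single person query, not as a screenshot query.
query_categories = {"single_person": True, "screenshot": False}

# (url, initial IR score, categories derived from the image annotations)
results = [
    ("profile_screenshot.png", 0.80, {"single_person", "screenshot"}),
    ("group.jpg", 0.75, set()),
    ("portrait.jpg", 0.70, {"single_person"}),
]

reranked = rerank(results, query_categories)
print([url for url, _ in reranked])
```

In this example the portrait moves from last to first place, while the screenshot of a profile is boosted for its single face and demoted for being a screenshot.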
What Categories Does the Patent Fall Under?
- Image SEO
- Query processing
- Machine learning at Google
|Term|Definition|
|Query categorization|Assigning a query category to a query|
|Query category|Structured annotation for a query|
|Query categorizer engine|Software that categorizes queries to improve relevance of search results|
|Image annotation|Structured annotation of an image|
|Query classifier|Machine learning model used to categorize queries|
|User behaviour data|Quality of result statistic assigned to documents|
|Quality of result statistic|Statistic derived from indicators that describe past user behaviour|
|IR score|Information retrieval score used to sort documents for a query|
Google Search Infrastructure Involved
The Query Categorization Based on Image Results patent mentions these elements from the Google Search Infrastructure:
- Search System
- Search Engine
- Indexing Engine
- Scoring Engine
- Ranking Engine
- Rank Modifier Engine
- Categorizer Engine
- Query Classifiers
  - Single Person Query Classifier
  - Diverse/Homogeneous Query Classifier
  - Screenshot Query Classifier
  - Text Query Classifier
  - Graph Query Classifier
  - Color Query Classifier
- Image Annotators
  - Face Annotator
  - Fingerprint Annotator
  - Screenshot Annotator
  - Text Annotator
  - Graph/Chart Annotator
  - Dominant Color Annotator
- Result selection logs
- Tracking component
- Index DB
- User behaviour data repository
- Image annotations repository
- Query categories repository
Query Categorization Based on Image Results FAQ
After the initial ranking, image annotations are compared with query categories. Matching elements increase the IR score and non-matching elements decrease it.
According to the patent, Google tries to identify faces, screenshots, text, graphs, colours and whether the query allows duplicate images.
Software used in Google’s search engine to generate categories for queries and to re-order images based on the relevance implied by matching image annotations to query categories.
Re-ranking is the process of re-ordering a set of initial search results to improve their relevancy to the user.
Image Ranking Factors Mentioned in the Patent
Below is a list (as mentioned in the patent) of ranking signals used by the scoring function:
- Where (and how often) query text appears in document text surrounding an image
- Image caption
- Alternative text for the image
- How common the query terms are in the search results indexed by the search engine
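The signals listed above resemble a classic term-frequency and inverse-document-frequency scheme, so here is a toy scoring function along those lines. The field weights, the field names and the smoothed IDF formula are assumptions made for this sketch; the patent does not disclose the actual scoring function.

```python
import math

# Assumed weights: a match in the caption or alt text counts double.
FIELD_WEIGHTS = {"surrounding_text": 1.0, "caption": 2.0, "alt_text": 2.0}


def idf(term, docs):
    """Rarer terms across the indexed documents get a higher weight."""
    containing = sum(
        1 for doc in docs
        if any(term in doc[field].lower().split() for field in FIELD_WEIGHTS)
    )
    return math.log((1 + len(docs)) / (1 + containing)) + 1


def ir_score(query, doc, docs):
    """Sum weighted term counts over all fields, scaled by term rarity."""
    score = 0.0
    for term in query.lower().split():
        weighted_count = sum(
            weight * doc[field].lower().split().count(term)
            for field, weight in FIELD_WEIGHTS.items()
        )
        score += weighted_count * idf(term, docs)
    return score


docs = [
    {"surrounding_text": "A red rose in the garden", "caption": "Red rose", "alt_text": "rose"},
    {"surrounding_text": "Tulips in the garden", "caption": "Tulip field", "alt_text": "tulips"},
]

scores = [ir_score("rose", doc, docs) for doc in docs]
print(scores)
```

The first image gets a positive score because “rose” appears in its surrounding text, caption and alt text, while the second image scores zero.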
|Name|Query Categorization Based on Image Results|
|Inventors|Anna Majkowska, Cristian Tapus|
We have now learned one of the many ways Google may re-order initial image search results.
By assigning a query category based on image annotations of an initial set of search results, Google can compare how relevant images are to the query.
That is it for this overview of query categorization at Google for image search results.
SEO Strategist at Tripadvisor, ex-Seek (Melbourne, Australia). Specialized in technical SEO. On a quest for programmatic SEO for large organizations through the use of Python, R and machine learning.