In this post, we will learn in great detail how Google’s Image Search Engine works and the engineering behind it.
In this article we will learn:
- What is Google Images;
- How it differs from Google Web Search;
- Introduction to how the Google Images algorithm works;
- Deeper dive into the engineering behind Google Images based on patent research
What is the Image Search Engine?
Google’s Image Search Engine, also known as Google Images, is the search engine that is used for searching images online.
How to Access Google Images?
There are 3 ways to access Google Images:
- On google.com, select “Images” in the top right corner;
- Use the direct URL images.google.com;
- Perform a query in web search and select “Images” from the menu.
What Does the Image Search Engine Do?
The image search engine is the part of the image search system that crawls and indexes images, receives search queries for images, ranks images and returns search results in response to the image query.
TL;DR – How Are the Image Search Engine Results Ranked?
Serving image search results to the user is complex and requires some work before the query and other work at the time of the query.
Here is what happens before you search for an image.
Google’s web crawler follows links to web pages. On these pages Google may find images. Before storing the images, to provide context for the image, the crawler extracts textual information relevant to the images (anchor, alt attribute, text surrounding the images, image caption, metadata and more).
Once the crawler has extracted everything, it stores the images and their related information in an image repository, also known as the index, or image index.
When indexing images, Google identifies image features, determines the similarity between images and excludes near-duplicates from the index, keeping only the most representative of the set.
The image feature scores and similarity score are added to the index along with the image.
Here is what happens when you search for an image.
Google takes your query and processes it to understand what it means.
Then, it uses information retrieval techniques to match the search query to the textual information associated with the images available in the image index.
This returns an initial list of results, ranked with the most relevant images at the top.
Afterwards, Google re-ranks the image search results.
The images that have the highest aggregate similarity score to the rest of the top-ranked images for the query get favourable treatment. This is so because the image that is the most similar to all images is deemed more representative for the query (Remember near-duplicates were removed first).
Also, when looking at images Google estimates the most important features for the query (color, shape, …). The images showing the most important features also get re-ranked closer to the top.
Then, once re-ranked, the image results are returned to you along with an image thumbnail, a short title and a link to the image.
What is the Difference Between Google Search and Google Image Search?
The difference between Google’s Web Search Engine and Image Search Engine lies in the way the information retrieval is made to serve results to a user query.
Google’s Web Search Engine searches for multiple types of text-based documents across the web and supplements its search results using alternate proprietary search engines such as the image search engine.
Google’s Image Search Engine searches for media matching queries, making it more dependent on filenames, metadata and other text associated with images.
How Does the Google Image Search Engine Algorithm Work?
The Google Image Search engine algorithm is in a way simpler than web search as it involves fewer moving parts, but it can also be more complex as it requires a lot of processing power to return relevant media for queries without text-based content.
The core of the image search engine algorithm relies on matching search queries to text associated with images.
Examples of text associated with images (mentions collected from various patents):
- Image metadata such as the EXIF metadata
- Text from the webpage from which the image is linked
- Text from the webpage that the image links to
- Anchor text of hyperlinks to the image
- Image Caption
Once the initial ranking of the image search results is generated, the top images are re-processed to provide more relevant image search results.
In that re-processing, images that are the most similar to all the other images in the subset are given more relevancy and images that are too different are re-ranked lower, or even removed.
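To make the re-processing step more concrete, here is a minimal Python sketch. It assumes that a similarity score between 0 and 1 is already available for every pair of top-ranked images. The function name `rerank_by_aggregate_similarity`, the image ids and the scores are made up for illustration and do not come from the patents.

```python
def rerank_by_aggregate_similarity(image_ids, pair_similarity):
    """Re-rank images by their summed similarity to all other images.

    pair_similarity maps frozenset({a, b}) to a similarity in [0, 1]
    (hypothetical input format chosen for this sketch).
    """
    def aggregate(image_id):
        # Sum the similarity of this image to every other image in the set.
        return sum(
            pair_similarity[frozenset((image_id, other))]
            for other in image_ids
            if other != image_id
        )

    # The image most similar to all the others comes first.
    return sorted(image_ids, key=aggregate, reverse=True)


# Hypothetical initial ranking and pairwise similarity scores.
initial_ranking = ["d", "c", "b", "a"]
similarities = {
    frozenset(("a", "b")): 0.8,
    frozenset(("a", "c")): 0.7,
    frozenset(("a", "d")): 0.1,
    frozenset(("b", "c")): 0.5,
    frozenset(("b", "d")): 0.2,
    frozenset(("c", "d")): 0.1,
}
reranked = rerank_by_aggregate_similarity(initial_ranking, similarities)
```

In this toy data, image "d" is very different from the three others, so it drops from the first to the last position, while "a", which resembles most of the set, moves to the top.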
Additional elements can come into play to make the results more relevant, such as:
- Removing or giving more weight to images with faces based on the likelihood that the query is intended to find a person’s image
- Removing or giving more weight to images with specific landmarks or objects based on the likelihood that the query is intended to find them
- Giving more weight to higher quality images
A Deep Dive into the Image Search Engine
Now that we have a broad understanding of what the Image Search Engine does, we will dive deeper into the specifics of how it works.
To do so, we will cover the foundations by looking at the main patent behind Google’s Image Search Engine.
Main Patent Behind Google’s Image Search Engine
The main patent behind the Google Image Search Engine is a patent named “Ranking of images and image labels” by Yushi Jing and Shumeet Baluja.
|Name||Ranking of images and image labels|
|Inventor(s)||Yushi Jing, Shumeet Baluja|
Highlights of How the Image Search Engine Works
To summarize the entire article:
To rank well in Google Images, an image has to be high-quality and have relevant terms in its metadata and surrounding text. The image should also be similar enough to the other images for Google to know it is relevant, but not so similar that it adds no value.
- Terms are associated with images and used to be matched against the query
- Google uses TF-IDF to evaluate the relevancy of a term to an Image
- The more relevant terms are associated with images, the likelier the images are to be retrieved during search
- The more similar an image is to all the images ranked for a query, the better it will rank
- Conversely, the uniqueness of an image is calculated as part of the image quality score, so similarity and uniqueness have to be balanced
- Similarity is estimated using feature similarity and co-click data
- Not all image features weigh the same depending on the query
- The higher quality an image is, the better it will rank
- Small alterations of images may lead to duplicates being considered as unique
- Landmarks, objects and persons are often raised in patents, making these elements important to Google when analyzing images.
Core Components of the Image Search Engine
From the moment the user performs a search query to the moment when image search results are shown in the browser, many separate things happen.
Separate modules (or software components) within the image search engine are involved in the processing of the query, the processing of images and the ranking of search results.
As shown in the image below, the image search engine uses all these separate modules in its algorithm.
- Image Crawling Module
- Image Terms Module
- Image Features Module
- Image Similarity Module
- Probabilities Module
- Image Indexing Module
- Image Search Server
- Scoring and Ranking Module
- Image Repository / Index
What is the Image Crawling Module?
The image crawling module is the module from the Image Search Engine that crawls content hosts (web pages) for images and associated content. The image crawler downloads copies of the images and stores them for further processing by the image search engine.
What is the Image Terms Module and How Does it Work?
The Image Terms Module is the module from the Image Search Engine that extracts terms from metadata (e.g. filename, EXIF metadata, …) and content associated with an image and determines which terms may be associated with the image.
Terms associated with an image can also be referred to as “label” or “keyword”.
What is the Process Flow of Assigning Terms to Images?
The Image Terms Module receives images and evaluates their similarity and their likelihood of being co-selected. Then, an image score is given.
Below, we’ll explain how it works specifically.
Where do Terms Come From?
The image terms module extracts terms from:
- Metadata associated with an image.
- Content associated with the image
- User testing
What Information Can Be Found in the Metadata?
The metadata of the image can contain terms such as:
- Camera make and model,
- Focal length,
- Metering mode,
- ISO speed,
- File size,
- File creation date,
- Image resolution information
For more details on image metadata extraction read the “Automatic curation of digital images” patent.
What Information Can be in the Content Associated With an Image?
The content associated with the image can include information such as:
- text of the web page with which the image is associated,
- anchor text of hyperlinks to the image,
- text of a web page that the image links to (when the image is the anchor of a link to that page).
How Google can Gather Additional Information Through User Testing
Google can also gather terms for images through user testing. To do so, it shows images to users and asks them to provide terms for the given images.
How the Relevancy of a Term to an Image is Evaluated
Each term associated with an image is scored to evaluate its relevancy. To do so, Google uses term frequency-inverse document frequency (TF-IDF) to calculate whether the term is relevant to an image or generic to all other images.
If a term does not satisfy a threshold, it is removed.
For instance, words like “the”, “that” and “a” are very common across all documents, so by definition they are not specific to a single document. Thus, their score is low, they do not satisfy the threshold and they are removed.
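Here is a small Python sketch of that idea. It uses a basic TF-IDF variant (term frequency multiplied by the log of the inverse document frequency). The exact formula and threshold used by Google are not given here, so the function names, the sample terms and the threshold value of 0.1 are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf_scores(image_terms):
    """Score every term of every image with a basic TF-IDF formula.

    image_terms maps an image id to the list of terms collected for it.
    """
    n_images = len(image_terms)
    # Document frequency: in how many images does each term appear?
    doc_freq = Counter()
    for terms in image_terms.values():
        doc_freq.update(set(terms))

    scores = {}
    for image_id, terms in image_terms.items():
        counts = Counter(terms)
        scores[image_id] = {
            term: (count / len(terms)) * math.log(n_images / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def keep_relevant_terms(scores, threshold):
    """Drop the terms whose score does not satisfy the threshold."""
    return {
        image_id: {t: s for t, s in term_scores.items() if s >= threshold}
        for image_id, term_scores in scores.items()
    }


# Hypothetical terms extracted for three images.
image_terms = {
    "img1": ["the", "red", "rose"],
    "img2": ["the", "blue", "car"],
    "img3": ["the", "red", "car"],
}
scores = tf_idf_scores(image_terms)
kept = keep_relevant_terms(scores, threshold=0.1)  # assumed threshold
```

Because “the” appears in every image, its inverse document frequency is log(1) = 0, so its score is 0 and it is filtered out. A term like “rose”, which appears in a single image, keeps the highest score.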
What is the Image Features Module and How Does it Work?
The Image Feature Module is the module from the Image Search Engine that identifies the features of the images.
The image features are the features that Google can use to determine the degree of similarity between images.
Examples of image features include intensity, color, edges and texture.
The features can be extracted using Wavelet based techniques (e.g. wavelet transform).
The image features module can also assign a weight to each feature.
For example, images of flowers may require more weight on the color feature whereas images of tables may require more weight on the shape feature.
Once this is done, it sends the features to the Image Similarity Module.
What is the Image Similarity Module and How Does it Work?
Google’s image Similarity Module is the module from the Image Search Engine that compares the features between pairs of images to determine the degree of similarity between the images.
High-level view of the Image Similarity Module
How it works at a high level is by comparing features (color, shape, etc.) between pairs of images.
What Features Can Be Analyzed?
Images can be analyzed in different ways.
Entire images can be analyzed, or alternatively, only specific elements of images may be compared (e.g. face, landmark, object).
Regardless, features are determined and compared.
In each case, a feature can be either a:
- Global feature: Analysis of a feature for the entire image
- Local feature: Analysis of a feature for each portion of an image
Detailed view of the Image Similarity Module
Specifically, the image similarity module identifies objects of interest and splits an image into sections of geometric shapes (e.g. squares, circles, …).
The identification of objects of interest is done with the registration algorithm, and the process of splitting the image and identifying frequencies in it is called wavelet transform.
Selecting the Objects of Interest
To save computing resources, Google may identify objects of interest and compute the feature analysis only for that specific part of the image.
Alternatively, Google may also use the SIFT technique to identify objects of interest.
What feature should be used to compare images?
Some features, such as shape, are more appropriate for photos containing products, whereas color is more appropriate for photos of flowers.
How does Google choose which features are the most important for describing a set of images?
Google uses the scale-invariant feature transform (SIFT) technique that identifies the features that are the most resistant to image noise.
Splitting the Image in Sections
Using wavelet-based techniques, Google splits images into sections and evaluates one section of the image at a time.
Then, one feature after the other, it evaluates the feature for each section of the image and creates a histogram with the data.
Afterwards, it compares the histograms to see how similar the images are.
Defining Similarity by Comparing Histograms
Many techniques can be used to compare color histograms.
The technique mentioned in the patent consists of looping through each bin and adding the absolute difference between the selected bin of one image and the other.
The image similarity module is discussed in more detail in a separate patent, which I covered in How Google Selects the Most Representative Image of a Set.
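The bin-by-bin comparison described above can be sketched in a few lines of Python. This is only an illustration: the function name `histogram_distance` and the 4-bin histograms are made up, and real histograms would have many more bins.

```python
def histogram_distance(hist_a, hist_b):
    """Sum of absolute differences between matching bins of two histograms.

    A distance of 0 means the histograms are identical; the larger the
    value, the more the two images differ for that feature.
    """
    if len(hist_a) != len(hist_b):
        raise ValueError("Both histograms need the same number of bins")
    return sum(abs(a - b) for a, b in zip(hist_a, hist_b))


# Hypothetical 4-bin color histograms for two images.
image_a = [4, 2, 0, 2]
image_b = [3, 3, 1, 1]
distance = histogram_distance(image_a, image_b)  # 1 + 1 + 1 + 1 = 4
```

Note that this returns a distance, not a similarity: a lower value means that the images are more alike.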
Defining Similarity by Comparing Visual Words
An alternative to histograms is mentioned in the “Near duplicate images” and “Image retrieval” patents.
Although it works differently, the main idea is similar (and it is potentially more efficient in computing resources).
It creates feature regions by drawing geometrical shapes (e.g. ellipses), the same way as we described earlier. Then, it computes feature vectors for each shape. Finally, it assigns a visual word to the feature region.
Then, it determines similarity by counting how many visual words images have in common.
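As a rough illustration, counting shared visual words boils down to a set intersection. In the Python sketch below, visual words are represented as integer ids, which is an assumption made for this example, as are the function name and the sample data.

```python
def shared_visual_words(words_a, words_b):
    """Count how many distinct visual words two images have in common."""
    return len(set(words_a) & set(words_b))


# Hypothetical visual word ids assigned to the feature regions of two images.
image_a_words = [12, 40, 77, 103]
image_b_words = [40, 77, 250]
common = shared_visual_words(image_a_words, image_b_words)  # 40 and 77
```

The higher the count, the more similar the two images are considered to be.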
Defining Similarity based on Image Location
Similarity of images can also be compared based on the relationship between the pages or locations from which they were crawled.
For example, images from the same website can be considered more similar than images found in pages of independent websites.
Combining Image Similarity Module and Image Features Module
Once both computations are made by the image similarity module and the image features module, an algorithm combines both elements to get an image similarity score.
What is the Probabilities Module?
The probabilities module is the module from Google’s Image Search Engine that calculates the transitional probabilities between pairs of images.
The transitional probability, also known as the probability of navigation, is the probability that a user will select an image given another image.
How is the Transitional Probability Calculated?
The transitional probability, also known as co-click probability, is calculated using the image term ranking function.
The image term ranking function is a probability distribution that represents the likelihood that a user will select image B after viewing Image A.
In other words, how likely is it that a user will co-select, or co-click, on two images?
The probability is based on two assumptions:
- Users are more likely to co-select similar images than dissimilar images
- Users are likely to ignore images that are too similar.
We see that pattern in the probability distribution where both highly different images (Image similarity = 0) and identical images (Image similarity = 1) have very low transitional probability.
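To visualize that pattern, here is a toy Python sketch. The actual distribution used in the patent is not reproduced in this article, so the hump-shaped function `4 * s * (1 - s)` below is purely an assumption chosen because it equals 0 at both extremes and peaks in between. The function names and sample values are hypothetical too.

```python
def co_click_weight(similarity):
    """Toy weight that is 0 for similarity 0 and 1, and highest in between.

    This specific formula is an assumption for illustration, not the
    function described in the patent.
    """
    if not 0.0 <= similarity <= 1.0:
        raise ValueError("similarity must be between 0 and 1")
    return 4.0 * similarity * (1.0 - similarity)


def transition_probabilities(similarities):
    """Normalize the weights of candidate images so that they sum to 1."""
    weights = {img: co_click_weight(s) for img, s in similarities.items()}
    total = sum(weights.values())
    if total == 0:
        return {img: 0.0 for img in weights}
    return {img: w / total for img, w in weights.items()}


# Hypothetical similarity of images B, C and D to image A.
similarity_to_a = {"B": 0.5, "C": 0.9, "D": 0.1}
probabilities = transition_probabilities(similarity_to_a)
```

With these made-up values, image B (moderately similar to A) gets the highest probability, while the near-identical image C and the very different image D both get a lower one.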
How Can the Transitional Probability Be Enhanced?
The transitional probability, or co-click probability, function can be improved for the image relationships to be evaluated more accurately.
For example, it can be augmented with:
- Usage data from each user, or many users
- Image quality score
- Structural-context similarity score
- Click and co-click data estimation of likelihood of an image being viewed before or after another
What is Google’s Image Indexing Module?
Google’s image indexing module, or indexing engine, is the module from the Image Search Engine that indexes images to be displayed to the user.
It assigns an index along with terms associated with each image in a way that can easily be searched by the image search server.
In more detail, the indexing engine maintains multiple posting lists in the index database.
Each posting list is a list of images in the image collection (image repository) that have the same visual word.
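A posting list structure like this can be sketched with a plain Python dictionary. The function name `build_posting_lists`, the integer visual word ids and the image ids are assumptions for illustration; the real index database is obviously far more elaborate.

```python
from collections import defaultdict


def build_posting_lists(image_words):
    """Map each visual word to the list of images that contain it.

    image_words maps an image id to the visual words found in that image.
    """
    posting_lists = defaultdict(list)
    for image_id in sorted(image_words):
        # set() makes sure an image is listed once per visual word.
        for word in set(image_words[image_id]):
            posting_lists[word].append(image_id)
    return dict(posting_lists)


# Hypothetical visual word ids found in three indexed images.
image_words = {
    "img1": [12, 40],
    "img2": [40, 77],
    "img3": [40],
}
posting_lists = build_posting_lists(image_words)
```

Here, the posting list of visual word 40 contains all three images, while the posting list of visual word 12 contains only "img1".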
What is the Image Search Server and What Does it Do?
Google’s image search server, also known as the search component, is the part of the image search engine that receives the query, performs the search and returns the results to the user.
In some simplified way, discarding all the additional processing we just described, the image search server is the image search engine.
The image search server contains the Image Search Module.
What the image search server does:
- Receives a query
- Processes search queries for images
- Finds the images in the index that contain the terms
- Returns a ranked list of the search results
- Presents the search results to the user in a webpage that contains: image thumbnails, information on the result
Here is the image information that may be displayed on the image search results webpage:
- File size,
- File type,
- Image dimensions,
- Domain hosting the image.
What is the Scoring and Ranking Module?
The scoring and ranking module is the module from the image search engine that provides scores used at various stages of the pipeline to improve the ranking of images.
The scoring and ranking module contains two groups of algorithms:
- The scoring algorithm provides scores for each term related to an image.
- The ranking algorithm provides scores for each image related to a search term.
Sometimes, the scoring algorithm and the ranking algorithm are separated into different submodules, and at other times they are merged together.
The image search engine uses terms to index images. Ergo, the relevance of a term to a document impacts the quality of the search results. Consequently, Google provides rankings for terms associated with images.
Thus, the module provides scores for each of the terms associated with images.
In cases where a query image is used as a search query, the ranking module can also identify indexed images that go beyond the threshold of common visual words with the image query.
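The query image case can be sketched as a lookup in posting lists of visual words, followed by a threshold on the number of words in common. In the Python sketch below, the function name `match_query_image`, the sample data and the choice of treating the threshold as "at least N words in common" are all assumptions for illustration.

```python
from collections import Counter


def match_query_image(query_words, posting_lists, min_common_words):
    """Return the indexed images sharing enough visual words with the query.

    posting_lists maps a visual word to the images that contain it.
    Results are sorted by number of shared words, then by image id.
    """
    common = Counter()
    for word in set(query_words):
        for image_id in posting_lists.get(word, []):
            common[image_id] += 1
    matches = [img for img, n in common.items() if n >= min_common_words]
    return sorted(matches, key=lambda img: (-common[img], img))


# Hypothetical posting lists and visual words of a query image.
posting_lists = {
    12: ["img1"],
    40: ["img1", "img2", "img3"],
    77: ["img2"],
}
query_words = [40, 77, 99]
strict_matches = match_query_image(query_words, posting_lists, min_common_words=2)
loose_matches = match_query_image(query_words, posting_lists, min_common_words=1)
```

With a threshold of 2, only "img2" is returned because it shares the visual words 40 and 77 with the query image. Only the posting lists of the query's visual words are read, so there is no need to compare the query against every image of the index.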
How the Scores can be Used
Scores can be used for different purposes, at different stages, by different modules.
Those scores can be used to:
- Identify the best term for an image
- Identify good terms describing an image to be used as keywords for the image indexing module
- Augment the transitional probability function to learn from image relationships
- Provide the most relevant image(s) for a given term
How Are Ranks Calculated?
The scoring and ranking module also computes the ranking function to determine the score of an image for a specific term.
The ranks are scored based on probability-adjusted scores for other images of the image index. These scores are also weighted based on the probability of navigating between images.
In a nutshell, the ranking of an image is based not only on its similarity with other images, but also on the probability of users clicking on that image after seeing other images in the set.
The ranking algorithm essentially divides the number of times a query is associated with an image by the number of times the query appears in all the images of the index.
This simplified view helps to illustrate the process.
The algorithm however is more complex.
Once the values are initialized, they are multiplied with the transition probabilities and the resulting values are used in the next iteration until convergence. Results can also be improved using TF-IDF to remove noisy terms.
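To give a feel for that iterative process, here is a simplified Python sketch. It initializes each score as the share of the term's occurrences that belongs to each image, then repeatedly multiplies the scores with the transition probabilities until they stop changing. This is an approximation under stated assumptions, not the patent's exact functions: the function name `rank_images`, the counts and the transition probabilities are all made up.

```python
def rank_images(term_counts, transition, max_iterations=200, tolerance=1e-9):
    """Iteratively score images for one term.

    term_counts[a]: number of times the term is associated with image a.
    transition[a][b]: probability of navigating from image a to image b
    (each row is assumed to sum to 1).
    """
    images = list(term_counts)
    total = sum(term_counts.values())
    # Initial score: the image's share of all occurrences of the term.
    scores = {img: term_counts[img] / total for img in images}

    for _ in range(max_iterations):
        # The new score of b is the probability-weighted sum of the scores
        # of the images from which a user can navigate to b.
        new_scores = {
            b: sum(scores[a] * transition[a].get(b, 0.0) for a in images)
            for b in images
        }
        change = sum(abs(new_scores[img] - scores[img]) for img in images)
        scores = new_scores
        if change < tolerance:  # convergence
            break
    return scores


# Hypothetical counts and transition probabilities for three images.
term_counts = {"a": 3, "b": 1, "c": 1}
transition = {
    "a": {"b": 0.9, "c": 0.1},
    "b": {"a": 0.5, "c": 0.5},
    "c": {"a": 0.2, "b": 0.8},
}
final_scores = rank_images(term_counts, transition)
best_image = max(final_scores, key=final_scores.get)
```

In this toy example, image "a" starts with the highest score because the term is associated with it most often, but image "b" ends up on top because users are very likely to navigate to it from both other images.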
Details of the functions are available in the patent and will not be discussed here.
What is the Image Repository / Index
The image repository, also known as the image collection, is the database that stores known images and their information.
The image index is the database that stores the indices of images in the image repository in a way that can be “searched” by the image search engine.
The image index receives images from the image crawler and then stores the images along with an index and terms associated with the image.
A copy of an image does not need to be stored to be in the index.
How Is the Image Quality Score Estimated?
The Image quality score can be estimated by the image quality control module of the image search engine by using one or more query-independent indicators:
- Color depth of the image
- How much the image is in focus
- Level of saturation of the image
The image quality is scored between 0 and 1.
More details of how the image quality control module works can be found by reading the “Automatic curation of digital images” patent.
How Duplicate Images Can be Identified
Showing duplicate images is rarely useful to the user. Thus, Google tries to identify duplicate, or near-duplicate, images in many ways:
- Images with identical EXIF metadata
- Images with identical, or too similar, calculated image features
Based on another patent named “Image retrieval”, the image search system clusters near-duplicate images into collections and indexes each collection instead of indexing each image individually.
According to that same patent, images taken at different times by different cameras, with different compression, different cropping, can still be considered near-duplicate images.
Re-Ranking Image Search Results
We have seen in the Image Retrieval patent how Google provides an initial set of image search results for a query.
Now, in a patent named “Selection of an Image or Images Most Representative of a Set of Images”, the Google engineer mentions the refinement component as the software that re-ranks image search results.
In a later patent named “Query Categorization Based on Image Results”, a “rank modifying engine” is mentioned. It is unclear whether they are the same or separate pieces of software.
Regardless of the name, re-ranking is done based on the features of the images, as described extensively in the “Ranking of images and image labels” patent, and on the query categorization module mentioned in the “Query Categorization Based on Image Results” patent.
How Does the Google Image Search Engine Work?
- The image indexing component receives and parses the HTML of a document
The web crawler provides documents to the indexing component, which parses the documents for text and images. It then stores images, links to images and descriptive text in the database.
- The Search Component receives queries, searches the database and returns results
The generation of results is based on a comparison of the search query to the descriptive text associated with images.
- The Refinement Component ranks the results based on the features of the images
The ranking is made so that the images that are most representative of the initial image search results are ranked above lower-quality ones.
Patents Related to the Image Search Engine
The most complete patent regarding the image search engine is named “Ranking of images and image labels” and is the basis for most of the content you will read in this article.
- Ranking of images and image labels
- Clustering Queries for Image Search
- How Google Displays Images Within Web Search Results?
- How Google Selects the Most Representative Image of a Set
- Query Categorization Based on Image Results
- Near duplicate images
- Image retrieval
- Mapping images to search queries (covered by Olaf Kopp)
What Categories is the Patent About?
- Image SEO
|Search System||System that indexes content, receives query, ranks content and returns results for the query.|
|Image Search Query||Search query for which the intent is to get a particular image content.|
|Terms||Part of the query that can include words, phrases, characters and so forth.|
|Image Features||Features that can be used to identify similarities between images. Examples of features are intensity, color, edges, texture, etc.|
|Image term ranking function||Formula that evaluates the relevance of a term to an image|
|Image quality score||Metric defining the quality of images within a set of images|
|Structural-context similarity score||Metric evaluating the similarity of images based on the level of dependence of the websites where the images were found.|
|Wavelet transform||Evaluation of frequency of elements of an image that was split in geometrical shapes.|
|Transition probability||Probability of a user clicking on an image after seeing another|
|Posting List||Sequence of document identifiers associated with a term (posting)|
Google Search Infrastructure Involved
The “Ranking of Images and Image Labels” patent mentions these elements from the Google Search Infrastructure:
- Image Search System
- Image Search Engine
- Image Crawling Module
- Image Terms Module
- Image Features Module
- Image Similarity Module
- Probabilities Module
- Scoring and Ranking Module
- Image Indexing Module
- Image Search Server
- Image search module
- Image Repository / Index
That was a lot.
Google Image search results are backed by multiple components and algorithms to make sure that images are relevant to a query and that images are high-quality and informative to the user.
We have learned that when Google crawls images, it gathers terms from the image context and stores those terms in the indexing module to enable retrieval.
We have also learned how the features module and the similarity module work together with the probabilities module to identify how similar images are and how likely users are to navigate between images.
All these elements are used to score terms and rank images when the image search server is requested by the user to display images for a given query.
We now know how the image search engine works and how Google ranks images.
SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.