Google’s Encoding and Decoding Systems (What They Are and How They Work)

In this series on learning SEO with Google patents, I will explain what the Encoding and Decoding systems are, how they work, and how Google may use them in its infrastructure to provide search results.

This tutorial is part of a series on learning information retrieval and learning SEO using Google patents, specifically related to the article on “Multi-stage query processing system and method for use with tokenspace repository”.

Figure: Google Encoding/Decoding Systems

What are the Encoding and Decoding Systems?

The encoding and decoding systems are the systems used in Google’s information retrieval system for encoding and decoding data to conserve space.


They can use one of several encoding algorithms to encode tokens, token attributes, or inverted indexes.

  • The encoding system is the system used for encoding documents for a tokenspace repository.
  • The decoding system is the system used for decoding documents from a tokenspace repository.

How Do the Encoding and Decoding Systems Work?

The encoding and decoding systems work by encoding data into a compressed format that uses fewer bytes than would ordinarily be needed to store it. They then generate mappings that the decoding system can use to unpack and decompress the data values. The systems may also receive mappings from other components, for example from the lexicon generator, for encoding and decoding.
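To make the idea of a mapping concrete, here is a minimal Python sketch. It assumes a lexicon that assigns small integer IDs to tokens, with the most frequent tokens getting the smallest IDs. The function names (`build_lexicon`, `encode_tokens`, `decode_tokens`) and the frequency-based ordering are my own illustration, not the implementation described in the patent.

```python
from collections import Counter


def build_lexicon(tokens):
    """Build a token <-> ID mapping (hypothetical helper, for illustration only).

    Assumption: frequent tokens get the smallest IDs, so that a later
    variable-length encoder can store them in fewer bytes.
    """
    counts = Counter(tokens)
    # Sort by descending frequency, then alphabetically so the result is deterministic.
    ordered = sorted(counts, key=lambda tok: (-counts[tok], tok))
    token_to_id = {tok: i for i, tok in enumerate(ordered)}
    id_to_token = ordered  # list index == token ID
    return token_to_id, id_to_token


def encode_tokens(tokens, token_to_id):
    """Replace each token with its integer ID."""
    return [token_to_id[tok] for tok in tokens]


def decode_tokens(ids, id_to_token):
    """Recover the original tokens from their IDs using the same mapping."""
    return [id_to_token[i] for i in ids]


tokens = "the cat sat on the mat the end".split()
token_to_id, id_to_token = build_lexicon(tokens)
ids = encode_tokens(tokens, token_to_id)
print(ids)                              # [0, 1, 5, 4, 0, 3, 0, 2]
print(decode_tokens(ids, id_to_token))  # the original token list
```

The point of the sketch is that the decoder needs exactly the same mapping the encoder used, which is why the mappings have to be shared between the two systems.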

Encoding and Compression Algorithms

Depending on the task and the data to encode, the encoding system can use one of several compression and encoding algorithms, such as unary, gamma, variable byte, or PForDelta:

Figure: Inverted index compression algorithms, from “Scalability Challenges in Web Search Engines”
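As a rough illustration of two of the simpler schemes named above, here is a Python sketch of unary and Elias gamma coding. It works on strings of "0" and "1" for readability rather than on packed bits, and it picks one common convention for each code (other texts flip the roles of 0 and 1). It is not taken from the patent.

```python
def unary_encode(n: int) -> str:
    """Unary code: (n - 1) one-bits followed by a terminating zero-bit."""
    if n < 1:
        raise ValueError("unary coding is defined here for n >= 1")
    return "1" * (n - 1) + "0"


def unary_decode(bits: str) -> int:
    """Count the one-bits before the first zero-bit."""
    return bits.index("0") + 1


def gamma_encode(n: int) -> str:
    """Elias gamma code: (len - 1) zero-bits, then n in binary."""
    if n < 1:
        raise ValueError("gamma coding is defined for n >= 1")
    binary = bin(n)[2:]  # binary digits without the '0b' prefix
    return "0" * (len(binary) - 1) + binary


def gamma_decode(bits: str) -> int:
    """Read the zero-bit prefix to learn the length, then parse the binary part."""
    zeros = 0
    while bits[zeros] == "0":
        zeros += 1
    return int(bits[zeros:2 * zeros + 1], 2)


print(unary_encode(4))   # 1110
print(gamma_encode(9))   # 0001001
print(gamma_decode("0001001"))  # 9
```

Unary is only efficient for very small numbers, while gamma grows with the logarithm of the value, which is why both tend to be applied to small gaps rather than raw document IDs.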

According to the multi-stage query processing patent, the encoding system uses the variable-byte (also called variable-length) data encoding algorithm.

Variable Length Data Encoding

The variable-length, or variable-byte, data encoding algorithm is used to minimize the number of bytes needed to represent the data.

Rather than spend too much time explaining variable-length data encoding here, I suggest watching a video walkthrough of the concept described in the patent.
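For readers who prefer code, here is a minimal Python sketch of one common variable-byte layout: 7 data bits per byte, least significant group first, with the high bit set on every byte except the last. This is the layout used by base-128 varints; the exact byte layout in the patent may differ, and the function names are mine.

```python
def vb_encode(n: int) -> bytes:
    """Encode a non-negative integer using 7 data bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low7 = n & 0x7F          # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)         # high bit clear: this is the last byte
            return bytes(out)


def vb_decode(data: bytes) -> int:
    """Decode a single integer produced by vb_encode."""
    result = 0
    shift = 0
    for byte in data:
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:      # last byte reached
            return result
        shift += 7
    raise ValueError("truncated input: no terminating byte found")


print(vb_encode(5).hex())     # 05   (1 byte)
print(vb_encode(300).hex())   # ac02 (2 bytes)
print(vb_decode(vb_encode(300)))  # 300
```

Small values fit in a single byte instead of a fixed 4 or 8, which is where the space saving comes from.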

Attribute Encoding and Decoding Systems

Attribute encoding and decoding systems are systems that use the variable-length encoding algorithm to encode attributes of tokens from a document.

Token attributes can be things like:

  • Font attributes: bold, underline, etc.
  • Document position attributes: title, heading, etc.
  • Metadata
  • Features used to distinguish between tokens in a document
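One way to picture attribute encoding is to pack a token's attributes into a single small integer, one bit per attribute. The sketch below is purely hypothetical: the `TokenAttr` flags, their bit positions, and the helper names are my own assumptions, and the patent does not specify this layout.

```python
from enum import IntFlag


class TokenAttr(IntFlag):
    """Hypothetical attribute flags, one bit each (not from the patent)."""
    BOLD = 1
    UNDERLINE = 2
    TITLE = 4
    HEADING = 8


def pack_attrs(*attrs: TokenAttr) -> int:
    """Combine any number of attribute flags into one integer."""
    packed = TokenAttr(0)
    for attr in attrs:
        packed |= attr
    return int(packed)


def unpack_attrs(value: int) -> set:
    """Return the names of all attribute flags set in the packed integer."""
    return {attr.name for attr in TokenAttr if attr & value}


packed = pack_attrs(TokenAttr.BOLD, TokenAttr.TITLE)
print(packed)                 # 5
print(unpack_attrs(packed))   # a set containing 'BOLD' and 'TITLE'
```

A packed value like this is a small integer, so it is a natural input for a variable-length encoder: tokens with no attributes or only a few low-numbered ones would cost a single byte.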

Which Patents Mention the Encoding and Decoding Systems?

Google Parent Infrastructure Involved

Where Do the Encoding and Decoding Systems Fit In?

Google Children Infrastructure Involved

  • Encoding System
    • Pre-processor
    • Delta encoder
    • Variable-length data encoder
  • Decoding System
    • Delta decoder
    • Variable-length data decoder
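The components listed above suggest a pipeline in which a delta encoder feeds a variable-length encoder, and the decoding side reverses the two steps. Here is a minimal Python sketch of that idea under my own assumptions: the input is a sorted list of integers (for example positions or document IDs), the pre-processor is left out, and the byte layout is the common 7-bits-per-byte varint. None of the function names come from the patent.

```python
def delta_encode(sorted_values):
    """Replace each value with its gap from the previous one (the first gap is from 0)."""
    gaps, prev = [], 0
    for value in sorted_values:
        gaps.append(value - prev)
        prev = value
    return gaps


def delta_decode(gaps):
    """Rebuild the original values with a running sum of the gaps."""
    values, total = [], 0
    for gap in gaps:
        total += gap
        values.append(total)
    return values


def varint_encode_all(values) -> bytes:
    """Variable-length encode a sequence of non-negative integers."""
    out = bytearray()
    for n in values:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)  # high bit set: more bytes follow
            n >>= 7
        out.append(n)                      # final byte, high bit clear
    return bytes(out)


def varint_decode_all(data: bytes):
    """Decode every integer in a byte string produced by varint_encode_all."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


def compress(postings) -> bytes:
    return varint_encode_all(delta_encode(postings))


def decompress(data: bytes):
    return delta_decode(varint_decode_all(data))


postings = [1000, 1003, 1010, 1200]
blob = compress(postings)
print(delta_encode(postings))  # [1000, 3, 7, 190]
print(len(blob))               # 6 bytes, versus 8 without the delta step
print(decompress(blob))        # [1000, 1003, 1010, 1200]
```

Delta encoding turns large, closely spaced numbers into small gaps, and small gaps are exactly what a variable-length encoder stores most cheaply, so the two steps reinforce each other.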

Encoding and Decoding Systems Are Related To

The Encoding and Decoding Systems are often used in conjunction with the tokenspace repository, the document processing system and the query processing system.

Definitions

Patent term | Definition
Encoding and decoding systems | Systems used in Google’s information retrieval system for encoding and decoding variable-length data to conserve space and increase throughput
Encoding system | System used in Google’s information retrieval system for encoding documents for a tokenspace repository
Decoding system | System used in Google’s information retrieval system for decoding documents from a tokenspace repository
Tokenspace repository | Tokenized collection of documents
Document processing system | System that prepares the inverted index from documents, pages or sites retrieved from the document corpus. It processes the contents using an automated or manual process
Token attributes | Labels that capture the importance or location of a token, or other features that help distinguish between tokens in a document (e.g. fonts, page location, etc.)