In this series on learning SEO with Google patents, I will explain what the encoding and decoding systems are, how they work, and how Google may use them in its infrastructure to provide search results.
This tutorial is part of a series on learning information retrieval and learning SEO using Google patents, specifically related to the article on “Multi-stage query processing system and method for use with tokenspace repository”.
What are the Encoding and Decoding Systems?
The encoding and decoding systems are the systems used in Google’s information retrieval system for encoding and decoding data to conserve space.
They can use one of several encoding algorithms to encode tokens, token attributes, or inverted indexes.
- The encoding system is the system used for encoding documents for a tokenspace repository.
- The decoding system is the system used for decoding documents from a tokenspace repository.
How Do the Encoding and Decoding Systems Work?
The encoding and decoding systems work by encoding data into a compressed format that uses fewer bytes than would ordinarily be needed to store it. They also generate mappings that the decoding system can use to unpack and decompress the data values. For example, they may receive mappings from the lexicon generator for encoding and decoding.
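To make the idea of a mapping more concrete, here is a minimal Python sketch. It is a toy stand-in for the kind of mapping a lexicon generator might supply, not the patent's actual design. The function names (`build_lexicon`, `encode_tokens`, `decode_tokens`) are my own assumptions.

```python
def build_lexicon(tokens):
    """Assign each distinct token a small integer ID, in first-seen order."""
    token_to_id = {}
    for token in tokens:
        if token not in token_to_id:
            token_to_id[token] = len(token_to_id)
    return token_to_id


def encode_tokens(tokens, token_to_id):
    """Replace every token with its integer ID."""
    return [token_to_id[token] for token in tokens]


def decode_tokens(ids, token_to_id):
    """Invert the mapping to recover the original tokens."""
    id_to_token = {token_id: token for token, token_id in token_to_id.items()}
    return [id_to_token[token_id] for token_id in ids]


document = "the cat sat on the mat".split()
lexicon = build_lexicon(document)
encoded = encode_tokens(document, lexicon)
decoded = decode_tokens(encoded, lexicon)

print(encoded)  # [0, 1, 2, 3, 0, 4]
print(decoded == document)  # True
```

The repeated token "the" is stored as the same small integer both times, and the decoder only needs the same mapping to get the text back.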
Encoding and Compression Algorithms
Depending on the task and the data to encode, the encoding system can use one of several compression and encoding algorithms, such as unary, gamma, variable byte, or PForDelta.
According to the multi-stage query processing patent, the encoding system uses the variable byte, or variable-length data encoding algorithm.
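As a rough illustration of the unary and gamma codes mentioned above, here is a short Python sketch. It follows one common textbook convention: unary is n ones followed by a zero, and gamma is the unary-coded length of the offset followed by the offset itself. Other sources write the bits differently, and nothing here is taken from the patent. The function names are my own.

```python
def unary_encode(n):
    """Unary code for n >= 0: n ones followed by a terminating zero."""
    if n < 0:
        raise ValueError("unary code needs a non-negative integer")
    return "1" * n + "0"


def gamma_encode(n):
    """Gamma code for n >= 1: unary(length of offset) followed by the offset.

    The offset is the binary form of n with its leading 1 removed.
    """
    if n < 1:
        raise ValueError("gamma code needs a positive integer")
    offset = bin(n)[3:]  # bin(13) == '0b1101', so the offset is '101'
    return unary_encode(len(offset)) + offset


def gamma_decode(bits):
    """Decode a single gamma-coded value from a bit string."""
    length = bits.index("0")  # count of leading ones = offset length
    offset = bits[length + 1 : length + 1 + length]
    return int("1" + offset, 2)


print(unary_encode(3))   # 1110
print(gamma_encode(1))   # 0
print(gamma_encode(13))  # 1110101
print(gamma_decode("1110101"))  # 13
```

Both codes are bit-level, so small numbers can take less than a byte. Variable byte encoding trades some of that compactness for byte-aligned reads.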
Variable Length Data Encoding
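The variable-length, or variable byte, data encoding algorithm is used to minimize the number of bytes needed to represent the data. Small values are stored in a single byte, and larger values take additional bytes only when they need them.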
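Here is a minimal Python sketch of one widely used variable byte layout: 7 data bits per byte, with the high bit set when more bytes follow. I am assuming this layout for illustration. The exact byte format described in the patent may differ, and the function names are my own.

```python
def vb_encode(n):
    """Encode one non-negative integer using 7 data bits per byte."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    out = bytearray()
    while True:
        low_bits = n & 0x7F  # take the lowest 7 bits
        n >>= 7
        if n:
            out.append(low_bits | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low_bits)  # high bit clear: last byte of this value
            return bytes(out)


def vb_decode(data):
    """Decode a byte stream of concatenated values back into integers."""
    values = []
    current = 0
    shift = 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7  # value continues in the next byte
        else:
            values.append(current)
            current = 0
            shift = 0
    return values


numbers = [5, 127, 128, 300, 70000]
stream = b"".join(vb_encode(n) for n in numbers)

print(len(stream))  # 9 bytes, versus 20 bytes as five 4-byte integers
print(vb_decode(stream) == numbers)  # True
```

Values below 128 fit in one byte, which is why this approach pays off when most of the numbers being stored are small.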
Rather than explain variable-length data encoding in depth here, I suggest watching this video, which covers the concept described in the patent.
Attribute Encoding and Decoding Systems
Attribute encoding and decoding systems use the variable-length encoding algorithm to encode and decode the attributes of tokens in a document.
Token attributes can be things like:
- Font attributes: bold, underline, etc.
- Document position attributes: title, heading, etc.
- Metadata
- Features used to distinguish between tokens in a document
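To show how attributes like these could be stored compactly, here is a small Python sketch that packs them into a single integer using bit flags. The flag names and bit positions are hypothetical and are not taken from the patent. The point is only that a handful of attributes can fit in a small number, which a variable-length encoder can then store in one byte.

```python
from enum import IntFlag


class TokenAttr(IntFlag):
    """Hypothetical token attributes, one bit each."""
    BOLD = 1
    UNDERLINE = 2
    IN_TITLE = 4
    IN_HEADING = 8


def pack_attributes(*attrs):
    """Combine any number of attributes into one small integer."""
    packed = 0
    for attr in attrs:
        packed |= int(attr)
    return packed


def unpack_attributes(value):
    """List the names of the attributes whose bits are set in value."""
    return [attr.name for attr in TokenAttr if value & int(attr)]


packed = pack_attributes(TokenAttr.BOLD, TokenAttr.IN_TITLE)

print(packed)  # 5
print(unpack_attributes(packed))  # ['BOLD', 'IN_TITLE']
```

With four flags, every combination is a value below 16, so each token's attributes would need only a single byte in a 7-bit variable byte scheme.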
Which Patents Mention the Encoding and Decoding Systems?
- System and method for encoding and decoding variable-length data
- Multi-stage query processing system and method for use with tokenspace repository
Google Parent Infrastructure Involved
Where do the Encoding and Decoding Systems fit in?
- Information Retrieval System
- Document Processing System
- Encoding System
- Query Processing System
- Decoding System
- Document Processing System
Google Children Infrastructure Involved
- Encoding System
  - Pre-processor
  - Delta encoder
  - Variable-length data encoder
- Decoding System
  - Delta decoder
  - Variable-length data decoder
The encoding and decoding systems are often used in conjunction with the tokenspace repository, the document processing system and the query processing system.
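As a sketch of how a delta encoder and a variable-length data encoder might be chained, the Python below turns a sorted list of positions into gaps and then stores the gaps with a 7-bit variable byte scheme. This is an assumed pipeline for illustration only. The function names, the sample numbers, and the byte layout are mine, not the patent's.

```python
def delta_encode(sorted_values):
    """Replace each value with its gap from the previous value."""
    gaps = []
    previous = 0
    for value in sorted_values:
        gaps.append(value - previous)
        previous = value
    return gaps


def delta_decode(gaps):
    """Rebuild the original values with a running sum of the gaps."""
    values = []
    total = 0
    for gap in gaps:
        total += gap
        values.append(total)
    return values


def varint_encode(values):
    """Encode integers with 7 data bits per byte (high bit = more follows)."""
    out = bytearray()
    for n in values:
        while n >= 0x80:
            out.append((n & 0x7F) | 0x80)
            n >>= 7
        out.append(n)
    return bytes(out)


def varint_decode(data):
    """Decode the byte stream produced by varint_encode."""
    values, current, shift = [], 0, 0
    for byte in data:
        current |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7
        else:
            values.append(current)
            current, shift = 0, 0
    return values


positions = [1000, 1003, 1010, 1200, 1201]
gaps = delta_encode(positions)

print(gaps)  # [1000, 3, 7, 190, 1]
print(len(varint_encode(positions)))  # 10 bytes without the delta step
print(len(varint_encode(gaps)))  # 7 bytes with the delta step
print(delta_decode(varint_decode(varint_encode(gaps))) == positions)  # True
```

The delta step makes the numbers smaller, and the variable-length step is what turns smaller numbers into fewer bytes. The decoding side simply runs the two steps in reverse order.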
Definitions
Patent term | Definition |
---|---|
Encoding and decoding systems | Systems used in Google’s information retrieval system for encoding and decoding variable-length data to conserve space and increase throughput |
Encoding system | System used in Google’s information retrieval system for encoding documents for a tokenspace repository |
Decoding system | System used in Google’s information retrieval system for decoding documents from a tokenspace repository |
Tokenspace repository | Tokenized collection of documents |
Document Processing System | System that prepares the inverted index from documents, pages or sites retrieved from the document corpus. It processes the contents using an automated or manual process |
Token attributes | Labels describing the importance, location, or other features that help distinguish between tokens in a document (e.g. fonts, page location) |
SEO Strategist at Tripadvisor, ex- Seek (Melbourne, Australia). Specialized in technical SEO. Writer in Python, Information Retrieval, SEO and machine learning. Guest author at SearchEngineJournal, SearchEngineLand and OnCrawl.