A Glossary of Information Retrieval Terminology

Rand Fishkin

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

When reading through complex threads, research papers, or even blogs by some of the more advanced SEOs in the industry, I often get lost in the meaning of terms, and an entire paragraph or document can be lost to my ignorance. Luckily, there are great resources like the Modern Information Retrieval Glossary from UC Berkeley.

I've picked out some of the more important terms to know:

  • Clustering - the grouping of documents which satisfy a set of common properties. The aim is to assemble together documents which are related among themselves. Clustering can be used, for instance, to expand a user query with new and related index terms.
  • E measure - an information retrieval performance measure, distinct from the harmonic mean, which combines recall and precision.
  • Generalized vector space model - a generalization of the classic vector model based on a less restrictive interpretation of term-to-term independence.
  • Information retrieval - (IR) part of computer science which studies the retrieval of information (not data) from a collection of written documents. The retrieved documents aim at satisfying a user information need usually expressed in natural language.
  • Latent semantic indexing - an algebraic model of document retrieval based on a singular value decomposition of the vector space of index terms.
  • Probabilistic model - a classic model of document retrieval based on a probabilistic interpretation of document relevance (to a given user query).
  • Stemming - a technique for reducing words to their grammatical roots.
  • TREC collection - a reference collection which contains over a million documents and which has been used extensively in the TREC conferences. The TREC collection has been organized by NIST and is becoming a standard for comparing IR models and algorithms.
  • Zipf's Law - an empirical rule that describes the frequency of the text words. It states that the i-th most frequent word appears as many times as the most frequent one divided by i^θ, for some θ >= 1.
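
Two of these definitions are really formulas in disguise, so here is a minimal Python sketch of both. It is only an illustration, not something taken from the glossary: the function names, the parameter `b`, and the sample numbers are my own assumptions, and the E measure uses one common formulation, E = 1 - (1 + b^2) / (b^2 / recall + 1 / precision), which works out to one minus the harmonic mean when b = 1.

```python
def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """E measure under an assumed, commonly used formulation.

    E = 1 - (1 + b^2) / (b^2 / recall + 1 / precision)
    Lower is better. With b = 1 this is 1 minus the harmonic mean
    of precision and recall.
    """
    if precision <= 0.0 or recall <= 0.0:
        # Avoid dividing by zero; treat this as the worst possible score.
        return 1.0
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)


def zipf_expected_count(top_count: float, rank: int, exponent: float = 1.0) -> float:
    """Expected occurrences of the word at a given frequency rank.

    Follows the Zipf's Law entry: the count of the most frequent word
    divided by rank raised to the exponent. The default exponent of 1.0
    is an assumption for illustration; real text may fit another value.
    """
    if rank < 1:
        raise ValueError("rank must be 1 or greater")
    return top_count / rank ** exponent


if __name__ == "__main__":
    # Hypothetical numbers, chosen only to show the calls.
    print(e_measure(precision=0.5, recall=0.5))
    print(zipf_expected_count(top_count=1000, rank=2))
```

The exponent is left as a parameter because the glossary only says "for some" value, so you would need to fit it to your own text rather than trust the default.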