A Glossary of Information Retrieval Terminology

February 14, 2005

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

By: Rand Fishkin

When reading through complex threads, research papers, or even blogs by some of the more advanced SEOs in the industry, I often get lost in the meaning of terms, and an entire paragraph or document can be lost to my ignorance. Luckily, there are great resources like the Modern Information Retrieval Glossary from UC Berkeley.

I've picked out some of the more important terms to know:

Clustering - the grouping of documents which satisfy a set of common properties. The aim is to assemble together documents which are related among themselves. Clustering can be used, for instance, to expand a user query with new and related index terms.
E measure - an information retrieval performance measure, distinct from the harmonic mean, which combines recall and precision.
Generalized vector space model - a generalization of the classic vector model based on a less restrictive interpretation of term-to-term independence.
Information retrieval (IR) - the part of computer science which studies the retrieval of information (not data) from a collection of written documents. The retrieved documents aim to satisfy a user's information need, usually expressed in natural language.
Latent semantic indexing - an algebraic model of document retrieval based on a singular value decomposition of the vectorial space of index terms.
Probabilistic model - a classic model of document retrieval based on a probabilistic interpretation of document relevance (to a given user query).
Stemming - a technique for reducing words to their grammatical roots.
TREC collection - a reference collection which contains over a million documents and which has been used extensively in the TREC conferences. The TREC collection has been organized by NIST and is becoming a standard for comparing IR models and algorithms.
Zipf's Law - an empirical rule that describes the frequency of the text words. It states that the i-th most frequent word appears as many times as the most frequent one divided by i^θ, for some θ <= 1.
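
To make the E measure entry a little more concrete, here is a minimal Python sketch. It assumes the common van Rijsbergen formulation, E = 1 - (1 + b^2) / (b^2 / recall + 1 / precision), which the glossary entry itself does not spell out. The function names and the weighting parameter `b` are my own choices for illustration.

```python
def harmonic_mean(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (0.0 if either is zero)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)


def e_measure(precision: float, recall: float, b: float = 1.0) -> float:
    """E measure under the assumed van Rijsbergen formulation.

    b is a weighting parameter (an assumption of this sketch);
    with b = 1 the result equals 1 minus the harmonic mean.
    """
    if precision <= 0.0 or recall <= 0.0:
        # Nothing relevant was retrieved: return the worst possible score.
        return 1.0
    return 1.0 - (1.0 + b * b) / (b * b / recall + 1.0 / precision)


if __name__ == "__main__":
    p, r = 0.8, 0.4
    print("harmonic mean:", harmonic_mean(p, r))
    print("E measure (b=1):", e_measure(p, r))
```

Unlike the harmonic mean, where higher is better, a lower E value indicates better retrieval under this formulation.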
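
The stemming entry can be illustrated with a deliberately naive suffix stripper. This is only a toy sketch, not the Porter stemmer or any algorithm a real search engine is known to use. The suffix list and the three-character minimum are arbitrary assumptions chosen for the example.

```python
# Hypothetical suffix list, checked in this order (longest first).
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    """Strip one common English suffix, leaving at least three characters."""
    word = word.lower()
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be meaningful.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


if __name__ == "__main__":
    for w in ("Indexing", "indexed", "indexes", "documents", "is"):
        print(w, "->", naive_stem(w))
```

With this sketch, "indexing", "indexed" and "indexes" all reduce to "index", so a query for one form could match documents containing the others. Real stemmers handle many more cases (doubled consonants, irregular forms) that this one ignores.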
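
The Zipf's Law entry describes a simple formula, so a short Python sketch may help. It predicts how often the word at a given rank should appear, given the count of the most frequent word and an exponent (called `theta` here). The function names, the default `theta = 1.0`, and the whitespace tokenization are assumptions made for illustration.

```python
from collections import Counter


def zipf_expected_count(top_count: float, rank: int, theta: float = 1.0) -> float:
    """Predicted count of the word at `rank`: top_count / rank ** theta."""
    if rank < 1:
        raise ValueError("rank starts at 1 for the most frequent word")
    return top_count / rank ** theta


def rank_frequencies(text: str) -> list[tuple[str, int]]:
    """Return (word, count) pairs sorted from most to least frequent.

    Words are split on whitespace only, a simplification for this sketch.
    """
    return Counter(text.lower().split()).most_common()


if __name__ == "__main__":
    ranked = rank_frequencies("the cat and the dog and the bird")
    top_count = ranked[0][1]
    for rank, (word, count) in enumerate(ranked, start=1):
        predicted = zipf_expected_count(top_count, rank)
        print(rank, word, count, round(predicted, 2))
```

On a text this small the fit is poor. The rule is an empirical observation about large collections, so comparing observed and predicted counts only becomes meaningful with a lot more text.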
