
Separating Web Spam from Quality Content - What are the Metrics?

Rand Fishkin

The author's views are entirely their own (excluding the unlikely event of hypnosis) and may not always reflect the views of Moz.

Let's try a little exercise...

Common features of spam domains include:

  • Long domain names
  • .info, .cc, .us and other cheap, easy-to-grab TLDs
  • Short registration period (1 year, maybe 2)
  • High ratio of ad blocks to content
  • JavaScript redirects from initial landing pages
  • Use of common, high-commercial-value spam keywords like "mortgage," "poker," "Texas Hold 'em," "porn," "student credit cards," and related terms
  • Many links to other low quality, spam sites
  • Few links to high quality, trusted sites
  • High keyword frequencies and keyword densities
  • Small amounts of unique content
  • Very few direct visits
  • Very few links to the site shared in (non-spam) email
  • Registered to people/entities not associated with trusted sites
  • Not frequently registered with services like Yahoo! Site Explorer, Google Webmaster Central or Live Webmaster Tools
  • Rarely have short, high-value domain names
  • Often contain many keyword-stuffed subdomains
  • More likely to have longer domain names
  • More likely to contain multiple hyphens in the domain name
  • Less likely to have links from trusted sources
  • Less likely to have SSL certificates
  • Less likely to be in directories like DMOZ, Yahoo!, Librarian's Internet Index, etc.
  • Unlikely to have any significant quantity of branded searches
  • Unlikely to be bookmarked in services like My Yahoo!, Del.icio.us, Faves.com, etc.
  • Unlikely to get featured in social voting sites like Digg, Reddit, Yahoo! Buzz, StumbleUpon, etc.
  • Unlikely to have channels on YouTube, communities on Facebook or links from Wikipedia
  • Unlikely to be mentioned on major news sites (either with or without link attribution)
  • Unlikely to register with Google/Yahoo!/MSN Local Services
  • Unlikely to have a legitimate physical address/phone number on the website 
  • Likely to have the domain associated with emails that appear on blacklists
  • Often contain a large number of snippets of "duplicate" content found elsewhere on the web
  • Unlikely to contain unique content in the form of PDFs, PPTs, XLSs, DOCs, etc.
  • Frequently feature commercially focused content
  • Many levels of links away from highly trusted websites
  • Rarely contain privacy policy and copyright notice pages
  • Rarely listed in Better Business Bureau's Online Directory
  • Rarely contain text written at a high grade level (as measured by metrics like the Flesch-Kincaid Grade Level)
  • Rarely have small snippets of text quoted on other websites and pages
  • Cloaking based on user-agent or IP address is common
  • Rarely use paid analytics tracking software
  • Rarely have online or offline marketing campaigns
  • Rarely have affiliate link programs pointing to them
  • Less likely to have .com or .org extensions
  • Almost never have .mil, .edu or .gov extensions
  • Rarely have links from domains with .edu or .gov extensions
  • Almost never have links from domains with .mil extensions
  • Rarely receive high quantities of monthly visits
  • Rarely have visits lasting longer than 30 seconds
  • Rarely have visitors bookmarking their domains in the browser
  • Unlikely to buy significant quantities of PPC ad traffic
  • Rarely have banner ad media buys
  • Likely to link back to a significant portion of the sites and pages that link to them (reciprocal linking)
  • Extremely unlikely to be mentioned or linked to in scientific research papers
  • Unlikely to use expensive web technologies (e.g., Microsoft server and development products that require a licensing fee)
  • Likely to be registered by parties who own a very large number of domains
  • Unlikely to attract significant return traffic
  • More likely to contain malware, viruses or spyware (or any automated downloads)
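Several items in the list above can be measured directly from a URL, a page's text, or its link graph: domain-name length, hyphens, subdomains, cheap TLDs, keyword density, Flesch-Kincaid grade level and reciprocal linking. The Python sketch below is a minimal illustration under stated assumptions, not a spam detector: the function names are hypothetical, the syllable counter is a rough vowel-group heuristic, the hostname parsing ignores multi-part suffixes such as .co.uk, and because the post gives no thresholds, every function returns a raw value rather than a spam/not-spam verdict.

```python
import re
from urllib.parse import urlsplit

# TLDs the post calls out as cheap and easy to grab; illustrative, not exhaustive.
CHEAP_TLDS = {"info", "cc", "us"}


def domain_features(url: str) -> dict:
    """Return simple hostname-based signals for a URL."""
    host = urlsplit(url).hostname or ""
    labels = host.split(".")
    tld = labels[-1] if labels else ""
    # Assumption: the registered name is the second-to-last label
    # (this ignores multi-part suffixes such as .co.uk).
    name = labels[-2] if len(labels) >= 2 else host
    return {
        "name_length": len(name),
        "hyphens": name.count("-"),
        "subdomain_count": max(len(labels) - 2, 0),
        "cheap_tld": tld in CHEAP_TLDS,
    }


def keyword_density(text: str, keyword: str) -> float:
    """Share of words in `text` equal to `keyword` (single-word keywords only)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def count_syllables(word: str) -> int:
    # Crude heuristic: count runs of consecutive vowels, at least one per word.
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)


def flesch_kincaid_grade(text: str) -> float:
    """Grade level = 0.39 * (words/sentences) + 11.8 * (syllables/words) - 15.59."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59


def reciprocal_link_ratio(outbound: set, inbound: set) -> float:
    """Fraction of linking domains that the site links back to."""
    if not inbound:
        return 0.0
    return len(outbound & inbound) / len(inbound)
```

For example, domain_features("http://cheap-poker-mortgage-loans.info/") reports a 26-character name with three hyphens on a .info TLD, matching three of the spam signals in the list at once.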

For high-quality content domains, the opposite is true (at least for a good percentage of them). Now think about the sites you're building: which of these features apply to them? What could you do differently to be more like the "high quality" category and less like "spam"?
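One way to run this exercise on your own sites is to treat a subset of the features as a yes/no checklist. The Python sketch below is hypothetical: the field names are a hand-picked handful of the signals listed above, and every signal counts equally, which is a simplifying assumption since the post does not rank the features.

```python
from dataclasses import dataclass, fields


@dataclass
class SiteProfile:
    """Hypothetical yes/no answers for a few of the spam signals listed in the post."""
    cheap_tld: bool = False
    multiple_hyphens: bool = False
    javascript_redirects: bool = False
    high_ad_to_content_ratio: bool = False
    no_privacy_policy: bool = False
    no_physical_address: bool = False
    mostly_duplicate_content: bool = False


def spam_signal_share(profile: SiteProfile) -> float:
    """Fraction of the tracked spam signals that apply, all weighted equally."""
    flags = [getattr(profile, f.name) for f in fields(profile)]
    return sum(flags) / len(flags)


def signals_to_fix(profile: SiteProfile) -> list:
    """Names of the signals that currently apply, in declaration order."""
    return [f.name for f in fields(profile) if getattr(profile, f.name)]
```

A site described as SiteProfile(cheap_tld=True, no_privacy_policy=True) scores 2/7, and signals_to_fix lists those two items as the places to start.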

BTW, I'd love to hear your take on features you think are common to spam sites or to high-quality sites.
