What is Considered Duplicate Content by Crawlers?

RyanRhodes

I am asking because I use a couple of site audit tools to crawl a site I work on every week, and they are showing duplicate content issues. I know there is a lot of duplicate content on this site, but some of what is flagged makes no sense.

For example, the following URLs were grouped together as duplicate content:

https://www.firefold.com/contact-us
https://www.firefold.com/gabe
https://www.firefold.com/sale

How are these pages duplicate content? I am confused about what site audit tools consider duplicate content.

Just FYI, this data is from Moz crawl diagnostics, but the SEMrush site auditor is giving me the same type of data.

Any help would be greatly appreciated.

Ryan

RyanRhodes

Yeah, I just started working on this site. I haven't used Moz Analytics much, so I just wanted to see how their crawler crawls pages.

And yes I agree, there are a lot of BIG BIG BIG issues with this site.

I've got a large workload over the next few months, haha.

Lumina

I would add that there is no text on any of those three pages. Any "text" one would see there is actually embedded in an image, which is a huge issue for a number of reasons:

Search engines see that there's no text - a big no-no.
You're getting practically no SEO value from the content that would be there, even if there isn't much.
The page is heavier this way, which makes load times slower.

I want to clarify that there are many bigger issues with these pages, but as your question concerns only duplicate content, I'll leave all of that out for the time being. To summarize: Google, Yahoo, and Bing are just seeing some duplicate banners, sidebars, etc., and then some images in the body of your pages. Hence, duplicate content.
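To make that concrete, here is a minimal Python sketch of what is left for a crawler once images are ignored. It is only an illustration, not how any particular search engine or audit tool works. The VisibleText class, the visible_text helper, and the sample markup are hypothetical and are not taken from the actual pages.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping the contents of script and style tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def visible_text(html: str) -> str:
    """Return the text a text-only reader would find in the markup."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: shared navigation plus a body that is a single image.
page = (
    "<html><body><nav>Home Sale Contact</nav>"
    "<img src='/img/contact.png' alt=''>"
    "<script>var x = 1;</script></body></html>"
)
text = visible_text(page)
print(repr(text))
```

In this sample the only text that survives is the shared navigation, which is the same on every page that uses the template.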

RyanRhodes

Thanks for that information.

It makes sense looking at the data and pages from that perspective.

DavidLee

Hi Ryan!

Our crawler will flag pages that have at least 90% similarity across the entire source code of the page, not just the body.

The way to interpret the report is that the contact-us page has 35 duplicates, so "gabe" and "sale" are not listed as dupes of each other in this section; each is only a duplicate of "contact-us". Those URLs might appear with their own duplicates of the same pages further down in the report.

On the front end the pages do not appear to be similar, so the issue is likely the amount of JavaScript code on those pages.

Our crawler cannot read JavaScript, so we are likely only able to see the template of the page. Other search tools are probably seeing the same thing, as this tool returns 79% similarity: http://www.freebulkseotools.com/similar-page-checker-tool.php
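As a rough illustration of why template-heavy pages can score this high, here is a minimal Python sketch. It is not the crawler's actual algorithm: difflib's SequenceMatcher is a stand-in similarity measure, the source_similarity helper and the template strings are hypothetical, and only the 90% threshold comes from the description above.

```python
from difflib import SequenceMatcher


def source_similarity(html_a: str, html_b: str) -> float:
    """Return a 0..1 similarity ratio computed over the full page source."""
    # autojunk=False stops difflib from ignoring very frequent characters.
    return SequenceMatcher(None, html_a, html_b, autojunk=False).ratio()


# Hypothetical shared template: head, script includes, navigation, footer.
TEMPLATE_TOP = (
    "<html><head><title>Store</title>"
    + "<script src='/app.js'></script>" * 20
    + "</head><body><nav>menu</nav>"
)
TEMPLATE_BOTTOM = "<footer>footer</footer></body></html>"

# The two pages differ only in one small image tag inside the body.
contact = TEMPLATE_TOP + "<img src='/img/contact.png'>" + TEMPLATE_BOTTOM
sale = TEMPLATE_TOP + "<img src='/img/sale.png'>" + TEMPLATE_BOTTOM

score = source_similarity(contact, sale)
is_flagged = score >= 0.90  # threshold assumed from the 90% figure above
print(f"similarity: {score:.0%}, flagged: {is_flagged}")
```

Because almost all of the source is shared template, the small difference in the body barely moves the ratio, so both pages land above the threshold even though they look different in a browser.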

I can't provide much insight from a dev perspective but hope this helps!
