Duplicate page report

jimmyzig

We exported a CSV spreadsheet of our crawl diagnostics related to duplicate URLs, after waiting 5 days with no response on how Rogerbot can be made to filter.

My IT lead tells me he thinks the label on the spreadsheet is showing “duplicate URLs”, and that is – literally – what the spreadsheet is showing.

It thinks that a database ID number is the only valid part of a URL. To replicate: Just filter the spreadsheet for any number that you see on the page. For example, filtering for 1793 gives us the following result:

| URL |
| --- |
| http://truthbook.com/faq/dsp_viewFAQ.cfm?faqID=1793 |
| http://truthbook.com/index.cfm?linkID=1793 |
| http://truthbook.com/index.cfm?linkID=1793&pf=true |
| http://www.truthbook.com/blogs/dsp_viewBlogEntry.cfm?blogentryID=1793 |
| http://www.truthbook.com/index.cfm?linkID=1793 |

There are a couple of problems with the above:

1. It gives the www result, as well as the non-www result.

2. It is seeing the print version as a duplicate (&pf=true) but these are blocked from Google via the noindex header tag.

3. It thinks that different sections of the website with the same ID number are the same thing (faq / blogs / pages).
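Point 2 can be checked without relying on any crawler report. The following is a minimal sketch, assuming the response headers and HTML of a print page have already been fetched (for example with curl). The function name `has_noindex` is hypothetical, and the sketch only looks at the two common places a noindex directive can live: an `X-Robots-Tag` response header or a `<meta name="robots">` tag.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects the content of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            self.directives.append(attr.get("content", "").lower())


def has_noindex(headers, html):
    """Return True if the response asks search engines not to index the page.

    `headers` is a dict of response headers and `html` is the page source;
    both are assumed to have been fetched beforehand.
    """
    # HTTP header form, e.g. "X-Robots-Tag: noindex"
    for name, value in headers.items():
        if name.lower() == "x-robots-tag" and "noindex" in value.lower():
            return True
    # HTML form, e.g. <meta name="robots" content="noindex">
    parser = _RobotsMetaParser()
    parser.feed(html)
    return any("noindex" in directive for directive in parser.directives)
```

Note that a `rel="nofollow"` attribute on a link is a different directive and does not make this function return True.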

In short: this particular report tells us nothing at all.

I am trying to get a perspective from someone at SEOMoz on whether my IT lead is reading the result correctly or is missing something.

Please help. Jim

SamWeber

Hi Jim!

Thanks for the question. One thing we should clarify before we move forward is that the Pro app doesn't actually report on duplicate URLs, but we do report when we find duplicate title tags or content.

Duplicate titles simply means we found the same title tag on more than one page. In one example from your diagnostics, we're reporting that the title tag 'Truthbook Religious News' is being used on multiple pages (http://screencast.com/t/GYCKNfAoj).

Duplicate content is content we see in the source code of your pages that is identical or nearly identical and would cause the pages to compete against each other for rankings. To fix either of these you have several options:

- Set up a 301 redirect to have the pages you would consider duplicates redirect to the main page.
- Change the content/title tags enough that they won't be considered duplicates.
- Canonicalize the content you would consider duplicates.

Most developers will go for the latter two options so that the pages will still be reachable by visitors. You can find out more about how to implement these in our Help Hub.
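As a rough illustration of the canonicalization option, here is a minimal sketch that maps URL variants to one preferred address and builds the matching `<link rel="canonical">` element. The preferred host (`truthbook.com` without `www`), the decision to ignore the `pf` parameter, and the function names are all assumptions made for this example, not settings taken from the site or from the Help Hub.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumptions for illustration only: the non-www host is preferred and
# the "pf" (print-friendly) parameter never changes the page's content.
PREFERRED_HOST = "truthbook.com"
IGNORED_PARAMS = {"pf"}


def canonical_url(url):
    """Map a URL variant to the single address it should be indexed under."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == "www." + PREFERRED_HOST:
        host = PREFERRED_HOST
    # Keep every query parameter except the ones declared irrelevant above.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, host, parts.path, urlencode(query), ""))


def canonical_link_tag(url):
    """Build the <link> element to place in the page's <head>."""
    return '<link rel="canonical" href="%s">' % escape(canonical_url(url), quote=True)
```

The same `canonical_url` result could also serve as the target of a 301 redirect whenever the requested URL differs from it. Pages that merely share an ID number, such as an FAQ entry and a blog entry, keep distinct canonical URLs because the path and parameter name are preserved.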

To answer your other questions:

1 - At the time of the crawl, we were able to get to subdomain pages from other pages on your site. The subdomains were also resolving separately, but they seem to be redirecting to your root domain now, so your next crawl should reflect this.

2 - Running a curl for the print versions of your pages, I see "nofollow" tags on the embedded Wikipedia links (http://screencast.com/t/reYjeLLPvWG3) in the doc, but I'm not finding any "noindex" tags (http://screencast.com/t/DsXMZInngSzH). This would be why you're seeing us crawling those pages.

3 - As I mentioned above, our crawler looks for similarities in the source code of pages when reporting on duplicate content. Since no one knows exactly how similar content would need to be for the search engines to consider it a duplicate, we err on the side of caution and recommended best practices when reporting them. Using one of the methods mentioned above and detailed in our Help Hub should resolve this for you.
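To give a feel for what "similarities in the source code" in point 3 can mean, here is a minimal sketch that scores two page sources with Python's standard `difflib`. This is not the crawler's actual method, which is not described in this thread. The 0.9 threshold and the function names are arbitrary choices for the example.

```python
from difflib import SequenceMatcher

# Arbitrary cutoff chosen for this example; not a documented value.
THRESHOLD = 0.9


def similarity(source_a, source_b):
    """Return a ratio in [0, 1]; 1.0 means the two strings are identical."""
    # autojunk=False keeps difflib from discarding very common characters,
    # which would otherwise skew the score on long HTML documents.
    return SequenceMatcher(None, source_a, source_b, autojunk=False).ratio()


def looks_duplicate(source_a, source_b, threshold=THRESHOLD):
    """Flag two page sources as near-duplicates under the assumed threshold."""
    return similarity(source_a, source_b) >= threshold
```

Two pages built from the same template with only a short differing body can score very high under a measure like this, which is one way pages from different sections of a site might end up reported together.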

Let me know if you have any other questions!

Best,

Sam
Moz Helpster
