Excel tips or tricks for duplicate content madness?
-
Dearest SEO Friends,
I'm working on a site that has over 2,400 instances of duplicate content (yikes!).
I'm hoping somebody could offer some Excel tips or tricks for managing my SEOMoz crawl diagnostics summary data file in a meaningful way, because right now this spreadsheet is not really helpful. Here's a hypothetical situation to describe why:
Say we had three columns of duplicate content. The data is displayed thusly:
| Column A | Column B | Column C |
| URL A | URL B | URL C |
In a perfect world, this is easy to understand. I want URL A to be the canonical. But unfortunately, the way my spreadsheet is populated, this ends up happening:
| Column A | Column B | Column C |
| URL A | URL B | URL C |
| URL B | URL A | URL C |
| URL C | URL A | URL B |
Essentially, every one of these URLs would end up being treated as the canonical, which defeats the purpose of the tag. On a site with only a few errors, this has never been a problem, because I can just spot check my steps. But the site I'm working on has thousands of instances, making it really hard to identify these patterns accurately, let alone at scale.
This is particularly problematic as some of these URLs are identified as duplicates 50+ times! So my spreadsheet has well over 100K cells!!! Madness!!! Obviously, I can't go through manually. It would take me years to ensure the accuracy, and I'm assuming that's not really a scalable goal.
Here's what I would love, but I'm not getting my hopes up. Does anyone know of a formulaic way that Excel could identify row matches and think - "oh! these are all the same rows of data, just mismatched. I'll kill off duplicate rows, so only one truly unique row of data exists for this particular set"? Or some other workaround that could help me with my duplicate content madness?
Much appreciated, you Excel Gurus you!
-
Choose one of the URLs as the authoritative version and remove the duplicated content from the others.
-
FMLLC,
I use Excel 2010 so my approach would be as follows:
-
Make a backup copy of your file before you start.
-
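You will need to sort the values within each row (left to right), so that rows containing the same URLs in a different order end up identical. Excel's built-in left-to-right sort cannot sort every row independently in one pass, so with thousands of rows you will need a macro.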
-
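Assuming your data starts in A1 and has no header row, put the macro below in a general module (Alt+F11 opens the VBA editor, then Insert -> Module), go back to Excel, activate your sheet, then run the macro from Tools=>Macro=>Macros (in Excel 2007 and later: View -> Macros, or press Alt+F8).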
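Sub SortEachRowHorizontal()
    ' Sort the values in each row left to right, so that rows holding the
    ' same URLs in a different order become identical.
    Dim rng As Range, rw As Range
    Set rng = ActiveSheet.Range("A1").CurrentRegion
    For Each rw In rng.Rows
        rw.Sort Key1:=rw.Cells(1), _
                Order1:=xlAscending, _
                Header:=xlNo, _
                OrderCustom:=1, _
                MatchCase:=False, _
                Orientation:=xlLeftToRight
    Next rw
End Sub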
-
Then highlight all your cells and go to Data -> Remove Duplicates, leaving every column selected.
The result should be all unique rows. I hope this helps.
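If you would rather do this outside Excel, the same idea (sort the values within each row, then keep only one copy of each resulting row) can be sketched in a few lines of Python. This is only a rough sketch under some assumptions: the crawl data has been saved as a CSV with one group of duplicate URLs per row and no header row, and the function name `unique_url_groups` and the sample data are made up for illustration. Unlike the macro, the comparison here is case-sensitive, and a URL repeated within one row is collapsed to a single entry.

```python
import csv
import io


def unique_url_groups(rows):
    """Return one sorted tuple per distinct set of URLs, in first-seen order."""
    seen = set()
    groups = []
    for row in rows:
        # Drop empty cells and stray whitespace, then sort so that the
        # column order no longer matters (same idea as the row-sort macro).
        key = tuple(sorted({cell.strip() for cell in row if cell.strip()}))
        if key and key not in seen:
            seen.add(key)
            groups.append(key)
    return groups


# Hypothetical sample mirroring the three-row example in the question.
sample = "URL A,URL B,URL C\nURL B,URL A,URL C\nURL C,URL A,URL B\n"
groups = unique_url_groups(csv.reader(io.StringIO(sample)))
print(groups)
```

For a real export you would pass `csv.reader(open(path, newline=""))` instead of the in-memory sample. Each tuple that comes back is one set of mutually duplicate URLs, so you only have to pick a canonical once per set.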