How does Google decide what content is "similar" or "duplicate"?
-
Hello all,
I have a massive duplicate content issue at the moment with a load of old employer detail pages on my site. We have 18,000 pages that look like this:
http://www.eteach.com/Employer.aspx?EmpNo=26626
http://www.eteach.com/Employer.aspx?EmpNo=36986
and Google is classing all of these pages as similar content, which may result in a bunch of them being de-indexed. Now although they all look rubbish, some of them are ranking in search engines, and looking at the traffic on a couple of them, it's clear that people who find these pages want more information about the school (because almost everyone clicks the local information tab on the page). So I don't want to just get rid of all these pages; I want to add content to them.
But my question is...
If I were to make up, say, 5 templates of generic content with different fields replaced by the school's name, location, and headteacher's name, so that the pages vary from one another, would that be enough for Google to realise they are not similar pages and stop classing them as duplicates?
e.g. [School name] is a busy and dynamic school led by [headteacher's name], who achieves excellence every year from Ofsted. Located in [location], [school name] offers a wide range of experiences both in the classroom and through extra-curricular activities, and we encourage all of our pupils to "Aim Higher". We value all our teachers and support staff and work hard to keep [school name]'s reputation to the highest standards.
Something like that...
Anyone know if Google would slap me if I did that across 18,000 pages (with 4 other templates to choose from)?
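(To see how alike two such filled-in pages end up, here is a minimal sketch in Python. The school names, headteachers, and locations are invented for illustration, and the word-overlap measure is just an illustrative stand-in, not what Google actually uses.)

```python
# Illustrative sketch only: fill one template for two (invented) schools
# and measure word overlap, to see how little actually changes between
# the resulting pages.
template = (
    "{name} is a busy and dynamic school led by {head}, who achieves "
    "excellence every year from Ofsted. Located in {location}, {name} "
    "offers a wide range of experiences both in the classroom and "
    "through extra-curricular activities."
)

page_a = template.format(name="Hillcrest Primary", head="Jane Smith",
                         location="Leeds")
page_b = template.format(name="Oakwood Academy", head="Tom Jones",
                         location="Bristol")

def jaccard(a: str, b: str) -> float:
    """Word-level Jaccard similarity: |A ∩ B| / |A ∪ B|."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

# Only the placeholder words differ, so the overlap stays high even
# though every "field" on the page has changed.
print(f"similarity: {jaccard(page_a, page_b):.2f}")
```

Swapping the placeholders changes only a handful of words out of the whole paragraph, which is exactly the "near duplicate" pattern the replies below describe.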
-
Hi Virginia,
Maybe this Whiteboard Friday can help you out.
-
Hey Virginia
That is essentially what we call near-duplicate content: the kind of pages that are easily created by pulling fields out of a database, dynamically generating the pages, and dropping the name, address, etc. into placeholders.
Unique content is exactly that: unique, so this approach is probably not going to cut it. You can still pull certain elements from the database, such as the address, but you need to either strip out the duplicated blocks and keep the pages simpler (like a business directory) or, ideally, add some genuinely unique elements to each page.
These kinds of pages often still rank for very specific queries. Another strategy is to build well-thought-out landing pages that link to pages like these, which have value for users but are not search-friendly.
So, assess how well these pages work as landing pages from search, or whether visitors arrive some other way. If they come in elsewhere, you could noindex these pages or block them in robots.txt. Then target the bigger search terms higher up the tree and create good search landing pages that link down to these pages for users.
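One caveat on the noindex vs. robots.txt choice: a crawler blocked by robots.txt never fetches the page, so it never sees a noindex tag in the HTML, so pick one mechanism per URL. Here is a minimal sketch of emitting a per-page robots meta tag based on whether the page earns search visits; the EmpNo values, traffic numbers, and the zero-visit threshold are all invented assumptions, not recommendations from this thread.

```python
# Hypothetical sketch: choose a robots meta tag per employer page.
# The EmpNo keys and search_visits figures below are made up; in
# practice they would come from your analytics data.
pages = {
    26626: {"search_visits": 120},  # ranks and gets visitors: keep indexed
    36986: {"search_visits": 0},    # thin and unvisited: noindex candidate
}

def robots_meta(search_visits: int) -> str:
    """Return the robots meta tag to render in the page <head>."""
    if search_visits == 0:
        # noindex must be served in the HTML (or an X-Robots-Tag header);
        # if the URL is blocked in robots.txt, crawlers never see this.
        return '<meta name="robots" content="noindex, follow">'
    return '<meta name="robots" content="index, follow">'

for emp_no, stats in pages.items():
    print(emp_no, robots_meta(stats["search_visits"]))
```

Whatever threshold you use, the point is that the directive lives on the page itself, so the page must stay crawlable for it to take effect.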
This is a really good read for getting a better handle on duplicate content types and the relevant strategies:
http://moz.com/blog/fat-pandas-and-thin-content
Hope that helps
Marcus
-
Hi Virginia,
If you take your pages as a whole, code and all, the only slight difference between those pages is the title tag and the sidebar info with the school address. The rest of the page code is exactly the same.
If you were to create 5 templates along the lines of your example, where all you are changing is the [school name], [location], etc., I'm sure Google will still flag these pages as duplicate content.
Unique content is the best way. If there's not a lot of competition for the school name and each page has enough content about the individual school, head teacher, etc., then "templates" might work. You can try it out, but I'd still recommend unique content. It's the nature of the beast with so many pages.
Hope this helps.
Robert