Medium-sized forum with 1000s of thin-content gallery pages: Disallow or noindex?

PixelKicks

I have a forum at http://www.onedirection.net/forums/ which contains a gallery with 1000s of very thin-content pages. We've currently got these photo pages disallowed for the main Googlebot via robots.txt, but we do allow the Google Images crawler access.
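A minimal sketch of the setup described above, checked with Python's standard `urllib.robotparser`. The gallery path `/forums/gallery/` and the photo/thread URLs are hypothetical (the real paths aren't given), and Python's parser only approximates Google's own matching rules, so treat this as an illustration rather than a validator.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the /forums/gallery/ path is an assumption,
# not the forum's real URL structure.
ROBOTS_TXT = """\
User-agent: Googlebot-Image
Allow: /

User-agent: *
Disallow: /forums/gallery/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rp = build_parser(ROBOTS_TXT)
    # Hypothetical URLs on the forum's domain.
    photo = "http://www.onedirection.net/forums/gallery/photo123/"
    thread = "http://www.onedirection.net/forums/some-thread/"
    # Googlebot has no group of its own here, so it falls back to the "*" group.
    print("Googlebot, photo page:", rp.can_fetch("Googlebot", photo))
    print("Googlebot-Image, photo page:", rp.can_fetch("Googlebot-Image", photo))
    print("Googlebot, thread page:", rp.can_fetch("Googlebot", thread))
```

With these rules, Googlebot is refused the photo page, while Googlebot-Image (which matches its own group) and ordinary thread URLs are allowed.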

Now I've been reading that we shouldn't really use disallow, and instead should add a noindex tag on the page itself.

It's a little awkward to edit the source of the gallery pages (and to keep any amendments in place the next time the forum software gets updated).

What's the best way of handling this?

Chris.

PhilNottingham

Hey Chris,

I agree that your current implementation, while not ideal, is perfectly adequate for making sure you don't have duplicate content or cannibalisation problems, while still allowing Google to index the UGC images.

You're also preventing Googlebot from seeing the user profile pages, which is a good idea, since many of them are very thin and mostly duplicate.

So, from a pure SEO perspective, I think you've done a good job.

However... I think you should also consider the ethical implications of potentially blocking the image googlebot as well. By preventing Google from indexing all those images of young girls fawning over the vacuous runners up of a televised talent show, you would undoubtedly be doing the world a great service.

Devanur-Rafi

Hi Chris, I second Jarno's opinion here. If adding page-level blocking is going to be a huge overhead, you can rely on your current robots.txt setup. There is a small catch, though: even if you block content with robots.txt, Google may still index the blocked URLs if it finds links to them elsewhere on the Internet (it just won't have crawled their content). In situations like this, page-level blocking is the way forward. So to fully keep Googlebot from indexing your content, you should ideally use the page-level robots meta tag or the X-Robots-Tag HTTP header.
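One detail worth spelling out: Googlebot can only obey a robots meta tag or `X-Robots-Tag` header on a page it is allowed to crawl, so the robots.txt disallow would have to be lifted for a noindex to take effect. The sketch below encodes that rule with the Python standard library only. The function names, the sample robots.txt rules and the gallery path are hypothetical, and the header check is simplified (it ignores the optional `botname:` prefix form).

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots"> and <meta name="googlebot"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {k.lower(): (v or "") for k, v in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            self.directives.extend(
                d.strip().lower() for d in attr.get("content", "").split(",")
            )


def has_noindex(html: str, headers: dict[str, str]) -> bool:
    """True if the page carries noindex in a robots meta tag or an X-Robots-Tag header."""
    meta = RobotsMetaParser()
    meta.feed(html)
    meta.close()
    tokens = set(meta.directives)
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            # Simplification: "googlebot: noindex" style values are not unpacked.
            tokens.update(d.strip().lower() for d in value.split(","))
    return "noindex" in tokens or "none" in tokens


def noindex_is_effective(robots_txt: str, url: str, html: str,
                         headers: dict[str, str], agent: str = "Googlebot") -> bool:
    """A noindex only counts if the crawler may fetch the page and so can see it."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return rp.can_fetch(agent, url) and has_noindex(html, headers)
```

For example, a photo page with `<meta name="robots" content="noindex, follow">` is reported as effectively noindexed under an empty `Disallow:` rule, but not while `/forums/gallery/` is still disallowed, because Googlebot never fetches the page to read the tag.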

Here you go for more: https://support.google.com/webmasters/answer/156449?hl=en

Hope it helps.

Best,

Devanur Rafi.

JarnoNijzing

Chris,

If adding the noindex meta tag is too complicated because of software issues etc., then I feel that your current method is the right way to go. Normally you would be absolutely right, for the simple reason that page-level rules overrule robots.txt. But if a software update overwrites the rules placed in your code, then you have to add them again manually after each and every update, and I'm not sure you want to do that.
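A possible way around the overwrite problem, sketched under assumptions: send the `X-Robots-Tag` header mentioned above from a layer that forum upgrades don't touch, instead of editing the gallery templates. In practice that would usually be a header rule in the web server configuration (for example Apache's mod_headers); the Python WSGI middleware below only illustrates the logic. The class name, the `/forums/gallery/` prefix and the stand-in forum app are all hypothetical, and the robots.txt disallow would still need to be removed for Googlebot to see the header.

```python
from typing import Callable, Iterable

GALLERY_PREFIX = "/forums/gallery/"  # hypothetical path to the thin photo pages


class XRobotsTagMiddleware:
    """Adds an X-Robots-Tag header to responses whose path starts with a prefix.

    It wraps the forum app from outside, so forum upgrades don't overwrite it.
    """

    def __init__(self, app: Callable, prefix: str = GALLERY_PREFIX,
                 value: str = "noindex") -> None:
        self.app = app
        self.prefix = prefix
        self.value = value

    def __call__(self, environ: dict, start_response: Callable) -> Iterable[bytes]:
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(self.prefix):
                # Drop any existing X-Robots-Tag so the header isn't duplicated.
                headers = [(k, v) for k, v in headers if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", self.value))
            return start_response(status, headers, exc_info)

        return self.app(environ, patched_start_response)


def demo_forum_app(environ, start_response):
    """Hypothetical stand-in for the real forum software."""
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [b"<html><body>photo page</body></html>"]


app = XRobotsTagMiddleware(demo_forum_app)
```

Only gallery responses gain the header; thread pages pass through unchanged, so the rest of the forum stays indexable.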

regards

Jarno

Related Questions

No: 'noindex' detected in 'robots' meta tag

How to get into Google's Tops Stories?

Duplicate Content/Similar Pages

Implications of Disallowing A LOT of Pages

Specific pages won't index

Why is robots.txt blocking URL's in sitemap?

Duplicate page/Title content - Where?

Duplicate Content Home Page