How to Block URLs with Specific Components from Googlebot
-
Hello,
I have around 100,000 error pages showing in Google Webmaster Tools. I want to block specific components like com_fireboard, com_seyret, com_profiler, etc.
A few examples of what I tried in robots.txt:
Disallow: /com_fireboard/
Disallow: /com_seyret/
But it's not working. Can anyone suggest how I can solve this problem?
Many Thanks
Shradda
-
I agree with Sha that your 404 page has a nice appearance. My main concern is that it lacks functionality.
If I click on a link to your site and end up on that page, what is my next action? Likely I would hit the Back button on my browser and leave your site. It is either that or typing a URL.
I recommend you offer users the option to stay on your site. Your site navigation, a search box, some links, anything would be helpful.
-
Hi Shradda,
I agree with Ryan that a meta noindex tag is the preferable way to block the pages, though there may be difficulties in applying the tag, depending on how your pages are generated and whether you are able to alter the code.
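For reference, the tag itself is a single line in each page's head section; "follow" keeps crawlers following the page's links while the page itself is dropped from the index:

```html
<!-- Place inside <head>. "noindex, follow" removes the page from the
     index but still lets crawlers follow its outgoing links. -->
<meta name="robots" content="noindex, follow">
```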
You can also match on ?option=com_fireboard (and the other component parameters) to create 301 redirects back to a higher-order category page or to search.
You should be able to use a single line of code to 301 all pages within each directory.
Using 301 redirects will also send a signal to search engines to de-index those pages.
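As a sketch of the single-line approach, assuming an Apache server with mod_rewrite enabled: one rule pair in .htaccess can 301 every URL carrying one of these component parameters. The /forum/ target here is hypothetical; substitute whichever higher-order category page fits your site.

```apache
RewriteEngine On
# Match any URL whose query string carries option=com_fireboard,
# option=com_seyret, or option=com_profiler
RewriteCond %{QUERY_STRING} (^|&)option=com_(fireboard|seyret|profiler)(&|$) [NC]
# 301 to a hypothetical category page; the trailing "?" strips the old query string
RewriteRule ^ /forum/? [R=301,L]
```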
Very clever 404 page too! Had to watch him go all the way across the page and back just so I knew I wasn't missing anything!
Sha
-
You can log into Google Webmaster Tools and adjust your parameter settings (Site Parameters > URL Parameters); the feature was designed for this exact purpose. If you use this solution, be sure to make the same changes in Bing Webmaster Tools as well.
A better solution would be to noindex the pages. Using robots.txt should be avoided when possible.
If you do need to use robots.txt, be aware that your current disallow statement blocks crawling of a folder literally named "com_fireboard". Your intention is to block crawling of URLs carrying the parameter ?option=com_fireboard. I know wildcards work for the trailing portion of a path, but I have not tried them at the beginning of a path.
I suggest you try the following (robots.txt paths must begin with a slash, and the * is a wildcard extension that Googlebot supports):
Disallow: /*?option=com_fireboard
For more on the robots.txt file, please view the following site: http://www.robotstxt.org/
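One caveat on wildcard rules: Python's standard urllib.robotparser does not implement the * extension, but Googlebot does, treating * as "any run of characters". You can sanity-check which URLs a rule would block with a small sketch like this (the paths are just illustrative):

```python
import re

def robots_rule_matches(pattern: str, path: str) -> bool:
    """Approximate Google-style robots.txt matching: '*' matches any
    run of characters; a trailing '$' anchors the rule to the URL end."""
    regex = re.escape(pattern).replace(r"\*", ".*")
    if regex.endswith(r"\$"):
        regex = regex[:-2] + "$"
    return re.match(regex, path) is not None

# The original rule only matches a literal /com_fireboard/ folder...
print(robots_rule_matches("/com_fireboard/", "/index.php?option=com_fireboard"))  # False
# ...while a wildcard rule matches the parameter wherever it appears.
print(robots_rule_matches("/*?option=com_fireboard",
                          "/index.php?option=com_fireboard&Itemid=2"))  # True
```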