Skip indexing the search pages

mtthompsons

Hi,

I want all search pages like the one below to be skipped from indexing:

www.somesite.com/search/node/

So I have this in robots.txt (Disallow: /search/)

Now any posts that start with "search" are being blocked, and in Google I see this message:

A description for this result is not available because of this site's robots.txt – learn more.

How can I handle this, and how can I find all the URLs that Google is blocked from showing?

Thanks

Mark_Ginsberg

Sure. You have URLs that are being blocked by robots.txt because of this line in your file:

Disallow: /questions/search

This rule prevents any URL inside the questions folder whose path starts with the word "search" from being crawled. What are you trying to accomplish with this block? If you mean the search folder within questions, the rule should be /questions/search/.
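A quick way to see the difference locally is Python's standard urllib.robotparser module. This is only a minimal sketch: it relies on the plain prefix matching that module implements, which may differ in details from Googlebot, the http:// scheme is assumed, and the second URL is a hypothetical page inside a search folder.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(rule, url, agent="Googlebot"):
    """Return True if `url` may be crawled under a one-rule robots.txt."""
    parser = RobotFileParser()
    # Build a tiny robots.txt that applies the single Disallow rule to all bots.
    parser.parse(["User-agent: *", f"Disallow: {rule}"])
    return parser.can_fetch(agent, url)


post = "http://www.somesite.com/questions/search-the-internet"
# Hypothetical URL, used only to illustrate a page inside the folder.
folder_page = "http://www.somesite.com/questions/search/results"

print(is_allowed("/questions/search", post))          # False: prefix matches the post
print(is_allowed("/questions/search/", post))         # True: trailing slash excludes it
print(is_allowed("/questions/search/", folder_page))  # False: page is inside the folder
```

The trailing slash is what limits the rule to the folder instead of every path that merely begins with "search".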

The other warning is telling you that these pages take a long time to load. Check your server or the individual pages to see why they are so slow.

mtthompsons

Thanks a lot. I assumed this because of the 2 screenshots below.

The Sitemap report shows warnings. Can you help identify why we get these errors? The 2 images explain more.

[2 screenshots: ojpbkJO, PLlTbxW]

Mark_Ginsberg

As Saijo said above, the meta robots noindex tag is the way to go. When you block a folder via robots.txt, you prevent Google from visiting and crawling that folder and any content within it. If Google has already crawled the content, blocking it with robots.txt will not, on its own, remove it from their index. The old version of the page stays in the index, and they just can't show an updated snippet because of the robots.txt block.

To remove the pages from the index completely, you can do one of 2 things:

1. In webmaster tools, go to the URL removal section and remove that folder from the index. This only works while the folder is blocked via robots.txt.
2. Add a meta robots noindex tag to the pages (or the page template) and remove the robots.txt block. You need to remove the block so the search engines can recrawl the pages, see the meta robots directive, and act on the noindex by dropping the pages.

In general, I would recommend the meta robots noindex directive over robots.txt, because it should work for all search engines and you won't have to go into webmaster tools for each one. It also ensures that you don't accidentally block other URLs.
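For reference, the tag in question is a meta element with name="robots" and a content value that includes noindex, placed in the page head. The sketch below uses Python's standard html.parser to check whether a page's HTML carries that directive. The sample markup and the helper names are made up for illustration, and it does not look at the X-Robots-Tag HTTP header.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            # content may hold several comma-separated directives
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the HTML contains a meta robots noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Illustrative markup for a search results page, not taken from the real site.
search_page = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body>Search results</body></html>"
)
print(has_noindex(search_page))  # True
```

Running a check like this against the rendered search page template is a cheap way to confirm the tag is really being output before you lift the robots.txt block.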

From your example above, if you only blocked the folder /search/, a page that has the word "search" in its URL but isn't in the blocked folder shouldn't be blocked by that line. I would check the robots.txt section in webmaster tools, because based on your robots.txt file it doesn't look to me like every URL containing "search" should be blocked.
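You can also sanity-check this outside of webmaster tools. The sketch below feeds a robots.txt body and a list of URLs to Python's urllib.robotparser and returns the ones the rules disallow. The URL list is just the two examples from this thread with an assumed http:// scheme, and the parser's matching may not mirror Googlebot exactly.

```python
from urllib.robotparser import RobotFileParser


def blocked_urls(robots_txt, urls, agent="Googlebot"):
    """Return the URLs from `urls` that `robots_txt` disallows for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(agent, url)]


ROBOTS_TXT = """\
User-agent: *
Disallow: /search/
"""

URLS = [
    "http://www.somesite.com/search/node/",
    "http://www.somesite.com/questions/search-the-internet",
]

for url in blocked_urls(ROBOTS_TXT, URLS):
    print("blocked:", url)
```

With only the /search/ rule, the search page is reported and the /questions/ post is not. Pasting in your full robots.txt and a list of URLs from your sitemap would show which rule is catching the posts.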

Good luck,

Mark

mtthompsons

I guess I was not clear with my question.

So I have this in robots.txt (Disallow: /search/)

My intention in adding /search/ is to stop Google from indexing any of my search pages:

www.somesite.com/search/node/

Now what has happened is:

www.somesite.com/questions/search-the-internet

Posts like the one above are also being blocked.

Saijo.George

To block search pages from the index, you can try adding the meta noindex tag in the head section of the search pages:

https://support.google.com/webmasters/answer/93710?hl=en

BlueprintMarketing

I would do a complete site audit:

http://www.distilled.net/blog/seo/do-your-very-own-site-structure-audit/

http://yoast.com/articles/duplicate-content/

http://yoast.com/change-wordpress-permalink-structure/

http://yoast.com/wp-content/permalink-helper.php

http://yoast.com/wordpress-archive-pages/
