Search Engine Blocked by Robots.txt warnings for Filter Search result pages--Why?
-
Hi,
We're getting 'Yellow' Search Engine Blocked by Robots.txt warnings for URLs that are in effect product search filter result pages (see link below) on our Magento ecommerce shop. To my mind our robots.txt file is set up correctly, i.e. we would not want Google to index these pages. So why does SEOmoz flag this type of page as a warning? Is there any implication for our ranking? Is there anything we need to do about this? Thanks.
Here is an example URL that SEOmoz thinks the search engines can't see.
http://www.site.com/audio-books/audio-books-in-english?audiobook_genre=132
Below are the current entries in the robots.txt file.
User-agent: Googlebot
Disallow: /index.php/
Disallow: /?
Disallow: /.js$
Disallow: /.css$
Disallow: /checkout/
Disallow: /tag/
Disallow: /catalogsearch/
Disallow: /review/
Disallow: /app/
Disallow: /downloader/
Disallow: /js/
Disallow: /lib/
Disallow: /media/
Disallow: /.php$
Disallow: /pkginfo/
Disallow: /report/
Disallow: /skin/
Disallow: /utm
Disallow: /var/
Disallow: /catalog/
Disallow: /customer/
Sitemap: -
Thanks Keri for your advice
-
Thanks Rick for your advice
-
Like Rick said, it's just a "hey, make sure that you really wanted to do this" type warning, since you can easily write a robots.txt that blocks things you didn't really think would be blocked. Or someone else can modify the robots.txt without telling you, and this can be a warning that you need to go find someone and get that fixed.
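To make that concrete, here is a minimal Python sketch for checking which Disallow rules catch a given URL. It is an illustration, not SEOmoz's or Google's actual code: the helper names `rule_to_regex` and `blocking_rules` are made up for this example, only Disallow rules are handled (no Allow precedence, no user-agent grouping), and it assumes the Googlebot-style pattern syntax where `*` matches any run of characters and a trailing `$` anchors the end of the URL. The second rule list is also an assumption: it guesses that `/?`, `/.js$`, `/.css$` and `/.php$` in the file above were originally `/*?`, `/*.js$` and so on, with the asterisks lost when the post was rendered.

```python
import re
from urllib.parse import urlsplit


def rule_to_regex(rule: str) -> re.Pattern:
    """Translate one robots.txt path rule into a regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    Everything else is matched literally, as a prefix of the URL path.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def blocking_rules(url: str, disallow: list[str]) -> list[str]:
    """Return every Disallow rule that matches the URL's path and query."""
    parts = urlsplit(url)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    # re.match anchors at the start, which gives prefix matching.
    return [rule for rule in disallow if rule_to_regex(rule).match(target)]


URL = "http://www.site.com/audio-books/audio-books-in-english?audiobook_genre=132"

# The rules exactly as they appear in the post.
AS_POSTED = [
    "/index.php/", "/?", "/.js$", "/.css$", "/checkout/", "/tag/",
    "/catalogsearch/", "/review/", "/app/", "/downloader/", "/js/",
    "/lib/", "/media/", "/.php$", "/pkginfo/", "/report/", "/skin/",
    "/utm", "/var/", "/catalog/", "/customer/",
]

# Assumption: these four rules originally contained a '*' wildcard.
GUESSED = {"/?": "/*?", "/.js$": "/*.js$", "/.css$": "/*.css$", "/.php$": "/*.php$"}
WITH_WILDCARDS = [GUESSED.get(rule, rule) for rule in AS_POSTED]

print("as posted:     ", blocking_rules(URL, AS_POSTED))
print("with wildcards:", blocking_rules(URL, WITH_WILDCARDS))
```

As printed, none of the rules match the example URL. With the wildcard restored, `/*?` matches any URL that has a query string, which would be consistent with the filter pages showing up in the warning. Running a handful of your important URLs through a check like this after every robots.txt change is a cheap way to catch a rule that blocks more than you intended.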
-
So what you're saying is:
1. SEOmoz says these pages can't get indexed by search engines because of your robots.txt
2. We don't want these pages indexed and blocked them using robots.txt
My initial reaction is: no problem. SEOmoz is just showing this as a 'confirmation warning' that these pages are blocked, and since you did that on purpose, it's okay.
Hope this helps!