Should I set up a disallow in the robots.txt for catalog search results?
-
When the crawl diagnostics came back for my site, they showed around 3,000 pages of duplicate content. Almost all of them are catalog search results pages. I also did a site search on Google, and they have most of the results pages in their index too. I think I should just disallow bots from the /catalogsearch/ subfolder, but I'm not sure whether this will have any negative effect.
-
One step at a time = long-term success. I wish you the best with it, Jordan.
-
Thanks Alan, you are right: this site has quite a long way to go. The first crawl just finished, and I noticed that most of the errors were due to dupe content, so I decided I would try to tackle that first. Thank you for all the pointers. I will be taking a look at all of them as soon as I can.
-
Totally agree with Alan. Crawlable search results pages can cause circular navigation problems for crawlers too.
-
Jordan,
Others might have a different view, but that's exactly what I recommend to clients, though only if you've got other HTML link-based ways for bots to get to all the content directly, and a good sitemap.xml file to reinforce that.
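A minimal sketch of what that disallow could look like, with a quick sanity check before publishing it. The /catalogsearch/ path comes from the question; the search URL pattern and the sitemap location are assumptions for illustration. Python's standard `urllib.robotparser` is used only as a stand-in for a real crawler, which may interpret rules slightly differently.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body. Only the /catalogsearch/ path is taken from
# the question; the Sitemap URL is an assumed location.
ROBOTS_TXT = """\
User-agent: *
Disallow: /catalogsearch/
Sitemap: http://www.durafaucet.com/sitemap.xml
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# Assumed example of a catalog search results URL (the query string is made up).
search_allowed = parser.can_fetch(
    "*", "http://www.durafaucet.com/catalogsearch/result/?q=faucet"
)
# A product page that should stay crawlable.
product_allowed = parser.can_fetch("*", "http://www.durafaucet.com/mk850-orb.html")

print("search results crawlable:", search_allowed)
print("product page crawlable:", product_allowed)
```

Keep in mind that a Disallow rule stops compliant bots from crawling those URLs. It does not by itself remove pages that are already in the index.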
I am happy to see that you have a sound overall site architecture. However, I see no robots.txt file at your root, so I'm not sure what's up with that. Also, your sitemap.xml file only has 43 URLs in it. That's a problem not because Google can't find content by other means; it's just that I've found Google likes that reinforcement, and Bing especially does a better job discovering content with a proper sitemap.xml submitted through their webmaster system (they're less efficient at discovering content by other means).
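For the sitemap side, here is a rough sketch of generating a fuller sitemap.xml from a list of URLs using only Python's standard library. The two URLs are the product pages mentioned in this thread; in practice the list would come from the store's catalog, and the `build_sitemap` helper is a hypothetical name, not part of any platform.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return sitemap XML (as a string) with one <url><loc> entry per URL."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Illustrative input only: a real export would list every product and category page.
sitemap_xml = build_sitemap([
    "http://www.durafaucet.com/mk850-orb.html",
    "http://www.durafaucet.com/kitchen-faucets/mk850.html",
])
print(sitemap_xml)
```

The output can be saved as sitemap.xml at the site root and submitted through each search engine's webmaster tools.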
I'd also suggest that you have a big push ahead of you in dealing with near-duplicate content.
For example:
http://www.durafaucet.com/mk850-orb.html
http://www.durafaucet.com/kitchen-faucets/mk850.html
Sure, these are unique products. But there's already so little unique content on either page that the shared content, compounded by the site-wide replication of top, sidebar and footer content, leaves the total weight of uniqueness at the very minor end of the spectrum.
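One rough way to put a number on that "weight of uniqueness" is to compare the visible text of two pages using word shingles and Jaccard similarity. This is only a sketch of one common technique, not how any search engine actually scores duplication, and the two page texts below are made-up placeholders rather than content from the real pages.

```python
def word_shingles(text, size=3):
    """Return the set of overlapping word n-grams ("shingles") in the text."""
    words = text.lower().split()
    last_start = max(len(words) - size, 0)
    return {tuple(words[i:i + size]) for i in range(last_start + 1)}


def jaccard_similarity(text_a, text_b, size=3):
    """Share of shingles the two texts have in common (0.0 to 1.0)."""
    a = word_shingles(text_a, size)
    b = word_shingles(text_b, size)
    return len(a & b) / len(a | b)


# Placeholder texts: shared template wording wrapped around a short unique part.
PAGE_A = ("Shared header and navigation text. Faucet model one short description. "
          "Shared footer and contact text.")
PAGE_B = ("Shared header and navigation text. Faucet model two short description. "
          "Shared footer and contact text.")

score = jaccard_similarity(PAGE_A, PAGE_B)
print(f"similarity: {score:.2f}")
```

The closer the score is to 1.0, the more of the page is shared template text rather than unique product content.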
And then there's the issue of a complete lack of inbound link authority. OpenSiteExplorer.org might be wrong, but it currently shows almost no inbound links. Not only will you need inbound links to the home page, but also to as many inner pages as you can realistically manage. This is especially true for category-level pages. Aim for a variety of inbound link anchor text too: brand, domain, keyword phrase and generic text.
So if you don't address those types of issues, removing all the dupes that show up in search now won't deliver as much long-term value as you'll need.