Big problem with my new crawl report
-
I am the owner of a small OpenCart online store. I installed http://www.opencart.com/index.php?route=extension/extension/info&extension_id=6182&filter_search=seo. Today my new crawl report is awful: errors are up to 520 (30 before), warnings are up to 1,000 (120 before), and notices are up to 8,000 (1,000 before). I noticed that the problem is with search: there is a lot of duplicate content on the search pages only. What should I do?
-
Thank you again, Alan.
Typo fixed.
-
I use the Bing search API.
By the way, you want to change from GET to POST, not the other way around.
-
Alan,
Thank you for the great advice. If one has enough control over the eCommerce system, or the internal site search product, to change from GET to POST, so that these pages act more like real, dynamically generated "search pages" than an infinite number of "landing pages", then I think that is a fantastic solution. It would keep merchandisers and others from linking to those pages, because we all know that they will continue to do so even if the SEO pleads on hands and knees for them to stop.
However, I have found that most eCommerce businesses (from small mom-and-pop shops to Fortune 500 companies) don't have the ability to do this, because the internal site search functionality they use is out of their hands. Site search vendors like Endeca and Celebros, serving enterprise eCommerce businesses, don't typically hand over the keys to the client.
If you know of any site search vendors or solutions that allow one to do this, it would make a great contribution to this thread if you could share a few of them. I'd definitely look into recommending them in the future!
Thanks again!
-
The problem with PR leaks is that they scale: if you are losing 10%, then when you get some quality links, 10% of them will be wasted, and every effort you make in the future will be discounted by 10%.
There are ways to fix all of these problems. For example, I would make the search form submit via POST instead of GET, so that links to search pages cannot be created and, therefore, search pages will not get indexed.
We work so hard to get good links; why waste them once you have them?
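To illustrate, here is a minimal sketch of the change; the form action and field name are hypothetical stand-ins, not OpenCart's actual template markup:

<!-- Before: a GET form puts the query in the URL, so every search
     produces a unique, linkable, crawlable address like
     /product/search&filter_tag=rings -->
<form action="/product/search" method="get">
  <input type="text" name="filter_tag">
  <button type="submit">Search</button>
</form>

<!-- After: a POST form sends the query in the request body, so there is
     no unique URL to link to and crawlers will not submit it -->
<form action="/product/search" method="post">
  <input type="text" name="filter_tag">
  <button type="submit">Search</button>
</form>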
-
I have tried different methods to fix this. First-hand experience tells me that oftentimes it is better to just block the paths from being crawled using robots.txt (assuming there is better navigation on the site) than to use a noindex,follow tag in order to save the PageRank you're sending via internal links. It is very easy for Google to get bogged down crawling around in the internal search results area.
Unless there are lots of links to search pages from top pages on the site, or a big list of search page links on every page (a sitewide footer, for example), I really don't think the waste of internal PageRank is noticeable in the rankings, or worth salvaging if it risks sending spiders into a maze or a trap.
Yes, best practice is not to link to pages that you are blocking. In the real world, though, search pages can be very useful to visitors, and merchandisers who don't have the ability to create more targeted sub-sub-sub-categories will often use them, and link to them on the site, as landing pages for promotional purposes (emails, PPC, sales...).
Everyone has their own strategies, and all we can do is make recommendations based on our own experience and knowledge. Thanks for helping out with this question, Alan. Feel free to elaborate so Anastas has more input to help guide his decision.
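For reference, the two approaches being weighed look roughly like this; a minimal sketch, with /product/search standing in for whatever the real search path is:

# Option 1: robots.txt - keeps crawlers out of the search area entirely,
# but any internal PageRank flowing into those URLs is stranded there
User-agent: *
Disallow: /product/search

<!-- Option 2: a meta tag on each search results page - the pages still
     get crawled (so crawl budget is still spent), but they stay out of
     the index while passing link equity onward through their links -->
<meta name="robots" content="noindex, follow">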
-
As long as no one is linking to the search pages, including internal links.
-
Hello Anastas,
I agree that you should block the search folder from being indexed. I'm going to assume that nobody is linking to your search pages and that you have other paths (e.g. SEO-friendly navigation, sitemaps...) for search engines to use to access your products.
I don't understand why you have formatted the disallow statement that way, however. Unless I'm missing something (and I could be, since I don't know what your site is), you only need to do this:
Disallow: /product/search*
And of course, after doing this you should test it in GWT to make sure that A) you ARE blocking the pages you want to block, such as search pages with lots of parameters, and B) you are NOT blocking pages you don't want to block, such as product pages. Here is more info on where to find the testing tool in GWT if you don't know: http://productforums.google.com/forum/#!topic/webmasters/tbikAxJiIZ4
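For example, with that one rule in place, the tester should report something like this (the category and product URLs here are made up for illustration):

Blocked:  /product/search&filter_tag=necklaces
Blocked:  /product/search&filter_tag=%D0%B1%D0%B8%D0%B6%D1%83%D1%82%D0%B0
Allowed:  /category1/product-name
Allowed:  /product/silver-necklace

Note that a product whose URL happened to begin with /product/search (say, a hypothetical /product/searchlight) would also be caught, which is exactly the kind of surprise the tester helps you find.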
Let us know how it goes. Good luck.
-
Please, I need help.
-
I am using OpenCart. I don't know what to do. Before, I had 50 errors; now there are more than 500 after this plug-in. The plug-in removed the previous errors, but now there are many different ones. I have 2 options:
1. Remove the plug-in
2. Do something about the new errors. The new errors are only because of search: I have duplicate page content because when you type PRODUCT NAME in the search box, you get the same content as www.mydomain.com/category1/PRODUCT NAME
Maybe this plug-in removed the canonical URLs in search, or I don't know what.
In robots.txt there is this row:
Disallow: /*?route=product/search
The duplicate content is at mydomain.com/product/search&filter_tag=XXXXXX
Instead of XXXXXX there are many different values.
I decided to add another row to robots.txt:
Disallow: /*?route=product/search&filter_tag=/
Do you think that is correct, or should I remove the plug-in?
I hope you understand what the problem is.
-
When you noindex a page, any links pointing to that page pour link juice away from your indexed pages. You should never noindex pages, IMO.
I assume you are using a CMS or some sort of plug-in; this is a common cost when you do. CMSes create very untidy code, which is not good for SEO.
-
The URLs are: /product/search&filter_tag=%D0%B1%D0%B8%D0%B6%D1%83%D1%82%D0%B0
After the = there are a lot of different combinations (URL-encoded search terms). Is it correct to put this in robots.txt?
Disallow: /*?route=product/search&filter_tag=/
-
Should I disallow search (in robots.txt)?