GWT False Reporting or GoogleBot has weird crawling ability?

baldnut

Hi I hope someone can help me.

I have launched a new website and am trying hard to make everything perfect. I have been using Google Webmaster Tools (GWT) to check that everything is as it should be, but the crawl errors it reports do not match my site. I mark them as fixed, and when I check again the next day it reports the same or similar errors.

Example:

http://www.mydomain.com/category/article/ (this would be a correct structure for the site).

GWT reports:

http://www.mydomain.com/category/article/category/article/ 404 (it does not exist, never has and never will). I have been to the pages listed as linking to this URL and they do not contain links of this form. I have checked the page source, and all links on those pages have the correct structure; I cannot reproduce this type of crawl.

This happens across most of the site. I have a few hundred pages, all ending in a trailing slash, and most of them are reported in this manner, so it looks like I have close to 1,000 404 errors, yet I have not been able to replicate this crawl using many different methods.

The site uses an .htaccess file with redirects and a rewrite condition.

Rewrite Condition:

# Need to redirect when no trailing slash

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !\.(html|shtml)$
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]

The above condition forces the trailing slash on folders.
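To make the intent of that rule easier to reason about, here is a rough Python model of it. It is only a sketch, not how Apache evaluates rules: it assumes the patterns are meant to read `\.(html|shtml)$`, `(.*)/$` and `^(.*)$`, and the function name `trailing_slash_redirect` and the `existing_files` set (standing in for the `!-f` file check) are made up for illustration.

```python
import re

def trailing_slash_redirect(path, existing_files=frozenset()):
    """Return the 301 target for `path`, or None if the rule would not fire.

    Hypothetical model of the trailing-slash rule; `existing_files`
    stands in for Apache's `!-f` test against the filesystem.
    """
    # RewriteCond %{REQUEST_FILENAME} !-f : skip requests for real files
    if path in existing_files:
        return None
    # RewriteCond ... !\.(html|shtml)$ : skip .html / .shtml requests
    if re.search(r"\.(html|shtml)$", path):
        return None
    # RewriteCond %{REQUEST_URI} !(.*)/$ : skip paths that already end in a slash
    if path.endswith("/"):
        return None
    # RewriteRule ^(.*)$ /$1/ [L,R=301] : append the slash and redirect
    return "/" + path.lstrip("/") + "/"

print(trailing_slash_redirect("/category/article"))    # redirected, slash added
print(trailing_slash_redirect("/category/article/"))   # left alone
print(trailing_slash_redirect("/article.html"))        # left alone
```

Under these assumptions the rule only ever appends a single slash, so on its own it would not be expected to produce a doubled path such as /category/article/category/article/.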

Then we are using redirects in this manner:

Redirect 301 /article.html http://www.domain.com/article/

In addition to the above, we had a development site while I was building the new site, at http://dev.slimandsave.co.uk. It was spidered without my knowledge until it was too late, so when I put the site live I left the development domain in place (http://dev.domain.com) and redirected it like so:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^ - [E=protossl]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=protossl:s]

RewriteRule ^ http%{ENV:protossl}://www.domain.com%{REQUEST_URI} [L,R=301]
</IfModule>

Is there anything that I have done that would cause this type of redirect 'loop'?

Any help greatly appreciated.

CommT

Yeah - do this!

baldnut

Anyone any thoughts on this?

baldnut

Sorry, I should also add that the URL structure Google generates is like this:

http://www.domain.com/category/article/

http://www.domain.com/category/article/same-category/differentarticle/

http://www.domain.com/category/article/same-category/another-different-article/

http://www.domain.com/category/article/another-different-category/differentarticle/

etc. It is as if it reaches a category article, then moves sideways and somehow appends the new path to the current URL rather than replacing the current path.

Whebb

It doesn't sound like GWT is reporting falsely. You may want to check your trailing-slash URL rewrite. There seems to be an issue there: what you describe sounds like URLs being written incorrectly, which generates the bad URLs that show up in GWT.

Your 301 looks OK. If the dev site was spidered and indexed, you should add it to GWT, use the URL removal tool to remove it from the index, and then remove the site and redirect.
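One common way URLs of this shape arise, offered here only as a possibility to rule out, is a link whose href has no leading slash: a crawler resolves it against the current page, so the path gets appended rather than replaced. The sketch below uses Python's standard `urljoin` to show the difference; the hrefs are made-up examples, not links taken from the site.

```python
from urllib.parse import urljoin

# Page the crawler is currently on (example URL from the thread).
page = "http://www.domain.com/category/article/"

def resolve(href, base=page):
    """Resolve an href against the page it appears on, as a crawler would."""
    return urljoin(base, href)

# Root-relative href (leading slash): the path replaces the current one.
print(resolve("/same-category/differentarticle/"))

# Document-relative href (no leading slash): the path is appended to the
# current directory, giving the "doubled" structure seen in the crawl errors.
print(resolve("same-category/differentarticle/"))
print(resolve("category/article/"))
```

If any template, menu or script outputs hrefs without the leading slash, the second form would be worth searching for in the rendered source.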
