GWT False Reporting or GoogleBot has weird crawling ability?

baldnut

Hi I hope someone can help me.

I have launched a new website and am trying hard to make everything perfect. I have been using Google Webmaster Tools (GWT) to check that everything is as it should be, but the crawl errors it reports do not match my site. I mark them as fixed, and when I check again the next day it reports the same or similar errors.

Example:

http://www.mydomain.com/category/article/ (this would be a correct structure for the site).

GWT reports:

http://www.mydomain.com/category/article/category/article/ 404 (it does not exist, never has and never will). I have visited the pages listed as linking to this URL and they do not contain links in this form. I have checked the page source, and all links on those pages have the correct structure, so I cannot replicate this type of crawl.

This happens across most of the site. I have a few hundred pages, all ending in a trailing slash, and most of them are reported in this manner. That makes it look like I have close to 1,000 404 errors, even though I cannot replicate this crawl using many different methods.

The site uses an .htaccess file with redirects and a rewrite condition.

Rewrite Condition:

# Need to redirect when no trailing slash
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !\.(html|shtml)$
RewriteCond %{REQUEST_URI} !(.*)/$
RewriteRule ^(.*)$ /$1/ [L,R=301]

The above condition forces the trailing slash on folders.
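To make the rule's behaviour easier to reason about, here is a rough Python sketch of what it appears to do. It assumes the rule is meant to match any path (`^(.*)$`) and to skip `.html`/`.shtml` files. The function name and the `existing_files` set are hypothetical stand-ins, with the set modelling the `!-f` file check.

```python
import re

# Assumed intent of the .htaccess rule: match any path, skip .html/.shtml
# files, and skip paths that already end in a slash.
HTML_SUFFIX = re.compile(r"\.(html|shtml)$")


def trailing_slash_redirect(path, existing_files=frozenset()):
    """Return the 301 target for `path`, or None if no redirect applies."""
    if path in existing_files:      # models: RewriteCond %{REQUEST_FILENAME} !-f
        return None
    if HTML_SUFFIX.search(path):    # models: the (html|shtml) exclusion
        return None
    if path.endswith("/"):          # models: RewriteCond %{REQUEST_URI} !(.*)/$
        return None
    return path + "/"               # models: RewriteRule ^(.*)$ /$1/ [L,R=301]


for p in ["/category/article", "/category/article/", "/article.html"]:
    print(p, "->", trailing_slash_redirect(p))
```

Under these assumptions the rule only ever appends a slash to the path it was given, so on its own it would not be expected to produce a doubled path such as /category/article/category/article/.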

Then we are using redirects in this manner:

Redirect 301 /article.html http://www.domain.com/article/

In addition to the above, we had a development site while I was building the new site, at http://dev.slimandsave.co.uk. It was spidered without my knowledge, and by the time I noticed it was too late. So when I put the site live I left the development domain in place (http://dev.domain.com) and redirected it like so:

<IfModule mod_rewrite.c>
RewriteEngine on
RewriteRule ^ - [E=protossl]
RewriteCond %{HTTPS} on
RewriteRule ^ - [E=protossl:s]

RewriteRule ^ http%{ENV:protossl}://www.domain.com%{REQUEST_URI} [L,R=301]
</IfModule>
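As a rough illustration of what this block appears to do, the Python sketch below builds the redirect target the same way: `protossl` is empty by default and becomes `s` when HTTPS is on, and the original request URI is carried over unchanged. The function name and its parameters are hypothetical and not part of the site.

```python
def dev_redirect_target(request_uri, https_on):
    """Sketch of the dev-domain redirect target (names are illustrative)."""
    # E=protossl sets an empty value; E=protossl:s sets "s" when HTTPS is on.
    protossl = "s" if https_on else ""
    # The request URI is appended as-is, so the path is preserved one-to-one.
    return f"http{protossl}://www.domain.com{request_uri}"


print(dev_redirect_target("/category/article/", https_on=False))
print(dev_redirect_target("/category/article/", https_on=True))
```

If this reading is right, every dev URL maps to the same path on the live domain, so this redirect would only pass on odd paths that had already been requested on the dev host.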

Is there anything that I have done that would cause this type of redirect 'loop'?

Any help greatly appreciated.

CommT

Yeah - do this!

baldnut

Does anyone have any thoughts on this?

baldnut

Sorry, I should also add that the URL structure Google generates looks like this:

http://www.domain.com/category/article/

http://www.domain.com/category/article/same-category/differentarticle/

http://www.domain.com/category/article/same-category/another-different-article/

http://www.domain.com/category/article/another-different-category/differentarticle/

etc. It is as if it reaches a category article, then moves sideways and somehow appends the new path onto the current URL instead of replacing the existing path.

Whebb

It doesn't sound like GWT is reporting falsely. You may want to check your trailing-slash URL rewrite. There seems to be an issue there: what you are describing sounds like URLs are being written incorrectly, which generates the incorrect URLs that show up in GWT.

Your 301 looks OK. If the dev site was spidered and indexed, you should add it to GWT, use the URL removal tool to remove it from the index, and then remove the site and redirect.
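As one possible illustration of how URLs of this shape can arise (an assumption, not something confirmed for this site): if any link is emitted without a leading slash, a crawler resolves it against the current page's URL. The hrefs in this Python sketch are hypothetical examples.

```python
from urllib.parse import urljoin

# Page the crawler is currently on.
page = "http://www.domain.com/category/article/"

# Hypothetical hrefs: one relative (no leading slash), one root-relative.
relative_href = "same-category/differentarticle/"
root_relative_href = "/same-category/differentarticle/"

resolved_relative = urljoin(page, relative_href)
resolved_root = urljoin(page, root_relative_href)

print(resolved_relative)
# http://www.domain.com/category/article/same-category/differentarticle/
print(resolved_root)
# http://www.domain.com/same-category/differentarticle/
```

The first result has the same shape as the 404 URLs being reported, so it may be worth searching templates, scripts and sitemaps for links that lack a leading slash.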
