Google can't access/crawl my site!
-
Hi
I'm dealing with this problem for a few days. In fact i didn't realize it was this serious until today when i saw most of my site "de-indexed" and losing most of the rankings.
[URL Errors: 1st photo]
8/21/14 there were only 42 errors but in 8/22/14 this number went to 272 and it just keeps going up.
The site i'm talking about is gazetaexpress.com (media news, custom cms) with lot's of pages.
After i did some research i came to the conclusion that the problem is to the firewall, who might have blocked google bots from accessing the site. But the server administrator is saying that this isn't true and no google bots have been blocked.
Also when i go to WMT, and try to Fetch as Google the site, this is what i get:
[Fetch as Google: 2nd photo]
From more than 60 tries, 2-3 times it showed Complete (and this only to homepage, never to articles).
What can be the problem? Can i get Google to crawl properly my site and is there a chance that i will lose my previous rankings?
Thanks a lot
Granit -
What did you do specifically to mitigate the problem? You can PM me, if you would like.
-
This applies to the guy from Albania.
Oh, this IS the guy from Albania. Never mind.
-
Great, thanks for letting us know what happened with this!
-
Hi all
Just wanted to let you know that we fixed the problem. We disabled CloudFlare which we found out was blocking Google bots. More about this issue can be found at: https://support.cloudflare.com/hc/en-us/articles/200169806-I-m-getting-Google-Crawler-Errors-What-should-I-do-
-
Hi Travis, thank you for your time.
Great for your friend, I also suggest to visit Kosovo someday, you will have great time here, for sure
Back to the issue:
Here is an interesting issue that is happening with the crawler.
Our own cms uses htaccess for rewrite purposes. I created 2 new files that are independent from CMS and tried to fetch them with WMT, and it worked like a charm.
These 2 independent files are:
www.gazetaexpress.com/test_manaferra.php
www.gazetaexpress.com/xhezidja.php
Then, I created an ajax page with our CMS, which contains only plain text, tried to fetch it by WMT and strangely enough it didn't work. To make sure that the .htaccess file is not affecting this behavior, I deleted the htaccess and tried to fetch it, but it didn't worked.
The ajax page is: www.gazetaexpress.com/page/xhezidja/?pageSEO=false
The site works perfectly for humans which access it via the browser.
I'm more than confused now!
-
A friend of mine just got back from Kosovo. It was the last stop on a tour of the Balkans. He had a pretty good time. Moving along...
I crawled about 12K URLs and hit almost 90 Internal Server Errors (500). It's probably not your core problem, but it's something to look at. Here are a few examples:
http://www.gazetaexpress.com/blihet/?search_category_id=1&searchFilter=1
http://www.gazetaexpress.com/shitet/?category_id=134&searchFilter=1
http://www.gazetaexpress.com/me-qera/?category_id=131&searchFilter=1
There was one actual page that threw a 500 at the time of crawl:
http://www.gazetaexpress.com/mistere/edhe-kesaj-i-thuhet-veze-22591/
The edhe kesaj page now resolves fine. (I'm not even going to pretend to understand or write Albanian.)
So there may be some issues with the server or hosting. If you haven't already, try this troubleshooter from Cloudflare.
-
Ah OK - well keep us updated with what you find. Someone else will chip in with other info if they have some
-Andy
-
We are suspecting that CloudFlare might be causing these troubles. We are trying everything, in the meantime i'm looking here to see if anyone has any similar experience or an idea for solution.
As for warnings, the only warning we had was the one last week (8/23/14) saying that Google bot can't acces our site:
Over the last 24 hours, Googlebot encountered 316 errors while attempting to connect to your site. Your site's overall connection failure rate is 7.5%.
-Granit
-
It doesn't look like a firewall, as I can crawl it with Screaming Frog. However, the server logs will be able to answer that one for you.
Without looking in depth, I'm not seeing anything that stands out to me - do you think that there have been changes to the server that could cause issues? What firewall is the server running? Also, if there were errors in crawling the site, you would see a warning about this.
-Andy
-
In mid-march website changed it's CMS but i don't think that could be the reason because until this week everything was working perfectly. I don't think it could have been compromised too. I'm still suspecting it could be the firewall blocking bots from crawling the site, but the server administrator couldn't find any evidence of this.
-
Hi Granit,
Has any work been done to the site in the last 2-3 months? Have you had any warnings in webmaster tools at all? I did once see a strange problem where Google wasn't crawling a site correctly because it had been compromised, but after checking, there is nothing like this on yours.
-Andy
-
No prb. Thanks a lot for your time. Let just hope that someone in the community will help with a solution
-
Unfortunately, I don't have a quick answer for you. Looking forward to seeing what other community members have to say on this one!
-
I'm looking at the http version in GWT
-
If I do a site:gazetaexpress.com in Google, I get some results that are http, and some results that are https. The https ones say there is an SSL connection error.
Are you looking at the http or https version in GWT?
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages.
Hello, My last site crawl shows over 700 404 errors all with void(0 added to the ends of my posts/pages. I have contacted my theme company but not sure what could have done this. Any ideas? The original posts/pages are still correct and working it just looks like it did duplicates and added void(0 to the end of each post/page. Questions: There is no way to undo this correct? Do I have to do a redirect on each of these? Will this hurt my rankings and domain authority? Any suggestions would be appreciated. Thanks, Wade
Intermediate & Advanced SEO | | neverenoughmusic.com0 -
How can I make a list of all URLs indexed by Google?
I started working for this eCommerce site 2 months ago, and my SEO site audit revealed a massive spider trap. The site should have been 3500-ish pages, but Google has over 30K pages in its index. I'm trying to find a effective way of making a list of all URLs indexed by Google. Anyone? (I basically want to build a sitemap with all the indexed spider trap URLs, then set up 301 on those, then ping Google with the "defective" sitemap so they can see what the site really looks like and remove those URLs, shrinking the site back to around 3500 pages)
Intermediate & Advanced SEO | | Bryggselv.no0 -
Incoming links which don't exists...
I believe our site is being penalized/held back in rankings, and I think this is why... We placed an advert on a website which they didn't make "no follow" so we had hundreds of site-wide links coming into our site. We asked them to remove the advert which they did. This was 4 months ago, and the links are still showing in GWMT. We have look into their pages which GWMT is saying still link to us, but these a number pages aren't being indexed by Google, and others aren't being cached. Is it possible that because Google cant find these pages, it can tell our link has been removed? And/or are we being penalized for this? Many thanks
Intermediate & Advanced SEO | | jj34341 -
Site Indexed by Google but not Bing or Yahoo
Hi, I have a site that is indexed (and ranking very well) in Google, but when I do a "site:www.domain.com" search in Bing and Yahoo it is not showing up. The team that purchased the domain a while back has no idea if it was indexed by Bing or Yahoo at the time of purchase. Just wondering if there is anything that might be preventing it from being indexed? Also, Im going to submit an index request, are there any other things I can do to get it picked up?
Intermediate & Advanced SEO | | dbfrench0 -
Why is Google Still Penalizing My Site?
We got hit pretty hard by Penguin. There were some bad link issues which we've cleared up and we also had a pretty unique situation stemming from about a year ago when we changed the name of the company and created a whole new site with similar content under a different URL. We used the same phone number and address, and left the old site up as it was still performing well. Google didn't care for that so we eventually used 301 redirects to push the link juice from the old site to the new site. That's the background, here's the problem...... We've partially recovered, but there are several keywords that haven't come back anywhere near where they were in Google. We have higher page rank and more links than our competition and are performing in the top 5 for some of our keywords. Other, similar keywords, where we used to be in the top 5, we are now down on page 4 or 5. Our website is www.hudsoncabinetrydesign.com. We build custom cabinetry and furniture in Westchester County, NY just north of NYC. Examples - For "custom built-ins new york" we are number 3 on Google, number 1 on Bing/Yahoo. For "custom kitchen cabinetry ny" we are number 3 on Bing/Yahoo, not in the top 50 on Google. For "custom radiator covers ny" we used to be #1 on Google, are currently #48, currently #2 on Bing/Yahoo. Obviously, we've done something to upset the Google, but we've run out of ideas as to what it could be. Any ideas as to what is going on? Thanks so much for your feedback, Doug B.
Intermediate & Advanced SEO | | doug_b0 -
Is it safe to not have a sitemap if Google is already crawling my site every 5-10 min?
I work on a large news site that is constantly being crawled by Google. Googlebot is hitting the homepage every 5-10 minutes. We are in the process of moving to a new CMS which has left our sitemap nonfunctional. Since we are getting crawled so often, I've met resistance from an overwhelmed development team that does not see creating sitemaps as a priority. My question is, are they right? What are some reasons that I can give to support my claim that creating an xml sitemap will improve crawl efficiency and indexing if we are already having new stories appear in Google SERPs within 10-15 minutes of publication? Is there a way to quantify what the difference would be if we added a sitemap?
Intermediate & Advanced SEO | | BostonWright0 -
How can scraper sites be successful post Panda?
I read this article on SEJ: http://www.searchenginejournal.com/scrapers-and-the-panda-update/34192/ And, I'm a bit confused as to how a scraper site can be successful post Panda? Didn't panda specifically target sites that have duplicate content & shouldn't scraper sites actually be suffering?
Intermediate & Advanced SEO | | nicole.healthline0 -
Can't find my site on Bing, since ages
Hi Guys, Well, the problem seems normal but I guess it's not. I have tried many things, and nothing changed it, now I give it last try... ask so maybe you will help me. The problem is.. I can't find my site nowhere in Bing, I mean nowhere by not in first 20 pages for my keywords "beauty tips" and the site is: http://www.beauty-tips.net/. In my opinion it should be pretty high... maybe it's too high so I can't see it ;). I never had special problems with Bing, was easier to be there "somewhere" than in google, but with this one is totally opposite. Any ideas? Thanks for your time!
Intermediate & Advanced SEO | | Luke220