Roger bot taking a long time to crawl site
-
Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?
thanks a lot, Mark.
-
Hi Peter
Thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.
I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:
User-agent: *
Disallow: /
I hadn't thought beyond this.
I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.
I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.
I know (well think) I have to get noindex, follow for 'sorted' category pages...
all the best, Mark.
-
Hi Mike
The crawl has now completed, thank you. I think the results will keep me occupied.
all the best, Mark.
-
Hi Mark,
Sorry it's taking a while to crawl your new site.
While I'm not exactly sure what the delay is, one of the possible reasons is through your robots.txt. Here's what I see in a short snippet from your robots.txt:
# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
From here, the formatting looks a little awkward. What's going on is that you're telling Roger bot to only look at these:
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use.
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
From here you're telling the crawler to disallow nothing except these directories. Please let us know, once you implement this, whether it actually fixes the crawl.
Thanks for reaching out!
Best,
Peter Li
SEOmoz Help Team
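A robots.txt draft can be sanity-checked locally before it goes live. The sketch below uses Python's standard `urllib.robotparser` to compare a block-everything file with one that only disallows a few directories. The host `www.example.com` and the test paths are made up for illustration, and `rogerbot` is used only as a sample user-agent string (both rule sets apply to `*` anyway).

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text):
    """Build a parser from robots.txt text without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# The "hide the whole site" rules.
BLOCK_ALL = """\
User-agent: *
Disallow: /
"""

# Rules that block only a few directories and set a crawl delay.
DIRECTORIES_ONLY = """\
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /js/
"""

SITE = "http://www.example.com"  # hypothetical host, for illustration only

blocked = parse_rules(BLOCK_ALL)
selective = parse_rules(DIRECTORIES_ONLY)

# Hypothetical paths: home page, a category page, a file under /app/.
for path in ("/", "/some-category/", "/app/etc/config.xml"):
    url = SITE + path
    print(
        path,
        "| block-all allows:", blocked.can_fetch("rogerbot", url),
        "| directories-only allows:", selective.can_fetch("rogerbot", url),
    )

# Seconds a polite crawler is asked to wait between requests.
print("crawl delay:", selective.crawl_delay("rogerbot"))
```

Treat the result as a rough check rather than a guarantee. Python's parser applies rules in file order and does not expand `*` inside paths, so a line such as `Allow: /*?p=` may be handled differently by Roger bot or Googlebot.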
-
Hi Mark,
This sounds like a bug or issue with the SEOmoz software.
Contact [email protected] and ask one of the help associates to look into this for you.
If you do not have many pages, it definitely shouldn't take that long.
The help team responds extremely quickly!
Good luck.
Mike