Roger bot taking a long time to crawl site

caterfor

Hi all, I've noticed Roger bot is taking a long time to crawl my new site. It started on the 28th Feb 2013 and is still going. There aren't many pages at the moment. Any ideas please?

thanks a lot, Mark.

caterfor

Hi Peter

thanks for your reply. The crawl has now completed and given me some more areas to work on. It's a great tool.

I was so preoccupied with 'hiding' the site over the last couple of months with the easy code:

User-agent: *
Disallow: /

I hadn't thought beyond this.

I've noticed Google has now recognised the new robots.txt, which has allowed the sitemap to be accepted.

I'll look at your notes, thank you, and work out my next move. I'll let you know how I get on too.

I know (well think) I have to get noindex, follow for 'sorted' category pages...

all the best, Mark.

caterfor

Hi Mike

The crawl has now completed, thank you. I think the results will keep me occupied.

all the best, Mark.

Peterli

Hi Mark,

Sorry it's taking a while to crawl your new site.

While I'm not exactly sure what is causing the delay, one possible reason is your robots.txt. Here's a short snippet of what I see in it:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/
# Directories
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
Disallow: /lib/
Disallow: /magento/
Disallow: /pkginfo/
Disallow: /report/
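One detail in the snippet above that may be worth a look is `Crawl-delay: 30`. As a rough sketch, assuming a crawler treats the directive as a 30-second wait between requests and fetches one page at a time (behaviour varies between crawlers, and this is not a description of how Roger bot works internally), the minimum crawl time grows linearly with the number of pages. The function name and the page counts are hypothetical, for illustration only:

```python
from datetime import timedelta


def min_crawl_time(page_count: int, crawl_delay_seconds: int) -> timedelta:
    """Lower bound on crawl duration for a single-threaded, polite crawler.

    Assumes one request at a time and a full delay between consecutive
    requests; fetch and processing time are ignored.
    """
    if page_count <= 0:
        return timedelta(0)
    # N pages need N - 1 waits between them.
    return timedelta(seconds=(page_count - 1) * crawl_delay_seconds)


if __name__ == "__main__":
    for pages in (100, 1_000, 10_000):  # hypothetical site sizes
        print(pages, "pages ->", min_crawl_time(pages, 30))
```

Under these assumptions, even a modest site takes hours rather than minutes, so lowering the delay is one thing to try if a crawl seems slow.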

In this snippet, the formatting looks a little awkward. It reads as though you're telling Roger bot to look only at these:

# Allowable Index
Allow: /*?p=
Allow: /index.php/blog/
Allow: /catalog/seo_sitemap/category/
Allow: /catalogsearch/result/
Allow: /media/

While the syntax is OK, not every crawler out there will follow the Allow directive. Here's an example of something you can use instead:

# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/

With this version, you're telling the crawler to disallow nothing except these directories. Once you've implemented it, please let us know whether it actually fixes the crawl.
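Before uploading a new robots.txt, it can help to check locally which paths it blocks. Here is a minimal sketch using Python's standard `urllib.robotparser`, assuming a rule set that lists only the specific directories, with no site-wide `Disallow: /` line. The `load_rules` helper and the sample paths are hypothetical, and Python's parser is only an approximation of how any particular crawler reads the file:

```python
from urllib.robotparser import RobotFileParser

# Directory-only rules: nothing is blocked except the listed folders.
ROBOTS_TXT = """\
# Crawlers Setup
User-agent: *
Crawl-delay: 30
Disallow: /404/
Disallow: /app/
Disallow: /cgi-bin/
Disallow: /downloader/
Disallow: /errors/
Disallow: /includes/
Disallow: /js/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


if __name__ == "__main__":
    rules = load_rules(ROBOTS_TXT)
    # "rogerbot" has no group of its own here, so the "*" group applies.
    for path in ("/", "/index.php/blog/", "/app/etc/", "/js/"):
        print(path, rules.can_fetch("rogerbot", path))
    print("crawl delay:", rules.crawl_delay("rogerbot"))
```

As far as I'm aware, the standard-library parser does plain prefix matching and does not expand wildcards such as `/*?p=`, so a check like this is best kept to simple directory rules.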

Thanks for reaching out!

Best,

Peter Li
SEOmoz Help Team

Mike.Goracke

Hi Mark,

This sounds like a bug or issue with the SEOmoz software.

Contact [email protected] and ask one of the help associates to look into this for you.

If you do not have many pages, it definitely shouldn't take that long.

The help team responds extremely quickly!

Good luck.

Mike
