Robots.txt: Moz is crawling URLs we don't want it to
-
Hello
We run a number of websites, and underneath them we have testing websites (subdomains). On those sites we have a robots.txt disallowing everything. When I logged into Moz this morning I could see that the Moz spider had crawled our test sites even though we have told it not to.
Does anyone have any ideas on how we can stop this happening?
-
Hi there!
Thanks for reaching out to us! I am sorry if Roger is somehow not following your robots.txt directives. To ensure that Roger doesn't crawl your site you can put the following directive above your general directives in your robots.txt:
User-agent: rogerbot
Disallow: /
Once this is in place you should find our crawler to be a lot more obedient towards your site.
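If you want to sanity-check the file before the next crawl, a quick local test can help. The following is a minimal sketch using Python's standard `urllib.robotparser`. The host `test.example.com`, the file contents, and the helper name `is_blocked` are placeholders for illustration, and Python's parser is only a stand-in here: it does not guarantee how rogerbot itself reads the file.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a test subdomain: the rogerbot group sits
# above the general group, as suggested in the reply.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /

User-agent: *
Disallow: /
"""


def is_blocked(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt forbids user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


# test.example.com is a placeholder; substitute one of your test subdomains.
print(is_blocked(ROBOTS_TXT, "rogerbot", "https://test.example.com/any/page"))
```

Keep in mind that robots.txt is read per host, so each test subdomain needs to serve its own copy of the file. It is also worth checking the spelling of every directive: Python's parser silently ignores a line it does not recognise (such as a misspelled `Disallow`), and it is safer not to rely on a crawler being more forgiving.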
Hope this helps. Please let us know if you have any more questions about our crawler.
Best,
Peter
Moz Help Team.