Robots.txt
-
Hello Everyone,
I'm not sure where the robots.txt file should live on our server.
We have our main domain (company.com) with a robots.txt file in the root of the site, but we also have our blog (company.com/blog), where we're trying to disallow certain directories from being crawled for SEO purposes.
Does the blog in the sub-directory still need its own robots.txt, or can I list the blog directories I don't want crawled in the root robots.txt file?
Thanks for your insight on this matter.
-
Thanks John & Naghimiac,
Both your responses helped me understand the robots.txt file and the proper ways of implementing it.
Thanks again for all your help!
-
The bots won't care about that. If you have your site on www.company.com, your robots.txt will reside at www.company.com/robots.txt, and its directives will apply to any pages living under www.company.com. When a bot comes to www.company.com/blog, it'll look for the robots.txt at www.company.com/robots.txt to see if it's allowed to crawl there. It won't look in a subdirectory. A robots.txt file always resides at the root level of the host.
If you had your blog at blog.company.com instead of company.com/blog, then you would have to have a separate robots.txt at blog.company.com/robots.txt. As you have your blog in a subdirectory rather than a subdomain, one robots.txt is all you need.
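To make the rule above concrete, here is a minimal Python sketch of how a crawler works out which robots.txt to consult for a given page. The helper name robots_txt_url and the example page paths are made up for illustration; only the hostnames come from this thread.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper)."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path with /robots.txt,
    # and drop any query string or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A blog in a subdirectory shares the main site's robots.txt.
print(robots_txt_url("http://www.company.com/blog/some-post/"))

# A blog on a subdomain is a different host, so it has its own robots.txt.
print(robots_txt_url("http://blog.company.com/some-post/"))
```

The first call resolves to www.company.com/robots.txt and the second to blog.company.com/robots.txt, which is why a subdirectory blog needs no file of its own while a subdomain blog does.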
-
Thanks Naghimiac,
Your link is a very useful resource, but I was looking for something more specific to blogs in a sub-directory. I know that by default WordPress has its own .htaccess file in the root of the blog directory, and I have a separate .htaccess file in the root of my main domain. This is why I was thinking the blog needed its own robots.txt file.
Is robots.txt only ever read from the root level of the main domain, even if a blog is in a sub-directory?
-
You only need one robots.txt file, in the root directory of your main domain, and it applies to the whole website.
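As a rough illustration, the sketch below uses Python's standard urllib.robotparser to check blog URLs against a root-level robots.txt. The two Disallow paths (/blog/wp-admin/ and /blog/tag/) and the sample URLs are assumptions chosen for the example, not directories named in this thread.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical contents of http://www.company.com/robots.txt.
# The blog directories are listed with their full path from the site root.
ROOT_ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/wp-admin/
Disallow: /blog/tag/
"""

parser = RobotFileParser()
parser.parse(ROOT_ROBOTS_TXT.splitlines())


def is_allowed(url: str) -> bool:
    """Report whether a generic crawler may fetch url under the rules above."""
    return parser.can_fetch("*", url)


for url in (
    "http://www.company.com/blog/my-first-post/",
    "http://www.company.com/blog/tag/seo/",
    "http://www.company.com/blog/wp-admin/options.php",
    "http://www.company.com/products/",
):
    print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

With these assumed rules, the two URLs under /blog/tag/ and /blog/wp-admin/ should be reported as blocked and the other two as allowed, all from the single root file.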
If you want more information about robots.txt, there is a very good post from Lindsay: http://www.seomoz.org/blog/robot-access-indexation-restriction-techniques-avoiding-conflicts
With this I think it will be easier for you to go pro with robots.txt files. Good luck!