XML Sitemap instruction in robots.txt = Worth doing?
-
Hi fellow SEOs,
Just a quick one: I was reading a few guides on Bing Webmaster Tools and found that you can use the robots.txt file to point crawlers/bots to your XML sitemap (they don't look for it by default).
I was just wondering if it would be worth creating a robots.txt file purely for the purpose of pointing bots to the XML sitemap?
I've submitted it manually to Google and Bing Webmaster Tools, but I was thinking more of the other bots (e.g. Mozbot, the SEOmoz bot?).
Any thoughts would be appreciated!
Regards,
Ash
-
Thanks for the answer and link John!
Regards,
Ash
-
I think it's worth it as it should only take a few minutes to set up, and it's good to have a robots.txt, even if it's allowing everything. Put a text file named "robots.txt" in your root directory with:
<code>User-agent: *
Disallow:
Sitemap: http://www.yourdomain.com/none-standard-location/sitemap.xml</code>
Read more about robots.txt here: http://www.seomoz.org/learn-seo/robotstxt.
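If you want to double-check the file before or after uploading it, here is a minimal sketch in Python. It assumes Python 3.8 or later (where the standard library's `urllib.robotparser` gained `site_maps()`); `inspect_robots` is just a hypothetical helper name, and the domain is the same placeholder as in the snippet above. It parses the three lines and reports which sitemap URLs a crawler would see and whether a given page is allowed.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt suggested above (placeholder domain and sitemap path).
ROBOTS_TXT = """\
User-agent: *
Disallow:
Sitemap: http://www.yourdomain.com/none-standard-location/sitemap.xml
"""


def inspect_robots(robots_txt, test_url, user_agent="*"):
    """Return (list of sitemap URLs, whether user_agent may fetch test_url)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap line is present.
    sitemaps = parser.site_maps() or []
    return sitemaps, parser.can_fetch(user_agent, test_url)


sitemaps, allowed = inspect_robots(ROBOTS_TXT, "http://www.yourdomain.com/any-page")
print(sitemaps)  # sitemap URLs declared in the file
print(allowed)   # an empty "Disallow:" blocks nothing
```

To check the live file instead of a string, you could call `parser.set_url(...)` followed by `parser.read()` in place of `parser.parse(...)`.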
-
It is not going to make any difference. Time is better spent fixing the website's crawling and indexing issues.
Related Questions
-
Robots.txt blocked internal resources Wordpress
Hi all, We've recently migrated a Wordpress website from staging to live, but the robots.txt was deleted. I've created the following new one:
User-agent: *
Allow: /
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/plugins/
Disallow: /wp-content/cache/
Disallow: /wp-content/themes/
Allow: /wp-admin/admin-ajax.php
However, in the site audit on SemRush, I now get the mention that a lot of pages have issues with blocked internal resources in robots.txt file. These blocked internal resources are all cached and minified css elements: links, images and scripts. Does this mean that Google won't crawl some parts of these pages with blocked resources correctly and thus won't be able to follow these links and index the images? In other words, is this any cause for concern regarding SEO? Of course I can change the robots.txt again, but will urls like https://example.com/wp-content/cache/minify/df983.js end up in the index? Thanks for your thoughts!
Intermediate & Advanced SEO | Mat_C
-
Yoast XML Sitemap Taxonomies
Hey all, Quick question about taxonomies and sitemap settings. On our e-commerce site, we noindex product tags and post tags. Under the "Taxonomies" settings in Yoast I'm seeing taxonomies such as Product Color (pa_color). Would it be wise to remove such taxonomies from our sitemap if we already include product colors, attributes, etc. in our page titles and product descriptions? Thanks in advance, Andrew
Intermediate & Advanced SEO | mostcg
-
Can I add external links to my sitemap?
Hi, I'm integrating with a service that adds 3rd-party images/videos (owned by them, hosted on their server) to my site. For instance, the service might have tons of pictures/videos of cars; and then when I integrate, I can show my users these pictures/videos about cars I might be selling. But I'm wondering how to build out the sitemap: I would like to include references to these images/videos, so Google knows I'm using lots of multimedia. What's the most white-hat way to do that? Can I add external links to my sitemap pointing to these images/videos hosted on a different server, or is that frowned upon? Thanks in advance.
Intermediate & Advanced SEO | SEOdub
-
Robots.txt Disallowed Pages and Still Indexed
Alright, I am pretty sure I know the answer is "Nothing more I can do here." but I just wanted to double check. It relates to the robots.txt file and that pesky "A description for this result is not available because of this site's robots.txt". Typically people want the URL indexed and the normal Meta Description to be displayed but I don't want the link there at all. I purposefully am trying to robots that stuff outta there. My question is, has anybody tried to get a page taken out of the Index and had this happen; URL still there but pesky robots.txt message for meta description? Were you able to get the URL to no longer show up or did you just live with this? Thanks folks, you are always great!
Intermediate & Advanced SEO | DRSearchEngOpt
-
Robots.txt: Syntax URL to disallow
Has anyone ever experienced "collateral damage" when disallowing URLs? Some old URLs are still present on our website, and while we are cleaning them off the site (which takes time), I would like to prevent them from being indexed through the robots.txt file. The old URL syntax is "/brand//13" while the new one is "/brand/samsung/13" (note that there are 2 slashes in the URL after the word "brand"). Do I risk removing the new, good URLs from the SERPs if I add the line "Disallow: /brand//" to the robots.txt file? I don't think so, but thank you to everyone who can help me clear this up 🙂
Intermediate & Advanced SEO | Kuantokusta
-
Robots.txt error message in Google Webmaster from a later date than the page was cached, how is that?
I have error messages in Google Webmaster that state that Googlebot encountered errors while attempting to access the robots.txt. The last date that this was reported was on December 25, 2012 (Merry Christmas), but the last cache date was November 16, 2012 (http://webcache.googleusercontent.com/search?q=cache%3Awww.etundra.com/robots.txt&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:en-US:official&client=firefox-a). How could I get this error if the page hasn't been cached since November 16, 2012?
Intermediate & Advanced SEO | eTundra
-
Canonical URLs and Sitemaps
We are using canonical link tags for product pages in a scenario where the URLs on the site contain category names, and the canonical URL points to a URL which does not contain the category names. So, the product page on the site is like www.example.com/clothes/skirts/skater-skirt-12345, and also like www.example.com/sale/clearance/skater-skirt-12345 in another category. And on both of these pages, the canonical link tag references a 3rd URL like www.example.com/skater-skirt-12345. This 3rd URL, used in the canonical link tag is a valid page, and displays the same content as the other two versions, but there are no actual links to this generic version anywhere on the site (nor external). Questions: 1. Does the generic URL referenced in the canonical link also need to be included as on-page links somewhere in the crawled navigation of the site, or is it okay to be just a valid URL not linked anywhere except for the canonical tags? 2. In our sitemap, is it okay to reference the non-canonical URLs, or does the sitemap have to reference only the canonical URL? In our case, the sitemap points to yet a 3rd variation of the URL, like www.example.com/product.jsp?productID=12345. This page retrieves the same content as the others, and includes a canonical link tag back to www.example.com/skater-skirt-12345. Is this a valid approach, or should we revise the sitemap to point to either the category-specific links or the canonical links?
Intermediate & Advanced SEO | 379seo
-
Does using robots.txt to block pages decrease search traffic?
I know you can use robots.txt to tell search engines not to spend their resources crawling certain pages. So, if you have a section of your website that is good content, but is never updated, and you want the search engines to index new content faster, would it work to block the good, unchanged content with robots.txt? Would this content lose any search traffic if it were blocked by robots.txt? Does anyone have any available case studies?
Intermediate & Advanced SEO | nicole.healthline