Sitemap Rules
-
Hello there,
I have some questions pertaining to sitemaps that I would appreciate some guidance on.
1. Can an XML sitemap contain URLs that are blocked by robots.txt? Logically, it makes sense to me to not include pages blocked by robots.txt but would like some clarity on the matter i.e. will having pages blocked by robots.txt in a sitemap, negatively impact the benefit of a sitemap?
2. Can a XML sitemap include URLs from multiple subdomains? For example:
http://www.example.com/www-sitemap.xml would include the home page URL of two other subdomains i.e. http://blog.example.com/ & http://blog2.example.com/
Thanks
-
Theoretically, if the URL is blocked by robots.txt it should not appear in the index results no matter if they are in the sitemap but I have seen URLs indexed that are blocked by robots.txt but are in the sitemap and have good links pointing to it. If you want to block pages that have good links pointing to them, my advice is to remove them from sitemap. #justathought.
About URLs from multiple domains, I personally create separate sitemaps for different subdomains and link to main sitemap and I see better indexing that way.
Again, these are my personal experiences and not rules so please do keep that in mind as things can be different fro them.
-
Hey,
1.) Yes you can do this and it won't 'negativel impact it' but it might cause a couple of Search Console errors when you come to submit the URLs - blocking crawlers in the robots.txt file is a directive that instructs them not to crawl that particular page. With this being said, supplying them with a sitemap of all page locations will not mean that they crawl these pages, but it is an instruction to crawlers that these pages do exist. Personally, I would meta noindex these pages to make sure that they don't reach search engines as the blocking in the robots.txt file can often not be enough to prevent this, especially if you're also submitting a sitemap.
2.) In short, I don't think you can have a single XML sitemap containing URLs from multiple subdomains BUT you can have sitemaps for multiple subdomains hosted on the TLD individually. Google have broken this down really well in their Webmaster Tools post:
https://support.google.com/webmasters/answer/75712?hl=en&topic=8476&ctx=topic
Hope this helps!
Sean
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Which pages should I index or have in my XML sitemap?
Hi there, my website is ConcertHotels.com - a site which helps users find hotels close to concert venues. I have a hotel listing page for every concert venue on my site - about 12,000 of them I think (and the same for nearby restaurants). e.g. https://www.concerthotels.com/venue-hotels/madison-square-garden-hotels/304484 Each of these pages list the nearby hotels to that concert venue. Users clicking on the individual hotel are brought through to a hotel (product) page e.g. https://www.concerthotels.com/hotel/the-new-yorker-a-wyndham-hotel/136818 I made a decision years ago to noindex all of the /hotel/ pages since they don't have a huge amount of unique content and aren't the pages I'd like my users to land on . The primary pages on my site are the /venue-hotels/ listing pages. I have similar pages for nearby restaurants, so there are approximately 12,000 venue-restaurants pages, again, one listing page for each concert venue. However, while all of these pages are potentially money-earners, in reality, the vast majority of subsequent hotel bookings have come from a fraction of the 12,000 venues. I would say 2000 venues are key money earning pages, a further 6000 have generated income of a low level, and 4000 are yet to generate income. I have a few related questions: Although there is potential for any of these pages to generate revenue, should I be brutal and simply delete a venue if it hasn't generated revenue within a time period, and just accept that, while it "could" be useful, it hasn't proven to be and isn't worth the link equity. Or should I noindex these "poorly performing pages"? Should all 12,000 pages be listed in my XML sitemap? Or simply the ones that are generating revenue, or perhaps just the ones that have generated significant revenue in the past and have proved to be most important to my business? Thanks Mike
Technical SEO | | mjk260 -
Which product URL to include in Sitemaps?
Hi Does the product URL's in Sitemaps affect the sub-categories authority too? For example, if I have a product with 2 URL's and which have a canonical tag: **/brands/michael-kors/bags/**jet-set-double-zip-wallet/ **/women/accessories/wallets/**jet-set-double-zip-wallet/ If I make the main URL "/women/accessories/wallets/jet-set-double-zip-wallet/" and set that as the Canonical URL & list that URL in the XML Sitemap, will it also mean the "/women/accessories/wallets/" category will get more authority and increase it's power to rank? Thanks Frankie
Technical SEO | | Frankie-BTDublin0 -
Query on Sitemap xml Root Path
Is it compulsory to have sitemap.xml at this path - abcd.com/sitemap.xml? My sitename is abcd.com. Now is it compulsory to have sitemap.xml at this path - abcd.com/sitemap.xml only? a) If i take cnd services where path can be like xyz.com/sitemap.xml and then this sitemap i can submit in robot file so it is fine? b) What will happen here in webmaster tool as in webmaster tool when we submit sitemap by default it gives us domain name like abcd.com and we have to just add /sitemap.xml
Technical SEO | | Johny123450 -
Is sitemap required on my robots.txt?
Hi, I know that linking your sitemap from your robots.txt file is a good practice. Ok, but... may I just send my sitemap to search console and forget about adding ti to my robots.txt? That's my situation: 1 multilang platform which means... ... 2 set of pages. One for each lang, of course But my CMS (magento) only allows me to have 1 robots.txt file So, again: may I have a robots.txt file woth no sitemap AND not suffering any potential SEO loss? Thanks in advance, Juan Vicente Mañanas Abad
Technical SEO | | Webicultors0 -
Should all pagination pages be included in sitemaps
How important is it for a sitemap to include all individual urls for the paginated content. Assuming the rel next and prev tags are set up would it be ok to just have the page 1 in the sitemap ?
Technical SEO | | Saijo.George0 -
Xml Sitemap
Hi mozzers, I am about to submit a sitemap for one of my clients via webmaster tools. The issue is that I have way too many urls that I don't want them to be indexed by Google such as testing pages, auto generated pages... Is there way to remove certain URL from the XML sitemap or is this impossible? If impossible, is the only way to control these urls is to "No index" all these pages that i don't want the search engine to see? Thanks Mozzers,
Technical SEO | | Ideas-Money-Art0 -
Sitemaps for Google
In Google Webmaster Central, if a URL is reported in your site map as 404 (Not found), I'm assuming Google will automatically clean it up and that the next time we generate a sitemap, it won't include the 404 URL. Is this true? Do we need to comb through our sitemap files and remove the 404 pages Google finds, our will it "automagically" be cleaned up by Google's next crawl of our site?
Technical SEO | | Prospector-Plastics0