Sitemap.xml - autogenerated by CMS is full of crud
-
Hi all,
Hope you can help.
The Magento ecommerce system I'm working with autogenerates sitemap.xml. It's well formed, with priority and frequency parameters.
However, it has generated lots of URLs that point to broken pages returning fatal errors, duplicate URLs (not canonicals), 404s, etc.
I'm thinking of hand-creating sitemap.xml. The site has around 50 main pages including products and categories, and I can get the main page URLs listed by Screaming Frog or Xenu.
Then I'll have to get into hand-editing the crud pages with noindex, and the useful duplicates with canonicals.
Is this the way to go, or is there another solution?
Thanks in advance for any advice.
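For anyone weighing the hand-built route: with a URL list exported from a crawler, the sitemap file itself can be generated rather than typed. This is only a minimal sketch in Python, assuming the list is already cleaned up; the function name build_sitemap, the default changefreq and priority values, and the example.com URLs are placeholders, not anything Magento provides.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls, changefreq="weekly", priority="0.5"):
    """Return sitemap XML as a string for an iterable of absolute URLs.

    Hypothetical helper: changefreq and priority are applied to every
    URL, which is a simplification for a small site.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    seen = set()
    for url in urls:
        url = url.strip()
        if not url or url in seen:
            continue  # skip blank lines and exact duplicates
        seen.add(url)
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url  # special characters are escaped on output
        ET.SubElement(entry, "changefreq").text = changefreq
        ET.SubElement(entry, "priority").text = priority
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Placeholder URLs standing in for a crawler export.
example_urls = [
    "http://www.example.com/",
    "http://www.example.com/category/shoes",
    "http://www.example.com/",  # duplicate, dropped
]
print(build_sitemap(example_urls))
```

The returned string can be saved as sitemap.xml. Note that it only removes exact duplicate strings; deciding which of several near-duplicate URLs is the canonical one still has to be done by hand.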
-
If the cron is working, then I would personally turn to the Magento forums to see if anyone knows a way to rope those messy URLs in and get them under control. I try to avoid manually generating and updating sitemaps whenever I can, because it's a hassle even on a small site, never mind an ecommerce site.
If your site is going to stay that small, then a manual sitemap might be less of a headache for you than customizing Magento.
I would worry about keeping the sitemap clean. If the search engines learn that you keep a messy sitemap, they will rely on it less and less. URLs returning 404 and 500 codes are the worst offenders, but redirects and perhaps duplicate content count against it too.
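One way to act on that is to check each URL's status code before it goes into the sitemap. The following is a rough Python sketch, not a finished tool: fetch_status and split_clean_and_broken are made-up names, it treats only a plain 200 as clean, and it assumes the server answers HEAD requests (some do not). It does nothing about duplicate content.

```python
import urllib.error
import urllib.request


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so 3xx codes stay visible."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # the 3xx response is then raised as an HTTPError


def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request failed."""
    opener = urllib.request.build_opener(_NoRedirect)
    request = urllib.request.Request(url, method="HEAD")
    try:
        with opener.open(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # 3xx, 4xx and 5xx responses end up here
    except OSError:
        return None  # DNS failure, refused connection, timeout, etc.


def split_clean_and_broken(urls, get_status=fetch_status):
    """Split urls into those returning 200 and everything else.

    get_status can be swapped out, which allows a dry run without
    touching the network.
    """
    clean, broken = [], []
    for url in urls:
        status = get_status(url)
        if status == 200:
            clean.append(url)
        else:
            broken.append((url, status))
    return clean, broken


# Dry run with made-up statuses instead of live requests.
fake_statuses = {
    "http://www.example.com/": 200,
    "http://www.example.com/old-page": 404,
    "http://www.example.com/moved": 301,
}
clean, broken = split_clean_and_broken(fake_statuses, get_status=fake_statuses.get)
print(clean)
print(broken)
```

Only the URLs in the clean list would go into the sitemap; the broken list is a to-do list of pages to fix, redirect properly, or drop.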
For Further Reading:
Google Sitemaps Ask For Clean URLs - http://www.johnfdoherty.com/google-sitemaps-ask-for-clean-urls/
-
Hi Kane,
The sitemap is new. It's just that Magento creates lots of duplicate files on the fly, and it's not putting the canonical URLs in the sitemap, etc.
I just wondered whether it's worth hand-creating a sitemap.xml containing the content pages (60 or 70 of them) for this relatively small site, or whether I shouldn't worry too much about the sitemap, since the site is already pretty well indexed by Google.
I'll head over to the Magento forums again to see if I can find more info.
Many thanks for your help.
-
If it's returning 404 pages, that sounds like a dated sitemap. Have you activated the cron service?
See the "Refreshing Sitemaps at Regular Intervals" section of this page if not:
Magento can be set up to automatically refresh Google Sitemaps at regular intervals. This function is configured in Admin > System > Configuration > Google Sitemap.
To use Magento’s automatic generation of Google Sitemaps, you must activate the Magento Cron service.
If you do have that set up, and you're certain it's working correctly, then I would turn to the forums at MagentoCommerce.com. You're going to get a much faster answer there, since everyone is familiar with that exact platform.