Which pages should I index or have in my XML sitemap?
-
Hi there,
my website is ConcertHotels.com - a site which helps users find hotels close to concert venues. I have a hotel listing page for every concert venue on my site - about 12,000 of them I think (and the same for nearby restaurants).
e.g.
https://www.concerthotels.com/venue-hotels/madison-square-garden-hotels/304484
Each of these pages list the nearby hotels to that concert venue. Users clicking on the individual hotel are brought through to a hotel (product) page e.g.
https://www.concerthotels.com/hotel/the-new-yorker-a-wyndham-hotel/136818
I made a decision years ago to noindex all of the /hotel/ pages since they don't have a huge amount of unique content and aren't the pages I'd like my users to land on . The primary pages on my site are the /venue-hotels/ listing pages.
I have similar pages for nearby restaurants, so there are approximately 12,000 venue-restaurants pages, again, one listing page for each concert venue.
However, while all of these pages are potentially money-earners, in reality, the vast majority of subsequent hotel bookings have come from a fraction of the 12,000 venues. I would say 2000 venues are key money earning pages, a further 6000 have generated income of a low level, and 4000 are yet to generate income.
I have a few related questions:
-
Although there is potential for any of these pages to generate revenue, should I be brutal and simply delete a venue if it hasn't generated revenue within a time period, and just accept that, while it "could" be useful, it hasn't proven to be and isn't worth the link equity. Or should I noindex these "poorly performing pages"?
-
Should all 12,000 pages be listed in my XML sitemap? Or simply the ones that are generating revenue, or perhaps just the ones that have generated significant revenue in the past and have proved to be most important to my business?
Thanks
Mike
-
-
Hi Chris,
thank you very much for your help and suggestions, it is much appreciated. I'll de-noindex a handful of my biggest artist pages and see if they attract much interest from users.
As for the /venues/ pages, these have been fairly neglected to date, so perhaps I need to really focus my attention on them, as you say, and bring in some cross referencing.
I have also wondered whether allowing companies to create pages dedicated to their events would be a good route to take - it could be done with ease, so perhaps I should investigate further.
Again, thanks very much, and hopefully I can report back with good news at some point.
Best wishes
Mike
-
I think they should be indexed, but keyword research should shed light on this topic for you. It will let you know if your audience is searching for those things and in what numbers. Even as they are, though, they might make sufficient landing pages for google. You could de-noindex a group of those pages at a time, starting with the ones most likely to be popular and see how google treats them. I think I'd go that route rather than release them into the wild all at once.
To me, the pages with the most interesting potential are the /venues/ pages like /venues/md-concert-venues/a, for example. I think the potential lies in populating them with venue grouping, upcoming artists grouping, and state. How hard would it be to populate an area above the black line with all/some of the upcoming artists playing near the hotels that show on that page. That 3-way cross referencing would make those pages fairly unique on the web and unique on your site and would give google a number of good reasons to send traffic there. They'd probably be good pages to publish advertising on, too.
Also wondering if there is a thing such as "licensing" dedicated pages out to companies/hotels that are putting on non-musical events like conferences, etc, so they can link to a kind of pre-fab hotels-close-by page up for their attendees?
-
Thanks Chris,
appreciate your comments. Google in indexing a high percentage of the key pages, and does not have any noindex pages indexed. Pages are loading at a decent speed. And only indexed pages are in the sitemap. So perhaps the non-performing pages are not something I should be particularly concerned about, especially they don't necessarily take up much of my time. I guess if I start to run into issues with overall site speed then perhaps then is is the time to consider whether they should continue to be listed. So perhaps, you're right, it's more of a business decision, rather than an SEO one.
I have a further question if you don't mind, which is related but I think is an SEO one. I have a large number of /artist/ pages - these are pages that list which venues a particular artist is performing at, and allows a user to then check hotel availability for the specific venue and date they will be attending. At the minute the pages are fairly light on content - they just list venues and dates, although I'm planning to start introducing more content in the near future. An example page can be seen here:
https://www.concerthotels.com/artist/hotels-near-guns-n-roses-events/1227
At the minute, I've noindexed every artist page on the site, because I was worried Google would see them as thin pages. But I actually think they are potentially very useful to users, and a powerful landing page for quickly taking a user to the correct venue page with the correct dates for the concert. I also think that not all users will search for "Hotels near Metlife Stadium" - they might instead search for "Hotels for Guns n roses in NJ..." etc etc. so perhaps I can pick up some long tail searches with these additional landing pages.
The question is, should I index these pages?
If the answer to that question is yes..... obviously, artists/bands do a tour and then generally disappear into a recording studio for a year or two - as a result, there will be many /artist/ pages that, for a while, have lots of useful event dates/venues listed, but at the end of the tour, the pages will simply be empty, and no longer useful, at least until the next tour. Would you recommend that such pages are indexed when there are events, but when no future events are listed, I set them to noindex?
Many thanks
Mike
-
Mike,
I'm wondering...is that an SEO question? It sounds like a business decision to me. From what you've said, I don't see any reason for Google to ding you on anything. My only questions would be--Is google indexing all the pages you want it to and does not have your noindex pages indexed? Any bad links coming in? Pages are loading at a decent speed? Oh, and I don't see a reason to have your noindex pages in the the sitemap.
Other than that, if those non-performing page are taking up time that you could be spending on more productive pages or on exploring more productive opportunities, then, again, it's time to put on your CEO cap.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps:
Hello, doing an audit found in our sitemaps the tag which at the time was to say that the url was mobile. In our case the URL is the same for desktop and mobile.
Technical SEO | | romaro
Do you recommend leaving or removing it?
Thank you!0 -
Search Console Indexed Page Count vs Site:Search Operator page count
We launched a new site and Google Search Console is showing 39 pages have been indexed. When I perform a Site:myurl.com search I see over 100 pages that appear to be indexed. Which is correct and why is there a discrepancy? Also, Search Console Page Index count started at 39 pages on 5/21 and has not increased even though we have hundreds of pages to index. But I do see more results each week from Site:psglearning.com My site is https://wwww.psglearning.com
Technical SEO | | pdowling0 -
Indexed pages
Just started a site audit and trying to determine the number of pages on a client site and whether there are more pages being indexed than actually exist. I've used four tools and got four very different answers... Google Search Console: 237 indexed pages Google search using site command: 468 results MOZ site crawl: 1013 unique URLs Screaming Frog: 183 page titles, 187 URIs (note this is a free licence, but should cut off at 500) Can anyone shed any light on why they differ so much? And where lies the truth?
Technical SEO | | muzzmoz1 -
Upgrade old sitemap to a new sitemap index. How to do without danger ?
Hi MOZ users and friends. I have a website that have a php template developed by ourselves, and a wordpress blog in /blog/ subdirectory. Actually we have a sitemap.xml file in the root domain where are all the subsections and blog's posts. We upgrade manually the sitemap, once a month, adding the new posts created in the blog. I want to automate this process , so i created a sitemap index with two sitemaps inside it. One is the old sitemap without the blog's posts and a new one created with "Google XML Sitemap" wordpress plugin, inside the /blog/ subdirectory. That is, in the sitemap_index.xml file i have: Domain.com/sitemap.xml (old sitemap after remove blog posts urls) Domain.com/blog/sitemap.xml (auto-updatable sitemap create with Google XML plugin) Now i have to submit this sitemap index to Google Search Console, but i want to be completely sure about how to do this. I think that the only that i have to do is delete the old sitemap on Search Console and upload the new sitemap index, is it ok ?
Technical SEO | | ClaudioHeilborn0 -
Why would GWT say 0 pages indexed ?
Hi Looking in GWT > Google Index > Index Status says 0 pages indexed Yes if i search manually on google for brand site is listed, and i see organic traffic from Google in analytics I take it this is likely an error in GWT and nothing to worry about ? Cheers Dan
Technical SEO | | Dan-Lawrence0 -
Duplicate pages in Google index despite canonical tag and URL Parameter in GWMT
Good morning Moz... This is a weird one. It seems to be a "bug" with Google, honest... We migrated our site www.three-clearance.co.uk to a Drupal platform over the new year. The old site used URL-based tracking for heat map purposes, so for instance www.three-clearance.co.uk/apple-phones.html ..could be reached via www.three-clearance.co.uk/apple-phones.html?ref=menu or www.three-clearance.co.uk/apple-phones.html?ref=sidebar and so on. GWMT was told of the ref parameter and the canonical meta tag used to indicate our preference. As expected we encountered no duplicate content issues and everything was good. This is the chain of events: Site migrated to new platform following best practice, as far as I can attest to. Only known issue was that the verification for both google analytics (meta tag) and GWMT (HTML file) didn't transfer as expected so between relaunch on the 22nd Dec and the fix on 2nd Jan we have no GA data, and presumably there was a period where GWMT became unverified. URL structure and URIs were maintained 100% (which may be a problem, now) Yesterday I discovered 200-ish 'duplicate meta titles' and 'duplicate meta descriptions' in GWMT. Uh oh, thought I. Expand the report out and the duplicates are in fact ?ref= versions of the same root URL. Double uh oh, thought I. Run, not walk, to google and do some Fu: http://is.gd/yJ3U24 (9 versions of the same page, in the index, the only variation being the ?ref= URI) Checked BING and it has indexed each root URL once, as it should. Situation now: Site no longer uses ?ref= parameter, although of course there still exists some external backlinks that use it. This was intentional and happened when we migrated. I 'reset' the URL parameter in GWMT yesterday, given that there's no "delete" option. The "URLs monitored" count went from 900 to 0, but today is at over 1,000 (another wtf moment) I also resubmitted the XML sitemap and fetched 5 'hub' pages as Google, including the homepage and HTML site-map page. The ?ref= URls in the index have the disadvantage of actually working, given that we transferred the URL structure and of course the webserver just ignores the nonsense arguments and serves the page. So I assume Google assumes the pages still exist, and won't drop them from the index but will instead apply a dupe content penalty. Or maybe call us a spam farm. Who knows. Options that occurred to me (other than maybe making our canonical tags bold or locating a Google bug submission form 😄 ) include A) robots.txt-ing .?ref=. but to me this says "you can't see these pages", not "these pages don't exist", so isn't correct B) Hand-removing the URLs from the index through a page removal request per indexed URL C) Apply 301 to each indexed URL (hello BING dirty sitemap penalty) D) Post on SEOMoz because I genuinely can't understand this. Even if the gap in verification caused GWMT to forget that we had set ?ref= as a URL parameter, the parameter was no longer in use because the verification only went missing when we relaunched the site without this tracking. Google is seemingly 100% ignoring our canonical tags as well as the GWMT URL setting - I have no idea why and can't think of the best way to correct the situation. Do you? 🙂 Edited To Add: As of this morning the "edit/reset" buttons have disappeared from GWMT URL Parameters page, along with the option to add a new one. There's no messages explaining why and of course the Google help page doesn't mention disappearing buttons (it doesn't even explain what 'reset' does, or why there's no 'remove' option).
Technical SEO | | Tinhat0 -
De-indexing millions of pages - would this work?
Hi all, We run an e-commerce site with a catalogue of around 5 million products. Unfortunately, we have let Googlebot crawl and index tens of millions of search URLs, the majority of which are very thin of content or duplicates of other URLs. In short: we are in deep. Our bloated Google-index is hampering our real content to rank; Googlebot does not bother crawling our real content (product pages specifically) and hammers the life out of our servers. Since having Googlebot crawl and de-index tens of millions of old URLs would probably take years (?), my plan is this: 301 redirect all old SERP URLs to a new SERP URL. If new URL should not be indexed, add meta robots noindex tag on new URL. When it is evident that Google has indexed most "high quality" new URLs, robots.txt disallow crawling of old SERP URLs. Then directory style remove all old SERP URLs in GWT URL Removal Tool This would be an example of an old URL:
Technical SEO | | TalkInThePark
www.site.com/cgi-bin/weirdapplicationname.cgi?word=bmw&what=1.2&how=2 This would be an example of a new URL:
www.site.com/search?q=bmw&category=cars&color=blue I have to specific questions: Would Google both de-index the old URL and not index the new URL after 301 redirecting the old URL to the new URL (which is noindexed) as described in point 2 above? What risks are associated with removing tens of millions of URLs directory style in GWT URL Removal Tool? I have done this before but then I removed "only" some useless 50 000 "add to cart"-URLs.Google says themselves that you should not remove duplicate/thin content this way and that using this tool tools this way "may cause problems for your site". And yes, these tens of millions of SERP URLs is a result of a faceted navigation/search function let loose all to long.
And no, we cannot wait for Googlebot to crawl all these millions of URLs in order to discover the 301. By then we would be out of business. Best regards,
TalkInThePark0