Sitemap - % of URL's in Google Index?
-
What is the average % of links from a sitemap that are included in the Google index? Obviously want to aim for 100% of the sitemap urls to be indexed, is this realistic?
-
If all the pages in your sitemap are worthy of the Google index, then you should expect around a 100% indexation rate. On the flip side, if you reference low quality pages in your sitemap file, you will not got them indexed and may even be hurting the trust of your sitemap file. As a point in case, Bing just recently announced that if they see an error rate greater than 1% in the sitemap, then they will just ignore your sitemap file.
-
Clients, so I have no idea how they do it. It's a complex automated process for sure.
-
Wow. Do you have a third party program to build your site map files or our you using something built in house?
-
Ryan's point is important to note. 100% is achievable under the correct circumstances. I've got a client with 34 million pages on their main site (and contained within a combined 909 sitemap xml files), and they have 34 million pages indexed.
-
The percent of pages indexed varies greatly with each site. If you desire 100% of your site indexed then 100% of your site's pages should be reviewed to ensure their content is worthy of being indexed. The content should be unique, well written and properly presented. Your sitemap process also needs to be carefully reviewed. Many site owners simply set up an automated process without taking the time to ensure it is properly configured. Often pages which are blocked by robots.txt are included in the site map, and those pages will not be indexed.
Many people say "I want 100% of my site indexed" just how many people say "I want to be #1 rank in Google". Both results are achievable, but both require time and effort, and perhaps money.
-
Hi. We have a stiemap with over 250,000 URLs and we are at 87%. This is a high for us. We have never been able to get 100%. We have been trying to clean up the sitemap a bit but with so many URLs it is hard to go through it line by line. We are making more of an effort to fix the errors Google tells us about in Webmaster Tools but these only account for a fraction of the URLs apparently not indexed.
We also do site searches on Google to see how many URLs total we have in Google as our sitemap only includes "the most important" pages. Doing a search for "site:www.sierratradingpost.com" comes up with over 400,000 URLs.
For us, I don't think 100% is realistic. We have never been able to achieve it. It will be interesting to see what other SEOmozers have to report!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
301 vs Canonical - With A Side of Partial URL Rewrite and Google URL Parameters-OH MY
Hi Everyone, I am in the middle of an SEO contract with a site that is partially HTML pages and the rest are PHP and part of an ecommerce system for digital delivery of college classes. I am working with a web developer that has worked with this site for many years. In the php pages, there are also 6 different parameters that are currently filtered by Google URL parameters in the old Google Search Console. When I came on board, part of the site was https and the remainder was not. Our first project was to move completely to https and it went well. 301 redirects were already in place from a few legacy sites they owned so the developer expanded the 301 redirects to move everything to https. Among those legacy sites is an old site that we don't want visible, but it is extensively linked to the new site and some of our top keywords are branded keywords that originated with that site. Developer says old site can go away, but people searching for it are still prevalent in search. Biggest part of this project is now to rewrite the dynamic urls of the product pages and the entry pages to the class pages. We attempted to use 301 redirects to redirect to the new url and prevent the draining of link juice. In the end, according to the developer, it just isn't going to be possible without losing all the existing link juice. So its lose all the link juice at once (a scary thought) or try canonicals. I am told canonicals would work - and we can switch to that. My questions are the following: 1. Does anyone know of a way that might make the 301's work with the URL rewrite? 2. With canonicals and Google parameters, are we safe to delete the parameters after we have ensures everything has a canonical url (parameter pages included)? 3. If we continue forward with 301's and lose all the existing links, since this only half of the pages in the site (if you don't count the parameter pages) and there are only a few links per page if that, how much of an impact would it have on the site and how can I avoid that impact? 4. Canonicals seem to be recommended heavily these days, would the canonical urls be a better way to go than sticking with 301's. Thank you all in advance for helping! I sincerely appreciate any insight you might have. Sue (aka Trudy)
Intermediate & Advanced SEO | | TStorm1 -
Mass Removal Request from Google Index
Hi, I am trying to cleanse a news website. When this website was first made, the people that set it up copied all kinds of articles they had as a newspaper, including tests, internal communication, and drafts. This site has lots of junk, but this kind of junk was on the initial backup, aka before 1st-June-2012. So, removing all mixed content prior to that date, we can have pure articles starting June 1st, 2012! Therefore My dynamic sitemap now contains only articles with release date between 1st-June-2012 and now Any article that has release date prior to 1st-June-2012 returns a custom 404 page with "noindex" metatag, instead of the actual content of the article. The question is how I can remove from the google index all this junk as fast as possible that is not on the site anymore, but still appears in google results? I know that for individual URLs I need to request removal from this link
Intermediate & Advanced SEO | | ioannisa
https://www.google.com/webmasters/tools/removals The problem is doing this in bulk, as there are tens of thousands of URLs I want to remove. Should I put the articles back to the sitemap so the search engines crawl the sitemap and see all the 404? I believe this is very wrong. As far as I know this will cause problems because search engines will try to access non existent content that is declared as existent by the sitemap, and return errors on the webmasters tools. Should I submit a DELETED ITEMS SITEMAP using the <expires>tag? I think this is for custom search engines only, and not for the generic google search engine.
https://developers.google.com/custom-search/docs/indexing#on-demand-indexing</expires> The site unfortunatelly doesn't use any kind of "folder" hierarchy in its URLs, but instead the ugly GET params, and a kind of folder based pattern is impossible since all articles (removed junk and actual articles) are of the form:
http://www.example.com/docid=123456 So, how can I bulk remove from the google index all the junk... relatively fast?0 -
Google is indexing the wrong pages
I have been having problems with Google indexing my website since mid May. I haven't made any changes to my website which is wordpress. I have a page with the title 'Peterborough Cathedral wedding', I search Google for 'wedding Peteborough Cathedral', this is not a competitive search phrase and I'd expect to find my blog post on page one. Instead, half way down page 4 I find Google has indexed www.weddingphotojournalist.co.uk/blog with the title 'wedding photojournalist | Portfolio', what google has indexed is a link to the blog post and not the blog post itself. I repeated this for several other blog posts and keywords and found similar results, most of which don't make any sense at all - A search for 'Menorca wedding photography' used to bring up one of my posts at the top of page one. Now it brings up a post titled 'La Mare wedding photography Jersey" which happens to have a link to the Menorca post at the bottom of the page. A search for 'Broadoaks country house weddng photography' brings up 'weddingphotojournalist | portfolio' which has a link to the Broadoaks post. a search for 'Blake Hall wedding photography' does exactly the same. In this case Google is linking to www.weddingphotojournalist.blog again, this is a page of recent blog posts. Could this be a problem with my sitemap? Or the Yoast SEO plugin? or a problem with my wordpress theme? Or is Google just a bit confused?
Intermediate & Advanced SEO | | weddingphotojournalist0 -
Canonical URL & sitemap URL mismatch
Hi We're running a Magento store which doesn't have too much stock rotation. We've implemented a plugin that will allow us to give products custom canonical URLs (basically including the category slug, which is not possible through vanilla Magento). The sitemap feature doesn't pick up on these URLs, so we're submitting URLs to Google that are available and will serve content, but actually point to a longer URL via a canonical meta tag. The content is available at each URL and is near identical (all apart from the breadcrumbs) All instances of the page point to the same canonical URL We are using the longer URL in our internal architecture/link building to show this preference My questions are; Will this harm our visibility? Aside from editing the sitemap, are there any other signals we could give Google? Thanks
Intermediate & Advanced SEO | | tomcraig860 -
Google Disavow wrong Sample URLs
Sometime last year we were penalized for unnatural links by Google. We followed a couple of good posts on MOZ on how to go about correcting all of this (Excel Showing work, Google Docs, Ahrefs, semrush, etc...) and submitted a reconsideration request as well as a list of urls using Disavow Tool. So today, about 20 days later, we get a reply from Google stating that we still have some links that are outside their quality guidelines with 3 examples. Problem is, none of the 3 examples they listed have our website on them. I even did a View Source and checked all our inbound link reports for these websites. All of the 3 examples are Lawyer / Legal Advise websites and ours has nothing to do with this. Any ideas on how to reply and ask them to double check? Maybe they mixed up with another account they were working on at the same time.
Intermediate & Advanced SEO | | Apex-Lighting0 -
How is Google crawling and indexing this directory listing?
We have three Directory Listing pages that are being indexed by Google: http://www.ccisolutions.com/StoreFront/jsp/ http://www.ccisolutions.com/StoreFront/jsp/html/ http://www.ccisolutions.com/StoreFront/jsp/pdf/ How and why is Googlebot crawling and indexing these pages? Nothing else links to them (although the /jsp.html/ and /jsp/pdf/ both link back to /jsp/). They aren't disallowed in our robots.txt file and I understand that this could be why. If we add them to our robots.txt file and disallow, will this prevent Googlebot from crawling and indexing those Directory Listing pages without prohibiting them from crawling and indexing the content that resides there which is used to populate pages on our site? Having these pages indexed in Google is causing a myriad of issues, not the least of which is duplicate content. For example, this file <tt>CCI-SALES-STAFF.HTML</tt> (which appears on this Directory Listing referenced above - http://www.ccisolutions.com/StoreFront/jsp/html/) clicks through to this Web page: http://www.ccisolutions.com/StoreFront/jsp/html/CCI-SALES-STAFF.HTML This page is indexed in Google and we don't want it to be. But so is the actual page where we intended the content contained in that file to display: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff As you can see, this results in duplicate content problems. Is there a way to disallow Googlebot from crawling that Directory Listing page, and, provided that we have this URL in our sitemap: http://www.ccisolutions.com/StoreFront/category/meet-our-sales-staff, solve the duplicate content issue as a result? For example: Disallow: /StoreFront/jsp/ Disallow: /StoreFront/jsp/html/ Disallow: /StoreFront/jsp/pdf/ Can we do this without risking blocking Googlebot from content we do want crawled and indexed? Many thanks in advance for any and all help on this one!
Intermediate & Advanced SEO | | danatanseo0 -
Google's serp
Hello Guys ! I will appreciate if you will share your thoughts re the situation i have. The homepage for one of my sites is one last page of google's serp, although internal pages are displayed in the top 10. 1. Why ?
Intermediate & Advanced SEO | | Webdeal
2. What should I do to correct the situation with the homepage ? regards0 -
Could a HTML <select>with large numbers of <option value="<url>">'s affect my organic rankings</option></select>
Hi there, I'm currently redesigning my website, and one particular pages lists hotels in New York. Some functionality I'm thinking of adding in is to let the user find hotels close to specific concert venues in New York. My current thinking is to provide the following select element on the page - selecting any one of the options will automatically redirect to my page for that concert venue. The purpose of this isn't to affect the organic traffic - I'm simply introducing this as a tool to help customers find the right hotel, but I certainly don't want it to have an adverse effect on my organic traffic. I'd love to know your thoughts on this. I must add that in certain cities, such as New York, there could be up to 450 different options in this select element. | <select onchange="location=options[selectedIndex].value;"> <option value="">Show convenient hotels for:</option> <option value="http://url1..">1492 New York</option> <option value="http://url2..">Abrons Arts Center</option> <option value="http://url3..">Ace of Clubs New York</option> <option value="http://url4..">Affairs Afloat</option> <option value="http://url5..">Affirmation Arts New York</option> <option value="http://url6..">Al Hirschfeld Theatre</option> <option value="http://url7..">Alice Tully Hall</option> .. .. ..</select> Many thanks Mike |
Intermediate & Advanced SEO | | mjk260