Some URLs in the sitemap not indexed
-
Our company site has hundreds of thousands of pages. Yet no matter how big or small the total page count, I have found that the "URLs Indexed" in GWMT has never matched "URLS in Sitemap". When we were small and now that we have a LOT more pages, there is always a discrepancy of ~10% or so missing from the index.
It's difficult to know which pages are not indexed, but I have found some that I can verify are in the Sitemap.xml file but not in the index at all. When I go to GWMT I can "Fetch and Render" the missing pages fine, so it's not as though they're blocked or inaccessible.
Any ideas on why this is? Is this type of discrepancy typical?
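For anyone trying to pin down which sitemap URLs are missing, here is a minimal Python sketch of the comparison described above. It assumes you already have some list of URLs you know to be indexed (the `indexed_urls` argument is hypothetical; the thread does not say where such a list would come from), and it only handles a plain `urlset` sitemap, not a sitemap index file.

```python
import xml.etree.ElementTree as ET

# Namespace used by standard sitemap files (sitemaps.org protocol).
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs found in a urlset sitemap."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def missing_from_index(xml_text, indexed_urls):
    """Return sitemap URLs that are absent from the known-indexed list."""
    return sorted(sitemap_urls(xml_text) - set(indexed_urls))


# Illustrative data only; example.com stands in for the real site.
EXAMPLE_SITEMAP = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://www.example.com/</loc></url>"
    "<url><loc>https://www.example.com/products/widget</loc></url>"
    "</urlset>"
)

if __name__ == "__main__":
    print(missing_from_index(EXAMPLE_SITEMAP, ["https://www.example.com/"]))
```

The set difference requires exact string matches, so trailing slashes and http/https variants would need normalizing before the comparison means much.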
-
Thanks. Very helpful!
-
It's great to know that a 10% discrepancy is considered good. Hard to know otherwise.
That article about Screaming Frog is super helpful, thanks!
-
I have never had a site with 100% of its pages crawled. Sometimes Google will drop a page for being too similar to another, for not being informative enough, or because of canonical links or redirects.
As Ryan says, don't just rely on Moz. Use Screaming Frog to get a good view of your site too, and see if there are any errors. You can also run the Frog whenever you like; it's just a little more technical to understand.
Xenu? Oooh, never heard of that one, Ryan. Thanks!
Just looked into Xenu. Screaming Frog does it all and then some.
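One of the reasons mentioned above, a canonical link pointing somewhere else, is easy to spot-check by hand. The following is a rough Python sketch, not what Screaming Frog or Google actually does: it parses a page's HTML with the standard library and reports whether the canonical URL differs from the page's own URL. The function and class names are made up for illustration, and the HTML is assumed to have been fetched already.

```python
from html.parser import HTMLParser


class CanonicalParser(HTMLParser):
    """Collects the href of the last <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if tag == "link" and "canonical" in rel_values:
            self.canonical = attr.get("href")


def canonical_mismatch(page_url, html):
    """Return the canonical URL if it differs from page_url, else None."""
    parser = CanonicalParser()
    parser.feed(html)
    if parser.canonical and parser.canonical != page_url:
        return parser.canonical
    return None


if __name__ == "__main__":
    # Illustrative markup only.
    sample = '<head><link rel="canonical" href="https://www.example.com/a"></head>'
    print(canonical_mismatch("https://www.example.com/a?sort=price", sample))
```

A page in the sitemap whose canonical points to a different URL is a plausible candidate for being left out of the index, though this check alone cannot confirm that.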
-
Hi Mase,
I've managed sites with hundreds of thousands of pages too, and in my experience a discrepancy between what's offered up via the sitemaps and what gets indexed is typical (dare I say it, a 10% discrepancy seems pretty good!). Pages deeper in the site seem to suffer this fate more frequently than those with fewer subfolders, as do those with thin content.
I agree completely with Ryan's comment about Screaming Frog: it is an invaluable tool for site audits, in addition to lots of other useful site insights. You might find this article interesting to get a sense of the many ways you can use SF: http://www.seerinteractive.com/blog/screaming-frog-guide/
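To see whether the depth pattern described above holds for a particular site, a small Python sketch like the one below could group sitemap URLs by the number of path segments and report the share missing from the index at each depth. It assumes you have both lists to hand; `indexed_urls` is a hypothetical input, and "depth" here is simply the count of non-empty path segments.

```python
from collections import Counter
from urllib.parse import urlparse


def path_depth(url):
    """Count non-empty path segments, e.g. /a/b/c has depth 3."""
    return len([seg for seg in urlparse(url).path.split("/") if seg])


def unindexed_share_by_depth(sitemap_urls, indexed_urls):
    """Map each depth to the fraction of sitemap URLs not in the index."""
    indexed = set(indexed_urls)
    total = Counter()
    missing = Counter()
    for url in sitemap_urls:
        depth = path_depth(url)
        total[depth] += 1
        if url not in indexed:
            missing[depth] += 1
    return {depth: missing[depth] / total[depth] for depth in sorted(total)}


if __name__ == "__main__":
    # Illustrative URLs only.
    in_sitemap = [
        "https://www.example.com/",
        "https://www.example.com/a",
        "https://www.example.com/a/b/c",
        "https://www.example.com/a/b/d",
    ]
    in_index = in_sitemap[:3]
    print(unindexed_share_by_depth(in_sitemap, in_index))
```

If the missing share climbs with depth, that would be consistent with the experience described here; it would not by itself explain why those pages are skipped.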
-
You're welcome. Definitely take a look at a crawler that gives you more insight, especially with a site as large as yours. Just note that, no matter what, you might never achieve an exact match between the pages you've submitted and the number indexed, as Google can decide not to index a page for reasons other than the page's presence in a sitemap. It would also be useful to look at how many of your pages receive visits in analytics. That will give you an idea of the percentages of pages in the sitemap vs. the index vs. active.
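The sitemap vs. index vs. active comparison can be worked out with a few set operations. This is only a sketch in Python under stated assumptions: the three URL lists are hypothetical inputs you would have to export yourself (the thread does not specify how), and "active" is taken to mean "received at least one visit in analytics".

```python
def coverage_report(sitemap_urls, indexed_urls, visited_urls):
    """Summarize what share of sitemap URLs are indexed and active."""
    sitemap = set(sitemap_urls)
    if not sitemap:
        raise ValueError("sitemap_urls must contain at least one URL")
    # Only count indexed/visited URLs that are actually in the sitemap.
    indexed = sitemap & set(indexed_urls)
    active = sitemap & set(visited_urls)
    count = len(sitemap)
    return {
        "in_sitemap": count,
        "indexed_pct": round(100 * len(indexed) / count, 1),
        "active_pct": round(100 * len(active) / count, 1),
    }


if __name__ == "__main__":
    # Illustrative data: 10 sitemap URLs, 9 indexed, 5 with visits.
    pages = ["https://www.example.com/page-%d" % i for i in range(10)]
    print(coverage_report(pages, pages[:9], pages[:5]))
```

Tracking these percentages over time, rather than expecting the indexed figure to reach 100%, fits the point above that an exact match may never happen.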
-
I have not run the site through the tools you mentioned; I'm unfamiliar with them.
I am not, however, receiving any errors on those pages. And when I "Fetch and Render" in GWMT, they look and render fine without errors. I'm able to submit them to the index one-by-one.
Thanks for your response, Ryan.
-
Hi Mase. Are you getting errors on URLs you've submitted? Or have you run other crawlers on your site, like Xenu or Screaming Frog, to surface any possible errors? It's also good to know which pages might not have enough content to be indexed: filters, sorting views, etc.