Sitemaps and Indexed Pages
-
Hi guys,
I created an XML sitemap and submitted it for my client last month.
Now the developer of the site has also been messing around with a few things.
I've noticed on my Moz site crawl that indexed pages have dropped significantly.
Before I put my foot in it, I need to figure out if submitting the sitemap has caused this.. can a sitemap reduce the pages indexed?
Thanks
David.
-
Sorry - I missed the part about you looking specifically at the Moz crawler. While useful, it's a stand-in for what will actually be used for rankings - namely the actual crawls by the search engine crawlers themselves. I'd be looking right to the source for that info if you're concerned there's an issue, rather than trusting just Mozbot. You can find the SE crawlers data in Google Search Console and Bing Webmaster Tools. Look for trends and patterns there, especially around the sitemap report.
The challenge to a Screaming Frog-rendered sitemap is that it can only find what's linked. If the site has orphaned pages or an ineffective internal linking scheme, a crawl could easily miss pages. It's certainly better than no sitemap, but a map generated by the site's technology itself (usually the database) is safer.
P.
-
Thanks Paul,
Yes there has been a big clean up of pages. There were over 80,000 to begin with. I managed to get that down to about 14k but then last month MOZ bot only crawled about 4,000 pages.
I was just a bit worried that the sitemap generated by Screaming Frog was incorrect and therefore that was the reason for the drop.
I was referring mainly to the MOZ site crawl. I guess I was worried that the MOZ bot only followed the sitemap!
There were loads of filter URL's and all sorts going on so it's a bit of a spiders web!
-
No - submitting a sitemap won't reduce the crawl of a site. The search engines will crawl the sitemap and add these pages to the index if they consider them worthy. But they'll still also crawl any other links/pages they can find in other ways and index those as well if they consider them worthy.
Note though - having the number of indexed pages drop is not necessarily a bad thing. If removing a large number of worthless/duplicate/canonicalised/no-indexed pages cleans up the site, that will also be reflected in fewer crawled pages - an indication that quality improvement work was effective.
That help?
Paul
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Sitemaps and Indexed Pages
Hi guys, I created an XML sitemap and submitted it for my client last month. Now the developer of the site has also been messing around with a few things. I've noticed on my Moz site crawl that indexed pages have dropped significantly. Before I put my foot in it, I need to figure out if submitting the sitemap has caused this.. can a sitemap reduce the pages indexed? Thanks David. TInSM
API | | Slumberjac0 -
Frequency of Moz page authority updates?
I have some new pages on my site, and Moz gives them a very low PA ranking. I am wondering if these scores are updated monthly or quarterly? I'm not sure how frequently to check back for updated scoring.
API | | AndrewMicek0 -
Why did the April Index Raise DA?
All of our websites DA raised dramatically, including the competitors we track Any idea why this may have happened across the board?
API | | Blue_Compass0 -
First Mozscape index of the year is live
I'm happy to announce, the first index of the year is out. We did have a smaller count of subdomains, but correlations are generally up and coverage of what's in Google looks better, too. We're giving that one a high five! We've (hopefully) removed a lot of foreign and spam subdomains, which you might see reflected in your spam links section. (another woot!) Here are some details about this index release: 145,549,223,632 (145 billion) URLs 1,356,731,650 (1 billion) subdomains 200,255,095 (200 million) root domains 1,165,625,349,576 (1.1 Trillion) links Followed vs nofollowed links 3.17% of all links found were nofollowed 63.49% of nofollowed links are internal 36.51% are external Rel canonical: 26.50% of all pages employ the rel=canonical tag The average page has 89 links on it 72 internal links on average 17 external links on average Thanks! PS - For any questions about DA/PA fluctuations (or non-fluctuations) check out this Q&A thread from Rand: https://mza.seotoolninja.com/community/q/da-pa-fluctuations-how-to-interpret-apply-understand-these-ml-based-scores.
API | | jennita5 -
September's Mozscape Update Broke; We're Building a New Index
Hey gang, I hate to write to you all again with more bad news, but such is life. Our big data team produced an index this week but, upon analysis, found that our crawlers had encountered a massive number of non-200 URLs, which meant this index was not only smaller, but also weirdly biased. PA and DA scores were way off, coverage of the right URLs went haywire, and our metrics that we use to gauge quality told us this index simply was not good enough to launch. Thus, we're in the process of rebuilding an index as fast as possible, but this takes, at minimum 19-20 days, and may take as long as 30 days. This sucks. There's no excuse. We need to do better and we owe all of you and all of the folks who use Mozscape better, more reliable updates. I'm embarassed and so is the team. We all want to deliver the best product, but continue to find problems we didn't account for, and have to go back and build systems in our software to look for them. In the spirit of transparency (not as an excuse), the problem appears to be a large number of new subdomains that found their way into our crawlers and exposed us to issues fetching robots.txt files that timed out and stalled our crawlers. In addition, some new portions of the link graph we crawled exposed us to websites/pages that we need to find ways to exclude, as these abuse our metrics for prioritizing crawls (aka PageRank, much like Google, but they're obviously much more sophisticated and experienced with this) and bias us to junky stuff which keeps us from getting to the good stuff we need. We have dozens of ideas to fix this, and we've managed to fix problems like this in the past (prior issues like .cn domains overwhelming our index, link wheels and webspam holes, etc plagued us and have been addressed, but every couple indices it seems we face a new challenge like this). Our biggest issue is one of monitoring and processing times. We don't see what's in a web index until it's finished processing, which means we don't know if we're building a good index until it's done. It's a lot of work to re-build the processing system so there can be visibility at checkpoints, but that appears to be necessary right now. Unfortunately, it takes time away from building the new, realtime version of our index (which is what we really want to finish and launch!). Such is the frustration of trying to tweak an old system while simultaneously working on a new, better one. Tradeoffs have to be made. For now, we're prioritizing fixing the old Mozscape system, getting a new index out as soon as possible, and then working to improve visibility and our crawl rules. I'm happy to answer any and all questions, and you have my deep, regretful apologies for once again letting you down. We will continue to do everything in our power to improve and fix these ongoing problems.
API | | randfish11 -
Suggestion - Should OSE include "citation links" within its index?
This is really a suggestion (and debate to see if people agree with me), with regard to including "citation links" within Moz tools, by default, as just another type of link NOTE: when I am talking about "citation links" I am talking about a link that is not wrapped in a link tag and is therefore non clickable, eg moz.com Obviously Moz have released the mentions tool, which is great, and also FWE which is also great. However, it would seem to me that they are missing a trick in that "citation links" don't feature in the main link index at all. We know that Google as a minimum uses them as an indicator to crawl a page ( http://ignitevisibility.com/google-confirms-url-citations-can-help-pages-get-indexed/ ), and also that they don't pass page rank - HOWEVER, you would assume that google does use then as part of their alogrithm in some manner as they do nofollow links. It would seem to me that a "Citation Link" could (possibly) be deemed more important than a no follow link in Googles alogrithm, as a "no follow" link is a clear indication by the site owner that they don't fully trust the link, but a citation link would neither indicate trust or non trust. So - my request is to get "citation links" into the main link index (and the Just Discovered index for that matter). Would others agree??
API | | James770 -
On-Page Reports showing old urls
While taking a look at our sites on-page reports I noticed some of our keywords with very old urls that haven't existed for close to a year. How do I make sure moz's keyword ranking is finding the correct page and make sure I'm not getting graded on that keywords/urls that don't exist any more or have been 301'd to new urls? Is there a way to clean these out? My on-page reports say I have 62 reports for only a total of 34 keywords in rankings. As you can see from the image most of the urls for "tax folder" have now been 301'd to not include /product or /category but moz is still showing them with the old url structure. BTW our site is minespress.com 2KdGcPL.png
API | | smines0 -
Huge in crease in on page errors
Hi guys I’ve just checked my online campaign and I see errors in my crawl diagnostics have almost doubled from the 21<sup>st</sup> of October to the 25<sup>th</sup> of October going from 6708 errors to 11 599. Can anyone tell me what may have caused this? Also I notice we have a lot of issues with duplicate page titles which seems strange as no new pages have been added, can anyone explain why this might be? I look forward to hearing from you
API | | Hardley1110