Sitemaps and Indexed Pages
-
Hi guys,
I created an XML sitemap and submitted it for my client last month.
Now the developer of the site has also been messing around with a few things.
I've noticed on my Moz site crawl that indexed pages have dropped significantly.
Before I put my foot in it, I need to figure out whether submitting the sitemap has caused this. Can a sitemap reduce the number of pages indexed?
Thanks
David.
-
Sorry - I missed the part about you looking specifically at the Moz crawler. While useful, it's a stand-in for what will actually be used for rankings - namely the actual crawls by the search engine crawlers themselves. If you're concerned there's an issue, I'd go right to the source for that info rather than trusting Mozbot alone. You can find the search engine crawlers' data in Google Search Console and Bing Webmaster Tools. Look for trends and patterns there, especially around the sitemap report.
The challenge with a Screaming Frog-generated sitemap is that it can only find what's linked. If the site has orphaned pages or an ineffective internal linking scheme, a crawl could easily miss pages. It's certainly better than no sitemap, but a map generated by the site's technology itself (usually the database) is safer.
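To make that concrete, here is a rough sketch of how you could check a crawl-generated sitemap against the list of URLs the site itself knows about. It assumes you can export the site's own URL list (from the database or CMS) as plain strings; the function names and the example.com URLs are made up for illustration and are not part of Screaming Frog or any Moz tool.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol for <urlset> documents.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return the set of <loc> URLs found in a sitemap XML string."""
    root = ET.fromstring(xml_text)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def compare_sitemap(crawl_sitemap_xml, site_urls):
    """Compare a crawl-generated sitemap with the site's own URL list.

    'orphaned' lists URLs the site knows about that the crawl never reached
    (nothing links to them). 'unexpected' lists URLs the crawl found that
    are not in the site's own list (for example leftover filter URLs).
    """
    crawled = sitemap_urls(crawl_sitemap_xml)
    known = set(site_urls)
    return {
        "orphaned": sorted(known - crawled),
        "unexpected": sorted(crawled - known),
    }


if __name__ == "__main__":
    # Hypothetical crawl-generated sitemap with two linked pages.
    crawl_xml = (
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
        "<url><loc>https://example.com/</loc></url>"
        "<url><loc>https://example.com/products</loc></url>"
        "</urlset>"
    )
    # Hypothetical export of every page the site's database knows about.
    database_urls = [
        "https://example.com/",
        "https://example.com/products",
        "https://example.com/old-landing-page",
    ]
    print(compare_sitemap(crawl_xml, database_urls))
```

Anything that turns up under "orphaned" is a page a link-following crawler cannot reach, which is exactly what a crawl-generated sitemap would leave out.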
P.
-
Thanks Paul,
Yes, there has been a big clean-up of pages. There were over 80,000 to begin with. I managed to get that down to about 14k, but then last month Mozbot only crawled about 4,000 pages.
I was just a bit worried that the sitemap generated by Screaming Frog was incorrect and that this was the reason for the drop.
I was referring mainly to the Moz site crawl. I guess I was worried that Mozbot only followed the sitemap!
There were loads of filter URLs and all sorts going on, so it's a bit of a spider's web!
-
No - submitting a sitemap won't reduce the crawl of a site. The search engines will crawl the sitemap and add these pages to the index if they consider them worthy. But they'll still also crawl any other links/pages they can find in other ways and index those as well if they consider them worthy.
Note though - having the number of indexed pages drop is not necessarily a bad thing. If removing a large number of worthless/duplicate/canonicalised/no-indexed pages cleans up the site, that will also be reflected in fewer crawled pages - an indication that quality improvement work was effective.
That help?
Paul