Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
I'm wondering what I have done wrong.
-
The total page count is higher, you are right, Keri, but it is still only 581.
-
I believe this image shows what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index -> Index Status in GWT to see what it shows there.
-
latest Moz crawl
-
latest webmaster tools crawl
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (something over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Places to look for interesting changes in site metrics would be your organic traffic in analytics and taking a look at your Google Webmaster Tools account to see your impressions, pages crawled, etc.
-
Thanks Keri, I will update ASAP.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer if you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock:)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
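For anyone wanting to check their own sitemap for the same problem, here is a rough sketch of the idea. It assumes you already have the sitemap's URLs as a list and that pagination URLs look like /blog/page/N (the shape of the /blog/page/23 example in this thread); the function name and the example.com URLs are made up for illustration.

```python
import re

# Assumed URL shape for blog pagination, based on the /blog/page/23 example.
PAGE_RE = re.compile(r"/blog/page/(\d+)/?$")


def pages_past_end(urls, last_real_page):
    """Return pagination URLs whose page number is beyond the last real page."""
    flagged = []
    for url in urls:
        match = PAGE_RE.search(url)
        if match and int(match.group(1)) > last_real_page:
            flagged.append(url)
    return flagged


# Hypothetical sitemap contents, not the real site's sitemap.
sitemap_urls = [
    "http://www.example.com/blog/page/20",
    "http://www.example.com/blog/page/21",
    "http://www.example.com/blog/page/28",
    "http://www.example.com/about",
]

print(pages_past_end(sitemap_urls, last_real_page=21))
```

Anything this prints is a sitemap entry pointing at a page that should not exist, which is a good place to start looking for the bad link.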
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see that there's a double arrow in the pagination at the right that goes to page 24, even though the last page is page 21. Google has somehow found pages greater than 21 (I'm not sure how), and once it found one of those, it keeps seeing the double-arrow link to yet another page. The same happened with Rogerbot. I'm not sure where the bad originating link is (what legit page on your site is linking to something over page 21), but that's the loop that's happening and causing a ton of pages to be indexed. Get rid of those, and you'll also get rid of most of your errors.
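To make the loop concrete, here is a toy model of it. This is not the site's actual template code (the blog runs on WordPress); it is only a sketch under the assumption that the pagination behaves as described above, with page 21 as the last real page. The function names and the 5000-page cap are invented for the example.

```python
LAST_REAL_PAGE = 21  # last blog page that actually has posts, per the post above


def next_link_buggy(page):
    # Always offers a "next" page, even past the end of the archive.
    return page + 1


def next_link_fixed(page):
    # Offers a "next" page only while real pages remain.
    return page + 1 if page < LAST_REAL_PAGE else None


def crawl(start, next_link, max_pages=5000):
    """Follow "next" links from start and return how many pages were visited."""
    visited = 0
    page = start
    while page is not None and visited < max_pages:
        visited += 1
        page = next_link(page)
    return visited


print(crawl(1, next_link_buggy))   # stops only because of the max_pages cap
print(crawl(1, next_link_fixed))   # stops at the last real page
print(crawl(23, next_link_fixed))  # a stray link to page 23 no longer chains onward
```

With the buggy rule, a bot keeps going until its own limit kicks in, which is how a site of a few hundred pages can show up as thousands. With the fixed rule, even a stray link to a page past the end leads nowhere further.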
-
Not shy about that at all, thanks Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to keep Moz from crawling it? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to not recognize this calendar?
-
My first guess would be that parameters or something similar are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
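One way to do that, if you can export the list of crawled URLs, is to group them by shape and see which shape has exploded. The sketch below is only an illustration: the helper names and the example.com URLs are hypothetical, and it assumes the export is a plain list of URL strings.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_pattern(url):
    """Collapse a URL into a pattern: digits become {n}, query values are dropped."""
    parts = urlsplit(url)
    path = re.sub(r"\d+", "{n}", parts.path)
    # Keep only parameter names so ?sort=asc and ?sort=desc group together.
    keys = sorted(pair.split("=", 1)[0] for pair in parts.query.split("&") if pair)
    return path + ("?" + "&".join(keys) if keys else "")


def count_patterns(urls):
    """Return (pattern, count) pairs, most common pattern first."""
    return Counter(url_pattern(u) for u in urls).most_common()


# Hypothetical crawl export, not real data from this site.
crawled = [
    "http://www.example.com/",
    "http://www.example.com/about",
    "http://www.example.com/blog/page/2",
    "http://www.example.com/blog/page/3",
    "http://www.example.com/blog/page/4000",
    "http://www.example.com/blog?sort=asc",
    "http://www.example.com/blog?sort=desc",
]

for pattern, count in count_patterns(crawled):
    print(count, pattern)
```

If one pattern (pagination, a sort parameter, calendar dates) accounts for most of the crawl, that is likely where the extra URLs are being generated.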