Moz Crawl shows over 100 times more pages than my site has?
-
The latest crawl stats are attached. My site has just over 300 pages.
I'm wondering what I have done wrong.
-
You're right, Keri, the total pages number is higher, but it's still only 581.
-
I believe this image shows only what's indexed, which is a subset of the sitemap you submitted. You may want to look at Google Index -> Index Status in GWT to see what it shows there.
-
[Screenshot: latest Moz crawl]
-
[Screenshot: latest Webmaster Tools crawl]
-
I will definitely be paying attention to those numbers, Keri. Webmaster Tools is showing the right number of pages (a little over 300, with 90% of those indexed).
-
It's not going to be a penalty, but it'll be good to have a bit less of a load on your server (bots no longer crawling thousands of pages) and just have your real pages in the index.
Places to look for interesting changes in site metrics would be your organic traffic in analytics and taking a look at your Google Webmaster Tools account to see your impressions, pages crawled, etc.
-
Thanks, Keri, I will update ASAP.
Could you let me know how big an issue this would be? (When you have the time, of course ;))
-
You're welcome! I may have opened a can of worms, however. That sitemap is generated by an automated tool (based on the footer at the bottom), so somehow it's finding that page 28 as well.
You may also want to ask the developer whether you should be indexing the categories in the blog archives. There are resources on Moz about the best way to set that up in WordPress, but I don't have them at my fingertips at the moment (I have a snuggly baby sleeping on my lap instead, and that's slowing me down a tad).
To answer your next question, after you figure out where the page 28 is being linked from and cure that, yes, you can do a one-time crawl from Research Tools. It won't overwrite your campaign info, but you can at least see if Moz is seeing thousands of pages or just a few hundred to see if stuff was fixed. Again, happy to provide more detail if/when you need it (and others will likely jump in with help on the thread, too).
I'd love to also see a little update a few weeks down the line of any changes you've noticed on your site metrics after getting this fixed.
-
You rock :)
-
And I found it. The sitemap at http://www.nineclouds.ca/sitemap includes a page /28, which is where the crawlers are finding the non-existent pages.
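If you want to check a sitemap for this kind of entry yourself, here's a rough Python sketch. It assumes the sitemap is a standard XML sitemap (sitemaps.org format) and that blog pagination URLs end in /page/<number>, like the /blog/page/23 example in this thread; the function name and the example.com URLs are made up for illustration. It just lists pagination URLs numbered past the last real page.

```python
import re
import xml.etree.ElementTree as ET

# Namespace used by standard sitemaps.org XML sitemaps.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumed pagination pattern: URLs ending in /page/<number> (optional trailing slash).
PAGE_RE = re.compile(r"/page/(\d+)/?$")


def out_of_range_pages(sitemap_xml: str, last_page: int) -> list[str]:
    """Return sitemap URLs whose pagination number is greater than last_page."""
    root = ET.fromstring(sitemap_xml)
    flagged = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        url = (loc.text or "").strip()
        match = PAGE_RE.search(url)
        if match and int(match.group(1)) > last_page:
            flagged.append(url)
    return flagged


example = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/blog/page/2/</loc></url>
  <url><loc>https://www.example.com/blog/page/28/</loc></url>
  <url><loc>https://www.example.com/about/</loc></url>
</urlset>"""

print(out_of_range_pages(example, last_page=21))
```

With 21 as the last real page, only the /page/28/ entry gets flagged; the regular page and the non-paginated URL pass through.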
-
If you look at http://www.nineclouds.ca/blog/page/23, you'll see a double arrow in the pagination at the right that goes to page 24, even though the last real page is page 21. Google has somehow found the pages beyond 21, and once it found one of them, it kept following the double-arrow link to yet another page. The same happened with Rogerbot. I'm not sure where the bad originating link is (which legitimate page on your site links to a page beyond 21), but that loop is what's causing a ton of pages to be indexed. Get rid of those pages, and you'll also get rid of most of your errors.
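To illustrate the loop, here's a minimal Python sketch. The site itself runs on WordPress, so this is not its actual template code, and the function names and example.com URLs are hypothetical. The key is the guard: the "next" (double-arrow) link is only emitted when the current page is below the last real page, so a bot following those links stops at the end of the archive instead of walking forward indefinitely.

```python
from typing import Optional


def next_page_url(base: str, current: int, last_page: int) -> Optional[str]:
    """Return the URL for the "next" link, or None when there is no next page.

    Without the `current < last_page` check, every page, including ones past
    the real end of the archive, would link to one more page.
    """
    if current < last_page:
        return f"{base}/page/{current + 1}"
    return None


def crawl_chain(base: str, start: int, last_page: int, limit: int = 1000) -> list[str]:
    """Follow "next" links from page `start` the way a bot would, up to `limit` hops."""
    visited: list[str] = []
    url: Optional[str] = f"{base}/page/{start}"
    page = start
    while url is not None and len(visited) < limit:
        visited.append(url)
        url = next_page_url(base, page, last_page)
        page += 1
    return visited


print(crawl_chain("https://www.example.com/blog", start=19, last_page=21))
```

Starting at page 19, the chain visits pages 19, 20 and 21 and then stops. The guard only ends the chain; a stray page like /page/23 would ideally also return a 404 so it drops out of the index, which this sketch doesn't cover.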
-
Not shy about that at all, thanks, Keri.
Any help you can provide is greatly appreciated.
-
Hi Bill,
Using my admin powers, I took a peek at your account. I'm still trying to figure out where it's coming from, but you have thousands of empty pages of your blog indexed. I'll dig around a little more and see if I can figure out what's up.
If you're comfortable with sharing your URL here in a public forum, other people can come take a look too. Otherwise, I'm happy to send you a private message with part of what's up and give your developer a place to start looking.
-
Thanks, Keri. I am the owner of the site, not the programmer, so I am looking up the terms you are using as I write this response. If I am using pagination, is there a way to stop Moz from crawling it? If I understand your question about the calendar correctly, I do have one as part of my blog that dates each post. Can I get the bot to ignore this calendar?
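One common way to keep well-behaved crawlers out of a section like a calendar is a robots.txt Disallow rule. Here's a small Python sketch that uses the standard library's urllib.robotparser to test a candidate rule before deploying it. The /blog/calendar/ path is a hypothetical placeholder (use whatever prefix your calendar URLs actually share), and "rogerbot" is the user-agent name of Moz's crawler. Python's parser does simple prefix matching, so it won't evaluate wildcard rules the way Google does, and robots.txt stops crawling rather than indexing, so already-indexed pages may linger for a while.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules; /blog/calendar/ is a placeholder prefix.
ROBOTS_TXT = """\
User-agent: *
Disallow: /blog/calendar/
"""


def is_blocked(url: str, user_agent: str = "rogerbot") -> bool:
    """Return True if the rules above tell `user_agent` not to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded
    return not parser.can_fetch(user_agent, url)


print(is_blocked("https://www.example.com/blog/calendar/2525/01/"))
print(is_blocked("https://www.example.com/blog/my-post/"))
```

The calendar URL comes back as blocked and the ordinary post does not, which is the behavior you'd want to confirm before putting a rule like this live.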
-
My first guess would be that URL parameters or something similar are being crawled. Do you have pagination? Sorting ascending and descending? A calendar that's getting crawled through the year 2525?
Your next step would be to look into what those duplicate pages are and see if something is amiss that's generating a ton of URLs.
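As a starting point for that, here's a rough Python sketch that groups a list of crawled URLs (for example, copied from a crawl export) into patterns by collapsing all-numeric path segments and keeping only query parameter names. A pattern that accounts for thousands of URLs is a likely culprit. The function names and example URLs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_pattern(url: str) -> str:
    """Collapse numeric path segments and drop query values so similar URLs group together."""
    parts = urlsplit(url)
    # Replace every all-digit path segment (page numbers, years, days) with {n}.
    path = re.sub(r"/\d+(?=/|$)", "/{n}", parts.path)
    # Keep only the query parameter names (e.g. order, orderby), not their values.
    if parts.query:
        names = sorted({name for name, _ in parse_qsl(parts.query, keep_blank_values=True)})
        path += "?" + "&".join(f"{name}=" for name in names)
    return path


def top_patterns(urls: list[str], n: int = 5) -> list[tuple[str, int]]:
    """Return the n URL patterns that account for the most crawled URLs."""
    return Counter(url_pattern(u) for u in urls).most_common(n)


crawled = [
    "https://www.example.com/blog/page/22/",
    "https://www.example.com/blog/page/23/",
    "https://www.example.com/blog/page/24/",
    "https://www.example.com/blog/?order=asc",
    "https://www.example.com/about/",
]
print(top_patterns(crawled, n=1))
```

Here the three pagination URLs collapse into the single pattern /blog/page/{n}/, which tops the list. Collapsing digits is a deliberately blunt heuristic; it can lump together URLs that are legitimately different, so treat the output as a pointer, not a verdict.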
Related Questions
-
Any idea why this page is an absolute magnet for bots?
This page on our client's website seems to be an absolute magnet for bots, and it's skewing our Google Analytics stats: https://cbisonline.com/us/catholic-socially-responsible-esg-investing/proxy-voting/ We already filter out lots of bots in GA, primarily through a segment we created several years ago and continue to build upon, but plenty of spam traffic still manages to slip through – mostly to the page above. Last quarter, almost all of it came from two random cities in Europe, so we're going to filter out traffic from those places. (At least for now – not an ideal solution, I know.) But I'm really wondering what drives so many bots to that page in particular. Any insights would be greatly appreciated!
Reporting & Analytics | | matt-145670 -
Need help Taking my Site to the Next Level
Any, and I mean ANY, suggestions would be great and welcome, good and bad. I have a good site with what I believe is good content, and it has been stuck at the same level for over 6 months. Please, and thank you for taking a few minutes out of your day for this. Joe https://www.surecretedesign.com/
Reporting & Analytics | | surecreteproucts0 -
SEO dealing with a CDN on a site.
This one is stumping me and I need some help. I have a client whose site is www.site.com, and we have set up a CDN for them through Max CDN at cdn.site.com, which is basically a CNAME to the www.site.com site. The images in GWT for www.site.com are de-indexing rapidly, and the images on cdn.site.com are not indexing. In the Max CDN account, I have the images on cdn.site.com sending a canonical header pointing to www.site.com, but that does not seem to help; they are all still de-indexing.
Reporting & Analytics | | LesleyPaone0 -
Apple.com showing in Google Analytics "My Top Active Pages" - Why?
Hi Mozzers, this has more than likely been asked before, but my searches have left me empty-handed, so I have asked on here (sorry!). Please see the attached screenshot: do you have any idea why apple.com is showing in my top pages? I'm just curious more than anything, as this isn't the first time I've noticed it. Anybody know? Happy Friday, people, cheers, Jamie
Reporting & Analytics | | SanjidaKazi1 -
Why is Google Analytics showing index.php after every page URL?
Hi, My client's site has GA tracking code gathering correct data on the site, but the pages are listed in GA as having /index.php at the end of every URL, although this does not appear when you visit the site pages. Even if there is a redirect happening for site visitors, shouldn't GA be showing the pages as their redirect destination, i.e. the URL that visitors actually see? Could this discrepancy be adversely affecting my search performance? Example page: http://freshstarttax.com/innocent-spouse/ shows up in GA as http://freshstarttax.com/innocent-spouse/index.php thanks
Reporting & Analytics | | JMagary0 -
Multi-Site Analytics Dashboards?
Anyone have recommendations on a good multi-site analytics dashboard? I am managing roughly 20 sites right now, and am looking for a dashboard that provides basic info like # of visitors, search traffic, etc. for a couple dozen sites at a glance.
Reporting & Analytics | | TakeshiYoung0 -
Getting traffic for another site
Hi Everyone, Our website url/brand is very close to another website url/brand. We are non-competing entities. It appears as though this other company has begun a marketing program which has resulted in our traffic skyrocketing. However, it seems to have also resulted in our Pages/Visit and Visit Duration to decrease and our Bounce Rate to increase. Can anyone suggest how to deal with this type of scenario? Thanks,
Reporting & Analytics | | AC_Pro
Robert0 -
Segmenting traffic from referring sites in GA
Most of our traffic is from referring sites, and among referring sites, job sites are sending most of the traffic. How can we segment traffic from job sites? There are about 40 such sites. We would like to receive a report that shows traffic excluding these job sites.
Reporting & Analytics | | seoug_20050