Crawl Diagnostics - Crawling way more pages than my site has?
-
Hello all,
I'm fairly new here, more of a paid search guy dabbling in SEO on the side. I have a client set up in SEOMoz, and the Crawl Diagnostics report shows 10,000+ pages crawled, but I think the site has at most 800 pages (it's an e-commerce site using freewebstore.org as the platform).
Any reasons this would be happening?
-
OK, here is an update. I found that the crawl has a whole basketful of entries for each category, and the site has a fairly long list of categories.
Attached is an image showing what is happening in one category. There is an entry for each sort option (Sort Name, Sort Price Ascending, Sort Price Descending), and I understand where those come from. What I don't understand are all the "rw=1" entries, and why they stack up the way they do.
Is this an issue? I am assuming it is because there seems to be no real reason for it.
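One way to see how much of the 10,000 is real is to strip the parameters that only change presentation and count what is left. The sketch below is a minimal Python example, assuming the sort options and the rw flag show up as query parameters named `sort` and `rw`; those names and the example.com URLs are hypothetical, so substitute whatever the freewebstore.org URLs actually use. It does not explain why rw=1 repeats; it only groups the variants so you can count them.

```python
from collections import defaultdict
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names: the real platform may use other keys
# for the sort option and for the repeated "rw=1" flag.
IGNORED_PARAMS = {"sort", "rw"}

def normalize(url: str) -> str:
    """Drop presentation-only parameters, keeping the others in their original order."""
    parts = urlsplit(url)
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in IGNORED_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))

def group_variants(urls):
    """Map each normalized URL to the list of crawled URLs that collapse onto it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)

# Hypothetical crawled URLs for one category page.
crawled = [
    "http://www.example.com/category/quartz",
    "http://www.example.com/category/quartz?sort=name",
    "http://www.example.com/category/quartz?sort=price_asc&rw=1",
    "http://www.example.com/category/quartz?sort=price_desc&rw=1&rw=1",
]
groups = group_variants(crawled)
print(len(crawled), "crawled URLs ->", len(groups), "distinct page(s)")
```

Here the four hypothetical URLs collapse onto one page, which is the kind of ratio that would turn roughly 800 real pages into 10,000 crawled ones. Parameters not in the ignore list (a page number, say) are kept, so genuinely different pages stay separate.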
-
Thanks to both of you. I will start to dig in to your suggested steps later today.
I just took this one on, and they really don't have anything set up. I just got them set up on Webmaster Tools as well, so I'm not even sure if they had their site indexed before.
The Crawl Diagnostics report doesn't show much duplicate content (60 pages?), but the Too Many On Page Links, Overly Dynamic URL, Duplicate Title, and Long URL warnings are each showing 6,000-10,000 pages.
The site sells crystals, and each item is unique. In my first review I found they don't even have item descriptions written, let alone page titles and meta descriptions.
I'm in analysis mode, writing up my review comments and detailing an action plan to help them focus moving forward. I was just shocked by the 10,000 pages listed in one of the crawl warnings.
Anyway, I'll dig into this info and let you know what I find. It's an adventure!
-
I'm guessing that, as an e-commerce site, you've got multiple ways to browse your content: by category, brand, special offers, etc. The thing to watch out for is URLs built from category paths or carrying lots of parameters. As a result, chances are you've got a duplicate content problem.
As Nakul mentioned, a good first step is to take a look at your crawl report, or use one of the tools he mentioned, to see if you've got the same content being indexed multiple times.
Once you've done that, check how many of the pages being crawled are appearing in Google's index. Is Google doing a reasonable job of identifying the right version? How many pages are in the index? Are recently added products being discovered quickly?
The site: operator will be your friend here, and Dr Pete wrote a great article on ways you can use it:
http://www.seomoz.org/blog/25-killer-combos-for-googles-site-operator
Once you understand what is being crawled and what's making it into the index, decide which pages you really do want indexed, make sure those become the canonical versions, and block parts of your site using robots.txt. (But understand the problem and what you want to achieve before you start doing this.)
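If you do end up blocking parameterized URLs, it helps to check a pattern against real URLs before deploying it. Python's standard `urllib.robotparser` matches Disallow paths by plain prefix and does not expand the `*` wildcard, so this is a small hand-rolled matcher assuming Google-style semantics (`*` matches any run of characters, a trailing `$` anchors the end). It is a sketch only: it ignores Allow rules and user-agent groups, the patterns are hypothetical examples, and the final file should still be confirmed against each search engine's own documentation.

```python
import re

def rule_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex.

    Assumes Google-style semantics: '*' matches any run of characters and a
    trailing '$' anchors the match to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(piece) for piece in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))

def is_blocked(path_and_query, disallow_patterns):
    """True if any non-empty Disallow pattern matches from the start of the path.

    Simplified: Allow lines, user-agent groups and longest-match
    precedence are not modelled. An empty Disallow blocks nothing.
    """
    return any(rule_to_regex(p).match(path_and_query)
               for p in disallow_patterns if p)

# Hypothetical rules for the sort and rw parameters.
rules = ["/*sort=", "/*rw=1"]
for path in ["/category/quartz", "/category/quartz?sort=name", "/category/quartz?rw=1"]:
    print(path, "blocked" if is_blocked(path, rules) else "allowed")
```

The same check works for a rule like `Disallow: /*numberOfStars=0`: only URLs containing that string after the leading slash match, and everything else stays crawlable.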
Hope this helps.
-
You can download the entire crawl and see if there really are that many pages, or post the URL here.
You can also test it with a crawling tool like Xenu or Screaming Frog.
You can also post or private message the link here and I can take a look.
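If you download the crawl as a CSV, a few lines of Python will tell you how many rows are distinct URLs and how many collapse to the same path once query strings are ignored. This sketch assumes the export has a column named `URL`; that header is an assumption, so pass whatever name your Moz, Xenu or Screaming Frog export actually uses. Dropping the whole query string is crude: if the platform puts product IDs in the query, distinct products will be merged, so treat the last number as a lower bound.

```python
import csv
import io
from urllib.parse import urlsplit

def summarize_crawl(csv_text, url_column="URL"):
    """Return (rows, distinct URLs, distinct URLs with query/fragment removed).

    url_column is an assumed header name; adjust it to match the export.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    urls = [row[url_column].strip() for row in reader if row.get(url_column)]
    paths = {urlsplit(u)._replace(query="", fragment="").geturl() for u in urls}
    return len(urls), len(set(urls)), len(paths)

# Hypothetical export rows; in practice read the downloaded file instead.
sample = """URL,Title
http://www.example.com/category/quartz,Quartz
http://www.example.com/category/quartz?sort=name,Quartz
http://www.example.com/category/quartz?sort=name&rw=1,Quartz
http://www.example.com/item/123,Amethyst point
"""
print(summarize_crawl(sample))
```

A large gap between the second and third numbers points at parameter variants rather than real pages.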
-
Related Questions
-
Block Moz (or any other robot) from crawling pages with specific URLs
Hello! Moz reports that my site has around 380 duplicate page content issues. Most of them come from dynamically generated URLs that have some specific parameters. I have sorted this out for Google in Webmaster Tools (the new Google Search Console) by blocking the pages with these parameters. However, Moz is still reporting the same number of duplicate content pages, and to stop it, I know I must use robots.txt. The trick is that I don't want to block every page, just the pages with specific parameters. I want to do this because among these 380 pages there are some other pages with no parameters (or different parameters) that I need to take care of. Basically, I need to clean up this list to be able to use the feature properly in the future. I have read through the Moz forums and found a few related topics, but there is no clear answer on how to block only pages with specific URLs. Therefore, I have done my research and come up with these lines for robots.txt:
User-agent: dotbot
Disallow: /*numberOfStars=0
User-agent: rogerbot
Disallow: /*numberOfStars=0
My questions:
1. Are the above lines correct, and would they block Moz (dotbot and rogerbot) from crawling only pages that have the numberOfStars=0 parameter in their URLs, leaving other pages intact?
2. Do I need an empty line between the two groups (i.e., between "Disallow: /*numberOfStars=0" and "User-agent: rogerbot"), or does it even matter?
I think this would help many people, as there is no clear answer on how to block crawling only pages with specific URLs. Moreover, this should be valid for any robot out there. Thank you for your help!
Moz Pro | Blacktie
-
How do I retrieve crawl and ranking data about a site from the past?
Hey. One of my main clients has asked to see the crawl data and rankings data for the past eight months. He wants tangible evidence of the effects of Penguin. I would like that info too. Is it possible to retrieve that information on a weekly crawl and ranking basis through SEOMoz, and if so, how do you do it? I simply want to show a graph, a timeline, and a brief explanation across several main keywords... Help me as you guys always do. You rock! Best, Ben
Moz Pro | creativeguy
-
Site Optimisation for kitchens!!
Hey guys, I am about to begin optimising this site www.stuartandersonkitchens.co.uk for keywords such as edinburgh kitchens, quality kitchens edinburgh, etc., as he is based in Edinburgh. I am also going to be doing some work on the bones of the site, optimising alt text, meta data, microdata, etc. I am also thinking about creating Facebook, Twitter, and YouTube profiles, possibly adding him to a few directories (DMOZ etc.) and some local ones, and running a small AdWords campaign using the freebie credits. I will be altering some of the content to make it more amenable to the engines. Is there any other advice anyone would like to give me on where I could improve to get him seen on the search engines? It would be great to hear your take on this. Thanks again, Craig
Moz Pro | fenwaymedia
-
Need to find all pages that link to a list of pages/PDFs
I know I can do this in OSE page by page, but is there a way to do it in a large batch? There are 200+ PDFs, and I need to figure out which pages (if any) link to each PDF. I'd rather not go page by page; I'd rather copy and paste the entire list of pages I'm looking for. Any tools you know of that can do this?
Moz Pro | ryanwats
-
Crawl Test produced only 1 page
Hi, I recently submitted a crawl for www.cirrato.com using the SEOMoz Crawl Test Tool. I have a lot of pages, but the crawl result shows only 1 page, the front page, and nothing else. Does anyone know what this could mean or what the problem is?
Moz Pro | yusufcirrato
-
Site Ranking Report
Hi guys, my site ranking report says that I've gone from positions 1-20 for a variety of keywords in Google UK to not being in the top 50. When I search myself, I see that my site remains where it previously was (between 1 and 20). How reliable is the weekly site ranking report? Is it better to look at it monthly?
Moz Pro | columbus
-
On Page Analysis and Grading
I am new here and happy to be! My site is an e-commerce site with hundreds of products. I have set up campaigns to track specific products. In the on-page analysis, where SEOMoz gives you a grade, I have 2 URLs showing. One of the URLs is getting an A and the other is getting an F, but they are the same URL and obviously go to the same page. Any help would be appreciated!
Moz Pro | Confections
-
Why does Open Site Explorer show fewer inbound links than Yahoo Site Explorer?
Hello, we have a question regarding inbound link measurement. We used to measure our inbound links with Yahoo Site Explorer. Now that it has been shut down, we use opensiteexplorer.org. However, Open Site Explorer shows only a fraction of the inbound links that Yahoo Site Explorer did. For our website www.theprintspace.co.uk, Yahoo Site Explorer measured approx. 14,000 inbound links, whereas Open Site Explorer counts only approx. 3,000. That is more than 10,000 fewer links. For our other website, www.theprintspace.de, Open Site Explorer also shows 3,000 fewer links than Yahoo. How can this be? Does Open Site Explorer count links differently from Yahoo? Please explain. It would be great if you could help us with this. Thank you!
Moz Pro | Waplington