Crawl Diagnostics 2261 Issues with Our Blog
-
I just recently signed up for Moz — so much information. I've done the walkthrough and will continue learning how to use the tools, but I need your help.
Our first Moz crawl indicated 2,261 issues (447 404s, 803 duplicate content, 11 502s, etc.). I've reviewed all of the crawl issues and they are linked to our Yahoo-hosted WordPress blog. Our blog is over 9 years old. The only issue I've been able to find is that our categories are not set up correctly. I've searched for WordPress assistance on this topic and can't find any problems with our current category setup. Every category link that I click returns: "Nothing Found. Apologies, but no results were found for the requested archive. Perhaps searching will help find a related post."
http://site.labellaflorachildrensboutique.com/blog/
Any assistance is greatly appreciated.
-
Go Dan!
-
While what Matt and CleverPHD (Hi Paul!) have said is correct - here's your specific issue:
Your categories are loading with "ugly" permalinks like this: http://site.labellaflorachildrensboutique.com/blog/?cat=175 (that loads fine)
But you are linking to them from the bottom of posts with the "clean" URLs --> http://screencast.com/t/RIOtqVCrs
The fix is that category URLs need to load with "clean" URLs, and the ugly one should redirect to the clean one.
Possible fixes:
- Try updating WordPress (I see you're on a slightly older version)
- See if your .htaccess file has been modified (ask a developer or your hosting provider for help with this, perhaps)
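For comparison, this is the stock rewrite block WordPress normally writes to .htaccess when clean permalinks are working. The `/blog/` base is assumed from the URLs above, so treat it as a sketch to check your file against rather than something to paste blindly (re-saving Settings->Permalinks in the WordPress admin will also regenerate it):

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /blog/
RewriteRule ^index\.php$ - [L]
# Serve real files and directories as-is; route everything else
# (including /category/... URLs) through WordPress.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /blog/index.php [L]
</IfModule>
# END WordPress
```

If this block is missing or was overwritten, clean category URLs will 404 while the `?cat=175`-style versions still load, which matches the symptom you're seeing.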
Found another linking issue:
This link to Facebook in your left sidebar --> http://screencast.com/t/EqltiBpM is coded incorrectly. It gets appended to the current page URL, so you end up with a link like http://site.labellaflorachildrensboutique.com/blog/category/unique-baby-girl-gifts/www.facebook.com/LaBellaFloraChildrensBoutique instead of your Facebook page: http://www.facebook.com/LaBellaFloraChildrensBoutique
You can fix that Facebook link probably in Appearance->Widgets.
That one issue alone causes about 200 of your broken URLs.
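For illustration, the usual cause of this is an href with no scheme, which browsers resolve relative to the current page. Your actual widget markup may differ, but the pattern looks like this:

```html
<!-- Broken: no "http://", so the browser treats it as a relative
     path and appends it to whatever page you're currently on -->
<a href="www.facebook.com/LaBellaFloraChildrensBoutique">Facebook</a>

<!-- Fixed: a full absolute URL -->
<a href="http://www.facebook.com/LaBellaFloraChildrensBoutique">Facebook</a>
```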
-
One other thing I forgot: this video by Matt Cutts explains why Google might show a link even though the page was blocked by robots.txt.
https://www.youtube.com/watch?v=KBdEwpRQRD0
Google really tries not to forget URLs and this video reminds us that Google uses links not just for ranking, but discovery so you really have to pay attention to how you link internally. This is especially important for large sites.
-
Awesome! Thanks for straightening it out.
-
Yes, the crawler will avoid the category pages if they are in robots.txt. It sounded from the question like this person was going to remove or change the category organization, so you would have to do something with the old URLs (301 or noindex), and that is why I would not use robots.txt in this case: blocking would prevent those directives from being seen.
If these category pages had always been blocked using robots.txt, then this whole conversation is moot, as the pages never got into the index. Things only get tricky (but workable) when unwanted pages have already made it into the index and you want to get rid of them.
I have seen cases where the wrong pages on a site got into the index and were ranking, so the site owner just blocked them with robots.txt. Those URLs continued to rank and caused problems for the canonical pages that should have been ranking. We had to unblock them, let Google see the 301s, let the new pages rank, and then put the old URLs back into robots.txt to prevent them from getting back into the index.
Cheers!
-
Oh yeah, that's a great point! I've found that the category pages rarely rank directly, but you'll definitely want to double-check before outright blocking crawlers.
Just to check my own understanding, CleverPhD, wouldn't crawlers avoid the category pages if they were disallowed by robots.txt (presuming they obey robots.txt), even if the links were still on the site?
-
One wrinkle: if the category pages are in Google and potentially ranking well, you may want to 301 them to consolidate them into a more appropriate page (if that makes sense for your site). Or, if you want to get them out of the index, use a meta robots noindex tag on the page(s) to have them removed from the index, and only then block them in robots.txt.
Likewise, you should remove the links on the site that point to the category pages, to prevent Google from recrawling and reindexing them.
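For anyone following along, the sequence above uses the standard meta robots tag. The key ordering constraint is that the page must still be crawlable (not yet blocked in robots.txt) or Google will never see the tag:

```html
<!-- Step 1: add to the <head> of each category page you want deindexed.
     "follow" lets link equity keep flowing through the page meanwhile. -->
<meta name="robots" content="noindex, follow">

<!-- Step 2: only after the pages have dropped out of the index,
     add a Disallow rule to robots.txt to keep crawlers away for good. -->
```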
-
Category pages actually turn up as duplicate content in Crawl Diagnostics _really_ often. It just means that those categories are linked somewhere on your site, and the resulting category pages look almost exactly like all the others.
Generally, I recommend you use robots.txt to block crawlers from accessing pages in the category directory. Once that's done and your campaign has re-crawled your site, then you can see how much of the problem was resolved by that one change, and consider what to do to take care of the rest.
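If you go the robots.txt route, you can sanity-check the Disallow rule before deploying it with Python's standard-library robots.txt parser. The `/blog/category/` path is assumed from the URLs in this thread, and `example.com` is just a stand-in domain:

```python
from urllib.robotparser import RobotFileParser

# The rule we'd add to robots.txt to keep crawlers out of category archives
robots_txt = """\
User-agent: *
Disallow: /blog/category/
"""

rp = RobotFileParser()
rp.parse(robots_txt.splitlines())

# Category archives are blocked...
print(rp.can_fetch("*", "http://example.com/blog/category/unique-baby-girl-gifts/"))  # False
# ...but ordinary posts are still crawlable.
print(rp.can_fetch("*", "http://example.com/blog/some-post/"))  # True
```

Any crawler that obeys robots.txt (Moz's included) should stop requesting those category URLs once the rule is live.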
Does that make sense?