Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed by Google right now, and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Let's take our Washington DC location as an example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
Not all of the SEO changes are live yet, so just bear with me. My question is really about why the first URL is still being crawled and indexed, and shows fine in the search results, while the second one (which we want to show) is not. The changes have been live for around a month now, which is plenty of time for the page to at least be indexed.
In fact, we don't want the first URL to show anymore; we'd like the second URL type to show across the board. Also, when I search Google for site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training, I get a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same, and that we've put in an order to have all the old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages aren't being crawled or indexed; I've even manually submitted the new page in Webmaster Tools.
So, why is Google still recognizing the old URLs, and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages in our sitemaps, and I have done fetch requests. I just checked, and it seems the niched "New York" page is being crawled now. It might have been a timing issue, as you suggested. But our DC page still isn't being crawled. I'll check on it periodically and see the progress. I really appreciate your suggestions; they're already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or the site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to that second page that isn't being indexed yet? You could also confirm the page is actually listed in your sitemap; see the sketch below.
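A minimal Python sketch for that sitemap check. The sitemap location here is an assumption; point SITEMAP at wherever your sitemap actually lives, and swap in whichever page you're checking:

import urllib.request
import xml.etree.ElementTree as ET

# Assumed sitemap location -- adjust to the real path for your site.
SITEMAP = "http://www2.learningtree.com/sitemap.xml"
TARGET = "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training"

ns = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}
tree = ET.parse(urllib.request.urlopen(SITEMAP))
locs = [loc.text.strip() for loc in tree.findall(".//sm:loc", ns) if loc.text]
print("listed in sitemap" if TARGET in locs else "missing from sitemap")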
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, which now reads as follows. I think it still might cause issues, and I'm not 100% sure why:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /

Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
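For what it's worth, here's a rough Python sketch of how those rules resolve for the two pages. Python's built-in robotparser settles overlapping rules slightly differently from Googlebot (first match rather than longest match), but the two agree on these paths:

from urllib.robotparser import RobotFileParser

rules = """\
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

for url in (
    "http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training",
    "http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training",
):
    # Both come back allowed, which points to a timing issue rather than a block.
    verdict = "allowed" if rp.can_fetch("Googlebot", url) else "blocked"
    print(verdict, url)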
-
The pages in question don't have any meta robots tags on them. So once the Disallow in robots.txt is gone and you do a fetch request in Webmaster Tools, the pages should get crawled and indexed fine. If a page doesn't have a meta robots tag, the spiders treat it as index, follow. Personally, I prefer to include the index, follow tag anyway, even when it isn't strictly necessary.
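If you'd rather check for the tag from a script than eyeball the source, here's a minimal Python sketch, using the Reston URL from earlier in the thread as the example:

import urllib.request
from html.parser import HTMLParser

class MetaRobotsFinder(HTMLParser):
    # Collects the content attribute of any <meta name="robots"> tags.
    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "robots":
            self.directives.append(attrs.get("content") or "")

url = "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training"
html = urllib.request.urlopen(url).read().decode("utf-8", errors="replace")

finder = MetaRobotsFinder()
finder.feed(html)
# No tag at all is treated by the spiders the same as "index, follow".
print(finder.directives or "no meta robots tag (defaults to index, follow)")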
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the site: search on Google, but I thought it was a mistake. Can you see the robots disallow in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would block http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is technically blocked too, it has been around longer and was already cached, so it will still appear in the SERPs; the bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the Disallow so the bots can crawl your site correctly, and you should 301 redirect your old page to the new one.
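Once the redirects are in place, a quick way to confirm the old URL really returns a 301 is a couple of lines of Python; http.client doesn't follow redirects, so you see the raw response. A sketch, assuming the old URL from the original post:

import http.client

conn = http.client.HTTPConnection("www2.learningtree.com")
conn.request("GET", "/htfu/location.aspx?id=uswd44")
resp = conn.getresponse()
# You want to see 301 plus a Location header pointing at the new URL.
print(resp.status, resp.getheader("Location"))
conn.close()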