Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company with locations/classrooms around the US. To improve our search rankings and overall visibility, we made some changes to the on-page content, URL structure, etc. Take our Washington DC location as an example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to show anymore; we'd like the second URL type to show across the board. Also, when I type site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training into Google, I get a message that Google can't read the page because of the robots.txt file. But we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. Still, I'm perplexed as to why these pages are not being indexed or crawled. I even submitted them manually in Webmaster Tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And why is Google saying "A description for this result is not available because of this site's robots.txt"?
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so my replies might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? It's also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking to the second page, the one that isn't being indexed yet?
-
Hi Mike,
As a follow-up, I forwarded your suggestions to our webmasters. They adjusted the robots.txt, and it now reads as follows. I think it might still cause issues, though I'm not 100% sure why:
User-agent: *
Allow: /htfu/
Disallow: /htfu/app_data/
Disallow: /htfu/bin/
Disallow: /htfu/PrecompiledApp.config
Disallow: /htfu/web.config
Disallow: /
Now, this page is being indexed:
http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training
But a more niched page still isn't being indexed:
http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training
Suggestions?
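For anyone wondering whether those rules still block the New York page, here is a rough Python sketch of the "most specific rule wins" evaluation that Google documents for robots.txt (the longest matching path decides, and Allow wins a tie). The `is_allowed` helper and the `RULES` list are hypothetical names made up for this illustration, the matching is plain prefix matching with no `*` or `$` wildcard support, and it is not Google's actual implementation.

```python
# Simplified longest-match robots.txt check (prefix matching only, no wildcards).
# RULES mirrors the "User-agent: *" group quoted above; the helper name is hypothetical.
RULES = [
    ("allow", "/htfu/"),
    ("disallow", "/htfu/app_data/"),
    ("disallow", "/htfu/bin/"),
    ("disallow", "/htfu/PrecompiledApp.config"),
    ("disallow", "/htfu/web.config"),
    ("disallow", "/"),
]


def is_allowed(path, rules):
    """Return True if `path` may be crawled under `rules`.

    The rule with the longest matching prefix decides; on equal length,
    an allow rule beats a disallow rule. No matching rule means allowed.
    """
    best = None  # (prefix length, is_allow_rule)
    for directive, prefix in rules:
        if prefix and path.startswith(prefix):
            candidate = (len(prefix), directive == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


print(is_allowed("/htfu/usny27/new-york/sharepoint-training", RULES))  # True
print(is_allowed("/htfu/app_data/settings.xml", RULES))                # False
print(is_allowed("/some-other-section/", RULES))                       # False
```

Under these assumptions the New York URL is crawlable, because "Allow: /htfu/" is more specific than "Disallow: /". That suggests the revised robots.txt is probably not what is holding that page back, although everything outside /htfu/ is still blocked by the final "Disallow: /".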
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
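If you want to check a page for a Meta Robots Tag yourself, something like the following Python sketch could do it using only the standard library. The `MetaRobotsParser` class, the `effective_robots` function, and the sample HTML strings are hypothetical, made up for illustration. The fallback to index, follow reflects the default described above; it is not something read from the page.

```python
from html.parser import HTMLParser


class MetaRobotsParser(HTMLParser):
    """Collects the directives of a <meta name="robots"> tag, if one exists."""

    def __init__(self):
        super().__init__()
        self.directives = None  # stays None when no robots meta tag is found

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def effective_robots(html):
    """Return the page's robots directives, defaulting to index, follow."""
    parser = MetaRobotsParser()
    parser.feed(html)
    if parser.directives is None:
        return ["index", "follow"]
    return parser.directives


# Hypothetical sample pages, not the real learningtree.com markup.
no_tag = "<html><head><title>IT Training</title></head><body></body></html>"
blocked = '<html><head><meta name="robots" content="NOINDEX, nofollow"></head></html>'

print(effective_robots(no_tag))   # ['index', 'follow']
print(effective_robots(blocked))  # ['noindex', 'nofollow']
```

Keep in mind that a crawler only sees this tag if robots.txt lets it fetch the page in the first place, which is why the Disallow has to go first.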
-
Thanks, Mike. That was incredibly helpful. I did click the link on the SERP when I ran the "site:" search on Google, but I thought it was a mistake. Are you able to see the robots disallow in the source code?
-
Your robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/, which would be blocking http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached, so it will still appear in the SERPs. The bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
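As a quick way to confirm this kind of block, here is a small sketch using Python's standard `urllib.robotparser`. The two-line robots.txt in `ASSUMED_ROBOTS_TXT` is an assumption reconstructed from the "Disallow: /htfu/" rule mentioned above; the full live file isn't quoted in this thread, so treat the result as illustrative only.

```python
from urllib.robotparser import RobotFileParser

# Assumed contents: only the "Disallow: /htfu/" rule is known from this thread.
ASSUMED_ROBOTS_TXT = """\
User-agent: *
Disallow: /htfu/
"""

parser = RobotFileParser()
parser.parse(ASSUMED_ROBOTS_TXT.splitlines())

urls = [
    # New URL that is not showing up in the index
    "http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training",
    # Old URL that is still showing from before the block
    "http://www2.learningtree.com/htfu/location.aspx?id=uswd44",
    # Something outside /htfu/ for comparison
    "http://www2.learningtree.com/",
]

for url in urls:
    verdict = "allowed" if parser.can_fetch("*", url) else "blocked"
    print(verdict, url)
```

With that assumed file, both /htfu/ URLs come back blocked and only the root URL is allowed. To test the live file instead, `RobotFileParser` also offers `set_url()` and `read()`, which fetch robots.txt over the network.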