Page not being indexed or crawled and no idea why!
-
Hi everyone,
There are a few pages on our website that aren't being indexed right now on Google and I'm not quite sure why. A little background:
We are an IT training and management training company and we have locations/classrooms around the US. To better our search rankings and overall visibility, we made some changes to the on page content, URL structure, etc. Let's take our Washington DC location for example. The old address was:
http://www2.learningtree.com/htfu/location.aspx?id=uswd44
And the new one is:
http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training
All of the SEO changes aren't live yet, so just bear with me. My question really regards why the first URL is still being indexed and crawled and showing fine in the search results and the second one (which we want to show) is not. Changes have been live for around a month now - plenty of time to at least be indexed.
In fact, we don't want the first URL to be showing anymore, we'd like the second URL type to be showing across the board. Also, when I type into Google site:http://www2.learningtree.com/htfu/uswd44/reston/it-and-management-training I'm getting a message that Google can't read the page because of the robots.txt file. But, we have no robots.txt file. I've been told by our web guys that the two pages are exactly the same. I was also told that we've put in an order to have all those old links 301 redirected to the new ones. But still, I'm perplexed as to why these pages are not being indexed or crawled - even manually submitted it into Webmaster tools.
So, why is Google still recognizing the old URLs and why are they still showing in the index/search results?
And, why is Google saying "A description for this result is not available because of this site's robots.txt"
Thanks in advance!
- Pedram
-
Hi Mike,
Thanks for the reply. I'm out of the country right now, so reply might be somewhat slow.
Yes, we have links to the pages on our sitemaps and I have done fetch requests. I did a check now and it seems that the niched "New York" page is being crawled now. Might have been a time issue as you suggested. But, our DC page still isn't being crawled. I'll check up on it periodically and see the progress. I really appreciate your suggestions - it's already helping. Thank you!
-
It possibly just hasn't been long enough for the spiders to re-crawl everything yet. Have you done a fetch request in Webmaster Tools for the page and/or site to see if you can jumpstart things a little? Its also possible that the spiders haven't found a path to it yet. Do you have enough (or any) pages linking into that second page that isn't being indexed yet?
-
Hi Mike,
As a follow up, I forwarded your suggestions to our Webmasters. The adjusted the robots.txt and now reads this, which I think still might cause issues and am not 100% sure why this is:
User-agent: * Allow: /htfu/ Disallow: /htfu/app_data/ Disallow: /htfu/bin/ Disallow: /htfu/PrecompiledApp.config Disallow: /htfu/web.config Disallow: / Now, this page is being indexed: http://www2.learningtree.com/htfu/uswd74/alexandria/it-and-management-training But, a more niched page still isn't being indexed: http://www2.learningtree.com/htfu/usny27/new-york/sharepoint-training Suggestions?
-
The pages in question don't have any Meta Robots Tags on them. So once the Disallow in Robots.txt is gone and you do a fetch request in Webmaster Tools, the page should get crawled and indexed fine. If you don't have a Meta Robots Tag, the spiders consider it Index,Follow. Personally I prefer to include the index, follow tag anyway even if it isn't 100% necessary.
-
Thanks, Mike. That was incredibly helpful. See, I did click the link on the SERP when I did the "site" search on Google, but I was thinking it was a mistake. Are you able to see the disallow robot on the source code?
-
Your Robots.txt (which can be found at http://www2.learningtree.com/robots.txt) does in fact have Disallow: /htfu/ which would be blocking http://www2.learningtree.com**/htfu/**uswd44/reston/it-and-management-training from being crawled. While your old page is also technically blocked, it has been around longer and would already have been cached so will still appear in the SERPs.... the bots just won't be able to see changes made to it because they can't crawl it.
You need to fix the disallow so the bots can crawl your site correctly and you should 301 your old page to the new one.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Moz spam score 16 for some pages - Never a manual penalty: Disavow needed?
Hi community, We have some top hierarchy pages with spam score 16 as per Moz due to the backlinks with very high spam score. I read that we could ignore as long as we are not employing paid links or never got a manual penalty. Still we wanna give a try by disavowing certain domains to check if this helps. Anyway we are not going to loose any backlink score by rejecting this low-quality backlinks. Can we proceed? Thanks
White Hat / Black Hat SEO | | vtmoz0 -
Hacked Websites (Doorways) Ranking First Page of Google
Hello Moz community! I could really use your help with some suggestions here with some recent changes I've noticed in the Google serps for terms I've been currently working on. Currently one of the projects I am working on is for an online pharmacy and noticed that the SERPs are being now taken up by hacked websites which look like doorways to 301 redirect to an online pharmacy the hacker wants the traffic to go to. Seems like they may be wordpress sites that are hacked and have unrelated content on their websites compared to online pharmacies. We've submitted these issues as spam to Google and within chrome as well but haven't heard back. When searching terms like "Canadian Pharmacy Viagra" and other similar terms we see this issue. Any other recommendations on how we can fix this issue? Thanks for your time and attached is a screenshot of the results we are seeing for one of our searches. 1Orus
White Hat / Black Hat SEO | | monarkg0 -
Google suddenly stops ranking a page for a "keyword" with same "keyword" in title tag. Low competition.
Hi all, We have released our next version of product called like "software 11", which have thousands of searches every month. So we have just added this same keyword "software 11" as page title suffix to one of the top ranking pages. Obviously this is the page has been added suddenly with "software 11" at page title, multiple header tags and 1 mention in paragraph. Google ranked it for 2 days and suddenly stopped showing this page in entire results for the same keyword we optimised the page for. Why does it happened? Does Google think that we are overdoing with this page and ignoring it? Thanks
White Hat / Black Hat SEO | | vtmoz0 -
Clean-up Question after a wordpress site Hack added pages with external links from a massive link wheel?
Hey All, Thought I would throw this out to ensure I am dotting my "i's" and crossing my "t's"..... Client WordPress site was hacked injected 3-4 pages that cross linked to hundreds (affiliate junk spam link wheel). Pages were removed, 3rd party cleared all malware/viruses. Heavy duty firewall and security monitoring are in place. Hacked pages are now showing as 404. No penalties, ranking issues....If anything there was a temporary BOOST in rankings due to the large link-wheel type net that the pages were receiving....That has since leveled out rankings. I guess my question is, in your opinion is it best to let those pages 404, I am noticing a large amount of links going to them from all over the world from this large link net that was built. I find the temptation to 301 re-direct deleted pages to the homepage difficult...lol..{the temptation is REAL}. Is there anything I am missing? Any other steps that YOU would take? I am assuming letting those pages 404 would be the best bet, as in time they will roll off index.... Thank you in advance, I appreciate any feedback or opinions....
White Hat / Black Hat SEO | | Anthony_Howard0 -
How does Google handle product detail page links hiden in a <noscript>tag?</noscript>
Hello, During my research of our website I uncovered that our visible links to our product detail pages (PDP) from grid/list view category-nav/search pages are <nofollowed>and being sent through a click tracking redirect with the (PDP) appended as a URL query string. But included with each PDP link is a <noscript>tag containing the actual PDP link. When I confronted our 3rd party e-commerce category-nav/search provider about this approach here is the response I recieved:</p> <p style="padding-left: 30px;">The purpose of these links is to firstly allow us to reliably log the click and then secondly redirect the visitor to the target PDP.<br /> In addition to the visible links there is also an "invisible link" inside the no script tag. The noscript tag prevents showing of the a tag by normal browsers but is found and executed by bots during crawling of the page.<br /> Here a link to a blog post where an SEO proved this year that the noscript tag is not ignored by bots: <a href="http://www.theseotailor.com.au/blog/hiding-keywords-noscript-seo-experiment/" target="_blank">http://www.theseotailor.com.au/blog/hiding-keywords-noscript-seo-experiment/<br /> </a> <br /> So the visible links are not obfuscating the PDP URL they have it encoded as it otherwise cannot be passed along as a URL query string. The plain PDP URL is part of the noscript tag ensuring discover-ability of PDPs by bots.</p> <p>Does anyone have anything in addition to this one blog post, to substantiate the claim that hiding our links in a <noscript> tag are in fact within the SEO Best Practice standards set by Google, Bing, etc...? </p> <p>Do you think that this method skirts the fine line of grey hat tactics? Will google/bing eventually penalize us for this?</p> <p>Does anyone have a better suggestion on how our 3rd party provider could track those clicks without using a URL redirect & hiding the actual PDP link?</p> <p>All insights are welcome...Thanks!</p> <p>Jordan K.</p></noscript></nofollowed>
White Hat / Black Hat SEO | | eImprovement-SEO0 -
How can I 100% safe get some of my keywords ranking on second & third page?
Hi, I want to know how can I rank some of my keywords which are in the second and third page on google on page one 100% save, so it will pass all penguin, pandas etc as quick as possible? Kind Regards
White Hat / Black Hat SEO | | rodica70 -
SEO for Career sites and sup-pages
For main job categories: We manage several career pages for several clients but the competition for the main keywords (even several long tail) is from big names like Indeed and similar job boards?
White Hat / Black Hat SEO | | rflores
What would you recommend? For job posts: Since the job posts that our clients post are short lived (80% live less than a month) would it still be incorrect to purchase backlinks? or is it always a big no Thanks for your help. And if a similar question has been asked I would appreciate if you could point me to it. I could not find one.0 -
HOW TO: City Targeted Landing Pages For Lead Generation
Hi guys, So one of my clients runs a web development agency in San Diego and for lead generation purposes we are thinking of creating him city targeted landing pages which will all be on different domains ie. lawebdesginstudio / sfwebdesigngurus I plan to register these 20-30 domains for my client and load them all up on a my single linux server I have from godaddy. I noticed however today using google's keyword tool that roughly only 5-10 cities have real traffic worth trying to capture to turn into leads. Therefore I am not sure if its even worth building those extra 20 landing pages since they will receive very little traffic. My only thought is, if I do decide to build all 30 landing pages, then I assume I will have a very strong private network of authority websites that I can use to point to the clients website. I mean I figure I can rank almost all of them page 1 top 5 within 2-3 months. My question is: 1. Do city targeted micro sites for the purpose of lead generation still work? If so are there any threads that have more info on this topic? 2. Do you suggest I interlink all 30 sites together and perhaps point them all to the money site? If so i'm wondering if I should diversify the ip's that I used to register the domains as well as the whois info. Thanks guys, all help is appreciated!
White Hat / Black Hat SEO | | AM2130