Indexed, though blocked by robots.txt: Need to bother?
-
Hi,
We intentionally blocked some of our website's files in robots.txt; they had been indexed for years. Now GSC shows the message "Indexed, though blocked by robots.txt". As far as I know, we can ignore this. Is any action required? We thought of blocking them with meta tags, but these are PDF files.
Thanks
-
Hi there!
What Google is telling you is that URLs you blocked in robots.txt are still in its index. Either you are indexing URLs that you probably don't want indexed, or, the other way around, important pages are blocked from crawling but remain indexed for other reasons.
If I may ask, why did you block those files through robots.txt?
The two most common answers are:
1- You wanted to remove those files from search results. If this is your case, you've solved only part of the problem. What you should have done is, while still allowing robots to crawl those URLs, apply noindex rules (keep in mind that noindex can be set in the HTTP header, since non-HTML files such as PDFs can't carry a meta robots tag), and then, after a sufficient time, block them in robots.txt.
2- You wanted to optimize how Googlebot spends its crawling time. If this is the case, then you've done it correctly and there is nothing to worry about.
Hope this helps.
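To make the first case concrete, here is a minimal Python sketch of the check involved: is a URL blocked in robots.txt, and does its response carry a noindex in the `X-Robots-Tag` HTTP header? The function names, the sample robots.txt rules, and the example.com URL are all made up for illustration, and the header parsing is simplified (it ignores user-agent prefixes such as `googlebot: noindex`). Treat it as a rough aid, not as how Google evaluates these rules.

```python
from urllib.robotparser import RobotFileParser


def is_crawl_blocked(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


def has_noindex_header(headers):
    """Return True if an X-Robots-Tag response header contains a noindex directive.

    Simplified assumption: directives are comma-separated, with no user-agent prefix.
    """
    for name, value in headers.items():
        if name.lower() == "x-robots-tag":
            directives = [d.strip().lower() for d in value.split(",")]
            if "noindex" in directives:
                return True
    return False


def diagnose(robots_txt, url, headers):
    """Summarise the state of a URL (hypothetical helper, wording is illustrative)."""
    blocked = is_crawl_blocked(robots_txt, url)
    noindex = has_noindex_header(headers)
    if blocked and noindex:
        return "noindex set but not visible to the crawler: unblock first"
    if blocked:
        return "blocked from crawling: may stay indexed"
    if noindex:
        return "crawlable with noindex: can drop out of the index"
    return "crawlable and indexable"


# Hypothetical sample data, not taken from the original question.
SAMPLE_ROBOTS = "User-agent: *\nDisallow: /files/\n"
SAMPLE_URL = "https://example.com/files/report.pdf"

print(diagnose(SAMPLE_ROBOTS, SAMPLE_URL, {"X-Robots-Tag": "noindex"}))
```

The point of the first branch is the order of operations: a crawler that is not allowed to fetch the PDF never sees the header, so the noindex has to be served while the URL is still crawlable.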
Best of luck.
GR