Google: How to See URLs Blocked by Robots?
-
Google Webmaster Tools says we have 17K out of 34K URLs that are blocked by our Robots.txt file.
How can I see the URLs that are being blocked?
Here's our Robots.txt file.
User-agent: * Disallow: /swish.cgi Disallow: /demo Disallow: /reviews/review.php/new/ Disallow: /cgi-audiobooksonline/sb/order.cgi Disallow: /cgi-audiobooksonline/sb/productsearch.cgi Disallow: /cgi-audiobooksonline/sb/billing.cgi Disallow: /cgi-audiobooksonline/sb/inv.cgi Disallow: /cgi-audiobooksonline/sb/new_options.cgi Disallow: /cgi-audiobooksonline/sb/registration.cgi Disallow: /cgi-audiobooksonline/sb/tellfriend.cgi Disallow: /*?gdftrk
-
It seems you might be asking two different questions here, Larry.
You ask which URLs are blocked by your robots file. You then answered your own question by listing the entries in your robots file which are the actual URLs that it is blocking.
If in fact what you want to know is which pages exist on your website but are not currently indexed, that's a much bigger question and requires a lot more work to answer.
There is no way Webmaster Tools can give you that answer, because if it was aware of the URL it would already be indexing it.
HOWEVER! It is possible to do it if you are willing to do some of the work on your own to collect and manipulate data using several tools. Essentially, you have to do it in three steps:
- create a list of all the URLs that Google says are indexed. (This info comes from Google's SERPs.)
- then create a separate list of all of the URLs that actually exist on your website. (This must come from a 3rd-party tool you run against your site yourself.)
- From there, you will use Excel to subtract the indexed URLs from the known URLs, leaving a list of non-indexed URLS, which is what you asked for.
I actually laid out this process step-by-step in response to an earlier question, so you can read the process there http://www.seomoz.org/q/how-to-determine-which-pages-are-not-indexed
Is that what you were looking for?
Paul
-
Okay, well the robots.txt will only be excluding robots from the folders and URLs specified and as I say, there's no way to download a list of all the URLs that Google is not indexing from webmaster tools.
If you have exact URLs in mind which you think might be getting excluded, you can test individual URLs in Google Webmaster Tools in:
Health > Blocked URLs > URLs Specify the URLs and user-agents to test against.
Beyond this, if you want to know if there are URLs that shouldn't be excluded in the folders you have specified, I would run a crawl of your website using SEOMoz' crawl test or Screaming Frog. Then sort the URLs alphabetically and make sure that all of the URLs in the folders you have excluded via robots.txt are ones that you want to exclude.
-
I want to make sure that Google is indexing all of our pages we want them to. I.E. That all of the NOT indexed URLs are valid.
-
Hi Larry
Why do you want to find those URLs out for my understanding? Are you concerned that the robots.txt is blocking URLs it shouldn't be?
As for downloading a list of URLs which aren't indexed from Google Webmaster Tools, which is what I think you would really like, this isn't possible at the moment.
-
Liz; Perhaps my post was unclear or I am misunderstanding your answer.
I want to find out the specific URLs that Google says it isn't indexing because of our Robots.txt file.
-
If you want to see if Google has indexed individual pages which are supposed to be excluded, you can check the URLs in your robots.txt using the site: command.
E.g. type the following into Google:
site:http://www.audiobooksonline.com/swish.cgi
site:http://www.audiobooksonline.com/reviews/review.php/new/
...continue for all the URLs in your robots.txtJust from searching on the last example above (site:http://www.audiobooksonline.com/reviews/review.php/new/) I can see that you have results indexed. This is probably because you added the robots.txt after it was already indexed.
To get rid of these results you need to take the culprit line out of the robots.txt, add the robots meta tag set to noindex to all pages you want removed, submit a URL removal request via webmaster tools, check it has been nonidexed then you can add the line back into the robots.txt.
This is the tag:
I hope that makes sense and is useful!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Why a certain URL ( a category URL ) disappears?
the page hasn't been spammed. - links are natural - onpage grader is perfect - there are useful high ranking articles linking to the page...pretty much everything is okay.....also all of my websites pages are okay and none of them has disappeared only this one ( the most important category of my site. )
Intermediate & Advanced SEO | | mohamadalieskandariii0 -
Google has penalized me for a keyword,and removed from google some one know for how long time is the penalty
i have by some links from fiverr i was ranking 9 for this keyword with 1200 of searches after fiverr it has disappeared from google more then 10 days i guess this is a penalty someone know how long a penalty like this is how many days to months ? i don't get any messages in webmaster tools this is the gig https://www.fiverr.com/carissa30/do-20-unique-domains-high-tf-and-cf-flow-backlinks-high-da?source=Order+page+gig+link&funnel=a7b5fa4f-8c0a-4c3e-98a3-74112b658c7f
Intermediate & Advanced SEO | | alexmuller870 -
Google News error in Google Search Console
My google search console states some errors as below: 1. Article fragmented Some of the urls in this error are the category urls. How to make google bot understand it is a category not an article? 2. Article too short In fact the article is quite long. I do not know why this is happen... 3. No sentence found In fact, there are a lot of sentences Please help!
Intermediate & Advanced SEO | | binhlai0 -
Should I worry about rendering problems of my pages in google search console fetch as google?
Some elements are not properly shown when I preview our pages in search console (fetch as google), e.g.
Intermediate & Advanced SEO | | lcourse
google maps, css tables etc. and some parts are not showing up since we load them asynchroneously for best page speed. Is this something should pay attention to and try to fix?0 -
Mixing static.htm urls and dynamic urls on a Windows IIS Server?
Hi all, We've had a website originally built using static html with .htm extensions ranking well in Google hence we want to keep those pages/urls. We are on a dedicated sever (Windows IIS). However our developer has custom made a new DYNAMIC section for the site which shows new added products dynamically and allows them to be booked online via shopping cart. We are having problems displaying them both on the same domain even if we put the dynamic section withing its own subfolder and keep the static htms in the root. Is it possible to have both function on IIS (even if they may have to function a little separately)? Does anyone have previous experience of this kind of issue or a way of making both work? What setup do we need to do on the dedicated server.
Intermediate & Advanced SEO | | emerald0 -
Use Canonical or Robots.txt for Map View URL without Backlink Potential
I have a Page X with lots of unique content. This page has a "Map view" option, which displays some of the info from Page X, but a lot is ommitted. Questions: Should I add canonical even though Map View URL does not display a lot of info from Page X or adding to robots.txt or noindex, follow? I don't see any back links coming to Map View URL Should Map View page have unique H1, title tag, meta des?
Intermediate & Advanced SEO | | khi50 -
Why are these results being showed as blocked by robots.txt?
If you perform this search, you'll see all m. results are blocked by robots.txt: http://goo.gl/PRrlI, but when I reviewed the robots.txt file: http://goo.gl/Hly28, I didn't see anything specifying to block crawlers from these pages. Any ideas why these are showing as blocked?
Intermediate & Advanced SEO | | nicole.healthline0 -
Is My site seeing a Google Dance ? - Rankings all over the place
My eCommerce website is seeing some rankings flucuate daily from say rank 20 to rank 130 whilst some other keywords ago up and down by as much as 40 places. I have been putting up alot of new unique content and can see in GWT that google has been crawling my site more but given that I was affected by the google panda updates which saw a 40% drop in traffic I'm only just to recover some of it., i have also been trying to get rid of any poor links and our linking building is only concentrating on high quality posts and links. I am wondering if this is the post panda update - "google dance" or is google having issues trying to work where to rank my site and possibly punish me? thanks Sarah.
Intermediate & Advanced SEO | | SarahCollins0