How to Hide Directories in Search?
-
I noticed bad 404 error links in Google Webmaster Tools and they were pointing to directories that do not have an actual page, but hold information.
Ex: there are links pointing to our PDF folder which holds all of our pdf documents. If i type in , example.com/pdf/ it brings up a unformated webpage that displays all of our PDF links.
How do I prevent this from happening. Right now I am blocking these in my robots.txt file, but if i type them in, they still appear.
Or should I not worry about this?
-
Yes, a visit to example.com/dir should now return a 404 error (if you haven't done any redirecting/canonicalizing). This will increase your 404 count in Web Master tools but it's far preferable to the alternative. If you're not redirecting the robots.txt will eventually work and hopefully the links will just fall out of WMT.
-
My hosting company turned off directory browsing and now everything is how it should be. So to my understanding, if the server sees a file that does not have a index file, it should not be view able and should be forbidden. This shoujld not affect us from an SEO standpoint should it? My hosting company said they disabled all directories in our site, however everything still works, except for the forbidden file directories.
-
Basically it shouldn't really have an affect; those unformatted file listings are literally the web server automatically saying 'here's the files that are in this folder', there's no meta tags, description, on page elements, etc.
If you have these pages and they're ranking well, you generally don't want them to be. The automatic file browsing pages don't have your name, your company, etc. in them, and they're generally pretty ugly. They also theoretically could be 'stealing' juice from your 'real' pages, if your internal structure isn't flowing relevance properly.
Basically what I'm saying is that if these pages are having some kind of SEO effect, you probably don't want them to be since they're so basic.
Also I can't overstate the security concerns that directory browsing might be introducing. If someone can directory browse to where your code lives (.php, .aspx.vb, whatever) they may be able to read it. Code sometimes has important things like logins, passwords, merchant account ids, etc. in it that you definitely don't want people reading.
-
Agreed with Valerie that step 1 is to turn off those directory listing pages - that can be a security issue and you don't necessarily want people to see/access the whole list. Also, make doubly sure you don't have any internal links to that directory (Google crawled it somehow).
Generally, Robots.txt should prevent crawling, but it's not foolproof, and it's pretty bad about removing pages once they're indexed. If you can block the page from browsing and return a 404 for the root page, that should be fine. The other option would be to have the page removed in Google Webmaster Tools. You could request removal for the entire folder, but I'm guessing that you may want the actual PDFs indexed.
-
Will turning of directory browsing affect Search for all directories?
-
I really don't want to 301 redirect them as they are just holding files. This is happening with my includes file too. that holds our header, footer, navigation etc. I can check with our hosting company to find out.
-
I'd create an index.html for the directory, and then redirect it somewhere. This way, you're capturing the inbound links and then rescuing some of the inbound juice.
Otherwise, you can also check out this post for more info on other solutions and modifying your htaccess file to prevent the directory view - http://perishablepress.com/better-default-directory-views-with-htaccess/
-
Blocking it in robots.txt will work to hide it from search engines.
If you want to hide it from users or people to who type in the url, you can simply drop a blank "index.html" in the /pdf folder.
-
I would suggest 301'ing them to their /index.htm or /pdf.htm equivalents. If you don't know, a 301 is a signal to a web browser (or search crawler) saying "this page has permanently moved, please go to (otherpage.htm) instead".
Here's a good SEOMoz article explaining it a bit more:
http://www.seomoz.org/learn-seo/redirection
What might be more of a concern, is it sounds like your web server has directory browsing enabled. This could be a security issue (depending on your web server setup). Generally you don't want to expose directories if you don't have to because it gives a potential attacker insight into your system setup. Here's an example how to do it in Apache:
www.camelrichard.org/topics/Apache/Turn_OffDirectoryBrowsing
And IIS:
technet.microsoft.com/en-us/library/cc731109(v=ws.10).aspx
If you like I can confirm if you have open directories if you give me the link, either here or through private message.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Subdirectory site / 301 Redirects / Google Search Console
Hi There, I'm a web developer working on an existing WordPress site (Site #1) that has 900 blog posts accessible from this URL structure: www.site-1.com/title-of-the-post We've built a new website for their content (Site #2) and programmatically moved all blog posts to the second website. Here is the URL structure: www.site-1.com/site-2/title-of-the-post Site #1 will remain as a normal company site without a blog, and Site #2 will act as an online content membership platform. The original 900 posts have great link juice that we, of course, would like to maintain. We've already set up 301 redirects that take care of this process. (ie. the original post gets redirected to the same URL slug with '/site-2/' added. My questions: Do you have a recommendation about how to best handle this second website in Google Search Console? Do we submit this second website as an additional property in GSC? (which shares the same top-level-domain as the original) Currently, the sitemap.xml submitted to Google Search Console has all 900 blog posts with the old URLs. Is there any benefit / drawback to submitting another sitemap.xml from the new website which has all the same blog posts at the new URL. Your guidance is greatly appreciated. Thank you.
Intermediate & Advanced SEO | | HimalayanInstitute0 -
Google Mobile Friendly designation in Search results
We have recently deployed a mobile (http://m.pssl.com) version of our desktop website (http://www.pssl.com). We've followed the guidelines in their documentation (https://support.google.com/webmasters/answer/6101188) & (http://googlewebmastercentral.blogspot.com/2015/04/rolling-out-mobile-friendly-update.html), added the appropriate rel=alternate/rel=canonical tags updated site maps and robots.txt files, etc. A mobile search for our company shows the "mobile-friendly" flag in the search results for our home page, but for some reason other pages such as category and brand are not showing showing as "mobile-friendly". I can submit the pages using the mobile-friendly tester (https://www.google.com/webmasters/tools/mobile-friendly/) and all of the pages I test come back as mobile friendly. Does anyone have any experience or advice they'd be willing to share that might help us resolve this issue?
Intermediate & Advanced SEO | | ovenbird0 -
Is un-searched content worth writing?
Hi, Is every post you write on your site is SERPs worthy? I'll give an example -
Intermediate & Advanced SEO | | BeytzNet
We often cover industry related news items. It is written very well with personal opinions, comments and detailed explanations. Our readers find it interesting, "like" and "plus" it. However, these items will never appear in the SERPs simply because they won't be searched. Needless to say that these are not ever green pieces. If by chance it lands a subject that may be searched in the future, usually it won't appear because it means that the item was also covered by major sites like CNN, Forbes, Bloomberg etc. Is it worth out time to keep "investing" in these types of articles? Thanks0 -
Image Search algo changes
Just seen the image algo changes that have apparently been put into place in the last 3 months My spam tester image site didnt seem to feel anything. Did anyone feel image search change (for better or worse?) And on what dates? Cheers Stephen
Intermediate & Advanced SEO | | firstconversion0 -
Best way to block a search engine from crawling a link?
If we have one page on our site that is is only linked to by one other page, what is the best way to block crawler access to that page? I know we could set the link to "nofollow" and that would prevent the crawler from passing any authority, and we can set the page to "noindex" to prevent it from appearing in search results, but what is the best way to prevent the crawler from accessing that one link?
Intermediate & Advanced SEO | | nicole.healthline0 -
Why do I have better results with absolute search
Hi, I completed the optimization of my website, and while my rankings have improved, i seem to have consistently better resultats with absolute search terms: ex: i'm better ranked on "word1 word2" than just word1 word2 without the guillemets. Do you have any idea why? As most people perform search without guillemets, i would like to improve my rankings with these searches too. thanks, cedric
Intermediate & Advanced SEO | | smartgrains0 -
Advanced search operators
Hi mozzers, Anyone know how to do a yahoo search to return results that meet have a specific URL For example, say I wanted to search for the keyword finance and I only wanted it to return results with the word "business" anywhere in the URL. I have done in the past, but I just cant remember how to do it! Thanks
Intermediate & Advanced SEO | | PeterM220 -
We would like to know where googlebots search from - IP location?
We have a client who has two sites for different countries, 1 US, 1 UK and redirects visitors based on IP. In order to make sure that the English site is crawable, we need to know where the googlebot searches from. Is this a US IP or a UK IP for a UK site / server?
Intermediate & Advanced SEO | | AxonnMedia0