Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow the following:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post.
The disallow list keeps growing as we find issues. So rather than adding every problem area to our robots.txt, is there a way to simply say Allow /Blog/Post and ignore the rest? How do we do that in robots.txt?
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at, and they take up the entire area "above the fold", pushing the content "below the fold".
-Dan
-
Thanks Dan, but what grey areas? What URL are you looking at?
-
Ahh, I see. You just need to "noindex" the pages you don't want in the index. As far as how to do that with BlogEngine, I am not sure, as I have never used it before.
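If it helps, the tag itself is just one line in the <head> of each page template you want kept out of the index - something like this (a generic sketch only; where exactly you'd put it in BlogEngine's templates is an assumption on my part, since I haven't used it):
<meta name="robots" content="noindex, follow" />
The "follow" part means the page stays out of the index, but search engines can still follow the links on it.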
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, yes - sorry, that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once; it's just that it can be accessed lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I don't recognize that URL structure as belonging to any particular CMS.
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT. This is because ?tag=ABC and ?page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx.
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT and set robots.txt up like this:
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
I hope this solves the duplicate content issue and stops it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this; it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within Webmaster Tools - but I am still not quite sure whether you are trying to remove these from the index, just prevent crawling (and if so, for what reason?), or both.
-Dan
-
Hi Dan,
The issue is that my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tags still appear in Google Webmaster Tools (GWMT). I removed the URLs via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping that is why my SERPs dropped, anyway. I am now trying to get to a point where Google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canonicalization pain. In the meantime I am waiting for Google to bring me back up the SERPs when things settle down, but it has been 2 weeks now, so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
-Dan
-
OK, then you should be all set, if your tests in GWMT did not indicate any errors.
-
Thanks - it goes straight to www.mysite.com/Blog.
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to /Blog/posts or is there another page in between, e.g. /Blog/welcome?
If it does not go straight into /Blog/posts, you would want to also allow the landing page.
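For example, something like this (just a sketch - /Blog/welcome is a hypothetical landing page URL, so swap in whatever yours actually is, and test it in GWMT before going live):
User-agent: *
Disallow: /Blog/
Allow: /Blog/posts
# hypothetical landing page URL - use your real one
Allow: /Blog/welcome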
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well, what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise it will not be indexed, as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side... honestly, I'm not sure about case sensitivity in this instance.
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed. sorry I don't understand. I don;t want to make a mistake in my Robot.txt so would like to be 100% sure on what you are saying
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
That works, and my home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site, but I want the uppercase "P" version just in case. Is there a way to make the entry non-case-sensitive?
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer, David, but after reading the question again I think we both missed an essential point ;-). As we now also exclude the robots from crawling the 'homepage' of the blog, don't forget to also Allow that homepage if you have one.
-
Well, no point in a blog that hurts your SEO!
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. If you Disallow: / and Allow: /Blog/Post, you are telling search engines not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post
This should block off the entire Blog directory except for your post subdirectory. As Maritijin stated, always test before you make real changes to your robots.txt.
-
That would be something like this - but please check or test it within Google Webmaster Tools first, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post