Robots.txt help
-
Hi,
We have a blog that is killing our SEO.
We need to disallow:
Disallow: /Blog/?tag*
Disallow: /Blog/?page*
Disallow: /Blog/category/*
Disallow: /Blog/author/*
Disallow: /Blog/archive/*
Disallow: /Blog/Account/.
Disallow: /Blog/search*
Disallow: /Blog/search.aspx
Disallow: /Blog/error404.aspx
Disallow: /Blog/archive*
Disallow: /Blog/archive.aspx
Disallow: /Blog/sitemap.axd
Disallow: /Blog/post.aspx
But allow everything below /Blog/Post
The disallow list keeps growing as we find issues. Rather than adding every area to disallow to our robots.txt, is there an easy way to just say Allow /Blog/Post and ignore the rest? How do we do that in robots.txt?
Thanks
-
These: http://screencast.com/t/p120RbUhCT
They appear on every page I looked at and take up the entire area "above the fold", so the content ends up "below the fold".
-Dan
-
Thanks Dan, but what grey areas, what url are you looking at?
-
Ahh. I see. You just need to "noindex" the pages you don't want in the index. As for how to do that with BlogEngine, I am not sure, as I have never used it before.
But I think a bigger issue is the giant box areas at the top of every page. They are pushing your content way down. That's definitely hurting UX and making the site a little confusing. I'd suggest improving that as well.
-Dan
-
Hi Dan, Yes sorry that's the one!
-
Hi There... that address does not seem to work for me. Should it be .net? http://www.dotnetblogengine.net/
-Dan
-
Hi
The blog is www.dotnetblogengine.com
The content is only on the blog once; it is just that it can be accessed in lots of different ways.
-
Andrew
I doubt that one thing made your rankings drop so much. Also, what type of CMS are you on? Duplicate content like that should be controlled through indexation for the most part, but I am not recognizing that type of URL structure as any particular CMS?
Are just the title tags duplicate or the entire page content? Essentially, I would either change the content of the pages so they are not duplicate, or if that doesn't make sense I would just "noindex" them.
-Dan
-
Hi Dan,
I am getting duplicate content errors in WMT like
This is because tag=ABC and page=1 are both different ways to get to www.mysite.com/Blog/Post/My-Blog-Post.aspx
To fix this I have removed the URLs www.mysite.com/Blog/?tag=ABC and www.mysite.com/Blog/?Page=1 from GWMT, and by setting robots.txt up like
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
I hope to solve the duplicate content issue and stop it happening again.
Since doing this my SERPs have dropped massively. Is what I have done wrong or bad? How would I fix it?
Hope this makes sense. Thanks for your help on this, it's appreciated.
Andrew
-
Hi There
Where are they appearing in WMT? In crawl errors?
You can also control crawling of parameters within webmaster tools - but I am still not quite sure if you are trying to remove these from the index or just prevent crawling (and if preventing crawling, for what reason?) or both?
-Dan
-
Hi Dan,
The issue is my blog had tagging switched on, and it caused canonicalization mayhem.
I switched it off, but the tags still appear in Google Webmaster Tools (GWMT). I removed the URLs via GWMT but they are still appearing. This has also caused me to plummet down the SERPs! I am hoping this is why my SERPs dropped, anyway! I am now trying to get to a point where Google just sees my blog posts and not the ?Tag or ?Author or any other parameter that is going to cause me canonicalization pain. In the meantime I am sat waiting for Google to bring me back up the SERPs when things settle down, but it has been 2 weeks now, so maybe something else is up?
-
I'm wondering why you want to block crawling of these URLs - I think what you're going for is to not index them, yes? If you block them from being crawled, they'll remain in the index. I would suggest considering robots meta noindex tags - unless you can describe in a little more detail what the issue is?
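A rough way to check whether a page already carries the robots meta noindex tag, sketched in Python. The two markup strings and the names `RobotsMetaParser` and `has_noindex` are made up for illustration; this says nothing about how BlogEngine itself emits the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives of every <meta name="robots"> tag."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so default to "".
        attr = {name.lower(): (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                self.directives.add(part.strip().lower())


def has_noindex(html):
    """Return True if the markup contains a robots meta tag with noindex."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical markup for a tag listing such as /Blog/?tag=ABC
tag_page = (
    '<html><head><meta name="robots" content="noindex, follow">'
    "</head><body></body></html>"
)
# Hypothetical markup for a normal post page
post_page = "<html><head><title>My Blog Post</title></head><body></body></html>"

print(has_noindex(tag_page))   # True
print(has_noindex(post_page))  # False
```

Keep in mind that a crawler only sees this tag if it is allowed to fetch the page, so a URL that is disallowed in robots.txt cannot be noindexed this way.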
-Dan
-
Ok then you should be all set if your tests on GWMT did not indicate any errors.
-
Thanks it goes straight to www.mysite.com/Blog
-
Yup, I understand that you want to see your main site. This is why I recommended blocking only /Blog and not / (your root domain).
However, many blogs have a landing page. Does yours? In other words, when you click on your blog link, does it take you straight to Blog/posts or is there another page in between, eg /Blog/welcome?
If it does not go straight into Blog/posts you would want to also allow the landing page.
Does that make sense?
-
The structure is:
www.mysite.com - want to see everything at this level and below it
www.mysite.com/Blog - want to BLOCK everything at this level
www.mysite.com/Blog/posts - want to see everything at this level and below it
-
Well what Martijn (sorry, I spelled his name wrong before) and I were saying was not to forget to allow the landing page of your blog - otherwise this will not be indexed as you are disallowing the main blog directory.
Do you have a specific landing page for your blog or does it go straight into the /posts directory?
I'd say there's nothing wrong with allowing both Blog/Post and Blog/post just to be on the safe side...honestly not sure about case sensitivity in this instance.
-
"We're getting closer David, but after reading the question again I think we both miss an essential point ;-)" What was the essential point you missed? Sorry, I don't understand. I don't want to make a mistake in my robots.txt, so I would like to be 100% sure of what you are saying.
-
Thanks guys so I have
User-agent: *
Disallow: /Blog/
Allow: /Blog/post
Allow: /Blog/Post
That works. My home page also works. Is there anything wrong with including both uppercase "Post" and lowercase "post"? It is lowercase on the site, but I want the uppercase "P" just in case. Is there a way to make the entry non case sensitive?
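On the case question: the path part of a robots.txt rule is matched case-sensitively by Google and under RFC 9309, and as far as I know there is no standard directive that switches that off, so listing both spellings is the usual workaround. Here is a minimal Python sketch of why the second Allow line matters. It assumes plain prefix rules (no * or $) and longest-match precedence; `matching_rule`, `is_allowed` and the sample post URLs are illustrative names, not part of any real tool.

```python
def matching_rule(path, rules):
    """Return the (directive, pattern) pair that decides `path`, or None.

    Assumes plain prefix patterns and longest-match precedence;
    when an Allow and a Disallow are equally long, Allow wins.
    """
    best = None
    for directive, pattern in rules:
        if not path.startswith(pattern):  # str.startswith is case-sensitive
            continue
        longer = best is None or len(pattern) > len(best[1])
        tie_for_allow = (
            best is not None
            and len(pattern) == len(best[1])
            and directive == "Allow"
        )
        if longer or tie_for_allow:
            best = (directive, pattern)
    return best


def is_allowed(path, rules):
    """A path with no matching rule is allowed by default."""
    rule = matching_rule(path, rules)
    return rule is None or rule[0] == "Allow"


lowercase_only = [("Disallow", "/Blog/"), ("Allow", "/Blog/post")]
both_cases = lowercase_only + [("Allow", "/Blog/Post")]

print(is_allowed("/Blog/post/my-blog-post.aspx", lowercase_only))  # True
print(is_allowed("/Blog/Post/My-Blog-Post.aspx", lowercase_only))  # False
print(is_allowed("/Blog/Post/My-Blog-Post.aspx", both_cases))      # True
```

So having both lines does no harm; the extra one only matters if some links to the posts use the uppercase form.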
Thanks
-
Correct, Martijin. Good catch!
-
There was a reason that I said he should test this!
We're getting closer David, but after reading the question again I think we both miss an essential point ;-). We now also exclude the robots from crawling the 'homepage' of the blog. If you have this homepage, don't forget to also Allow it.
-
Well, no point in a blog that hurts your seo
I respectfully disagree with Martijin; I believe what you would want to do is disallow the Blog directory itself, not the whole site. It would seem that if you use Disallow: / and Allow: /Blog/Post, you are telling SEs not to index anything on your site except for /Blog/Post.
I'd recommend:
User-agent: *
Disallow: /Blog/
Allow: /Blog/Post
This should block off the entire Blog directory except for your post subdirectory. As Maritijin stated, always test before you make real changes to your robots.txt.
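To make the difference between the two suggestions concrete, here is a small Python sketch that evaluates a few paths against both files. It is hand-rolled under stated assumptions: a single User-agent group, plain prefix rules and longest-match precedence (Python's own `urllib.robotparser` applies the first matching rule in file order, so it would not reproduce that behaviour). `parse_rules`, `is_allowed` and the `/about.aspx` path are hypothetical.

```python
def parse_rules(robots_txt):
    """Extract (directive, path) pairs from a single-group robots.txt."""
    rules = []
    for line in robots_txt.splitlines():
        field, _, value = line.partition(":")
        field = field.strip().lower()
        value = value.strip()
        if field in ("allow", "disallow") and value:
            rules.append((field, value))
    return rules


def is_allowed(path, rules):
    """Longest matching prefix decides; Allow wins a tie; no match means allowed."""
    verdict, best_len = "allow", -1
    for directive, prefix in rules:
        if not path.startswith(prefix):
            continue
        if len(prefix) > best_len or (len(prefix) == best_len and directive == "allow"):
            verdict, best_len = directive, len(prefix)
    return verdict == "allow"


whole_site = parse_rules("User-agent: *\nDisallow: /\nAllow: /Blog/Post")
blog_only = parse_rules("User-agent: *\nDisallow: /Blog/\nAllow: /Blog/Post")

paths = [
    "/",
    "/about.aspx",  # hypothetical page outside the blog
    "/Blog",
    "/Blog/?tag=ABC",
    "/Blog/Post/My-Blog-Post.aspx",
]
for path in paths:
    print(path, is_allowed(path, whole_site), is_allowed(path, blog_only))
```

Under these assumptions the first file blocks the home page and everything else outside /Blog/Post, while the second blocks only URLs under /Blog/ other than the posts. Note that /Blog without the trailing slash is not matched by Disallow: /Blog/, so it stays crawlable in the second file. Either way, test the real file in Webmaster Tools before relying on a sketch like this.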
-
That would be something like this. Please check or test it within Google Webmaster Tools to see if it works, because I don't want to screw up your whole site. What this does is disallow your complete site and allow just the /Blog/Post URLs.
User-agent: *
Disallow: /
Allow: /Blog/Post