Restricted by robots.txt: does this cause problems?
-
I have restricted around 1,500 links, which are links to retailers' websites and affiliate links, according to Webmaster Tools.
Is this the right approach, as I thought it would affect the link juice? Or should I take the nofollow off the links that are already restricted by robots.txt?
-
Hello Ocelot,
I am assuming you have a site that has affiliate links and you want to keep Google from crawling those affiliate links. If I am wrong, please let me know. Going forward with that assumption then...
That is one way to do it. So perhaps you first send all of those links through a redirect via a folder called /out/ or /links/ or whatever, and you have blocked that folder in the robots.txt file. Correct? If so, this is how many affiliate sites handle the situation.
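For illustration, a minimal robots.txt sketch of that setup, assuming the redirect scripts live under a folder named /out/ (the folder name is just an example, not something from your site):

User-agent: *
Disallow: /out/

Any link that points into /out/ is then off-limits to compliant crawlers, while human visitors still get redirected to the retailer.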
I would not rely on rel nofollow alone, though I would use that in addition to the robots.txt block.
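On the page itself, the nofollow simply goes on the anchor tag. A hedged example using the /out/ redirect pattern above (the path and ID are made up):

<a href="/out/123" rel="nofollow">View this product at the retailer</a>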
There are many other ways to handle this. For instance, you could make all affiliate links JavaScript links instead of href links. Then you could put the JavaScript into a folder called /js/ or something like that, and block that folder in the robots.txt file. This works less and less well now that the Google Preview Bot seems to ignore the disallow statement in those situations.
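As a rough sketch of that JavaScript approach, assuming a script at /js/outlinks.js and a made-up data-out attribute that holds the destination URL (neither name comes from the original answer):

// /js/outlinks.js - turn marked elements into outbound affiliate clicks
// The element carries no crawlable href; the destination lives in data-out.
document.addEventListener('DOMContentLoaded', function () {
  document.querySelectorAll('span[data-out]').forEach(function (el) {
    el.addEventListener('click', function () {
      window.location.href = el.getAttribute('data-out');
    });
  });
});

The markup would then look something like <span data-out="https://retailer.example.com/product">Buy here</span>, and robots.txt would carry Disallow: /js/ so compliant crawlers never fetch the script.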
You could make it all the same URL with a unique identifier of some sort that tells your database where to redirect the click. For example:
www.yoursite.com/outlink/mylink#123
or
www.yoursite.com/mylink?link-id=123
In which case you could then block /mylink in the robots.txt file and tell Google to ignore the link-id parameter via Webmaster Tools.
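To make the single-URL idea concrete, here is a minimal sketch of the server side, assuming a Node/Express stack and an in-memory table standing in for the database lookup (the route, parameter name, and destinations are illustrative, not from the original answer):

// Minimal sketch: /mylink?link-id=123 looks up the destination and redirects
var express = require('express');
var app = express();

// Stand-in for a database table of affiliate destinations
var destinations = {
  '123': 'https://www.retailer-example.com/widget?aff=yoursite'
};

app.get('/mylink', function (req, res) {
  var url = destinations[req.query['link-id']];
  if (url) {
    res.redirect(302, url); // send the visitor on to the retailer
  } else {
    res.status(404).send('Unknown link');
  }
});

app.listen(3000);

With Disallow: /mylink in robots.txt and the link-id parameter flagged as ignorable in Webmaster Tools, compliant crawlers never request the redirect at all.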
As you can see, there is more than one way to skin this cat. The problem is always going to be doing it without looking like you're trying to "fool" Google - because they WILL catch up with any tactic like that eventually.
Good luck!
Everett
-
From a coding perspective, applying the nofollow to the links is the best way to go.
Bear in mind that only the top-tier search engines respect the robots.txt file. Lesser-known bots and spammers may read it specifically to see what you don't want listed, and that information gives them a starting point for digging deeper into your site.
Related Questions
-
Robots.txt: How to block a specific file type in several subdirectories?
Hello everyone! I need help setting up a robots.txt file. I'm trying to block all PDF files in particular directories, so I'm using this command. In the example below, the line blocks all .gif files across the entire site:
Block files of a specific file type (for example, .gif) | Disallow: /*.gif$
Two questions: Can I use this command to target one particular directory in which I want to block PDF files? Will this line be recognized by Googlebot?
Disallow: /fileadmin/xxxxxxx/xxx/xxxxxxx/*.pdf$
Then I realized that I would have to write as many lines as there are directories in which I want to block PDF files. Let's say I want to block PDF files in all three of these directories:
/fileadmin/directory1
/fileadmin/directory1/sub1
/fileadmin/directory1/sub1/pdf
Is there a pattern-matching rule I could use to block access to PDF files in all subdirectories, instead of writing the above line once per subdirectory? For example:
Disallow: /fileadmin/directory1*/
Many thanks in advance for any insight you may have.
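(For reference, a hedged sketch of the pattern-matching this question describes: Googlebot supports the * and $ wildcards in robots.txt, so a single rule can cover a directory and everything beneath it, though not every crawler honors these patterns. The path below is the question's own example directory.)

User-agent: Googlebot
Disallow: /fileadmin/directory1/*.pdf$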
Technical SEO | LabeliumUSA
-
Problems with canonical URLs / redirects (Magento webshop)
Hi all, we're running a Magento webshop and we discovered some strange things regarding canonical URLs and redirects after using the Amasty improved navigation extension. To clarify, please check these four URLs. They contain the same content (the same product page):
1. https://www.afwerkingshop.be/gyproc-gipskartonplaat-ak-2600x1200x9-5mm.html
2. https://www.afwerkingshop.be/wanden/gyproc-gipskartonplaat-ak-2600x1200x9-5mm.html
3. https://www.afwerkingshop.be/wanden/gipsplaten/gyproc-gipskartonplaat-ak-2600x1200x9-5mm.html
4. https://www.afwerkingshop.be/wanden/gipsplaten/standaard/gyproc-gipskartonplaat-ak-2600x1200x9-5mm.html
All four pages have different canonicals (each pointing to its own page URL). Obviously, that's not good. However, in Google (site:...), URL (1) is the only one that's indexed. Furthermore, if I visit the product page by first going to a category page (e.g. www.afwerkingshop.be/wanden.html), I'm redirected to URL (1), but the canonical URL is www.afwerkingshop.be/last_visited_category_name/product. So the canonical seems dynamic, depending on the last visited category. And still, only URL (1) is indexed. Additionally, all aforementioned pages contain . Is anyone familiar with this issue? And more important, will it cause problems in the future? Thanks in advance. Kind regards, Chendon
Technical SEO | RBijsterveld
-
Blocked jQuery in robots.txt: any SEO impact?
I've heard that Google is now indexing links and content rendered with JavaScript and jQuery. My Webmaster Tools account is showing that some jQuery links are blocked by robots.txt. Sorry, I'm not a developer or designer. I want to know: does this have any impact on my SEO, and how can I unblock it for the robots? Check this screenshot: http://i.imgur.com/3VDWikC.png
Technical SEO | hammadrafique
-
Problem with www/non-www domain rewrite
Hello, I made a site for a client about a year ago. The rankings are quite okay, but I think the home page suffers from a penalty. I found out via OSE that, strangely, Page Authority is higher on the 301-ed page:
www.myanmar-rundreisen.de - PA 32
myanmar-rundreisen.de/ - PA 33
I don't understand what is happening here, as I am using the usual htaccess 301 redirect (domain.com -> www.domain.com):
RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www.myanmar-rundreisen.de [NC]
RewriteRule (.*) http://www.myanmar-rundreisen.de/$1 [L,R=301]
This works fine with other domains. I also tried (last line) RewriteRule (.*) http://www.myanmar-rundreisen.de/$1 [L,R=301]. So thanks to anyone who can share an idea on that. Guenter
Technical SEO | hgw57
-
How long to reverse the benefits/problems of a rel=canonical
If this wasn't so serious an issue it would be funny... Long story cut short: a client had a penalty on their website, so they decided to stop using the .com and use the .co.uk instead. They got the .com removed from Google using Webmaster Tools (it had to be, as it was ranking for a trademark they didn't own and there are legal arguments about it). They launched a brand new website and placed it on both domains, with all SEO being done on the .co.uk. The web developer was then meant to put the rel=canonical on the .com pointing to the .co.uk (maybe not needed at all, thinking about it, if they had deindexed the site anyway). However, he managed to rel=canonical from the good .co.uk to the .com domain! Maybe I should have noticed it earlier, but you shouldn't have to double-check others' work. I noticed it today, after a good 6 weeks or so.
We are having a nightmare trying to rank the .co.uk for terms that should be pretty easy to rank for, given it's a decent domain. Would people say that the rel=canonical back to the .com has harmed the .co.uk and keeps harming it while the tag remains in place? I'm of the opinion that it's basically telling Google the .co.uk domain is a copy of the .com, so go rank that instead. If so, how quickly after removing this tag would people expect any issues caused by its placement to vanish? Thanks for any views on this. I now have the fun job of double-checking all the coding done by that web developer on other sites!
Technical SEO | Grumpy_Carl
-
H1 problem on my site, not sure how to solve it
Hi, I have just done an on-page grade report for my site, www.in2town.co.uk, and I found that I had a number of H1s, which was not doing my SEO any good. I have sorted most of the H1 problems out, but the report is still showing two H1s and I cannot find them. I have found one, which is a short description of the site under the main banner, but I cannot find the second H1. Can anyone please let me know if there is a simple way of finding the other H1 so I can deal with it? Many thanks
Technical SEO | ClaireH-184886
-
Problem with the printer-friendly version
For one of our client's sites, most of the backlinks are going to the printer-friendly version of a page. I recommended that he use the canonical tag on the printer-friendly version, pointing to the main page. Luckily, while searching, I came across this post: http://www.seomoz.org/q/solving-printer-friendly-version. The solution recommended was this: <link type="text/css" rel="stylesheet" media="print" href="our-print-version.css"> My questions are:
1. What should I write in place of our-print-version.css? Should it be print.css?
2. Where do I place this code? In which file?
Technical SEO | seoug_2005
-
Using robots.txt to deal with duplicate content
I have 2 sites with duplicate content issues. One is a WordPress blog. The other is a store (Pinnacle Cart). I cannot edit the canonical tag on either site. In this case, should I use robots.txt to eliminate the duplicate content?
Technical SEO | bhsiao