Why don't Moz OSE, Ahrefs, Majestic and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" use various methods to block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking and so on.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), they could randomize their user agent strings, URLs and IPs to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
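As a rough illustration of the second and third points, here is a minimal sketch of how a polite crawler might check robots.txt under its own name before fetching a page. It uses Python's standard `urllib.robotparser`. The robots.txt content, the `ExampleBot` name and the example.com URLs are made up for the example; `rogerbot` and `AhrefsBot` are used as the user agent tokens of the Moz and Ahrefs crawlers.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that a site might serve to shut out
# third-party link crawlers while staying open to everyone else.
ROBOTS_TXT = """\
User-agent: rogerbot
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: *
Crawl-delay: 5
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def may_crawl(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules allow this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)

# A crawler that identifies itself honestly sees that it is not welcome.
print(may_crawl(parser, "rogerbot", "https://example.com/page"))

# An unlisted bot falls under the "*" rules and should honor the delay.
print(may_crawl(parser, "ExampleBot", "https://example.com/page"))
print(parser.crawl_delay("ExampleBot"))
```

The check only means something if the crawler passes its real name as `user_agent`; a bot that lies about its identity is in effect choosing which rules apply to it.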
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, it's only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so their crawl priority would be very low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs: those sites often get penalized by Google, and third-party crawlers have no way of knowing this, so the blocking effectively keeps penalized sites out of their indexes.
-
Well, I think bot blocking is an obvious problem even now, and it will become more important as private networks multiply.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor the most welcome solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot, Bing or whatever), that could be detected, and they would be castigated. The crawler's IP address and its user agent ID would be out of sync...
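To make the "out of sync" point concrete, here is a minimal sketch of the kind of check a site owner could run against a visitor claiming to be Googlebot: a reverse DNS lookup on the IP, a check of the hostname's domain, then a forward lookup to confirm the hostname maps back to the same IP. The function names and the list of accepted domain suffixes are assumptions made for this example, not anyone's actual implementation.

```python
import socket
from typing import Callable, List

# Assumed set of domains that genuine Googlebot hostnames end with.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def reverse_lookup(ip: str) -> str:
    """Return the primary hostname registered for an IP (PTR record)."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(host: str) -> List[str]:
    """Return the IPv4 addresses a hostname resolves to."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(
    ip: str,
    user_agent: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Return True only if the visitor claims to be Googlebot AND its IP backs that up."""
    if "googlebot" not in user_agent.lower():
        return False  # no Googlebot claim to verify
    try:
        host = reverse(ip)
        if not host.endswith(GOOGLE_SUFFIXES):
            return False  # user agent says Google, DNS says otherwise
        # The hostname must resolve back to the IP that made the request.
        return ip in forward(host)
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses:
        # a failed lookup means the claim cannot be confirmed.
        return False
```

A third-party crawler sending a Googlebot user agent from its own servers would fail the hostname check, which is how a spoofed identity gets noticed. The resolvers are passed in as parameters only so the logic can be tried without network access.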