Why don't Moz OSE, Ahrefs, Majestic and so on change their user agent while crawling?
-
Some blackhat websites, PBNs and other "cheaters" use various methods to block third-party backlink checker bots (OSE, Ahrefs, Majestic...): robots.txt rules, IP blocking and the like.
A simple solution for those bots would be to mimic Google, for example by using its user agent string.
Or, if that is not legally permitted (which I doubt), they could randomize their user agent strings, URLs, and IPs to prevent blocking. This should not be a big deal IMHO. Am I missing something obvious?
-
The ethics of the Internet dictate that you
- crawl politely,
- obey robots.txt and
- properly identify yourself
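
To make those three rules concrete, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleLinkBot`, its info URL, and the sample robots.txt are all hypothetical, made up for illustration; this is not how any particular crawler is actually implemented.

```python
import urllib.robotparser

# Hypothetical crawler identity (an assumption for this sketch): a stable,
# honest name plus a URL where site owners can read about the bot.
USER_AGENT = "ExampleLinkBot/1.0 (+https://example.com/bot-info)"
BOT_TOKEN = "ExampleLinkBot"

# A robots.txt that a site owner might publish to keep this bot out of one area.
ROBOTS_TXT = """\
User-agent: ExampleLinkBot
Disallow: /private/
Crawl-delay: 5

User-agent: *
Disallow:
"""


def build_parser(robots_txt):
    """Parse robots.txt text into a RobotFileParser without any network access."""
    parser = urllib.robotparser.RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def crawl_plan(parser, url):
    """Return (allowed, delay_seconds) for this bot and the given URL."""
    allowed = parser.can_fetch(BOT_TOKEN, url)   # obey robots.txt
    delay = parser.crawl_delay(BOT_TOKEN) or 1   # crawl politely; default to 1 second
    return allowed, delay


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for url in ("https://example.org/private/page", "https://example.org/blog/post"):
        allowed, delay = crawl_plan(parser, url)
        print(url, "allowed" if allowed else "blocked", f"(wait {delay}s, send {USER_AGENT!r})")
```

A real crawler would fetch robots.txt from each host and send `USER_AGENT` in its request headers; the point here is only that a polite bot checks the rules under its own name before fetching anything.
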
This isn't a new issue. Link networks and sites have blocked crawlers and manipulated Google for years. Fortunately, they make up only a small fraction of the web. Also, it's unlikely that links from those networks have much value, so their crawl priority would be very low anyway.
Actually, it could be viewed as beneficial when blackhat sites block OSE and Ahrefs: those sites often get penalized by Google, and third-party crawlers have no way of knowing this, so the blocking effectively keeps penalized sites out of the indexes.
-
Well, I think bot blocking is an obvious problem even now, and it will become more important as private networks keep growing.
Moz (and others) should find and implement the best possible solution. I see no problem with TAGFEE as long as you are transparent about the fact that your bots are undetectable.
I understand that what I'm proposing may be neither the best nor the desired solution, but the problem must be addressed or OSE will soon have no value at all.
What do you propose?
-
I agree with George here -- we'd hear a huge outcry if we pretended to be Googlebot or a different bot. We'd also likely get blocked, as sometimes people only let in a certain few known bots/IPs to crawl their site. If we changed user agents and IPs regularly, it would not be cool or TAGFEE.
-
What about using different user agents and IPs regularly in order to avoid detection?
Is there any other acceptable solution?
-
The reputation and integrity of the major players would be at stake here. If they changed their user agent identification (to spoof Googlebot or Bing or whatever) that could be detected, and they would be castigated. The crawler IP address and its user agent ID would be out of sync...
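
To illustrate the "out of sync" point: a site owner can check a visitor that claims to be Googlebot by doing a reverse DNS lookup on its IP and then a forward lookup on the resulting hostname. Below is a rough sketch of that check in Python. The function name and the injectable lookup parameters are my own (hypothetical), and the domain suffixes reflect my understanding of Google's published verification guidance, so treat them as an assumption rather than a complete list.

```python
import socket

# Assumed hostname suffixes for genuine Google crawlers (see note above).
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if reverse and forward DNS both agree the IP is Google's.

    The lookup functions can be swapped out (e.g. for offline testing);
    by default they use the system resolver.
    """
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])
    try:
        hostname = reverse_lookup(ip)              # IP -> hostname
        if not hostname.endswith(GOOGLE_SUFFIXES):
            return False                           # user agent and IP are out of sync
        return ip in forward_lookup(hostname)      # hostname -> IPs must include ours
    except OSError:
        # socket.herror and socket.gaierror are OSError subclasses: lookup failed.
        return False
```

A third-party crawler sending Googlebot's user agent string from its own IP range would fail the first lookup, which is why this kind of spoofing is easy to spot.
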