Block Baidu crawler?

AJPro

Hello!

One of our websites receives a large amount of traffic from the Baidu crawler. We do not have any Chinese content and do no business with China, since our market is the UK.

Is it a good idea to block the Baidu crawler in robots.txt, or could it have any adverse effects on our site's SEO?

What do you suggest?

William.Lau

I'm trying to get this done as well, but I'm not sure if it's doable on Volusion (don't use them).

Yandex actually crawls more than Baidu for me, and neither benefits me at all (sucks when you pay for the bandwidth).

LoveFitness

Thanks for that, I have just looked it up. I didn't realise that this was such a common problem.

Metropolis

Hi

Further to Ally's answer, in my experience Baidu tends to ignore robots.txt, so just do it on the server side.

S

AJPro

Thanks, Ally, for your answer. I will now block Baidu.

LoveFitness

Hi Stefan,

You can block the Baidu crawler in robots.txt.

There should be no adverse effect on your site, as this is not an area you are targeting and it has no long-term benefit to your business. Blocking the crawler will mean that your server has less load to deal with from the unnecessary traffic you have been receiving.

You can block the spiders in the following ways:

Robots.txt (below is code for Baidu)

User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
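
If you want to sanity-check those rules before uploading them, here is a minimal sketch using Python's standard urllib.robotparser. The helper name allowed and the example.com URL are just placeholders I made up for illustration. Note it only confirms how a standards-following parser reads the rules; it cannot tell you whether Baidu will actually obey them.

```python
from urllib.robotparser import RobotFileParser

# The same rules as in the post above, held in a string for a local check.
ROBOTS_TXT = """\
User-agent: Baiduspider
User-agent: Baiduspider-video
User-agent: Baiduspider-image
Disallow: /
"""


def allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration; it parses the rules locally
    and never contacts a server.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    url = "https://www.example.com/any-page"  # placeholder URL
    for agent in ("Baiduspider", "Baiduspider-image", "Googlebot"):
        verdict = "allowed" if allowed(agent, url) else "blocked"
        print(agent, verdict)
```

With these rules the Baidu agents should come back blocked, while any crawler not named in the file (Googlebot here) stays allowed.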

Blocking Spiders via the Apache Configuration File httpd.conf

See the article below for more details on this method:

http://searchenginewatch.com/article/2067357/Bye-bye-Crawler-Blocking-the-Parasites
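
The article covers the Apache specifics. Just to show the general idea behind any server-side block (look at the User-Agent header and refuse the request if it matches), here is a rough Python sketch written as WSGI middleware. This is not Apache configuration and not taken from the article; the names BLOCKED_AGENTS, is_blocked and block_crawlers are made up for the example, and the case-insensitive substring match is my assumption.

```python
# Assumption: a case-insensitive substring match on the User-Agent header
# is enough to recognise the crawler. "baiduspider" also matches the
# -video and -image variants.
BLOCKED_AGENTS = ("baiduspider",)


def is_blocked(user_agent):
    """Return True if the User-Agent string contains a blocked token."""
    ua = (user_agent or "").lower()
    return any(token in ua for token in BLOCKED_AGENTS)


def block_crawlers(app):
    """Wrap a WSGI app so that blocked crawlers receive 403 Forbidden."""
    def middleware(environ, start_response):
        if is_blocked(environ.get("HTTP_USER_AGENT")):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Everyone else is passed through to the wrapped application.
        return app(environ, start_response)
    return middleware
```

Unlike robots.txt, a check like this does not rely on the crawler cooperating, although a bot that sends a fake User-Agent would still get through.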

You may also want to check out:

http://www.robotstxt.org/

I hope this helps,

Ally
