How to allow bots to crawl all but WP-content

Tom3_15

Hello,

I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /

User-agent: GoogleBot-Mobile
Allow: /

User-agent: GoogleBot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /
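
On the worry about the * group conflicting with the others: under RFC 9309 and Google's robots.txt documentation, a crawler follows only the group that best matches its own name, and falls back to the * group only when no named group matches. A rough way to see the effect is Python's standard-library parser, used here as a stand-in for a real crawler (real bots may differ in details). The example.com host, the test path, and the bot name SomeOtherBot are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt from the question, unchanged.
ROBOTS_TXT = """\
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/

User-agent: GoogleBot
Allow: /

User-agent: GoogleBot-Mobile
Allow: /

User-agent: GoogleBot-Image
Allow: /

User-agent: Bingbot
Allow: /

User-agent: Slurp
Allow: /
"""


def can_fetch(agent: str, path: str) -> bool:
    """Ask the stdlib parser whether `agent` may fetch `path`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    # example.com is a placeholder host, not the real site.
    return parser.can_fetch(agent, "https://example.com" + path)


# A bot with its own group uses that group and skips the * rules.
print(can_fetch("Googlebot", "/wp-content/uploads/notes.txt"))     # True
# A bot without a named group falls back to the * group.
print(can_fetch("SomeOtherBot", "/wp-content/uploads/notes.txt"))  # False
```

If this reading holds for the real crawlers, the named bots in this file are not blocked from /wp-content/ at all; only bots without their own group are.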

Tom3_15

Thank you for the help, Gaston!

Gaston Riera

Yep, with that you are allowing every file that ends with that extension.

Tom3_15

Can I do so with:

Allow: *.jpg

Allow: *.png
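
A small sketch of how such patterns are usually interpreted, assuming the wildcard semantics described in RFC 9309 and Google's documentation: * matches any run of characters, a trailing $ anchors the pattern to the end of the URL, and without $ the pattern only has to match the beginning of the path. The helper name rule_matches and the sample paths are made up for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # '*' becomes "any run of characters"; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        # With '$' the whole path must match.
        return re.fullmatch(regex, path) is not None
    # Without '$' the pattern is a prefix match.
    return re.match(regex, path) is not None


print(rule_matches("*.jpg", "/wp-content/uploads/2016/05/photo.jpg"))  # True
print(rule_matches("*.jpg", "/photo.jpg.txt"))                         # True
print(rule_matches("*.jpg$", "/photo.jpg.txt"))                        # False
```

So under these assumptions *.jpg matches any URL that contains .jpg, and *.jpg$ is the form that matches only URLs ending in .jpg.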

Tom3_15

Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.

These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.

Best

Gaston Riera

Oh god, my mistake!
I'm deeply sorry. Yes, this configuration will block images that follow that folder structure!

I'll correct myself.
Thanks for pointing it out!

Tom3_15

Gaston,

Thanks for the fast reply! My images folder does follow that format, which is what worries me, as we are blocking the wp-content folder.

Thanks!

Gaston Riera

Hi Tom,

My original answer was that this config will allow images to be crawled. Corrected: no, this config will block images from being crawled, as long as your WordPress uses the default folder for images: /wp-content/uploads/year/month/image-name.png

How can you find out where your images are stored? It's super easy: go to a page on your site that shows an image, right-click the image, and choose "Copy link address". That link will show you the folder structure.
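
The same check can be scripted once you have copied the link. This is only a sketch: the function name and the example.com URL are hypothetical, and it simply tests whether the path of the copied link starts with the default WordPress uploads folder.

```python
from urllib.parse import urlparse


def uses_default_uploads_folder(image_url: str) -> bool:
    """True if the image lives under the default /wp-content/uploads/ folder."""
    return urlparse(image_url).path.startswith("/wp-content/uploads/")


# Hypothetical link copied from a page on the site.
link = "https://example.com/wp-content/uploads/2016/05/image-name.png"
print(uses_default_uploads_folder(link))  # True
```

If this prints True for your images, a Disallow on /wp-content/ covers them too.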

Hope it helps.
Best luck.
GR

Tom3_15

Hi Gaston,

I just wanted to follow up with you with one last question if possible. Would this still allow my images and PDFs to be crawled & indexed?

Thanks!

Tom3_15

Awesome. Thanks, Gaston!

Gaston Riera

Yes it does.

As I said earlier, copy and paste that code into the robots.txt tester in Search Console and try a made-up file name such as name.css or testing.js, just for testing.
Check the image I've attached.

Hope it helps.
Best luck
GR


Tom3_15

Thank you for the response. I'm still a little uncertain: does the version you wrote allow the bots to crawl the CSS and JS as well?

Best

Gaston Riera

Hi Tom!

That robots.txt config is pretty redundant.
To achieve what you want, try this:

User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.css

Just 3 things to note here:
1- That User-agent: * group and those Disallow lines block every bot from crawling what's in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder, which contains the .js and .css files. Blocking those files prevents Googlebot from rendering the page correctly, so it sees the page differently from what a normal user would see.
3- Those Allow: / lines don't prevent the disallow.
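
One caveat worth testing before relying on the config above: RFC 9309 and Google's current documentation say that when several rules match a URL, the rule with the longest path wins, and Allow wins only on a tie. The sketch below applies that rule to the config. It is an approximation written for this thread, not any crawler's actual code; the helper names, the sample paths, and the longer Allow lines in EXTENDED are assumptions for illustration.

```python
import re


def rule_matches(pattern: str, path: str) -> bool:
    """Match a robots.txt path pattern ('*' wildcard, optional '$' anchor)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        return re.fullmatch(regex, path) is not None
    return re.match(regex, path) is not None


def is_allowed(rules, path: str) -> bool:
    """Longest matching rule wins; on equal length, allow wins."""
    matching = [(len(pat), kind == "allow")
                for kind, pat in rules if rule_matches(pat, path)]
    if not matching:
        return True  # nothing matches, so crawling is allowed
    return max(matching)[1]


# The config suggested above.
RULES = [
    ("disallow", "/wp-admin/"),
    ("disallow", "/wp-includes/"),
    ("disallow", "/wp-content/"),
    ("allow", "*.js"),
    ("allow", "*.css"),
]

# Hypothetical variant with longer, more specific Allow paths.
EXTENDED = RULES + [
    ("allow", "/wp-content/*.js"),
    ("allow", "/wp-content/*.css"),
]

css = "/wp-content/themes/site/style.css"
print(is_allowed(RULES, css))     # False: "/wp-content/" is longer than "*.css"
print(is_allowed(EXTENDED, css))  # True: "/wp-content/*.css" is the longest match
```

Under this precedence the short *.js and *.css rules lose to the longer /wp-content/ rule, so it is worth confirming the behavior in a robots.txt tester for the crawlers you care about.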

To try that configuration, you can use the robots.txt tester in Search Console, just under the Crawl menu.

Remember that by default Google assumes you are not blocking anything.
More info here: The web robots.txt page

Hope it helps.
Best luck.
GR
