How to allow bots to crawl all but WP-content
-
Hello,
I would like my website to remain crawlable to bots, but to block my wp content and media. Does the following robots.txt work? I worry that the * user agent may conflict with the others.
User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/User-agent: GoogleBot
Allow: /User-agent: GoogleBot-Mobile
Allow: /User-agent: GoogleBot-Image
Allow: /User-agent: Bingbot
Allow: /User-agent: Slurp
Allow: / -
Thank you for the help, Gaston!
-
Yeap, with that you are allowing every file ending with that extension
-
Can I do so with:
Allow: *.jpg
Allow: *.png
-
Thanks, Gaston. I should have been more clear about what I am looking to do. I currently am having an indexation issue. Somehow, pages are being automatically generated by WordPress.
These pages are often .txt files of information or code from plugins, all beginning with /wp-content/uploads/ in their URL. I have been manually removing them from the index and would like to now have them be uncrawlable.
Best
-
Oh god, my mistake!
Im deeply sorry, yes, this configuration will block images! that follow that folder structure!I'll correct myself.
Thanks for pointing it out! -
Gaston,
Thanks for the fast reply! My images folder does follow that format, which is what makes me worrisome as we are blocking the wp-conent folder.
Thanks!
-
Hi Tom,
Yes, this config will allow images to be crawled,
No, this config will block images to be crawled,as long as your wordpress has the defalt folder for images: /wp-content/uploads/year/month/image-name.png
How to know, super easy, where your images are stored? Go to the web where you can find an image... Then right clic and then copy link address. With that link you will find that folder structure.
Hope it helps.
Best luck.
GR -
Hi Gaston,
I just wanted to follow up with you with one last question if possible. Would this allow my images and PDF's to be crawled & indexed still?
Thanks!
-
Awesome. Thanks, Gaston!
-
Yes it does.
As I said earlier. Copy and paste that code into the robot.txt tester in any of your search console and try with some name.css or testing.js just for testing.
Check the image i've attached.Hope it helps.
Best luck
GR -
Thank you for the response. I'm still a little uncertain, does the version you wrote allow the bots to crawl the css and js as well?
Best
-
Hi Tom!
That Robots.txt config is pretty redundant.
To acheive what you what, thy this:User-agent: *
Disallow: /wp-admin/
Disallow: /wp-includes/
Disallow: /wp-content/
Allow: *.js
Allow: *.cssJust 3 things to note here:
1- That User-agent:* and those disallows blocks for every bot to crawl whats in those folders.
2- When blocking /wp-content/ you are also blocking the /themes/ folder and inside are the .js and .css files. Blocking those files cause to googlebot not being able to render correctly that page and see it different from what a normal user would see.
3- Those Allow:/ dont prevent the disallow.To try that configuration, you can use the robots.txt tester in search console, just inder the Crawl menu.
Remember that by default google considers that you are not blocking nothing.
More info here: The web robots.tat pageHope it helps.
Best luck.
GR
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Duplicate Footer Content Issue
Please check given screenshot URL. As per the screenshot we are using highlighted content through out the website in the footer section of our website (https://www.mastersindia.co/) . So, please tell us how Google will treat this content. Will Google count it as duplicate content or not? What is the solution in case if the Google treat it as duplicate content. Screenshot URL: https://prnt.sc/pmvumv
Technical SEO | | AnilTanwarMI0 -
Do mobile and desktop sites that pull content from the same source count as duplicate content?
We are about to launch a mobile site that pulls content from the same CMS, including metadata. They both have different top-level domains, however (www.abcd.com and www.m.abcd.com). How will this affect us in terms of search engine ranking?
Technical SEO | | ovenbird0 -
Wordpress tags and duplicate content?
I've seen a few other Q&A posts on this but I haven't found a complete answer. I read somewhere a while ago that you can use as many tags as you would like. I found that I rank for each tag I used. For example, I could rank for best night clubs in san antonio, good best night clubs in san antonio, great best night clubs in san antonio, top best night clubs in san antonio, etc. However, I now see that I'm creating a ton of duplicate content. Is there any way to set a canonical tag on the tag pages to link back to the original post so that I still keep my rankings? Would future tags be ignored if I did this?
Technical SEO | | howlusa0 -
CDN Being Crawled and Indexed by Google
I'm doing a SEO site audit, and I've discovered that the site uses a Content Delivery Network (CDN) that's being crawled and indexed by Google. There are two sub-domains from the CDN that are being crawled and indexed. A small number of organic search visitors have come through these two sub domains. So the CDN based content is out-ranking the root domain, in a small number of cases. It's a huge duplicate content issue (tens of thousands of URLs being crawled) - what's the best way to prevent the crawling and indexing of a CDN like this? Exclude via robots.txt? Additionally, the use of relative canonical tags (instead of absolute) appear to be contributing to this problem as well. As I understand it, these canonical tags are telling the SEs that each sub domain is the "home" of the content/URL. Thanks! Scott
Technical SEO | | Scott-Thomas0 -
Content on top-level-domain vs. content on subpage
Hello Seomoz community, I just built a new website, mainly for a single affiliate programm and it ranks really well at google. Unfortunately the merchant doesn’t like the name of my domain, that’s why I was thrown out of the affiliate program. So suppose the merchant is a computer monitor manufacturer and his name is “Digit”. The name of my domain is something like monitorsdigital.com at the moment. (It’s just an example, I don’t own this URL). The structure of my website is: 1 homepage with much content on it + a blog. The last 5 blog entries are displayed on the homepage. Because I got kicked out of the affiliate program I want to permanent redirect monitorsdigital.com to another domain. But what should the new website look like? I have two possibilities: Copy the whole monitorsdigital website to a new domain, called something like supermonitors.com. Integrate the monitorsdigital website into my existing website about different monitor manufacturers. E.g.: allmonitors.com/digit-monitors.html (that url is permitted by the merchant) What do you think is the better way? I just got the impression, that it seems to be a little easier to rank high with a top-level-domain (www.supermonitors.com) than with a subpage (www.allmonitors.com/digit-monitors.html). However the subpage can benefit from the domain authority, that was generated by other subpages. Thanks for your help and best regards MGMT
Technical SEO | | MGMT0 -
Duplicate Content
Hello All, my first web crawl has come back with a duplicate content warning for www.simodal.com and www.simodal.com/index.htm slightly mystified! thanks paul
Technical SEO | | simodal0 -
Content Delivery Network
Hi I have a question My site is hosted on a server in the city of my target audience Should i use a Content Delivery Network anyway? Even if the closest edge location of the cdn is in the neighbor country? Thank you Paulo
Technical SEO | | paulogoncalves0 -
Duplicate content
I have to sentences that I want to optimize to different pages for. sentence number one is travel to ibiza by boat sentence number to is travel to ibiza by ferry My question is, can I have the same content on both pages exept for the keywords or will Google treat that as duplicate content and punish me? And If yes, where goes the limit/border for duplicate content?
Technical SEO | | stlastla0