What does Disallow: /french-wines/?* actually do - robots.txt

McTaggart

Hello Mozzers - Just wondering what this robots.txt instruction means: Disallow: /french-wines/?*

Does it stop Googlebot crawling and indexing URLs in that "French Wines" folder - specifically the URLs that include a question mark?

Would it stop the crawling of deeper folders - e.g. /french-wines/rhone-region/ that include a question mark in their URL?

I think this has been done to block URLs containing query strings.

Thanks, Luke

LoganRay

Glad to help, Luke!

McTaggart

Thanks Logan for your help with this - much appreciated. Really helpful!

LoganRay

Disallow: /?* is the same thing as Disallow: /?. Robots.txt rules already match by prefix, so the trailing asterisk (a wildcard) adds nothing: both of those disallows prevent any URL that begins with /? from being crawled.
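
To see why the trailing asterisk makes no difference, here is a rough Python sketch of the matching behaviour Google documents for robots.txt (a rule matches from the start of the path, and * stands for any run of characters). The helper name rule_matches and the sample paths are made up for illustration, and this is not Googlebot's actual code.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Approximate robots.txt matching: prefix match, '*' = any characters.

    Illustrative helper only; the '$' end anchor is deliberately not handled.
    """
    # Escape the pattern so '?' is treated literally, then restore '*' as a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    # re.match anchors at the start of the string only, which gives prefix matching.
    return re.match(regex, path) is not None

# Hypothetical paths (path plus query string, as a crawler would see them).
for path in ["/?page=2", "/?", "/french-wines/?page=2", "/french-wines/"]:
    with_star = rule_matches("/?*", path)
    without_star = rule_matches("/?", path)
    print(f"{path:28} /?* -> {with_star!s:5}  /? -> {without_star}")
```

Under these assumptions both rules agree on every path: only URLs whose path starts with /? are blocked, so /french-wines/?page=2 stays crawlable.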

And yes, it is incredibly easy to disallow the wrong thing! The robots.txt tester in Search Console (under the Crawl menu) is very helpful for figuring out what a disallow will catch and what it will let by. I highly recommend testing any new disallows there before releasing them into the wild.

McTaggart

Thanks again Logan.

What would Disallow: /?* do? That is what the site I am looking at has implemented. Perhaps it works both ways around?

I imagine it's easy to disallow the wrong thing or possibly not disallow the right thing. Ugh.

LoganRay

Disallow: /*?

This disallow literally says to crawlers 'if a URL starts with a slash (all URLs) and has a parameter, don't crawl it'. The * is a wildcard that says anything between / and ? is applicable to the disallow.
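
A quick way to picture this is the small Python sketch below, which mimics the documented wildcard behaviour (prefix match, * matches anything). The function name rule_matches and the example paths are assumptions for illustration, not part of any real crawler.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Approximate robots.txt matching: prefix match, '*' = any characters.

    Illustrative helper only; the '$' end anchor is deliberately not handled.
    """
    # Escape the pattern so '?' is literal, then turn '*' back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

# Hypothetical paths to test against Disallow: /*?
paths = [
    "/?page=2",
    "/french-wines/?colour=red",
    "/french-wines/rhone-region/?page=2",
    "/french-wines/rhone-region/",
]
for path in paths:
    verdict = "blocked" if rule_matches("/*?", path) else "allowed"
    print(f"{path:38} {verdict}")
```

In this sketch every path containing a question mark comes out as blocked, at any folder depth, while the parameter-free path is allowed.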

It's very easy to disallow the wrong thing, especially with regard to parameters. For this reason I always do these 2 things rather than using robots.txt:

Set the purpose of each parameter in Search Console - Go to Crawl > URL Parameters to configure for your site
Self-referring canonicals - most people disallow URLs with parameters in robots.txt to prevent indexing, but this only prevents crawling. A canonical tag pointing to the parameter-free version of that URL tells search engines which version to index, which generally keeps the URLs with parameters out of the index.
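
As a rough illustration of the canonical approach, the Python sketch below strips the query string from a URL and builds the matching link tag. The helper names (canonical_url, canonical_tag) and the example URL are hypothetical; how the tag actually gets into your page template depends on your CMS.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit

def canonical_url(url: str) -> str:
    """Return the URL without its query string or fragment (hypothetical helper)."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))

def canonical_tag(url: str) -> str:
    """Build the <link rel="canonical"> tag for the parameter-free URL."""
    href = escape(canonical_url(url), quote=True)
    return f'<link rel="canonical" href="{href}">'

# Example URL is an assumption for illustration.
print(canonical_tag("https://www.yoursite.com/french-wines/?colour=red&page=2"))
```

Every parameterised variant of the page would then carry the same tag pointing at https://www.yoursite.com/french-wines/. Note that the page has to remain crawlable for the tag to be seen at all, so this works instead of a robots.txt disallow, not alongside one.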

Hope that's helpful!

McTaggart

Thanks Logan - I was just reading: Disallow: /*? # block any URL that includes a ? (and thus a query string) - do you know why the ? comes before the * in this case?

LoganRay

Hi Luke,

You are correct that this was done to block URLs with parameters. However, since there's no wildcard (the asterisk) between the folder name and the question mark, the URL path would have to start with exactly /french-wines/?. This disallow is really only preventing crawling of the single URL www.yoursite.com/french-wines/ with any parameters appended; deeper folders such as /french-wines/rhone-region/ are not affected.
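
To make the difference concrete, here is a small Python sketch that approximates the documented matching rules (prefix match, * matches any characters) and compares the rule from the question with a variant that moves the wildcard in front of the question mark. The helper rule_matches, the variant rule /french-wines/*? and the sample paths are my own assumptions for illustration, not something taken from the site in question.

```python
import re

def rule_matches(pattern: str, path: str) -> bool:
    """Approximate robots.txt matching: prefix match, '*' = any characters.

    Illustrative helper only; the '$' end anchor is deliberately not handled.
    """
    # Escape the pattern so '?' is literal, then turn '*' back into a wildcard.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.match(regex, path) is not None

as_written = "/french-wines/?*"   # the rule from the question
wildcard_first = "/french-wines/*?"  # hypothetical variant for comparison

# Hypothetical paths (path plus query string).
paths = [
    "/french-wines/?colour=red",
    "/french-wines/rhone-region/?colour=red",
    "/french-wines/rhone-region/",
    "/italian-wines/?colour=red",
]
for path in paths:
    a = rule_matches(as_written, path)
    b = rule_matches(wildcard_first, path)
    print(f"{path:42} as written: {a!s:5}  wildcard first: {b}")
```

In this sketch the rule as written only catches /french-wines/?colour=red, whereas the wildcard-first variant also catches the parameterised URL in the deeper rhone-region folder. Neither touches URLs outside /french-wines/.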
