Robots.txt & Disallow: /*? Question!

vetofunk

Hi,

I have a site where they have:

Disallow: /*?

Problem is we need the following indexed:

?utm_source=google_shopping

What would the best solution be? I have read:

User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?

Any ideas?

BabaBha0173

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Use this; it should help solve your problem.

Regards

Chotapao

Hoslaa

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Will this work?
Regards
Sajad

SAjad687

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Use this; it should help.

Regards
[Saad](https://clicktestworld.com/)

NickSamuel

Hi Jeff,

Robots.txt tester as per the above link is definitely worth playing with and is the easiest route to achieving what you want.

Another, more reactive way of managing this is to look at the range of parameters Google has naturally crawled, as reported in Search Console.

You can see this in the old Search Console for now. Log in and go to Crawl --> URL Parameters.

If Googlebot has encountered any ?=params, it will list them. You'll then have the option to manage them or exclude them from the index.

It can be a decent way of cleaning up a site with lots of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
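If you have a list of crawled or indexed URLs to hand (from server logs or an export, for example), a rough offline version of the same check is to count which query parameters actually appear. The sketch below is only an illustration: `count_query_params` and the sample URLs are made up for this example and are not taken from any Google tool.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Count how often each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so parameters like "?preview=" are still counted
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical sample data; replace with your own URL list.
sample_urls = [
    "https://site.com/shop/product/1?size=9",
    "https://site.com/shop/product/1?size=9&utm_source=google_shopping",
    "https://site.com/stockists?product=42",
    "https://site.com/about",
]

if __name__ == "__main__":
    for name, total in count_query_params(sample_urls).most_common():
        print(name, total)
```

The parameters that show up most often are usually the first candidates to review for blocking or allowing.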

effectdigital

With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g.:

User-agent: *

Disallow: /shop/product/&size=*

Disallow: */shop/product/*?size=*

Disallow: /stockists?product=*

^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't have 'pretty' URLs with unique content on them. Very soon that will change and the blocks will be lifted.

If you are really 100% sure that there's only one param which you want to let through, then you'd go with:

User-agent: *

Disallow: /*?

Allow: /*?utm_source=google_shopping

Allow: /*&utm_source=google_shopping

(or something pretty similar to that!)

Before you set anything live, write down a list of URLs which represent the blocks (and allows) which you want to achieve, then test it all with the robots.txt tester (in Search Console).
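As a rough way to sanity-check such a URL list before going to the tester, here is a minimal Python sketch of the matching behaviour Google documents: `*` matches any run of characters, a trailing `$` anchors the end of the URL, the longest matching rule wins, and Allow wins a tie. The function names, the rule set (a `Disallow: /*?` block with Allow exceptions for `utm_source=google_shopping`) and the test paths are assumptions for illustration. Other crawlers may resolve conflicts differently, so treat this as a sketch and not a replacement for the tester.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` may be crawled under `rules`.

    `rules` is a list of (directive, pattern) pairs for one user-agent group.
    The longest matching pattern wins; Allow wins when lengths are equal.
    A path that matches no rule is allowed.
    """
    best_len = -1
    allowed = True
    for directive, pattern in rules:
        # re.match anchors at the start of the path, like robots.txt rules do
        if pattern_to_regex(pattern).match(path):
            is_allow = directive.lower() == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Assumed rule set for illustration only.
RULES = [
    ("Disallow", "/*?"),
    ("Allow", "/*?utm_source=google_shopping"),
    ("Allow", "/*&utm_source=google_shopping"),
]

# Hypothetical paths representing the blocks and allows you want.
TEST_PATHS = [
    "/product/123",
    "/product/123?size=9",
    "/product/123?utm_source=google_shopping",
    "/product/123?size=9&utm_source=google_shopping",
]

if __name__ == "__main__":
    for path in TEST_PATHS:
        print("ALLOW" if is_allowed(RULES, path) else "BLOCK", path)
```

The longer Allow patterns override the short `/*?` block only for URLs carrying the shopping parameter; every other parameterised URL stays blocked.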
