Robots.txt & Disallow: /*? Question!

vetofunk

Hi,

I have a site where they have:

Disallow: /*?

Problem is we need the following indexed:

?utm_source=google_shopping

What would the best solution be? I have read:

User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?

Any ideas?

BabaBha0173

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Use this; it should help solve your problem.

Regards

Chotapao

Hoslaa

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Will this work?
Regards
Sajad

SAjad687

User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml

Use this; it should help.

Regards
[Saad](https://clicktestworld.com/)

NickSamuel

Hi Jeff,

Robots.txt tester as per the above link is definitely worth playing with and is the easiest route to achieving what you want.

Another, more reactive way of managing this is, in some cases, to simply look at the range of parameters Google has naturally crawled, as listed in Search Console.

You can see this in the old Search Console for now, so log in and go to Crawl --> URL Parameters.

If Googlebot has encountered any URL parameters (?param=value), it will list them. You'll then have options for how to manage them or exclude them from the index.

It can be a decent way of cleaning up a site with lots of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
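If you'd rather get a rough picture of the parameters yourself (say, from a crawl export or server logs), a short script can tally the parameter names. This is only a sketch: the `sample_urls` list and the `count_query_params` helper are made up for illustration, and it does not replace what Search Console reports.

```python
from collections import Counter
from urllib.parse import urlsplit, parse_qsl


def count_query_params(urls):
    """Count how often each query parameter name appears across the URLs."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so parameters like "?size=" are still counted
        for name, _value in parse_qsl(query, keep_blank_values=True):
            counts[name] += 1
    return counts


# Hypothetical URLs, standing in for a crawl export or log extract.
sample_urls = [
    "https://site.com/shop/product/shoe?size=9",
    "https://site.com/shop/product/shoe?size=10&utm_source=google_shopping",
    "https://site.com/stockists?product=shoe",
    "https://site.com/shop/product/hat?utm_source=google_shopping",
    "https://site.com/about",
]

param_counts = count_query_params(sample_urls)
for name, count in param_counts.most_common():
    print(name, count)
```

The parameter names that show up most often are usually the first candidates to decide on: either block them specifically or leave them crawlable.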

effectdigital

With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g.:

User-agent: *
Disallow: /shop/product/&size=*
Disallow: */shop/product/*?size=*
Disallow: /stockists?product=*

^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't have 'pretty' URLs with unique content on them. Very soon that will change and the blocks will be lifted.

If you are really 100% sure that there's only one param which you want to let through, then you'd go with:

User-agent: *
Disallow: /*?
Allow: /*?utm_source=google_shopping
Allow: /*&utm_source=google_shopping*

(or something pretty similar to that!)

Before you set anything live, put together a list of URLs which represent the blocks (and allows) you want to achieve, then test them all with the robots.txt tester in Search Console.
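For a quick offline sanity check of a URL list before going to the tester, something like the following could work. It is a simplified sketch of the precedence Google documents (the longest matching pattern wins, and Allow wins a tie), with `*` as a wildcard and a trailing `$` as an end anchor. The `pattern_to_regex` and `is_allowed` helpers and the test paths are invented for illustration, and the script ignores user-agent grouping entirely, so treat the Search Console tester as the final word.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters and a trailing '$' anchors the end;
    everything else is treated literally.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def is_allowed(rules, path):
    """Return True if `path` (path plus query string) may be crawled.

    `rules` is a list of ("allow" | "disallow", pattern) tuples for a single
    user-agent group. The longest matching pattern wins; on a tie, allow wins.
    If no rule matches, the path is allowed.
    """
    best_len = -1
    allowed = True
    for kind, pattern in rules:
        if not pattern:
            continue  # an empty pattern matches nothing
        # re.match anchors at the start of the path, like robots.txt rules do
        if pattern_to_regex(pattern).match(path):
            is_allow = kind == "allow"
            if len(pattern) > best_len or (len(pattern) == best_len and is_allow):
                best_len = len(pattern)
                allowed = is_allow
    return allowed


# Rules built around the "Disallow: /*?" pattern from the original question.
rules = [
    ("disallow", "/*?"),
    ("allow", "/*?utm_source=google_shopping"),
    ("allow", "/*&utm_source=google_shopping"),
]

# Hypothetical paths representing the blocks and allows to check.
test_paths = [
    "/shoes",
    "/shoes?color=red",
    "/shoes?utm_source=google_shopping",
    "/shoes?color=red&utm_source=google_shopping",
]

for path in test_paths:
    print(path, "allowed" if is_allowed(rules, path) else "blocked")
```

Note that a pattern like `/?` without the wildcard only matches URLs that begin with `/?` (the homepage with a query string), which is worth checking against your URL list as well.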
