Robots.txt & Disallow: /*? Question!
-
Hi,
I have a site where they have:
Disallow: /*?
Problem is we need the following indexed:
?utm_source=google_shopping
What would the best solution be? I have read:
User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?
Any ideas?
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Use this; it should help solve your problem.
Regards
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Will this work?
Regards
Sajad
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php
User-agent: Mediapartners-Google*
Allow: /
User-agent: Googlebot-Image
Allow: /wp-content/uploads/
User-agent: Adsbot-Google
Allow: /
User-agent: Googlebot-Mobile
Allow: /
Sitemap: https://site.com/sitemap_index.xml
Use this; it should help you.
Regards
[Saad](https://clicktestworld.com/)
-
Hi Jeff,
The robots.txt tester, as per the above link, is definitely worth playing with and is the easiest route to achieving what you want.
Another, more reactive way of managing this is, in some cases, simply to look at the range of parameters Google has naturally crawled, as listed in Search Console.
You can see this in the old Search Console for now: log in and go to Crawl --> URL Parameters.
If Googlebot has encountered any ?= parameters, it will list them. You'll then have the option to manage them or exclude them from the index.
It can be a decent way of cleaning up a site with lots of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
-
With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g.:
User-agent: *
Disallow: /shop/product/&size=*
Disallow: */shop/product/*?size=*
Disallow: /stockists?product=*
^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't have 'pretty' URLs with unique content on them. Very soon that will change and the blocks will be lifted.
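To decide which specific parameters are worth blocking, it can help to tally the parameter names that actually occur in a list of URLs (from a crawl export, say). Here is a minimal Python sketch of that idea; the function name and the sample URLs are hypothetical and only there for illustration:

```python
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def count_query_params(urls):
    """Count how many URLs each query parameter name appears in."""
    counts = Counter()
    for url in urls:
        query = urlsplit(url).query
        # keep_blank_values=True so parameters such as "?sort=" still count.
        names = {name for name, _ in parse_qsl(query, keep_blank_values=True)}
        counts.update(names)
    return counts


# Hypothetical sample URLs; replace with your own export.
SAMPLE_URLS = [
    "https://site.com/shop/product/shirt?size=large",
    "https://site.com/shop/product/shirt?size=small&utm_source=google_shopping",
    "https://site.com/stockists?product=shirt",
    "https://site.com/shop/product/shirt",
]

if __name__ == "__main__":
    for name, total in count_query_params(SAMPLE_URLS).most_common():
        print(f"{name}: {total}")
```

The parameters that show up most often are the usual candidates for a specific Disallow line; anything you need indexed (such as utm_source here) stays off that list.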
If you are really 100% sure that there's only one param which you want to let through, then you'd go with:
User-agent: *
Disallow: /?
Allow: /?utm_source=google_shopping
Allow: /*&utm_source=google_shopping*
(or something pretty similar to that!)
Before you set anything live, write down a list of URLs which represent the blocks (and allows) you want to achieve, then test them all with the robots.txt tester in Search Console.
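The list-of-URLs check described above can also be roughed out offline. This is only a sketch, assuming Google's documented matching behaviour: patterns match from the start of the path, * matches any run of characters, a trailing $ anchors the end, the longest matching pattern wins, and Allow wins a tie. The rules below take the Disallow: /*? line from the original question and add two Allow patterns adapted to it; the function names and sample paths are hypothetical, and none of this replaces the real tester.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a regex anchored at the start."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))


def is_allowed(rules, path):
    """Return True if path (including any query string) may be crawled.

    rules is a list of (directive, pattern) pairs for a single user-agent.
    """
    best = None  # (pattern length, is_allow); longer wins, Allow wins ties
    for directive, pattern in rules:
        if pattern_to_regex(pattern).match(path):
            candidate = (len(pattern), directive.lower() == "allow")
            if best is None or candidate > best:
                best = candidate
    return True if best is None else best[1]


# Assumed rule set: the asker's Disallow plus Allow lines adapted to it.
RULES = [
    ("Disallow", "/*?"),
    ("Allow", "/*?utm_source=google_shopping"),
    ("Allow", "/*&utm_source=google_shopping"),
]

# Hypothetical paths paired with the outcome we want for each.
CHECKS = [
    ("/shop/widget", True),
    ("/shop/widget?size=large", False),
    ("/shop/widget?utm_source=google_shopping", True),
    ("/shop/widget?size=large&utm_source=google_shopping", True),
]

if __name__ == "__main__":
    for path, expected in CHECKS:
        status = "ok" if is_allowed(RULES, path) == expected else "MISMATCH"
        print(f"{status}: {path}")
```

Under the same assumptions, a rule without a wildcard such as Disallow: /? only matches paths that begin with /?, so a URL like /shop/widget?size=large would not be blocked by it. That is the kind of gap a list of test URLs should surface before the file goes live.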