Robots.txt & Disallow: /*? Question!
-
Hi,
I have a site where they have:
Disallow: /*?
Problem is we need the following indexed:
?utm_source=google_shopping
What would the best solution be? I have read:
User-agent: *
Allow: ?utm_source=google_shopping
Disallow: /*?
Any ideas?
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml
Use this; it should help solve your problem.
Regards
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /?
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml
this will work ??
Regards
Sajad
-
User-agent: *
Disallow: /cgi-bin/
Disallow: /wp-admin/
Disallow: /archives/
Disallow: /*?*
Allow: /comments/feed/
Disallow: /refer/
Disallow: /index.php
Disallow: /wp-content/plugins/
Allow: /wp-admin/admin-ajax.php

User-agent: Mediapartners-Google*
Allow: /

User-agent: Googlebot-Image
Allow: /wp-content/uploads/

User-agent: Adsbot-Google
Allow: /

User-agent: Googlebot-Mobile
Allow: /

Sitemap: https://site.com/sitemap_index.xml
Use this; it will help you.
Regards
[Saad](https://clicktestworld.com/)
-
Hi Jeff,
The robots.txt tester, as per the above link, is definitely worth playing with and is the easiest route to achieving what you want.
Another, more reactive way of managing this is to look at the range of parameters Google has already crawled, as reported in Search Console.
You can see this in the old Search Console for now: log in and go to Crawl --> URL Parameters.
If Googlebot has encountered any URL parameters, it will list them. You'll then have the option to manage them or exclude them from the index.
It can be a decent way of cleaning up a site with lots of indexed pages (1,000+), although please be sure to read this documentation before using it: https://support.google.com/webmasters/answer/6080548?hl=en
-
With this kind of thing, it's really better to pick the specific parameters (or parameter combinations) which you'd like to exclude, e.g.:
User-agent: *
Disallow: /shop/product/&size=*
Disallow: */shop/product/*?size=*
Disallow: /stockists?product=*
^ I just took the above from a robots.txt file which I have been working on, as these particular pages don't yet have 'pretty' URLs with unique content on them. Very soon that will change and the blocks will be lifted.
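To get a feel for which URLs a parameter-specific pattern like the ones above would catch, here is a rough Python sketch. It assumes Google-style matching, where `*` stands for any run of characters, a trailing `$` anchors the end of the URL, and a pattern is matched from the start of the path. The function name `pattern_matches` and the sample URLs are made up for illustration, and other crawlers may interpret wildcards differently.

```python
import re


def pattern_matches(pattern: str, path: str) -> bool:
    """Return True if a robots.txt path pattern matches a URL path plus query.

    Assumption: Google-style wildcards ('*' = any characters, trailing '$' =
    end of URL). This is an illustration, not any crawler's actual code.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape each literal chunk, then join the chunks with '.*' for every '*'.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match anchors at the start, so the pattern behaves like a prefix.
    return re.match(regex, path) is not None


# Hypothetical URLs checked against two of the rules quoted above.
for rule, url in [
    ("/stockists?product=*", "/stockists?product=42"),
    ("/stockists?product=*", "/stockists"),
    ("*/shop/product/*?size=*", "/en/shop/product/shirt?size=m"),
]:
    print(rule, url, pattern_matches(rule, url))
```

A check like this only tells you whether a single pattern matches; it says nothing about which rule wins when an Allow and a Disallow both match.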
If you are really 100% sure that there's only one param which you want to let through, then you'd go with:
User-agent: *
Disallow: /*?
Allow: /*?utm_source=google_shopping
Allow: /*&utm_source=google_shopping*
(or something pretty similar to that!)
Before you set anything live, write down a list of URLs which represent the blocks (and allows) which you want to achieve, then test them all with the robots.txt tester in Search Console.
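As a quick local sanity check before using the tester, the idea can be sketched in Python. This is only an approximation under stated assumptions: it follows the precedence Google documents (the matching rule with the longest path wins, and Allow wins a tie), it treats `*` as a wildcard and a trailing `$` as an end anchor, and the names `RULES`, `to_regex` and `is_allowed` plus the test URLs are hypothetical. The rule set is a variant of the one discussed in this thread, assumed for illustration. The Search Console tester remains the authority.

```python
import re

# Assumed rule set for illustration: block every URL with a query string,
# but let utm_source=google_shopping through wherever it appears.
RULES = [
    ("disallow", "/*?"),
    ("allow", "/*?utm_source=google_shopping"),
    ("allow", "/*&utm_source=google_shopping"),
]


def to_regex(pattern: str) -> str:
    """Translate a robots.txt pattern into a regex ('*' wildcard, '$' anchor)."""
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return regex + "$" if anchored else regex


def is_allowed(rules, path: str) -> bool:
    """Apply longest-match precedence; Allow wins when lengths are equal."""
    best = None  # (pattern length, is_allow) of the winning rule so far
    for directive, pattern in rules:
        if pattern and re.match(to_regex(pattern), path):
            candidate = (len(pattern), directive.lower() == "allow")
            # Tuple comparison: longer pattern first, then True (allow) > False.
            if best is None or candidate > best:
                best = candidate
    # No matching rule means the URL is crawlable.
    return True if best is None else best[1]


# Hypothetical list of URLs representing the blocks and allows you want.
for url in [
    "/product?utm_source=google_shopping",
    "/product?size=m&utm_source=google_shopping",
    "/product?size=m",
    "/product",
]:
    print(url, "allowed" if is_allowed(RULES, url) else "blocked")
```

Under these assumptions the first two URLs and the last come out as allowed and `/product?size=m` as blocked. If the local result and the Search Console tester disagree, trust the tester.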