Robots User-agent Query

Technical SEO

ThomasHarvey last edited by

Am I correct in saying that the allow/disallow is only applied to msnbot_mobile?

mobile robots file

User-agent: Googlebot-Mobile
User-agent: YahooSeeker/M1A1-R2D2
User-agent: MSNBOT_Mobile
Allow: /
Disallow: /1
Disallow: /2/
Disallow: /3
Disallow: /4/

donford last edited by

Hi Thomas

Unless I'm mistaken, if you list multiple user agents before a set of rules, all of those user agents are subject to the rules.

So what you have is a list of three user agents that are allowed everything except the four specifically disallowed paths.

In the end the rules apply to all.

Don
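
To make the grouping behaviour described above concrete, here is a minimal Python sketch. It is not from the thread and is not any search engine's actual parser; the helper names `parse_groups` and `is_allowed` are made up for this illustration. It assumes that consecutive User-agent lines share the rules that follow them, and that the longest matching path wins with Allow winning ties (the precedence described in RFC 9309). Agent names are compared as exact, case-insensitive strings and the `*` and `$` path wildcards are not handled, both of which are simplifications.

```python
ROBOTS_TXT = """\
User-agent: Googlebot-Mobile
User-agent: YahooSeeker/M1A1-R2D2
User-agent: MSNBOT_Mobile
Allow: /
Disallow: /1
Disallow: /2/
Disallow: /3
Disallow: /4/
"""


def parse_groups(text):
    """Return a list of (agents, rules) groups from robots.txt text."""
    groups, agents, rules = [], [], []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if rules:  # a User-agent line after rules starts a new group
                groups.append((agents, rules))
                agents, rules = [], []
            agents.append(value.lower())
        elif field in ("allow", "disallow") and agents:
            rules.append((field, value))
    if agents:
        groups.append((agents, rules))
    return groups


def is_allowed(groups, agent, path):
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    agent = agent.lower()
    matching = [rules for agents, rules in groups if agent in agents]
    if not matching:  # fall back to the wildcard group, if there is one
        matching = [rules for agents, rules in groups if "*" in agents]
    best_kind, best_len = "allow", 0
    for rules in matching:
        for kind, prefix in rules:
            if not prefix or not path.startswith(prefix):
                continue  # an empty rule value or a non-matching prefix
            longer = len(prefix) > best_len
            tie_for_allow = len(prefix) == best_len and kind == "allow"
            if longer or tie_for_allow:
                best_kind, best_len = kind, len(prefix)
    return best_kind == "allow"


groups = parse_groups(ROBOTS_TXT)
for agent in ("Googlebot-Mobile", "YahooSeeker/M1A1-R2D2", "MSNBOT_Mobile"):
    print(agent, is_allowed(groups, agent, "/1"), is_allowed(groups, agent, "/about"))
```

Under these assumptions the file parses into a single group holding all three agents, so each of them is blocked from `/1` and allowed on `/about`. Note that Python's built-in `urllib.robotparser` applies rules in file order rather than by longest match, so it would not reproduce this result for a file that starts with `Allow: /`.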
