Robots.txt in subfolders and hreflang issues

lauralou82

A client recently rolled out their UK business to the US. They decided to deploy with 2 WordPress installations:

UK site - https://www.clientname.com/uk/ - robots.txt location: https://www.clientname.com/uk/robots.txt
US site - https://www.clientname.com/us/ - robots.txt location: https://www.clientname.com/us/robots.txt

We've had various issues with /us/ pages being indexed in Google UK, and /uk/ pages being indexed in Google US.

They have the following hreflang tags across all pages:

We changed the x-default page to .com 2 weeks ago (we've tried both /uk/ and /us/ previously).

Search Console says there are no hreflang tags at all.

Additionally, we have a robots.txt file on each site which links to the corresponding sitemap files. However, in the robots.txt tester in Search Console, each property shows the robots.txt file for https://www.clientname.com only. When you actually navigate to that URL (https://www.clientname.com/robots.txt), you get redirected to either https://www.clientname.com/uk/robots.txt or https://www.clientname.com/us/robots.txt, depending on your location.

Any suggestions on how we can remove UK listings from Google US and vice versa?

Tom-Anthony

Hi there!

Ok, it is difficult to know all the ins and outs without looking at the site, but the immediate issue is that your robots.txt setup is incorrect. There should be one robots.txt file per subdomain, at the root of that subdomain; crawlers ignore robots.txt files placed inside sub-folders:

A **robots.txt** file is a file at the root of your site that indicates those parts of your site you don’t want accessed by search engine crawlers

From Google's page here: https://support.google.com/webmasters/answer/6062608?hl=en
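To make the "root only" rule concrete, here is a minimal Python sketch of how a crawler works out which robots.txt applies to a page. The function name `robots_url` and the example page path are my own, purely for illustration:

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    """Return the only robots.txt URL a crawler consults for page_url."""
    parts = urlsplit(page_url)
    # Scheme and host are kept; the path, query and fragment are discarded,
    # so any folder such as /uk/ or /us/ plays no part in the lookup.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

# Hypothetical page under the UK folder
print(robots_url("https://www.clientname.com/uk/some-page/"))
# Even the sub-folder robots.txt is governed by the root file
print(robots_url("https://www.clientname.com/us/robots.txt"))
```

Both calls print https://www.clientname.com/robots.txt, which is why the files at /uk/robots.txt and /us/robots.txt are treated as ordinary URLs rather than as crawl rules.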

You shouldn't be blocking Google from either site, and attempting to do so may be why your hreflang directives are not being detected. You should move to a single robots.txt file located at https://www.clientname.com/robots.txt, with a link to a single sitemap index file. That sitemap index file should then link to each of your UK and US sitemap files.
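As a rough sketch of what those two files could contain, the Python below builds a permissive robots.txt and a sitemap index. The sitemap file names (`/uk/sitemap.xml`, `/us/sitemap.xml`, `/sitemap_index.xml`) are assumptions on my part, so swap in whatever your WordPress installs actually generate:

```python
import xml.etree.ElementTree as ET

SITE = "https://www.clientname.com"
# Assumed sitemap locations; use the real ones from each install.
SITEMAPS = [f"{SITE}/uk/sitemap.xml", f"{SITE}/us/sitemap.xml"]
INDEX_URL = f"{SITE}/sitemap_index.xml"  # assumed file name

def build_robots_txt(index_url):
    # An empty Disallow blocks nothing, so /uk/ and /us/ both stay crawlable.
    return f"User-agent: *\nDisallow:\n\nSitemap: {index_url}\n"

def build_sitemap_index(sitemap_urls):
    # Namespace defined by the sitemaps.org protocol
    ns = "http://www.sitemaps.org/schemas/sitemap/0.9"
    root = ET.Element("sitemapindex", xmlns=ns)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")

print(build_robots_txt(INDEX_URL))
print(build_sitemap_index(SITEMAPS))
```

The robots.txt output goes at the root of the domain and the index at the assumed `INDEX_URL`; the individual UK and US sitemaps can stay where they are.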

You should ensure you have hreflang directives for every page. Hopefully after these changes you will see things start to get better. Good luck!
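I can't see your actual tags, so the following is only a sketch of the usual pattern: every version of a page carries the same set of tags, listing itself, its counterpart and an x-default. The language codes (`en-gb`, `en-us`), the example path and the assumption that a page lives at the same path under /uk/ and /us/ are mine, not taken from your site:

```python
from html import escape

SITE = "https://www.clientname.com"
# Assumed mapping of hreflang code to site folder
LOCALES = {"en-gb": "/uk", "en-us": "/us"}

def hreflang_tags(path, x_default=f"{SITE}/"):
    """Return the <link> tags that each locale version of `path` should carry."""
    tags = [
        f'<link rel="alternate" hreflang="{code}" href="{escape(SITE + folder + path)}" />'
        for code, folder in LOCALES.items()
    ]
    # x-default defaults to the bare .com mentioned in the question
    tags.append(f'<link rel="alternate" hreflang="x-default" href="{escape(x_default)}" />')
    return tags

# Hypothetical page path, present under both /uk/ and /us/
for tag in hreflang_tags("/pricing/"):
    print(tag)
```

The same three tags would go in the head of both the /uk/ and /us/ versions of the page, since the annotations need to point at each other in both directions to be honoured.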
