Standard Syntax in robots.txt doesn't prevent Moz bot from crawling

btreloar

A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/

To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/

For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
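One possible explanation, offered as a sketch and not a diagnosis of rogerbot itself: robots.txt rules are matched as prefixes from the start of the URL path, so Disallow: /page/ only covers paths that begin with /page/. The check below uses Python's standard urllib.robotparser as a stand-in for a crawler. The User-agent: * line is an assumption, since the client's full robots.txt is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser

# Assumed minimal robots.txt; the real file is not reproduced in this thread.
rules = """\
User-agent: *
Disallow: /page/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

# Path starts with /place_tag/, not /page/, so the rule does not apply.
nested_allowed = parser.can_fetch(
    "rogerbot",
    "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/",
)

# Path starts with /page/, so the rule applies.
root_allowed = parser.can_fetch("rogerbot", "https://needquest.com/page/4/")

print(nested_allowed, root_allowed)
```

Under this parser the nested pagination URL stays crawlable while a root-level /page/4/ is blocked, which would be consistent with the errors continuing to appear.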

btreloar

Thanks, Tawny,

If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.

tawnycase

I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.

That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
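To illustrate why rule order can matter, here is a small sketch using Python's standard urllib.robotparser, which applies the first rule that matches. The Allow: / line is hypothetical (the actual Allow statement in the client's file is not quoted here), and whether rogerbot resolves conflicts by order or by longest match is an assumption to verify against Moz's documentation.

```python
from urllib.robotparser import RobotFileParser

def allowed(robots_txt: str, url: str) -> bool:
    """Return True if a first-match parser would let 'rogerbot' fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch("rogerbot", url)

# Hypothetical files: same rules, different order.
allow_first = "User-agent: *\nAllow: /\nDisallow: /page/\n"
allow_last = "User-agent: *\nDisallow: /page/\nAllow: /\n"

url = "https://needquest.com/page/4/"
allow_first_result = allowed(allow_first, url)  # Allow: / matches first
allow_last_result = allowed(allow_last, url)    # Disallow: /page/ matches first

print(allow_first_result, allow_last_result)
```

With a first-match parser, the early Allow masks the later Disallow. Parsers that follow the longest-match rule of RFC 9309 would treat both files the same way, so reordering may or may not change the outcome.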

If you're still running into trouble, I would suggest writing in to us at [email protected] so we can take a closer look at the Campaign and what could be going on there.

btreloar

Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.

btreloar

Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ segment is found in many pages we DO want crawled and indexed ... here we only want to disallow the paginated pages: page 2, page 3, page 4, etc.

(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
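A rule that targets only the paginated pages would need to match /page/ in the middle of the path. A minimal sketch of wildcard matching as described in RFC 9309 follows, where * matches any run of characters and a trailing $ anchors the end. The helper names are made up for illustration, the pattern /*/page/ is a suggestion and not something taken from the client's file, and rogerbot's wildcard support is an assumption worth confirming with Moz.

```python
import re

def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a robots.txt path pattern into a regex anchored at the path start."""
    anchored_end = pattern.endswith("$")
    if anchored_end:
        pattern = pattern[:-1]
    # Escape literal pieces and let each '*' match any run of characters.
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile("^" + body + ("$" if anchored_end else ""))

def is_blocked(path: str, disallow_patterns: list[str]) -> bool:
    """Return True if any Disallow pattern matches the path."""
    return any(pattern_to_regex(p).match(path) for p in disallow_patterns)

paged = "/place_tag/autism-spectrum-disorder/page/4/"
first_page = "/place_tag/autism-spectrum-disorder/"

print(is_blocked(paged, ["/page/"]))         # plain prefix rule
print(is_blocked(paged, ["/*/page/"]))       # wildcard rule
print(is_blocked(first_page, ["/*/page/"]))  # first page of the tag
```

In this sketch the wildcard rule catches the page 2, page 3, page 4 URLs while leaving the first /place_tag/ page crawlable, which is the behavior described above.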

tawnycase

Hey there!

Tawny from Moz's Help Team here.

Adding a disallow directive for /tag/ won't help with the example URL you've provided, because that URL doesn't have /tag/ in its path. To block us from seeing content like the URL you listed, you'd need a disallow directive for /place_tag/.

If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.

Hope that helps! If you've still got questions, feel free to shoot us a note over at [email protected] and we'll do our best to sort things out with you.
