Standard Syntax in robots.txt doesn't prevent Moz bot from crawling
-
A client is getting many false positive site crawl errors for things like duplicate titles and duplicate content on pages that include /tag/ in the URL. An example is https://needquest.com/place_tag/autism-spectrum-disorder/page/4/
To resolve this we have set up a disallow statement in the robots.txt file that says
Disallow: /page/
For some reason this appears not to work, as the site crawl errors continue to list pages like this. Does anyone understand why that would be and what we need to do to properly disallow crawling these pages?
-
Thanks, Tawny,
If you look at Duplicate titles, check the first one (https://needquest.com/place_tag/autism-spectrum-disorder/). All the URLs with a duplicate title have /page/ in them. I will suggest they move the Allow statement and see if that helps.
-
I'm not seeing that URL coming up with Duplicate Title or Duplicate Content issues — when I search by that URL I see no Content issues at that URL. I do see that URL in the All Crawled Pages section, but I can't find it bringing up Content issues in the app.
That said, I took a look at your robots.txt file, and I think this could be a result of having an Allow command before the rest of the Disallow commands. I think possibly if you put that Allow command at the end of the block of Disallow commands, rogerbot would see the disallow for /page/ and stop crawling those URLs.
If you're still running into trouble, I would suggest writing in to us at [email protected] so we can take a closer look at the Campaign and what could be going on there.
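To illustrate why the position of an Allow line can matter, here is a minimal sketch comparing two common ways a crawler can resolve overlapping rules: first match in file order, and longest matching prefix. The rule lists and the `/page/2/` path are hypothetical, since the client's actual robots.txt is not shown in this thread, and which strategy rogerbot uses is not stated here.

```python
# Each rule is (allow, prefix). Both lists are hypothetical examples,
# not the client's real robots.txt.
ALLOW_FIRST = [(True, "/"), (False, "/page/")]   # Allow: / placed before Disallow: /page/
ALLOW_LAST = [(False, "/page/"), (True, "/")]    # Allow: / placed after Disallow: /page/


def first_match(rules, path):
    """Apply the first rule whose prefix matches; allow if none match."""
    for allow, prefix in rules:
        if path.startswith(prefix):
            return allow
    return True


def longest_match(rules, path):
    """Apply the matching rule with the longest prefix; Allow wins ties."""
    matches = [(len(prefix), allow) for allow, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True
    return max(matches)[1]


path = "/page/2/"
print(first_match(ALLOW_FIRST, path))    # True: the early Allow: / shadows the Disallow
print(first_match(ALLOW_LAST, path))     # False: the Disallow is reached first
print(longest_match(ALLOW_FIRST, path))  # False: /page/ is the more specific rule
print(longest_match(ALLOW_LAST, path))   # False: order does not matter here
```

Under a first-match crawler, moving the Allow line to the end of the block changes the outcome; under a longest-match crawler it makes no difference.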
-
Any reason the Disallow: /page/ isn't preventing URLs like
https://needquest.com/place_tag/autism-spectrum-disorder**/page/**4/
from generating duplicate descriptions and title errors in our site crawl? It was my hope that those pages wouldn't be crawled at all.
-
Sorry, Tawny ... I did go back and correct my question. We did apply Disallow: /page/ to address this issue. The /place_tag/ is found in many pages we DO want to crawl and index ... and we only want here to disallow those page 2, page 3, page 4, etc. pages.
(We also disallowed /tag/, /category/, and a few other common issues that generate false positives in the site crawl.)
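One possible way to target only the paginated URLs is a wildcard pattern such as `/*/page/`. The sketch below is a hand-written matcher for the `*` and `$` special characters described in RFC 9309; it is not rogerbot's code, and whether rogerbot honors wildcards is an assumption to confirm with Moz before relying on it. The helper names `pattern_to_regex` and `rule_matches` are made up for this illustration.

```python
import re


def pattern_to_regex(pattern):
    """Translate a robots.txt path pattern into a compiled regex.

    '*' matches any run of characters; a trailing '$' anchors the end of the path.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    body = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.compile(body + ("$" if anchored else ""))


def rule_matches(pattern, path):
    """Return True if the pattern matches the path, starting from its first character."""
    return pattern_to_regex(pattern).match(path) is not None


paginated = "/place_tag/autism-spectrum-disorder/page/4/"
first_page = "/place_tag/autism-spectrum-disorder/"

print(rule_matches("/page/", paginated))     # False: the path does not start with /page/
print(rule_matches("/*/page/", paginated))   # True: the wildcard covers /place_tag/...
print(rule_matches("/*/page/", first_page))  # False: the first page stays crawlable
```

With a crawler that supports this syntax, `Disallow: /*/page/` would cover page 2, page 3, and so on, while leaving the first /place_tag/ page open.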
-
Hey there!
Tawny from Moz's Help Team here.
Adding a disallow directive for /tag/ won't help with the example URL you've provided — that URL doesn't have /tag/ in the URL pathway. To block us from seeing content like that URL you listed, you'd need a disallow directive for /place_tag/.
If you include that disallow directive, that should stop us from seeing duplicate content on pages with /place_tag/ in the URL.
Hope that helps! If you've still got questions, feel free to shoot us a note over at [email protected] and we'll do our best to sort things out with you.
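As a quick check of how plain robots.txt rules line up with the example URL, here is a minimal sketch using Python's standard `urllib.robotparser`, which compares each rule against the start of the URL path. The two robots.txt bodies are hypothetical (the client's real file is not shown in this thread), and rogerbot's own matching may differ from Python's parser.

```python
from urllib.robotparser import RobotFileParser

EXAMPLE_URL = "https://needquest.com/place_tag/autism-spectrum-disorder/page/4/"

# Hypothetical robots.txt bodies, not the client's real file.
BLOCKS_PAGE = "User-agent: *\nDisallow: /page/"
BLOCKS_PLACE_TAG = "User-agent: *\nDisallow: /place_tag/"


def can_crawl(robots_txt, url, agent="rogerbot"):
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


print(can_crawl(BLOCKS_PAGE, EXAMPLE_URL))                      # True: path does not start with /page/
print(can_crawl(BLOCKS_PAGE, "https://needquest.com/page/4/"))  # False: path starts with /page/
print(can_crawl(BLOCKS_PLACE_TAG, EXAMPLE_URL))                 # False: path starts with /place_tag/
```

A rule only applies when the URL path begins with it, so `/page/` in the middle of a path is not caught by `Disallow: /page/`, while `Disallow: /place_tag/` blocks every URL under that folder, including the ones meant to stay crawlable.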