What to do about "blocked by meta-robots"?
-
The crawl report tells me "Notices are interesting facts about your pages we found while crawling". One of these interesting facts is that my blog archives are "blocked by meta robots".
Articles are not blocked, just the archives.
What is a "meta" robot?
I think its just normal (since the article need only be crawled once) but want a second opinion. Should I care about this?
-
Meta robots refers to the < meta name="robots" > tag at the page header level. This is usually the case when a blog is set up with an SEO program like All In One SEO for example, where you can manually set which content is blocked. It's common to block archives, tags, and other sections, in the theory that allowing these to be crawled could either cause duplicate content issues, or drain link value from the primary category navigation.
-
In general, there are two ways you can block crawlers from indexing your content.
-
You can add a Disallow entry to your robots.txt file
-
You can add a meta tag to your pages:
What you are saying in either case is "please do not list this content in your search engine".
In general, you would not want to block your archives. There certainly can be specific cases where you only want the public to see your most current content, in which case you can block it.
-
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Robots.txt - "File does not appear to be valid"
Good afternoon Mozzers! I've got a weird problem with one of the sites I'm dealing with. For some reason, one of the developers changed the robots.txt file to disavow every site on the page - not a wise move! To rectify this, we uploaded the new robots.txt file to the domain's root as per Webmaster Tool's instructions. The live file is: User-agent: * (http://www.savistobathrooms.co.uk/robots.txt) I've submitted the new file in Webmaster Tools and it's pulling it through correctly in the editor. However, Webmaster Tools is not happy with it, for some reason. I've attached an image of the error. Does anyone have any ideas? I'm managing another site with the exact same robots.txt file and there are no issues. Cheers, Lewis FNcK2YQ
Technical SEO | | PeaSoupDigital0 -
"non-WWW" vs "WWW" in Google SERPS and Lost Back Link Connection
A Screaming Frog report indicates that Google is indexing a client's site for both: www and non-www URLs. To me this means that Google is seeing both URLs as different even though the page content is identical. The client has not set up a preferred URL in GWMTs. Google says to do a 301 redirect from the non-preferred domain to the preferred version but I believe there is a way to do this in HTTP Access and an easier solution than canonical.
Technical SEO | | RosemaryB
https://support.google.com/webmasters/answer/44231?hl=en GWMTs also shows that over the past few months this client has lost more than half of their backlinks. (But there are no penalties and the client swears they haven't done anything to be blacklisted in this regard. I'm curious as to whether Google figured out that the entire site was in their index under both "www" and "non-www" and therefore discounted half of the links. Has anyone seen evidence of Google discounting links (both external and internal) due to duplicate content? Thanks for your feedback. Rosemary0 -
Blocking Affiliate Links via robots.txt
Hi, I work with a client who has a large affiliate network pointing to their domain which is a large part of their inbound marketing strategy. All of these links point to a subdomain of affiliates.example.com, which then redirects the links through a 301 redirect to the relevant target page for the link. These links have been showing up in Webmaster Tools as top linking domains and also in the latest downloaded links reports. To follow guidelines and ensure that these links aren't counted by Google for either positive or negative impact on the site, we have added a block on the robots.txt of the affiliates.example.com subdomain, blocking search engines from crawling the full subddomain. The robots.txt file is the following code: User-agent: * Disallow: / We have authenticated the subdomain with Google Webmaster Tools and made certain that Google can reach and read the robots.txt file. We know they are being blocked from reading the affiliates subdomain. However, we added this affiliates subdomain block a few weeks ago to the robots.txt, but links are still showing up in the latest downloads report as first being discovered after we added the block. It's been a few weeks already, and we want to make sure that the block was implemented properly and that these links aren't being used to negatively impact the site. Any suggestions or clarification would be helpful - if the subdomain is being blocked for the search engines, why are the search engines following the links and reporting them in the www.example.com subdomain GWMT account as latest links. And if the block is implemented properly, will the total number of links pointing to our site as reported in the links to your site section be reduced, or does this not have an impact on that figure?From a development standpoint, it's a much easier fix for us to adjust the robots.txt file than to change the affiliate linking connection from a 301 to a 302, which is why we decided to go with this option.Any help you can offer will be greatly appreciated.Thanks,Mark
Technical SEO | | Mark_Ginsberg0 -
Google is indexing blocked content in robots.txt
Hi,Google is indexing some URLs that i don't want to be indexed and also is indexing the same URLs with https. This URLs are blocked in the file robots.txt.I've tried to block this URLs through Google WebmasterTools but Google doesn't let me do it because this URL are httpsThe file robots.txt is correct so, what can i do to avoid this content to be indexed?
Technical SEO | | elisainteractive0 -
Rel="next"
Hi I was just wondering if there is any difference in using rel='next' rather than rel="next". Would it still work the same way? I mean using the apostrophes differently, would it matter? Thanks!
Technical SEO | | pikka0 -
Video thumbnail pages with "sort" feature -- tons of duplicate content?
A client has 2 separate pages for video thumbnails. One page is "popular videos" with a sort function for over 700 pages of video thumbnails with 10 thumbnails and short desriptions per page. (/videos?sort_by=popularity). The second page is "latest videos" (/videos?sort_by=latest) with over 7,000 pages. Both pages have a sort function -- including latest, relevance, popularity, time uploaded, etc. Many of the same video thumbnails appear on both pages. Also, when you click a thumbnail you get a full video page and these pages appear to get indexed well. There seem to be duplicate content issues between the "popular" and "latest" pages, as well as within the sort results on each of those pages. (A unique URL is generated everytime you use the sort function i.e. /videos?sort_by=latest&uploaded=this_week). Before my head explodes, what is the best way to treat this? I was thinking a noindex,follow meta robot on every page of thumbnails since the individual video pages are well indexed, but that seems extreme. Thoughts?
Technical SEO | | 540SEO0 -
Why crawl error "title missing or empty" when there is already "title and meta desciption" in place?
I've been getting 73 "title missing or empty" warnings from SEOMOZ crawl diagnostic. This is weird as I've installed yoast wordpress seo plugin and all posts do have title and meta description. But why the results here.. can anyone explain what's happening? Thanks!! Here are some of the links that are listed with "title missing, empty". Almost all our blog posts were listed there. http://www.gan4hire.com/blog/2011/are-you-here-for-good/ http://www.gan4hire.com/blog/2011/are-you-socially-awkward/ MaeM3.png TLcD8.png
Technical SEO | | JasonDGreat0 -
Is SEOMoz only good for "ideas"?
Perhaps I've learned too much about the technical aspects of SEO, but nowhere have I found scientific studies backing up any claims made here, or a useful answer to a discussion I recently started. Maybe it doesn't exist. I do enjoy Whiteboard Friday's. They're fantastic for new ideas. This site is great. But I take it there are no proper studies conducted that examine SEO, rather just the usual spin of "belief from authority". No?
Technical SEO | | stevenheron0