Robots.txt file - How to block thosands of pages when you don't have a folder path
-
Hello.
Just wondering if anyone has come across this and can tell me if it worked or not.Goal:
To block review pagesChallenge:
The URLs aren't constructed using folders, they look like this:
www.website.com/default.aspx?z=review&PG1234
www.website.com/default.aspx?z=review&PG1235
www.website.com/default.aspx?z=review&PG1236So the first part of the URL is the same (i.e. /default.aspx?z=review) and the unique part comes immediately after - so not as a folder. Looking at Google recommendations they show examples for ways to block 'folder directories' and 'individual pages' only.
Question:
If I add the following to the Robots.txt file will it block all review pages?User-agent: *
Disallow: /default.aspx?z=reviewMuch thanks,
Davinia -
Also remember that blocking in robots.txt doesn't prevent Google from indexing those URLs. If the URLs are already indexed or if they are linked to, either internally or externally they may still in appear in the index with limited snippet information. If so, you'll need to add a noindex meta tag to those pages.
-
An * added to the end! Great thank you!
-
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=156449
Head down to the pattern matching section.
I think
User-agent: *
Disallow: /default.aspx?z=review*should do the trick though.
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Set Robots.txt file to crawl my website at specific times
Our website provider has stated that they can only 'lift' their block on our website in order for it to be crawled as specific times. Is there any way to amend a robots.txt to ensure that it crawls our website at a specific time of day/night in order to coincide with the block being lifted? Many Thanks, Charlene
Intermediate & Advanced SEO | | CharleneKennedy120 -
What are best page titles for sub-folders or sub-directories? Same as website?
Hi all, We always mention "brand & keyword" in every page title along with topic in the website, like "Topic | vertigo tiles". Let's say there is a sub-directory with hundreds of pages...what will be the best page title practice in mentioning "brand & keyword" across all pages of sub-directory to benefit in-terms if SEO? Can we add "vertigo tiles" to all pages of sub-directory? Or we must not give same phrase? Thanks,
Intermediate & Advanced SEO | | vtmoz0 -
Question about Syntax in Robots.txt
So if I want to block any URL from being indexed that contains a particular parameter what is the best way to put this in the robots.txt file? Currently I have-
Intermediate & Advanced SEO | | DRSearchEngOpt
Disallow: /attachment_id Where "attachment_id" is the parameter. Problem is I still see these URL's indexed and this has been in the robots now for over a month. I am wondering if I should just do Disallow: attachment_id or Disallow: attachment_id= but figured I would ask you guys first. Thanks!0 -
Block in robots.txt instead of using canonical?
When I use a canonical tag for pages that are variations of the same page, it basically means that I don't want Google to index this page. But at the same time, spiders will go ahead and crawl the page. Isn't this a waste of my crawl budget? Wouldn't it be better to just disallow the page in robots.txt and let Google focus on crawling the pages that I do want indexed? In other words, why should I ever use rel=canonical as opposed to simply disallowing in robots.txt?
Intermediate & Advanced SEO | | YairSpolter0 -
Incoming links which don't exists...
I believe our site is being penalized/held back in rankings, and I think this is why... We placed an advert on a website which they didn't make "no follow" so we had hundreds of site-wide links coming into our site. We asked them to remove the advert which they did. This was 4 months ago, and the links are still showing in GWMT. We have look into their pages which GWMT is saying still link to us, but these a number pages aren't being indexed by Google, and others aren't being cached. Is it possible that because Google cant find these pages, it can tell our link has been removed? And/or are we being penalized for this? Many thanks
Intermediate & Advanced SEO | | jj34341 -
What NAP format do I use if the USPS can't even find my client's address?
My client has a site already listed on Google+Local under "5208 N 1st St". He has some other NAPs, e.g., YellowPages, under "5208 N First Street". The USPS finds neither of these, nor any variation that I can possibly think of! Which is better? Do I just take the one that Google has accepted and make all the others like it as best I can? And doesn't it matter that the USPS doesn't even recognize the thing? Or no? Local SEO wizards, thanks in advance for your guidance!
Intermediate & Advanced SEO | | rayvensoft0 -
Reciprocal Links and nofollow/noindex/robots.txt
Hypothetical Situations: You get a guest post on another blog and it offers a great link back to your website. You want to tell your readers about it, but linking the post will turn that link into a reciprocal link instead of a one way link, which presumably has more value. Should you nofollow your link to the guest post? My intuition here, and the answer that I expect, is that if it's good for users, the link belongs there, and as such there is no trouble with linking to the post. Is this the right way to think about it? Would grey hats agree? You're working for a small local business and you want to explore some reciprocal link opportunities with other companies in your niche using a "links" page you created on your domain. You decide to get sneaky and either noindex your links page, block the links page with robots.txt, or nofollow the links on the page. What is the best practice? My intuition here, and the answer that I expect, is that this would be a sneaky practice, and could lead to bad blood with the people you're exchanging links with. Would these tactics even be effective in turning a reciprocal link into a one-way link if you could overlook the potential immorality of the practice? Would grey hats agree?
Intermediate & Advanced SEO | | AnthonyMangia0 -
Should I robots block this directory?
There's about 43k pages indexed in this directory, and while helpful to end users, I don't see it being a great source of unique content for search engines. Would you robots block or meta noindex nofollow these pages in the /blissindex/ directory? ie. http://www.careerbliss.com/blissindex/petsmart-index-980481/ http://www.careerbliss.com/blissindex/att-index-1043730/ http://www.careerbliss.com/blissindex/facebook-index-996632/
Intermediate & Advanced SEO | | CareerBliss0