How to get a large number of urls out of Google's Index when there are no pages to noindex tag?
-
Hi,
I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.
The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. GoogleBot would crawl a link that looks like it's to the client's own domain and wind up on Amazon or somewhere else with some affiliate code. GoogleBot would then grab the original link on the client's domain and index it... even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.
I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.
Any ideas? Thanks! Best... Michael
P.S.,
All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.
-
If the URLs you want removed from Google's index don't correspond to actual HTML pages, I'd probably use the HTTP header instead of a meta tag. Take a look at the X-Robots-Tag header:
- https://yoast.com/x-robots-tag-play/
- https://www.searchenginejournal.com/x-robots-tag-simple-alternate-robots-txt-meta-tag/67138/
Even if there's no page with HTML behind a URL, the response always has HTTP headers. HTTP headers let a client and server pass additional information along with each request and response (GET, POST, etc.), so a noindex directive can travel there instead of in the page markup.
This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt controls crawling, not indexation, and blocking the crawl would also stop GoogleBot from ever seeing the noindex header (so those answers wouldn't really be worth anything to you).
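As a rough illustration of the header approach, here is a minimal Python sketch that wraps a WSGI app so every response under /item/ carries `X-Robots-Tag: noindex`. The redirect app, the affiliate URL, and the helper names are hypothetical stand-ins, not the client's actual setup, and it's worth verifying in Search Console that Google acts on the header when the response is a redirect.

```python
from wsgiref.util import setup_testing_defaults

NOINDEX_PREFIX = "/item/"  # URL pattern from the question; adjust as needed


def add_noindex_header(app, prefix=NOINDEX_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            # Only touch responses for the affiliate-style URLs.
            if path.startswith(prefix):
                headers = list(headers) + [("X-Robots-Tag", "noindex")]
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)
    return wrapped


def affiliate_redirect_app(environ, start_response):
    """Hypothetical stand-in for the site's existing affiliate redirect."""
    start_response("302 Found", [("Location", "https://affiliate.example/product")])
    return [b""]


def call(app, path):
    """Run one fake request through the app; return (status, headers dict)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    list(app(environ, start_response))
    return captured["status"], captured["headers"]


app = add_noindex_header(affiliate_redirect_app)
status, headers = call(app, "/item/12345")
print(status, headers.get("X-Robots-Tag"))
```

The same idea can usually be done in the web server config rather than application code; the point is only that the directive rides on the response headers, so no HTML page is needed.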
The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (Gone, meaning permanently removed) instead of 404 (Not Found, which says nothing about whether the page is coming back).
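To make the 410 vs. 404 distinction concrete, here is a small Python sketch of the status decision. The function name and the `known_paths` set are made up for illustration. Keep in mind that a URL answering 410 no longer redirects to the affiliate, so this only suits URLs you are happy to retire.

```python
from http import HTTPStatus

RETIRED_PREFIX = "/item/"  # URL pattern from the question


def status_line_for(path, known_paths=frozenset({"/", "/about"})):
    """Pick a status line: 410 for retired /item/ URLs, 404 for other unknown paths."""
    if path.startswith(RETIRED_PREFIX):
        status = HTTPStatus.GONE        # 410: permanently removed
    elif path in known_paths:
        status = HTTPStatus.OK          # 200: a real page
    else:
        status = HTTPStatus.NOT_FOUND   # 404: not found, permanence unspecified
    return f"{status.value} {status.phrase}"


for p in ("/item/12345", "/about", "/missing"):
    print(p, "->", status_line_for(p))
```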