How to get a large number of urls out of Google's Index when there are no pages to noindex tag?

94501

Hi,

I'm working with a site that has created a large group of urls (150,000) that have crept into Google's index. If these urls actually existed as pages, which they don't, I'd just noindex tag them and over time the number would drift down.

The thing is, they created them through a complicated internal linking arrangement that adds affiliate code to the links and forwards them to the affiliate. Googlebot would crawl a link that looks like it points to the client's own domain and wind up on Amazon or somewhere else with some affiliate code. Googlebot would then grab the original link on the client's domain and index it, even though the page served is on Amazon or somewhere else. Ergo, I don't have a page to noindex tag.

I have to get this 150K block of cruft out of Google's index, but without actual pages to noindex tag, it's a bit of a puzzler.

Any ideas? Thanks! Best... Michael

P.S.,

All 150K urls seem to share the same url pattern... exmpledomain.com/item/... so /item/ is common to all of them, if that helps.

effectdigital

If no pages with actual HTML exist for the URLs you want to remove from Google's index, I'd probably use the HTTP header instead, specifically the X-Robots-Tag header with a noindex directive.

Even if you have no page with HTML in it, you always have an HTTP response, and every response can carry headers. An HTTP header simply lets a client or server pass additional information along with a request or response (GET, POST etc.), so the server can send "X-Robots-Tag: noindex" for those URLs without there being any page to put a meta tag in.
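As a rough illustration of the header approach, here is a minimal sketch in Python. It assumes the site runs behind a WSGI app (the original post doesn't say what the stack is) and that /item/ really is the shared prefix mentioned in the question. The names add_noindex_header, demo_app and fetch are made up for this example.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: every URL to be de-indexed starts with /item/, as in the question.
NOINDEX_PREFIX = "/item/"


def add_noindex_header(app, prefix=NOINDEX_PREFIX):
    """Wrap a WSGI app so responses under `prefix` carry X-Robots-Tag: noindex."""
    def wrapped(environ, start_response):
        path = environ.get("PATH_INFO", "")

        def patched_start_response(status, headers, exc_info=None):
            if path.startswith(prefix):
                # Drop any existing X-Robots-Tag so only one value is sent.
                headers = [(k, v) for k, v in headers
                           if k.lower() != "x-robots-tag"]
                headers.append(("X-Robots-Tag", "noindex"))
            return start_response(status, headers, exc_info)

        return app(environ, patched_start_response)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def fetch(app, path):
    """Call a WSGI app in-process and return (status, headers_dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    site = add_noindex_header(demo_app)
    print(fetch(site, "/item/123")[1])  # headers for an /item/ URL
    print(fetch(site, "/about")[1])     # headers for any other URL
```

The same idea can usually be done in the web server config instead of application code. Either way, the header only helps if Googlebot is allowed to request the URL and receives a response that carries it.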

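This is the only thing I can think of which would really help. Some people might suggest robots.txt wildcards, but robots.txt handles crawling and not indexation (so those answers wouldn't really be worth anything to you). Worse, if you block /item/ in robots.txt, Googlebot can't re-crawl those URLs, so it would never see the noindex header.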

The other thing you could do (maybe combine this with the X-Robots stuff) is to get all of those URLs to serve status code 410 (Gone, i.e. deliberately and permanently removed) instead of 404 (Not Found, which says nothing about whether the URL might come back).
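A sketch of the 410 idea, again in Python and again assuming a WSGI stack and the /item/ prefix from the question (gone_for_prefix, demo_app and fetch are hypothetical names). Note that answering 410 means those URLs would no longer forward to the affiliate, so this only fits if the affiliate links can be moved somewhere else.

```python
from wsgiref.util import setup_testing_defaults

# Assumption: every URL to be de-indexed starts with /item/, as in the question.
GONE_PREFIX = "/item/"


def gone_for_prefix(app, prefix=GONE_PREFIX):
    """Wrap a WSGI app so any path under `prefix` answers 410 Gone."""
    def wrapped(environ, start_response):
        if environ.get("PATH_INFO", "").startswith(prefix):
            body = b"Gone"
            start_response("410 Gone", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
                # Combined with the header approach, as suggested above.
                ("X-Robots-Tag", "noindex"),
            ])
            return [body]
        # Everything else is passed through to the real site untouched.
        return app(environ, start_response)
    return wrapped


def demo_app(environ, start_response):
    """Hypothetical stand-in for the real site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def fetch(app, path):
    """Call a WSGI app in-process and return (status, headers_dict, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = dict(headers)

    body = b"".join(app(environ, start_response))
    return captured["status"], captured["headers"], body


if __name__ == "__main__":
    site = gone_for_prefix(demo_app)
    print(fetch(site, "/item/123")[0])  # status for an /item/ URL
    print(fetch(site, "/about")[0])     # status for any other URL
```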
