Stuck trying to deindex pages from Google

Ruchy

Hi There,

We had developers put a lot of spammy markup on one of our websites. We tried many ways to deindex the affected pages by fixing the markup and requesting recrawls. However, some of the URLs that carried the spammy markup were incorrect URLs that redirect to the right version (e.g. the same URL with or without a / at the end).

So now all the regular URLs are updated and clean. However, the redirected URLs can't be found in crawls, so they weren't updated and we couldn't get the spam removed. They still show up in the SERP.

I tried deindexing those spammed pages by making them no-index in the robots.txt file. This seemed to work for about a week, and now they are showing up in the SERP again.

Can you help us get rid of these spammy URLs?

Gaston Riera

Ruchy,

Yep, it might have helped for a few weeks. But internal links from your site are not the only way crawlers reach your pages. Remember that other sites may be linking to those pages.

B- Absolutely, adding noindex will help. There is no way to know for sure how long it will take; give it a few weeks. It could also help to manually remove all those pages with Google Search Console, as Logan said.

Hope it helps!
GR

Ruchy

Hi Gaston,

Thanks so much for taking the time to answer my question.

Here are two points: A - My mistake: in the robots.txt we disallowed the pages, and it was done right. Our devs did it for us and I double-checked it in the Search Console tester. Also, this approach did work for us for the first few weeks.

B - There is no place the crawlers can find these pages to recrawl, as they are no longer linked from anywhere on my site. Will adding the noindex help? If yes, how long can it take?

LoganRay

I second what Gaston said. This usage of robots.txt is one of the most common misconceptions in SEO, so don't feel bad. Google actually explicitly says to not use robots.txt for index-prevention in their webmaster guide.

To add to Gaston's point, make sure you remove the robots.txt disallow when you add the meta noindex tag he provided. If you don't let them crawl the page, they won't see the tag.
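If it helps to sanity-check that combination before waiting on a recrawl, here is a rough Python sketch using only the standard library. It assumes you already have the robots.txt text and the page HTML as strings; the function name, the example.com URL, and the sample rules are all made up for illustration, and Python's parser is only an approximation of how Googlebot reads robots.txt.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class RobotsMetaFinder(HTMLParser):
    """Records whether the page carries a robots meta tag containing noindex."""

    def __init__(self):
        super().__init__()
        self.noindex = False

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() in ("robots", "googlebot"):
            directives = {d.strip().lower() for d in attr.get("content", "").split(",")}
            if "noindex" in directives:
                self.noindex = True


def deindex_status(robots_txt, url, html, user_agent="Googlebot"):
    """Hypothetical helper: reports whether a crawler could actually see the noindex."""
    rules = RobotFileParser()
    rules.parse(robots_txt.splitlines())

    finder = RobotsMetaFinder()
    finder.feed(html)

    if not finder.noindex:
        return "no noindex tag on the page"
    if not rules.can_fetch(user_agent, url):
        return "noindex present but hidden by the robots.txt disallow"
    return "noindex present and crawlable"


# Made-up sample data for illustration only.
PAGE = '<html><head><meta name="robots" content="noindex"></head><body></body></html>'
BLOCKING_RULES = "User-agent: *\nDisallow: /old-page/"
OPEN_RULES = "User-agent: *\nAllow: /"

print(deindex_status(BLOCKING_RULES, "https://example.com/old-page/", PAGE))
print(deindex_status(OPEN_RULES, "https://example.com/old-page/", PAGE))
```

The first call is the situation to avoid: the tag is on the page, but the disallow keeps the crawler from ever reading it.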

You can also remove these URLs temporarily in Search Console by going to the Google Index menu and selecting "Remove URLs". It'll remove them from search results; then, when Google comes back to crawl those pages again (as long as you're letting them), they'll see your noindex tag and keep the pages out.

Gaston Riera

Hello Ruchy,

If by "making no-index" in the robots.txt you mean disallowing them, you are doing it wrong.
Robots.txt rules are just signs for the robots and only tell them NOT TO CRAWL those pages; they don't prevent the pages from being indexed. (It can happen that there is a link pointing to a page and the crawler picks the URL up from that link.)
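To make the "crawl, not index" point concrete, here is a small Python sketch with the standard library's robots.txt parser. The rule and the example.com URLs are invented for illustration, and this parser is only a stand-in for a real crawler. All it answers is whether a URL may be fetched; it says nothing about whether the URL stays in the index. It also shows that, in this parser, a rule written with a trailing slash does not match the same path without it, which matters when both versions of a URL exist.

```python
from urllib.robotparser import RobotFileParser


def crawl_allowed(robots_txt, url, user_agent="Googlebot"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rule: only the trailing-slash version of the path is listed.
ROBOTS_TXT = """User-agent: *
Disallow: /spammy-page/
"""

for url in (
    "https://example.com/spammy-page/",
    "https://example.com/spammy-page",
    "https://example.com/clean-page/",
):
    # The answer is about crawling only, not about index status.
    print(url, "crawl allowed:", crawl_allowed(ROBOTS_TXT, url))
```

Rules are matched as path prefixes here, so the slash-less variant is still fetchable under this rule.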

The most common way to remove certain indexed pages is by adding the robots noindex meta tag to the head of each page. It should look like this:

<meta name="robots" content="noindex">

Also, some useful links:
Robots meta directives - Moz
Robots meta tag - Google developers
Robots tag generator

Hope it helps.
GR
