How to de-index old URLs after redesigning the website?

Chemometec

Thank you for reading.

After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).

It would be nonsense to 301 redirect them as there are to many URLs. (or would it be nonsense?)

What is the best way to deal with this issue?

Chemometec

Thank you Clever PhD, really valuable insights!

LiamMcArthur

I completely agree with all of the above - I've taken her point more like my own. Where receiving thousands of annoying 404 errors from pages that haven't existed for many months just gets annoying!

CleverPhD

I respectfully disagree with all of the above. Please repeat after me, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic, 404s are not bad, they are diagnostic.

After redesigning my website (5 months ago) in my crawl reports (Moz, Search Console) I still get tons of 404 pages which all seems to be the URLs from my previous website (same root domain).

**Part 1 Internal links that 404s from Moz Crawl: **The 404s that show up in the Moz crawl are only going to be from an internal link on your website. The Moz crawl only looks at internal links and not links from other website. In other words, if you see 404s in your Moz crawl, that means, somewhere, you are linking to those pages and that is why the 404s are showing up. Download the CSV and you will find them in your Moz crawl. Other tools such as screaming frog, Botify, Deep Crawl, will show you a similar analysis.

Simple solution. Go through your code and remove the internal links on your site that direct the Moz crawler to those pages and the 404s will go away. (FYI this same approach will work for any internal 301s) These 404 errors in the Moz report are great diagnostic signals on where to fix your site. It is bad for users to click on a link within your website and get sent to a page that does not exist.

**Part 2 external links from Search Console: **The 404s that show up in Search console can come from your internal links on your site AND external links from other sites. Google will keep trying to crawl these links due to other sites linking to pages on your site and your own internal links. For internal link fixing - see suggestion above. For external links you need a different approach.

Look at the external links, where are they coming from? Are they from quality websites? Do they go to formerly important pages on your websites (ie pages that were good converters? If so, then use the 301 redirect to send them to the correct replacement page (and this is not always the home page). You get users to the correct page and also any link equity is passed along as well and this can help with your site rankings. If the link goes to former page on your site that was not any good to start with and the links that come into it are poor quality, then you just let the page 404. Tools such as Moz Open Site Explorer or Ahrefs or Majestic can help with this assessment - but usually you can just look at a site linking to you and tell if it is crap or not.

You need to consider the above regardless of if you want to get the pages that are 404ing in question out of the Google index as if you get Google to remove the page from the index, it will then see the internal link on your site and then find the 404 again. If you have removed the links to the 404 pages on your site, eventually Google will stop crawling them and drop out of the index.

Important note regarding the use of robots.txt. Blocking Google from crawling the 404s will not remove the pages from the index, Google will just stop crawling them. Google has to be able to crawl the URL to see the 404 and then see that it is a bad page and then remove the page from the index. Blocking with robots.txt stops Google from doing that. As soon as you take the page out of robots Google will recrawl and the 404 shows up again. Robots.txt treats a symptom that is a red herring, allowing the 404 to occur takes care of the issue permanently.

Dead pages are a natural part of the web. Let Google see the 404 (if it truly is a page that should 404 and has no link equity that should be passed along with a 301). Google will crawl the 404 several times, you will see it in search console several times. It is ok. You are not penalized for X number of 404s. You may lose ranking if you 404 a page that Google used to rank well, but this is just because Google will not keep a page highly ranked that does not exist :-). Help Google out by cleaning up your internal link structure so when it sees that you do not link to the page any more, then that is a signal that the page should 404. Google knows that due to the nature of the web, pages will time out on occasion and show an error. Google will continue to recrawl a page just to make sure, it wants to give you the benefit of the doubt. Therefore, you have to give clear directives by not linking to dead pages so that after Google double and triple checks the page, it will finally drop it. You will see the 404 in your Search Console for several months then it will eventually go away.

Hope that makes sense. Good luck!

MoosaHemani

Hey Lana, If you really think that 301 does not make sense in that case you can always add the URLs in the robots.txt file and once Google will recrawl your website, Google will de-index the pages from the index.

Another thing you can do is using the de-index feature in Google webmaster tool. You can do that by getting in to your GWT, Optimization > Remove URLs and do that accordingly.

Hope this helps!

Chemometec

I see the point. Thanks Liam. As the most of our 404 pages starts with /en-GB/ i will do like this:

Disallow: /en-GB/

LiamMcArthur

Hi Lana,

I've been having the same problem on one of our websites. I've been 301 redirecting over 5,000 URL's but still receive a lot of 404 errors. One of the main reasons for these 404 errors still appearing is other bots such as Bing Bot that is still crawling the old URL's.

To resolve this, I would just block them in your robots.txt file. We blocked our old product URL's that were under a "product directory like this:

User-agent: *
Disallow: /product/

Welcome to the Q&A Forum

Browse the forum for helpful insights and fresh discussions about all things SEO.

How to de-index old URLs after redesigning the website?

Browse Questions

Explore more categories

Related Questions

How long will old pages stay in Google's cache index. We have a new site that is two months old but we are seeing old pages even though we used 301 redirects.

Changing URLs

New websites

Does it make sense to create new pages with friendlier URLs then redirect old pages to new?

Now that Google will be indexing Twitter, are Twitter backlinks likely to effect website rank in the SERPs?

To index search results or to not index search results?

Indexing a several millions pages new website

How can I block unwanted urls being indexed on google?