Old URLs that have 301s to 404s not being de-indexed.
-
We have a scenario on a domain that recently moved to enforcing SSL. If a page is requested over non-ssl (http) requests, the server automatically redirects to the SSL (https) URL using a good old fashioned 301. This is great except for any page that no longer exists, in which case you get a 301 going to a 404.
Here's what I mean.
Case 1 - Good page:
http://domain.com/goodpage -> 301 -> https://domain.com/goodpage -> 200
Case 2 - Bad page that no longer exists:
http://domain.com/badpage -> 301 -> https://domain.com/badpage -> 404
Google is correctly re-indexing all the "good" pages and just displaying search results going directly to the https version.
Google is stubbornly hanging on to all the "bad" pages and serving up the original URL (http://domain.com/badpage) unless we submit a removal request. But there are hundreds of these pages and this is starting to suck. Note: the load balancer does the SSL enforcement, not the CMS. So we can't detect a 404 and serve it up first. The CMS does the 404'ing.
Any ideas on the best way to approach this problem? Or any idea why Google is holding on to all the old "bad" pages that no longer exist, given that we've clearly indicated with 301s that no one is home at the old address?
-
I don't think 404 vs 410 is the answer here.The basis for this thought is the following:
========
"if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn’t intended to be a page not found.”
“If we see a 410, then the site crawling system says, OK we assume the webmasters knows what they’re doing because they went off the beaten path to deliberately say this page is gone,” he said. “So they immediately convert that 410 to an error, rather than protecting it for 24 hours."
========
I'm thinking the deeper issue is why the 301s are not being respected. If a link points to http://domain.com/badpage and we use a 301 to point to https://domain.com/badpage - shouldn't the crawler (Google or otherwise) respect the 301? Why still index and serve up a page that responds with the 301? To me, this is baffling. If we serve up a 404 or a 410 - either way we are saying "this page is gone" but we're still seeing the original http://domain.com/badpage in the index?
Does that make sense? Or is there more clarification required?
-
sym_admin is right--you'll want to find the source of those pages, as Google apparently is seeing them from somewhere and still requesting them. If there are links to those pages somewhere, you will need to remove them. Also, if you're able, I would change those URLs so that they serve up a "410 Gone" error, and not a 404.
-
Read these three, then do what you got to do...
https://www.searchcommander.com/how-to-bulk-remove-urls-google/
https://productforums.google.com/forum/#!topic/webmasters/uYFJnsyiH8w
https://mza.seotoolninja.com/community/q/404-redirects-to-the-homepage-is-this-good-bad-ugly
For proper removal, please ensure that there are no INTERNAL links anywhere on your website to 404 addresses, from sitemap, buttons, text, or images (the whole 9 yards).
Good luck!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Www. or naked url?
Hi everyone, I am about to start a new WordPress site and debating whether to use www or naked URL for the URL structure. Using naked URL makes sense from a branding and minimalistic perspective but I am reading that using naked URL might have some technical deficiencies. Specifically, cookie issues and DNS can't be cname. Are these technical deficiencies still valid when using naked url? Would appreciate any feedback on this! Cheers
Intermediate & Advanced SEO | | nsereke1 -
URL Re-Writes & HTTPS: Link juice loss from 301s?
Our URLs are not following a lot of the best practices found here: http://moz.com/blog/11-best-practices-for-urls We have also been waiting to implement HTTPS. I think it might be time to take the plunge on re-writing the URLs and converting to a fully secure site, but I am concerned about ranking dips from the lost link juice from the 301s. Many of our URLs are very old, with a decent amount of quality links. Are we better off leaving as is or taking the plunge?
Intermediate & Advanced SEO | | TheDude0 -
Old/wrong meta-titles in index
Hi, We have problems with old Meta titles in the index of google.nl. If you look for example at this wine: https://www.wijnvoordeel.nl/Italie/Just-Hugo::5460.html The Meta tile is: **Just Hugo | Heerlijke Hugo | Het zomerdrankje van 2014 | Wijnvoordeel ** If you look at the results in Google: https://www.google.nl/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#safe=active&q=just hugo The Meta tile is: Just Hugo - Wijnvoordeel(this is an old/automatic generated Meta tile). I already added the code "", but I don't see any progress. Does anybody knows what could be the problem? Thanks for the help! Douwe Veldstra
Intermediate & Advanced SEO | | Eluscious.com0 -
Duplicated Content with Index.php
Good Afternoon, My website uses Joomla CMS and has the htaccess rewrite code enabled to ensure the use of search engine friendly URLs (SEF's). While browsing the crawl diagnostics I have found that Moz considers the /index.php URL a duplicate to our root. I will always under the impression that the htaccess rewrite took care of that issue and obviously I would like to address it. I attempted to create a 301 redirect from the index.php URL to the root but ran into an issue when attempting to login to the admin portion of the website as the redirect sent me back to the homepage. I was curious if anyone had advice for handling the index.php duplication issue, specifically with Joomla. Additionally, I have confirmed that in Google Webmasters, under URL parameters, the index.php parameter is set as 'Representative URL'.
Intermediate & Advanced SEO | | BrandonEML0 -
How accurate are the index figures in GWT?
I've been looking at a site in GWT and the number of indexed urls is very low when compared with the number or submitted urls on the xml sitemaps. The site has several stores which are all submitted using different sitemaps. When you perform a search in Google, eg site:domain.com/store1 site:domain.com/store2 site:domain.com/store3 The results are similar to the webmaster urls. However, looking in the analytics for landing pages used for organic traffic from Google shows a much higher number of pages. If these pages aren't indexed as reported in GMT, how could they be found in the results and be recorded as landing pages?
Intermediate & Advanced SEO | | edwardlewis0 -
Should I change wordpress urls?
Should I change my wordpress permalinks to include the keyword? For examples at the minute my url is http://www.musicliveuk.com/home/wedding-singer. Is it better to be http://www.musicliveuk.com/live-bands/wedding-singer. 'home' is not relevant so surely 'live-bands' would be better? If I change the urls won't I lose 'link juice' as external links will all point to a url that no longer exists? Or will wordpress automatically redirect the old url to the new one? Finally, if I should change the url as described how do I do it on wordpress? I can only see how to edit the last bit of the url and not the middle bit.
Intermediate & Advanced SEO | | SamCUK0 -
Removing large section of content with traffic, what is best de-indexing option?
If we are removing 100 old urls (archives of authors that no longer write for us), what is the best option? we could 301 traffic to the main directory de-index using no-index, follow 404 the pages Thanks!
Intermediate & Advanced SEO | | nicole.healthline0 -
Redirecting my new Website URL to my old Website URL
Hi! OK, I am semi - new to SEO Moz but have been self-teaching for 3 years. However I am stuck.. I have been operating my e-commerce site from www.shopadornonline.com for the past 3 years. I just purchased www.shopadorn.com Right now Shopadorn.com re-directs to www.shopadornonline.com because all my products and links go to shopadornonline.com/productblahblahblah I guess I am stuck. Not sure what to tell my web designer to do? Do I give up on having shopadorn.com OR do I start re-directing customers and doing 301 re-directs? I think from what i have read that it is bad to have traffic going to both shopadorn and shopadornonline as they compete for rankings? Where should I start?
Intermediate & Advanced SEO | | Shopadorn0