Old pages still in index
-
Hi Guys,
I've been working on an e-commerce site for a while now. Let me sum it up:
- February: the new site was launched
- Due to a lack of resources, we started 301-redirecting the old URLs in March
- Added rel=canonical at the end of May because of huge index numbers (the developers forgot!)
- Added noindex and robots.txt rules on at least 1,000 URLs
- Index numbers went down from 105,000 to 55,000 for now, see screenshot (the actual number in the sitemap is 13,000)
Now when I do a site:domain.com search, there are still old URLs in the index, even though those URLs have had a 301 since March!
I know this can take a while, but I wonder how I can speed this up, or whether I am doing something wrong. I hope someone can help, because I simply don't know how the old URLs can still be in the index.
-
Hi Dan,
Thanks for the answer!
Indexation is already down to 42,000, so it is slowly going back to normal.
And thanks for the last tip, that's totally right. I just discovered that several pages had duplicate URLs generated, so we'll keep monitoring and fix them!
-
Hi There
There are a few ways to get pages removed from the index:
-
Use a meta noindex without a robots.txt block. I think that is why some pages may not have been removed: robots.txt blocks crawling, so Google cannot see the noindex.
-
Use a 301 redirect. This will eventually kill off the old pages, but it can definitely take a while.
-
Canonical it to another page. As Chris says, don't block the page or add extra directives. If you canonical the page (correctly), I find it usually drops out of the index fairly quickly after being crawled.
-
Use the URL removal tool in Webmaster Tools together with robots.txt or a 404. If you 404 a page or block it with robots.txt, you can then go into Webmaster Tools and request a URL removal. This is NOT recommended in most normal cases though, as Google prefers this to be used for "emergencies".
The only method that removes pages within a day or two guaranteed is the URL removal tool.
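The first method above is easy to get wrong when noindex and robots.txt are applied to the same URLs. Here is a minimal sketch of a check for that conflict using Python's standard `urllib.robotparser`. The robots.txt rules, the URL list, and the function name `hidden_noindex_urls` are made up for illustration; swap in your own file and your own list of noindexed URLs.

```python
from urllib.robotparser import RobotFileParser


def hidden_noindex_urls(robots_txt, noindex_urls, user_agent="Googlebot"):
    """Return the noindexed URLs that robots.txt stops the crawler from fetching."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # A URL that cannot be fetched cannot have its meta noindex read.
    return [url for url in noindex_urls if not parser.can_fetch(user_agent, url)]


# Hypothetical robots.txt rules and URLs, for illustration only.
robots_txt = """User-agent: *
Disallow: /old-category/
"""
urls = [
    "https://domain.com/old-category/shoes",
    "https://domain.com/sale/shoes",
]
print(hidden_noindex_urls(robots_txt, urls))
```

Any URL this returns carries a noindex that the crawler is not allowed to fetch, so the robots.txt rule for it is the first thing to revisit.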
Since the site is new, I would also check it for anything that is causing additional pages to be generated and indexed. I see this a lot with e-commerce sites that have lots of pagination, facets, sorting, etc., which can generate lots of extra pages that get indexed.
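One rough way to spot that kind of duplication is to group a URL export by host and path and look for paths that show up under several query strings. This is only a sketch: the URL list is invented, the function name `parameter_variants` is hypothetical, and it assumes the duplicates differ only in their query parameters.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def parameter_variants(urls):
    """Group URLs that share a host and path but differ in query string."""
    groups = defaultdict(list)
    for url in urls:
        parts = urlsplit(url)
        groups[parts.netloc + parts.path].append(url)
    # Keep only paths that appear under more than one URL.
    return {page: found for page, found in groups.items() if len(found) > 1}


# Hypothetical URLs, for illustration only.
indexed_urls = [
    "https://domain.com/shoes",
    "https://domain.com/shoes?sort=price",
    "https://domain.com/shoes?page=2",
    "https://domain.com/hats",
]
for page, found in parameter_variants(indexed_urls).items():
    print(page, len(found))
```

Paths with many variants are good candidates for a canonical tag pointing at the clean URL.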
Again, as Chris says, you want to be careful to not mix signals. Hope this all helps!
-Dan
-
-
Hi Chris,
Thanks for your answer.
I'm either using a 301 or noindex, not both of course.
Still have to check the server logs, thanks for that!
Another weird thing: while the old URL is still in the index, the cache date I see for it is only a week old. That's what I don't get. Google cached the page a week ago, yet it still has the old URL in the index.
-
It can take months for pages to fall out of Google's index. Have you looked at your log files to verify that Googlebot is crawling those pages? Things to keep in mind:
- If you 301 a page, the rel=canonical on that page will not be seen by the bot (no biggie in your case)
- If you 301 a page, a meta noindex on that page will not be seen by the bot
- It is suggested not to use robots.txt to keep a page out of the index when that page is being 301 redirected, because Google may then never see the redirect.
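For the log file check, a small script can count how often Googlebot requested the old URLs and which status code it received. This is a minimal sketch that assumes the server writes the common "combined" access log format; the sample lines, the paths, and the function name `googlebot_hits` are invented for illustration. Matching on the user agent string alone is only a rough filter, since that string can be spoofed.

```python
import re
from collections import Counter

# Request line, status code, response size, referrer, user agent (combined log format).
LOG_LINE = re.compile(
    r'"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def googlebot_hits(log_lines, old_paths):
    """Count Googlebot requests per (path, status) for the given old paths."""
    hits = Counter()
    for line in log_lines:
        match = LOG_LINE.search(line)
        if not match or "Googlebot" not in match["agent"]:
            continue
        if match["path"] in old_paths:
            hits[(match["path"], match["status"])] += 1
    return hits


# Made-up sample lines, for illustration only.
sample = [
    '66.249.66.1 - - [10/Jun/2014:06:25:14 +0000] "GET /old-page HTTP/1.1" 301 0 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/Jun/2014:06:26:01 +0000] "GET /old-page HTTP/1.1" 301 0 '
    '"-" "Mozilla/5.0 (Windows NT 6.1)"',
    '66.249.66.1 - - [10/Jun/2014:06:27:40 +0000] "GET /new-page HTTP/1.1" 200 5120 '
    '"-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]
for (path, status), count in googlebot_hits(sample, {"/old-page"}).items():
    print(path, status, count)
```

If an old URL never shows up here, Googlebot has not recrawled it yet, so it has not seen the 301 either.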