Huge Google index on E-commerce site
-
Hi Guys,
I got a question which i can't understand.
I'm working on a e-commerce site which recently got a CMS update including URL updates.
We did a lot of 301's on the old url's (around 3000 /4000 i guess) and submitted a new sitemap (around 12.000 urls, of which 10.500 are indexed).The strange thing is.. When i check the indexing status in webmaster tools Google tells me there are over 98.000 url's indexed.
Doing the site:domainx.com Google tells me there are 111.000 url's indexed.Another strange thing which another forum member describes here :
And next to that old url's (which have a 301 for about a month now) keep showing up in the index.
Does anyone know what i could do to solve the problem?
-
Allright guys, thanks alot for the answers.
Gonna try some things out coming monday.
Canonical url's and pagination (rel=prev) will work i guess.
The hard part is, i'm working on this site with a development company that tells me they can url redirect all the 404's to the homepage while they must be redirected either to other products or category pages.
So only solution is that i have to do that by hand, one by one via a tool they build. But it's a hell of a job!
@ Andy , I checked it and it actually says :
Total indexed : 98.000
Ever crawled: 929.762And when i check the questionmark at total indexed it says:
Total number of url's added to Google index.Thanks again for your answers
-
something to check would be in WMT if you go to the advanced section of the index status chart you should see currently in the index and ever indexed, it sounds like you are just seeing the ever indexed number which could be huge for almost any website.
-
We had similar issues with too many indexed pages (about 100,000 pages) for a site with about 3500 pages.
By setting a canonical url on each page and also preventing google from indexing and crawling some of the urls (robots.txt and meta noindex) we are now down to 3500 urls, The benefit is (besides less duplicate content), much faster indexing of new pages.
http://support.google.com/webmasters/bin/answer.py?hl=en&answer=139394
-
Hi,
A couple of things could be and probably are at work in this situation.
1. For the 301 redirects, if the site is big (12000 urls), depending on how often and much google crawls the site it could easily take more than a month for it to find and identify all the new urls/301 redirects etc and then update its cache of indexed pages. So in this case its is a matter of patience. If the 301s are implemented correctly, they will eventually be indexed.
2. You have done 3 or 4000 301s, for the rest of the the old 12000 urls what are you showing, a 404? It is a big undertaking to redirect that many pages, but worth thinking about the technical side of what is happening, part of your 98000 indexed urls could be a mix of old and new if the old ones are not being redirected to a page that clearly states that they are either somewhere else (301) or no longer available (404).
3. A common problem with e-shops is duplicate content due to various things like product filters, search string variables etc that are going to pages that are indexable and do not have rel canonical tags. A good way to see if this is the case is to search for likely url parts in your cms that could lead to this issue (maybe you have filters that result in urls like xxx?search=123 or xxx?manufacturer=23 etc) and then do a google search along the lines of site:xxx.com inurl:manufacturer which should give a good idea of if/where you have this problem. This case of duplicate content could be even more pronounced if it was occurring on your old cms urls AND your new cms urls and a combination of these are in your 98000 total.
Hope that helps!
Browse Questions
Explore more categories
-
Moz Tools
Chat with the community about the Moz tools.
-
SEO Tactics
Discuss the SEO process with fellow marketers
-
Community
Discuss industry events, jobs, and news!
-
Digital Marketing
Chat about tactics outside of SEO
-
Research & Trends
Dive into research and trends in the search industry.
-
Support
Connect on product support and feature requests.
Related Questions
-
Staging/Development Site Indexed?
So, my company's site has been pretty tough to try to get moving in the right direction on Google's SERPs. I had believed that it was mainly due to having a shortage of back links and a horrible home page load time. Everything else seems to be set up pretty well. I was messing around and used the site: Google search operator for our staging site. I found stage.site.com and a lot of our other staging pages in the search results. I have to think that this is the problem and causing a duplicate content penalty of the entire site. I guess I now need to 301 redirect the entire site? Has anyone every had this issue before and have fixed it? Thanks for any help.
Intermediate & Advanced SEO | | aua0 -
Product Pages not indexed by Google
We built a website for a jewelry company some years ago, and they've recently asked for a meeting and one of the points on the agenda will be why their products pages have not been indexed. Example: http://rocks.ie/details/Infinity-Ring/7170/ I've taken a look but I can't see anything obvious that is stopping pages like the above from being indexed. It has a an 'index, follow all' tag along with a canonical tag. Am I missing something obvious here or is there any clear reason why product pages are not being indexed at all by Google? Any advice would be greatly appreciated. Update I was told 'that each of the product pages on the full site have corresponding page on mobile. They are referred to each other via cannonical / alternate tags...could be an angle as to why product pages are not being indexed.'
Intermediate & Advanced SEO | | RobbieD910 -
Is it possible to rank in google mexico when you don't have a local site?
Hello, someone is asking me why we don't rank in google mexico search engine. I mentioned we don't have a google mexico site, but have a USA site, so we may rank, but not as well as if we had the mexico site. IS there anyway to improve rankings or tips? THanks! Laura Robinson
Intermediate & Advanced SEO | | lauramrobinson321 -
Google is indexing the wrong page
Hello, I have a site I am optimizing and I cant seem to get a particular listing onto the first page due to the fact google is indexing the wrong page. I have the following scenario. I have a client with multiple locations. To target the locations I set them up with URLs like this /<cityname>-wedding-planner.</cityname> The home page / is optimized for their port saint lucie location. the page /palm-city-wedding-planner is optimized for the palm city location. the page /stuart-wedding-planner is optimized for the stuart location. Google picks up the first two and indexes them properly, BUT the stuart location page doesnt get picked up at all, instead google lists / which is not optimized at all for stuart. How do I "let google know" to index the stuart landing page for the "stuart wedding planner" term? MOZ also shows the / page as being indexed for the stuart wedding planner term as well but I assume this is just a result of what its finding when it performs its searches.
Intermediate & Advanced SEO | | mediagiant0 -
Ticket Industry E-commerce Duplicate Content Question
Hey everyone, How goes it? I've got a bunch of duplicate content issues flagged in my Moz report and I can't figure out why. We're a ticketing site and the pages that are causing the duplicate content are for events that we no longer offer tickets to, but that we will eventually offer tickets to again. Check these examples out: http://www.charged.fm/mlb-all-star-game-tickets http://www.charged.fm/fiba-world-championship-tickets I realize the content is thin and that these pages basically the same, but I understood that since the Title tags are different that they shouldn't appear to the Goog as duplicate content. Could anyone offer me some insight or solutions to this? Should they be noindexed while the events aren't active? Thanks
Intermediate & Advanced SEO | | keL.A.xT.o1 -
Website is not indexed in Google, please help with suggestions
Our client website was removed from Google index. Anybody could recommend how to speed up process of re index: Webmaster tools done SM done (Twitter, FB) sitemap.xml done backlinks in process PPC done Robots.txt is fine Guys any recommendations are welcome, client is very unhappy. Thank you
Intermediate & Advanced SEO | | ThinkBDW0 -
SERP Experience After You Resubmit Your Site to Google
Hello Everyone, We suddenly noticed that our keywords fell off the map and discovered that porn had been placed (via.htaccess redirects and masking) on our site. The porn links caused Google to drop us.We scrubbed our .htaccess file and asked Google to reindex our site 3 weeks ago.Does anyone have experience with reindexing?If so, how long were you down and did your keyword positions return eventually?Thanks,Bob
Intermediate & Advanced SEO | | impressem0 -
Working out exactly how Google is crawling my site if I have loooots of pages
I am trying to work out exactly how Google is crawling my site including entry points and its path from there. The site has millions of pages and hundreds of thousands indexed. I have simple log files with a time stamp and URL that google bot was on. Unfortunately there are hundreds of thousands of entries even for one day and as it is a massive site I am finding it hard to work out the spiders paths. Is there any way using the log files and excel or other tools to work this out simply? Also I was expecting the bot to almost instantaneously go through each level eg. main page--> category page ---> subcategory page (expecting same time stamp) but this does not appear to be the case. Does the bot follow a path right through to the deepest level it can/allowed to for that crawl and then returns to the higher level category pages at a later time? Any help would be appreciated Cheers
Intermediate & Advanced SEO | | soeren.hofmayer0