More Indexed Pages than URLs on site.
-
According to Webmaster Tools, the number of pages Google has indexed on my site tripled yesterday (from 150K to 450K). Usually I would be jumping for joy, but now I have more indexed pages than actual pages on my site.
I have checked for duplicate URLs pointing to the same product page but can't see any; pagination on category pages doesn't seem to be indexed, nor do the parameterised URLs from advanced filtering.
Using the site: operator, we get a different result on google.com (450K) than on google.co.uk (150K).
Anyone got any ideas?
-
Hi David,
It's tough to say without more digging and information; it certainly looks like you have most of the common problem areas covered from what I can see. I'll throw out an idea: I see you have a few 301 redirects in place switching from .html to non-.html versions of URLs. If this was done on a massive scale, Google's index may temporarily contain both versions of each page. If so, it might not be a big issue: over the next weeks/months the old .html versions should fall out of the index and your numbers will begin to look normal again. Just a thought.
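For reference, that kind of site-wide switch is usually handled with a rewrite rule. A minimal sketch, assuming Apache with mod_rewrite and assuming the extensionless URLs are the canonical versions (adapt to your own setup):

```apache
# Hypothetical sketch: 301-redirect every .html URL to its extensionless version.
# Test on a staging copy before deploying site-wide.
RewriteEngine On
RewriteRule ^(.+)\.html$ /$1 [R=301,L]
```

Even with the redirect in place, Google keeps recrawling the old .html URLs for a while before dropping them, which is why both versions can sit in the index at the same time.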
-
Thanks Lynn. The 31,000 was a bit of a legacy issue and something we have solved. The robots.txt file was changed a couple of weeks ago, so fingers crossed Google will deindex them soon. We get the same result when using inurl: searches.
Any idea where the rest have come from?
-
Hi Irving
We checked everything obvious and cannot explain what is going on. I cannot see any major duplicate content issues, and we do not have any subdomains active. The Moz crawler doesn't flag anything major either.
-
Hi David,
Not sure why they started showing up now (some recent changes to the site?), but I suspect your problem is URLs that you are trying to block with robots.txt but that are finding their way into the index anyway.
If you do a search for site:nicontrols.com inurl:/manufacturer/ and then click "show omitted results", you will see a whole bunch (31,000!) of "content blocked by robots.txt" notices, but the URLs are still in the index. A couple more similar searches for other likely URL paths will probably turn up more.
If you can get a noindex meta tag onto these pages, I think it will be more effective at keeping them out of the index. If some recent changes to the site might have introduced internal links to these pages, it would be worth seeing whether you can get those links removed or replaced with the "proper" link format.
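For reference, a minimal sketch of that tag, placed in the `<head>` of each page to be dropped:

```html
<!-- Hypothetical example: ask search engines to drop this page from the index -->
<meta name="robots" content="noindex">
```

One caveat: Google has to be able to crawl a page to see the tag, so the robots.txt block on those paths would need to be lifted; otherwise the crawler never fetches the HTML and the noindex goes unseen. For non-HTML resources, the X-Robots-Tag HTTP response header (e.g. `X-Robots-Tag: noindex`) does the same job.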
Hope that helps!
-
Can you review the indexed pages in the search results and look for duplicates or technical issues causing improper indexing? Do you have other properties, such as subdomains, that Google might be counting as pages?
Related Questions
-
Shopify Website Page Indexing issue
Hi, I am working on an eCommerce website on Shopify. When I tried indexing my newly created service pages, they did not get indexed on Google. I also tried manually indexing each page and submitted a sitemap, but the issue still doesn't seem to be resolved. Thanks!
Intermediate & Advanced SEO | Bhisshaun
How long will old pages stay in Google's cache? We have a new site that is two months old, but we are seeing old pages even though we used 301 redirects.
Two months ago we launched a new website (same domain) and implemented 301 redirects for all of the pages. Two months later we are still seeing old pages in Google's cache. How long should I tell the client it will take for them all to be removed from search?
Intermediate & Advanced SEO | Liamis
If robots.txt has blocked an image URL, but another page which can be indexed uses that image, how is the image treated?
Hi Mozzers, this is probably a dumb question, but I have a case where robots.txt blocks an image URL, yet that image is used on a page (let's call it Page A) which can be indexed. If the image on Page A has alt text, how is this information digested by crawlers? A) Would Google totally ignore the image and the alt text? Or B) would Google consider the alt text? I am asking because all the images on the website are blocked by robots.txt at the moment, but I would really like crawlers to pick up the alt text. Chances are that I will ask the webmaster to allow indexing of images too, but I would like to understand what's happening currently. Looking forward to all your responses 🙂 Malika
Intermediate & Advanced SEO | Malika
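As a side note, the situation described can be sketched in robots.txt like this, with all paths hypothetical: a blanket block on the image directory, plus a more specific group that lets Google's image crawler back in.

```text
# Hypothetical robots.txt sketch: images currently blocked for all crawlers
User-agent: *
Disallow: /images/

# A more specific user-agent group takes precedence for that crawler,
# so this would re-open the images to Google Images only.
User-agent: Googlebot-Image
Allow: /images/
```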
Spam malware causing many indexed pages
Hey Mozzers, I was speaking with a friend today about a site he has been working on that was already infected when he took it over. Here (https://www.google.ca/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=site:themeliorist.ca) you can see that the site has 4,400 indexed pages, but if you scroll down you will see pages such as /pfizer-viagra-samples/ or /dapoxetine-kentucky/. All of these pages return 404 errors, and I ran the site through SEO Spider just to see if any of them would show up, and they don't. This is not an issue for a client; I am just curious why these pages are still hanging around in the index. Maybe others have experienced this issue too. Cheers,
Intermediate & Advanced SEO | evan89
"No Index, No Follow" or No Index, Follow" for URLs with Thin Content?
Greetings Moz community: If I have a site with about 200 thin content pages that I want Google to remove from its index, should I set them to "No Index, No Follow" or to "No Index, Follow"? My SEO firm has advised me to set them to "No Index, Follow", but on a recent Moz help forum post someone suggested "No Index, No Follow". The Moz poster said that telling Google the content should not be indexed but the links should be followed was inconsistent and could get me into trouble. This makes a lot of sense. What is the proper form? As background, I think I have recently been hit with a Panda 4.0 penalty for thin content. I have several hundred URLs with fewer than 50 words and want them de-indexed. My site is a commercial real estate site, and the listings apparently have too little content. Thanks, Alan
Intermediate & Advanced SEO | Kingalan1
HTTPS pages - To meta no-index or not to meta no-index?
I am working on a client's site at the moment and I noticed that both the HTTP and HTTPS versions of certain pages are indexed by Google, and both show in the SERPs when you search for the content of those pages. I just wanted to get various opinions on whether the HTTPS pages should be given a noindex directive (for example via an .htaccess rule) or whether they should be left as is.
Intermediate & Advanced SEO | Jamie.Stevens
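For completeness, the common alternative to noindexing the duplicates is a canonical tag pointing at whichever protocol is preferred. A minimal sketch, with the URL hypothetical:

```html
<!-- In the <head> of both the HTTP and HTTPS copies of the page -->
<link rel="canonical" href="https://www.example.com/page/">
```

The noindex route works too, but it removes the duplicate copies entirely rather than consolidating ranking signals onto one version.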
How do I increase rankings when the indexed page is the homepage?
Hi Forum, this is a two-part question. The first is: "What may be the cause of some ranking declines?" The second: "How do I bring rankings back up when the indexed page is the homepage?" Over the last week I noticed declines in several of my top keywords, many of which point to the site's homepage. The site itself is an eCommerce site, which had fewer visits last week than normal (holidays, it seems, since the data jibes with key dates). Can a decline in traffic cause ranking declines? Any other ideas of where to look? Secondly, for those keywords that rank for the homepage, how do we bring them back up, since a homepage can't be optimized for every single keyword? We sell yoga products and can't have a homepage optimized for keywords like "yoga mat," "yoga blocks," and "yoga pilates clothing," as these are our category pages' keywords. Any thoughts? Thanks!
Intermediate & Advanced SEO | pano
Best practice to change the URL of all my site pages
Hi, I need to change all of my site's page URLs as a result of moving the site to another CMS platform that has its own URL structure. Currently:
- The site is highly ranked for all relevant keywords I am targeting.
- All pages have backlinks.
- Content and meta data will remain exactly the same.
- The domain will stay the same.
The plan is as follows:
1. Set up the new site using a temporary domain name.
2. Copy over all content and meta data.
3. Set up all redirects (301).
4. Update the domain name and point the live domain to the new site.
5. Watch closely for 404 errors and add any missing redirects.
Questions:
- Any comments on the plan?
- Is there a way (the above plan or any other) to make sure rankings will not be hurt?
- What entries should I add to sitemap.xml: new pages only, or new pages and the pages from the old site?
Thanks, Guy.
Intermediate & Advanced SEO | jid1
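For the 301 step in a plan like this, the redirects are often expressed as a one-to-one map. A minimal Apache sketch, with all paths hypothetical:

```apache
# Hypothetical one-to-one 301 map from old CMS URLs to the new structure
Redirect 301 /old-category/old-page.php /new-category/new-page/
Redirect 301 /about.php /about/
```

For large sites, a RewriteMap file or the new CMS's own redirect manager scales better than hand-written rules.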