Is site: a reliable method for getting a full list of indexed pages?
-
The site:domain.com search seems to show fewer pages than it used to (in both Google and Bing).
This isn't specific to one site; I see it on all sites. For example, I will get "page 1 of about 3,000 results", but by the time I've paged through the results, the listing ends and the count changes to "page 24 of 201 results". For that same site, GSC shows 1,932 pages indexed.
Should I now accept that the page count shown by site: is an unreliable metric?
-
Keep in mind that for a site:domain.com search, Google now includes pages from OTHER SITES that use the canonical tag to point to your site. So even though it says there are 300 pages indexed, 30 of those pages might be on other sites whose canonical tags point to your site. Because of this, the number of indexed pages you're looking at may not be entirely accurate.
-
I hadn't noticed the page counts dropping, but I only use that operator for a general check; I have never paged through all the results. For that I would use one of the crawler tools. It would be interesting to compare an export of the search results, the GSC data, and a crawl from something like Screaming Frog.
As soon as I wrote that, I checked our site and saw what you are describing. Google shows "About 281 results," but when I go to the last page of results it changes to "page 13 of 126 results."
Then, out of curiosity, I tried Bing, and now I am scratching my head: "763 results." When I go to the last possible page I get "247-256 of 256 results." I think that means my 281 results from Google are mostly on Bing!!!! (In case anyone misses my humor, that last statement can be read as either jest or sarcasm.)
So, with site: I get 126 results in Google, but Search Console shows 428...
Certainly interesting. I will keep playing with it.
Best
-
Hi Robert,
Thanks for your input.
The reason for doing it is that it's part of an SEO site review process: we compare the pages indexed in Google against a site crawl from a tool like Screaming Frog and the indexed pages reported in GSC.
As for the "page 24 of 201 results" example, I mean that when you first run site:domain.com, Google gives you an estimated number of results, e.g. 3,000, but as you click through the pages the number of results drops, sometimes significantly.
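To put a number on how far the two site: figures sit from GSC, here is a rough Python sketch using the counts quoted in this thread. The function name `relative_gap` is just something made up for illustration, and treating the GSC count as the reference point is an assumption, not a claim that GSC is exact.

```python
def relative_gap(reported: int, reference: int) -> float:
    """Return how far `reported` is from `reference`, as a fraction of `reference`."""
    if reference <= 0:
        raise ValueError("reference count must be positive")
    return (reported - reference) / reference


# Counts quoted in this thread for the example site.
gsc_indexed = 1932    # indexed pages shown in GSC (used as the reference; an assumption)
site_estimate = 3000  # "page 1 of about 3,000 results"
site_paged = 201      # "page 24 of 201 results" on the last page

estimate_gap = relative_gap(site_estimate, gsc_indexed)
paged_gap = relative_gap(site_paged, gsc_indexed)

print(f"first-page estimate vs GSC: {estimate_gap:+.1%}")
print(f"last-page count vs GSC:     {paged_gap:+.1%}")
```

With these numbers the first-page estimate overshoots GSC and the last-page count falls far short of it, which is why neither figure works as an exact count on its own.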
-
I am not sure I understand what you mean by "...it will end and change to 'page 24 of 201 results'." I have used the site: operator for a long time, and I think it is reasonably accurate. One thing I do notice is the occasional "some pages have been ... duplicate" message asking whether you want to see those. So, if you include all of those, what's the magic number?
Is there a reason you need an exact figure? I am not aware of anything that would give you that. The question is what counts as "indexed" within a given search engine. If you crawl with Screaming Frog, etc., you may see pages that are not indexed, so the comparison is not apples to apples. Just curious: what do you want the exact number of indexed pages for?
Interesting question.
-
Typically, the site: command in Google is unreliable. There are lots of reasons why; one is that there may be indexed pages that aren't "good enough", for whatever reason, to show up in the search results. When we check the number of pages indexed for a site, we typically run the site: command, click a few pages deep, and use the number shown there (not the number shown on the first page).
For SEO auditing purposes, we're looking to see whether there is a significant difference between the number of pages indexed and the number of pages we find when we crawl the website ourselves.
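For the crawl-versus-index comparison described above, a minimal Python sketch could look like the following. It assumes you already have two plain lists of URLs, one from your own crawl and one of indexed URLs from whatever source you trust; the sample URLs, the `normalize` rules, and the `compare` helper are all hypothetical and not taken from any tool's export format.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Lowercase the host and drop the fragment and trailing slash so trivial variants match."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{parts.scheme.lower()}://{parts.netloc.lower()}{path}{query}"


def compare(crawled, indexed):
    """Split two URL lists into crawled-only, indexed-only, and shared URLs."""
    crawled_set = {normalize(u) for u in crawled}
    indexed_set = {normalize(u) for u in indexed}
    return {
        "crawled_not_indexed": sorted(crawled_set - indexed_set),
        "indexed_not_crawled": sorted(indexed_set - crawled_set),
        "in_both": sorted(crawled_set & indexed_set),
    }


# Hypothetical sample data standing in for a crawl export and an indexed-URL list.
crawled = [
    "https://example.com/",
    "https://example.com/about/",
    "https://example.com/blog/post-1",
]
indexed = [
    "https://EXAMPLE.com/about",
    "https://example.com/old-page",
]

report = compare(crawled, indexed)
for bucket, urls in report.items():
    print(f"{bucket}: {len(urls)}")
    for url in urls:
        print(f"  {url}")
```

Comparing URL sets rather than just totals shows which pages account for the difference, which is usually the more useful thing in an audit.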