Site: Query Question
-
Hi All,
My question is about the site: query you can run on Google. I know it has lots of inaccuracies, but I like to keep a high-level view of it over time.
I was also using it to get a high-level view of how many product pages were indexed vs. the total number of pages.
What is interesting is that when I do a site: query for, say, www.newark.com I get ~748,000 results returned.
When I do a query for www.newark.com "/dp/" I get ~845,000 results returned.
Either I am doing something stupid or these numbers are completely backwards?
Any thoughts?
Thanks,
Ben
-
Barry Schwartz posted some great information about this in November of 2010, quoting a couple of different Google sources. In short, more specific queries can cause Google to dig deeper and give more accurate estimates.
-
Yup. Get rid of parameter-laden URLs and it's easy enough. If they hang around the index for a few months before disappearing, that's no big deal. As long as you have done the right thing, it will work out fine.
Also, you're not interested in the chaff, just the bits you want to make sure are indexed. So make sure those are in sensibly titled sitemaps and it's fine. (I've used this on sites with 50 million and 100 million product pages. It gets a bit more complex at that number, but the underlying principle is the same.)
-
But then on a big site (talking 4m+ products) it's usually the case that you have URLs indexed that wouldn't be generated in a sitemap because they include additional parameters.
Ideally, of course, you rid the index of parameter-filled URLs, but it's pretty tough to do that.
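As a rough illustration of the point above, here is a minimal sketch that separates parameter-filled URLs from clean ones in a list of known URLs. It assumes you already have such a list (for example from a crawl export); the example.com URLs and the helper names `has_query_params` and `split_by_params` are made up for illustration and are not from any tool mentioned in this thread.

```python
from urllib.parse import urlsplit


def has_query_params(url: str) -> bool:
    """Return True when the URL carries a query string (e.g. ?sort=price)."""
    return bool(urlsplit(url).query)


def split_by_params(urls):
    """Split URLs into (clean, parameterized) lists, preserving order."""
    clean, parameterized = [], []
    for url in urls:
        if has_query_params(url):
            parameterized.append(url)
        else:
            clean.append(url)
    return clean, parameterized


# Hypothetical URLs, purely for illustration.
example_urls = [
    "https://www.example.com/widget/dp/12345",
    "https://www.example.com/widget/dp/12345?sort=price&page=2",
    "https://www.example.com/news/",
]

clean, parameterized = split_by_params(example_urls)
print(len(clean), "clean URLs;", len(parameterized), "with parameters")
```

The clean list is roughly what a sitemap generator would emit, and the parameterized list is the kind of URL that can end up indexed without ever appearing in a sitemap.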
-
Best bet is to make sure all your URLs are in your sitemap, and then you get an exact count.
I've found it handy to use multiple sitemaps, one for each subfolder (e.g. /news/ or /profiles/), to be able to quickly see exactly what % of URLs are indexed from each section of my site. This is super helpful in finding errors in a specific section or when you are working on indexing of a certain type of page.
S
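The per-subfolder idea above could be sketched roughly like this: group URLs by their first path segment (one group per sitemap), then compute the indexed percentage for a section from its submitted and indexed counts. The example.com URLs, the function names, and the counts are all hypothetical; the real counts would have to come from whatever per-sitemap report you use.

```python
from collections import defaultdict
from urllib.parse import urlsplit


def section_of(url: str) -> str:
    """Return the first subfolder of the URL path, or "root" if there is none."""
    parts = urlsplit(url).path.split("/")
    # "/news/story-1" -> ["", "news", "story-1"]; "/about.html" -> ["", "about.html"]
    return parts[1] if len(parts) > 2 else "root"


def group_by_section(urls):
    """Group URLs by section so each group can go into its own sitemap."""
    groups = defaultdict(list)
    for url in urls:
        groups[section_of(url)].append(url)
    return dict(groups)


def percent_indexed(submitted: int, indexed: int) -> float:
    """Percentage of submitted URLs reported as indexed, to one decimal place."""
    if submitted == 0:
        return 0.0
    return round(100.0 * indexed / submitted, 1)


# Hypothetical URLs, purely for illustration.
urls = [
    "https://www.example.com/news/story-1",
    "https://www.example.com/news/story-2",
    "https://www.example.com/profiles/ben",
    "https://www.example.com/",
]

groups = group_by_section(urls)
for section, members in groups.items():
    print(section, len(members))

# Hypothetical counts for one section: 200 submitted, 150 indexed.
print(percent_indexed(200, 150))
```

Keeping one sitemap per section means a drop in one section's percentage points you straight at the part of the site that has the problem.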
-
What I've found is that the reason for this comes down to how the Google system works. Case in point: a client site I have with 25,000 actual pages. They have mass duplicate content issues. When I do a generic site: with the domain, Google shows 50,000-60,000 pages. If I do an inurl: with a specific URL param, I get either 500,000 or over a million.
Though that's not your exact situation, it can help explain what's happening.
Essentially, if you do a normal site: Google will try its best to provide the content within the site that it shows the world based on "most relevant" content. When you do a refined check, it's naturally going to look for the content that really is most relevant - closest match to that actual parameter.
So if you're seeing more results with the refined process, it means that on any given day, at any given time, when someone does a general search, the Google system will filter out a lot of content that isn't seen as highly valuable for that particular search. So all those extra pages that come up in your refined check - many of them are most likely then evaluated as less than highly valuable / high quality or relevant to most searches.
Even if many are great pages, their system has multiple algorithms that have to be run to assign value. What you are seeing is those processes struggling to sort it all out.
-
about 839,000 results.
-
Different data center perhaps - what about if you add in the "dp" query to the string?
-
I actually see 'about 897,000 results' for the search 'site:www.newark.com'.
-
Thanks Adrian,
I understand those areas of inaccuracy, but I didn't expect to see a refined search produce more results than the original search. That just seems a little bizarre to me, which is why I was wondering if there was a clear explanation or if I was executing my query incorrectly.
Ben
-
This is an expected 'oddity' of the site: operator. Here is a video of Matt Cutts explaining the imprecise nature of the site: operator.