Tuesday, June 19, 2012

StatCounter and Statistics

"In summary - the Net Applications sample is small (it is based on 40,000 websites compared to our 3,000,000). Weighting the initial unrepresentative data from that small sample will NOT produce meaningful information. Instead, applying a weighting factor will simply inflate inaccuracies due to the small sample size." http://gs.statcounter.com/press/open-letter-ms
Very misleading. I have no idea whether StatCounter or Net Applications is more accurate, but that particular argument does not hold up statistically.

In statistics, sample size is basically irrelevant past a certain minimal size. That's how a survey of 300 people in the US can predict pretty well for 300 million. Size barely matters, in two ways: first, the population could be 1 million or 1 billion - the actual population size is irrelevant; second, the sample could be 3,000 or 30,000 and it would not be much more accurate than 300. The only exception is when the population itself is very small, say 100: then a sample of 100 is guaranteed to be representative, obviously, and a very small sample of say 5 will have poor accuracy in most cases. But just 300 people is enough for any large population.

The reason is the basic statistical fact that the standard error of a sample estimate equals the standard deviation of the population divided by the square root of the sample size. If you are measuring something like the % of people using a browser, the first factor is bounded (for a proportion it is at most 0.5), so it doesn't matter much. That leaves the second, and happily for statistics, 1 over the square root of the sample size gets small quickly: you reach an accuracy of a few percent with just a few hundred people, no matter what the population size is.
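To make that concrete, here is a minimal sketch (my own illustration, using the standard normal-approximation formula for a proportion's margin of error - not anything from either company's data). Note that the total population size never appears in the calculation:

```python
# Approximate 95% margin of error for a proportion p estimated from n
# independent, unbiased samples: 1.96 * sqrt(p * (1 - p) / n).
# The worst case is p = 0.5, used below.
import math

def margin_of_error(p, n, z=1.96):
    """Approximate 95% margin of error for a proportion p from n samples."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 3_000, 40_000, 3_000_000):
    print(f"n = {n:>9,}: +/- {100 * margin_of_error(0.5, n):.2f} percentage points")

# n =       300: +/- 5.66 percentage points
# n =     3,000: +/- 1.79 percentage points
# n =    40,000: +/- 0.49 percentage points
# n = 3,000,000: +/- 0.06 percentage points
```

A few hundred unbiased samples already get you within a handful of percentage points, and going from 40,000 to 3,000,000 buys less than half a point - assuming, crucially, that the samples are unbiased.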

So the fact that StatCounter has 3,000,000 websites and Net Applications has 40,000 means practically nothing (and 40,000 even understates it, since those are websites - the number of people visiting them is likely much larger). 40,000 is definitely large enough; in fact, just a few hundred datapoints would be enough! Of course, that is only if the sample is unbiased. That is the crucial factor, not sample size. We don't really know which of StatCounter and Net Applications is less biased, but the difference in sample size between them is basically irrelevant. Past a minimal sample size, more doesn't matter, even if it intuitively seems like it must make you more representative.
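Here is a toy simulation of that point. The numbers are made up (the 40% "true" share and the collection method that over-samples those users by 1.3x are purely hypothetical): a small unbiased sample lands near the truth, while a huge biased one converges confidently on the wrong answer.

```python
# Toy illustration: unbiased-but-small versus biased-but-huge sampling.
import random

random.seed(0)
TRUE_SHARE = 0.40  # hypothetical "real" share of browser X

def unbiased_share(n):
    """Each sampled user is an X user with the true probability."""
    return sum(random.random() < TRUE_SHARE for _ in range(n)) / n

def biased_share(n, over=1.3):
    """A collection method that picks up X users 1.3x as often as it should."""
    p = TRUE_SHARE * over / (TRUE_SHARE * over + (1 - TRUE_SHARE))
    return sum(random.random() < p for _ in range(n)) / n

print(f"true share:              {TRUE_SHARE:.3f}")
print(f"unbiased, n = 300:       {unbiased_share(300):.3f}")
print(f"biased,   n = 3,000,000: {biased_share(3_000_000):.3f}")
# Typical run: the unbiased estimate lands within a few points of 0.400,
# while the biased estimate converges tightly on about 0.464 - precisely wrong.
```

Extra samples only shrink the random noise around whatever value the collection method converges to; they do nothing about the gap between that value and the truth.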

7 comments:

  1. Hi Azaki,

    Thanks for your comments.

    We are very sorry to hear that you think our letter is misleading - that was not our intention at all. Instead, we are simply pointing out a number of errors and omissions in the Microsoft analysis.

    "In statistics, sample size is basically irrelevant past a certain minimal size. "
    Sure - but that's ONLY in the case of a controlled survey. Neither StatCounter nor Net Apps is conducting such a survey. Instead, we are releasing aggregate statistics as compiled from our network sites.

    The accuracy of the global stats we produce is dependent on getting large numbers of page views from all around the world; therefore we genuinely feel that 3,000,000 sites is hugely significant compared to 40,000 sites. We believe that our larger sample pool gives us larger samples of pageviews in every single country worldwide. We suspect that Net Apps does not have an accurate picture of browser usage in many countries due to their small/unrepresentative sample. This therefore impacts their global figures. [Let me emphasize again that we are very sorry that Net Apps has been dragged into this discussion. We would never have engaged in a public critique of their stats if it were not for the misleading, ill-informed and one-sided article published on the IE blog. Our invitation to add some comments to our letter remains open to them if they wish - or if they prefer not to comment, that's fine too.]

    Furthermore, Net Applications only offer paid web analytics services which may introduce a bias towards profit-making, ecommerce-type sites in their sample pool. Our free service with optional paid upgrades gives us better variety and coverage of all site types.

    In short, in order for us (or Net Apps) to correctly analyze global browser usage stats, it's necessary to get good coverage of browser usage in all (or at the very least as many as possible) countries. We feel that with the 40,000 sites in the Net Apps network they are not achieving an accurate picture of browser usage in many individual countries. Combining these unrepresentative stats then creates a problem in their global numbers (which is magnified by the use of weights).

    "We don't really know which of StatCounter and Net Applications is less biased."
    We publish our individual country sample sizes, which allows you to see the depth of coverage we have in each country (http://gs.statcounter.com/faq#sample-size). Net Apps do not publish this information. We encourage Net Apps to release this info so that we can have an accurate comparison of our services.

    If you have any other concerns or comments, we would be very happy to address them. (http://gs.statcounter.com/feedback)

  2. First, note that I only commented on the specific issue of sample size. The other issues mentioned in that link are entirely separate, and I don't think I know enough to have an opinion on them.

    But regarding

    > The accuracy of the global stats we produce is dependent on getting large numbers of page views from all around the world therefore we genuinely feel that 3,000,000 sites is hugely significant compared to 40,000 sites.

    There is no statistical basis for that statement. If the actual population size is much larger than both samples (which is true), and if both samples are large enough (larger than a few hundred datapoints, which they are), then from a statistical point of view all that matters is which sample collection method is less biased. 300 unbiased samples are better than 3,000,000 biased ones. Yes, this seems very counterintuitive - it feels like collecting more samples from the population must make the result more valid - but that intuition is false.

    Again, you had some arguments about which is more biased, and I don't have an opinion about those. But the mere sample size is just irrelevant. Sorry to be pedantic about this, but it's a pet peeve of mine.

  3. Azaki, as stated above by the SC rep, this is not a controlled survey: you do not get to choose your subjects (data pool); instead, they choose you.

    Is it enough to prove that in this case more data leads to better data? It certainly isn't, but in my opinion it does raise an eyebrow :)

    I wonder how those 40,000 Net Apps datapoints are distributed around the world.

  4. The question is how much bias there is in "selecting" the sample from the population. Whether the "selection" happens by them coming to you or vice versa is a separate issue - although obviously if you pick them, you have more chances to ensure lower bias (though never perfectly).

    In any case, my point in the article is that sample size is entirely irrelevant here. 40,000 sites or 40,000,000 sites are about the same - both are huge sample sizes, and statistically speaking there simply isn't much benefit in going from 40,000 to 40,000,000; just look at 1/sqrt(n).

    Put another way, both sample sizes are more than enough to predict their "underlying population" very well - the underlying population being the population the samples can be considered to be drawn from at random. If the sample is truly random and unbiased, the underlying population is the same as the real one we care about. If not, then no matter how well we approximate it, we are approximating with a bias that we cannot fix.

    So predicting the underlying population a tiny bit better - which is exactly what going from 40,000 to 40,000,000 samples gives you - is just not important. What is important is which underlying population is closer to the real one, and that has nothing to do with sample size here.

  5. Simply marvelous, Alon. You are indeed correct with regard to your central point about sample size not mattering past a certain point.

    That said, you raised a secondary issue of sample selection bias.

    For argument's sake, let's assume there are two billion (2b) browser visitors in total.

    Would it be fair to assume that if your sample size (n) is almost the same as your population (N) [2b visitor browsers], then the bias issue rapidly becomes irrelevant (because your sample is nearly the total anyway and there are hardly any "unpicked" samples left to introduce bias)?

    Hence (and I'm not a statistician), at what point would this become a contributing factor when comparing biases between vastly different sample sizes?

  6. Yes, if your sample is very close to the population size, then bias must vanish - you are basically measuring the whole population rather than a hopefully-representative sample.

    But we don't have anywhere near that amount of sampling capability in any of the public statistics. However, I would imagine Google and Facebook in their private data do track most of the world population (at least outside of a few countries like China), because most people use those services.

  7. Forgot to say: you need to get very close to the full population size. If the population is split 50-50 on something and you sample half of that huge population, you might still be totally wrong if you happen to sample all of one half and none of the other.
