If you know any statistics, you know "sampling". It's the idea of measuring some subset of the population. Using the Law of Large Numbers, you are able to learn from the sample and generalize to the population.
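To make the idea concrete, here is a minimal sketch of sampling at work. The population is synthetic, and the function name `sample_mean_error`, the seeds, and the sample sizes are all assumptions chosen for illustration; none of this comes from Google Analytics.

```python
import random
import statistics

def sample_mean_error(population, sample_size, seed=0):
    """Absolute gap between the mean of a simple random sample and the population mean."""
    rng = random.Random(seed)
    # Draw without replacement, as a survey of distinct individuals would.
    sample = rng.sample(population, sample_size)
    return abs(statistics.fmean(sample) - statistics.fmean(population))

# Hypothetical population: 200,000 made-up measurements.
_rng = random.Random(42)
population = [_rng.gauss(50, 10) for _ in range(200_000)]

for n in (10, 100, 1_000, 10_000):
    print(n, round(sample_mean_error(population, n), 3))
```

The gap tends to shrink as the sample grows, which is the Law of Large Numbers doing its job. Note the direction: the sample tells you about the population's mean, but it never hands you back the population itself.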
Your Stats professor never told you what "unsampling" is. You're not going to find this word in a statistics textbook either.
What does it mean? The "un" implies that you can recover the population from a sample. That makes no sense at all. If you have a sample of 5,000 Americans, how do you "unsample" to the 300 million plus population?
Yet "unsampling" is a Big Data term. It appears that the Google Analytics people invented the word. In their software, they talk about "sampled reports" and "unsampled reports". "Unsampled" means "not sampled", or at least that's what the marketing folks want us to believe.
This would imply "unsampled" means the reports use all the data instead of some subset of the data.
Except that's not what the word means either.
I learned this the hard way. If you are a power user of Google Analytics, you might have seen this error message too:
This restriction to standard table reports is extremely severe. Almost all standard reports are univariate cuts of the data, which means that if you are analyzing correlations, forget it.
In addition, an "unsampled" report cannot use more than, say, 100 million rows of data. Even if I need a report of summary statistics that take up two rows of output, if I have more than 100 million rows of input (for the Premium version), they will not run it. I'm not sure how something that has an upper limit on size can be "not sampled".
I wish they would stop using that word. It's inaccurate and nonsensical.