Hi Holger,
Trying to minimize technical stuff: inferential statistics is based on the twin notions of population distributions and sampling distributions. Let's say the population is comprised of all new binoculars made by Leica, merging the two types used in the Allbinos tests. Each member of the population is either waterproof or not, i.e., it takes a binary value (1 or 0), but we don't know in what proportion. In order to find out, we devise a scheme of drawing a random sample of size N from the large population, with the idea of inferring whether or not the proportion of zeros, q, exceeds some criterion, e.g., 5%. If q > 5% the alarm goes off. Otherwise the percentage of good products, i.e., p = 1 - q, is considered acceptable and production continues.
Now let's consider how large a sample is needed to get information about the population proportions. At the lower limit, when N = 1, the only thing that can be inferred about the population, if a 0 is recorded, is that p < 100%. That's it. With a sample of N = 2, even with two zeros, not much more is learned. As N increases, however, the population proportions are increasingly expected to be mimicked by the sample proportions. So a very small sample tells us almost nothing, and a large sample can tell us nearly everything.
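To put rough numbers on how quickly the sample proportion settles down as N grows, here is a minimal sketch of the standard error of a sample proportion, sqrt(q(1-q)/N). The value q = 0.05 and the sample sizes are only illustrative assumptions, not figures from any test, and the function name is made up for this example.

```python
import math

def standard_error(q, n):
    """Standard error of the sample failure proportion for true proportion q and sample size n."""
    return math.sqrt(q * (1.0 - q) / n)

# Assumed true failure proportion, chosen only for illustration.
q_assumed = 0.05

for n in (1, 2, 10, 100, 1000):
    # The spread of the sample proportion shrinks like 1/sqrt(n).
    print(n, round(standard_error(q_assumed, n), 4))
```

With these assumed values the spread at N = 1 or N = 2 is several times larger than the 5% being estimated, while at N = 1000 it is well under 1%.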
For practical reasons it isn't possible to measure huge samples, so sampling distributions (http://en.wikipedia.org/wiki/Sampling_distribution) are used to draw probabilistic inferences. For binary events, the number of good units in a sample of size N follows a binomial distribution (http://en.wikipedia.org/wiki/Binomial_distribution), having a mean = Np and variance = Np(1-p); the sample proportion itself then has mean p and variance p(1-p)/N. The statistical question then becomes: for a sample of size N, can we conclude with a confidence level of, let's say, .95 that the population proportion of failures, q = 1 - p, exceeds our production criterion of .05?
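As a hedged illustration of the kind of calculation involved, the sketch below computes binomial probabilities directly from the formula. The function names and the sample size N = 50 are assumptions made up for this example; q denotes the failure proportion, set here to the 5% criterion.

```python
from math import comb

def binom_pmf(k, n, q):
    """Probability of exactly k failures in n independent draws, each failing with probability q."""
    return comb(n, k) * q**k * (1 - q)**(n - k)

def prob_at_least(k, n, q):
    """Probability of k or more failures in n draws (upper tail of the binomial)."""
    return sum(binom_pmf(i, n, q) for i in range(k, n + 1))

# Hypothetical inspection: 50 units drawn, population failure rate exactly at the 5% criterion.
n_sample, q_criterion = 50, 0.05

# Chance that the sample shows more than 5% failures (3 or more out of 50)
# even though the population sits right at the criterion.
print(prob_at_least(3, n_sample, q_criterion))
```

A control chart sets its alarm limit so that this kind of tail probability stays small when the process is actually within specification.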
This reasoning is the basis for constructing quality control charts (http://en.wikipedia.org/wiki/Control_chart), although it's too complicated to present here. Let me simply say that within this framework a sample of N = 2 reveals almost nothing.
Sorry to disagree with you, my friend. I liked your earlier conclusion that two failures should probably catch the manufacturer's attention, but that might just be so they can devise damage control in the marketing arena.
I hope I can still count on you for help with PDEs when the need arises.
Best regards,
Ed