Just to inform you: I've just had a phone call from Leica Sport Optics. The case will be continued.
Hello Arek,
I'm not saying this to be insulting, argumentative, or to restart a technical dispute with Holger, who is certainly free to defend you or express his own views. I would rather take this opportunity to point out that if Allbinos' findings are cast into a statistical framework, which I seriously recommend against, then one should be mindful of responsible product testing conventions, and of how to present impartial conclusions that are helpful to the potential buyer while not being unnecessarily damaging to the manufacturer.
A professional consultant might be in order, because this is a specialty area and statistical reasoning is not for the faint of heart. Fortunately, however, I've just discovered there are on-line tools available that might meet your general needs: PRODUCT TESTING STATISTICAL TOOLS. Even so, they should be well understood, since they can be misused.
In this case, you are dealing with single Bernoulli trials from each of two product lines, so they should be analyzed separately. A waterproof probability p = .95 is claimed by the manufacturer, which, if true, would give the buyer a 1 in 20 risk of buying a leaky product. One in twenty really isn't all that low when you think about it, but many people are impressed when it's stated as 5% rejects or, even better, .05. Such is advertising.
Plugging numbers into this tool, the standard deviation of a single Bernoulli trial with p = .95 is (p*(1-p))^.5 = .2179. I've selected an initial value for mu(1) of .90. What we are estimating is the sample size needed to conclude that the buyer's acceptable risk should really be downgraded to 1 in 10 rather than 1 in 20. The computation tells us that for adequate resolution of the two hypotheses a sample of N = 118 is needed, assuming statistical power of only .80. This means that only 80%, or 4 out of 5, of such conclusions need to be correct. If the buyer's acceptable risk were further downgraded to 3 in 10, then a sample of 5 would do the trick, and if the acceptable risk were downgraded to 4 in 10 only a single sample would be needed at the Alpha = .05 significance level (assuming all else was perfect). But, since a significance level of .05 is also questionable for an important decision, setting Alpha = .01 would increase the necessary sample sizes to 191, 8, and 4, respectively.
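For anyone who wants to check these sample sizes without the on-line tool, here is a rough sketch in Python. It assumes the usual one-sided, one-sample normal approximation, N = ((z_alpha + z_power) * sigma / |mu(0) - mu(1)|)^2 rounded up, which is my guess at what the calculator does, not something it documents. The function name sample_size and its parameters are made up for illustration. Under that assumption it reproduces 118 and 5 at Alpha = .05 and 191, 8, and 4 at Alpha = .01; for the 4 in 10 case at Alpha = .05 it returns 3 rather than 1, so the tool may handle that case differently.

```python
import math
from statistics import NormalDist


def sample_size(p0, p1, alpha=0.05, power=0.80):
    """Approximate N for a one-sided, one-sample test of p0 against p1.

    Assumption: normal approximation with the standard deviation taken
    under the claimed value p0, i.e. sqrt(p0 * (1 - p0)).
    """
    sigma = math.sqrt(p0 * (1 - p0))           # .2179 for p0 = .95
    z_alpha = NormalDist().inv_cdf(1 - alpha)  # one-sided critical value
    z_power = NormalDist().inv_cdf(power)      # quantile for the desired power
    n = ((z_alpha + z_power) * sigma / abs(p0 - p1)) ** 2
    return math.ceil(n)                        # round up to whole units


if __name__ == "__main__":
    for alpha in (0.05, 0.01):
        for p1 in (0.90, 0.70, 0.60):
            print(alpha, p1, sample_size(0.95, p1, alpha=alpha))
```

The standard deviation is held at the claimed p = .95 value for both hypotheses, which matches the single .2179 figure used above; a calculation that lets it change under the alternative would give somewhat different numbers.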
Now let's reverse the process and use the calculator to determine statistical power for a fixed sample size of N = 1. (Click the Calculate Power button at top.) For mu(1) = .9, the power is only .04. Hence, the experimenter's reliability in drawing this modest conclusion is only 4%. As you can see, however, inserting N = 118 increases the power to 80% as shown in the earlier calculation.
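The reverse calculation can be sketched the same way. Again this assumes a one-sided normal approximation with sigma fixed at the p = .95 value, and the function name approx_power is hypothetical. Under those assumptions N = 118 comes out at about .80, and N = 1 comes out below .10, the same order as the figure from the calculator though not identical, so the tool evidently uses a somewhat different method (possibly a two-sided test) for that step.

```python
import math
from statistics import NormalDist


def approx_power(n, p0=0.95, p1=0.90, alpha=0.05):
    """Approximate power of a one-sided, one-sample test with n units.

    Assumption: normal approximation, sigma = sqrt(p0 * (1 - p0)).
    """
    sigma = math.sqrt(p0 * (1 - p0))
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    # Distance between the two hypotheses in standard-error units.
    shift = abs(p0 - p1) * math.sqrt(n) / sigma
    return NormalDist().cdf(shift - z_alpha)


if __name__ == "__main__":
    for n in (1, 10, 118):
        print(n, round(approx_power(n), 3))
```

Whatever the exact method, the pattern is the same: power climbs slowly with N, and a single unit leaves it close to the Alpha level itself.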
It should be clear that experimenter reliability (statistical power) vs alternative consumer risk is the nature of statistical product evaluation. A single Bernoulli trial (i.e., N = 1) has almost no power, so your tests should be presented in a different framework.
Hope this helps your endeavor. I've got nothing against what you're doing, but I do think it's more meaningful to help the potential buyer make an informed decision than to wag a finger at the manufacturer. That's what impartial evaluation is all about.
Ed