No: If you want to test the assumption "less than 5% of the binoculars are faulty", then even a single draw yielding a faulty binocular suffices to reject that assumption at a 95% confidence level (because, under the assumption, the probability of drawing a faulty binocular would be below 5%).
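Holger's logic can be checked with a one-line binomial calculation (a sketch of my own, not part of his post): under the hypothesis that the faulty fraction p is at most 0.05, the chance of a single random draw being faulty is at most 0.05, so one faulty draw sits right at the 5% significance boundary.

```python
# Worst-case probability, under H0 "faulty fraction <= 0.05", of seeing
# a faulty unit on a single random draw -- i.e. the p-value of
# Holger's one-draw test, maximised over H0.
p_h0 = 0.05      # hypothesised maximum faulty fraction
n_draws = 1

# P(at least one faulty in n draws) = 1 - (1 - p)^n, largest at p = p_h0
p_value = 1 - (1 - p_h0) ** n_draws
print(p_value)   # approximately 0.05: borderline at the 5% level
```

Whether "exactly at the boundary" counts as rejection is one of the points of contention in the exchange that follows.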
Cheers,
Holger
The reviews are still not accessible for me.
Me, also.
I am in the habit of frequently clearing my cookies and cache.
I believe that when they made changes to the site, some errors were made.
It happens.
Holger,
If you want to test the assumption that less than 5% of the binoculars are faulty, a single draw is not sufficient for any meaningful statistical conclusion, in spite of your calculation. It's a well-known pitfall, based on a theoretical anomaly, that many instructors point out. Rather than trying to argue with you, however, I've attached a salient example below. Even so, there's no need to accept it. It's a matter of experimental design, and requires judgment.
Putting my design hat on, I would make the following technical criticisms. Your assessment was: (1) done on a post hoc basis, i.e., decided upon after the observations were made, (2) not done with an adequate sample size, as mentioned, and (3) not done using a random draw from the population of interest, i.e., clearly affected by prior stress tests. Impartial evaluations are designed before data collection to avoid selective use of data; adequate sample size is established a priori based on statistical power; and the samples must be drawn randomly from the population of interest without alteration. Applied statistics is a disciplined and rational process in which theory does not hold common sense hostage, and magic is appealed to only as a last resort.
I hasten to add, in this case, that the population being sampled from, and generalized to, has not been adequately characterized either, i.e., it's been oversimplified. Leica binoculars may be assumed to emerge from a production line that uses feedback from ongoing statistical sampling. The output, therefore, is a time series of items that are not necessarily identical due to feedback corrections. The flawed samples in the Allbinos test, had they not been corrupted by harsh treatment, may still have been from a localized batch and not representative of total production output. Multiple samples would need to be taken at different time points to form meaningful conclusions about that.
We may disagree, but I trust I've not been too disagreeable. :smoke:
Thanks,
Ed
Hi Ed,
OK, then let's come to the point:
1) We start with a hypothesis: Less than x% of binoculars of a certain brand are leaking (x being a small number well below 50%).
2) We start drawing and find, with a first draw, a binocular that leaks. The same with the second one, with the third one and so on. We continue to find leaking binoculars, no exception.
3) At which point are we allowed to interrupt the test procedure and to reject our hypothesis on 95% confidence basis? How would you calculate the number of times such an experiment has to be repeated before safely rejecting the hypothesis?
P.S.: To keep things reasonably simple, let's assume that the total set of binoculars to draw from is large (i.e. infinite), so we can assume that our drawing would not affect the distribution of the binoculars we are drawing from ...
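Under the infinite-population assumption in Holger's P.S., his question 3) has a closed-form answer (a sketch of mine, not from the thread): if the hypothesis says the leak fraction is at most x, then k consecutive leaking draws have probability at most x^k under that hypothesis, so one may stop as soon as x^k falls to the 5% level.

```python
import math

def draws_to_reject(x, alpha=0.05):
    """Smallest number k of consecutive leaking draws after which the
    hypothesis 'leak fraction <= x' can be rejected at level alpha,
    since P(k leakers in a row | hypothesis) <= x**k."""
    return math.ceil(math.log(alpha) / math.log(x))

print(draws_to_reject(0.05))   # 1 draw suffices when x = 5%
print(draws_to_reject(0.30))   # 3 draws needed when x = 30% (0.3**3 < 0.05 < 0.3**2)
```

So for small x a handful of consecutive failures is formally decisive; the dispute in this thread is whether such a calculation, done post hoc on stressed samples, means anything.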
Cheers,
Holger
Hi Holger,
Yes, there definitely are sequential sampling plans that allow for discontinued testing, but explaining the framework in which it's done isn't all that simple. Remember, please, I'm only the messenger.
I've only got limited time for the next few days, so let me briefly frame the issue and return to it later on. Broadly speaking, there are two risks to be taken into consideration. First, in deciding to reject the null hypothesis, e.g., p < .05, we accept a risk of being wrong of .05, at a 95% confidence level. This is called a Type I error. (In this context it's also called seller's risk.) However, if the hypothesis is not rejected, there is also a risk of wrongly accepting that > 95% of the products are good. This is called a Type II error. (In this context, buyer's risk.) A sufficient sample size N = k, therefore, is chosen to bracket the upper and lower limits for the population parameter, p, so as to balance these risks. Unfortunately, k is not usually a tiny number, which is a problem when the specimens are expensive items costing > $2000.
In the product testing arena, this risk model is typically incorporated into what's called an Operating Characteristic Curve.
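Ed's two risks and the OC curve can be sketched numerically. For an acceptance plan "test n units, accept the lot if at most c fail", the OC curve is the probability of acceptance as a function of the true defect rate p; the seller's risk is one minus that probability at an acceptable quality level, and the buyer's risk is its value at a rejectable quality level. (The plan and the AQL/RQL figures below are illustrative assumptions of mine, not numbers from the thread.)

```python
from math import comb

def p_accept(n, c, p):
    """OC curve: probability of accepting the lot when the true defect
    fraction is p, under the plan 'accept if at most c of n fail'
    (binomial CDF at c)."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(c + 1))

n, c = 10, 0               # test 10 binoculars, accept only if none leak
aql, rql = 0.01, 0.20      # illustrative acceptable / rejectable quality levels

sellers_risk = 1 - p_accept(n, c, aql)   # Type I: good lot rejected
buyers_risk = p_accept(n, c, rql)        # Type II: bad lot accepted
print(round(sellers_risk, 3), round(buyers_risk, 3))
```

Even with these made-up levels, the point Ed is making shows up: with only 10 units and a zero-acceptance rule, the buyer still accepts a 20%-defective lot more than 10% of the time.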
This should keep you busy for a day or two. Hope it helps.
Regards,
Ed
PS. If trials were allowed to run for N = 10 Leicas, and stopped for cost reasons after none leaked, p (the proportion defective) would only be bracketed in the 95% confidence range 0.00–0.267. So the buyer has a high risk even though he might be led intuitively to believe otherwise. If the test were terminated at N = 5, after only five non-leaks, the Type II risk would be much greater. This pitfall may soon become a source of confusion when 10 members innocently report the outcomes of voluntarily water-testing their bins (hopefully all of the same type).
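Ed's PS figures can be roughly reproduced. With 0 leaks observed in N trials, an exact upper confidence bound for p solves (1 - p)^N = alpha, giving p = 1 - alpha^(1/N); one-sided at N = 10 this is about 0.26, in the same ballpark as the 0.267 he quotes (the exact value depends on the interval convention used). A sketch:

```python
def upper_bound_zero_defects(n, alpha=0.05):
    """One-sided exact (Clopper-Pearson style) upper confidence bound on
    the defect fraction p after observing 0 defects in n trials:
    solves (1 - p)**n = alpha for p."""
    return 1 - alpha ** (1 / n)

print(round(upper_bound_zero_defects(10), 3))  # ~0.259: N = 10 still brackets p loosely
print(round(upper_bound_zero_defects(5), 3))   # ~0.451: N = 5 is far worse, as Ed notes
```

In other words, ten clean dunk tests still cannot rule out a defect rate of roughly one in four, which is the buyer's-risk point being made above.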
Haven't had a problem with the Allbinos site. There is a message about a change to their cookies/permissions so perhaps that is causing a problem for some people?
Interesting. Keep us posted, Oetzi. :t: I followed the link and read the thread but my German is...um..a little rusty I guess. They're only dunking them for 5 minutes? Well, of course their own binoculars are on the line, so...
Mark
... Your "counter-example" given above doesn't apply here: It was about the hypothesis whether or not a pair of dice is fair. The acceptance or rejection of this hypothesis would require a sampling of the entire distribution, not just the remote edge as in our case.
Cheers,
Holger
May now have been fixed.
The reviews appear to all be available again, May 12, 5pm Eastern Time.
It's unfortunate both Leicas failed the water test, and also unfortunate that Allbinos didn't send them back to Leica and ask for an explanation before pummeling the brand in public.
My two cents,
Mark
A week or so after returning I was asked by e-mail what kind of water was used in the waterproof test.
Well, seeing the Geovid picture, I also wondered about that. Where's the mud coming from?
Thanks, but enough now of these torture pics. Especially for you, some more images of the Ultravid HD.