Endurance test of binoculars

elkcub

Silicon Valley, California
United States
No: If you want to test the assumption "less than 5% of the binoculars are faulty", then even a single draw (yielding a faulty binocular) suffices to reject that assumption at the 95% confidence level (because the probability of drawing a faulty binocular would be below 5%).

Cheers,
Holger

Holger,

If you want to test the assumption that less than 5% of the binoculars are faulty, a single draw is not sufficient for any meaningful statistical conclusion, in spite of your calculation. It's a well-known pitfall, based on a theoretical anomaly, that many instructors point out. Rather than trying to argue with you, however, I've attached a salient example below. Even so, there's no need to accept it. It's a matter of experimental design, and requires judgment.

Putting my design hat on, I would make the following technical criticisms. Your assessment was: (1) done on a post hoc basis, i.e., decided upon after the observations were made, (2) not done with an adequate sample size, as mentioned, and (3) not done using a random draw from the population of interest, i.e., clearly affected by prior stress tests. Impartial evaluations are designed before data collection to avoid selective use of data; adequate sample size is established a priori based on statistical power; and the samples must be drawn randomly from the population of interest without alteration. Applied statistics is a disciplined and rational process where theory does not hold common sense hostage; and magic is only appealed to as a last resort. ;)

I hasten to add, in this case, that the population being sampled from, and generalized to, has not been adequately characterized either, i.e., it's been oversimplified. Leica binoculars may be assumed to emerge from a production line that uses feedback from ongoing statistical sampling. The output, therefore, is a time series of items that are not necessarily identical due to feedback corrections. The flawed samples in the Allbinos test, had they not been corrupted by harsh treatment, may still have been from a localized batch and not representative of total production output. Multiple samples would need to be taken at different time points to form meaningful conclusions about that.

We may disagree, but I trust I've not been too disagreeable. :smoke:

Thanks,
Ed
 

Attachments

  • Testing with p-values.jpg

Binastro

Well-known member
Dadra,
thank you very much for that link as I needed to check some specific points.

I was surprised that even the cheaper Nikon models seem to be up to specification.
Thanks again.
 

etudiant

Registered User
Supporter
Me, also.

I am in the habit of frequently clearing my cookies and cache.

I believe that when they made changes to the site, some errors were made.

It happens.

It may now have been fixed.
The reviews appear to all be available again, May 12, 5 pm Eastern Time.
 

Holger Merlitz

Well-known member

Hi Ed,

OK, then let's come to the point:

1) We start with a hypothesis: Less than x% of binoculars of a certain brand are leaking (x being a small number well below 50%).

2) We start drawing and find, with a first draw, a binocular that leaks. The same with the second one, with the third one and so on. We continue to find leaking binoculars, no exception.

3) At which point are we allowed to interrupt the test procedure and to reject our hypothesis on 95% confidence basis? How would you calculate the number of times such an experiment has to be repeated before safely rejecting the hypothesis?

P.S.: To keep things reasonably simple, let's assume that the total set of binoculars to draw from is large (i.e. infinite), so we can assume that our drawing would not affect the distribution of the binoculars we are drawing from ...

Cheers,
Holger
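Holger's stopping rule can be put in numbers. Here is a minimal sketch (my own illustration, not from the thread), assuming the hypothesis places an upper bound x on the faulty fraction: n consecutive faulty draws then have probability below x**n, so one may reject as soon as that bound reaches the 5% level.

```python
# Sketch: smallest number of consecutive faulty draws needed to reject
# the hypothesis "the faulty fraction is below x" at the 95% level.
# Under that hypothesis, n faulty draws in a row have probability < x**n,
# so we may reject as soon as x**n <= 0.05.

def draws_needed_to_reject(x, alpha=0.05):
    """Smallest n of consecutive faulty draws with x**n <= alpha."""
    n = 1
    while x ** n > alpha:
        n += 1
    return n

for x in (0.05, 0.10, 0.25, 0.40):
    print(f"x = {x:.2f}: reject after {draws_needed_to_reject(x)} faulty draw(s)")
```

For x = 0.05 a single faulty draw already suffices, matching the claim in the post; for larger hypothesized bounds, more consecutive failures are needed.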
 

elkcub

Silicon Valley, California
United States


Hi Holger,

Yes, there definitely are sequential sampling plans that allow for discontinued testing, but explaining the framework in which it's done isn't all that simple. Remember, please, I'm only the messenger. ;)

I've only got limited time for the next few days, so let me briefly frame the issue and return to it later on. Broadly speaking, there are two risks to be taken into consideration. First, in deciding to reject the null hypothesis, e.g., p < .05, we accept a .05 risk of being wrong, at a 95% confidence level. This is called a Type I error. (In this context it's also called seller's risk.) However, if the hypothesis is not rejected, there is also a risk of wrongly accepting that > 95% of the products are good. This is called a Type II error. (In this context, buyer's risk.) A sufficient sample size N = k, therefore, is chosen to bracket the upper and lower limits for the population parameter, p, so as to balance these risks. Unfortunately, k is usually not a small number, which is a problem when each specimen costs > $2000.

In the product testing arena, this risk model is typically incorporated into what's called an Operating Characteristic Curve.

This should keep you busy for a day or two. Hope it helps.

Regards,
Ed

PS. If trials were allowed to run for N = 10 Leicas, and stopped for cost reasons after none leaked, p (the proportion defective) would only be bracketed in the 95% confidence range 0.00 to .267. So the buyer has a high risk even though he might be led intuitively to believe otherwise. If the test were terminated at N = 5, after only five non-leaks, the Type II risk would be much greater. This pitfall may soon become a source of confusion when 10 members innocently report the outcomes of voluntarily water-testing their bins (hopefully all of the same type).
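Ed's acceptance-sampling framing can be sketched numerically (an illustrative sketch with my own example numbers, not Ed's calculation). For the simple plan "accept the batch if none of N sampled binoculars leaks", the operating characteristic curve is P(accept | p) = (1 - p)**N, where p is the true proportion of leakers.

```python
# Sketch of an operating characteristic (OC) curve for the plan
# "accept the batch if none of N sampled binoculars leaks".
# P(accept | p) = (1 - p)**N, where p is the true proportion of leakers.

def prob_accept(p, n):
    """Probability that all n sampled units pass when a fraction p leaks."""
    return (1.0 - p) ** n

for n in (5, 10):
    for p in (0.05, 0.10, 0.25):
        print(f"N = {n:2d}, true p = {p:.2f}: P(accept) = {prob_accept(p, n):.3f}")
```

Even with a true leak rate of 25%, an N = 5 test with no observed leaks still accepts the batch about 24% of the time: this is the buyer's (Type II) risk Ed describes, and it shrinks only as N grows.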
 

Holger Merlitz

Well-known member

This appears quite clear to me: Given our situation (our hypothesis assumes a low fraction of bad binoculars) it would require a large sample to accept the hypothesis, since you have to test quite a few good samples before you can safely say "95% are good on 95% confidence level".

Yet, our question was different: Is it possible to REJECT that hypothesis with very few samples, IF these samples turn out to be bad? Since that was the fact: Arek tested 2 Leicas, 2 were 'bad', none was 'good', and I suggest that suffices to reject our hypothesis (at the 95% confidence level).

Is there a significant Type I error? No! Proof: If in fact the fraction of 'bad' items is less than 0.05, then such a result (2 'bad' out of 2 trials) had a probability of at most 0.05*0.05 = 0.25%. So if we made some clones of Arek, and all of them repeated that test, then - given the hypothesis were right - at most 0.25% of these clones would have the same test result. Thus, the Type I error which we make when rejecting the hypothesis after Arek's test is far below 0.05. In fact, even a single 'bad' draw out of 1 draw would have been sufficient here.

In turn, if he had drawn 2 'good' binoculars, it would not have been sufficient to accept the hypothesis, since the sample size was far too small. The rejection (upon drawing 'bad' items) and the acceptance (upon drawing 'good' items) behave highly asymmetrically here, in terms of sample sizes, because our hypothesis places the 'bad' items into the remote corner of the distribution - hitting this corner repeatedly should allow us to reject our hypothesis very early. If the theory of statistical testing did not consider these (very logical) facts, then I would tend to conclude something may be wrong with statistics ;-)

P.S. Your "counter-example" given above doesn't apply here: It was about the hypothesis whether or not a pair of dice is fair. The acceptance or rejection of this hypothesis would require a sampling of the entire distribution, not just the remote edge as in our case.

Cheers,
Holger
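Holger's Type I calculation can be checked with a short binomial-tail sketch (my own, assuming the boundary value p0 = 0.05 of the hypothesis, where the tail probability is largest).

```python
# Sketch: p-value for observing k or more 'bad' units in n draws,
# evaluated at the hypothesis boundary p0 = 0.05 (the tail probability
# is maximal there, so it bounds the Type I error for any p < p0).

from math import comb

def upper_tail(k, n, p0=0.05):
    """P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1 - p0)**(n - i) for i in range(k, n + 1))

print(upper_tail(2, 2))  # Arek's result: 2 'bad' out of 2 draws
print(upper_tail(1, 1))  # a single 'bad' draw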
 

NDhunter

Experienced observer
United States
Haven't had a problem with the Allbinos site. There is a message about a change to their cookies/permissions so perhaps that is causing a problem for some people?

That is correct, you have to click on the cookies thing, and then you can read
all of the reviews.

Most sites just go ahead and place cookies. I like mine fresh from the oven. 8-P

Jerry
 

NDhunter

Experienced observer
United States
Interesting. Keep us posted, Oetzi. :t: I followed the link and read the thread but my German is...um..a little rusty I guess. They're only dunking them for 5 minutes? Well, of course their own binoculars are on the line, so...

Mark

Mark:

I just reread the Endurance Test, and when referring to the Leica, it seems that when Arek found that both Leicas leaked water, it was a large disappointment. And for many expecting high quality out of the "alphas", this would be seen as a large problem.

I do think that lent a negative tone to the review, but in any case, he only reported what was found. I do hope some others will test their Leicas, and will report.

I trust the reviewer in the results of the tests.

Jerry
 

elkcub

Silicon Valley, California
United States
... Your "counter-example" given above doesn't apply here: It was about the hypothesis whether or not a pair of dice is fair. The acceptance or rejection of this hypothesis would require a sampling of the entire distribution, not just the remote edge as in our case.

Cheers,
Holger

No, it was a perfect example of how a correct calculation on a single sample can lead to a ridiculous conclusion. You've been supporting another one ... :flyaway:
 

Arek

Well-known member
It's unfortunate both Leicas failed the water test, and also unfortunate that Allbinos didn't send them back to Leica and ask for an explanation before pummeling the brand in public.
My two cents,
Mark

Hi!

The official distributor of Leica was informed about the poor results of both Leicas. They asked for a quick return of the binoculars, to send them back to headquarters for inspection. A week or so after returning them, I was asked by e-mail what kind of water was used in the waterproof test. I answered that it was normal water from the tap. That was the last message I got from Leica. I waited a couple of months before publishing the whole endurance test - that was enough time for an explanation and a reaction. I got nothing. In an earlier phone call with the distributor, I said that I would be happy to publish the official statement of Leica once the full analysis of the damaged Leica binoculars was finished.

Arek
 

dalat

...
A week or so after returning I was asked by e-mail what kind of water was used in waterproof test.

Well, seeing the Geovid picture, I also wondered about that. Where's the mud coming from?

And btw, the Ultravid picture does not really show clearly that water is inside the binocular, I only can see some drops and fogging that could also be on the outside. I don't want to doubt that water was inside, just saying that the pics don't illustrate it well.
 

Arek

Well-known member
Especially for you, some more images of the Ultravid HD. Just after taking it out of the bath, there were a lot of tiny droplets on the internal lenses. After a day or two they merged into bigger ones.
 

Attachments

  • ult1.JPG
  • ult2.JPG
  • ult3.JPG
