Curtis Croulet said:
Precisely. I think it's rather unsporting to expect the article to be more, and its not being more is due to neither bias nor incompetence. It is what it is: an evaluation based upon simple "like"/"don't like" opinions, which is how 98% of birders evaluate binoculars. Most people (not the denizens of this forum, but most) will pick up a bin and say, "Wow, that's sharp!" or "Hmm, what's so special about this?" or "This is too heavy!", and -- for them -- one number to express optical performance (for example) is enough. I'm reminded of the test reports of audio and photo products in Consumer Reports where, say, "sound quality" or "optical quality" is boiled down to one colored button. To an audio or photo enthusiast, it's laughable, but for most people it's more than enough.
John/Curtis,
If you like this article, and the methods used in it to produce rankings (i.e., rating statistics), I certainly honor your defense of it. Personally, I would have hoped that a full professor at a major Ornithology laboratory would be more professional in evaluating the tools of the trade for the average birder. Whether or not he realizes it, statistical analysis was indeed done with the data, and I don't think it's "unsporting" to point out that it had serious failings — just as we have no difficulty discussing the failings of a binocular or telescope.
Oh, I recognize a challenge when I see one, guys. Alas, I have no access to a large number of new binoculars provided by manufacturers, or students/staff to make evaluations. However, for the next "review" I'd be more than pleased to provide free consulting services for both the design and analysis. How's that? :-)
Since some folks apparently don't see the perils of composite ranking scores (i.e., overall "goodness" scores) I'm somewhat reluctant to mention that a few days ago I transcribed the "Top Gun" statistics, decomposed the Quality Index (QI) scores, and then did a multiple regression analysis. No doubt many will roll their eyes at this boring stuff, but IMO it is relevant.
The QI scores are a linear combination of subjective and objective factors. Dr. Rosenberg constructed the weighting function out of whole cloth (remember, he weighted image quality by 2x?). Well, it's not evident at first, but he also gave zero weight to several other objective factors that were important enough to be included in the table and that most people would take into consideration when evaluating binoculars. These are magnification (power), objective size, weight, and price.
The pure subjective index (SI) part of the QI can be obtained by subtracting out the objective ranking scores for FOV and close-focus, leaving SI = 2x(image quality) + overall feel + eyeglass friendliness. The regression analysis addressed the simple question:
"To what extent can SI be predicted from the physical properties of the binoculars, without knowing brand name or model?" The six predictors were FOV, close focus, magnification, objective size, weight, and price.
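For concreteness, the QI-to-SI subtraction can be sketched with made-up numbers. Every value and variable name below is invented purely for illustration; none of it comes from the Cornell data:

```python
# Hypothetical component scores for one binocular (all values invented).
image_quality = 4.5       # subjective, weighted 2x in the QI
overall_feel = 4.0        # subjective
eyeglass_friendly = 3.5   # subjective
fov_score = 4.2           # objective, from published field of view
close_focus_score = 3.8   # objective, from close-focus distance

# Composite Quality Index as I read its construction:
qi = 2 * image_quality + overall_feel + eyeglass_friendly \
     + fov_score + close_focus_score

# Subtracting the objective components leaves the pure subjective index:
si = qi - fov_score - close_focus_score

# Sanity check: SI = 2x(image quality) + overall feel + eyeglass friendliness
assert abs(si - (2 * image_quality + overall_feel + eyeglass_friendly)) < 1e-9
print(f"QI = {qi}, SI = {si}")
```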
Okay, subject to sampling error and several "assumptions," the multiple linear regression R = .831 suggests* that one is able to predict R^2 = 69% of the variability in the subjective scores, which include the all-critical "image quality." This prediction is made without knowing the brand of the binoculars or anything physically about their image quality. The computed weights of the function:
SI = .11(Power) - .41(Objective Size) + .52(FOV) + .36(Weight) - .26(Close Focus) + .49(Price)
suggest that for these (averaged) raters, FOV and Price are positively related to the SI rating, while objective size and close focus are negatively related. Larger objectives and longer near-focus decrease the SI, while power (magnification) has the least predictive influence of all. What's the point? The Quality Index is largely determined by physical design factors, and may have very little residual relationship to "quality" as it might commonly be understood. As I see it, the metric is basically tautological.
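For anyone curious about the mechanics, here is a minimal sketch of that kind of standardized multiple regression using numpy. The data are randomly generated stand-ins, not the Top Gun table, so the fitted weights and R will not match the numbers above; the ranges and the built-in dependence of the "subjective" score on FOV, price, and close focus are my inventions, chosen only to mimic the tautology I'm describing:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20  # twenty "binoculars," as in the Top Gun sample

# Invented physical properties (uniform ranges chosen for illustration only).
power = rng.uniform(7, 10, n)          # magnification
objective = rng.uniform(30, 50, n)     # objective diameter, mm
fov = rng.uniform(300, 430, n)         # field of view, ft @ 1000 yd
weight = rng.uniform(20, 35, n)        # oz
close_focus = rng.uniform(4, 15, n)    # ft
price = rng.uniform(100, 1500, n)      # USD

X = np.column_stack([power, objective, fov, weight, close_focus, price])

# A made-up "subjective index" that partly depends on the specs, plus noise.
z = lambda v: (v - v.mean()) / v.std()
y = 0.5 * z(fov) + 0.5 * z(price) - 0.3 * z(close_focus) + rng.normal(0, 0.5, n)

# Standardize everything so the fitted weights are comparable across predictors.
Xz = (X - X.mean(axis=0)) / X.std(axis=0)
yz = z(y)

# Ordinary least squares; the leading column of ones carries the intercept.
A = np.column_stack([np.ones(n), Xz])
beta, *_ = np.linalg.lstsq(A, yz, rcond=None)

r2 = 1 - np.sum((yz - A @ beta) ** 2) / np.sum((yz - yz.mean()) ** 2)
print("standardized weights:", np.round(beta[1:], 2))
print(f"R = {np.sqrt(r2):.3f}, R^2 = {100 * r2:.0f}%")
```

With real data one would of course use the published specs and the actual SI scores in place of the random stand-ins.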
It's really not worth the effort to try to unscramble Dr. Rosenberg's egg any further, but I will add this comment. If all that was intended was a composite ranking of 20 "Top Gun" binoculars, what in the world would have been the problem with simply asking the 40 evaluators to independently place them in rank order (1-20) and then analyzing the ranks for each binocular? For each binocular one could compute the mean and standard deviation of the ranks (better yet, the median and interquartile range). For example, if everyone ranked a particular binocular as #1, its average ranking would be 1 and its standard deviation 0. Several simple non-parametric methods could also be applied to determine binocular clusters, or even how the ranking results differed between experts and novices.
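That rank-based alternative is easy to sketch, again with invented data (40 raters, 20 binoculars). As one example of the "simple non-parametric methods," I compute Kendall's coefficient of concordance W by hand so the sketch needs only numpy; scipy.stats also offers Friedman's test for the same question:

```python
import numpy as np

rng = np.random.default_rng(1)
n_raters, n_bins = 40, 20

# Invented data: each rater independently ranks all 20 binoculars 1-20.
# (Random permutations here, so agreement among raters will be near zero.)
ranks = np.array([rng.permutation(n_bins) + 1 for _ in range(n_raters)])

mean_rank = ranks.mean(axis=0)
sd_rank = ranks.std(axis=0, ddof=1)
median_rank = np.median(ranks, axis=0)
iqr = np.percentile(ranks, 75, axis=0) - np.percentile(ranks, 25, axis=0)

# If every rater agreed a binocular was #1, its mean rank would be 1 and SD 0.
best = np.argmin(median_rank)
print(f"binocular {best}: median rank {median_rank[best]}, IQR {iqr[best]}")

# Kendall's W (0 = no agreement among raters, 1 = perfect agreement).
col_sums = ranks.sum(axis=0)
S = np.sum((col_sums - col_sums.mean()) ** 2)
W = 12 * S / (n_raters ** 2 * (n_bins ** 3 - n_bins))
print(f"Kendall's W = {W:.3f}")
```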
Okay, if you still believe the Cornell method has great meaning/merit, you're certainly welcome to that opinion. My purpose has not been to discredit Dr. Rosenberg, though that's probably what it sounds like, so this will be my swan song on the subject (a mute swan song).
Elkcub
* The analysis is based on a very small sample, and the data are combined across an unknown and varying number of reviewers; hence, the results are suggestive at best.