I think there might be a future for binoculars with EVFs (Electronic ViewFinders) as opposed to looking straight through glass. Reasons:
1) EVFs are improving quickly. Epson recently announced a 4.41M-dot EVF panel (probably the one used in the Leica SL). As EVF resolution improves, OVFs (Optical ViewFinders) may no longer have superior optics.
http://www.dpreview.com/articles/7963916751/epson-mass-produces-4-41m-dot-lcd
2) EVFs allow information overlays (very useful if image capture is part of the plan).
3) EVFs can offer a 'live view' that can greatly enhance the image. For example, depending on the sensor used, an EVF can produce a relatively bright image in relatively dark conditions (much more so than an OVF could provide).
4) EVFs can be stabilized much more easily than OVFs.
5) EVFs can be digitally collimated. This is somewhat of a reach at the moment, but as processor speed improves, images from two different objectives can be combined and aligned even when the objectives are out of collimation. This could open the door to some interesting 3D views.
6) There is the possibility of hybrid binoculars with an EVF on one side and an OVF on the other, which may avoid the cost of having two EVFs, two objective sensors, etc.
7) Image capture technology will become much better. Yes, the binocular cameras that have been available so far have been gimmicky, with poor image quality. However, with a high-performance processor and high-performance sensors, quality image capture becomes more of a reality. As an example, several modern cameras can capture 8mp stills from 4K video (a 4K frame is 3840x2160, about 8.3 million pixels - MUCH better quality than, say, taking a screenshot of a video on your computer). Panasonic is working to develop sensors that would allow 33mp stills to be extracted from 8K video (7680x4320) by 2018.
8) We all know that larger sensors can gather/capture more light - simple physics. Larger sensors also require larger glass and faster processors. On the flip side, smaller sensors can use smaller/lighter glass, but gather less light. However, as processor speeds increase, you can leverage that processing power to extract more light from the smaller sensor's image. As an example, suppose you can stack 60 frames captured over a one-second period and add all of their light together - you get a very bright image. Of course, no one wants a one-second lag in their viewing, but those speeds will increase along with processor evolution, and you may only need a few frames, which might only require hundredths, or thousandths, of a second. Twenty years from now, it will be essentially real-time.
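To make point 5 a little more concrete, here is a minimal NumPy sketch of one way digital collimation could work, using phase correlation to find and undo the offset between two frames. Everything here is hypothetical - the "images" are random noise and the misalignment is a pure pixel translation, which real out-of-collimation optics would only approximate:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical frames from two objectives that are slightly out of
# collimation: the right image is the left image shifted by 3 rows, 5 cols.
left = rng.random((64, 64))
right = np.roll(left, shift=(3, 5), axis=(0, 1))

# Phase correlation: normalize the cross-power spectrum; the peak of its
# inverse FFT reveals the translation between the two images.
cross = np.fft.fft2(left) * np.conj(np.fft.fft2(right))
corr = np.fft.ifft2(cross / np.abs(cross)).real
peak = np.unravel_index(np.argmax(corr), corr.shape)

# The peak sits at (-shift) mod N, so undo the wrap-around to get the offset.
dy = (-peak[0]) % left.shape[0]
dx = (-peak[1]) % left.shape[1]

# Digitally re-align the right image to match the left.
aligned = np.roll(right, shift=(-dy, -dx), axis=(0, 1))
```

A real implementation would also handle rotation, sub-pixel shifts and parallax, but the FFT-based core is cheap enough that fast processors could plausibly run it per-frame.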
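The frame-stacking idea in point 8 can be sketched in a few lines of NumPy. This is a toy model, not a real sensor pipeline: the "scene" is a uniform faint signal and the noise is Gaussian. The point it demonstrates is that summing N frames grows the signal N-fold while random noise grows only about sqrt(N)-fold, so the stacked result is brighter relative to its noise:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy model: a faint, uniform scene plus independent per-frame sensor noise.
scene = np.full((8, 8), 10.0)  # true brightness of every pixel
frames = [scene + rng.normal(0.0, 5.0, scene.shape) for _ in range(60)]

# Stack 60 frames: signal adds 60x, random noise only ~sqrt(60)x (~7.7x).
stacked = np.sum(frames, axis=0)

# Dividing by the frame count gives a much cleaner estimate of the scene
# than any single frame.
denoised = stacked / len(frames)
single_frame_error = float(np.abs(frames[0] - scene).mean())
stacked_error = float(np.abs(denoised - scene).mean())
```

The catch, as noted above, is motion: over a one-second burst the view changes, so a practical version would have to align the frames (as in the collimation sketch) before summing them.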
All of this technology is being developed in the optics industry, primarily for cameras, including surveillance cameras. If we are near the zenith for optical glass, the next logical step forward is electronically-enhanced glass. It is only a matter of time before the technology makes the leap from cameras to binoculars...