[So binned quad photosites will be twice as bad as a single bigger site, but only if the noise level per photosite (in typical electron number) is the same]
BJL,
Not sure that this is right. There's some interesting information on binning at
http://www.roperscientific.com/library_enc_binning.shtml
As I understand it, there are three broad sources of noise.
1. Photon (shot) noise, which Samikharusi referred to, caused by fluctuations in the photon arrival rate at a given point - if only these particles would behave themselves!!
2. Dark current noise, mainly thermally generated stray electrons, which can be reduced by cooling.
3. 'Read' noise, induced mainly by the on-chip preamplifier as the electron charge is read out.
Binning reduces the read noise but not the other two types of noise, and it's easy to see why. Binning four photosites means one 'read' as opposed to four. Since independent read noises add in quadrature, four separate reads carry twice the read noise of a single read (four times the read-noise variance), so the single binned read halves the read noise. Binning 16 photosites requires only one reading instead of 16, for a quarter of the read noise.
So the question arises, when is read noise a significant proportion of the total noise? Answer: in low light conditions.
For example, let's say only 9 photons are impinging upon a photosite. Photon noise, according to the Poissonian square root law, is 3 photons. Dark noise is perhaps 1 electron. Read noise is perhaps 5 electrons. Since independent noise sources add in quadrature, the total noise is sqrt(3^2 + 1^2 + 5^2), about 5.9 electrons against a 9-electron signal - an S/N of barely 1.5. The signal is all but lost in the noise. Now let's try 16x binning. The combined signal is 16x9 = 144 photons. Photon noise on the combined charge is simply the square root of the total, sqrt(144) = 12 (the sum of independent Poisson counts is itself Poisson). Dark noise from the 16 sites adds in quadrature to sqrt(16)x1 = 4 electrons, but the read noise is incurred only once and is still 5. Total noise is sqrt(12^2 + 4^2 + 5^2), about 13.6 electrons against a signal of 144, for an S/N of roughly 10.6. Now that's a worthwhile improvement.
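To make that arithmetic easy to check, here's a minimal Python sketch of the noise model. The photon count, dark noise and read noise figures are just the illustrative round numbers from above, not measured values for any real sensor.

```python
import math

def snr(signal_e, shot_e, dark_e, read_e):
    """S/N with independent noise sources added in quadrature."""
    total_noise = math.sqrt(shot_e**2 + dark_e**2 + read_e**2)
    return signal_e / total_noise

# Illustrative numbers from the example above (not real sensor data).
photons_per_site = 9
dark_per_site = 1.0   # electrons rms per photosite
read_noise = 5.0      # electrons rms per read
n_binned = 16

# Single photosite: shot noise is sqrt(signal).
single = snr(photons_per_site, math.sqrt(photons_per_site),
             dark_per_site, read_noise)

# 16x binning: charge is summed before the single read, so shot noise
# is sqrt(total signal), dark noise adds in quadrature across the 16
# sites, and the read-noise penalty is paid only once.
signal = n_binned * photons_per_site
binned = snr(signal, math.sqrt(signal),
             math.sqrt(n_binned) * dark_per_site, read_noise)

print(f"single photosite S/N: {single:.1f}")  # ~1.5
print(f"16x binned S/N:       {binned:.1f}")  # ~10.6
```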
However, it seems to me that for binning to be really useful it has to be selective. There's no advantage in reducing the overall resolution of the image in order to achieve more detail in the shadows. We need a chip which is able to bin almost instantaneously, and on demand, only those areas of the sensor which correlate with the shadow areas of the image. I feel as though I'm in Star Trek territory here. I've got no idea if this is possible or whether something similar is already being done.
On the issue of S/N and DR of the smaller pixel, I think you're right. A photosite can hold only a certain number of electrons (a certain charge) depending on its size. A 10 litre bucket cannot hold more than 10 litres, and it may not even be advisable to fill it to the top. But there has to be some way around this. Continuing with the analogy of a water bucket for each photosite, the problem as I see it is that many of the buckets, for the average image, are going to be less than half full. Some are going to be virtually empty and some are going to be overflowing or close to it. What a waste!
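For a rough sense of what the bucket size costs, here's a sketch of the usual engineering definition of dynamic range: full-well capacity divided by the read-noise floor. The 40,000-electron full well and 5-electron read noise are made-up round numbers, purely for illustration.

```python
import math

# Hypothetical round numbers, purely for illustration.
full_well = 40_000   # electrons the 'bucket' holds before clipping
read_noise = 5.0     # electrons rms, the noise floor

dr_ratio = full_well / read_noise
dr_stops = math.log2(dr_ratio)

print(f"dynamic range: {dr_ratio:.0f}:1, about {dr_stops:.1f} stops")
# -> dynamic range: 8000:1, about 13.0 stops
```

Halve the bucket (the smaller pixel) and, with the same read noise, you lose a whole stop at the top - which is exactly the trade-off in question.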
I imagine there's an optimum fill level for the bucket which achieves maximum 'linear' signal-to-noise - say 75% full. Now, as Moore's law continues to operate (and remember that a huge advantage of the CMOS chip is that it uses normal computer fabrication processes, which allows all sorts of add-on processing features to be included on the sensor), I envisage that it might eventually be possible to give each photosite a variable sensitivity that changes automatically (and almost instantaneously) according to the intensity of light that falls upon it, so that each bucket is at least reasonably full. The 'real' information about the image would then consist of a fairly narrow range of variability in the fill level of the buckets, plus very specific information recording the changes in the sensitivity of individual photosites. In the process of decoding this information and downloading the image, the compressed levels would be restored very precisely at the pixel level from the recorded data for each pixel's individual sensitivity. A rough sketch of what I mean follows.
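Here's a toy simulation of that scheme, to show the bookkeeping rather than any real sensor design: each photosite picks a gain (from a hypothetical small set) so its bucket lands near the 75% target, and the decoder divides the gain back out. The full-well figure, gain set and scene values are all invented for illustration.

```python
# Toy model of per-photosite variable sensitivity (pure speculation,
# not any real sensor's design).
FULL_WELL = 1000        # bucket capacity in electrons (invented)
TARGET_FILL = 0.75      # aim to fill each bucket ~75%
GAINS = [1, 4, 16, 64]  # hypothetical selectable sensitivities

def encode(scene_intensities):
    """Pick the highest gain that keeps each bucket near the target."""
    recorded = []
    for light in scene_intensities:
        gain = GAINS[0]
        for g in GAINS:
            if light * g <= TARGET_FILL * FULL_WELL:
                gain = g
        fill = min(light * gain, FULL_WELL)  # clip at saturation
        recorded.append((fill, gain))        # store fill level + gain
    return recorded

def decode(recorded):
    """Restore the original levels by dividing the gain back out."""
    return [fill / gain for fill, gain in recorded]

scene = [3, 40, 180, 700]  # photons: deep shadow to bright highlight
coded = encode(scene)
print(coded)          # [(192, 64), (640, 16), (720, 4), (700, 1)]
print(decode(coded))  # [3.0, 40.0, 180.0, 700.0]
```

Of course, the shadow pixels would still be shot-noise limited - amplifying 3 photons doesn't create information - but the recorded values would at least sit well clear of the read-noise floor.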
I imagine that for such a system to be practicable, there would have to be a pre-exposure along the lines of the red-eye-reduction pre-flash. The major flaw in such a system, as I see it, is the possibility of movement between the pre-exposure and the actual exposure. If one keeps the pre-exposure very short, say 1/1000th sec, it will only provide details about the highlight situation. To get pre-warning about the shadow situation would require a much longer pre-exposure, allowing for the possibility of blurring and smudging as a result of some pixels having an inappropriate sensitivity. On the other hand, we all accept, don't we, that if you want the best results a tripod is often required.
Now, where do I apply for my Nobel prize? (Only kidding!!)
ps. Not sure why, but the Fuji concept of the smaller, less sensitive pixel attached to the larger pixel doesn't appeal to me. It seems clever but not very elegant. What happens to the overspill from the main pixel? How is it contained? Is the main pixel switched off as it reaches saturation, and if not, what about blooming? And how is the resolution of the lens compatible with this smaller pixel which, if we're talking about P&S cameras, is likely to be very, very small?