Ah, ok. Therefore with that assumption 1 ideal Foveon 'pixel' is the same size as 1 Bayer 'sensel' (as opposed to the three I had imagined) in current DSLRs. In that case the result of our simplified thought experiment would indeed be what you suggest, raw values (128,128,128) vs (128,0,0) respectively, and with a Bayer pattern in a uniform patch you could indeed take a guess at the missing information.
In this case too, however, the two thirds of the information thrown away would be apparent. Let's say, for instance, that the pixel under examination was receiving light from an isolated star that produced a circle of confusion on the sensor with a diameter equal to the pixel pitch: for a given exposure, 384 photons would be recorded by the ideal sensor versus 128 by the ideal Bayer sensel, i.e. 1/3 the SNR for the Bayer.
Yes, there is a difference between the 384 photons recorded and the 128 photons recorded for a single isolated spike signal (that's why, in addition to lens blur, an OLPF makes sense). However, the photon shot noise is equal to the square root of the number of photons, so the SNRs would be 384/sqrt(384) = 19.6 versus 128/sqrt(128) = 11.3, a ratio of sqrt(3) rather than 3. That doesn't account for the fact that the per-channel noise adds in quadrature, or that interpolated channels (from a properly low-pass filtered image source) will probably have a lower-than-actual spatial noise frequency (interpolation usually causes some loss of modulation due to the weighted averaging); both effects should be included in the total equation.
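For reference, the arithmetic behind those SNR figures (the photon counts are the ones from the thought experiment above):

```python
import math

# Shot noise is Poisson, so SNR = N / sqrt(N) = sqrt(N).
full = 384    # photons collected by the ideal 3-layer pixel
bayer = 128   # photons collected by the single-channel Bayer sensel

snr_full = full / math.sqrt(full)
snr_bayer = bayer / math.sqrt(bayer)
print(round(snr_full, 1), round(snr_bayer, 1))   # 19.6 11.3

# The SNR ratio is sqrt(3) ~ 1.73, not the factor of 3 suggested
# by the raw photon counts alone.
print(round(snr_full / snr_bayer, 2))            # 1.73
```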
For a better simulation of the S/N ratios due to 1-channel versus 3-channel sampling, one could add Poisson noise to a test image and measure the differences before and after demosaicing. In fact, here is the result for a specific demosaicing algorithm (VNG), with only Poisson (shot) noise added (no read noise):
A patch of uniform gray level 128 with Poisson noise added: [R,G,B] standard deviation = [11.230, 11.355, 11.314]
Here is the ideal CFA version of that patch, with zero contribution from two of the three channels at each pixel:
And here is the result after VNG demosaicing: [R,G,B] standard deviation = [10.301, 9.485, 10.274]
As you can see, the noise was blurred by averaging and by undersampling, and the overall noise was not increased but slightly reduced. A simpler demosaicing algorithm, e.g. bilinear, would have blurred even more, but would also have lost more real detail had any been present.
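The experiment is easy to reproduce. A sketch with numpy, using a simple normalized-convolution interpolation with a bilinear kernel as a stand-in for VNG (it blurs the noise somewhat more, but shows the same trend: demosaicing does not increase the per-channel noise):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 256

def blur(a):
    # 3x3 bilinear kernel [[1,2,1],[2,4,2],[1,2,1]]/4, wrap-around edges
    out = np.zeros_like(a)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            w = (2 - abs(dy)) * (2 - abs(dx))
            out += w * np.roll(np.roll(a, dy, axis=0), dx, axis=1)
    return out / 4.0

# Uniform gray patch, mean level 128, shot (Poisson) noise only.
patch = rng.poisson(128.0, size=(N, N, 3)).astype(float)
print("full RGB std:", patch.std(axis=(0, 1)).round(2))  # ~ sqrt(128) ~ 11.3

# Sample it through an RGGB Bayer CFA: one channel survives per site.
mask = np.zeros((N, N, 3))
mask[0::2, 0::2, 0] = 1   # R
mask[0::2, 1::2, 1] = 1   # G
mask[1::2, 0::2, 1] = 1   # G
mask[1::2, 1::2, 2] = 1   # B
cfa = patch * mask

# Interpolate each channel from its CFA samples (normalized convolution).
demosaiced = blur(cfa) / blur(mask)
print("demosaiced std:", demosaiced.std(axis=(0, 1)).round(2))
```

The demosaiced standard deviations come out below the full-RGB ones, just as in the VNG measurement quoted above, because the interpolated values are weighted averages of their noisy neighbors.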
Of course, if you took pictures of scenes with a lot less detail, say on a foggy day, you could fill in a lot of the missing information by interpolation. But then you would not need a sensor with such high resolution. So the issue is still there, simply shifting from noise to resolution and back. And people appear to like clean images and/or more resolution in these days of cameras whose names end in 'e' :-)
Correct, and there are several other issues, some of which have not been mentioned yet. One, which may be partly related to the relatively small charge capacity of the wells in the Foveon design (it needs to store three channels' charges in the same area that the CFA design can allocate to one channel), is the question of how effective the color filtering by silicon penetration depth really is. When one inspects the Raw data of a Foveon capture, it looks almost like a monochrome image. There is hardly any difference between the color channels, which means that some serious heavy lifting needs to be done on that data to boost saturation, and with that comes color noise amplification. It's one of the reasons that Foveon sensors are relatively poor at higher ISO settings.
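A quick sketch of why that heavy lifting is costly: separating strongly overlapping channels takes a color matrix with large off-diagonal terms, and such a matrix amplifies the per-channel noise by its row norms. The matrix below is purely illustrative (its rows sum to 1 to preserve white), not an actual Foveon calibration:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical saturation-boosting color matrix (illustrative numbers only).
M = np.array([[ 2.0, -0.8, -0.2],
              [-0.7,  2.2, -0.5],
              [-0.1, -0.9,  2.0]])

# Uncorrelated, unit-variance noise in the three raw channels.
raw_noise = rng.standard_normal((100_000, 3))
out_noise = raw_noise @ M.T

# Each output channel's noise std grows to the norm of the matrix row.
print("std before:", raw_noise.std(axis=0).round(2))
print("std after: ", out_noise.std(axis=0).round(2))
print("row norms: ", np.linalg.norm(M, axis=1).round(2))
```

With these numbers the noise in each channel is roughly doubled, even though the matrix leaves a neutral gray untouched.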
While I was thinking about your comment, I decided to compare, for fun, the Sigma SD15 (pixel pitch about 8 um) to the Nikon D3200 (sensel pitch about 4 um): one SD15 pixel covers almost exactly the area of one D3200 RGBG quadruplet, so we are back to my example in the post above, with SNR the issue instead of resolution. But let's take it a step further and compare the SD15 sensor to a Bayer with a 1 um sensel pitch: now 16 Bayer quadruplets fit inside one Foveon pixel. You use the best demosaicing algorithm in town... You see where this is going?
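The area bookkeeping behind those numbers (pitches are the approximate ones quoted above):

```python
# How many RGBG quadruplets fit inside one Foveon pixel?
foveon_pitch = 8.0                # um, approx. Sigma SD15 pixel
for bayer_pitch in (4.0, 1.0):    # um: D3200-like, and a hypothetical 1 um sensor
    quad_side = 2 * bayer_pitch   # an RGBG quadruplet spans 2x2 sensels
    n = (foveon_pitch / quad_side) ** 2
    print(f"{bayer_pitch} um sensels: {n:.0f} quadruplet(s) per Foveon pixel")
```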
Yes, though strictly speaking there is no such thing as a Bayer quadruplet, unless one does binning, which averages the noise.
So at one extreme the missing information shows up as degraded resolution, at the other as degraded noise performance. Some believe the truth lies in the middle: perhaps because that way six of one can ignore the half dozen of the other ;-) Correct me if I am wrong.
The technologies do not scale down with equal ease. Remember what I said about the well depth for 3 channels versus 1 channel on the same area of silicon real estate... Being able to store 3x as many electrons for a color channel will reduce the relative noise to about 58% (1/sqrt(3)).
And then there is the fact that Red, Green, and Blue do not contribute equally to Luminance ...
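For instance, using the standard Rec. 709 luma weights, equal and independent per-channel noise does not reach luminance equally; green dominates:

```python
import numpy as np

rng = np.random.default_rng(2)

# Rec. 709 luminance weights: Y = 0.2126 R + 0.7152 G + 0.0722 B
w = np.array([0.2126, 0.7152, 0.0722])

# Equal, independent, unit-variance noise in R, G and B...
noise = rng.standard_normal((200_000, 3))
y_noise = noise @ w

# ...propagates into luminance with std = sqrt(sum(w^2)),
# most of which comes from the green channel.
print("luma noise std:", y_noise.std().round(3))
print("predicted:     ", np.sqrt((w ** 2).sum()).round(3))
```

This is also part of why a Bayer CFA devotes two of its four sensels to green: the channel that dominates luminance gets sampled (and averaged) twice as often.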