Actually I believe that we are in full agreement, with the possible exception of this statement which goes to the root of (my) confusion:
No, the demosaiced result would be [128,128,128] for the 4 theoretically perfect output pixels (the same as the Foveon type of sensor) added together to the same surface area.
This makes no sense to me, unless we bring into the discussion the difference between brightness and exposure of my first post, with related consequences on SNR and IQ.
Maybe you missed the part of my quote I've marked in italic bold here. You insisted on comparing a 4x larger photon collection area with single Bayer CFA sensels, so in order to make a valid comparison, one would have to add the four [32,32,32] interpolated Bayer CFA ones together, which gives [128,128,128].
Forget about Foveon for a second and think only about the Bayer in my example. We are looking at a square area A on the ideal Bayer sensor made up of 4 sensels in a 2x2 matrix, each of area A/4: 1 under a red, 1 under a blue and 2 under a green ideal color filter (CFA). If 384 photons reach area A, each A/4 filter area will only see 1/4 of them, or 96. And since each filter only lets through 1/3 of them in our idealized example, the sensel underneath each filter will receive 32 photons, and that's the value it would record in the raw data for each of the four sensels of our investigation.
Correct, 384/4 sensels = 96, and with 1/3rd bandpass filtering 96/3 = 32.
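For anyone who wants to check the arithmetic, here is a minimal sketch of the idealized case. The function name and the exact 1/3 pass fraction are illustrative assumptions taken from the thought experiment, not a model of any real sensor.

```python
from fractions import Fraction

def recorded_per_sensel(photons_on_area, n_sensels, filter_pass=Fraction(1, 3)):
    """Photons recorded by one sensel under an ideal colour filter.

    Assumes photons are spread evenly over the area and that each
    filter passes exactly `filter_pass` of what reaches it.
    """
    reaching_filter = Fraction(photons_on_area, n_sensels)  # 384 / 4 = 96
    return reaching_filter * filter_pass                    # 96 / 3 = 32

print(recorded_per_sensel(384, 4))  # 32
```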
Simple demosaicing of the raw data (for instance with dcraw -h 'half' switch, which keeps the red and blue values as 'they are' and averages the greens) would produce a single R*G*B* Pixel for the whole of area A of value (32,32,32), with a given SNR, keeping in mind the earlier proviso on the green channel - because demosaicing works off the raw data.
Correct, see also the above explanation, [32,32,32] is the result after demosaicing.
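As a rough illustration of both statements, the sketch below mimics a 'half' style conversion for one 2x2 block (red and blue kept as they are, greens averaged) and then adds four [32,32,32] output pixels together to cover the same surface area. The helper names are hypothetical and this is not dcraw's actual code.

```python
def half_demosaic(r, g1, g2, b):
    """One RGB pixel per 2x2 block: keep R and B, average the two greens."""
    return (r, (g1 + g2) / 2, b)

def add_pixels(pixels):
    """Add RGB pixels channel by channel, e.g. to compare equal areas."""
    return tuple(sum(channel) for channel in zip(*pixels))

block_pixel = half_demosaic(32, 32, 32, 32)
print(block_pixel)                    # (32, 32.0, 32)

# Four interpolated full-resolution pixels covering area A, added together.
print(add_pixels([(32, 32, 32)] * 4))  # (128, 128, 128)
```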
More complicated demosaicing would give the same result, but it would be harder to follow. Of course we could in fact express this as any value we desired through digital post processing operations (let's call them brightness/tonal corrections) in-camera or in-computer, but the underlying information and SNR (IQ) would remain unchanged.
I'm not sure why you are mentioning the S/N ratio here but, as you can see in my demonstration earlier, the noise amplitude at the pixel level will be reduced and replaced by a lower spatial frequency noise pattern.
Now let's shrink the sensels: If area A contained 64 smaller Bayer sensels instead of 4, once downrez'd to a single ideal Pixel for area A, for the given SNR as before such a pixel would have the exact same value (32,32,32).
Not really, apart from the practical implications which do not scale down perfectly with geometry. Dividing an area that receives 384 photons into 64 will leave 6 photons on average, and after a 1/3rd bandpass filter that would become 2 photons each. But in the theoretical example there are still the same number of photons falling on the same total area, and 2/3rds are still being filtered out. So the remaining 1/3rd times the original 384 photons for the total area still makes 128 (64 sensels times 2 photons). When you divide the same area up into smaller sample areas, each sample will detect fewer photons, but they still add up to the same number for the area: 1/3rd sampled, 2/3rd interpolated. It will have hardly any effect on the S/N ratio for the total area, none actually in our theoretical example.
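The same bookkeeping can be sketched for both sensel counts. Again, this assumes the idealized even photon distribution and exact 1/3 filter transmission of the example; the function name is just for illustration.

```python
from fractions import Fraction

def per_sensel_and_total(photons_on_area, n_sensels, filter_pass=Fraction(1, 3)):
    """Return (photons recorded per sensel, photons recorded over the whole area)."""
    per_sensel = Fraction(photons_on_area, n_sensels) * filter_pass
    return per_sensel, per_sensel * n_sensels

for n in (4, 64):
    per_sensel, total = per_sensel_and_total(384, n)
    print(n, per_sensel, total)
# 4 32 128
# 64 2 128
```

Fewer photons per sample, the same 128 recorded for area A either way.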
Is my confusion with your (128,128,128) statement above clearer now?
I still think it's the 4x larger Foveon type of sensor in your original example that's the basis for any possible confusion. One needs to compare equal areas for a meaningful comparison.
I'm not sure whether there is some subconsciously nagging issue (which is understandable) with the fact that while only '1/3rd' of our photons actually are registered, 2/3rd will be supplemented by interpolation to create full RGB output pixels. Those RGB output pixels have (approximately) the same brightness as a full RGB sensor would give, as my demosaicing demonstrations earlier in the thread show. The Bayer CFA converted originals look darker, because 2/3rd of their RGB pixel data is zero, but after supplementing the zeros with interpolated/reconstructed data, the original average brightness is restored.
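A small sketch of that brightness point, for a flat grey patch where every sensel records 32. It assumes an RGGB layout and uses a deliberately crude fill (each missing value replaced by the mean of that channel's recorded samples) as a stand-in for real interpolation, which is only reasonable because the patch is uniform.

```python
PATTERN = [["R", "G"], ["G", "B"]]  # assumed RGGB layout
CHANNELS = "RGB"

def sparse_rgb(raw):
    """Put each raw value in its own colour channel, zeros in the other two."""
    return [[tuple(raw[y][x] if PATTERN[y % 2][x % 2] == c else 0 for c in CHANNELS)
             for x in range(len(raw[0]))]
            for y in range(len(raw))]

def fill_flat(sparse):
    """Crude stand-in for interpolation on a uniform patch (not a real demosaicer)."""
    means = []
    for i in range(3):
        samples = [px[i] for row in sparse for px in row if px[i] > 0]
        means.append(sum(samples) / len(samples))
    return [[tuple(px[i] if px[i] > 0 else means[i] for i in range(3)) for px in row]
            for row in sparse]

def mean_level(img):
    """Average over every channel value in the image."""
    values = [v for row in img for px in row for v in px]
    return sum(values) / len(values)

raw = [[32] * 4 for _ in range(4)]   # flat patch, 32 photons recorded per sensel
sparse = sparse_rgb(raw)             # 2/3rd of the RGB data is zero
filled = fill_flat(sparse)           # zeros supplemented with reconstructed data

print(mean_level(sparse))  # about 10.67, i.e. 32/3: looks darker
print(mean_level(filled))  # 32.0: average brightness restored
```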
I do not think that another confusion plays a role here, namely a misconception about demosaicing where some people think that it takes 4 Bayer CFA sensels to make 1 RGB output pixel. That would be a completely wrong representation of how demosaicing works (and it would result in half of the resolution that is actually recorded, which proves that that representation is flawed). But for those who believe that's how demosaicing works, it doesn't.
Cheers,
Bart