If you are using a Bayer-interpolated sensor with an anti-aliasing filter, you need to down-sample from 30 or 40 Mpx to get an optimum 2 Mpx file.
An anti-aliasing filter spreads the light destined for each pixel over the 8 adjacent pixels, so you could argue that you need a 10-times down-sample to compensate, and Bayer interpolation interpolates one pixel from 4 real pixels, so, theoretically, if these two factors effectively multiplied, you would need 40 pixels to get one optimal pixel.
So, according to that theory, a 1 Mpx crop from a 4-shot MF picture would be as good as a 40 Mpx AA Bayer picture, which is clearly not the case.
What is "an optimum 2Mpx file", and why would I want it? I want "an optimum image", either hanging on my wall or shown on my computer display.
I think that in order to state "theoretically...", you should have a clearly stated, widely accepted theory. I don't think that you have one.
An AA filter acts to smooth/blur the image optically/continuously prior to sampling, not totally unlike diffraction blurring. The exact kernel (smoothing function) is somewhat different from that of diffraction, and it is (hopefully) not dependent on camera/lens settings.
If the scene were flat-spectrum, if the AA filter convolved with sensel coverage were a "perfect" sin(x)/x function (with the sensel itself a point sampler), and if the sensor had no CFA, I believe that we could apply Shannon-Nyquist theory rather easily. In that case, an AA-filtered sensor could accurately capture any pattern of light that was bandlimited to N/2 maxima and N/2 minima either vertically or horizontally, if the sensor had N sensels in that dimension. Any light pattern that changed faster than that (such as a stepped edge) would simply be bandlimited by the filter before capture.
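To make that ideal case concrete, here is a toy 1-d numpy sketch (my own construction, nothing camera-specific): a periodic pattern bandlimited below N/2 cycles across N sensels is recovered exactly from its N point-samples by ideal (Shannon) reconstruction, done here via zero-padding in the frequency domain.

```python
import numpy as np

N = 64                # number of "sensels" in one dimension
n = np.arange(N)      # sample positions, unit spacing

def pattern(x):
    # Bandlimited "scene": components at 3 and 10 cycles, both < N/2
    return (1.0 + 0.5 * np.sin(2 * np.pi * 3 * x / N)
                + 0.3 * np.cos(2 * np.pi * 10 * x / N))

samples = pattern(n)

# Ideal reconstruction: zero-pad the spectrum, then inverse transform,
# evaluating the pattern at 4x finer (off-grid) positions
M = 4 * N
F = np.fft.fft(samples)
Fpad = np.zeros(M, dtype=complex)
Fpad[:N // 2] = F[:N // 2]      # non-negative frequencies
Fpad[-N // 2:] = F[-N // 2:]    # negative frequencies
fine = np.arange(M) * N / M     # positions between the sensels
reconstructed = np.real(np.fft.ifft(Fpad)) * (M / N)

assert np.allclose(reconstructed, pattern(fine))  # exact recovery
```

The assertion holds precisely because the pattern has no energy at or above N/2 cycles; add a component at, say, 40 cycles and the recovery fails.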
What happens if we, say, replace that sin(x)/x function with a rectangular integration corresponding to the sensel spacing (i.e. simulating an idealized AA-filter-less sensor)? The Fourier transform of a rectangular function is a sin(x)/x function, so you would get some attenuation of the "passband" (desired signal) and bleed-through of aliasing-causing frequencies. This can easily be seen by letting those rectangular integrators slide past an image of hard edges/impulses: the output can have relatively large changes for small changes in sensel/image alignment. At other spots, the expected output image could change exactly zero, even though the camera/scene alignment has changed by 1/2 sensel. In other words, it is not possible to accurately recreate the original scene (not even a bandlimited version of it). That is not to say that an inaccurate representation cannot be visually pleasing (or even more pleasing than the accurate version).
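A toy illustration of that alignment sensitivity (hypothetical numbers; an impulse stands in for a hard edge):

```python
import numpy as np

# Toy model of an AA-filter-less sensor: each sensel is a box
# integrator over [k, k+1), and the scene is a single impulse of light.
def box_sample(impulse_pos, n_sensels=8):
    out = np.zeros(n_sensels)
    out[int(impulse_pos)] += 1.0   # all the light lands in one sensel
    return out

# Shift the impulse by half a sensel within sensel 3: output unchanged
assert np.array_equal(box_sample(3.20), box_sample(3.70))
# Shift it by only 0.1 sensel across a sensel boundary: large change
assert not np.array_equal(box_sample(3.95), box_sample(4.05))
```

No reconstruction scheme can undo this: two different scene alignments produce identical outputs, and nearly identical alignments produce very different ones.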
So what happens if we replace the ideal AA filter with a realistic filter like what Canikon use? I don't know. Anyone know their spatial function?
So what happens if we allow the scene to actually have colors, and the sensor to have a CFA, demanding demosaicing? Demosaicing is application-specific and usually proprietary, so I won't comment on that. But scene colors and a CFA are interesting. If we assume an improbably spectrally narrow scene that only gets sensed by one of the CFA primaries, I believe that we can use the same analysis as for the color-less case, only that the sensels will be reduced to 1/4 (r, b) or 1/2 (g) while the AA filter stays the same. Clearly, this would make the (up until now) perfect AA filter less perfect (too high a cutoff frequency), and we would have more spatial aliasing.
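A toy numpy sketch of that effect (my own pick of frequencies, nothing camera-specific): keep the AA-filter cutoff matched to the full sensel pitch (Nyquist = N/2), but sense only every other sensel, like the 1/2-rate green channel. A pattern at 20 cycles then aliases onto one at 12 cycles, and the sub-sampled channel cannot tell them apart.

```python
import numpy as np

N = 64
n = np.arange(N)
f_high = 20                 # passes a full-pitch AA filter (< N/2)
f_alias = N // 2 - f_high   # = 12, the frequency it aliases onto
full_high = np.cos(2 * np.pi * f_high * n / N)
full_alias = np.cos(2 * np.pi * f_alias * n / N)

# Keep only every other sensel, as one green channel of a CFA would:
assert np.allclose(full_high[::2], full_alias[::2])   # indistinguishable
assert not np.allclose(full_high, full_alias)          # full grid differs
```

For the 1/4-rate red/blue channels (`[::4]` instead of `[::2]`), the usable bandwidth shrinks further still.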
So what happens if we allow the scene to be spectrally realistic (most of the information/variation in the luminance)? I believe that this reduces the influence of the CFA's spectral selectivity on spatial capture, and that the "monochrome" analysis turns out to be quite relevant. Quite, but not perfectly. There will always be corner cases or nitty-gritties where the trade-offs present in Bayer-type sensors are made visible. I think that those trade-offs tend to be good ones for most applications.
-h
edit:
Most of my post considers 1-d versions of the problem.