In talking about "per-image" statistics, what is the notion of "noise" when the "signal", and how that signal is affected by the choice of resampling methodology, is not even considered in your analysis above? Can this type of noise be treated in isolation from the signal?
I looked at both postings, but not the entire threads. So I am trying to guess what you mean, and where you might be going with this.
Q: Can noise levels be measured using regular photos, e.g. of a cat? It would be hard to distinguish signal from noise, wouldn't it?
A: That's not how the DxOMark measurements were done.
DxOMark's "Protocol" documentation says that they measure noise using RAW images of neutral-density filters (=patches) that are backlit using a large diffuse light source. You can measure noise by looking at spatial (repeatable) variation: fixed-pattern noise. And by looking at temporal fluctuations (temporal noise) when you take lots of identical images of the identical source. As far as I know (I asked them in an E-mail) their published numbers are FPN and temporal noise added up. This means that in theory one image suffices.
Q: But could it be done with a real photo, e.g. of a statue of a cat?
A: I wouldn't. And DxOMark Sensor doesn't.
You would need to take multiple images to be able to distinguish noise (variation) from signal (average). But this would measure less noise than what DxO defines as noise (because you would miss FPN like dark current non-uniformity and photo-response non-uniformity). And a detailed scene would make the setup unnecessarily sensitive to vibrations and drift: you would see fake noise at sharp edges, as the toy example below shows.
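Here is a minimal simulation of that last effect; the scene, the levels, and the 1-pixel drift are invented purely for illustration:

```python
import numpy as np

# Toy example (mine, not DxOMark's setup): with a detailed scene, even a
# 1-pixel drift between two exposures shows up as large frame-to-frame
# differences at sharp edges, which a naive measurement counts as noise.
scene = np.zeros((100, 100))
scene[:, 50:] = 1000.0                    # one sharp vertical edge

drifted = np.roll(scene, 1, axis=1)       # same scene, shifted by 1 pixel
diff = (drifted - scene)[:, 5:95]         # crop away the roll wrap-around

print(diff[:, :40].std())   # flat region: ~0, the drift is invisible
print(diff.std())           # edge included: huge apparent "noise"
```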
Q: When noise measured at one resolution is scaled to a reference resolution, is this sensitive to the scaling algorithm?
A: No. In DxOMark's procedure they measure noise of a 20 MPixel sensor at 20 MPix (MSensel) resolution. Then the resulting signal-to-noise ratio is corrected using a simple theoretical model. So there is no rescaling algorithm involved. In my article, I provide one or two examples of this that I checked by hand.
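For what it's worth, here is a sketch of what such a correction amounts to, as I understand the model: averaging n uncorrelated pixels into one leaves the signal unchanged but divides the noise by sqrt(n), so the SNR gains 10*log10(n) dB. The function and the numbers are mine; the 8 MPix reference is, as far as I know, what DxOMark normalizes to for its "Print" scores.

```python
import math

def snr_at_reference(snr_measured_db, sensor_mpix, reference_mpix=8.0):
    # n = pixels averaged per output pixel; SNR gain is 10*log10(n) dB
    n = sensor_mpix / reference_mpix
    return snr_measured_db + 10.0 * math.log10(n)

# A 20 MPix sensor measured at, say, 32 dB per pixel, normalized to the
# 8 MPix reference:
print(snr_at_reference(32.0, 20.0))   # 32 + 10*log10(2.5) = ~36.0 dB
```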
Q: Would you get the exact same results if you took the test image, rescaled it and then measured noise? In other words is the "simple theoretical model" accurate?
A: The model used for scaling corresponds to what a simple binning algorithm would do (e.g. replacing 2x2 small pixels by one fat one), assumes a competent implementation (e.g. measurements and calculations done at sufficient precision), and assumes Poisson noise with no correlation between pixels. It should thus be pretty accurate for the photon shot noise and dark current noise. The scaling may not apply to the FPN, but FPN scaling cannot be predicted anyway, and it should be smallish. So the model is pretty accurate. And significantly, the model doesn't need to be fully accurate: it is just meant to provide a handicap that compensates for resolution differences, not to accurately simulate actual devices.
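You can verify the Poisson part of this with a small simulation (my own check, not part of DxOMark's protocol): with pure Poisson noise and uncorrelated pixels, replacing 2x2 pixels by their sum should exactly double the SNR.

```python
import numpy as np

rng = np.random.default_rng(0)
mean_electrons = 1000.0
frame = rng.poisson(mean_electrons, size=(1000, 1000)).astype(float)

snr_small = frame.mean() / frame.std()                   # ~sqrt(1000) = ~31.6

binned = frame.reshape(500, 2, 500, 2).sum(axis=(1, 3))  # 2x2 binning
snr_binned = binned.mean() / binned.std()                # ~sqrt(4000) = ~63.2

print(snr_binned / snr_small)                            # ~2.0, as predicted
```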
Q: Do the numbers based on images of test patches have relevance to a real scene? Like Schroedinger's cat or the statue of a cat or somebody's cat?
A: Yes. Overall behavior of a sensor is reasonably well understood. Just as you can characterize the noise in an audio amplifier without having to measure it specifically while playing Beethoven sonatas or even Neil Young.
Q: But what if the test patches do not generate homogeneous light patches on the sensor? And how do you deal with offset (=blackpoint) in the processing?
A: Measurement setups are indeed never perfect. These are serious issues. I mentioned those kinds of problems in an earlier posting.
Engineers and scientists will point out that numerous questions remain about test details for any precision measurement: e.g. light source homogeneity, light source stability, finite test patch size, vignetting, dust on the source, dust on the optics. I can assure you (I worked for years in labs) that precision measurements are a major headache. Some of these issues are nowadays covered by international standards, whereby the experts jointly develop measurement protocols. DxOMark is active in some of these committees (source: LinkedIn and private communications).

And DxO says that outside engineers regularly get to see the setup and discuss the procedures used. This is normal in engineering: if you challenge my measurement results, I either need to document the measurement details exhaustively so that you can review them, or you send in experts to see if they can find a flaw in the measurements (or both). You can bet that a major manufacturer will contact DxOMark whenever their products score lower than hoped for.
You can do a rough check yourself by examining the slopes of various graphs against theory. But the data seems good enough for comparing sensors. And checking for ever more subtle pitfalls in measurements is best left to the manufacturers who hope to see performance increases (that are increasingly hard to measure) in their latest designs.
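As an example of such a check (my own sketch, with made-up signal levels): in the shot-noise-limited region, a photon transfer curve should have a slope of 1/2 on log-log axes, because Poisson noise grows as the square root of the signal.

```python
import numpy as np

rng = np.random.default_rng(1)
signals = np.array([100.0, 300.0, 1000.0, 3000.0, 10000.0])  # mean electrons
noises = [rng.poisson(s, size=100_000).std() for s in signals]

# Fit log10(noise) vs log10(signal); shot noise predicts slope 0.5
slope = np.polyfit(np.log10(signals), np.log10(noises), 1)[0]
print(slope)   # ~0.5 if the data is shot-noise limited
```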
By the way, http://peter.vdhamer.com/2010/12/25/harvest-imaging-ptc-series/ is a posting about how to measure noise in sensors. I just summarized the material; the source is Albert Theuwissen, an expert on sensor design and modeling.