I have to admit that it has been a few years since I had physics, but I do believe that the world of photons is properly described as "binary" in this context.
Human perception is less relevant: if the physical scene is "really" binary in nature, then our perception can work one way or the other, but it would still be inherently limited by the information present in the scene.
-h
You're technically correct ( https://www.youtube.com/watch?v=hou0lU8WMgo ): photons are absorbed, and therefore detected, discretely.
However, the flux of photons is huge. Daylight provides something of the order of 10^21 photons per square metre per second.
This can be made as near to continuous as makes no odds just by upping the integration time. It is very, very likely that you'll hit the limits of your sampling device's abilities before you hit the fundamental limitation of light being composed of discrete quanta. So, at least for a daylight landscape, it is pretty much as if you are sampling a continuous signal.
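To put rough numbers on that (a back-of-envelope sketch; the flux, pixel pitch and shutter speed are illustrative round numbers, and quantum efficiency, optics losses and colour filters are all ignored):

```python
# Back-of-envelope: photons landing on one sensel in bright daylight.
flux = 1e21          # photons per square metre per second (daylight, assumed)
pixel_pitch = 5e-6   # metres (a typical ~5 micron sensel, assumed)
exposure = 1e-3      # seconds (a 1/1000 s shutter)

photons = flux * pixel_pitch ** 2 * exposure
print(f"~{photons:.1e} photons per sensel")   # ~2.5e+07
```

At ~25 million photons, the Poisson shot noise is sqrt(N) ≈ 5,000, around 0.02% of the signal: to all intents and purposes continuous.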
This doesn't apply at night, when the photon counts are much lower, and the discrete nature of the signal becomes much more apparent.
It's in this latter scenario that your "one photon per pixel" camera breaks down: it's likely to be overwhelmed by noise, because each sensel has separate noise sources which are apt to make it register a photon hit when one has not in fact occurred. Many of these noise sources can be reduced (e.g. by cooling the sensor to cut thermal noise) but they can't be eliminated. The usual way of combating this is to up the integration time, allowing more signal to accumulate before reading out. This helps a lot with noise sources which don't scale per unit time (e.g. readout noise), but it also helps with noise sources which do accumulate with time, including the "shot noise" (the inherent variation from sampling small numbers of photons, which follow a Poisson distribution): the signal grows linearly with integration time while the shot noise grows only as its square root, so a higher signal pulls further clear of the noise.
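To make that concrete, here's a minimal sketch of the standard noise model for a single sensel (the signal rate, dark current and read noise figures below are invented, plausible-ish magnitudes, not measurements of any real sensor):

```python
import math

def snr(signal_rate, t, dark_rate=0.1, read_noise=2.0):
    """Signal-to-noise ratio for one sensel after integrating for t seconds.

    signal_rate: photoelectrons per second from the scene
    dark_rate:   thermal (dark-current) electrons per second; grows with t
    read_noise:  electrons RMS per readout; does NOT scale with t
    Shot noise on the signal and dark current is Poisson: variance = mean.
    """
    signal = signal_rate * t
    noise = math.sqrt(signal + dark_rate * t + read_noise ** 2)
    return signal / noise

for t in (0.01, 0.1, 1.0, 10.0):
    print(f"t = {t:5.2f} s  SNR = {snr(100, t):5.1f}")
```

Running it shows the SNR climbing faster than sqrt(t) while the fixed read noise dominates, then settling to sqrt(t) growth once shot noise takes over - which is the sense in which longer integration helps even with the noise sources that accumulate over time.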
In theory you can find the optimum readout time to maximise the signal-to-noise ratio for a given signal. To have the flexibility to do that for pixels in the shadows, you'll need to allow pixels in the highlights to accumulate much more signal without clipping. Or you could optimise the signal-to-noise ratio for the highlight pixels, but then you'd very likely obliterate any detail in the shadows with read noise and thermal noise, when you could have done substantially better by integrating for longer.
The need to allow a decent signal-to-noise ratio in the shadows whilst preventing clipping in the highlights is exactly why camera sensors have big wells and low readout noise; the optimisation I referred to above has a well-known procedure for normal shooting conditions: expose to the right! That gathers the maximum signal in the hottest pixels without clipping, and gives the maximum signal-to-noise ratio in the shadows by using the longest possible integration time.
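Numerically, the procedure amounts to picking the longest integration time that keeps the hottest pixel below full well; the shadow signal-to-noise then follows (the well depth and rates below are invented for illustration):

```python
import math

# Expose to the right, as a numerical sketch. Invented figures: a 50,000 e-
# full well, highlights at 100,000 photoelectrons/s, shadows 1,000x dimmer.
full_well = 50_000
highlight_rate = 100_000
shadow_rate = highlight_rate / 1_000

t_max = full_well / highlight_rate   # longest exposure with no clipped highlights
shadow_signal = shadow_rate * t_max
print(f"t_max = {t_max:.2f} s, shadow signal = {shadow_signal:.0f} e-, "
      f"shadow SNR ~ {math.sqrt(shadow_signal):.1f} (shot-noise limited)")
```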
I'm far from convinced that your super-segmented one-photon-per-pixel camera can do better. If it is one photon per pixel in the highlights, then in the shadows it becomes one photon per hundred thousand pixels, and there is NO WAY to spot the one electron which is signal amongst all the electrons caused by noise spread over all those channels.
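To see why, suppose - an invented but optimistic figure - that each sensel falsely registers a hit once per thousand readouts:

```python
# One real photoelectron in the shadows, spread over 100,000 sensels.
# p_false is an invented, optimistic per-readout false-hit probability
# (dark counts, read-noise threshold crossings).
n_sensels = 100_000
p_false = 1e-3

expected_false_hits = n_sensels * p_false
print(f"{expected_false_hits:.0f} expected false hits vs 1 real photon")
# ~100 spurious hits for every genuine one: hopeless from a single readout.
```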
If you can do it, you will definitely need to be sampling the sensor quickly and doing your integration offline by exposure stacking, trying to build up a picture of the noise including its temporal behaviour - as astrophotographers already do with exposure stacking for faint sources, and as in the paper you quoted. You'll need to store all the data in time slices; this will make the data rate requirements of 4K video look like a walk in the park if you are aiming at one photon per sensel.
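A rough data-rate estimate shows what "walk in the park" means here (the sensor resolution and slice rate are my assumptions, not figures from the paper):

```python
# Rough data rate for time-sliced single-photon readout.
# Assumptions: a 24 MP sensor read out as a 1-bit "photon / no photon" map,
# 1,000 slices per second to keep each slice near one photon per sensel.
sensels = 24e6              # 24 megapixels (assumed)
slices_per_second = 1_000   # readouts per second (assumed)
bits_per_sensel = 1         # hit / no hit

rate_bytes = sensels * slices_per_second * bits_per_sensel / 8
print(f"~{rate_bytes / 1e9:.0f} GB/s of raw time-sliced data")   # ~3 GB/s
```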
I can see that this could work, but it'll be extremely compute- and storage-intensive offline, and very demanding on sensor readout noise, dark current, thermal noise and so on. What I'm not so convinced of is that it will provide decisive advantages for general photographic use compared with just doing the integration physically with the shutter and having deep wells on the chip, as we do now.
It's an interesting idea, but I don't think we've got the computing power in our cameras yet to read out and store the information fast enough, or the offline computing power to reconstruct an HDR image in a sensible time. But maybe it will come.
Cheers, Hywel