Hi,
There are two problems with your way of reasoning:
1) There is no noise floor. Readout noise is so low that shadow noise is dominated by photon flux variations.
That's exactly what I'm saying. You need to expose for long enough that there is detail in the shadows above the random noise generated by photon flux. I never said anything about read noise.
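To put rough numbers on that (a toy model with made-up photon counts): in the shot-noise limit, shadow SNR is the square root of the collected photons, so quadrupling the exposure doubles the shadow SNR.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy model: a shadow region receiving a mean of `mean_photons`
# photons per pixel during the exposure. Photon arrival is Poisson,
# so shot noise is sqrt(mean) and SNR = mean / sqrt(mean) = sqrt(mean).
for mean_photons in (4, 16, 64, 256):
    pixels = rng.poisson(mean_photons, size=100_000)
    snr = pixels.mean() / pixels.std()
    print(f"mean={mean_photons:4d} photons -> SNR ~ {snr:.1f} "
          f"(theory: {np.sqrt(mean_photons):.1f})")
```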
2) The exposure is limited by FWC (Full Well Capacity). If full well capacity is exceeded, clipping will result.
Again, that's exactly what I'm saying, and is the main challenge in increasing DR.
You need to ensure that the wells don't reach capacity before sufficient shadow detail is recorded.
Therefore, the problem is in the highlights rather than the shadows. You can take care of shadow detail by exposing for long enough. But the wells need to have enough capacity that the highlights don't blow out in the time it takes to collect the shadow detail. And every 1-stop increase in dynamic range requires a doubling of the well capacity.
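In numbers (illustrative values, not any real sensor): with a fixed noise floor, dynamic range in stops is log2(FWC / floor), so each extra stop needs double the well capacity.

```python
import math

noise_floor_e = 2.0  # illustrative noise floor in electrons

# Each doubling of full well capacity buys exactly one more stop.
for fwc_e in (15_000, 30_000, 60_000, 120_000):
    stops = math.log2(fwc_e / noise_floor_e)
    print(f"FWC = {fwc_e:7,} e-  ->  DR ~ {stops:.1f} stops")
```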
Perhaps the solution isn't in an expose-then-readout system as is used now, but in a continuous readout system, whereby the sensor is continuously read while the exposure is happening. Exposure could continue until sufficient detail is recorded in the shadows above the photon noise; the continuous readout would mean that the numbers from the highlights would simply keep adding up, rather than reaching the limit and being capped there. But that would require different sensor and readout architecture.
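Here's a toy simulation of that idea (a hypothetical architecture with made-up numbers, just to show the principle): the pixel is read and reset several times during the exposure and the partial counts are summed digitally, so a highlight can accumulate far past the single-read well limit.

```python
import numpy as np

rng = np.random.default_rng(1)
FWC = 10_000            # full well capacity in electrons (illustrative)
N_READS = 16            # number of read/reset cycles during the exposure

# Per-read mean photon rates: deep shadow, midtone, and a highlight
# that would overflow the well in a single long exposure.
rates = np.array([5.0, 500.0, 5_000.0])

# Conventional single exposure of the same total length: clips at FWC.
single = np.minimum(rng.poisson(rates * N_READS), FWC)

# Continuous readout: read and reset N_READS times, sum off-sensor.
# Each individual read stays well below FWC here, so nothing clips.
reads = rng.poisson(rates, size=(N_READS, len(rates)))
summed = reads.sum(axis=0)

print("true mean signal:", (rates * N_READS).astype(int))
print("single exposure :", single)   # highlight capped at FWC
print("continuous sum  :", summed)   # highlight preserved
```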
It is possible to make pixels larger, which would increase the possible dynamic range of each pixel. But we would have fewer pixels, so the effect on highlights would be nil; large pixels only give marginally lower shadow noise.
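A quick sanity check of the "nil" claim (toy model again): four small pixels binned together see the same photons as one pixel with four times the area, so the photon statistics per unit of sensor area are identical.

```python
import numpy as np

rng = np.random.default_rng(2)
photons_per_unit_area = 1_000.0  # illustrative flux over the exposure

# One large pixel covering 4 units of area vs. 4 small pixels of 1 unit each.
large = rng.poisson(photons_per_unit_area * 4, size=100_000)
small = rng.poisson(photons_per_unit_area, size=(100_000, 4)).sum(axis=1)

for name, px in (("large pixel   ", large), ("4 small binned", small)):
    print(f"{name}: mean={px.mean():.0f}, SNR={px.mean()/px.std():.1f}")
```

The marginal shadow advantage of the large pixel comes from paying read noise once instead of four times, which this shot-noise-only model ignores.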
Adding ND filters on half of the pixels or using two different exposures on different pixel groups could expand dynamic range greatly. That would be HDR within a single exposure. Fuji has done it in different variations.
I guess that single-exposure HDR is not what the market is asking for.
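For what it's worth, here's a minimal sketch of how merging two such pixel groups could work (all values hypothetical: half the pixels are assumed to sit behind a 3-stop ND filter):

```python
import numpy as np

rng = np.random.default_rng(3)
FWC = 10_000
ND_FACTOR = 8           # hypothetical ND filter: 3 stops darker

scene = np.array([50.0, 3_000.0, 60_000.0])  # shadow, midtone, highlight

# "Normal" pixel group: good shadows, but the highlight clips at FWC.
normal = np.minimum(rng.poisson(scene), FWC).astype(float)

# ND-filtered group: sees scene / ND_FACTOR, so the highlight survives.
filtered = np.minimum(rng.poisson(scene / ND_FACTOR), FWC).astype(float)

# Merge: trust the normal pixel unless it is near clipping, then fall
# back on the filtered pixel scaled back up by the ND factor.
merged = np.where(normal < 0.9 * FWC, normal, filtered * ND_FACTOR)

print("scene  :", scene.astype(int))
print("normal :", normal.astype(int))    # highlight capped
print("merged :", merged.astype(int))    # highlight recovered
```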
Or multiple exposures (either with continuous shooting or by shooting a video file) with the frames averaged. Photon shot noise would be averaged out, revealing shadow details that were previously hidden in the noise. And the highlights wouldn't be blown out, since each single exposure would be short enough for the wells not to reach capacity.
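A toy version of that (illustrative numbers): average N short frames, each safely below full well, and shot noise drops by sqrt(N).

```python
import numpy as np

rng = np.random.default_rng(4)
N_FRAMES = 16
shadow_rate = 10.0      # mean photons per pixel per short frame (illustrative)

# Simulate many shadow pixels over N short frames, then average the stack.
frames = rng.poisson(shadow_rate, size=(N_FRAMES, 100_000))
single = frames[0]                # one short exposure
average = frames.mean(axis=0)     # stack of N frames averaged

for name, px in (("single frame ", single), (f"{N_FRAMES}-frame avg ", average)):
    print(f"{name}: SNR = {px.mean()/px.std():.1f}")
# SNR improves by about sqrt(N_FRAMES) = 4x here, pulling shadow
# detail up out of the shot noise without ever filling the wells.
```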
One real problem with HDR exposures is that they need to be downmapped (tone mapped) to something that can be seen in print or on screen, and doing that in a harmonious way is no easy feat. Think grungy HDRs. Obviously, it can be done in a nice way, but it's still a complex issue with pitfalls.
Every image needs to be tone mapped, HDR or not. Without anything to tell you which recorded luminance corresponds to which brightness and colour in the output, an image is just a meaningless list of numbers in a file. It's just that single-exposure images come with pre-made tone-mapping curves via RAW conversion software, while curves for HDR images need to be made manually - and, in the hands of the inexperienced, can be made very badly.
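For concreteness, here is about the simplest possible global tone curve (the Reinhard operator plus a display gamma; a bare-bones sketch, not what any particular RAW converter actually does):

```python
import numpy as np

def tonemap(luminance, gamma=2.2):
    """Map scene-referred luminance (0..inf) to display range (0..1)."""
    # Reinhard global operator: compresses highlights smoothly toward 1.
    compressed = luminance / (1.0 + luminance)
    # Display gamma: lifts shadows toward a roughly perceptual encoding.
    return compressed ** (1.0 / gamma)

# An HDR pixel spanning many stops maps into something displayable:
for L in (0.01, 0.1, 1.0, 10.0, 100.0):
    print(f"scene {L:7.2f} -> display {tonemap(L):.3f}")
```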