Regardless of sensor type or manufacturer, what effect do the signal A/D converters have on the result? We never seem to see this discussed or measured anywhere, yet in audio (another of my passions) the quality of the A/D converters used when recording music is crucial to the end result.
Despite the analogies, there are significant differences between audio and image signals. One of them is photon shot noise: at low signal levels, the signal-to-noise ratio of an image sensor is dominated by photon shot noise, whereas in audio it's the amplifier/circuit noise that's the killer.
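To see why shot noise dominates at the dark end, here is a minimal sketch of the usual photon-counting noise model (the specific electron counts are illustrative, not from any particular camera; read noise is taken as a fixed 10 e- to match the figures below):

```python
import math

def snr_db(signal_e, read_noise_e):
    """SNR of a photon-counting sensor, in dB.

    Shot noise equals sqrt(signal) and adds in quadrature with the
    read noise; both quantities are in electrons.
    """
    noise = math.sqrt(signal_e + read_noise_e ** 2)
    return 20 * math.log10(signal_e / noise)

# At 100 e- of signal, shot noise (10 e-) already equals the read noise;
# near saturation, shot noise (~200 e-) swamps it completely.
print(snr_db(100, 10))     # low-light exposure
print(snr_db(40_000, 10))  # near full well
```

The point of the model: quieter electronics help in the shadows, but the sqrt(signal) shot-noise floor is set by the light itself, which is the difference from audio the paragraph above describes.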
Another issue is the "well depth" of sensors. With a maximum well depth of 50,000 electrons and (mostly) read noise of 10 electrons, we have a dynamic range of 5000:1, or a little more than 12 bits of real DR. A 14-bit ADC would be adequate, and cheaper than a 16-bit ADC. Of course, I assume that the ADC and surrounding circuitry don't themselves introduce large (>1.5 bits) amounts of noise.
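The arithmetic behind that "a little more than 12 bits" can be checked in a couple of lines (using the hypothetical 50,000 e- / 10 e- sensor from above):

```python
import math

full_well = 50_000   # electrons, maximum well depth
read_noise = 10      # electrons, noise floor at the dark end

dr_ratio = full_well / read_noise   # dynamic range as a ratio
dr_bits = math.log2(dr_ratio)       # bits (= stops) of real DR

print(dr_ratio)  # 5000.0
print(dr_bits)   # ~12.3 bits, so 14-bit conversion leaves headroom
```

With ~12.3 bits of real DR, a 14-bit ADC leaves roughly 1.7 bits of margin, which is why the >1.5-bit circuit-noise caveat above matters: much more than that and the extra converter bits are wasted.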
As sensors get denser, with smaller sensel pitch and no real progress in storage capacity (which for now scales mostly with surface area, not depth), the storage capacity per sensel is more likely to decrease than to pose a challenge for ADCs. Low-noise circuitry will still bring benefits, but it is not the only source of noise once photons are involved.
On top of that, the optical system can quite easily limit the scene DR projected onto the sensor to, say, 9 stops. There may also be other factors, e.g. pixel non-uniformity, that are a higher priority to address. There are lots of possibilities for improvement, but some of them may come with diminishing returns as well.