Beyond the dynamic range, there is no tone to dither - it is all black or white.
If 16 bits gives you shadow and highlight detail and colour, then that is a real benefit - even if the MTF res is not so high. To deny this would be like insisting that lens manufacturers specify image circle diameters that give the same res as the center of the lens.
If, in the shadows, 1 pixel in 4 or 10 captures a photon, and the software interpolates that to a smooth shade of some colour, that too is a benefit: a trade-off between res and noise.
I think you are misunderstanding the nature of DR. At base ISO, a typical FF DSLR or MFDB is capturing 40,000 to 80,000 photons (depending on pixel size and efficiency). Now, a 16-bit capture records data to a part in 2^16 = 65536, so let's take a figure in the middle, 60,000 photons; one digital level would then naively seem like a change in illumination by one photon's worth. But it doesn't work that way: the camera electronics have noise in them, and the voltage fluctuations from that noise are indistinguishable from the voltage change due to an increased or decreased signal. The noise causes random fluctuations up or down on top of the signal, and so throws off the count in the raw data, which therefore never perfectly reflects the actual photon count the camera recorded.
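To see why noise swamps fine quantisation, here is a toy simulation (my own sketch, not anyone's actual camera pipeline): a true signal in photons gets Gaussian read noise of 6 photons' worth added, then is quantised by an ADC at a given bit depth. The full-well and noise figures are the rough ones discussed in this thread.

```python
import random

random.seed(0)
full_well = 50_000   # assumed saturation capacity, photons
sigma = 6.0          # assumed read noise, photons' worth

def rms_error(bits, trials=20_000):
    """RMS error between true photon count and the quantised, noisy record."""
    step = full_well / 2**bits          # photons per digital level
    err2 = 0.0
    for _ in range(trials):
        true = random.uniform(0, full_well)
        noisy = true + random.gauss(0, sigma)      # electronic noise
        recorded = round(noisy / step) * step      # ADC quantisation
        err2 += (recorded - true) ** 2
    return (err2 / trials) ** 0.5

# Both come out at about 6 photons: the 6-photon noise dominates,
# so the extra two bits of ADC precision buy essentially nothing.
print(rms_error(14), rms_error(16))
```

Once the ADC step is comparable to or smaller than the noise, total error is pinned at the noise floor, which is the whole point of the argument above.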
One can translate the camera's electronic noise into an 'equivalent photons' count. For the D3x at base ISO, say, it is a tad over 6 photons' worth of noise, with a saturation capacity of a little under 50,000 photons. For the P65+ it seems (estimating from DxO data) that the saturation capacity is also a tad under 50,000 photons, and the electronic noise at base ISO is about 16 photons' worth.
Now let's ask if all those bits are worthwhile. For the D3x, with 14-bit data recording, the precision of the recording is one part in 2^14; the full scale range of 0-50,000 photons is divided up into 2^14 = 16,384 steps, so each step is about 3 photons' worth. In a perfect world, an extra two bits would help and the counts would distinguish individual photons, but since the camera's electronic noise amounts to +/- 6 photons' worth of inaccuracy, 14 is ample (in fact, 13 would do). For the P65+, with 16 photons' worth of inaccuracy, 16/50,000 is more than a part in 2^12, so 12 bits would have been sufficient.
Bit depth is not the same thing as DR; rather, DR bounds the number of bits needed to accurately specify the count delivered by the camera, given the inaccuracy in the count inherent in the camera electronics.
Finally, the pixel DR is but one of a whole host of measures of data quality, so it's not worth obsessing about.