Regarding the creaminess of images, I'm thinking about two effects, which may actually contradict each other.
1) An MF back has a larger sensor area, so it collects more photons. This would reduce shot noise relative to the signal, and shot noise is what dominates in the highlights.
2) MF backs seem to be adjusted to underexpose a bit, according to some interpretations of DxO measurements. That would reduce clipping in the highlights, but also increase relative shot noise.
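To see how these two effects pull against each other, here is a rough sketch. It assumes shot-noise-limited highlights, the same exposure per unit area on both sensors, and images compared at the same output size, so that SNR scales with the square root of the total photon count. The sensor dimensions are assumptions for illustration, not the specs of any particular back.

```python
import math

# Assumed sensor dimensions in mm (illustrative only; real backs vary).
FULL_FRAME_MM = (36.0, 24.0)
MF_645_MM = (53.7, 40.4)


def area(dims):
    """Sensor area in mm^2."""
    width, height = dims
    return width * height


def shot_noise_snr_gain_stops(area_ratio, underexposure_stops):
    """Change in whole-image shot-noise SNR, in stops, versus the reference sensor.

    Assumes the same photons per unit area at the same exposure, so total
    photons scale with area; underexposing by k stops divides them by 2**k.
    Shot-noise-limited SNR is sqrt(total photons).
    """
    photon_ratio = area_ratio / (2.0 ** underexposure_stops)
    return math.log2(math.sqrt(photon_ratio))


if __name__ == "__main__":
    ratio = area(MF_645_MM) / area(FULL_FRAME_MM)
    for under in (0.0, 0.5, 1.0):
        gain = shot_noise_snr_gain_stops(ratio, under)
        print(f"area ratio {ratio:.2f}, {under} stop under: SNR gain {gain:+.2f} stops")
```

With these assumed dimensions the area ratio is about 2.5, worth roughly two-thirds of a stop of highlight SNR; a full stop of underexposure would eat most of that, leaving about a sixth of a stop.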
I agree on (1): dynamic range measurements relate only to parts of the scene getting so little light that sensor dark noise roughly matches or exceeds photon shot noise. They tell us nothing about SNR in better-lit parts of the scene, where photon counts are what matters.
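A minimal sketch of this point, assuming a simple per-photosite model in which photon shot noise (√S) and dark/read noise add in quadrature. The full-well and read-noise figures are hypothetical. Read noise changes the DR figure and the shadow SNR a lot, but barely moves SNR once the signal is far above the read noise squared.

```python
import math


def snr(signal_e, read_noise_e):
    """Per-photosite SNR: signal over shot noise sqrt(S) and read noise in quadrature."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)


def dynamic_range_stops(full_well_e, read_noise_e):
    """Engineering DR, log2(full well / read noise): set only by the dark end."""
    return math.log2(full_well_e / read_noise_e)


# Hypothetical sensors: same full well, different dark (read) noise.
FULL_WELL = 60000.0

if __name__ == "__main__":
    for read_noise in (3.0, 12.0):
        dr = dynamic_range_stops(FULL_WELL, read_noise)
        print(f"read noise {read_noise:4.1f} e-: DR {dr:.1f} stops")
        for signal in (10.0, 100.0, 10000.0, 40000.0):
            print(f"  S={signal:7.0f} e-  SNR={snr(signal, read_noise):7.1f}"
                  f"  shot-only={math.sqrt(signal):7.1f}")
```

In the highlight rows the two sensors give almost the same SNR, close to the shot-only value, even though their DR figures differ by two stops.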
On (2), be careful: what you, DxO and many others call underexposure is in fact simply a valid choice to position metered midtones in the raw files further below the maximum level, while still placing them at higher numerical levels in the 16-bit format than a DSLR with 14-bit ADC output does. For example, where a DSLR with 14-bit output might place the midtones at level 2,000, about three stops below its maximum level of 16,383, a DMF back with 16-bit output might place the same midtones at level 4,000, about four stops below its maximum level of 65,535.
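The arithmetic behind this kind of example, as a sketch: a level sits log2(max/level) stops below full scale, where max = 2^bits − 1. The midtone levels used here are illustrative assumptions. The last function counts the distinct ADC codes in the stop just below the midtone, which is where the higher numerical placement in the 16-bit file shows up.

```python
import math


def full_scale(bits):
    """Largest raw value an unsigned ADC of the given bit depth can output."""
    return 2 ** bits - 1


def stops_below_max(level, bits):
    """How many stops a raw level sits below full scale."""
    return math.log2(full_scale(bits) / level)


def codes_in_stop_below(level):
    """Distinct integer codes from level/2 up to (but not including) level."""
    return level - level // 2


if __name__ == "__main__":
    # Illustrative midtone placements (assumed values).
    for label, level, bits in (("14-bit DSLR", 2000, 14), ("16-bit back", 4000, 16)):
        print(f"{label}: level {level} is {stops_below_max(level, bits):.2f} stops "
              f"below {full_scale(bits)}, {codes_in_stop_below(level)} codes in the stop below")
```

So the 16-bit placement is a stop further from clipping and still has twice as many codes to describe the tones just below the midtone.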
So long as the default raw-to-JPEG conversion knows about this, it can map these respective ADC levels to an appropriate gamma-encoded JPEG level, say 118, and this is no evidence of underexposure: it is just a different decision about how to use ADC levels to encode the information. And with the DR well under 16 stops, a midtone placement four or even five stops below the maximum of 2^16 − 1 = 65,535 still leaves the dark noise floor above the quantization noise level, so this choice of quantization does not add significantly to the noise in the digital signal.
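Two checks on this paragraph, sketched under stated assumptions. First, a hypothetical default conversion (`raw_to_jpeg_level` is an illustrative name) scales raw data so the metered midtone lands at 18% linear and then applies the sRGB transfer curve, which stands in here for the "gamma scaling"; a midtone at either bit depth comes out at 8-bit level 118. Second, with DR taken as log2(max / noise floor), the dark noise floor in ADC units is compared with the RMS quantization noise of 1/√12 LSB; the 13-stop DR is an assumed figure.

```python
import math


def srgb_encode(linear):
    """sRGB transfer function (IEC 61966-2-1) for a linear value in [0, 1]."""
    if linear <= 0.0031308:
        return 12.92 * linear
    return 1.055 * linear ** (1 / 2.4) - 0.055


def raw_to_jpeg_level(raw_level, midtone_level, midtone_linear=0.18):
    """Hypothetical conversion: midtone -> 18% linear, then sRGB, then 8 bits."""
    linear = min(1.0, raw_level / midtone_level * midtone_linear)
    return round(255 * srgb_encode(linear))


QUANT_NOISE_LSB = 1 / math.sqrt(12)  # RMS error of rounding to the nearest code


def noise_floor_lsb(bits, dr_stops):
    """Dark noise floor in ADC units, if DR = log2(full scale / noise floor)."""
    return (2 ** bits - 1) / 2 ** dr_stops


def quantization_penalty(floor_lsb):
    """Fractional increase in total noise when quantization noise is added in quadrature."""
    return math.hypot(floor_lsb, QUANT_NOISE_LSB) / floor_lsb - 1


if __name__ == "__main__":
    print(raw_to_jpeg_level(2000, 2000), raw_to_jpeg_level(4000, 4000))
    floor = noise_floor_lsb(16, 13)
    print(f"noise floor {floor:.2f} LSB, penalty {100 * quantization_penalty(floor):.3f}%")
```

With a noise floor of about 8 LSB, adding quantization noise raises the total noise by well under a tenth of a percent.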
As pointed out in other threads, some "ISO-less sensors" could always use the same analog gain, so that each one-stop increase in the exposure index setting would halve the exposure level and so halve the ADC level of the midtones, and then the "DxO sensitivity" would be the same for all settings! And that would be right in the sense that these would simply be measurements of the base sensitivity of the sensor itself: the minimum safely usable exposure index to avoid blown highlights due to overfull photosites. DxO's calibration of the exposure index settings on cameras ("ISO settings") misuses an ISO 12232 definition that is intended to measure that base sensitivity of a sensor ("base ISO speed"), which refers to exposure levels that saturate the photosites.
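A sketch of the ISO-less argument, assuming a hypothetical camera whose analog gain never changes and whose raw output saturates at a fixed focal-plane exposure (the 0.78 lux·s value and the function `isoless_midtone_level` are illustrative). It uses the ISO 12232 saturation-based speed formula S_sat = 78 / H_sat, with H_sat in lux-seconds. Because nothing about saturation changes with the exposure index setting, the computed speed is the same at every setting, while the midtone level halves per stop.

```python
def saturation_speed(h_sat_lux_s):
    """ISO 12232 saturation-based speed: S_sat = 78 / H_sat (H_sat in lux-seconds)."""
    return 78.0 / h_sat_lux_s


def isoless_midtone_level(ei, base_ei, base_midtone_level):
    """Hypothetical ISO-less camera: fixed analog gain, so each extra stop of
    exposure index halves the metered exposure and thus the midtone ADC level."""
    return base_midtone_level * base_ei / ei


# Assumed: saturation exposure is fixed because the analog gain never changes.
H_SAT = 0.78

if __name__ == "__main__":
    for ei in (100, 200, 400, 800):
        level = isoless_midtone_level(ei, base_ei=100, base_midtone_level=4000)
        print(f"EI {ei:4d}: midtone level {level:7.1f}, "
              f"saturation-based speed {saturation_speed(H_SAT):.0f}")
```

Every row reports the same saturation-based speed, which measures the sensor's base sensitivity rather than the camera's exposure index setting.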