If sensors were the best at capturing highlights up to the clipping point, you'd be able to pull almost infinite high-quality detail out of the highlights just before they clip. I'm not an engineer, but that is definitely not what I observe. The bits are there, the details are not.
Almost infinite? I think you mean, 'the maximum detail the system can deliver in the shooting circumstances', don't you?
As I understand, the response of digital imagers is quite different to film in the sense that highlights in film undergo significant compression before total clipping, commonly known as a 'shoulder'.
Digital sensors have a much narrower shoulder. Within half a stop or so, it seems, you can go from a situation of full, uncompressed detail in the highlights to totally blown highlights. There's a much sharper cut-off which presents a major problem for ETTR. It's clearly better to be a 1/2 stop under the correct exposure for ETTR than a 1/2 stop over, if preserving those highlight details is important.
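To put rough numbers on the shoulder idea, here's a toy Python sketch. Everything in it is an assumption made for illustration: the clip level is normalized to 1.0, the "sensor" is a perfectly linear response with a hard clip, and the "film" curve is just a simple exponential roll-off, not a measured film characteristic. It compares how much separation survives between two highlight tones 1/3 stop apart when the exposure lands half a stop under or over the ideal ETTR point.

```python
import math

CLIP = 1.0  # assumed normalized saturation level, not a real sensor value


def sensor_response(exposure):
    """Idealized digital sensor: linear up to CLIP, then a hard cut-off."""
    return min(exposure, CLIP)


def film_like_response(exposure):
    """Toy 'shoulder': rolls off gradually toward CLIP (illustrative only)."""
    return CLIP * (1.0 - math.exp(-exposure / CLIP))


def stops_to_exposure(stops):
    """Exposure relative to the clip point; 0 stops lands exactly on CLIP."""
    return CLIP * 2.0 ** stops


def highlight_separation(response, shift_stops, gap_stops=1 / 3):
    """Recorded difference between the brightest tone and one gap_stops below.

    shift_stops is the exposure error relative to ideal ETTR:
    negative means under, positive means over.
    """
    bright = response(stops_to_exposure(shift_stops))
    dim = response(stops_to_exposure(shift_stops - gap_stops))
    return bright - dim


for shift in (-0.5, 0.5):
    print(
        f"{shift:+.1f} stop: "
        f"sensor {highlight_separation(sensor_response, shift):.3f}, "
        f"film-like {highlight_separation(film_like_response, shift):.3f}"
    )
```

In this toy model the hard-clipped response keeps the two tones apart at half a stop under, but records them as identical at half a stop over, while the roll-off curve still keeps some (compressed) separation.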
However, what I've just written is an oversimplification (how could it be otherwise? I'm not even sure I know what I'm talking about). There's another issue relevant here, which is addressed in another current thread, 'expanding dynamic range'. It is unlikely that all 3 channels in a digital sensor are going to 'blow out' at the same point. The red channel might blow out first, followed by the blue channel, leaving the green channel as pure luminance. It seems there is no way around this, other than to use the right type of filter in front of the lens and do a 'custom WB' before taking the shot.
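The channel-by-channel point can be sketched the same way. This is only an illustration under made-up assumptions: the pixel values are hypothetical (a reddish highlight), the clip level is normalized to 1.0, and each channel is treated as linear with a hard clip. Which channel actually goes first on a real camera depends on the sensor, the light and the subject.

```python
CLIP = 1.0  # assumed normalized per-channel saturation level


def expose(rgb, stops):
    """Scale a linear RGB triple by `stops` of exposure, hard-clipping each channel."""
    gain = 2.0 ** stops
    return tuple(min(value * gain, CLIP) for value in rgb)


def clipped_channels(rgb, clip=CLIP):
    """Names of the channels that have reached the clip level."""
    return [name for name, value in zip(("R", "G", "B"), rgb) if value >= clip]


# Hypothetical reddish highlight, just below clipping in every channel.
pixel = (0.9, 0.5, 0.7)

for stops in (0.0, 0.25, 0.75, 1.25):
    print(f"+{stops:.2f} stop: clipped {clipped_channels(expose(pixel, stops))}")
```

With these particular numbers red clips first, then blue, and green holds out longest, so for a stretch of exposure the only channel still carrying any detail is green.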
Now, just how precise do you want to be in your photography?