Indeed, and a good point to reinforce. But if we look at the distribution of levels in a linear-encoded file, we see half of all the data in the first stop of highlights and, at the other end, the smallest number of levels. The question is, how do we describe in a sentence or two the relationship between those fewer levels in the last stop, the noise that results there, and a higher or lower S/N ratio? In the classic example shown on this site, originally by Bruce Fraser, the last stop of shadow detail had 16 levels (as opposed to 2048 in the first). It's a good point that no matter the exposure, it's still 16 levels; how do we define the relationship and effect of these fewer levels, noise and S/N?
Maximizing the number of levels, and increasing S/N, often point in the same direction -- both go up as the exposure increases. But the details differ, and those differences can be important in some situations. Here is the base ISO number of distinguishable tones per channel, based on S/N, for the P65+ (continuing Bill's example), together with the number of raw levels (the calculated values include the effects of read noise as well as photon count fluctuations, aka "photon noise"):
top stop = 166 distinguishable tones, 32768 levels
2nd stop = 117 distinguishable tones, 16384 levels
3rd stop = 82 distinguishable tones, 8192 levels
4th stop = 57 distinguishable tones, 4096 levels
5th stop = 39 distinguishable tones, 2048 levels
6th stop = 26 distinguishable tones, 1024 levels
7th stop = 17 distinguishable tones, 512 levels
8th stop = 10 distinguishable tones, 256 levels
9th stop = 6 distinguishable tones, 128 levels
10th stop = 3 distinguishable tones, 64 levels
11th stop = 2 distinguishable tones, 32 levels
12th stop = 0.9 distinguishable tones, 16 levels
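For anyone who wants to play with numbers like these, here is a rough sketch of how such a list could be generated. It assumes a 16-bit linear encoding (so each stop down halves the level count) and a simple noise model N(S) = sqrt(read_noise^2 + S) in electrons, with tones counted as the integral of dS/N(S). The function names and the full-well and read-noise values are my own placeholders, not measured P65+ figures, so the output will not reproduce the list above exactly.

```python
import math

def levels_in_stop(stop, bits=16):
    """Raw levels in one stop of a linear encoding (stop 1 = brightest)."""
    return 2 ** (bits - stop)

def tones_in_stop(stop, full_well_e, read_noise_e):
    """Approximate number of distinguishable tones in one stop.

    Assumes noise N(S) = sqrt(read_noise^2 + S), with S in electrons,
    and counts tones as the integral of dS / N(S) over the stop.
    That integral has the closed form 2 * sqrt(read_noise^2 + S).
    """
    hi = full_well_e / 2 ** (stop - 1)   # upper edge of this stop
    lo = hi / 2                          # lower edge of this stop
    r2 = read_noise_e ** 2
    return 2 * (math.sqrt(r2 + hi) - math.sqrt(r2 + lo))

# Hypothetical sensor parameters, chosen only for illustration.
FULL_WELL_E = 50_000
READ_NOISE_E = 15

for stop in range(1, 13):
    levels = levels_in_stop(stop)
    tones = tones_in_stop(stop, FULL_WELL_E, READ_NOISE_E)
    print(f"stop {stop:2d}: {tones:6.1f} tones, {levels:5d} levels")
```

With read noise set to zero, the tone count drops by a factor of sqrt(2) per stop, while the level count always drops by a factor of 2; read noise makes the tone count fall off faster in the deep shadows.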
So yes, they both trend upward as one increases the exposure; both imply ETTR for fixed ISO. But if one really subscribed to the "it's the number of levels that's important" mantra, one would be led to incorrect conclusions. For instance, suppose your base ISO exposure doesn't reach the top stop; do you do better by increasing the exposure a stop (e.g. by doubling the exposure time), or by doubling the ISO? If you double the ISO, you don't change the S/N, because S/N on this camera is entirely determined by exposure. You do, however, double the number of levels, because the histogram is pushed to the right where the levels are denser. On the other hand, if you double the exposure, you again double the number of levels taken by a given patch of the image, but you also increase the number of distinguishable tones by a factor of ~1.4 according to the above. So the "#levels mantra" leads you to think you do just as well by raising the ISO at fixed exposure, whereas you actually only do better by increasing the exposure.
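To make the comparison concrete, here is a toy calculation. It assumes the read noise, measured in electrons, does not change with ISO (which is what "S/N is entirely determined by exposure" amounts to), and that doubling the ISO simply doubles the raw value recorded per electron. The function names and the numbers fed in are hypothetical.

```python
import math

def snr(signal_e, read_noise_e):
    """S/N for a signal in electrons, with photon noise plus read noise."""
    return signal_e / math.sqrt(read_noise_e ** 2 + signal_e)

def raw_span(lo_e, hi_e, adu_per_e):
    """Raw levels spanned by a patch covering lo_e..hi_e electrons."""
    return (hi_e - lo_e) * adu_per_e

def compare(lo_e, hi_e, adu_per_e, read_noise_e):
    """Return (levels spanned, S/N at mid-patch) for three scenarios."""
    mid_e = (lo_e + hi_e) / 2
    return {
        "base": (raw_span(lo_e, hi_e, adu_per_e),
                 snr(mid_e, read_noise_e)),
        # Doubling ISO: same electrons, twice the raw units per electron.
        "double_iso": (raw_span(lo_e, hi_e, 2 * adu_per_e),
                       snr(mid_e, read_noise_e)),
        # Doubling exposure: twice the electrons, same conversion.
        "double_exposure": (raw_span(2 * lo_e, 2 * hi_e, adu_per_e),
                            snr(2 * mid_e, read_noise_e)),
    }

# Hypothetical patch: 1000..1200 electrons, 0.5 raw units per electron,
# 15 electrons of read noise.
for name, (levels, s_n) in compare(1000, 1200, 0.5, 15).items():
    print(f"{name:16s} levels spanned = {levels:6.1f}   S/N = {s_n:5.1f}")
```

In this model both changes double the levels spanned by the patch, but only the extra exposure raises S/N. The gain is a factor of about 1.4 where photon noise dominates and approaches 2 in the deep shadows where read noise dominates.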
As a side note, on a camera such as the P65+, raising the ISO never increases the number of distinguishable tones; as far as the quality of the raw data goes, raising the ISO accomplishes nothing except reducing the available highlight headroom (and therefore the room for increasing exposure, should you be able to do so without compromising DoF or motion blur requirements). (Caveat: the raw data is no better, but your raw converter may treat it differently, since profiles often read the ISO and change the conversion accordingly, e.g. by applying more noise reduction at higher ISO.)
It is really quite remarkable how few distinguishable tones there really are in an image file (note that the above is per channel; modulo subtleties having to do with white balance, color filter response and output color gamut, the number of distinguishable colors is roughly the cube of the above numbers). The vast majority of all those wonderful raw values in the upper zones are wasted in quantizing noise, and not adding anything to image quality.
In the original ETTR article here, Ian Lyons is quoted as saying: "The ideal exposure ensures that you have maximum number of levels describing your image without losing important detail in the highlights. The closer you get to this ideal then the more of those levels are being used to describe your shadows." How about working on a sentence that takes all of the above into account and explains the relationship and effect of these fewer levels, noise and S/N?
Replace "levels" by "distinguishable tones" and you have a correct statement. Roughly, two tones S1 and S2 are distinguishable if |S2-S1|>N where the noise is N. If S1 and S2 differ by less than the noise, you don't know whether they are different tones, or the same tone pushed to different values by noise fluctuations.
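The criterion can be turned into a crude counting procedure: start at the bottom of a range and keep stepping up by the local noise, counting each step as one tone. The sketch below does that, assuming the same sort of noise model (photon noise plus a fixed read noise, in electrons); the function names are mine, and stepping by exactly N is the boundary case of the criterion, so treat the result as a rough count.

```python
import math

def distinguishable(s1, s2, noise):
    """True if two tones differ by more than the noise."""
    return abs(s2 - s1) > noise

def count_tones(lo_e, hi_e, read_noise_e):
    """Count tones in lo_e..hi_e spaced one local noise width apart.

    Assumes noise N(S) = sqrt(read_noise^2 + S), with S in electrons.
    """
    count = 0
    s = lo_e
    while s < hi_e:
        step = math.sqrt(read_noise_e ** 2 + s)
        if step <= 0:
            raise ValueError("noise must be positive to step through the range")
        count += 1
        s += step
    return count

print(distinguishable(100, 105, 10))   # differ by less than the noise
print(distinguishable(100, 120, 10))   # differ by more than the noise
print(count_tones(100, 200, 0))        # photon noise only
```

A patch spanning 100 to 200 electrons covers 100 possible values, but with photon noise alone only about nine of them are separated by a full noise width.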