The inclusion/exclusion of highlights is not a part of ETTR.
Sure it is. As a matter of fact, including highlights and not excluding them is the essence of ETTR. That's where ETTR got its name from.
ETTR essentially says that it is preferable to maximize the number of photons detected by the sensor.
You really want to revisit what ETTR actually is. Hint: Maximizing the number of photons detected by the sensor it is not. After all, a capture over-exposed by several f-stops will always detect far more photons than a properly exposed one.
... and ETTR minimizes noise (it is all it does).
Well, that's my point: it is not all it does.
Color separation is in my view only a function of the spectral characteristics of the color grid array and the color transformation matrix, both of which are independent of exposure.
Your view is wrong.
Simply consider the extremes. The blackest black that an RGB image can produce is RGB(0, 0, 0). No colour separation at all; one colour only at this tone level: black. The same at RGB(255, 255, 255)—only one single colour, white, at the maximum tone level; no variations possible (that's for 8 bits per RGB channel—for higher bit depths adjust the numbers accordingly but the principle will remain the same). The most saturated red, for example, would be RGB(255, 0, 0)—that's a tone much brighter than black but also much darker than white. Colours that are as bright as RGB(255, 255, 255) but at the same time as saturated as RGB(255, 0, 0) simply cannot exist in an RGB system where each channel's range of values is finite. So the variation of possible colours is widest at medium tone levels and narrowest at extremely low or extremely high tone levels. To complicate matters even more, RGB(0, 255, 0) is not the same brightness level as RGB(255, 0, 0), and RGB(0, 0, 255) is yet another brightness level.
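To put rough numbers on this, here is a minimal Python sketch. It rests on two assumptions that are mine, not part of the argument above: the "tone level" is approximated by the plain channel sum R + G + B, and brightness is computed with the Rec. 709 luminance weights (which presumes sRGB/Rec. 709 primaries). The function names are made up for illustration.

```python
from collections import Counter

LEVELS = 256  # 8 bits per channel


def colours_per_tone(levels=LEVELS):
    """Count the distinct RGB triples for every channel sum R + G + B.

    The channel sum is only a crude stand-in for 'tone level' (assumption).
    """
    # First count the (R, G) pairs for each partial sum ...
    pair_counts = Counter()
    for r in range(levels):
        for g in range(levels):
            pair_counts[r + g] += 1
    # ... then add the B channel on top of every partial sum.
    counts = Counter()
    for partial, n in pair_counts.items():
        for b in range(levels):
            counts[partial + b] += n
    return counts


# Rec. 709 luminance weights for R, G, B (assumes sRGB / Rec. 709 primaries).
WEIGHTS = (0.2126, 0.7152, 0.0722)


def srgb_to_linear(value, levels=LEVELS):
    """Undo the sRGB transfer curve for one channel value."""
    c = value / (levels - 1)
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4


def luminance(r, g, b):
    """Relative luminance (0..1) of an sRGB colour."""
    return sum(w * srgb_to_linear(v) for w, v in zip(WEIGHTS, (r, g, b)))


counts = colours_per_tone()
widest = max(counts, key=counts.get)
print("colours at the darkest tone level:", counts[0])
print("colours at the brightest tone level:", counts[3 * (LEVELS - 1)])
print("tone level with the most colours:", widest, "with", counts[widest], "colours")

for name, rgb in (("red", (255, 0, 0)), ("green", (0, 255, 0)), ("blue", (0, 0, 255))):
    print(name, "luminance:", round(luminance(*rgb), 4))
```

Under these assumptions, exactly one colour exists at each extreme, the count peaks in the middle of the range, and the three fully saturated primaries come out at three different luminance values, with green the brightest and blue the darkest.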
Of course, things still aren't as simple as that. First, you're right in saying that noise will hurt colour separation, so avoiding noise is basically a good thing. That's why I said: put your histogram's peak half-way between centre and right when subject contrast permits and colours are important. Second, a camera's RGB channels usually won't all clip at the same brightness level. That further complicates matters and is another reason not to push your histogram to the far right when you don't have to.
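The point about channels clipping at different levels can be illustrated with a small Python sketch. The per-channel values below are made up (think of a strongly red subject), the saturation level of 255 is an assumption for 8-bit data, and `headroom_stops` is a hypothetical helper, not something a camera or raw converter provides.

```python
import math


def headroom_stops(channel_max, saturation=255):
    """Stops of extra exposure each channel can take before it saturates."""
    return {
        name: math.log2(saturation / value)
        for name, value in channel_max.items()
    }


# Hypothetical brightest values of one subject, per channel (made-up numbers).
brightest = {"R": 240, "G": 120, "B": 60}

stops = headroom_stops(brightest)
first_to_clip = min(stops, key=stops.get)

for name, value in stops.items():
    print(f"{name}: {value:.2f} stops of headroom")
print("first channel to clip:", first_to_clip)
```

With these made-up numbers the red channel has almost no headroom left while green and blue still have one and two stops more, so a histogram that shows only overall brightness could look safe while red is about to clip.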
So—ETTR sure is a useful rule of thumb generally but still needs some consideration. It is not the gold standard for all situations and circumstances. In most cases ETTR is a good idea but sometimes you must expose beyond ETTR, and sometimes it's better to back off from ETTR by one stop or two. It's just the same as everywhere else in real life: Simple rules aren't.