Ok, I understand the technique (it behaves like having more transistors in parallel in the input stage of an amp); what I don't understand is where they assume the multiple frames come from.
I see good evidence for temporal averaging in the DXO data (as do the folks at DXO):
1. The total SNR graphs go up to about 50 dB (at minimum ISO exposure index, 100% illumination). With just photon shot noise, where the SNR is the square root of the photon count, that requires counting at least 100,000 photons (the photon count needed to achieve a given SNR in dB, SNRdB, is 10^(SNRdB/10)).
That is about 4000 photon counts per square micron of photosite area, more than twice what is typical and almost three times what I see measured for the D800, which has a similar photosite size.
That can only be achieved by some combination of (a) deeper wells, to actually count 4000 photo-electrons per square micron, and (b) combining the photo-electron counts from several frames (temporal averaging).
2. The base ISO speed Ssat (what DXO calls "ISO") is 104, comparable to and indeed a bit higher than cameras like the D800 [Ssat=75 for the D800].
3. If there were no temporal averaging, having over double the well depth of the D800 and also this higher Ssat would require almost tripling the quantum efficiency. (Doubling well depth at equal QE would halve the base ISO speed Ssat.)
4. Sensors like that of the D800 are close to the maximum possible QE for a Bayer CFA sensor, and triple the QE of the D800 would be beyond 100%: it seems impossible that RED could have increased the QE of its Bayer CFA sensors by nearly enough to explain the DXO measurements.
By process of elimination, temporal averaging seems almost certain.
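For anyone who wants to check the arithmetic in points 1 to 3, here is a minimal Python sketch. It assumes pure photon shot noise (SNR = sqrt(N)) and a simple proportionality in which Ssat scales with QE divided by well depth per unit area, which is how I read the parenthetical in point 3. The function names are my own, the 25 square micron photosite area is only what the 100,000 and 4000 figures above imply, and the well-depth ratio of 2.0 is the lower bound taken from "over double"; none of these are measured values.

```python
import math


def photons_for_snr_db(snr_db: float) -> float:
    """Photon count N needed for a shot-noise-limited SNR given in dB.

    With SNR = sqrt(N), SNR_dB = 20*log10(sqrt(N)) = 10*log10(N).
    """
    return 10 ** (snr_db / 10)


def snr_db_for_photons(n_photons: float) -> float:
    """Inverse of photons_for_snr_db."""
    return 10 * math.log10(n_photons)


def required_qe_ratio(well_depth_ratio: float, ssat_ratio: float) -> float:
    """QE increase needed WITHOUT temporal averaging.

    Assumes Ssat is proportional to QE / (well depth per unit area),
    so QE must scale with both the well-depth ratio and the Ssat ratio.
    """
    return well_depth_ratio * ssat_ratio


# Point 1: about 50 dB total SNR.
photons = photons_for_snr_db(50.0)

# Assumed photosite area (square microns), implied by 100,000 / 4000.
ASSUMED_AREA_UM2 = 25.0
photons_per_um2 = photons / ASSUMED_AREA_UM2

# Points 2 and 3: Ssat 104 versus 75 for the D800, well depth "over double".
qe_ratio = required_qe_ratio(well_depth_ratio=2.0, ssat_ratio=104 / 75)

print(f"photons needed: {photons:.0f}")
print(f"photons per square micron: {photons_per_um2:.0f}")
print(f"QE ratio needed without averaging: at least {qe_ratio:.2f}")
```

With a well-depth ratio of exactly 2 the required QE ratio already comes out between 2.7 and 2.8, which is the "almost tripling" in point 3; any larger well-depth ratio only makes it worse.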
Arguably, this still gives a legitimate advantage for exposures when light is abundant: effectively, the light used to form each frame is gathered over a longer time than sensor saturation normally allows, so it is almost equivalent to increasing the well depth and decreasing the base ISO speed Ssat by a factor of more than two. And RED does this with only about 1/24 s between frames thanks to its fast rolling shutter, so it handles subject and camera motion better than frame averaging with a normal still camera, which would need more time between the frames.
So a stills photographer might get some advantage when longish total exposure times are acceptable, in exchange for paying about twenty times as much as for a stills camera with the same sensor size.
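To put rough numbers on the frame-combining idea, here is a small Python sketch, again assuming shot-noise-limited frames so that summing k frames multiplies the photon count by k and raises the SNR by 10*log10(k) dB. The per-frame photon capacity used in the example is purely hypothetical (I do not know the real figure), and the helper names are mine.

```python
import math


def snr_gain_db(num_frames: int) -> float:
    """SNR improvement from summing num_frames shot-noise-limited frames."""
    return 10 * math.log10(num_frames)


def frames_needed(target_photons: float, per_frame_photons: float) -> int:
    """Smallest number of frames whose summed count reaches the target."""
    return math.ceil(target_photons / per_frame_photons)


def capture_span_s(num_frames: int, frame_interval_s: float = 1 / 24) -> float:
    """Time from the start of the first frame to the start of the last,
    assuming a fixed interval between frames (about 1/24 s here)."""
    return (num_frames - 1) * frame_interval_s


# Hypothetical example: 100,000 photons wanted, 40,000 per single frame.
k = frames_needed(target_photons=100_000, per_frame_photons=40_000)
print(f"frames: {k}, SNR gain: {snr_gain_db(k):.2f} dB, "
      f"span: {capture_span_s(k):.3f} s")
```

Two frames give about 3 dB, which is the same as doubling the well depth; the cost is the extra capture time during which the subject or camera can move.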