How and at what stage are the voltage variants at each pixel site defined by a 3D color gamut model?
You don't need to demosaic to get RGB responses; a test patch covers a large number of pixels, so you get plenty of samples for each of R, G and B. When you've measured the camera's RGB filter responses, as Trantor has, you don't need to shoot test targets at all: you can feed the camera model with "virtual test patches", including single wavelengths, so you can for example trace the spectral locus (if I understand correctly, Trantor's plots above are the result of spectral locus tracing). Being able to work with virtual test patches is necessary to have a gamut discussion at all, as a reflective test target can never cover any extreme colors.
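To illustrate, here's a minimal sketch of the virtual-patch idea in Python/numpy. The Gaussian sensitivity curves are placeholders I made up; for real work you'd substitute the camera's measured response curves (as Trantor has done):

```python
import numpy as np

# Wavelength grid, 400-700 nm in 5 nm steps.
wavelengths = np.arange(400, 701, 5)

# Placeholder Gaussian curves standing in for real measured sensitivities;
# substitute the camera's actual R, G, B response curves here.
def gauss(mu, sigma):
    return np.exp(-0.5 * ((wavelengths - mu) / sigma) ** 2)

sens = np.stack([gauss(600, 40), gauss(540, 40), gauss(460, 30)])  # rows: R, G, B

def camera_rgb(spectrum):
    """Integrate a stimulus spectrum against the three channel sensitivities."""
    return sens @ spectrum

# Trace the spectral locus: one virtual patch per single wavelength.
locus = []
for i in range(wavelengths.size):
    patch = np.zeros(wavelengths.size)
    patch[i] = 1.0                     # monochromatic stimulus
    rgb = camera_rgb(patch)
    if rgb.sum() > 1e-6:               # skip wavelengths the sensor barely sees
        locus.append(rgb / rgb.sum())  # r,g,b chromaticity, analogous to x,y
locus = np.array(locus)
```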
Printer+paper gamuts are simple for a reason: we can print all the colors a printer can reproduce onto a number of papers, measure them accurately, and get an accurate gamut.
The printer is an output device, so the question is "which colors can it reproduce?"; with the camera the question is "which colors can it register?".
With the camera a problem is that we can't make a physical test chart that covers all colors we can see. If we could, the camera's gamut could be defined as the region covered by all test patches that give us a unique RGB value; in other words, all the colors the camera can distinguish.
With the virtual method we can generate any test patch spectrum. The variations are however infinite, and I don't know if there is a good method to generate only the ones we need to appropriately cover all colors the eye can see. Maybe there is; Trantor may know. If there is, we could do the above, i.e. feed all humanly distinguishable spectra through the measured camera response curves and see how many of them yield a distinct RGB value. Doing it all inside Matlab or other software, we can test millions of virtual patches.
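A sketch of what that brute-force test could look like, continuing the code above. The spectrum-sampling scheme here is made up (random values at a few control wavelengths, interpolated), which is exactly the part where a principled method is missing:

```python
rng = np.random.default_rng(0)
n_patches = 100_000           # in practice you'd run millions

# Made-up sampling scheme: random values at 8 control wavelengths,
# linearly interpolated to the full grid to get smooth-ish spectra.
knot_wl = np.linspace(400, 700, 8)
knots = rng.uniform(0.0, 1.0, (n_patches, 8))
spectra = np.array([np.interp(wavelengths, knot_wl, k) for k in knots])

rgbs = spectra @ sens.T                              # camera RGB per patch
# Patches the camera can't tell apart collapse onto the same quantized value.
q = np.round(rgbs / rgbs.max() * 4095).astype(int)   # ~12-bit ADC steps
n_distinct = len({tuple(row) for row in q})
print(f"{n_distinct} distinguishable RGB triplets from {n_patches} patches")
```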
The next step is, however, that the RGB values need to be mapped to *correct* XYZ positions. This is the job of the profile. A matrix-only profile will typically only succeed in placing low-saturation colors reasonably correctly; the high-saturation colors will be way off, possibly in out-of-human-gamut positions. A LUT profile can make the best of it, but there may be reasons not to optimize solely for XYZ accuracy, as many extreme colors are never seen in a real scene.
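For what it's worth, the matrix-only case is just a 3x3 least-squares fit. A sketch, assuming `train_rgb` and `train_xyz` hold the camera responses and the true XYZ coordinates of a set of training patches:

```python
# Matrix-only profile: solve train_rgb @ M ≈ train_xyz in the least-squares
# sense. `train_rgb` and `train_xyz` are assumed (n_patches, 3) arrays.
M, *_ = np.linalg.lstsq(train_rgb, train_xyz, rcond=None)

def profile_matrix(rgb):
    """Map camera RGB to estimated XYZ through the 3x3 matrix profile."""
    return rgb @ M
```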
When you have a profile that translates only a subset of RGB combinations to correct XYZ coordinates, while many end up in grossly incorrect positions or even outside the human gamut, what then is the gamut of the camera+profile combination? There is no clear definition, and I think one should rephrase the question to something like: how large is the gamut within which this camera+profile combination can reproduce colors with a Delta E smaller than X (where X is quite large, say 10)?
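Measured that way, the "gamut" becomes a coverage number. A sketch using the simple CIE76 Delta E (Euclidean distance in Lab), assuming `test_rgb` and `test_xyz` are held-out test colors:

```python
def xyz_to_lab(xyz, white=np.array([0.9505, 1.0, 1.089])):
    """Standard CIE XYZ -> L*a*b* conversion, D65 white point."""
    t = xyz / white
    f = np.where(t > (6/29)**3, np.cbrt(t), t / (3*(6/29)**2) + 4/29)
    return np.stack([116*f[..., 1] - 16,
                     500*(f[..., 0] - f[..., 1]),
                     200*(f[..., 1] - f[..., 2])], axis=-1)

# Fraction of test colors the profile places within Delta E < 10 of truth.
de = np.linalg.norm(xyz_to_lab(profile_matrix(test_rgb))
                    - xyz_to_lab(test_xyz), axis=-1)
print(f"coverage at Delta E < 10: {(de < 10).mean():.1%}")
```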
It's probably wise to optimize a profile to achieve a good color match within, say, Pointer's gamut and relax the matching of extreme colors.
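One simple way to express that preference in the matrix fit above is a weighted least squares, down-weighting patches outside Pointer's gamut (the `weights` array is assumed, e.g. produced by a point-in-gamut test):

```python
# Weighted fit: weight 1.0 for patches inside Pointer's gamut, a small
# weight for extreme colors we care less about (assumed `weights` array).
w = np.sqrt(weights)[:, None]
M_weighted, *_ = np.linalg.lstsq(train_rgb * w, train_xyz * w, rcond=None)
```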
It's also worth noting that a camera may be able to separate some spectra that the eye can't. Using the human eye as the reference, I guess we should consider those colors invalid, and a LUT profile could merge them to the same XYZ coordinate, but for artistic reasons we may want them to be separated anyway.
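This is easy to demonstrate with a "metameric black": a spectral perturbation lying in the null space of the eye's color matching functions, so the eye sees no change while the camera generally does. Continuing the sketch (the Gaussian CMF stand-ins are made up; the real CIE 1931 tables would go there):

```python
from scipy.linalg import null_space

# Rough Gaussian stand-ins for the CIE 1931 color matching functions;
# load the real tabulated data for actual work.
cmf = np.stack([gauss(595, 35) + 0.35*gauss(445, 20),   # x-bar (two lobes)
                gauss(555, 40),                          # y-bar
                gauss(450, 25)])                         # z-bar

# Any vector in the null space of `cmf` is invisible to the eye.
black = null_space(cmf)[:, 0]

s1 = np.full(wavelengths.size, 0.5)            # flat mid-grey reflectance
s2 = s1 + 0.45 * black / np.abs(black).max()   # stays a valid reflectance

print(cmf @ s1 - cmf @ s2)    # ~0: identical XYZ, the eye can't separate them
print(sens @ s1 - sens @ s2)  # generally nonzero: the camera can
```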
All these issues are the reason some say "cameras have no gamut". With a printer you just need a high-quality ICC profile and a profile viewer, and you'll see what colors it can correctly reproduce. Looking at a camera ICC or DNG profile, you cannot see what colors it can accurately capture.