Digital raw capture has nothing to do with sRGB, which is based on a theoretical emissive display (a CRT) circa 1994 or so. Digital cameras don't even have a color gamut! What the human observer has is a set of color matching functions: a color matching function is a mathematical representation of the amounts of three standard monochromatic RGB primaries needed to match a monochromatic observed color at each measured wavelength. Cameras don't have primaries, they have spectral sensitivities, and the difference is important because a camera can capture all sorts of different primaries. Two different primaries may be captured as the same values by a camera, and the same primary may be captured as two different values (if the spectral power distributions of the primaries differ).

A camera can capture and encode some colors as unique values that are imaginary (not visible) to us. There are also colors we can see that the camera can't capture, which are imaginary to it. Most of the colors the camera can "see" we can see as well. Some cameras can even capture "colors" outside the spectral locus, though every attempt is usually made to filter those out. Most important is the fact that cameras "see" colors inside the spectral locus differently than humans do. I know of no shipping camera that meets the Luther-Ives condition, which means that cameras exhibit significant observer metameric failure compared to humans.

The camera color space differs from a common working color space in that it does not have a unique one-to-one transform to and from CIE XYZ. This is because the camera has different color filters than the human eye, and thus "sees" colors differently. Any translation from camera color space to CIE XYZ is therefore an approximation.
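The Luther-Ives condition can be stated concretely: the camera's spectral sensitivities must be an exact linear (3x3) combination of the observer's color matching functions. A rough sketch of how one might test this, using invented placeholder curves (real data would be measured per wavelength):

```python
import numpy as np

# Hypothetical check of the Luther-Ives condition. A camera satisfies
# it only if its spectral sensitivities are an exact linear combination
# (a 3x3 matrix) of the CIE color matching functions. All curves below
# are synthetic stand-ins for illustration.
wavelengths = np.arange(400, 701, 10)          # nm, 31 samples
n = len(wavelengths)
rng = np.random.default_rng(0)

# Placeholder "CMFs" and "camera sensitivities" (n x 3 each).
cmf = np.abs(rng.normal(size=(n, 3)))          # stand-in for CIE xbar, ybar, zbar
camera = cmf @ np.array([[0.90, 0.10, 0.00],
                         [0.05, 0.80, 0.10],
                         [0.00, 0.20, 0.70]])
camera += 0.05 * np.abs(rng.normal(size=(n, 3)))   # real sensors deviate

# Best 3x3 matrix M mapping CMFs -> camera responses (least squares).
M, _, _, _ = np.linalg.lstsq(cmf, camera, rcond=None)
fit = cmf @ M
rms_error = np.sqrt(np.mean((fit - camera) ** 2))

# A nonzero residual means the Luther-Ives condition fails: some
# spectra that match for a human observer will not match for the
# camera, and vice versa (observer metameric failure).
print(f"RMS deviation from Luther-Ives: {rms_error:.4f}")
```

The residual is the whole story: if it were exactly zero, the camera would see color the way the standard observer does; in practice it never is.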
The point is that if you think of camera primaries you can come to many incorrect conclusions because cameras capture spectrally. On the other hand, displays create colors using primaries. Primaries are defined colorimetrically so any color space defined using primaries is colorimetric. Native (raw) camera color spaces are almost never colorimetric, and therefore cannot be defined using primaries. Therefore, the measured pixel values don't even produce a gamut until they're mapped into a particular RGB space. Before then, *all* colors are (by definition) possible.
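A tiny numeric sketch of that last point. The matrices below are invented for illustration (real ones come from profiling a specific camera), but they show that one and the same raw triplet lands on different colors depending on which mapping you apply, so the raw numbers alone define no gamut:

```python
import numpy as np

# The same linear raw RGB triplet, mapped into two different
# (hypothetical) colorimetric RGB spaces. Until one of these mappings
# is chosen, the raw values have no fixed colorimetric meaning.
raw = np.array([0.42, 0.35, 0.18])   # hypothetical linear raw values

to_space_a = np.array([[ 1.8, -0.6, -0.2],
                       [-0.3,  1.5, -0.2],
                       [ 0.0, -0.4,  1.4]])
to_space_b = np.array([[ 2.1, -0.9, -0.2],
                       [-0.4,  1.7, -0.3],
                       [ 0.1, -0.6,  1.5]])

rgb_a = to_space_a @ raw
rgb_b = to_space_b @ raw

# Different destination space, different color for identical raw data.
print(rgb_a, rgb_b)
```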
Raw image data is in some native camera color space, but it is not a colorimetric color space, and has no single “correct” relationship to colorimetry. The same thing could be said about a color film negative.
Someone has to make a choice of how to convert values in non-colorimetric color spaces to colorimetric ones. There are better and worse choices, but no single correct conversion (unless the “scene” you are photographing has only three independent colorants, like when we scan film).
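One common way that choice gets made is a least-squares fit of a 3x3 matrix from camera RGB to CIE XYZ over a set of training patches. The sketch below uses entirely synthetic patch data (and a made-up nonlinearity standing in for Luther-Ives failure), but it shows the key consequence: two reasonable training sets produce two different "optimal" matrices, and neither is the correct one.

```python
import numpy as np

# Sketch: fitting a color correction matrix by least squares over
# training patches. All data here is synthetic and illustrative.
rng = np.random.default_rng(1)

def fit_ccm(camera_rgb, xyz):
    """Least-squares 3x3 matrix M such that xyz ~= camera_rgb @ M."""
    M, *_ = np.linalg.lstsq(camera_rgb, xyz, rcond=None)
    return M

# Two hypothetical patch sets (say, skin tones vs. saturated colors):
# 24 patches x 3 channels each.
set1_rgb = rng.uniform(0.05, 1.0, size=(24, 3))
set2_rgb = rng.uniform(0.05, 1.0, size=(24, 3))

def measure_xyz(rgb):
    # Synthetic "true" XYZ with a small nonlinear term, so that no
    # single 3x3 matrix fits exactly, mimicking a camera that fails
    # the Luther-Ives condition.
    base = np.array([[0.6, 0.3, 0.0],
                     [0.2, 0.7, 0.1],
                     [0.0, 0.1, 0.8]])
    return rgb @ base + 0.05 * rgb**2

M1 = fit_ccm(set1_rgb, measure_xyz(set1_rgb))
M2 = fit_ccm(set2_rgb, measure_xyz(set2_rgb))

# The two fitted matrices disagree: the conversion is a choice, and
# which patches you optimize for is part of that choice.
print(np.max(np.abs(M1 - M2)))
```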
Do raw files have a color space? Fundamentally, they do, but we, and the converters handling the data, may not know what that color space is. The image was recorded through a set of camera spectral sensitivities, which define the intrinsic colorimetric characteristics of the image. One simple way to think of this is that the image was recorded through a set of "primaries," and those "primaries" define the color space of the image.
If we had the spectral sensitivities for the camera, that would make the job of mapping to CIE XYZ better and easier, but we would still have decisions to make about what to do with the colors the camera encodes that are imaginary to us.
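What that decision looks like in practice: after any camera-to-XYZ transform, some encoded values land outside what we can see, showing up as negative components. The matrix and raw values below are invented, and clipping is only the crudest of the available choices (desaturation and gamut mapping are others):

```python
import numpy as np

# Sketch: a camera-captured value that maps to an imaginary color
# (negative XYZ component). Handling it is a rendering decision.
# The transform and raw triplet are hypothetical.
camera_to_xyz = np.array([[ 0.7, 0.2,  0.1],
                          [ 0.3, 0.9, -0.2],
                          [-0.1, 0.1,  1.1]])

raw = np.array([0.05, 0.02, 0.9])    # hypothetical deep-blue capture
xyz = camera_to_xyz @ raw            # Y comes out negative: imaginary

# One crude choice among several: clip negatives to zero.
xyz_clipped = np.clip(xyz, 0.0, None)
print(xyz, xyz_clipped)
```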