Shot noise, as the result of the Poisson process that governs the arrival of incoming photons, is indeed independent of the rest (which is not to say there aren't other elements in the chain that could also be seen as Poisson processes and have their own "shot noise"). If one wants to maximize SNR, one needs more samples. If counting photons is the goal, the main factors are higher QE (a higher chance of detecting each photon), real aperture (getting more photons at the input side), real focal length and sensor area (how the collector spreads the photons over the sensing area), and integration time.
Shot noise as defined above is a property of the signal; all the rest are properties of the measuring instrument. But I am sure you know all that...
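To make the "more samples" point concrete, here is a minimal sketch (numbers are illustrative, not tied to any particular sensor) simulating Poisson photon arrivals and checking that SNR grows as the square root of the mean count, which is why more aperture, QE, or integration time all help:

```python
import numpy as np

# Shot noise: photon arrivals follow a Poisson process, so for a mean
# count N the standard deviation is sqrt(N) and SNR = N/sqrt(N) = sqrt(N).
# Quadrupling the light gathered only doubles the SNR.
rng = np.random.default_rng(42)

for mean_photons in (100, 400, 1600):
    samples = rng.poisson(mean_photons, size=100_000)
    snr = samples.mean() / samples.std()
    print(f"mean={mean_photons:5d}  measured SNR={snr:6.2f}  "
          f"sqrt(N)={np.sqrt(mean_photons):6.2f}")
```

The measured SNR tracks sqrt(N) closely, independent of anything downstream in the measurement chain.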
My comment on the read noise was more on the practical/industrial/business side of things. Given that sensors with a well capacity of 25000 e- and a read noise of 14 e- were quite common a few years ago, it makes more sense for manufacturers to pursue an attainable reduction of the read noise than to focus on esoteric doping. Likewise, maximizing the effective sensing area by minimizing dead space between actual sensels, and trying not to lose photons that have already been captured by optimizing micro-lenses, were, and possibly still are, areas where big concrete gains could be made. And that's without even considering the additional issues they have to deal with when using CMOS sensors.
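For a sense of why read noise is the attractive target, here is a quick back-of-the-envelope sketch using the figures quoted above (25000 e- full well, 14 e- read noise), computing the engineering dynamic range as full well over the read-noise floor:

```python
import math

# Figures quoted in the comment above: 25000 e- full-well capacity,
# 14 e- read noise.
full_well = 25_000
read_noise = 14

# Engineering dynamic range: full well divided by the read-noise floor,
# in stops (powers of two) and in dB.
dr_stops = math.log2(full_well / read_noise)
dr_db = 20 * math.log10(full_well / read_noise)
print(f"dynamic range: {dr_stops:.1f} stops ({dr_db:.1f} dB)")

# Halving the read noise to 7 e- buys a full extra stop, with no change
# to the well capacity at all:
print(f"with 7 e- read noise: {math.log2(full_well / 7):.1f} stops")
```

So cutting read noise from 14 e- to 7 e- gains a stop of dynamic range, the same as doubling the well capacity would, which illustrates why the attainable reduction wins over esoteric doping.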
Of course, that doesn't exclude a wonderful new doping method that would increase the depletion region's "storage" ability (I don't know how close we are to the theoretical limit; my guess is "not far", because we already have very high QE, but you never know), and conceptually Marc's question makes a lot of sense.
Note: my comments are based on a fairly decent understanding of how things work in an idealized, theoretical sensor. While I know it is trendy to be "authoritative", I make no such claim. ;-)