CD does 44,100 samples a second at 16-bit precision.
Photography does tens of megapixels of samples a frame at ~14-bit precision.
I think those are the most valid dimensions to compare.
Yes, so we might have a way to go. However, photography does have the advantage of massive parallelism, with Sony's column-parallel approach. Each column of a D800 sensor has about 5000 photosites to be handled by the ADC at the bottom, so even in some imagined 60fps super-resolution video, each ADC would handle only about 300,000 samples per second, far lower than what off-board ADCs do now. Read-out beyond the ADC might need Gb/s digital signal handling, but that is not so hard these days for digital signal transmission and storage.
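To make the arithmetic concrete, here is a rough sketch in Python. The sensor dimensions (7360 x 4912) are my assumption for a D800-class sensor, the 60fps figure is the imagined video case, and the function names are just for illustration.

```python
# Assumed sensor geometry (approximate D800-class figures, not measured).
ROWS = 4912        # photosites per column
COLUMNS = 7360     # columns, one ADC each in a column-parallel design
FRAME_RATE = 60    # frames per second, the imagined video case
BIT_DEPTH = 14     # bits per sample


def per_column_sample_rate(rows, frame_rate):
    """Samples per second that each column ADC must convert."""
    return rows * frame_rate


def aggregate_bit_rate(rows, columns, frame_rate, bit_depth):
    """Raw bits per second leaving all column ADCs combined."""
    return rows * columns * frame_rate * bit_depth


column_rate = per_column_sample_rate(ROWS, FRAME_RATE)
total_gbps = aggregate_bit_rate(ROWS, COLUMNS, FRAME_RATE, BIT_DEPTH) / 1e9

print(f"per-column ADC rate: {column_rate} samples/s")
print(f"aggregate raw rate:  {total_gbps:.1f} Gb/s")
```

Under these assumptions each column ADC runs at a bit under 300,000 samples per second, while the combined raw output lands in the tens of Gb/s, so the load moves from the ADCs to the digital read-out.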
The bigger problem for now is per-photosite read noise: it needs to be kept safely below photon shot noise levels in the photosites over the "photographically interesting" range of subject brightness levels, and that gets harder with very many, very small photosites.
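As a rough illustration of why read noise matters most in the shadows, here is a sketch using the standard Poisson model, where shot noise is the square root of the signal in electrons and adds in quadrature with read noise. The read noise values are made-up examples, not figures for any real sensor.

```python
import math


def shot_noise(signal_e):
    """Photon shot noise in electrons (Poisson: square root of the signal)."""
    return math.sqrt(signal_e)


def snr(signal_e, read_noise_e):
    """Signal-to-noise ratio with shot and read noise added in quadrature."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)


def crossover_signal(read_noise_e):
    """Signal level (electrons) at which shot noise equals read noise."""
    return read_noise_e ** 2


# Hypothetical read noise levels, in electrons RMS per photosite.
for read_noise in (1.0, 3.0):
    level = crossover_signal(read_noise)
    print(f"read noise {read_noise} e-: shot noise dominates above "
          f"{level} e-, SNR there is {snr(level, read_noise):.2f}")
```

Below the crossover level read noise dominates. A smaller photosite collects fewer electrons for the same exposure, so more of the useful brightness range falls below that level unless read noise shrinks too.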
Note: with the idea of the outputs from individual cells on a sensor being not "picture elements" in themselves, but "atoms" combined in large numbers into photographically significant output, I prefer to refer to the sensor's cells as "photosites" rather than "pixels"; the pixels used to produce the final displayed image will likely be constructed from cell-level signals in subsequent processing. Actually, they always are, with demosaicing, moiré removal and such.