Raw is linear encoded, so half of all the data is contained (with proper exposure) in the first stop of highlight data. You can GB on gray, but it's hardly best practice and, depending on the card, it can result in color shifts. Gray cards are fine for gray balancing gamma-corrected images, again assuming the gray is neutral. White is better for WB of Raw data.
That statement about half the data being in the first f/stop goes back to the rationale for ETTR, but it does not take noise into account, as Emil Martinec explains on his web site. Shot noise is highest in the brightest f/stop, so the actual number of distinguishable gray levels there is far fewer than the nominal 8192 for a 14-bit file with 16,384 possible levels. Most of those levels consist of noise. For example, the Nikon D3 has a tonal range of 8.72 bits according to the DXO measurements (tonal range indicates how many gray levels are distinguishable, given the noise, in an image). That is about 422 levels for the total range, not the 16,384 levels that one would expect from a 14-bit file or even the 4096 levels in a 12-bit file. However, it is sufficient for smooth tonal gradation.
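For anyone who wants to check the arithmetic, here is a minimal Python sketch. The function names are my own, made up for illustration; the only inputs are the bit depths and the 8.72-bit DXO tonal range quoted above.

```python
def levels_in_top_stop(bit_depth):
    """Nominal number of linear raw levels in the brightest stop.

    In a linear encoding the brightest stop spans the upper half
    of the range, so it holds half of all possible levels.
    """
    total = 2 ** bit_depth
    return total - total // 2


def distinguishable_levels(tonal_range_bits):
    """Gray levels implied by a tonal range expressed in bits."""
    return 2 ** tonal_range_bits


print(levels_in_top_stop(14))               # nominal levels in the top stop, 14-bit file
print(round(distinguishable_levels(8.72)))  # levels implied by an 8.72-bit tonal range
```

The first figure is the nominal count that the ETTR argument relies on; the second is what the measured tonal range actually supports across the whole image.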
One will get better precision by taking the white balance from a white card rather than a gray one, but how significant is this? One can estimate the difference by considering the shot noise. According to Bill Claff, the full well of the D3 is 65,600 electrons, so the shot noise at saturation would be sqrt(65600) = 256 electrons, giving a c.v. (coefficient of variation) of 0.39% at base ISO. 18% saturation would collect about 8454 electrons, giving a standard deviation of 91.9 electrons for a c.v. of 1.09%. One can convert to DNs (data numbers) by dividing by the camera gain, as shown in the table. A c.v. of 1% would give a very good white balance figure. At higher ISO, use of a white card would become more critical, as each doubling of ISO would reduce the number of electrons by half and so raise the c.v. by a factor of sqrt(2).
[attachment=19574:ExcelData.gif]
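The shot noise estimate can be reproduced with a few lines of Python. This is only a sketch under the assumptions stated in the post: pure Poisson shot noise (read noise ignored), the 65,600-electron full well, and the 8454-electron gray card figure taken as given. The function name shot_noise_cv is my own.

```python
import math


def shot_noise_cv(electrons):
    """Coefficient of variation for Poisson shot noise: sqrt(N) / N."""
    return math.sqrt(electrons) / electrons


FULL_WELL = 65600  # D3 full well in electrons, per Bill Claff (quoted above)
GRAY_CARD = 8454   # electrons collected from the gray card (figure quoted above)

for label, n in (("white (full well)", FULL_WELL), ("gray card", GRAY_CARD)):
    sd = math.sqrt(n)
    print(f"{label}: sd = {sd:.1f} e-, c.v. = {100 * shot_noise_cv(n):.2f}%")

# Each doubling of ISO halves the electrons collected,
# so the c.v. grows by a factor of sqrt(2) per stop.
for stops in range(4):
    n = GRAY_CARD / 2 ** stops
    print(f"gray card, {stops} stop(s) above base ISO: c.v. = {100 * shot_noise_cv(n):.2f}%")
```

The loop at the end shows how quickly the gray card reading degrades as ISO goes up, which is where the white card starts to matter.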