If I were to design a sensor without a colour filter array, I would put 4-stop ND filters on half of the photosites:
With such a design we would still have to interpolate half of the pixel values, losing some detail and sharpness and picking up more aliasing artifacts, much as in a regular colour sensor. In exchange we would get a 12-stop B&W HDR sensor (assuming the base photosites provide 8 stops of DR), and I think dynamic range would be a very valuable asset. Do not think only of HDR scenes: such a sensor would also forgive exposure mistakes of up to 4 stops, for example in action shooting.
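To make the arithmetic concrete, here is a rough Python sketch of the idea. It is only an illustration under my own assumptions, not the code behind the simulation: I assume a checkerboard layout for the ND photosites, an idealised photosite that clips at 1.0 and loses everything below 2^-8, and a simple 4-neighbour average for interpolation. All function names are hypothetical.

```python
BASE_DR_STOPS = 8   # assumed DR of an unfiltered photosite (from the post)
ND_STOPS = 4        # strength of the ND filter on half the photosites


def combined_dr_stops(base_stops, nd_stops):
    """The ND sites extend the highlight range by nd_stops."""
    return base_stops + nd_stops


def has_nd(x, y):
    """Assumed layout: checkerboard, ND filter where x + y is odd."""
    return (x + y) % 2 == 1


def capture(scene, base_stops=BASE_DR_STOPS, nd_stops=ND_STOPS):
    """Simulate raw values. scene is a 2D list of linear radiance,
    where 1.0 is the saturation level of an unfiltered photosite."""
    floor = 2.0 ** -base_stops      # idealised noise floor
    atten = 2.0 ** -nd_stops        # ND attenuation
    raw = []
    for y, row in enumerate(scene):
        out = []
        for x, value in enumerate(row):
            signal = value * atten if has_nd(x, y) else value
            if signal >= 1.0:
                signal = 1.0        # blown highlight
            elif signal < floor:
                signal = 0.0        # lost in the noise
            out.append(signal)
        raw.append(out)
    return raw


def merge(raw, nd_stops=ND_STOPS):
    """Rebuild linear radiance. A photosite keeps its own value when it
    is neither clipped nor below the floor; otherwise it is interpolated
    from the valid 4-neighbours, which are of the other type."""
    gain = 2.0 ** nd_stops
    h, w = len(raw), len(raw[0])

    def estimate(x, y):
        s = raw[y][x]
        if 0.0 < s < 1.0:
            return s * gain if has_nd(x, y) else s
        return None                 # clipped or lost: not usable

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            v = estimate(x, y)
            if v is None:
                around = ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                valid = [estimate(nx, ny) for nx, ny in around
                         if 0 <= nx < w and 0 <= ny < h]
                valid = [n for n in valid if n is not None]
                if valid:
                    v = sum(valid) / len(valid)
                else:               # nothing usable nearby: keep raw value
                    v = raw[y][x] * (gain if has_nd(x, y) else 1.0)
            row.append(v)
        out.append(row)
    return out
```

In this sketch a flat area 3 stops above clipping is recovered entirely from the ND photosites, and a deep shadow entirely from the unfiltered ones, so in both extremes half of the output pixels are interpolated.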
I made a simulation comparing the dynamic range captured by a genuine B&W sensor vs an interpolated HDR B&W sensor:
Test scene:
Shadows (left genuine B&W, right HDR B&W):
Highlights (left genuine B&W, right HDR B&W):
I think the improvement in DR from this method (similar to Fuji's Super CCD) is worth the loss of detail that the required interpolation could introduce.
Regards