The main benefit of sensor-movement super resolution should be the increased spatial bandwidth in the color (difference) channels. Shooting red/blue woven fabric, for example?
There are two resolution-related benefits to certain multi-shot sensor techniques. One is the increase in resolution due to the denser sampling. The other is that a half sensel-pitch offset, combined with a relatively large sensel aperture, produces an overlap with the 'initial' sampling positions. That overlap reduces the modulation of signals near Nyquist, a sort of AA-filter effect.
The two effects therefore reinforce one another: higher sampling density reduces the chance of aliasing (it takes even finer detail and better lenses/focus to cause it), and overlapping area samples reduce modulation near Nyquist.
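A minimal sketch of the overlap effect, assuming a 1-D sensor with 100% fill factor (aperture equal to the pitch) and a sinusoidal test pattern just below Nyquist; all names and values here are illustrative, not from any actual camera:

```python
import numpy as np

p = 1.0                      # sensel pitch (arbitrary units)
f = 0.45 / p                 # test frequency, just below Nyquist (0.5/p)

# Fine grid standing in for the continuous scene: a sinusoid near Nyquist.
x = np.linspace(0, 64 * p, 64 * 256, endpoint=False)
scene = np.sin(2 * np.pi * f * x)

def area_sample(centers, aperture):
    """Average the scene over a box aperture around each sample center."""
    out = []
    for c in centers:
        mask = (x >= c - aperture / 2) & (x < c + aperture / 2)
        out.append(scene[mask].mean())
    return np.array(out)

centers = np.arange(8 * p, 56 * p, p)        # original sampling positions
single = area_sample(centers, p)             # one exposure, 100% fill factor
shifted = area_sample(centers + p / 2, p)    # half-pitch-offset exposure
combined = (single + shifted) / 2            # merged back onto the original grid

# The merged samples show visibly less modulation near Nyquist: the two
# overlapping box apertures act like one wider (AA) filter.
print(np.ptp(single), np.ptp(combined))
```

Averaging the two overlapping exposures is equivalent to a two-tap filter spaced at half a pitch, which multiplies the modulation near Nyquist by roughly cos(pi*f*p/2), on top of the single-aperture roll-off.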
A related principle is exploited by Epson in the 'staggered' sensor arrangement of their scanners, which do not use tri-linear (R/G/B) sensors but bi-linear sensors per color (six lines in total for R/G/B), physically offset by half a sensel pitch. According to their patent, this was done mainly to increase resolution and speed. Staggered offset sensels work fine in a scanning device, but in a single-shot full-frame capture device one would need to physically displace the entire sensor array by a fraction of the sensel pitch, which slows down operation and raises the bar for consistent lighting and vibration reduction.
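The resolution gain from staggering can be sketched as follows: interleaving two line sensors offset by half a pitch halves the effective pitch and doubles Nyquist, so detail that would alias on a single line is resolved. This is a toy 1-D model, not Epson's actual processing:

```python
import numpy as np

p = 1.0                  # pitch of each individual line sensor
f = 0.7 / p              # scene detail beyond a single line's Nyquist (0.5/p)

def sample(centers):
    """Point-sample a sinusoidal test pattern at the given positions."""
    return np.sin(2 * np.pi * f * centers)

line_a = sample(np.arange(0, 31) * p)            # first sensor line
line_b = sample(np.arange(0, 31) * p + p / 2)    # staggered by half a pitch

# Interleave the two lines: effective pitch p/2, Nyquist doubles to 1/p.
merged = np.empty(line_a.size + line_b.size)
merged[0::2], merged[1::2] = line_a, line_b

# Locate the dominant frequency each version reports.
fa = np.fft.rfftfreq(line_a.size, d=p)[np.argmax(np.abs(np.fft.rfft(line_a)))]
fm = np.fft.rfftfreq(merged.size, d=p / 2)[np.argmax(np.abs(np.fft.rfft(merged)))]
print(fa, fm)  # single line aliases f to ~0.3/p; merged recovers ~0.7/p
```

The single line folds the 0.7/p detail down to a false 0.3/p pattern, while the staggered pair reports it at its true frequency.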
The increased color accuracy of a monochrome sensor with a color filter wheel comes from capturing every color at the full sensel pitch, which eliminates the need for demosaicing and with it the false-color artifacts caused by the different sampling densities of the Green versus Red/Blue filtered sensels. Resolution benefits only modestly, but still significantly, by maybe 10%, slightly more than the loss due to demosaicing, because other filter band-pass overlaps can also be used.
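The false-color mechanism can be illustrated with a deliberately simplified 1-D model (a real Bayer mosaic is 2-D, with Green at half density and Red/Blue at quarter density): a fine red/blue weave sampled once per channel at every sensel (filter wheel) versus at alternating sensels with naive linear interpolation standing in for demosaicing. Everything here is a hypothetical illustration:

```python
import numpy as np

n = 64
x = np.arange(n)
# Fine, opposed detail in R and B, like a red/blue woven fabric.
red  = 0.5 + 0.5 * np.sin(2 * np.pi * 0.4 * x)
blue = 0.5 + 0.5 * np.sin(2 * np.pi * 0.4 * x + np.pi)

# Filter wheel: every sensel sees every channel -> exact samples.
wheel_r, wheel_b = red, blue

# Mosaic: R known only at even sensels, B only at odd ones; the gaps are
# filled by naive linear interpolation (a stand-in for demosaicing).
bayer_r = np.interp(x, x[0::2], red[0::2])
bayer_b = np.interp(x, x[1::2], blue[1::2])

# Compare the reconstructed color difference against the true one.
chroma_true = red - blue
err_wheel = np.abs((wheel_r - wheel_b) - chroma_true).max()
err_bayer = np.abs((bayer_r - bayer_b) - chroma_true).max()
print(err_wheel, err_bayer)  # the mosaic path shows a large chroma error
```

The per-channel subsampling pushes the 0.4 cycles/sensel detail beyond each channel's own Nyquist limit, so the interpolated color difference is badly wrong, which is exactly the false-color artifact that full-pitch capture avoids.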