I need to understand better exactly what pixel-shifting does. As I understand it, what is improved is the color, and any improvement in resolution/sharpness comes from avoiding the artifacts made by Bayer interpolation. Is this correct?
Correct.
(In the case of the Olympus implementation you also get oversampling, but the available sensor sizes seem to limit the usefulness of that.)
In the Sony case, it currently seems to create 4 RAW conversions which you somehow need to slam together yourself. The net result would be less optimal than actually using the 4 files during demosaicing.
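To make the "using the 4 files during demosaicing" idea concrete, here is a minimal sketch in Python. It assumes an idealised RGGB sensor, four exposures shifted by exactly one pixel, and no noise or subject movement. The names (`bayer_channel`, `capture`, `combine`, `SHIFTS`) are made up for illustration and are not any camera maker's or raw converter's actual method.

```python
# Idealised pixel-shift model: assumed RGGB layout, exact one-pixel shifts.

def bayer_channel(row, col):
    """Colour filter sitting over sensor site (row, col) in an RGGB mosaic."""
    if row % 2 == 0:
        return "R" if col % 2 == 0 else "G"
    return "G" if col % 2 == 0 else "B"

# Sensor offsets (dy, dx) for the four exposures, one pixel apart.
SHIFTS = [(0, 0), (0, 1), (1, 1), (1, 0)]

def capture(scene, shift):
    """One exposure: every scene point is seen through a single colour filter."""
    dy, dx = shift
    frame = []
    for y, scene_row in enumerate(scene):
        row = []
        for x, pixel in enumerate(scene_row):
            channel = bayer_channel(y + dy, x + dx)
            row.append((channel, pixel[channel]))
        frame.append(row)
    return frame

def combine(frames):
    """Merge the four exposures: each point now has R, G, G and B samples,
    so no colour value has to be interpolated from neighbouring pixels."""
    height, width = len(frames[0]), len(frames[0][0])
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            samples = {"R": [], "G": [], "B": []}
            for frame in frames:
                channel, value = frame[y][x]
                samples[channel].append(value)
            row.append({c: sum(v) / len(v) for c, v in samples.items()})
        result.append(row)
    return result

if __name__ == "__main__":
    # Tiny synthetic scene with a distinct R, G and B value at every point.
    scene = [[{"R": 10 * y + x, "G": 100 + 10 * y + x, "B": 200 + 10 * y + x}
              for x in range(4)] for y in range(4)]
    frames = [capture(scene, s) for s in SHIFTS]
    print(combine(frames) == scene)
```

In this idealised model the full colour is measured at every point rather than guessed, which is where the freedom from Bayer interpolation artifacts comes from. Merging four files that have each already been demosaiced throws that advantage away, because each one already contains interpolated values.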
In your case, with manual focusing and stacking umpty files, you may wonder whether there is any gain relative to, for example, stacking with automatic alignment in Affinity Photo. Taking, say, 10 shots with ever so slight movement and then stacking/aligning them will likely result in a much better file than a stack of 4 pixel-shifted files.
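One part of that comparison can be sketched with plain numbers: averaging N aligned frames with independent random noise reduces the noise by roughly the square root of N. The snippet below is only a toy model, assuming perfectly aligned frames, Gaussian noise and a flat grey patch. The values and function names are invented for illustration, and it says nothing about colour sampling or how well a given program aligns the frames.

```python
import random
import statistics

TRUE_VALUE = 100.0   # assumed brightness of a flat grey patch
SIGMA = 8.0          # assumed per-frame noise (standard deviation)
N_PIXELS = 20000     # number of pixels in the simulated patch

def noisy_frame(rng):
    """One simulated exposure of the flat patch with Gaussian noise."""
    return [TRUE_VALUE + rng.gauss(0.0, SIGMA) for _ in range(N_PIXELS)]

def stack_mean(frames):
    """Average perfectly aligned frames pixel by pixel."""
    return [sum(samples) / len(samples) for samples in zip(*frames)]

def residual_noise(n_frames, rng):
    """Noise (standard deviation) left after stacking n_frames exposures."""
    frames = [noisy_frame(rng) for _ in range(n_frames)]
    return statistics.pstdev(stack_mean(frames))

if __name__ == "__main__":
    rng = random.Random(0)
    for n in (1, 4, 10):
        measured = residual_noise(n, rng)
        predicted = SIGMA / n ** 0.5
        print(n, "frames: measured", round(measured, 2),
              "predicted about", round(predicted, 2))
```

In this model 10 frames leave about 1/sqrt(10) of the single-frame noise, against 1/sqrt(4) for 4 frames.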