I think that in audio spectral estimation, after dividing a signal into overlapping segments, each segment is also windowed. That may be another reason for the overlap: data points that the window attenuates (or zeroes out) near a segment's edges still contribute to the spectral estimate through the neighbouring segments.
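A minimal sketch of that overlap-and-window idea (essentially Welch's method), using only NumPy; the Hann window, 50% overlap, and segment length here are illustrative choices, not prescribed values:

```python
import numpy as np

def welch_psd(x, nperseg=256, overlap=0.5):
    """Average windowed periodograms over overlapping segments."""
    window = np.hanning(nperseg)
    step = int(nperseg * (1 - overlap))   # 50% overlap -> hop of nperseg/2
    scale = np.sum(window ** 2)           # window power normalization
    segments = [x[i:i + nperseg] * window
                for i in range(0, len(x) - nperseg + 1, step)]
    return np.mean([np.abs(np.fft.rfft(s)) ** 2 / scale for s in segments],
                   axis=0)

# A sine that falls exactly on bin 32 of a 256-point segment.
n = np.arange(4096)
x = np.sin(2 * np.pi * 32 * n / 256)
psd = welch_psd(x)
print(np.argmax(psd))  # -> 32
```

Samples that the Hann window pushes toward zero at one segment's edge sit near the center of the next segment, so nothing is effectively discarded.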
In the case of the MDCT, you have 50% overlap but still critical sampling, i.e. the transform produces 512 spectral samples out of 512 temporal samples. The trick is to choose a window with particular properties (time-domain aliasing cancellation, TDAC) and to rely on the overlapping analysis/synthesis.
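To make the TDAC point concrete, here is a small NumPy demonstration of the MDCT: each hop of N samples yields N coefficients (critical sampling), each transform block spans 2N samples (50% overlap), and with a sine window satisfying the Princen-Bradley condition the time-domain aliasing cancels on overlap-add. N = 64 here just to keep it fast; the direct matrix transform stands in for the usual fast implementation:

```python
import numpy as np

def mdct(frame, window):
    """MDCT of one 2N-sample frame -> N coefficients (critical sampling)."""
    N = len(frame) // 2
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[None, :] + 0.5 + N / 2) * (k[:, None] + 0.5))
    return basis @ (frame * window)

def imdct(coeffs, window):
    """Inverse MDCT: N coefficients -> 2N aliased samples; the aliasing
    cancels when consecutive windowed blocks are overlap-added."""
    N = len(coeffs)
    n, k = np.arange(2 * N), np.arange(N)
    basis = np.cos(np.pi / N * (n[:, None] + 0.5 + N / 2) * (k[None, :] + 0.5))
    return (2.0 / N) * (basis @ coeffs) * window

N = 64
# Sine window: satisfies Princen-Bradley, w[n]^2 + w[n+N]^2 == 1
window = np.sin(np.pi / (2 * N) * (np.arange(2 * N) + 0.5))
rng = np.random.default_rng(0)
x = rng.standard_normal(6 * N)

# Analysis: hop by N, transform 2N-sample frames -> N coefficients each
spectra = [mdct(x[i:i + 2 * N], window)
           for i in range(0, len(x) - 2 * N + 1, N)]

# Synthesis: overlap-add the windowed inverse transforms
y = np.zeros_like(x)
for i, X in enumerate(spectra):
    y[i * N:i * N + 2 * N] += imdct(X, window)

# Aliasing cancels wherever two blocks overlap (skip the first/last half-frame)
err = np.max(np.abs(x[N:-N] - y[N:-N]))
print(err)  # machine-precision level
```

Note that although only N coefficients are stored per N-sample hop, every transform still reads 2N input samples, which is the "twice as many samples" processing cost mentioned below.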
You still have to process twice as many samples per second as your sample rate, though. I would expect that for similar properties in a 2-D image, you would need 50% overlap both vertically and horizontally, meaning that 4x as many pixels have to be processed, something that might cost more than it is worth.
For compression purposes, not _adding_ data points in your transform (i.e. staying critically sampled) is probably a good thing. For noise reduction and analysis, this requirement is probably less relevant.
-h