I do not pretend to Jim's expertise, but one likely long-term limit I see is lenses: we might not be so very far from sensors being able to squeeze every useful bit of information out of the light delivered to the focal plane by the lens. For example, the pixel sizes in some small sensors are down to about twice the wavelength of light, so that fundamental limit on sensor resolution is "in sight", so to speak. But I doubt that lenses for formats like 36x24mm will ever resolve as finely as that. In fact one limit is aperture ratios and thus diffraction; I believe that the theoretical optical limit for a lens in air is f/0.5, and that about f/0.7 is as fast as real photographic lenses have gone (cue comments about the so-called "Kubrick lenses", really "NASA moonshot lenses": some f/0.7 lenses originally produced by Zeiss for the moon landing program). And more likely, good corner-to-corner image quality at high resolution will always limit aperture ratios to something distinctly higher, maybe f/2, and thus to a diffraction spot size of about 2 microns across, so I doubt that formats of 36x24mm or larger will ever resolve well (with high MTF and so with good local contrast) below about 2 microns. If so, 36x24mm is resolution limited to a mere 18000x12000 pixels or about 200MP! (Even if more, smaller pixels are used for "oversampling" to sustain fancy post-processing.)
Coming down to earth, lens limits like aberrations and diffraction will probably set distinctly lower limits on the usable resolution in various formats, and I doubt that 36x24 will ever completely match what 54x40mm can do in some demanding situations.
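For anyone who wants to check the arithmetic above, here is a rough sketch. The assumptions are mine, not the poster's: green light at 0.55 micron, and the usual first-null Airy disk diameter of 2.44 x wavelength x f-number. The function names are just illustrative.

```python
# Back-of-envelope diffraction arithmetic.
# Assumptions (not from the post): wavelength 0.55 micron (green),
# Airy disk first-null diameter = 2.44 * wavelength * f-number.

def airy_disk_diameter_um(f_number, wavelength_um=0.55):
    """Diameter of the Airy disk (to the first dark ring), in microns."""
    return 2.44 * wavelength_um * f_number

def pixel_grid(width_mm, height_mm, pitch_um):
    """Columns, rows and megapixels for a sensor sampled at the given pitch."""
    cols = round(width_mm * 1000 / pitch_um)
    rows = round(height_mm * 1000 / pitch_um)
    return cols, rows, cols * rows / 1e6

if __name__ == "__main__":
    print(f"Airy disk at f/2: {airy_disk_diameter_um(2):.2f} um")
    for name, (w, h) in {"36x24mm": (36, 24), "54x40mm": (54, 40)}.items():
        cols, rows, mp = pixel_grid(w, h, pitch_um=2.0)
        print(f"{name} at 2 um pitch: {cols} x {rows} = {mp:.0f} MP")
```

Under these assumptions the Airy disk at f/2 comes out nearer 2.7 microns than 2, so the 2 micron figure is on the optimistic side. At a 2 micron pitch, 36x24mm gives 18000 x 12000 pixels (216 MP), and the same pitch on 54x40mm would give 27000 x 20000.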
There is no theoretical limit on the resolution of lenses, except diffraction. Diffraction, however, should not be underestimated, as we also need some depth of field unless shooting perfectly flat subjects. In normal photographic practice, the compromises between diffraction and depth of field are real, as anybody who has tried to use an 8"x10" view camera will have found out. This compromise also becomes apparent when stitching is used to increase resolution: only distant landscapes are routinely imaged that way.
There is no theoretical limit, but there are practical ones. Optics has made progress, but not as fast as electronics. In practice, lenses are compromises between aberrations, price and size/weight. The optical engineer can build very good lenses for high-resolution MF sensors, but they will be huge and heavy. When they are not, other tricks are used, like software correction of distortion or chromatic aberration, to relax the constraints a bit.
So what gives? At present, the maximum resolution available in a single frame from one lens is 200 Mpix (by moving the sensor around). Presently available MF lenses cope when stopped down a bit. An educated guess is that this is also around the practical limit, give or take some.
Another important fact is that seemingly large increases in the number of pixels correspond to relatively moderate increases in practical detail level. A good rule of thumb is that, to get a noticeable effect, one needs to double the resolution. This is very noticeable for 24x36 cameras, where the manufacturers present a resolution increase from, say, 36 to 42 Mpix as significant when it is not, unless you are pixel-peeping. Today, the maximum resolution for 24x36 cameras is 50 Mpix. An increase to, say, 70 Mpix is not likely to break the lenses.
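To put numbers on that, a small sketch. It assumes, which is my reading and not something stated above, that "resolution" in the rule of thumb means linear resolution (detail across the frame), which scales with the square root of the pixel count at a fixed aspect ratio. The helper name is made up for illustration.

```python
import math

# Assumption: linear resolution scales as the square root of the pixel
# count (same aspect ratio, lens not the limiting factor).

def linear_gain(mpix_old, mpix_new):
    """Factor by which detail across the frame grows between two pixel counts."""
    return math.sqrt(mpix_new / mpix_old)

if __name__ == "__main__":
    for old, new in [(36, 42), (50, 70), (50, 200)]:
        gain = linear_gain(old, new)
        print(f"{old} -> {new} Mpix: {100 * (gain - 1):.0f}% more linear detail")
```

On that reading, 36 to 42 Mpix is about 8% more linear detail and 50 to 70 Mpix about 18%, while doubling the linear detail of a 50 Mpix camera would take 200 Mpix.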