I can see arguments for having a sensor that somewhat out-resolves the best lens in your system, at its optimal aperture and at the center of the field, so that the sensor will significantly out-resolve most of your lenses at most aperture choices, especially towards the edges of the frame.
The main one is economic: it is easier and less expensive to improve the resolution of a sensor in a given format than to improve the resolution of the lens system -- more so for people who own and use multiple medium format lenses. So for a given expenditure, you can get more resolution improvement from sensor upgrades than from lens upgrades. In general, the component that is cheapest to improve should somewhat outperform the one that is most expensive to improve if you want to get the best results within a given budget.
Bear in mind that lens and sensor resolution combine in a "multiplicative" way: at each spatial frequency, the MTF of the whole system is the lens MTF multiplied by the sensor MTF. So starting with a lens and sensor of equal resolution, increasing the sensor resolution will improve the results that you get from the same lens, and increasing the lens resolution will improve the results that you get from the same sensor. But the sensor upgrade option is cheaper, especially when it improves the results from multiple lenses.
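To put rough numbers on the multiplication, here is a minimal sketch. It assumes that both the lens and the sensor have a Gaussian-shaped MTF described by a single MTF50 figure (the frequency where contrast falls to 50%). That shape is only a convenient assumption for illustration, real MTF curves differ, and the function names are made up for this example.

```python
import math

def gaussian_mtf(freq, mtf50):
    """Gaussian MTF model (an assumption): contrast is 0.5 at freq == mtf50."""
    return math.exp(-math.log(2.0) * (freq / mtf50) ** 2)

def system_mtf(freq, lens_mtf50, sensor_mtf50):
    """System MTF is the product of the lens MTF and the sensor MTF."""
    return gaussian_mtf(freq, lens_mtf50) * gaussian_mtf(freq, sensor_mtf50)

def system_mtf50(lens_mtf50, sensor_mtf50):
    """Frequency where the product of the two Gaussian MTFs falls to 0.5."""
    return 1.0 / math.sqrt(1.0 / lens_mtf50 ** 2 + 1.0 / sensor_mtf50 ** 2)

if __name__ == "__main__":
    lens = 50.0  # hypothetical lens MTF50 in lp/mm
    for sensor in (50.0, 100.0, 200.0):  # hypothetical sensor MTF50 values
        print(f"lens {lens:.0f}, sensor {sensor:.0f} -> "
              f"system {system_mtf50(lens, sensor):.1f} lp/mm")
```

In this model, a lens and sensor that each manage 50 lp/mm give a system MTF50 of only about 35 lp/mm, and doubling the sensor figure to 100 lp/mm lifts that to about 45 lp/mm with the same lens. The system never quite reaches the lens figure, which is why some "excess" sensor resolution still pays off.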
Another argument is simple physics: it is becoming feasible to produce sensors in formats 35mm and larger with resolution exceeding the fundamental optical limits of most lenses at most apertures of interest (due to diffraction, for example). When a lens is up against those limits, the only way to improve image resolution and detail is to push sensor resolution somewhat beyond the physical limits of the lens, in order to squeeze the most out of the lens. This is already the case when DOF needs require stopping down to an f-number high enough that diffraction is the dominant limitation on the resolution given by the lens: about f/4 to f/5.6 I think, and a lot of MF work needs higher f-numbers than that. Photographers seeking ever higher resolution will more and more often be "diffraction limited" rather than "lens aberration limited", and then the last thing you want is to lose some of the lens's resolution by instead being "sensor limited".
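As a rough check on where the diffraction limit sits relative to a sensor, here is a small sketch. It uses the standard cutoff frequency of an ideal, aberration-free lens with a circular aperture, 1/(wavelength x f-number), and compares it with the sensor's Nyquist frequency, 1/(2 x pixel pitch). The 550 nm wavelength (green light) and the function names are my own assumptions for illustration.

```python
def diffraction_cutoff_lp_per_mm(f_number, wavelength_nm=550.0):
    """Frequency where an ideal lens's MTF reaches zero: 1 / (wavelength * N)."""
    wavelength_mm = wavelength_nm * 1e-6  # nm -> mm
    return 1.0 / (wavelength_mm * f_number)

def sensor_nyquist_lp_per_mm(pixel_pitch_um):
    """Highest frequency the pixel grid can sample: 1 / (2 * pitch)."""
    pitch_mm = pixel_pitch_um * 1e-3  # um -> mm
    return 1.0 / (2.0 * pitch_mm)

def pitch_matching_cutoff_um(f_number, wavelength_nm=550.0):
    """Pixel pitch whose Nyquist frequency equals the diffraction cutoff."""
    return wavelength_nm * 1e-3 * f_number / 2.0

if __name__ == "__main__":
    for n in (4.0, 5.6, 8.0, 11.0, 16.0):
        print(f"f/{n:g}: cutoff {diffraction_cutoff_lp_per_mm(n):.0f} lp/mm, "
              f"matching pitch {pitch_matching_cutoff_um(n):.2f} um")
```

Contrast is already very low well before the cutoff frequency, so the pitch at which a finer sensor stops giving visible gains is coarser than the "matching pitch" printed here. Treat these as upper bounds on what diffraction allows, not as a recommendation for a particular pixel size.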
EDIT: typos corrected.