The "gigapxl" project and its website
http://www.gigapxl.com are interesting, with lots of carefully described technical details about the challenges of ultra-detailed imaging. I have one significant quibble, though, which means that for practical purposes they really have "only" about 100 million useful pixels worth of detail.
This can easily be seen in some of the crops provided; despite corresponding to about a megapixel of the image or more, they are as soft and lacking in fine detail as web-formatted images of less than 100,000 pixels.
The problem comes from the fact that at each of the numerous stages that limit resolution (atmospheric turbulence, lens aberrations, diffraction, film resolution, etc.) they require 50% MTF, a fairly standard design target in optics. In plain language, they require no more than a one-stop loss of contrast at each stage. But these losses combine; the percentages multiply. So with about five such stages, their system loses about five stops of contrast, for a combined MTF of around 3% (0.5^5 ≈ 0.03), and that is generally considered to be so low as to be useless, unless your goal is to photograph ultra-high-contrast transmissive test patterns.
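To make the multiplication concrete, here is a quick Python sketch of the cascade; the five-stage count and the 50%-per-stage figure come from their description, and the particular stage list in the comment is just illustrative:

```python
# Per-stage MTF values multiply to give the overall system MTF.
def combined_mtf(stage_mtfs):
    result = 1.0
    for mtf in stage_mtfs:
        result *= mtf
    return result

# Five stages, each just meeting the 50% MTF design target:
# e.g. atmosphere, lens aberrations, diffraction, film, scanning.
stages = [0.5] * 5
print(combined_mtf(stages))  # 0.03125, i.e. roughly 3% system MTF
```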
One thing that interests me a lot is their study of atmospheric effects; they need to use an ultra-wide FOV to get atmospheric limits under control. For a normal angular field of view, 50% MTF is limited to about 3,000 line pairs per picture height. With a medium that does not use Bayer interpolation (like a scanning back), that gives a useful limit of 6,000 pixels high; with Bayer-pattern sensors, maybe up to 10,000 pixels high.
Add in other unavoidable resolution losses from lenses, and the current 4,000-pixel-high 22MP sensors are about half way to the useful limit in terms of line pairs of resolution, or a quarter of the way if you count pixels, since pixel count scales with the square of linear resolution.
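The arithmetic, as a rough Python sketch; the 3,000 lp/ph figure is the atmospheric limit from above, while the Bayer allowance and the lens-loss factor are my own guesses, not measured values:

```python
ATMOSPHERIC_LIMIT_LPPH = 3000    # 50% MTF limit at normal FOV (from above)

# Nyquist: two pixel rows per line pair for a non-Bayer medium.
useful_height = ATMOSPHERIC_LIMIT_LPPH * 2
print(useful_height)             # 6000 px high for a scanning back

BAYER_FACTOR = 1.67              # assumed demosaicing penalty allowance
print(useful_height * BAYER_FACTOR)   # ~10,000 px with a Bayer sensor

# A current 4,000 px high sensor, assuming lens losses cost roughly
# another quarter of its linear resolution (my assumption):
effective_lpph = (4000 / 2) * 0.75
print(effective_lpph / ATMOSPHERIC_LIMIT_LPPH)         # ~0.5: half way in line pairs
print((effective_lpph / ATMOSPHERIC_LIMIT_LPPH) ** 2)  # ~0.25: a quarter in pixel count
```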
Scanning backs already go beyond 6,000 pixels high, so they are probably at the limit for angular FOVs out to moderately wide, with only ultra-wide FOVs likely to benefit from much more resolution.
But that brings me to didger's suggestion of blending frames.
Once the sensor is scanning rather than single-shot, so that you need a completely stationary subject anyway, it is probably easier and better to blend multiple frames. For one thing, blending is much easier on lens design, since you can use a lens of normal to narrow FOV, for which optical aberrations are far easier to control. For another, one can add in dynamic range blending, with multiple frames of each part of the scene.
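For the dynamic range part, a minimal sketch of what the per-tile blend could look like, assuming each part of the scene is shot at several bracketed exposures; the hat-shaped pixel weighting is just one of many reasonable choices:

```python
import numpy as np

def blend_exposures(frames, exposures):
    """Merge bracketed frames of one tile into a high-dynamic-range result.

    frames:    list of float arrays scaled to [0, 1], one per exposure
    exposures: relative exposure times, e.g. [1, 4, 16]
    """
    acc = np.zeros_like(frames[0], dtype=np.float64)
    weights = np.zeros_like(acc)
    for frame, t in zip(frames, exposures):
        # Trust mid-tones most; pixels near clipping get almost no weight.
        w = 1.0 - np.abs(frame - 0.5) * 2.0
        acc += w * (frame / t)   # divide by exposure to reach a common scale
        weights += w
    return acc / np.maximum(weights, 1e-6)
```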
So my imaginary ultra-high resolution camera uses a high resolution sensor (the smallest viable pixels), an extremely sharp lens (macro?, telephoto?), a very solid tripod, and a robotic camera pointing and control system to automate taking the multiple frames.
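As a sketch of what the robotic pointing system would compute, here is the frame grid for such a mosaic; the lens FOV, scene size, and 20% stitching overlap are all illustrative assumptions:

```python
import math

def pointing_grid(scene_h, scene_v, frame_h, frame_v, overlap=0.2):
    """Yield (pan, tilt) angles in degrees for each frame of the mosaic."""
    step_h = frame_h * (1.0 - overlap)   # advance less than one frame width
    step_v = frame_v * (1.0 - overlap)
    cols = 1 + max(0, math.ceil((scene_h - frame_h) / step_h))
    rows = 1 + max(0, math.ceil((scene_v - frame_v) / step_v))
    for r in range(rows):
        for c in range(cols):
            yield (c * step_h, r * step_v)

# Example: a 90 x 30 degree scene with a lens covering about 8 x 5.3 degrees
# (roughly a 300 mm lens on 35 mm format, my assumption):
frames = list(pointing_grid(90, 30, 8, 5.3))
print(len(frames))   # number of frames the robot has to shoot (98 here)
```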