Because of this and modern sensor performance, I think Fuji have made a smart move sticking with APS-C and building a lens line for it.
Conversely, I think we'll also see top-end full-frame mirrorless bodies getting bigger once their advantages over DSLRs really start to be realised; right now they need to stay small to be a compelling alternative.
Current mirrorless cameras can replace SLRs for anything except action photography, and do somewhat better than SLRs in many situations where the subject is stationary and the photographer has time to carefully set up and fine-tune the shot. Full-frame is what really makes them attractive: without full-frame, mirrorless bodies wouldn't be competitive with top-end SLRs (or Leicas) at all, and would only be able to compete for the lower-end amateur market.
At the moment, the larger SLR form factor buys you two things that mirrorless does not, and these are what make SLRs capable of action photography:
1: Fast AF for tracking moving targets
2: A lag-free viewfinder for accurate composition
Sure, there are other useful things that come with the larger body (dual card slots, a larger battery, etc.), but those in themselves don't make the larger body capable of shooting action, nor the smaller body incapable.
At the same time, EVFs have a number of features which could be immensely useful in action photography, should the two aforementioned issues be overcome:
1: Real-time exposure simulation: no more wondering 'is this too dark?' or 'is this too bright?' and relying on automatic metering, which can be a crapshoot in some difficult lighting situations
2: Real-time white balance (WB) simulation, for those shooting JPEG (news, other journalistic work and any situation requiring immediate publication with minimal time for post-processing)
3: Better visibility in very dark situations and/or with slow lenses. The brightness of an OVF is entirely limited by the optics. The brightness of an EVF, on the other hand, improves with every improvement in high-ISO capability. Optics haven't gotten any brighter in decades, and physically can't (at least not without drastically increasing the size of the viewfinder), but high-ISO capability has been advancing in leaps and bounds.
4: Continuous recording: a buffer that saves the last 0.5-1 second of footage every time you press the shutter button could save a lot of missed shots in action photography. You can shoot action without it, of course, but it could do wonders for the keeper rate.
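To make point 4 a bit more concrete, here's a rough sketch of how such a pre-capture buffer could work: a fixed-size ring buffer that the sensor keeps filling, with the shutter press simply snapshotting what's already there. The class name, the 30 fps readout rate and the 0.5 s window are all hypothetical values chosen for illustration, not how any particular camera actually implements it.

```python
from collections import deque


class PreCaptureBuffer:
    """Hypothetical rolling buffer holding the most recent sensor frames.

    Assumes frames arrive at a fixed rate, so capacity = fps * seconds.
    """

    def __init__(self, fps: int, seconds: float):
        # A deque with maxlen silently drops the oldest frame once it is full
        self.frames = deque(maxlen=max(1, round(fps * seconds)))

    def push(self, frame):
        # Called for every frame read off the sensor, shutter pressed or not
        self.frames.append(frame)

    def on_shutter_press(self):
        # Copy of the frames captured *before* the press, oldest first
        return list(self.frames)


# Usage: simulate 3 seconds of readout at 30 fps, then press the shutter
buf = PreCaptureBuffer(fps=30, seconds=0.5)
for frame_number in range(90):
    buf.push(frame_number)
saved = buf.on_shutter_press()
print(len(saved), saved[0], saved[-1])  # 15 75 89
```

The point of the ring buffer is that memory use stays constant no matter how long you hold the camera up; only the last half-second or so is ever kept.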
Therefore, having an action-capable mirrorless camera would be a very desirable thing.
To overcome these two issues, a number of things need to happen.
AF speed is the easy part. Fundamentally, mirrorless and SLR cameras use the same method (PDAF) to achieve fast, reasonably accurate focus, except that the on-sensor method eliminates the back/front-focus errors that SLRs need AF microadjustment to correct, and mirrorless cameras (and Live View) also have the option of supplementary AF methods (CDAF, AI-driven features such as eye detection) to achieve greater accuracy when time permits. The current slowness of mirrorless AF has nothing to do with the method used, but with the AF processing and lens drive speed of mirrorless vs SLR cameras: with dedicated AF processors and more powerful batteries, current SLRs can track targets and move glass far faster than current mirrorless offerings (and the 1D X, at least, can drive lenses even faster, thanks to its higher-voltage, higher-wattage battery). Boost the battery power of a mirrorless camera and add a dedicated AF chip, putting the hardware on an equal footing with SLRs, and there's no reason mirrorless focus speed and tracking ability should be any worse than a top-end SLR's. After all, both AF systems work on the same phase-detection principle.
A lag-free viewfinder is trickier, but hardly insurmountable. No EVF, of course, can be completely lag-free. But by reducing viewfinder lag to 30 ms or less, it can be made so fast as to be imperceptible to the human eye/brain (it takes longer than that for the brain to even register that the eye has seen something) and insignificant compared with human reaction time (which is on the order of hundreds of milliseconds, although anticipation can mitigate this if you're already looking at or tracking a subject) and, in the case of SLRs, mirror lag. Such viewfinders exist: you couldn't perform laparoscopic or robot-assisted surgery without them. In fact, some surgical procedures, chiefly in neurosurgery, that used to be performed under direct visualisation only became endoscopic once cameras and displays became fast enough to mitigate viewfinder lag; before that, they were simply too delicate and precise to be performed with laggy cameras. (Obviously, an appendicectomy is far more forgiving, so it could be performed laparoscopically much earlier.) Those low-lag surgical systems are also driven by powerful, power-hungry processors.
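For a feel of why 30 ms is "good enough", it helps to put numbers on how far a subject moves during the viewfinder lag versus during your own reaction time. The sketch below does that arithmetic; the subject speed, the 200 ms reaction time and the 60 Hz refresh rate are assumed example figures, not measurements from any real camera or study.

```python
def displacement_m(speed_m_per_s: float, delay_ms: float) -> float:
    """Distance (metres) a subject travels during a delay given in milliseconds."""
    return speed_m_per_s * delay_ms / 1000.0


def lag_in_frames(delay_ms: float, refresh_hz: float) -> float:
    """Number of display refresh periods that fit inside the delay."""
    return delay_ms * refresh_hz / 1000.0


# Assumed example values, not measurements from any real camera
SUBJECT_SPEED = 10.0   # m/s, roughly an elite sprinter
EVF_LAG = 30.0         # ms, the target lag figure from the text
REACTION_TIME = 200.0  # ms, one assumed "hundreds of milliseconds" figure

print(f"EVF lag:       {displacement_m(SUBJECT_SPEED, EVF_LAG):.2f} m")       # 0.30 m
print(f"Reaction time: {displacement_m(SUBJECT_SPEED, REACTION_TIME):.2f} m")  # 2.00 m
print(f"EVF lag at 60 Hz: {lag_in_frames(EVF_LAG, 60):.1f} refreshes")       # 1.8 refreshes
```

Under those assumptions the subject moves several times further while you react than while the EVF catches up, which is the sense in which a sub-30 ms lag becomes insignificant.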
Of course, all these features take space, so I wouldn't expect an action-capable mirrorless camera to be any smaller in length and width than an SLR (although possibly lighter and thinner), at least not until denser power sources or lower-energy chips and EVFs become available. But thanks to the features that through-the-sensor acquisition and EVFs make possible and mirrors don't, it could be far more capable.