Let me point out that sports and wildlife photographers are the least likely users of mirrorless. EVF lag is the main issue, along with really fast, really robust AF with supertelephotos. Actual resolution is less important - a great shot at 6 MP is still a great shot for 99% of sports uses. With wildlife there is some cropping, so more resolution is good, up to a point.
6MP isn't even 4K video, which is about 8MP per frame. With 8K video, you'll be getting at least 33MP, likely more. If a stills action camera can't capture at least as much detail as a frame grab from a video camera, then there's little point to the stills camera. As for wildlife, there's a reason many wildlife shooters use the D810 or 5Ds instead of a dedicated action body, or use their 7D2 backup body as much as their 1Dx.
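For anyone who wants to check the numbers, here is a quick sketch of the pixel-count arithmetic. It assumes the consumer UHD frame sizes (3840x2160 for 4K, 7680x4320 for 8K), and the function name is just something made up for illustration.

```python
def megapixels(width, height):
    """Return the pixel count of a frame in millions of pixels."""
    return width * height / 1_000_000

# Assumed frame sizes: consumer UHD "4K" and "8K".
uhd_4k = megapixels(3840, 2160)   # about 8.3 MP
uhd_8k = megapixels(7680, 4320)   # about 33.2 MP

STILLS_MP = 6  # the 6 MP figure quoted above

print(f"4K frame grab: {uhd_4k:.1f} MP")
print(f"8K frame grab: {uhd_8k:.1f} MP")
print(f"6 MP still vs 4K frame: {STILLS_MP / uhd_4k:.0%} of the pixels")
```

An 8K frame has exactly four times the pixels of a 4K frame, since both dimensions double.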
We still haven't seen what a full-featured Sony AF system can do. The A7rII and A6300 do very well with Sony lenses (seems about the same as the 5D3 with Canon lenses), but these are miniature cameras with limited processing power and limited battery power.
EVF lag is the big issue, although the larger battery and greater processing power of an A9 should allow it to greatly cut down on the lag that plagues smaller mirrorless cameras. The other option is to treat it like a video camera when shooting sports - pan and focus as you would when shooting video, only triggering the actual burst at key moments. Even the slowest EVF lag is shorter than human reaction time anyway. It would require a different shooting technique from the one action stills photographers are used to, though.
A large percentage of wildlife and birds are active at dawn and dusk, so there is some point at which noise is an issue when shooting with shutter speeds upwards of 1/1000 sec.
Of course, it's the overall noise level that matters, not whether the image is divided into 20 million or 80 million pixels. The higher-resolution sensor will give you more spatial detail at the same whole-image noise level. The tradeoff is in frame rate, not whole-image noise. If the bottleneck is in the card write speed rather than the capture rate, you can even sacrifice resolution for frame rate through pixel binning, without losing effective sensor area.
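To put rough numbers on the noise argument, here is a minimal sketch assuming pure photon shot noise (noise equals the square root of the signal) and two sensors of the same area collecting the same total light. The photon count is a made-up figure, and read noise is deliberately ignored, so treat this as an illustration and not a model of any real camera.

```python
import math

def per_pixel_snr(total_photons, pixel_count):
    """Shot-noise-limited SNR of a single pixel: signal / sqrt(signal)."""
    photons_per_pixel = total_photons / pixel_count
    return photons_per_pixel / math.sqrt(photons_per_pixel)

def binned_snr(total_photons, pixel_count, bin_factor):
    """SNR after summing bin_factor neighbouring pixels into one output pixel."""
    signal = bin_factor * total_photons / pixel_count
    noise = math.sqrt(signal)  # shot noise of the summed signal
    return signal / noise

# Hypothetical total light collected by the whole sensor in one exposure.
TOTAL_PHOTONS = 2.0e10

snr_20 = per_pixel_snr(TOTAL_PHOTONS, 20e6)         # 20 MP sensor, per pixel
snr_80 = per_pixel_snr(TOTAL_PHOTONS, 80e6)         # 80 MP sensor, per pixel
snr_80_binned = binned_snr(TOTAL_PHOTONS, 80e6, 4)  # 80 MP binned 4:1 to 20 MP

print(f"20 MP per-pixel SNR:        {snr_20:.1f}")
print(f"80 MP per-pixel SNR:        {snr_80:.1f}")
print(f"80 MP binned to 20 MP SNR:  {snr_80_binned:.1f}")
```

Under these assumptions the 80 MP sensor looks noisier per pixel, but once it is binned down to 20 MP it matches the native 20 MP sensor. In a real camera, per-pixel read noise would shift the comparison somewhat, especially in very low light.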