[...] Web designers (and the stakeholders that employ web designers) wish for the same thing: they want their web sites to look the best they can on the devices that people actually use.
Then perhaps you need to do what, in my experience, most effective testing plans do: test on representative samples of the equipment your viewers actually have. If your only frames of reference are a synthetic standard like sRGB, which most uncalibrated displays don't match, and your own high-end calibrated display, they don't tell you how your images will look on your viewers' monitors. In software terms, this would be somewhat like a developer telling me "well, it compiles and runs on my dev machine" when I have a flood of user reports describing problems on their own machines.

I respectfully disagree with both you and Royce that monitors don't fall into some kind of bell curve distribution. Yes, we sometimes see purple faces, but they are outliers and fit in one of the extremely skinny long tails of the distribution.
You're free to disagree, of course. My counterexample is the scores of displays I've evaluated or worked with; in Andrew's case it's probably thousands. I know, that sounds like the old internet argument "I'm right because I have X, Y, Z experience."

Still, I wouldn't treat the entire population of uncalibrated displays as a single homogeneous distribution of performance, because that model doesn't match what I see in practice all the time. It's not a single curve because it's actually several curves, clustered around different skewing factors: LCD vs. CRT (for those who still have them), age (performance doesn't degrade along a smooth curve over time), wide gamut vs. narrow, IPS panel vs. TN (combined with viewing angle), CCFL vs. LED backlight, panel size (which affects whether panel uniformity is good or not), mobile/laptop vs. desktop, ambient lighting (dark workroom vs. bright fluorescent-lit room vs. bright natural-light room), matte vs. glossy, resolution including high-DPI scaling or not, 6-bit vs. 8-bit vs. still-rare 10-bit video pipeline, Mac (ColorSync) vs. Windows (no general system colour management), user-controlled settings like brightness or even calibration, and so on.
You can try to target some kind of average or median in this soup of variables if you like. All that means is that the chunks of the population clustered elsewhere will sit further from your presumed centre of the curve.
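To make the clustering point concrete, here is a minimal Python sketch with entirely made-up numbers: three hypothetical clusters of viewer displays, each with a share of the audience and a mean offset from your reference rendering in arbitrary units. The cluster names, shares and offsets are illustrative assumptions, not measurements. Tuning for the population-wide average lands you on a point that no cluster actually sits on, and the two smaller clusters end up several units away from it.

```python
# Sketch: why one "centre" misrepresents a population made of several clusters.
# All cluster labels, shares, and offsets below are HYPOTHETICAL illustration values.
from typing import Dict, List, Tuple

# (label, share of viewers, mean offset from the reference rendering, arbitrary units)
Cluster = Tuple[str, float, float]

CLUSTERS: List[Cluster] = [
    ("sRGB-class desktop", 0.5, 0.0),
    ("unmanaged wide gamut", 0.2, 3.0),
    ("older phone / laptop", 0.3, -4.0),
]


def mixture_mean(clusters: List[Cluster]) -> float:
    """Share-weighted mean of the cluster centres: the 'presumed centre of the curve'."""
    total = sum(share for _, share, _ in clusters)
    return sum(share * offset for _, share, offset in clusters) / total


def distance_from_target(clusters: List[Cluster]) -> Dict[str, float]:
    """How far each cluster's own centre sits from the population-wide mean."""
    target = mixture_mean(clusters)
    return {label: abs(offset - target) for label, _, offset in clusters}


if __name__ == "__main__":
    print(f"tuned-for centre: {mixture_mean(CLUSTERS):+.2f}")
    for label, d in distance_from_target(CLUSTERS).items():
        print(f"{label:>22}: {d:.2f} units away")
```

The mean of a mixture is just the share-weighted mean of its cluster centres, so adding or reweighting a cluster moves the "centre" without bringing any real display closer to it.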
But really all of this is sort of angels dancing on the head of a pin. The simple fact is that, calibrated or not, different monitors don't match. That's observable fact. How far they mismatch, and why, is what's concerning you. Okay. If you want to produce a single image that looks good on all displays, the best you can do is to constrain it to a low common denominator. And even then, it's not going to look good on certain clusters of displays, for certain values of "good". Just look at the display performance of nearly every smartphone made before the past 12 to 18 months or so. They were frankly horrid, and as a massive population of image-viewing devices, I'd argue they certainly did not share a smooth curve with desktop displays.
After thinking about what Royce said about primaries, I wonder whether I would get closer to the center of the distribution curve if I had chosen the sRGB PA version instead of the wide gamut PA241W, given that I am aiming exclusively for sRGB monitors (and minilabs).
If you wanted to focus only on sRGB, you certainly could use an sRGB-class display on your workstation. But I don't think that would be particularly helpful for what you're trying to do. If the wide-gamut PA241W already covers essentially 100% of sRGB, which it does, switching to a narrow-gamut NEC PA monitor won't change anything material for you; that monitor would likely cover sRGB almost 100% as well. Either way you'll still have a calibrated colour-critical display, and your viewers won't. And since wide-gamut monitors are becoming more common, and you stated a competing goal of future-proofing your images for upcoming standards, targeting only sRGB exposes you to the reverse problem: how your images will look to viewers with improperly configured wide-gamut displays.
Back to the example you started with. The profile you showed for the Surface Pro 3 is one of the best I've personally seen for any mobile device, and better than most laptop displays I've seen. But I highly doubt it would match an sRGB NEC display any better than it matches the PA241W. The problem isn't that the PA241W is a wide gamut display. The problem is that, even when calibrated & profiled, the SP3 can't fully cover sRGB, and it may have other limitations as well, such as less-than-perfect neutrality due to its LED backlight, its presumed 8-bit video LUT, or both.
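Coverage figures like "essentially 100% of sRGB" usually come from profiling or review software. One common 2D approximation is the fraction of the sRGB triangle in CIE 1931 xy that falls inside the display's own primaries triangle. The Python sketch below computes that by clipping one triangle against the other (Sutherland-Hodgman). It assumes that xy-area definition; tools may instead work in u'v' or compute a 3D Lab volume, so its numbers won't necessarily agree with any particular program. The sRGB primaries are the standard ones. The `panel` primaries in the usage lines are hypothetical, not measurements of the SP3 or the PA241W.

```python
# Sketch: 2D gamut coverage in CIE 1931 xy via Sutherland-Hodgman polygon clipping.
# ASSUMPTION: coverage = area(sRGB intersect display) / area(sRGB) in xy; tools may differ.
from typing import List, Tuple

Point = Tuple[float, float]

# sRGB primaries (R, G, B) in CIE 1931 xy chromaticity
SRGB: List[Point] = [(0.64, 0.33), (0.30, 0.60), (0.15, 0.06)]


def signed_area(poly: List[Point]) -> float:
    """Shoelace formula; positive when the vertices run counter-clockwise."""
    s = 0.0
    for i, (x1, y1) in enumerate(poly):
        x2, y2 = poly[(i + 1) % len(poly)]
        s += x1 * y2 - x2 * y1
    return s / 2.0


def _inside(p: Point, a: Point, b: Point) -> bool:
    """True if p lies on or to the left of the directed edge a -> b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0]) >= 0


def _intersect(p: Point, q: Point, a: Point, b: Point) -> Point:
    """Intersection of segment p-q with the infinite line through a and b."""
    dx, dy = q[0] - p[0], q[1] - p[1]
    ex, ey = b[0] - a[0], b[1] - a[1]
    t = (ex * (a[1] - p[1]) - ey * (a[0] - p[0])) / (ex * dy - ey * dx)
    return (p[0] + t * dx, p[1] + t * dy)


def clip(subject: List[Point], clipper: List[Point]) -> List[Point]:
    """Sutherland-Hodgman: the part of `subject` inside the convex polygon `clipper`."""
    if signed_area(clipper) < 0:
        clipper = list(reversed(clipper))  # the inside test assumes CCW order
    output = list(subject)
    for i, a in enumerate(clipper):
        b = clipper[(i + 1) % len(clipper)]
        vertices, output = output, []
        if not vertices:
            break  # nothing left to clip: no overlap
        prev = vertices[-1]
        for cur in vertices:
            if _inside(cur, a, b):
                if not _inside(prev, a, b):
                    output.append(_intersect(prev, cur, a, b))
                output.append(cur)
            elif _inside(prev, a, b):
                output.append(_intersect(prev, cur, a, b))
            prev = cur
    return output


def srgb_coverage(display: List[Point]) -> float:
    """Fraction of the sRGB xy triangle enclosed by the display's primaries triangle."""
    overlap = clip(SRGB, display)
    if len(overlap) < 3:
        return 0.0
    return abs(signed_area(overlap)) / abs(signed_area(SRGB))


if __name__ == "__main__":
    # HYPOTHETICAL primaries for a narrower-gamut panel (not measured data)
    panel = [(0.62, 0.34), (0.32, 0.57), (0.155, 0.09)]
    print(f"{srgb_coverage(panel):.1%} of sRGB")
```

A panel whose primaries triangle encloses the sRGB triangle scores 100% by this measure no matter how much further it extends. That is why a wide-gamut PA241W and an sRGB-class PA can both report near-full sRGB coverage, while a panel with a smaller triangle cannot reach it however well it is calibrated.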
But this is all theory. Try some experiments. If it's a priority, find a way to get ahold of some other monitors and set up a small test bench. Run through various scenarios and see what happens. I'm not trying to be argumentative, just to provide some expectation management and to illustrate why you observed what you did in this case, and how it might generalize. (Or not.)
The wish you stated earlier, for colour management to make different monitors match, is not going to come true: not in the general case, and not even in most specific cases. Plus, few of your viewers will be calibrating & profiling their monitors anyway. So if you back away from that, what's left? Probably something like producing the best images you can on a good-quality, properly configured display, and then testing how they look on a variety of representative viewer displays if you truly care to make that effort.
I'll be curious to see whether setting i1Profiler to one of the LED modes improves the Surface Pro 3 a bit...