There's not a clear-cut answer to this, because the question of what's "right" or "wrong" largely depends on what type of image processing you want to do.
Yes, it is true that in 8-bit and 16-bit mode in Photoshop (and in most software that processes rendered images) you are typically operating on gamma-encoded images, and therefore results will depend on your choice of working space.
I don't know where they got the number of 186 for sRGB corresponding to middle gray. I suppose it depends on what your definition of middle gray is. If we go by the CIE L*a*b* model, middle gray is L* = 50. On an 8-bit scale, this is about 118 to 119 for sRGB and Adobe RGB (gamma 2.2 for the latter, almost gamma 2.2 for the former) and 100 for ProPhoto RGB (gamma 1.8).
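As a rough check of these numbers, here is a small Python sketch that converts L* = 50 to relative luminance and then encodes it with each space's transfer curve. It assumes the standard piecewise sRGB curve, a pure 2.2 power for Adobe RGB (the spec's exponent is 563/256, about 2.199), and a pure 1.8 power for ProPhoto RGB (ignoring its small linear segment near black); the function names are only for illustration. As a side note, 255 × 0.5^(1/2.2) ≈ 186, so one possible origin of the 186 figure is 50% *linear* luminance encoded with gamma 2.2, though that is only a guess.

```python
def lstar_to_y(lstar):
    """CIE L* -> relative luminance Y (0..1, white = 1), CIE 1976 inverse."""
    fy = (lstar + 16) / 116
    if fy > 6 / 29:
        return fy ** 3
    # Near black the CIE formula switches to a linear segment.
    return 3 * (6 / 29) ** 2 * (fy - 4 / 29)

def srgb_encode(y):
    """sRGB transfer curve: linear segment near black, 1/2.4 power elsewhere."""
    if y <= 0.0031308:
        return 12.92 * y
    return 1.055 * y ** (1 / 2.4) - 0.055

def power_encode(y, gamma):
    """Pure power-law encoding (approximation for Adobe RGB and ProPhoto RGB)."""
    return y ** (1 / gamma)

def to_8bit(v):
    return round(255 * v)

y_mid = lstar_to_y(50)  # about 0.184
encoded = {
    "sRGB": to_8bit(srgb_encode(y_mid)),
    "Adobe RGB": to_8bit(power_encode(y_mid, 2.2)),
    "ProPhoto RGB": to_8bit(power_encode(y_mid, 1.8)),
}
print(encoded)  # {'sRGB': 119, 'Adobe RGB': 118, 'ProPhoto RGB': 100}

# 50% linear luminance, encoded with a pure 2.2 gamma:
print(to_8bit(power_encode(0.5, 2.2)))  # 186
```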
For certain types of operations, such as white balance, using an encoded space is definitely the wrong thing to do, particularly when the curve is not strictly a power curve. (In the case of white balance, doing it post-raw-conversion in PS is almost always the wrong thing to do, not because of gamma encoding, but because nearly all raw converters apply a highly non-linear tone mapping curve to the raw data.)
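To see why the exact shape of the curve matters for white balance, the Python sketch below multiplies an encoded value by a gain k and asks what gain that amounts to in linear light. With a pure power curve the answer is k^γ for every pixel, so a per-channel multiplier is still a per-channel multiplier (just the wrong size). With the piecewise sRGB curve the equivalent linear gain changes with pixel value, so no single encoded-space multiplier reproduces a linear white balance. The gain of 1.1, the sample values, and the helper names are arbitrary choices for illustration.

```python
def srgb_decode(v):
    """sRGB-encoded value (0..1) -> linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def power_decode(v, gamma=2.2):
    """Pure power-law decoding, for comparison."""
    return v ** gamma

def equivalent_linear_gain(encoded, k, decode):
    """Linear-light gain produced by multiplying an encoded value by k."""
    return decode(k * encoded) / decode(encoded)

k = 1.1  # arbitrary white-balance multiplier for one channel
samples = [0.02, 0.1, 0.3, 0.6, 0.85]  # encoded values; k * v stays below 1

power_gains = [equivalent_linear_gain(v, k, power_decode) for v in samples]
srgb_gains = [equivalent_linear_gain(v, k, srgb_decode) for v in samples]

print([round(g, 3) for g in power_gains])  # every entry is 1.1 ** 2.2, about 1.233
print([round(g, 3) for g in srgb_gains])   # 1.1 in the linear segment, larger above it
```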
If you are trying to do any type of physical simulation (such as colorimetric calculations, adding colors together, or HDR processing), then you definitely want linear data: not tone-mapped and not gamma-encoded. This is partly why 32-bit mode is different in PS: it is generally assumed that if you have 32-bit-per-channel data, you're really trying to take linear data and do some tone mapping with it (e.g., HDR).
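As a small illustration of why additive calculations want linear data, the Python sketch below mixes equal amounts of black and white light for a single channel in two ways: by averaging the 8-bit sRGB-encoded values directly, and by decoding to linear light, averaging, and re-encoding. Physically, half the light of white is the second result; the encoded average comes out noticeably darker. The helper names are made up for this example.

```python
def srgb_decode(v):
    """sRGB-encoded value (0..1) -> linear light."""
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def srgb_encode(lin):
    """Linear light (0..1) -> sRGB-encoded value."""
    if lin <= 0.0031308:
        return 12.92 * lin
    return 1.055 * lin ** (1 / 2.4) - 0.055

def mix_encoded(a, b):
    """Average two 8-bit sRGB values directly, as an encoded-space tool would."""
    return (a + b) / 2

def mix_linear(a, b):
    """Decode to linear light, average, then re-encode to the 8-bit sRGB scale."""
    lin = (srgb_decode(a / 255) + srgb_decode(b / 255)) / 2
    return 255 * srgb_encode(lin)

# Equal parts black and white light, e.g. a fine 50/50 pattern viewed from afar.
print(mix_encoded(0, 255))        # 127.5
print(round(mix_linear(0, 255)))  # 188
```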
Other operations, such as sharpening, are much less clear-cut. Most sharpening algorithms are based on edge enhancement, and internally they typically use blurs of different sizes to find the edges (yes, I know it's counterintuitive to think of blurs when discussing sharpening). But whether you perform this blur in linear space or gamma-encoded space is debatable; the choice affects whether you end up emphasizing lighter halos or darker halos.
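To make the halo point concrete, here is a rough Python sketch of a basic unsharp mask (original + amount × (original − blur)) applied to a one-dimensional sRGB step edge from 0.2 to 0.8, once directly on the encoded values and once in linear light. The 3-tap box blur, the amount of 0.1, and the edge values are arbitrary illustrative choices, not what Photoshop's own sharpening filters do internally. For this edge, encoded-space sharpening gives equal dark and light halos (measured in encoded units), while linear-space sharpening gives a much deeper dark halo and a weaker light halo, because the sRGB curve is steep near black and flat near white.

```python
def srgb_decode(v):
    if v <= 0.04045:
        return v / 12.92
    return ((v + 0.055) / 1.055) ** 2.4

def srgb_encode(lin):
    if lin <= 0.0031308:
        return 12.92 * lin
    return 1.055 * lin ** (1 / 2.4) - 0.055

def box_blur(xs):
    """3-tap box blur; the end samples are repeated (clamped) at the borders."""
    n = len(xs)
    return [(xs[max(i - 1, 0)] + xs[i] + xs[min(i + 1, n - 1)]) / 3 for i in range(n)]

def unsharp(xs, amount):
    """Basic unsharp mask: add back the difference from the blurred signal."""
    blurred = box_blur(xs)
    return [x + amount * (x - b) for x, b in zip(xs, blurred)]

def halos(result, low=0.2, high=0.8):
    """(dark halo depth, light halo height), measured in encoded units."""
    return low - min(result), max(result) - high

edge = [0.2] * 4 + [0.8] * 4  # sRGB-encoded step edge, dark to light
amount = 0.1  # small enough that nothing clips

# Option A: sharpen the encoded values directly.
gamma_space = unsharp(edge, amount)

# Option B: decode to linear light, sharpen, then re-encode.
linear_space = [srgb_encode(v) for v in unsharp([srgb_decode(v) for v in edge], amount)]

print([round(h, 3) for h in halos(gamma_space)])   # [0.02, 0.02]
print([round(h, 3) for h in halos(linear_space)])  # [0.076, 0.011]
```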
In general, I would say the answer isn't clear-cut because many phenomena happen at the linear level (so you need linear data if you're trying to model them), but humans perceive them non-linearly (for example lightness, where a material with only 18% reflectance appears "middle gray"). So the question is whether you should tackle the problem at the physical level or the perceptual level.
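The gap between the physical and perceptual scales is easy to see with the CIE 1976 lightness formula. The Python sketch below (helper name hypothetical) converts relative luminance Y, with white = 1, to L*: 18% reflectance lands very close to L* = 50, while half the physical light of white is perceived as much lighter than "half".

```python
def y_to_lstar(y):
    """Relative luminance Y (0..1, white = 1) -> CIE L* (0..100)."""
    delta = 6 / 29
    if y > delta ** 3:
        f = y ** (1 / 3)
    else:
        # Linear segment of the CIE formula near black.
        f = y / (3 * delta ** 2) + 4 / 29
    return 116 * f - 16

print(round(y_to_lstar(0.18), 1))  # 49.5: 18% reflectance sits near perceptual middle gray
print(round(y_to_lstar(0.50), 1))  # 76.1: half the light looks far lighter than "half"
```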