I have also wondered about how the 3:2 ratio became entrenched some decades ago, despite the fact that most or all standard print shapes were less elongated than that (even "snapshot" sized prints used to be shapes like 4"x5", 3.5"x5", 9x12cm before the standard moved to 4"x6"/10x15cm).
Part of the story is the original Leica design, doubling up the 35mm movie frame of 18mmx24mm to get 36mmx24mm. But a few other shapes were experimented with, none catching on (except for a bit of "half-frame" 18x24mm).
My new speculation is that 24x36mm is in fact an excellent format for use within the constraint of using the widely available, very economical, 35mm film and its 24mm maximum width, because
a) the smallness of 35mm film makes it desirable to be able to use that full 24mm width of the film when printing at all the common shapes, and
b) the commonly used print shapes all lie in the range from about 1.25:1 to 1.5:1 (3 1/2"x5", 4"x5", 4"x6", 5"x7", 8"x10", 11"x14", 16"x20", etc., and in some European countries 9x12cm, 18x24cm used to be common).
Making the frame shape the widest of these, 3:2 or 24mmx36mm, allows every common print shape to be achieved by cropping only at the sides, still using the maximum possible 24mm height; any narrower frame (like 24x32mm for 4:3 shape) would require cropping to less than 24mm height to get 3:2 print shapes (about 21.3x32mm for 3:2 shape from a 24x32mm negative).
That is, I suspect that the success of the 24x36mm frame is not that it is the "ideal" shape (no single ideal print shape exists), but that it minimizes loss of resolution when cropping by being at the wide end of the range of commonly used print shapes.
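For anyone who wants to check the cropping arithmetic, here is a rough Python sketch. The function name crop_to_ratio is just something I made up for illustration, and it assumes the crop is simply the largest rectangle of the wanted shape that fits inside the frame.

```python
def crop_to_ratio(frame_w, frame_h, ratio):
    """Largest crop with the given long:short ratio that fits in the frame.

    frame_w is the long side and frame_h the short side, both in mm.
    (Hypothetical helper, for illustration only.)
    """
    frame_ratio = frame_w / frame_h
    if ratio <= frame_ratio:
        # Print is squarer than the frame: keep the full height, trim the sides.
        return (frame_h * ratio, frame_h)
    # Print is more elongated than the frame: keep the full width, lose height.
    return (frame_w, frame_w / ratio)


if __name__ == "__main__":
    # A few of the print shapes mentioned above, as long:short ratios.
    prints = {'4"x5"': 5 / 4, '5"x7"': 7 / 5, '4"x6"': 6 / 4}
    for frame in [(36, 24), (32, 24)]:
        for name, ratio in prints.items():
            w, h = crop_to_ratio(frame[0], frame[1], ratio)
            print(f"{frame[1]}x{frame[0]}mm frame, {name} print: "
                  f"uses {h:.1f}x{w:.1f}mm")
```

With the 24x36mm frame every one of these crops keeps the full 24mm height, while the 24x32mm frame drops to a little over 21mm of height for the 4"x6" shape.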
P. S. I see a good reason why the 4:3 (1.33...:1) shape is used by every current digital camera model with sensor size of up to 4/3" format, and by the new 22MP digital backs, and by most current scanning backs for large format. This shape is close to the middle of the common print shape range and so for photographers who use shapes scattered over this range, this sensor shape minimizes the average "pixel wastage". However, for people who mostly prefer the more panoramic shapes like 3:2 and up (this is meant to be a landscape oriented forum after all), 3:2 sensor shape makes more sense.
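To put rough numbers on the "pixel wastage" idea, here is a small Python sketch. The names and the list of print shapes are my own choices, and it assumes every print shape in the list is used equally often, which is certainly not true for any real photographer.

```python
def wasted_fraction(sensor_ratio, print_ratio):
    """Fraction of sensor area thrown away when cropping to print_ratio.

    Both ratios are long:short. (Hypothetical helper, for illustration.)
    """
    kept = min(sensor_ratio / print_ratio, print_ratio / sensor_ratio)
    return 1.0 - kept


def average_waste(sensor_ratio, print_ratios):
    """Mean wasted fraction, weighting every print shape equally (an assumption)."""
    return sum(wasted_fraction(sensor_ratio, p) for p in print_ratios) / len(print_ratios)


# Print shapes from the 1.25:1 to 1.5:1 range discussed above.
COMMON_PRINTS = [5 / 4, 14 / 11, 4 / 3, 7 / 5, 5 / 3.5, 3 / 2]

if __name__ == "__main__":
    for label, ratio in [("4:3", 4 / 3), ("3:2", 3 / 2)]:
        waste = average_waste(ratio, COMMON_PRINTS)
        print(f"{label} sensor: average waste {waste:.1%}")
```

Under that equal-weighting assumption the 4:3 sensor wastes less on average across the whole range, but if you restrict the list to 3:2 and wider shapes the 3:2 sensor comes out ahead.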