I would keep it simple: thanks to the clipping information a digital camera provides, the highlights are the end of the tonal range that is easiest to verify as correctly captured, right? So start with the highlights.
1. Expose as much as you can, stopping just short of clipping the highlights of interest, and shoot.
2. Now look at the camera display: are there important areas of the scene, whose textures you want to lift in postprocessing, that still look underexposed?
YES?: then you need more shots. Lengthen the shutter time by +2EV (or even +3EV; smaller steps are a waste) and shoot again. Repeat 2 until the answer is NO.
3. Mix the information from the shots in an optimal way.
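Step 3 does not say what "optimal" means, so the following is only a sketch of one common reading, not necessarily the method intended here: for each pixel, take the value from the most exposed shot in which that pixel is not clipped, and scale it back to the exposure of the first shot. The function name, the linear 0..1 values and the 0.98 clipping threshold are all assumptions made for illustration.

```python
def merge_pixel(values, ev_offsets, clip_level=0.98):
    """Return one merged linear value for a single pixel.

    values:     linear sensor values in [0, 1] for the same pixel, one per shot
    ev_offsets: extra exposure of each shot relative to the first (ETTR) shot, in EV
    clip_level: assumed threshold above which a value counts as clipped
    """
    # Visit the shots from most exposed to least exposed.
    order = sorted(range(len(values)), key=lambda i: ev_offsets[i], reverse=True)
    for i in order:
        if values[i] < clip_level:
            # Most exposed unclipped shot: scale back to the first shot's exposure.
            return values[i] / (2 ** ev_offsets[i])
    # Clipped in every shot: fall back to the least exposed one.
    i = order[-1]
    return values[i] / (2 ** ev_offsets[i])


# A deep shadow is taken from the +4EV shot, a highlight from the ETTR shot.
shadow = merge_pixel([0.01, 0.04, 0.16], [0, 2, 4])
highlight = merge_pixel([0.5, 1.0, 1.0], [0, 2, 4])
```

With this rule the shadows come from the longest exposures, where they were recorded with the least noise, while the highlights stay with the ETTR shot from step 1.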
That's the method. Since it's very convenient not to touch the camera between shots, the best thing is to use AEB, for example the {-2,0,+2} scheme, and adjust it so that the least exposed shot, i.e. the one named '-2', corresponds to the shot calculated in 1.
To do that, just lengthen the shutter time by +2EV right after calculating the exposure in 1, and then activate AEB. Always in M mode, of course.
If 3 shots are not enough (which is very rare in real-life situations if the first shot was a proper ETTR), keep shooting with progressively longer shutter times.
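The shutter arithmetic behind this can be sketched as follows. The function name and the 1/500 s starting value are just assumptions for the example, and a real camera will round the results to its nominal shutter speeds.

```python
def bracket_times(ettr_time_s, shots=3, step_ev=2.0):
    """Shutter times in seconds, starting at the ETTR shot and
    lengthening by step_ev for each following shot."""
    return [ettr_time_s * 2 ** (k * step_ev) for k in range(shots)]


# Hypothetical ETTR exposure from step 1: 1/500 s.
times = bracket_times(1 / 500)

# The AEB centre ('0') is the second shot, i.e. the ETTR time lengthened by +2EV.
aeb_center = times[1]

# If 3 shots are not enough, ask for more shots with the same step.
longer_series = bracket_times(1 / 500, shots=5)
```

Each +2EV step multiplies the shutter time by 4, so setting the centre shot to 4 times the ETTR time makes the '-2' shot land exactly on the exposure calculated in 1.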
Regards.