Hi, Guillermo. I think your analysis is confusing dynamic range with shadow noise and coming to the wrong conclusion. Theoretically your two scenarios are entirely equivalent, both in depth of field and in shadow noise. There is no benefit to using a shorter focal length and cropping.
As you correctly note, both scenarios collect the same number of photons from the subject. The simplest way to see this is to note that the physical lens aperture is the same (6mm in your example). Exposure time is assumed to be the same, constrained by something (perhaps motion blur) other than subject brightness.
Consider a small patch of dark shadow in the original scene. That patch contributes the same number of photons to the image in both scenarios. The photons are spread out over a larger sensor area and more pixels in one scenario, but that does not matter. Since noise comes from photon statistics (neglecting sensor contributions), the total noise of the patch will be the same in both scenarios.
In the long focal length image, the photons from the patch are spread out over 4x as many pixels. Each pixel will therefore have 1/4 the signal and 1/2 the noise, since photon noise scales as the square root of the signal. To make the brightness the same, the signal in each pixel must be amplified 4x (by raising ISO in your example), which also amplifies the noise 4x. The end result is that each pixel will have 2x the noise at the same brightness. However, to make a print of the same size, those pixels will require 4x less area magnification. In effect, the pixels get averaged 4:1 in the final print. This averaging reduces image noise by 2x (the square root of 4), exactly compensating for the extra noise per pixel.
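For anyone who wants to check the arithmetic, here is a minimal sketch in Python. It assumes pure photon (shot) noise with no sensor contribution, and the photon count of 10,000 for the patch is a made-up number chosen only to keep the square roots tidy. The function name and its parameters are mine, not anything from your post.

```python
import math

def patch_noise(patch_photons, pixels, gain):
    """Shot-noise-only model of one shadow patch (an assumption, no read noise).

    Returns (per-pixel signal, per-pixel noise, noise after averaging
    the patch's pixels together), all measured after the ISO gain.
    """
    signal = patch_photons / pixels       # photons landing on each pixel
    noise = math.sqrt(signal)             # photon noise is sqrt(signal)
    amp_signal = gain * signal            # ISO gain scales the signal...
    amp_noise = gain * noise              # ...and the noise equally
    averaged = amp_noise / math.sqrt(pixels)  # averaging n pixels divides noise by sqrt(n)
    return amp_signal, amp_noise, averaged

N = 10_000  # hypothetical photon count from the patch
short_fl = patch_noise(N, pixels=1, gain=1)  # short focal length, base ISO
long_fl = patch_noise(N, pixels=4, gain=4)   # long focal length, 4x ISO

print(short_fl)
print(long_fl)
```

Under these assumptions the per-pixel brightness matches, the per-pixel noise is 2x higher in the long focal length case, and the noise after 4:1 averaging comes out the same in both.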
I think the confusion in your analysis comes from equating dynamic range with shadow noise. You note that increasing ISO by a factor of 4 reduces dynamic range by 2 stops. That is true because of highlight clipping, not shadow noise. In your two scenarios, pixels will have the same final brightness (the 2-stop smaller f/stop is offset by the 4x ISO increase), so highlight clipping will be the same.
If shutter speed were not constrained, then the long focal length scenario would actually win. Just lower ISO to the original level and increase exposure time 4x. Shadow noise will improve because 4x as many photons are collected, with no change in image brightness or highlight clipping. This is the fundamental DR advantage of large sensors.
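A rough sketch of that last point, again assuming shot noise only and a hypothetical 10,000 photons from the patch at the original exposure time. The function name is just for illustration.

```python
import math

def patch_snr(photons_per_exposure, exposure_factor):
    """Signal-to-noise ratio of a patch under shot noise only (an assumption)."""
    collected = photons_per_exposure * exposure_factor
    return collected / math.sqrt(collected)  # SNR = N / sqrt(N) = sqrt(N)

original = patch_snr(10_000, 1)  # constrained shutter speed
longer = patch_snr(10_000, 4)    # 4x exposure time at the original ISO

print(longer / original)
```

In this simple model, 4x the exposure time gives 4x the photons and therefore 2x the shadow signal-to-noise ratio.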