Setting aside the discussion of where the image in an emulsion actually lies, I had been living with the following definitions:
Depth of field: the range of distances from the lens where image elements are perceived as "sharp" or "in focus", while image elements at distances outside this range are considered "unsharp" or "out of focus". How these are determined/calculated requires some variables to be standardized: degree of enlargement, print size, viewing distance, etc.
Depth of focus: the analogous range on the other side of the lens where the FOCUSED image is maximally sharp. As long as the film or sensor is within this range, the recording will be as sharp as possible; if the sensor or film is in the "wrong" position, the image intended to be captured (i.e. what was focused on) is not maximally sharp.
I never spent much time thinking about depth of focus because I assumed that, for quite some time now, lenses have been good enough to ensure it was adequate. Once this is the case, the two parameters (depth of field and depth of focus) must be related in a predictable way. For example, if the sensor is not positioned correctly, all shots will front- or back-focus with respect to what is in focus on the viewfinder screen, and the camera needs to be sent in for calibration.
These definitions had a pleasing symmetry to them which made them easier to remember. Anyone else have a similar set of definitions? Or are they out of date? In the half-forgotten equation containing 1/f, 1/v and 1/u, u and v refer to the lens-to-subject and lens-to-film distances respectively, and the two DoFs refer to the ranges of u and v "considered sharp".
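For anyone who wants to play with the numbers, here is a quick sketch of that thin-lens relation, 1/f = 1/u + 1/v, plus the standard hyperfocal-distance formulas for the depth-of-field limits. The focal length, focus distance, f-number, and circle-of-confusion values below are just illustrative, not tied to any particular camera.

```python
# Thin-lens relation: 1/f = 1/u + 1/v, with u = lens-to-subject
# distance and v = lens-to-film distance. All distances in mm.

def image_distance(f, u):
    """Solve 1/f = 1/u + 1/v for v (the lens-to-film distance)."""
    return 1.0 / (1.0 / f - 1.0 / u)

def dof_limits(f, u, N, c):
    """Near/far subject distances rendered 'acceptably sharp' for
    focal length f, focus distance u, f-number N, and circle of
    confusion c, using the standard hyperfocal-distance formulas."""
    H = f * f / (N * c) + f                      # hyperfocal distance
    near = H * u / (H + (u - f))
    far = H * u / (H - (u - f)) if u < H else float("inf")
    return near, far

# A 50 mm lens focused at 2 m sits a shade beyond 50 mm from the film:
v = image_distance(50.0, 2000.0)                 # ~51.3 mm
near, far = dof_limits(50.0, 2000.0, 8.0, 0.03)  # DoF straddles 2 m
```

The asymmetry of the near/far limits (more depth behind the subject than in front) drops straight out of these formulas.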
For digital, it seems like much of this discussion has left out a relevant item: the AA filter. I believe that this is substantially thicker than the sensor element on the chip is deep. All photons go through this, and some will be shifted to land up to a pixel or two away from where they would have landed in its absence. Thus, there will be different amounts of the image still in focus at different depths in the AA filter, but the sensor elements will still only record a composite of those. This controlled blurring is a good thing, since it prevents moiré effects.
Don, I have not experienced what you describe in terms of focusing through a 4x5 onto an enlarger bed (since I only scan them), but even with my 10x loupe, I only see the transition between "all in focus" and "all not in focus" as I adjust the height of the loupe above the light table. For thick B&W emulsions, your "layers" might be detectable, but with layered color emulsions, I think Jonathan has made a good case for why you shouldn't be able to detect it as you describe (at least not without color shifts).
In your "TTY" figure, if you were to draw lines from the X's and O's to all parts of the exit pupil of the lens, you would see that the lines cross the sensor plane where adjacent or nearby sensor elements would catch them, that's why the X behind a given pixel must be captured by a nearby pixel, and points get blurred. Here the relationship of the two DoFs becomes a bit easier to discuss - the smaller the exit pupil (i.e. aperture) the more closely the rays converge on the same focus plane and the out of focus objects will "focus" either in front of or behind the AA filter, and point objects will be recorded by several pixels.
A couple of cents' worth of rambling
Andy