While reverse interpolating the AtoB1 tables (or better, using the raw device Lab data) avoids the gamut boundary issue, either one is going to be pretty slow and is outside any standard ICC color management engine I'm aware of.
There's nothing that makes it non-ICC, but yes, it's going to be noticeably slower. Fully threaded RGB on a modern machine with the aim of just more accurate clipping isn't going to be that slow though (I'm guessing a fraction of a second, even with an accuracy-at-all-costs type of algorithm that I use in ArgyllCMS).
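To make "reverse interpolating the AtoB1 table" concrete, here is a rough sketch of inverting a forward device-to-Lab function by searching device space while keeping every trial value inside [0, 1], so that an out-of-gamut target ends up at the closest in-range device value the search can find. This is not the ArgyllCMS algorithm. `invert_a2b`, `toy_a2b` and `sq_error` are hypothetical names, the forward model is a made-up linear stand-in for a real AtoB1 interpolation, and the error measure is assumed to be plain squared Euclidean distance in Lab.

```python
from typing import Callable, List, Sequence

Vec = Sequence[float]


def sq_error(lab_a: Vec, lab_b: Vec) -> float:
    """Squared Euclidean distance between two Lab values (assumed error metric)."""
    return sum((a - b) ** 2 for a, b in zip(lab_a, lab_b))


def toy_a2b(dev: Vec) -> List[float]:
    """Hypothetical forward model standing in for an AtoB1 table lookup."""
    return [100.0 * dev[0], 100.0 * (dev[1] - 0.5), 100.0 * (dev[2] - 0.5)]


def invert_a2b(a2b: Callable[[Vec], Vec], target_lab: Vec,
               n_channels: int = 3, tol: float = 1e-5) -> List[float]:
    """Coordinate pattern search for device values whose Lab is nearest the target.

    Trial values are clamped to [0, 1], so the result is always a legal
    device value, even when the target lies outside the gamut.
    """
    dev = [0.5] * n_channels              # start from the middle of device space
    best = sq_error(a2b(dev), target_lab)
    step = 0.25
    while step > tol:
        improved = False
        for ch in range(n_channels):
            for delta in (step, -step):
                trial = list(dev)
                trial[ch] = min(1.0, max(0.0, dev[ch] + delta))
                err = sq_error(a2b(trial), target_lab)
                if err < best:            # accept only strict improvements
                    dev, best, improved = trial, err, True
                    break
        if not improved:
            step *= 0.5                   # refine once no move at this step helps
    return dev


in_gamut = invert_a2b(toy_a2b, (25.0, 10.0, -20.0))   # close to [0.25, 0.6, 0.3]
clipped = invert_a2b(toy_a2b, (150.0, 0.0, 0.0))      # first channel pinned at 1.0
```

Each conversion costs many forward evaluations, which is where the slowdown relative to a single B2A lookup comes from. A real forward table is not linear, so a simple search like this can also stall in a local minimum.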
Generally, colors that are inside the gamut boundary by at least one grid separation have quite low error. Perhaps a special CMM where near-boundary colors are handled specially by AtoB1 interpolation to locate the best device values? Are you aware of products that do that?
An approach that might be faster would be to have two B2A tables, one that doesn't clip (although you'd have to scale the device values to allow for out of range output on grid values just outside the gamut boundary), and one that always clips, even for grid values just inside the gamut boundary. So you convert via the first, and if you get an out of gamut device value, you look up the second. The transition isn't likely to be seamless though, so there'd still have to be some sort of explicit transition region to smooth things over.
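The two-table idea might look something like the sketch below. Everything here is an assumption for illustration: `convert`, `overshoot`, `smoothstep`, the toy tables and the `width` parameter are hypothetical, and the transition region is just one possible choice (a smoothstep blend driven by how far the unclipped result lands outside [0, 1]).

```python
from typing import Callable, List, Sequence

Vec = Sequence[float]


def smoothstep(t: float) -> float:
    """Smooth 0..1 ramp with zero slope at both ends."""
    t = min(1.0, max(0.0, t))
    return t * t * (3.0 - 2.0 * t)


def overshoot(dev: Vec) -> float:
    """Largest amount by which any channel falls outside [0, 1]."""
    return max(max(-d, d - 1.0, 0.0) for d in dev)


def toy_unclipped(lab: Vec) -> List[float]:
    """Hypothetical non-clipping B2A table; output may leave [0, 1]."""
    return [lab[0] / 100.0, 0.5 + lab[1] / 200.0, 0.5 + lab[2] / 200.0]


def toy_clipped(lab: Vec) -> List[float]:
    """Hypothetical always-clipping B2A table; output stays in [0, 1]."""
    return [min(1.0, max(0.0, d)) for d in toy_unclipped(lab)]


def convert(lab: Vec,
            b2a_unclipped: Callable[[Vec], Vec],
            b2a_clipped: Callable[[Vec], Vec],
            width: float = 0.05) -> List[float]:
    """Use the unclipped table in gamut and fade to the clipping table outside."""
    raw = b2a_unclipped(lab)
    over = overshoot(raw)
    if over <= 0.0:
        return list(raw)                  # in gamut: the first table alone
    w = smoothstep(over / width)          # 0 at the boundary, 1 beyond `width`
    clamped = [min(1.0, max(0.0, d)) for d in raw]
    clipped = b2a_clipped(lab)
    return [(1.0 - w) * a + w * b for a, b in zip(clamped, clipped)]


inside = convert((50.0, 0.0, 0.0), toy_unclipped, toy_clipped)     # first table only
outside = convert((50.0, 200.0, 0.0), toy_unclipped, toy_clipped)  # second table only
```

The blend is continuous at the gamut boundary because the weight starts at zero there, but `width` would need tuning against the grid spacing of real tables. In-gamut colors cost a single lookup, and only out-of-gamut colors pay for the second one.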
Of course all this is only a concern for colorimetric rendering. If you're doing perceptual, a rounded out compression near the gamut surface is probably what you are after anyway :-)