I've just submitted to SF #757624 the third implementation of the SRE changes.
Unlike the other implementations, this one doesn't recurse at all, and doesn't change the meaning of the opcodes in the engine. Indeed, the original logic is mostly unchanged, and the recursion has been removed by a code reorganization. Most of the magic is done using "jumps", and "context saving" inside SRE_MATCH. It's even possible to reenable the recursive scheme by enabling the USE_RECURSION macro.
Besides the regression tests, I have tested this using Fredrik's suggestion, with a 4MB XML file and xmllib/tokenize modules. There was a very small slow down (i.e. about 37.4s in old implementation vs. 38.0s in new implementation).
Bugs previously mentioned are also fixed in this implementation.
This implementation also includes the protection against zero-width match in greedy expressions ("(?=a)*", "a"). Even if this implementation doesn't get into 2.3, I was thinking about backporting this specific fix to 2.3. Unfortunately, this would need an additional local variable in SRE_MATCH of the current implementation, and I'm afraid to reduce the recursion limit even more by introducing it.
Additionally, after a quick look over perl's regex engine, I also have some ideas for matching optimization in SRE, but I'll wait a little bit before fiddling with it.