Filed an issue for the performance problem:

With that change, a run against tip of Mako on my laptop now reports 1.25x slower, which is much better than it was. This performance issue might also explain why all of the regex compilation benchmarks are worse under Python 3.3 by a decent margin.

lru_cache on re._compile_typed
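
For anyone who wants to poke at the cache overhead in isolation, here is a rough sketch of the kind of comparison I mean. It is not the actual code in the re module: compile_cached and compile_dict are hypothetical stand-ins, and maxsize=500 is just an assumed value. It compares a compile helper wrapped in functools.lru_cache against one backed by a plain dict, both hit repeatedly with the same pattern.

```python
import functools
import re
import timeit


# Hypothetical stand-in for a cached compile helper (not the stdlib's code).
# typed=True makes the cache key depend on argument types as well as values.
@functools.lru_cache(maxsize=500, typed=True)
def compile_cached(pattern, flags=0):
    return re.compile(pattern, flags)


# Alternative for comparison: an unbounded plain dict keyed on the
# pattern's type, the pattern itself, and the flags.
_dict_cache = {}


def compile_dict(pattern, flags=0):
    key = (type(pattern), pattern, flags)
    try:
        return _dict_cache[key]
    except KeyError:
        compiled = _dict_cache[key] = re.compile(pattern, flags)
        return compiled


if __name__ == "__main__":
    # Time repeated cache hits; absolute numbers will vary by machine
    # and Python version, so only the ratio is interesting.
    n = 200000
    t_lru = timeit.timeit(lambda: compile_cached("a+b"), number=n)
    t_dict = timeit.timeit(lambda: compile_dict("a+b"), number=n)
    print("lru_cache: %.3fs  dict: %.3fs" % (t_lru, t_dict))
```

The dict version never evicts anything, so it is only a baseline for the lookup cost, not a drop-in replacement.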