regex module - proper implementation of alternation?
Since the 'regex' module is a candidate for inclusion into the stdlib, I figured this would be a good place to ask. While discussing something related in pss (https://github.com/eliben/pss), Ben Hoyt brought to my attention that the implementation of alternation (foo|bar) in Python's default regex module (the SRE implementation) is very inefficient. And indeed, looking there it seems that | gets translated to an opcode that simply means going over all the alternatives in a loop trying to match each. This is not how a proper regex engine should implement alternation! A common advice given to Python programmers is to combine their regexes into a single one with |. This makes code faster, but it turns out that it's far from its full potential because the combination doesn't go full way to the DFA as it's supposed to. A question about 'regex' - is it implemented properly there? Eli
participants (1)
-
Eli Bendersky