[Python-Dev] Make re.compile faster

Serhiy Storchaka storchaka at gmail.com
Tue Oct 3 11:28:21 EDT 2017


On 03.10.17 17:21, Barry Warsaw wrote:
> What if the compiler could recognize constant arguments to re.compile() and do the regex compilation at that point?  You’d need a way to represent the precompiled regex in the bytecode, and it would technically be a semantic change since regex problems would be discovered at compilation time instead of runtime - but that might be a good thing.  You could also make that an optimization flag for opt-in, or a flag to allow opt out.

The representation of the compiled regex is an implementation detail. It 
is not even exposed once the regex is compiled. And it changes faster 
than the bytecode and marshal formats do. It can change even in a 
bugfix release.
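
A quick way to see that the compiled form is not exposed: the only public 
state of a pattern object is its source string and flags. In CPython's re 
module, pickling a pattern stores just those two values and recompiles on 
load, and marshal (the format used for .pyc files) rejects pattern objects 
outright. The pattern below is an arbitrary example:

```python
import marshal
import pickle
import re

p = re.compile(r"\d+-\d+", re.IGNORECASE)

# The public state of a compiled pattern: its source and its flags.
print(p.pattern, p.flags)

# Pickling stores only (pattern, flags); unpickling compiles the regex again.
restored = pickle.loads(pickle.dumps(p))
print(restored.pattern == p.pattern, restored.flags == p.flags)

# marshal has no representation for pattern objects at all.
try:
    marshal.dumps(p)
    marshalable = True
except ValueError as exc:
    marshalable = False
    print("marshal rejects it:", exc)
```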

To implement this idea we would need to:

1. Invent a universal, portable regex bytecode. It should have no flaws 
or limitations, and it should support all features of Unicode regexes 
and possible future extensions. It should also anticipate future Unicode 
changes and be able to encode them.

2. Add support for regex objects to the marshal format.

3. Implement an advanced AST optimizer.

4. Rewrite the regex compiler in C or make the AST optimizer able to 
execute Python code.
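
Of all this, the part Barry describes first, recognizing re.compile() 
calls whose arguments are constants, is the easy one. A rough sketch 
using the standard ast module follows. The helper names are hypothetical, 
and it assumes the module is imported as plain `re` (no aliases) and that 
flags, if any, are passed positionally. It only finds such calls and 
validates the patterns early; actually embedding the compiled result in 
the bytecode would still need items 1 and 2 above.

```python
import ast
import re


def _const_flags(node):
    """Evaluate a flags argument built only from int literals, re.<FLAG>
    names and `|`; return None if it is not a compile-time constant."""
    if isinstance(node, ast.Constant) and isinstance(node.value, int):
        return node.value
    if (isinstance(node, ast.Attribute) and isinstance(node.value, ast.Name)
            and node.value.id == "re"
            and isinstance(getattr(re, node.attr, None), re.RegexFlag)):
        return int(getattr(re, node.attr))
    if isinstance(node, ast.BinOp) and isinstance(node.op, ast.BitOr):
        left, right = _const_flags(node.left), _const_flags(node.right)
        if left is not None and right is not None:
            return left | right
    return None


def find_constant_compiles(source):
    """Hypothetical helper: return (lineno, pattern, flags) for every
    re.compile(...) call whose arguments are all constants."""
    found = []
    for node in ast.walk(ast.parse(source)):
        # Only calls spelled literally as re.compile(...).
        if not (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "compile"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "re"):
            continue
        # Keyword arguments are not handled in this sketch.
        if node.keywords or not 1 <= len(node.args) <= 2:
            continue
        pat = node.args[0]
        if not (isinstance(pat, ast.Constant)
                and isinstance(pat.value, (str, bytes))):
            continue
        flags = _const_flags(node.args[1]) if len(node.args) == 2 else 0
        if flags is not None:
            found.append((node.lineno, pat.value, flags))
    return sorted(found, key=lambda item: item[0])


SAMPLE = r'''
import re
WORD = re.compile(r"\w+")
DATE = re.compile(r"\d{4}-\d{2}", re.I | re.M)
dynamic = re.compile(user_input)
'''

hits = find_constant_compiles(SAMPLE)
for lineno, pattern, flags in hits:
    re.compile(pattern, flags)  # a bad pattern would fail here, before run time
    print(lineno, pattern, flags)
```

The call with `user_input` is skipped because its argument is not known 
until run time.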

I think we are far away from this. Each of the above problems is much 
larger, and solving it could give a larger benefit than saving several 
microseconds at startup.

Forget about this. Let's first get rid of the GIL!
