[Python-Dev] Make re.compile faster

Serhiy Storchaka storchaka at gmail.com
Tue Oct 3 01:35:52 EDT 2017


03.10.17 06:29, INADA Naoki wrote:
> Before deferring re.compile, can we make it faster?
> 
> I profiled `import string`, and a small optimization can make it 2x faster!
> (but it's not backward compatible)

Please open an issue for this.

> I found:
> 
> * RegexFlag.__and__ and __new__ are called very often.
> * _optimize_charset is slow because of re.UNICODE | re.IGNORECASE
> 
> diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py
> index 144620c6d1..7c662247d4 100644
> --- a/Lib/sre_compile.py
> +++ b/Lib/sre_compile.py
> @@ -582,7 +582,7 @@ def isstring(obj):
> 
>   def _code(p, flags):
> 
> -    flags = p.pattern.flags | flags
> +    flags = int(p.pattern.flags) | int(flags)
>       code = []
> 
>       # compile info block

Maybe cast flags to int earlier, in sre_compile.compile()?
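
A minimal sketch of why the cast helps, assuming the cost comes from the enum operators: combining two RegexFlag members goes through the enum machinery and produces another RegexFlag, while combining plain ints stays in C. The helper names combine_as_enum and combine_as_int are hypothetical and are not part of sre_compile; the timings depend on the Python version and are only printed, not asserted.

```python
import re
import timeit


def combine_as_enum(pattern_flags, flags):
    # Hypothetical helper: RegexFlag | RegexFlag goes through the enum
    # operators and yields another RegexFlag member.
    return pattern_flags | flags


def combine_as_int(pattern_flags, flags):
    # Hypothetical helper: cast once, so this and any later bit tests
    # work on plain ints.
    return int(pattern_flags) | int(flags)


enum_result = combine_as_enum(re.IGNORECASE, re.UNICODE)
int_result = combine_as_int(re.IGNORECASE, re.UNICODE)

# Rough comparison only; absolute numbers vary by machine and version.
enum_time = timeit.timeit(
    lambda: combine_as_enum(re.IGNORECASE, re.UNICODE), number=10000)
int_time = timeit.timeit(
    lambda: combine_as_int(re.IGNORECASE, re.UNICODE), number=10000)
print("enum |: %.4fs   int |: %.4fs" % (enum_time, int_time))
```

Casting once at the entry point would also mean every later `flags & SOME_FLAG` test inside the compiler runs on plain ints.
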

> diff --git a/Lib/string.py b/Lib/string.py
> index b46e60c38f..fedd92246d 100644
> --- a/Lib/string.py
> +++ b/Lib/string.py
> @@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
>       delimiter = '$'
>       idpattern = r'[_a-z][_a-z0-9]*'
>       braceidpattern = None
> -    flags = _re.IGNORECASE
> +    flags = _re.IGNORECASE | _re.ASCII
> 
>       def __init__(self, template):
>           self.template = template
> 
> patched:
> import time:      1191 |       8479 | string
> 
> Of course, this patch is not backward compatible. [a-z] doesn't match
> 'ı' or 'ſ' anymore.
> But who cares?

This looks like a bug fix. I wonder whether it is worth backporting
to 3.6. But the change itself can break user code that changes
idpattern without touching flags. There is another way, but it should be
discussed on the bug tracker.


