[Python-Dev] Make re.compile faster

INADA Naoki songofacandy at gmail.com
Mon Oct 2 23:29:40 EDT 2017


Before deferring re.compile, can we make it faster?

I profiled `import string`, and a small optimization can make it about 2x faster!
(But it is not backward compatible.)

Before the optimization:

import time: self [us] | cumulative | imported package
import time:      2339 |       9623 | string

The string module itself (the "self" column) took about 2.3 ms to import.
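
The timing lines above are in the format printed by CPython's `-X importtime` option (available since Python 3.7). As a rough sketch of how to reproduce such a measurement, assuming that option was the one used here, the helper names `import_time_lines` and `find_module_line` below are made up for illustration:

```python
import subprocess
import sys


def import_time_lines(module):
    """Import `module` in a fresh interpreter and return the -X importtime lines."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", f"import {module}"],
        stderr=subprocess.PIPE,      # the timing report is written to stderr
        universal_newlines=True,
        check=True,
    )
    return [line for line in proc.stderr.splitlines()
            if line.startswith("import time:")]


def find_module_line(lines, module):
    """Return the report line whose last column names `module`, or None."""
    for line in lines:
        if line.split("|")[-1].strip() == module:
            return line
    return None


lines = import_time_lines("string")
print(lines[0])                           # header row of the report
print(find_module_line(lines, "string"))  # self / cumulative time for string
```

The absolute numbers depend on the machine and on whether bytecode caches are warm, so only relative comparisons between runs are meaningful.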

I found:

* RegexFlag.__and__ and __new__ are called very often.
* _optimize_charset is slow when the flags are re.UNICODE | re.IGNORECASE.
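
The first point can be illustrated with a small sketch (not the profile I actually ran): every `&` or `|` on a `re.RegexFlag` value goes through the enum machinery and builds another `RegexFlag`, while the same operation on plain ints does not. The variable names and the iteration count are arbitrary choices for this example.

```python
import re
import timeit

flags = re.IGNORECASE | re.UNICODE      # a re.RegexFlag (an enum.IntFlag)

# Testing a bit on a RegexFlag returns another RegexFlag instance.
masked = flags & re.IGNORECASE
print(type(masked).__name__)

# Converting once to int keeps later bit tests on the plain-int fast path.
plain = int(flags)
ignorecase = int(re.IGNORECASE)
plain_masked = plain & ignorecase
print(type(plain_masked).__name__)

# Rough comparison; absolute timings vary by machine and Python version.
enum_time = timeit.timeit(lambda: flags & re.IGNORECASE, number=100_000)
int_time = timeit.timeit(lambda: plain & ignorecase, number=100_000)
print(f"RegexFlag &: {enum_time:.4f}s   int &: {int_time:.4f}s")
```

This is why converting the flags to `int` once at the top of `_code` removes many enum calls from the compile path.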

diff --git a/Lib/sre_compile.py b/Lib/sre_compile.py
index 144620c6d1..7c662247d4 100644
--- a/Lib/sre_compile.py
+++ b/Lib/sre_compile.py
@@ -582,7 +582,7 @@ def isstring(obj):

 def _code(p, flags):

-    flags = p.pattern.flags | flags
+    flags = int(p.pattern.flags) | int(flags)
     code = []

     # compile info block
diff --git a/Lib/string.py b/Lib/string.py
index b46e60c38f..fedd92246d 100644
--- a/Lib/string.py
+++ b/Lib/string.py
@@ -81,7 +81,7 @@ class Template(metaclass=_TemplateMetaclass):
     delimiter = '$'
     idpattern = r'[_a-z][_a-z0-9]*'
     braceidpattern = None
-    flags = _re.IGNORECASE
+    flags = _re.IGNORECASE | _re.ASCII

     def __init__(self, template):
         self.template = template

After the patch:
import time:      1191 |       8479 | string

Of course, this patch is not backward compatible. With re.ASCII, [a-z] no
longer matches 'ı' or 'ſ'.
But who cares?

(in sre_compile.py)
    # LATIN SMALL LETTER I, LATIN SMALL LETTER DOTLESS I
    (0x69, 0x131), # iı
    # LATIN SMALL LETTER S, LATIN SMALL LETTER LONG S
    (0x73, 0x17f), # sſ
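
A minimal sketch of the behavior change, using `re` directly rather than `string.Template`. It compares `[a-z]` under `re.IGNORECASE` alone with the same pattern under `re.IGNORECASE | re.ASCII`; the names `unicode_ci` and `ascii_ci` are just for this example.

```python
import re

# Case-insensitive matching with Unicode rules (the default for str patterns).
unicode_ci = re.compile(r"[a-z]", re.IGNORECASE)
# Case-insensitive matching restricted to ASCII, as in the patched Template.flags.
ascii_ci = re.compile(r"[a-z]", re.IGNORECASE | re.ASCII)

# U+0131 is LATIN SMALL LETTER DOTLESS I, U+017F is LATIN SMALL LETTER LONG S.
for ch in ("i", "I", "\u0131", "s", "S", "\u017f"):
    print(
        f"U+{ord(ch):04X}",
        "unicode:", bool(unicode_ci.fullmatch(ch)),
        "ascii:", bool(ascii_ci.fullmatch(ch)),
    )
```

Only the two non-ASCII characters differ between the two patterns; the ASCII letters match in both.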

There are some other uses of `re.I(GNORECASE)` in the stdlib. I'll check them.

Further optimization is possible by implementing sre_parse and sre_compile
in C, but I have no time for that this year.

Regards,
-- 
Inada Naoki <songofacandy at gmail.com>