Speeding up regular expression compilation

In the python-dev archives I find remarks about the old pre module being much faster at compiling regular expressions than the new sre module. In my experience, pre is about twenty times as fast. Since my application uses a lot of simple patterns that are matched against short strings (file names, actually), compiling the patterns takes half the CPU cycles of my program. The faster execution of sre apparently doesn't compensate for the slower compile time.

Is the plan to implement the sre module in C getting closer to being done? Is there a trick to make compiling patterns go faster?

I'm already falling back to the pre module with Python 2.2 and older. With Python 2.3 this generates a warning message, so I don't do it there. I considered copying the 2.2 version of pre.py into my application, but this will stop working as soon as support for pre is dropped (the compiled C code won't be there), so it would only be a temporary fix. I don't care about the Unicode support.

-- 
LAUNCELOT: Isn't there a St. Aaaaarrrrrrggghhh's in Cornwall?
ARTHUR:    No, that's Saint Ives.
                 "Monty Python and the Holy Grail" PYTHON (MONTY) PICTURES LTD

 /// Bram Moolenaar -- Bram@Moolenaar.net -- http://www.Moolenaar.net   \\\
///        Creator of Vim - Vi IMproved -- http://www.Vim.org          \\\
\\\              Project leader for A-A-P -- http://www.A-A-P.org        ///
 \\\  Help AIDS victims, buy here: http://ICCF-Holland.org/click1.html  ///
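To see where the time goes in a workload like the one described (many small patterns, short file names), one can time compilation separately from matching. The following is a minimal sketch in modern Python 3, using the `re` module rather than `sre`/`pre`; the patterns and file names are hypothetical stand-ins, and absolute numbers will depend on the interpreter version and machine.

```python
import re
import timeit

# Hypothetical file-name patterns and names standing in for the application's own.
PATTERNS = [r".*\.c$", r".*\.h$", r".*\.py$", r"Makefile.*", r".*~$"]
NAMES = ["main.c", "defs.h", "setup.py", "Makefile.in", "notes.txt~"]


def compile_all():
    # Empty re's internal cache so every call below really compiles.
    re.purge()
    return [re.compile(p) for p in PATTERNS]


def match_all(compiled):
    # One boolean per (pattern, name) pair.
    return [bool(c.match(n)) for c in compiled for n in NAMES]


compiled = compile_all()
compile_time = timeit.timeit(compile_all, number=200)
match_time = timeit.timeit(lambda: match_all(compiled), number=200)
print(f"compile: {compile_time:.4f}s  match: {match_time:.4f}s")
```

If the compile figure dominates, the cost is in building the patterns rather than in running them, which is the situation the question describes.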

(better on python-list@python.org than here, btw)

Bram> Is there a trick to make compiling patterns go faster?

Not really. Note, though, that the sre module caches compiled regular expressions. How many it caches depends on the value of sre._MAXCACHE (the default is 100). If you have many more regular expressions than that, you'll spend a lot of time compiling them, and you might find it helpful to boost that number.

If you're adventurous, you might investigate recasting the sre_compile._compile function as C code. If you use an Intel CPU, another alternative might be to use psyco.

Skip
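Raising `sre._MAXCACHE` depends on a private module attribute; in current Python 3 the corresponding private setting lives in `re` and its default and behavior differ between releases. A sketch that avoids relying on it keeps an unbounded application-level cache with `functools.lru_cache`. The helper names `cached_compile` and `matches` are hypothetical. Like `_MAXCACHE`, this only saves repeated compiles of the same pattern; it does not make the first compile of each pattern any cheaper.

```python
import functools
import re


@functools.lru_cache(maxsize=None)
def cached_compile(pattern, flags=0):
    # Unbounded cache: each distinct (pattern, flags) pair is compiled once
    # per process, independent of the size of re's own internal cache.
    return re.compile(pattern, flags)


def matches(pattern, name):
    # Hypothetical convenience wrapper for matching a file name.
    return cached_compile(pattern).match(name) is not None
```

For example, `matches(r".*\.c$", "main.c")` compiles the pattern on the first call and reuses the cached object on later calls.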

Barry Warsaw wrote:
I'm already caching all the compiled patterns. It's the first-time compile that is consuming time; there are a lot of patterns. But half a second to compile them is too much when the whole program must not run longer than a second.

BTW, I've changed the code to use pre.py on Python 2.3 (with the warning removed) as a temporary solution. The problem will be back with 2.4...

The reason I sent this to the development list is that I thought this could be solved on the library side. Changing the Python code sounds like working around the real problem.
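When the cost is the first-time compile of many patterns, one workaround not raised in this thread is to defer each compile until the pattern is actually used, so a run that needs only a few patterns pays only for those. This is a minimal sketch under the assumption that not every pattern is used in every run; the `LazyPattern` class is hypothetical and wraps only `match`.

```python
import re


class LazyPattern:
    """Hypothetical wrapper that compiles its pattern on first use."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # nothing compiled at construction time

    def _get(self):
        # Compile once, on the first call that needs the pattern.
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def match(self, string):
        return self._get().match(string)

    @property
    def is_compiled(self):
        return self._compiled is not None
```

If every pattern is needed on every run, this only moves the cost around; it helps only when many patterns go unused.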

If you're adventurous, you might investigate recasting the sre_compile._compile function as C code.
Or Pyrex code. :-)

Greg Ewing, Computer Science Dept, +--------------------------------------+
University of Canterbury,          | A citizen of NewZealandCorp, a       |
Christchurch, New Zealand          | wholly-owned subsidiary of USA Inc.  |
greg@cosc.canterbury.ac.nz         +--------------------------------------+

Bram Moolenaar <Bram@moolenaar.net> writes:
Is there a trick to make compiling patterns go faster?
If you compile the same regular expressions at every program startup and want to reduce the time for that, you can cPickle the compiled expressions and restore them from the string. If that fails (because the format of compiled expressions has changed), you should fall back to compiling the expressions, and optionally save the new version.

Regards,
Martin
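The load, fall back, and save structure described above can be sketched as follows in Python 3. The function name `load_patterns` and the cache-key scheme (interpreter version plus pattern list) are assumptions for illustration. One caveat: in current CPython, pickling a compiled `re.Pattern` stores only its pattern string and flags, and unpickling recompiles it, so with modern `re` this sketch shows the fallback logic but does not by itself skip the compile step.

```python
import os
import pickle
import re
import sys


def load_patterns(patterns, cache_path):
    """Hypothetical helper: load compiled patterns from cache_path, or
    compile them and save a fresh cache.

    The cache is tagged with the interpreter version and the pattern list,
    so a stale or unreadable file simply triggers a recompile.
    """
    key = (sys.version, tuple(patterns))
    try:
        with open(cache_path, "rb") as f:
            saved_key, compiled = pickle.load(f)
        if saved_key == key:
            return compiled
    except Exception:
        pass  # missing, corrupt, or incompatible cache: fall back to compiling
    compiled = [re.compile(p) for p in patterns]
    try:
        with open(cache_path, "wb") as f:
            pickle.dump((key, compiled), f)
    except OSError:
        pass  # saving the new version is optional
    return compiled
```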

participants (5)
- Barry Warsaw
- Bram Moolenaar
- Greg Ewing
- martin@v.loewis.de
- Skip Montanaro