[Python-ideas] re.compile_lazy - on first use compiled regexes

M.-A. Lemburg mal at egenix.com
Sat Mar 23 20:41:49 CET 2013


On 23.03.2013 14:53, Ezio Melotti wrote:
> On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> On 23.03.2013 13:43, Ronny Pfannschmidt wrote:
>>>
>>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
>>>>
>>>> The precompiled REs could then be loaded directly from the
>>>> pickle, avoiding the compiling overhead on startup.
>>>>
>>>
>>> As far as I can tell, that would need regex as part of the syntax to make sense for use in modules.
>>> I don't think such a change would be accepted, and I don't even want to deal with the potential
>>> bikeshedding for such an integration.
>>
>> I wasn't thinking of making it part of the Python byte-code.
>>
>> It would suffice to add pickle/marshal support for the
>> compiled RE code. This could then be loaded from a string
>> embedded in the module code on startup.
>>
>> E.g.
>> # rx = re.compile('.*')
>> rx = pickle.loads('asdfsadfasdf')
>>
> 
> According to http://bugs.python.org/issue11454#msg170697, this would
> be twice as slow.

RE objects can already be pickled and that's also what was
measured in that message. It doesn't actually pickle
the RE "byte" code, though. Instead it just pickles the
pattern and the flags and does a complete recompile when
unpickling the RE object.
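
A minimal sketch of the current behaviour, assuming only the standard
pickle support that the re module provides for compiled patterns. It
checks that the pattern text itself is carried in the pickle and that
the restored object has the same pattern and flags; it does not measure
timing:

```python
import pickle
import re

rx = re.compile(r"a+b", re.IGNORECASE)
payload = pickle.dumps(rx)

# The pattern source text travels inside the pickle ...
print(b"a+b" in payload)

# ... and unpickling hands pattern and flags back to the re module,
# which rebuilds the object from them.
restored = pickle.loads(payload)
print(restored.pattern == rx.pattern, restored.flags == rx.flags)
print(restored.match("AAAB") is not None)
```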

I was talking about actually pickling the RE "byte" code
that the re module generates to avoid the overhead of
having to recompile the pattern.

I'm pretty sure this would be faster :-)

>> It would also be possible to seed the re module cache with
>> such pickle.loads, perhaps compiled at Python build time.
>> This would avoid having to change code in the stdlib to
>> load pickles.

On 23.03.2013 14:36, Oleg Broytman wrote:
> On Sat, Mar 23, 2013 at 01:38:19PM +0100, "M.-A. Lemburg" <mal at egenix.com> wrote:
>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
>>
>> The precompiled REs could then be loaded directly from the
>> pickle, avoiding the compiling overhead on startup.
>
>    But with an overhead of opening files and unpickling. My wild guess
> [not backed by numbers] is that it would be as slow.
>
>    I suspect the only way to speedup things would be to precompile
> regexps at compile (generating .pyc) time and saving compiled regexps in
> the byte code file.

The patterns used in the stdlib could be precompiled at
Python build time and then stored away in a separate
module, say _re_cache_preloader.py.

This module would then be used to seed the re module
cache. Since the cache works by looking at the patterns,
no recompile would happen when calling re.compile() on
these patterns.

The approach is similar to the way the sysconfig module
information is cached in a separate module to reduce
startup time (something we also did in eGenix PyRun to
improve startup time - and because we had to do it anyway,
since there is no Makefile available to parse when
running pyrun).

-- 
Marc-Andre Lemburg
eGenix.com
