[Python-ideas] re.compile_lazy - on first use compiled regexes

M.-A. Lemburg mal at egenix.com
Sat Mar 23 22:19:02 CET 2013


On 23.03.2013 20:41, Antoine Pitrou wrote:
> On Sat, 23 Mar 2013 20:41:49 +0100
> "M.-A. Lemburg" <mal at egenix.com> wrote:
>> On 23.03.2013 14:53, Ezio Melotti wrote:
>>> On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>>> On 23.03.2013 13:43, Ronny Pfannschmidt wrote:
>>>>>
>>>>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
>>>>>>
>>>>>> The precompiled REs could then be loaded directly from the
>>>>>> pickle, avoiding the compiling overhead on startup.
>>>>>>
>>>>>
>>>>> As far as I can tell, that would need regexes as part of the syntax to make sense for use in modules.
>>>>> I don't think such a change would be accepted, and I don't even want to deal with the potential
>>>>> bikeshedding over such an integration.
>>>>
>>>> I wasn't thinking of making it part of the Python byte-code.
>>>>
>>>> It would suffice to add pickle/marshal support for the
>>>> compiled RE code. This could then be loaded from a string
>>>> embedded in the module code on startup.
>>>>
>>>> E.g.
>>>> # rx = re.compile('.*')
>>>> rx = pickle.loads('asdfsadfasdf')
>>>>
>>>
>>> According to http://bugs.python.org/issue11454#msg170697, this would
>>> be twice as slow.
>>
>> RE objects can already be pickled and that's also what was
>> measured in that message. It doesn't actually pickle
>> the RE "byte" code, though. Instead it just pickles the
>> pattern and the flags and does a complete recompile when
>> unpickling the RE object.
>>
>> I was talking about actually pickling the RE "byte" code
>> that the re module generates to avoid the overhead of
>> having to recompile the pattern.
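To see what an RE pickle currently holds, the snippet below walks the opcodes of a pickled pattern with the standard `pickletools.genops`, which inspects a pickle without loading it. It is a sketch that assumes CPython 3's `re` module, whose pickling support (registered through `copyreg`) reduces a pattern to a compile call plus its pattern string and flags; the helper name `pickled_pattern_args` is made up for illustration.

```python
import pickle
import pickletools
import re


def pickled_pattern_args(compiled):
    """Return the non-None opcode arguments stored in the pickle of a compiled RE.

    pickled_pattern_args is a hypothetical helper written for this illustration.
    """
    data = pickle.dumps(compiled)
    # genops yields (opcode, argument, position) triples without unpickling anything
    return [arg for _op, arg, _pos in pickletools.genops(data) if arg is not None]


rx = re.compile(r"[a-z]+\d")
args = pickled_pattern_args(rx)

# Only the source pattern and the flags travel in the pickle ...
print(rx.pattern in args, rx.flags in args)

# ... so unpickling has to run the RE compiler again to rebuild the object.
restored = pickle.loads(pickle.dumps(rx))
print(restored.pattern == rx.pattern, restored.flags == rx.flags)
```

Nothing resembling the compiled matching code shows up in `args`, which is why loading such a pickle costs at least as much as calling `re.compile()` directly.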
> 
> The problem is that for pickles to be durable, you would then need some
> kind of compatibility guarantee for the re bytecode.
> 
> Otherwise you might add the bytecode version number to the pickle, and
> then ignore the bytecode when loading the pickle if the current
> version number is different; but that would mean people would lose the
> benefit of caching without being warned, which would make performance
> more fickle.

Hmm, I'm not following you. The patterns would get compiled once
at Python build time when installing the stdlib. The bytecode
version wouldn't change for those compiled patterns - unless, of
course, you upgrade to a new Python version, but then you'd
rebuild the bytecode versions of the REs :-)

To make them generally useful, I agree, you would have to add an
RE compiler version to the bytecode pickle, but AFAICS this
should not affect its usefulness for the stdlib RE cache.
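The version-tag idea, together with the concern about silently losing the cache, can be sketched as follows: each cached entry carries a version tag, and a mismatch falls back to compiling from the pattern source while emitting a warning, so the lost speedup is visible. The stdlib offers no public way to serialize the RE compiler's internal code, so this sketch stores an ordinary pickled pattern as the payload and uses `sys.implementation.cache_tag` as a stand-in for a hypothetical RE compiler version; `dump_cached`, `load_cached` and `RE_CACHE_VERSION` are invented names, not part of any library.

```python
import pickle
import re
import sys
import warnings

# Hypothetical stand-in for an "RE compiler version": the re module exposes no
# public bytecode version, so the interpreter's cache tag (e.g. "cpython-312")
# is used here instead.
RE_CACHE_VERSION = sys.implementation.cache_tag


def dump_cached(pattern, flags=0):
    """Serialize a pattern together with the version tag it was built under."""
    compiled = re.compile(pattern, flags)
    # A real implementation would store the RE compiler's internal code here;
    # pickling the pattern object only records its pattern string and flags.
    payload = pickle.dumps(compiled)
    return pickle.dumps((RE_CACHE_VERSION, pattern, flags, payload))


def load_cached(blob):
    """Return (compiled_pattern, cache_hit) for a blob made by dump_cached()."""
    version, pattern, flags, payload = pickle.loads(blob)
    if version != RE_CACHE_VERSION:
        # Stale entry: warn rather than silently losing the speedup, and
        # rebuild from the pattern source without touching the old payload.
        warnings.warn(
            f"RE cache built for {version!r}, running {RE_CACHE_VERSION!r}; "
            "recompiling",
            RuntimeWarning,
            stacklevel=2,
        )
        return re.compile(pattern, flags), False
    return pickle.loads(payload), True
```

On a mismatch the old payload is never unpickled, which is what would keep code produced by a different RE compiler from being misread.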

The whole idea is really very similar to the Python VM bytecode
caching that Python uses to speed up module imports.
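For comparison, the import-time cache stores a magic number at the start of each `.pyc` file, and the import system recompiles from source when that number does not match the running interpreter. A short sketch using the standard `py_compile`, `importlib.util` and `tempfile` modules (the function name `pyc_header_matches` and the throwaway file `example.py` are arbitrary choices for this illustration):

```python
import importlib.util
import os
import py_compile
import tempfile


def pyc_header_matches(source_text):
    """Compile source_text to a .pyc file and compare its magic number
    with the one expected by the running interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w", encoding="utf-8") as f:
            f.write(source_text)
        # Without an explicit cfile, py_compile writes to the __pycache__
        # location computed by importlib.util.cache_from_source()
        pyc_path = py_compile.compile(src, doraise=True)
        with open(pyc_path, "rb") as f:
            magic = f.read(4)
    # The import system falls back to compiling the source when these differ
    return magic == importlib.util.MAGIC_NUMBER
```

A cached RE format would need an equivalent header so that an interpreter upgrade invalidates old entries instead of misreading them.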

Perhaps we could have a GSoC student give it a try and see
whether it results in noticeable startup time speedups?!
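Such an experiment would need a baseline. The following sketch times `re.compile()` against `pickle.loads()` of an already pickled pattern, calling `re.purge()` before each call so the `re` module's internal cache does not hide the compile cost. Because current pickles only store the pattern and flags, the unpickling path still runs the compiler, so a bytecode-pickling prototype would have to beat `time_compile`. The function names and the sample pattern are illustrative, and no particular timings are implied; they depend on the machine and interpreter.

```python
import pickle
import re
import time


def time_compile(pattern, repeats=1000):
    """Average seconds per re.compile() call with the re cache cleared."""
    total = 0.0
    for _ in range(repeats):
        re.purge()  # drop re's internal cache so the pattern is really compiled
        start = time.perf_counter()
        re.compile(pattern)
        total += time.perf_counter() - start
    return total / repeats


def time_unpickle(pattern, repeats=1000):
    """Average seconds per pickle.loads() of a pickled pattern, cache cleared."""
    blob = pickle.dumps(re.compile(pattern))
    total = 0.0
    for _ in range(repeats):
        re.purge()
        start = time.perf_counter()
        pickle.loads(blob)  # today this calls back into the RE compiler
        total += time.perf_counter() - start
    return total / repeats


if __name__ == "__main__":
    sample = r"(\w+)@(\w+)\.com"  # arbitrary sample pattern
    print(f"re.compile:   {time_compile(sample):.2e} s")
    print(f"pickle.loads: {time_unpickle(sample):.2e} s")
```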

-- 
Marc-Andre Lemburg
eGenix.com

