[Python-ideas] re.compile_lazy - on first use compiled regexes

Antoine Pitrou solipsis at pitrou.net
Sat Mar 23 20:41:59 CET 2013


On Sat, 23 Mar 2013 20:41:49 +0100
"M.-A. Lemburg" <mal at egenix.com> wrote:
> On 23.03.2013 14:53, Ezio Melotti wrote:
> > On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg <mal at egenix.com> wrote:
> >> On 23.03.2013 13:43, Ronny Pfannschmidt wrote:
> >>>
> >>>> Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
> >>>>
> >>>> The precompiled REs could then be loaded directly from the
> >>>> pickle, avoiding the compiling overhead on startup.
> >>>>
> >>>
> >>> as far as i can tell that would need regex as part of the syntax to make sense for use in modules
> >>> i don't think such a change would be accepted and i don't even want to deal with the potential
> >>> bikeshedding for such an integration
> >>
> >> I wasn't thinking of making it part of the Python byte-code.
> >>
> >> It would suffice to add pickle/marshal support for the
> >> compiled RE code. This could then be loaded from a string
> >> embedded in the module code on startup.
> >>
> >> E.g.
> >> # rx = re.compile('.*')
> >> rx = pickle.loads('asdfsadfasdf')
> >>
> > 
> > According to http://bugs.python.org/issue11454#msg170697, this would
> > be twice as slow.
> 
> RE objects can already be pickled and that's also what was
> measured in that message. It doesn't actually pickle
> the RE "byte" code, though. Instead it just pickles the
> pattern and the flags and does a complete recompile when
> unpickling the RE object.
> 
> I was talking about actually pickling the RE "byte" code
> that the re module generates to avoid the overhead of
> having to recompile the pattern.
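
To illustrate the behaviour described in the quoted text, here is a minimal
sketch (not part of the original message) that inspects what pickling a
compiled pattern stores. It assumes a current CPython 3, where the re module
registers its reducer through copyreg; the example pattern is arbitrary.

```python
import copyreg
import pickle
import re

rx = re.compile(r"[a-z]+\d", re.IGNORECASE)
payload = pickle.dumps(rx)

# The pickle carries the pattern source text, not compiled RE code.
has_source = b"[a-z]+\\d" in payload

# The reducer registered by the re module (a CPython implementation detail)
# returns a callable plus (pattern, flags); unpickling calls it and recompiles.
_rebuild, args = copyreg.dispatch_table[type(rx)](rx)

restored = pickle.loads(payload)
print(has_source, args == (rx.pattern, rx.flags), restored.pattern == rx.pattern)
```

Because only the pattern and flags travel in the pickle, loading it costs at
least as much as calling re.compile() directly.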

The problem is that for pickles to be durable, you would then need some
kind of compatibility guarantee for the re bytecode.

Alternatively, you might add the bytecode version number to the pickle,
and then ignore the bytecode when loading the pickle if the current
version number is different; but that would mean people would lose the
benefit of caching without being warned, which would make performance
more fickle.
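
A rough sketch of that version-check fallback, under stated assumptions:
BYTECODE_VERSION, dump_cached and load_cached are hypothetical names, and the
"code" field is a placeholder, since the re module exposes no public way to
serialize or load its compiled code. The optional warning is one possible way
to make the silent fallback visible.

```python
import pickle
import re
import warnings

BYTECODE_VERSION = 1  # hypothetical version number of the RE bytecode format


def dump_cached(pattern, flags=0, version=BYTECODE_VERSION):
    # "code" stands in for serialized RE bytecode (not available via public API).
    entry = {"version": version, "pattern": pattern, "flags": flags, "code": None}
    return pickle.dumps(entry)


def load_cached(data, current_version=BYTECODE_VERSION, warn=False):
    entry = pickle.loads(data)
    cache_hit = entry["version"] == current_version
    if not cache_hit and warn:
        warnings.warn("cached RE bytecode is stale; recompiling", RuntimeWarning)
    # A real implementation would reuse entry["code"] on a hit. This sketch
    # always recompiles from the source pattern and only reports the outcome.
    return re.compile(entry["pattern"], entry["flags"]), cache_hit


data = dump_cached(r"\d+", re.ASCII, version=1)
rx_same, hit_same = load_cached(data, current_version=1)
rx_new, hit_new = load_cached(data, current_version=2)
print(hit_same, hit_new)
```

With warn=False, a version mismatch still yields a working pattern, but the
caller has no indication that the cached code was discarded.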

Regards

Antoine.




