[Python-ideas] re.compile_lazy - on first use compiled regexes
Joao S. O. Bueno
jsbueno at python.org.br
Sat Mar 23 15:09:30 CET 2013
On 23 March 2013 10:15, Chris Angelico <rosuav at gmail.com> wrote:
> On Sat, Mar 23, 2013 at 11:52 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>> It would suffice to add pickle/marshal support for the
>> compiled RE code. This could then be loaded from a string
>> embedded in the module code on startup.
>> # rx = re.compile('.*')
>> rx = pickle.loads('asdfsadfasdf')
> What would that do to versioning? Currently, as I understand it, the
> compiled RE is a complete implementation detail; at any time, the re
> module can change how it stores it. Pickles (again, as I understand it
> - I may be wrong) should be readable on other versions of Python
> (forward-compatibly, at least), on other architectures, etc, etc;
> would this be a problem?
> Alternatively, at the expense of some storage space, there could be
> some kind of fallback. If the tag doesn't perfectly match the creating
> Python's tag, it ignores the dumped version and just compiles it as
Please note that compiled regexps can already be pickled.
Unfortunately, to avoid the versioning issues you mention, and judging from
a look at the pickled string, it seems unpickling just calls "re.compile" with
the original regex, so there would be no gain from the implementation as it stands.
(I should stop being that lazy and check what unpickling a
regexp actually does...
Ah, Ezio found it while I was at it.)
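A quick way to see this on a current CPython (an illustrative check, not part of the original thread; the pattern below is arbitrary). The pickle appears to hold only a reference to the re module's private `_compile` helper plus the pattern text and flags, so `pickle.loads` recompiles from source. `_compile` is an internal CPython detail and could change between versions.

```python
import pickle
import re

# Arbitrary pattern chosen for illustration.
rx = re.compile("([0-9]+)-([0-9]+)", re.IGNORECASE)

data = pickle.dumps(rx)

# The pickle references re's internal compile helper and stores the original
# pattern text (and flags); no compiled sre code is serialized.
print(b"_compile" in data)          # True on current CPython
print(rx.pattern.encode() in data)  # True

# Unpickling therefore just compiles the pattern again from its source text.
rx2 = pickle.loads(data)
print(rx2.pattern == rx.pattern, rx2.flags == rx.flags)  # True True
print(rx2.match("10-20").groups())  # ('10', '20')
```

So a pickled regex saves no compile time at load: the cost is paid again on every `pickle.loads`.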
> Hmm. Here's a mad thought - a bit of latticed casementing, if you
> like. Could the compiled regexes be stored in the .pyc file? That
> already has version tagging done. All it'd take is some sort of
> extension mechanism that says "hey, here's some additional data that
> the pyc might want to make use of". Or would that overly complicate
I can't see how this could be achieved short of adding special
syntax that would compile regexps at parse time. Then we might as well use
Perl instead :-)
But maybe some custom serialization could go straight into the
sre_code, so that it would properly serialize its objects as Python bytecode,
and then some helper functions could load them from a custom-made pyc file.
These pre-generated pycs would be built at Python build time.