[Python-ideas] re.compile_lazy - on first use compiled regexes

Joao S. O. Bueno jsbueno at python.org.br
Sat Mar 23 15:14:54 CET 2013

On 23 March 2013 11:09, Joao S. O. Bueno <jsbueno at python.org.br> wrote:
> On 23 March 2013 10:15, Chris Angelico <rosuav at gmail.com> wrote:
>> On Sat, Mar 23, 2013 at 11:52 PM, M.-A. Lemburg <mal at egenix.com> wrote:
>>> It would suffice to add pickle/marshal support for the
>>> compiled RE code. This could then be loaded from a string
>>> embedded in the module code on startup.
>>> E.g.
>>> # rx = re.compile('.*')
>>> rx = pickle.loads('asdfsadfasdf')
>> What would that do to versioning? Currently, as I understand it, the
>> compiled RE is a complete implementation detail; at any time, the re
>> module can change how it stores it. Pickles (again, as I understand it
>> - I may be wrong) should be readable on other versions of Python
>> (forward-compatibly, at least), on other architectures, etc, etc;
>> would this be a problem?
>> Alternatively, at the expense of some storage space, there could be
>> some kind of fallback. If the tag doesn't perfectly match the creating
>> Python's tag, it ignores the dumped version and just compiles it as
>> normal.
> Please note that compiled regexps can already be pickled
> straightforwardly.
> Unfortunately, judging from a look at the pickled string, unpickling
> just calls "re.compile" with the original regex (which avoids the version
> issues you mention) - so there would be no gain from the implementation as is.
> (I should stop being that lazy and check what unpickling a
> regexp actually does.
> Ah, Ezio found it while I was at it.)

There it is, straight in re.py:

import copyreg

def _pickle(p):
    return _compile, (p.pattern, p.flags)

copyreg.pickle(_pattern_type, _pickle, _compile)

So pickling regexps, as implemented now, is definitely no speed-up:
unpickling simply recompiles the pattern from its source string.
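
For anyone who wants to check this locally, here is a small sketch. It
assumes a recent CPython 3 in which re still registers its reducer through
copyreg, as in the snippet above; the sample pattern is made up for
illustration.

```python
import copyreg
import pickle
import re

# Hypothetical sample pattern, chosen only for illustration.
rx = re.compile(r"\d+-\w*", re.IGNORECASE)

# Look up the reducer that re registered for compiled pattern objects.
reducer = copyreg.dispatch_table[type(rx)]
func, args = reducer(rx)

# Only the source pattern and the flags are stored, not the compiled code.
print(func.__name__, args)

# Unpickling therefore rebuilds the pattern from its source text.
restored = pickle.loads(pickle.dumps(rx))
print(restored.pattern == rx.pattern, restored.flags == rx.flags)
```

The reducer hands pickle nothing but the pattern text and the flags, so
loading the pickle goes through the normal compile path again.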
