On 23.03.2013 20:41, Antoine Pitrou wrote:
On Sat, 23 Mar 2013 20:41:49 +0100 "M.-A. Lemburg" email@example.com wrote:
On 23.03.2013 14:53, Ezio Melotti wrote:
On Sat, Mar 23, 2013 at 2:52 PM, M.-A. Lemburg firstname.lastname@example.org wrote:
On 23.03.2013 13:43, Ronny Pfannschmidt wrote:
Wouldn't it make sense to add a way to pickle or marshal compiled REs ?
The precompiled REs could then be loaded directly from the pickle, avoiding the compiling overhead on startup.
As far as I can tell, that would need regex as part of the syntax to make sense for use in modules. I don't think such a change would be accepted, and I don't even want to deal with the potential bikeshedding for such an integration.
I wasn't thinking of making it part of the Python byte-code.
It would suffice to add pickle/marshal support for the compiled RE code. This could then be loaded from a string embedded in the module code on startup.
E.g.
    # rx = re.compile('.*')
    rx = pickle.loads('asdfsadfasdf')
According to http://bugs.python.org/issue11454#msg170697, this would be twice as slow.
RE objects can already be pickled and that's also what was measured in that message. It doesn't actually pickle the RE "byte" code, though. Instead it just pickles the pattern and the flags and does a complete recompile when unpickling the RE object.
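To make the point above concrete, here is a small sketch of what the current pickle support stores. It assumes CPython's re module, which registers its reducer through copyreg. The pattern `ab+c` is only an illustration.

```python
import copyreg
import pickle
import re

rx = re.compile(r"ab+c", re.IGNORECASE)
payload = pickle.dumps(rx)

# The pattern source travels in the pickle as plain text.
pattern_in_payload = rx.pattern.encode("utf-8") in payload

# The reducer registered by the re module returns a callable plus
# the (pattern, flags) arguments, not any compiled code.
reducer = copyreg.dispatch_table[type(rx)]
rebuild, args = reducer(rx)

# Unpickling calls rebuild(*args), i.e. it goes through the compiler again
# (or through the re module's internal cache of recent patterns).
restored = pickle.loads(payload)
print(pattern_in_payload, args, restored.pattern)
```

Since only the pattern and flags are saved, loading such a pickle cannot be cheaper than calling re.compile() directly.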
I was talking about actually pickling the RE "byte" code that the re module generates to avoid the overhead of having to recompile the pattern.
The problem is that for pickles to be durable, you would then need some kind of compatibility guarantee for the re bytecode.
Otherwise you might add the bytecode version number to the pickle, and then ignore the bytecode when loading the pickle if the current version number is different; but that would mean people would lose the benefit of caching without being warned, which would make performance more fickle.
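A rough sketch of that version check follows, with a warning added so the fallback is not silent. Everything here is hypothetical: `RE_CACHE_VERSION`, `dump_pattern` and `load_pattern` are made-up names, and since the re module has no public API for saving or loading compiled code, the sketch stores only the pattern and flags and recompiles in both branches.

```python
import pickle
import re
import warnings

# Hypothetical version tag; the re module exposes no such public constant.
RE_CACHE_VERSION = 1


def dump_pattern(rx, version=RE_CACHE_VERSION):
    # Stand-in payload: a real implementation would also store the compiled code.
    entry = {"version": version, "pattern": rx.pattern, "flags": rx.flags}
    return pickle.dumps(entry)


def load_pattern(data, current_version=RE_CACHE_VERSION):
    entry = pickle.loads(data)
    if entry["version"] != current_version:
        # Tell the user that the cache is stale instead of silently recompiling.
        warnings.warn("stale RE cache entry, recompiling from the source pattern")
    # A real implementation would load the cached code here when versions match.
    return re.compile(entry["pattern"], entry["flags"])
```

The warning is one possible answer to the "without being warned" concern; whether it would be too noisy in practice is a separate question.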
Hmm, I'm not following you. The patterns would get compiled once at Python build time when installing the stdlib. The bytecode version wouldn't change for those compiled patterns - unless, of course, you upgrade to a new Python version, but then you'd rebuild the bytecode versions of the REs :-)
To make them generally useful, I agree, you would have to add a RE compiler version to the bytecode pickle, but AFAICS this should not affect the usefulness for the stdlib RE cache.
The whole idea is really very similar to the Python VM bytecode caching Python is using to speed up imports of modules.
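For comparison, the import system's cache can be inspected from Python itself. This is a minimal sketch assuming CPython 3.4 or later; `example.py` is a placeholder file name and does not need to exist.

```python
import importlib.util

# Version marker written at the start of every cached bytecode file; a cache
# file with a different marker is ignored and the module is recompiled.
magic = importlib.util.MAGIC_NUMBER

# Where the cached bytecode for a given source file would be stored.
cache_path = importlib.util.cache_from_source("example.py")

print(magic, cache_path)
```

A cache of compiled REs would need an equivalent marker for the RE compiler version.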
Perhaps we could have a GSoC student give it a try and see whether it results in noticeable startup time speedups?!