On 23.03.2013 22:20, Antoine Pitrou wrote:
On Sat, 23 Mar 2013 22:19:02 +0100 "M.-A. Lemburg" mal@egenix.com wrote:
Hmm, I'm not following you. The patterns would get compiled once at Python build time when installing the stdlib. The bytecode version wouldn't change for those compiled patterns - unless, of course, you upgrade to a new Python version, but then you'd rebuild the bytecode versions of the REs :-)
To make them generally useful, I agree, you would have to add an RE compiler version to the bytecode pickle, but AFAICS this should not affect the usefulness for the stdlib RE cache.
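A minimal sketch of what such a version tag could look like, purely for illustration. RE_COMPILER_VERSION, dump_cache() and load_cache() are hypothetical names, not anything that exists in the re module, and the cache entries are treated as opaque values here (note that pickling a compiled pattern object today stores only the pattern source and flags, so a real implementation would need its own serialization of the compiled form):

```python
import pickle

# Hypothetical version number of the RE compiler output format.
# A real implementation would take this from the re engine itself.
RE_COMPILER_VERSION = 1


def dump_cache(entries, version=RE_COMPILER_VERSION):
    """Serialize cache entries together with the compiler version."""
    return pickle.dumps({"version": version, "entries": entries})


def load_cache(data, version=RE_COMPILER_VERSION):
    """Return the stored entries, or an empty dict on a version mismatch."""
    payload = pickle.loads(data)
    if payload.get("version") != version:
        # Stale cache: ignore it and let patterns be compiled as usual.
        return {}
    return payload["entries"]
```

With a check like this, a stale cache just falls back to normal compilation instead of feeding outdated bytecode to the engine.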
Ah, you're talking only about the stdlib. Well, sure, that would work, but we have to remember to regenerate those pickles by hand each time the re bytecode is updated (which doesn't happen often, admittedly). That's a bit of a maintenance burden.
No, that would happen at build time automatically. setup.py would create the module with the pickled RE bytecodes by scanning the stdlib modules for RE patterns, and the re module would then use this module to seed its cache.
That's the high-level idea. I'm sure there are a few pitfalls along the way :-)
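As a rough sketch of the scanning step only, assuming the build script looks for literal patterns passed directly to re.compile(): find_literal_patterns() is a hypothetical helper, not existing setup.py code, and it deliberately ignores flags, patterns built at runtime, and other entry points such as re.match() or "from re import compile".

```python
import ast


def find_literal_patterns(source):
    """Return (pattern, lineno) pairs for re.compile(<literal>, ...) calls."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match the spelling "re.compile(...)".
        is_re_compile = (
            isinstance(func, ast.Attribute)
            and func.attr == "compile"
            and isinstance(func.value, ast.Name)
            and func.value.id == "re"
        )
        if not is_re_compile or not node.args:
            continue
        first = node.args[0]
        # Keep only str/bytes literals; computed patterns are skipped.
        if isinstance(first, ast.Constant) and isinstance(first.value, (str, bytes)):
            found.append((first.value, node.lineno))
    return found
```

The collected patterns could then be compiled once at build time and written to the cache module; how the compiled form would be serialized is left open here.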
The whole idea is really very similar to the VM bytecode caching Python uses to speed up imports of modules.
Except that the VM bytecode caching works automatically and transparently :-)
Should be the same for the REs in the stdlib. The user wouldn't notice (except for the speedup hopefully). Code in the stdlib compiling the REs wouldn't need to be touched either, since the cache in the re module would simply reuse the compiled versions.
Perhaps we could have a GSoC student give it a try and see whether it results in noticeable startup time speedups?!
That's a rather smallish topic for a GSoC project, IMHO.
Well, you could extend it by adding some RE optimization tasks on top of it :-)