OK to memoize re objects?
steven at REMOVE.THIS.cybersource.com.au
Tue Sep 22 04:29:49 CEST 2009
On Mon, 21 Sep 2009 13:33:05 +0000, kj wrote:
> I find the docs are pretty confusing on this point. They first make the
> point of noting that pre-compiling regular expressions is more
> efficient, and then *immediately* shoot down this point by saying that
> one need not worry about pre-compiling in most cases. From the docs:
> ...using compile() and saving the resulting regular expression
> object for reuse is more efficient when the expression will be used
> several times in a single program.
> Note: The compiled versions of the most recent patterns passed to
> re.match(), re.search() or re.compile() are cached, so programs that
> use only a few regular expressions at a time needn't worry about
> compiling regular expressions.
> Honestly I don't know what to make of this... I would love to see an
> example in which re.compile was unequivocally preferable, to really
> understand what the docs are saying here...
I find it entirely understandable. If you have only a few regexes, then
there's no need to pre-compile them yourself, because the re module
caches them. Otherwise, don't rely on the cache: it may help, or it may
not. No promises are made.
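For the case where you would rather not rely on the cache, here is a
minimal sketch of pre-compiling a pattern once and reusing the resulting
object. The names WORD_RE and count_words, and the pattern itself, are
made up for illustration and are not from the docs or the original post.

```python
import re

# Hypothetical pattern, compiled once when the module is imported.
WORD_RE = re.compile(r"[A-Za-z]+")

def count_words(lines):
    """Count alphabetic words using the pre-compiled pattern object."""
    # Calling a method on the compiled object does not depend on the
    # re module's internal cache of recently used patterns.
    return sum(len(WORD_RE.findall(line)) for line in lines)

print(count_words(["pre-compile once", "reuse many times"]))
```

Holding the compiled object in a module-level name keeps it alive for as
long as your module is, regardless of what happens to the shared cache.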
The nature of the cache isn't explained because it is an implementation
detail. As it turns out, the current implementation is a single cache in
the re module, so every module that does "import re" shares the one
cache. The cache is also completely emptied once it exceeds a certain
number of objects, so it may be flushed at arbitrary times outside your
control. Or it might not.
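On the question in the subject line: if you want caching whose behaviour
you control, one option is to memoize the compiled objects yourself. The
following is only a sketch, assuming functools.lru_cache is available
(Python 3.2 and later); the helper name "compiled" is hypothetical and
not part of the re module.

```python
import re
from functools import lru_cache

@lru_cache(maxsize=None)  # unbounded, so entries are never evicted
def compiled(pattern, flags=0):
    """Return a compiled regex, compiling each (pattern, flags) pair once.

    Hypothetical helper: this memo is private to your code, so other
    modules that use re cannot cause it to be flushed.
    """
    return re.compile(pattern, flags)

first = compiled(r"\d+")
second = compiled(r"\d+")       # served from our own memo
print(first is second)
print(compiled.cache_info())    # hit/miss counters for the memo
```

An unbounded memo never forgets a pattern, so this only makes sense when
the set of distinct patterns is reasonably small. Pass a finite maxsize
if patterns are generated dynamically.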