OK to memoize re objects?

Ethan Furman ethan at stoneleaf.us
Mon Sep 21 16:11:36 CEST 2009


kj wrote:
> In <mailman.120.1253406305.2807.python-list at python.org> Robert Kern <robert.kern at gmail.com> writes:
> 
> 
>>kj wrote:
>>
>>>My Python code is filled with assignments of regexp objects to
>>>global variables at the top level; e.g.:
>>>
>>>_spam_re = re.compile('^(?:ham|eggs)$', re.I)
>>>
>>>Don't like it.  My Perl-pickled brain wishes that re.compile was
>>>a memoizing method, so that I could use it anywhere, even inside
>>>tight loops, without ever having to worry about the overhead of
>>>regexp compilation.
> 
> 
>>Just use re.search(), etc. They already memoize the compiled regex objects.
> 
> 
> Thanks.
> 
> I find the docs are pretty confusing on this point.  They first
> make the point of noting that pre-compiling regular expressions is
> more efficient, and then *immediately* shoot down this point by
> saying that one need not worry about pre-compiling in most cases.
> From the docs:
> 
>     ...using compile() and saving the resulting regular expression
>     object for reuse is more efficient when the expression will be
>     used several times in a single program.
> 
>     Note: The compiled versions of the most recent patterns passed
>     to re.match(), re.search() or re.compile() are cached, so
>     programs that use only a few regular expressions at a time
>     needn't worry about compiling regular expressions.
> 
> Honestly I don't know what to make of this...  I would love to see
> an example in which re.compile was unequivocally preferable, to
> really understand what the docs are saying here...
> 
> kynn

Looking in the code for re in 2.5:
.
.
.
_MAXCACHE = 100
.
.
.
     if len(_cache) >= _MAXCACHE:
         _cache.clear()
.
.
.

so when the cache fills up, you lose the entire cache.  On the other hand,
I (a re novice, to be sure) have only used between two and five patterns in
any one program... it'll be a while before I hit _MAXCACHE!

~Ethan~



