[Python-ideas] re.compile_lazy - on first use compiled regexes

Eli Bendersky eliben at gmail.com
Sun Mar 24 04:39:28 CET 2013


On Sat, Mar 23, 2013 at 4:03 PM, Bruce Leban <bruce at leapyear.org> wrote:

> To summarize:
>
> - compiling regexes is slow, so applications frequently compile them once
> and save the result
> - compiling all the regexes at startup slows down startup for regexes that
> may never be used
> - a common pattern is to compile once at first use, and it would be nice
> to optimize this pattern
> - the regex library has a cache feature, which means that frequently it
> will be optimized automatically
> - however, there's no guarantee that the regex you care about won't fall
> out of the cache.
>
> I think this addresses all the issues better than compute_lazy:
>
> re.compile(r'...', keep=True)
>
> When keep=True is specified, the regex library keeps the cached value for
> the lifetime of the process. The regex is compiled only once, on first use,
> and you don't need to create a place to store it. Furthermore, if you use
> the same regex in more than one place, once with keep=True, the other uses
> will automatically be optimized.
>
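For context, the proposed semantics (lazy, compile-once, kept for the life of the process) can already be approximated in user code today. This is a minimal sketch, assuming a hypothetical `LazyPattern` wrapper name; nothing here is part of the actual re API:

```python
import re

class LazyPattern:
    """Compile the regex on first use only, then keep the compiled
    object for the lifetime of the process (a user-level sketch of
    the proposed keep=True behavior)."""

    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None  # nothing is compiled until first use

    def _get(self):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled

    def match(self, string):
        return self._get().match(string)

    def search(self, string):
        return self._get().search(string)

# No compilation cost is paid at import time:
DIGITS = LazyPattern(r'\d+')
```

Unlike relying on the module's cache, a pattern held this way can never be evicted, which is exactly the guarantee the proposal asks for.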


Nice summary. The real problem, I think, is that many developers are not
aware of the default caching done by the re module. I have a hunch that if
this were better known, fewer manual optimization attempts would spring up.
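To illustrate that caching: in CPython, re.compile itself goes through the module's internal pattern cache, so repeated compilations of the same pattern string return the very same object. This is an implementation detail rather than a documented guarantee:

```python
import re

# In CPython, re.compile consults the module's internal pattern cache,
# so identical (pattern, flags) pairs yield the same compiled object.
a = re.compile(r'\d+')
b = re.compile(r'\d+')
print(a is b)  # True in CPython (implementation detail)
```

The module-level functions (re.search, re.match, re.sub, ...) use the same cache, which is why the common "compile it once yourself" optimization is often redundant.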

How about examining the size of that re cache and how much memory it
typically occupies? Perhaps the cache could be enlarged to hold more
regexes?
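One way to start that examination: CPython exposes the cache bound as re._MAXCACHE (a private implementation detail — it was on the order of 100 entries around this era and is 512 in modern CPython), and the public re.purge() empties the cache:

```python
import re
import sys

# CPython implementation details: _cache is the internal pattern cache
# and _MAXCACHE its size limit; purge() (a public function) clears it.
re.purge()
re.search(r'\d+', 'abc123')   # populates at least one cache entry
print(re._MAXCACHE)

# Very rough, shallow estimate of the memory the cached patterns occupy
# (does not account for everything a compiled pattern references).
cached_bytes = sum(sys.getsizeof(p) for p in re._cache.values())
print(cached_bytes)
```

Any change to the cache size would of course have to weigh this memory cost against the recompilation savings.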

Eli

