[Python-ideas] re.compile_lazy - on first use compiled regexes

Sun Mar 24 08:38:32 CET 2013

Gregory P. Smith, 24.03.2013 00:48:
> On Sat, Mar 23, 2013 at 4:34 PM, Bruce Leban wrote:
>> On Sat, Mar 23, 2013 at 4:14 PM, Gregory P. Smith wrote:
>>> keep=True defeats the purpose of a caching strategy.  An re.compile call
>>> within some code somewhere is typically not in a position to know if it is
>>> going to be called a lot.
>>>
>>> I think the code, as things are now, with dynamic construction at runtime
>>> based on a simple test is the best of both worlds to avoid the more
>>> complicated cost of calling re.compile and going through its cache logic.
>>>  If the caching is ever improved in the future to be faster, the code
>>> can arguably be simplified to use re.search or re.match directly and rely
>>> solely on the caching.
>>>
>>> ie: don't change anything.
>>
>> Truth is people are currently doing caching themselves, by compiling and
>> then keeping the compiled regex. Saying they're not in a position to know
>> whether or not to do that isn't going to change that. Is it worthwhile
>> having the regex library facilitate this manual caching?
> 
> In the absence of profiling numbers showing otherwise, I'd rather see all
> forms of manual caching like the conditional checks or a keep=True go away
> as it's dirty and encourages premature "optimization".

+1

If I had been more aware of the re module's internal cache over the last
few years, I would have avoided at least a couple of re.compile() calls in
my code, I guess.

Maybe this is something that the documentation of re.compile() can help
with, by telling people explicitly that this apparently cool feature of
pre-compiling actually has a drawback (startup time plus a bit of memory
usage) and that they won't notice a runtime difference in most cases anyway.
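
To illustrate the two styles, here is a minimal sketch. The pattern and the
function names are made up for this example; the only thing it relies on is
that, as the re documentation states, the module-level functions cache the
most recently used compiled patterns.

```python
import re

# Hypothetical pattern, chosen only for illustration.
WORD_PATTERN = r"\b[A-Za-z]+\b"

# Style 1: pre-compile at import time. The compilation cost is paid at
# startup, even if find_words_precompiled() is never called.
_WORD_RE = re.compile(WORD_PATTERN)


def find_words_precompiled(text):
    return _WORD_RE.findall(text)


# Style 2: hand the pattern string to the module-level function. The
# pattern is compiled on first use and then served from re's internal
# cache, at the price of a cache lookup on every call.
def find_words_cached(text):
    return re.findall(WORD_PATTERN, text)
```

Both functions return the same matches. Whether the per-call cache lookup
in the second style matters is something only profiling of the actual code
can tell.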

Stefan