[Python-ideas] re.compile_lazy - on first use compiled regexes

Steven D'Aprano steve at pearwood.info
Thu Mar 28 02:25:17 CET 2013


On 28/03/13 10:06, Terry Reedy wrote:
> On 3/24/2013 3:38 AM, Stefan Behnel wrote:
>> Gregory P. Smith, 24.03.2013 00:48:
>>> In the absence of profiling numbers showing otherwise, I'd rather see all
>>> forms of manual caching like the conditional checks or a keep=True go away,
>>> as it's dirty and encourages premature "optimization".
>>
>> +1
>>
>> If I had been "more aware" of the re internal cache over the last few years,
>> I would have avoided at least a couple of re.compile() calls in my code, I
>> guess.
>>
>> Maybe this is something that the documentation of re.compile() can help
>> with, by telling people explicitly that this apparently cool feature of
>> pre-compiling actually has a drawback (startup time plus a bit of memory
>> usage) and that they won't notice a runtime difference in most cases anyway.
>
> With a decent re cache size, .compile seems more like an attractive nuisance than something useful.


On the contrary, I think that it is the cache which is an (unattractive) nuisance.

As with any cache, performance is only indirectly under your control. You cannot know for sure whether re.match(some_pattern, text) will be a cheap cache hit or an expensive re-compilation. All you can do is keep increasing the size of the cache until the chance of a cache miss is "low enough", whatever that means for you, and hope.
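To make that concrete, here is a rough sketch of measuring the hit-versus-miss gap. re.purge() is the only public knob we get, and all it does is empty the cache entirely; the cache size itself is an undocumented internal. (A single number=1 timing is noisy, but it is enough to see the shape of the problem.)

import re
import timeit

pattern = r"(foo|bar)+\d{2,5}"

re.purge()  # empty the internal pattern cache (public API)

# The first call pays the full compilation cost (a guaranteed miss);
# the second finds the already-compiled pattern in the cache (a hit).
miss = timeit.timeit(lambda: re.match(pattern, "foo42"), number=1)
hit = timeit.timeit(lambda: re.match(pattern, "foo42"), number=1)
print("miss: %.6fs  hit: %.6fs" % (miss, hit))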

I cannot think of any other object in the Python standard library where the recommended API is to convert from a string every time you need the object. We do this:

x = Decimal(some_string)
y = x**3
z = x.exp()

not this:

y = Decimal(some_string)**3
z = Decimal(some_string).exp()

hoping that the string will be in a cache and the conversion will be fast. So why do we do this?

result = re.match(some_string, text)
other_result = re.match(some_string, other_text)
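For what it's worth, the "compile lazily on first use" idea in the subject line is easy to sketch in pure Python. LazyPattern below is a hypothetical name for illustration, not a proposal for the actual API: the point is that the compilation cost is deferred until first use and the compiled object is then kept explicitly, with no reliance on the internal cache.

import re

class LazyPattern:
    # Hypothetical sketch: defer re.compile() until the pattern is
    # first used, then keep the compiled object. No import-time cost,
    # no chance of silent eviction from a shared cache.
    def __init__(self, pattern, flags=0):
        self._pattern = pattern
        self._flags = flags
        self._compiled = None

    def match(self, string):
        if self._compiled is None:
            self._compiled = re.compile(self._pattern, self._flags)
        return self._compiled.match(string)

DIGITS = LazyPattern(r"\d+")   # cheap: nothing is compiled yet
result = DIGITS.match("123")   # compiled here, on first use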



-- 
Steven


