Re: [Python-ideas] re.compile_lazy - on first use compiled regexes

23 Mar 2013

On Sat, 23 Mar 2013 15:35:18 +0100
Masklinn wrote:
...
On 2013-03-23, at 14:34, Antoine Pitrou wrote:
...
On Sat, 23 Mar 2013 14:26:30 +0100
Masklinn wrote:
...
> > > Wouldn't it be better if there are *few* different regexes? Since the
> > > module itself caches 512 expressions (100 in Python 2) and does not use
> > > an LRU or other "smart" cache (it just clears the whole cache dict once
> > > the limit is breached as far as I can see), *and* any explicit call to
> > > re.compile will *still* use the internal cache (meaning even going
> > > through re.compile will count against the _MAXCACHE limit), all regex
> > > uses throughout the application (including standard library &al) will
> > > count against the built-in cache and increase the chance of the regex
> > > we want cached to be thrown out, no?
> >
> > Well, it mostly sounds like the re cache should be made a bit smarter.
>
> It should, but even with that I think it makes sense to explicitly cache
> regexps in the application; the re cache feels like an optimization more
> than semantics.

Well, of course it is. A cache *is* an optimization.

...

> Either that, or the re module should provide an instantiable cache object
> for lazy compilation and caching of regexps, e.g.
> re.local_cache(maxsize=None), which would return an lru-caching proxy to
> re. Thus the caching of a module's regexps would be under the control of
> the module using them if desired (and important).

IMO that's the wrong way to think about it. The whole point of a cache
is that the higher levels don't have to think about it. Your CPU has
L1, L2 and sometimes L3 caches so that you don't have to allocate your
critical data structures in separate "faster" memory areas.

That said, if you really want to manage your own cache, it should
already be easy to do so using functools.lru_cache() (or any
implementation of your choice). The re module doesn't have to provide a
dedicated caching primitive.
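
As a rough sketch of what that could look like (get_regex and find_words
are hypothetical names made up for this example, not anything the re
module provides, and the maxsize of 128 is an arbitrary choice):

```python
import functools
import re


# Hypothetical application-level helper, not part of the re module.
# Compiled patterns live in this LRU cache, so they are not evicted when
# re's own internal cache is cleared.
@functools.lru_cache(maxsize=128)
def get_regex(pattern, flags=0):
    return re.compile(pattern, flags)


def find_words(text):
    # The pattern is compiled on the first call only; later calls are
    # served from the lru_cache wrapper.
    return get_regex(r"\w+").findall(text)
```

Each module could keep its own decorated helper like this if it wants
control over how many of its patterns stay cached.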

But, really, the point of a cache is to optimize performance *without*
you tinkering with it.

Regards

Antoine.