[Python-Dev] caching in the stdlib? (was: New regex module for 3.2?)
stefan_ml at behnel.de
Wed Jul 28 07:29:00 CEST 2010
R. David Murray, 28.07.2010 03:43:
> On Tue, 27 Jul 2010 08:27:35 +0200, Stefan Behnel wrote:
>> Gregory P. Smith, 27.07.2010 07:40:
>>> A max cache size of 100 was too small. I just increased it to 500 in the
>>> py3k branch along with implementing a random replacement cache overflow
>>> policy. It now randomly drops 20% of the compiled regular expression cache
>>> instead of simply dropping the entire cache on overflow.
>>> With the regex_v8 benchmark, the better cache replacement policy sped it up
>>> ~7% while raising the cache size on top of that (likely meaning the cache
>>> was never overflowing) sped it up ~25%.
>>> Random replacement without dropping everything at least means apps thrashing
>>> the cache degrade much more gracefully.
>> The same algorithm should be helpful in ElementTree's ElementPath module.
> We recently added the old re cache-clearing strategy to
> fnmatch, because previously its cache would grow indefinitely.
> It sounds like this should be applied there as well.
> That's three...time to figure out how to share the code?
What about actually putting it visibly into the stdlib? Except for files, I
didn't see much about caching there, which seems like a missing battery to
me. Why not follow the example of the collections module and add things as
they come up?
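
For illustration, here is a minimal sketch of the overflow policy Gregory describes above: when the cache is full, drop a random 20% of the entries instead of clearing everything. The class name RandomDropCache, its get() method and the drop_fraction parameter are made up for this example; this is not the code that went into the re module, just one way a shared helper might look.

```python
import random


class RandomDropCache:
    """Hypothetical dict-backed cache with random partial eviction."""

    def __init__(self, maxsize=500, drop_fraction=0.2):
        self.maxsize = maxsize              # 500 mirrors the new re cache size
        self.drop_fraction = drop_fraction  # share of entries dropped on overflow
        self._data = {}

    def get(self, key, compute):
        """Return the cached value for key, calling compute(key) on a miss."""
        try:
            return self._data[key]
        except KeyError:
            pass
        value = compute(key)
        if len(self._data) >= self.maxsize:
            self._shrink()
        self._data[key] = value
        return value

    def _shrink(self):
        # Evict a random sample rather than the whole cache, so a
        # thrashing application keeps most of its entries.
        drop_count = max(1, int(len(self._data) * self.drop_fraction))
        for key in random.sample(list(self._data), drop_count):
            del self._data[key]

    def __len__(self):
        return len(self._data)
```

A caller such as fnmatch or ElementPath could then write something like `cache.get(pattern, re.compile)`. Random eviction needs no per-entry bookkeeping on a cache hit, which keeps the hit path as cheap as a plain dict lookup.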