[Python-Dev] caching in the stdlib? (was: New regex module for 3.2?)

Stefan Behnel stefan_ml at behnel.de
Wed Jul 28 07:29:00 CEST 2010


R. David Murray, 28.07.2010 03:43:
> On Tue, 27 Jul 2010 08:27:35 +0200, Stefan Behnel wrote:
>> Gregory P. Smith, 27.07.2010 07:40:
>>> A max cache size of 100 was too small.  I just increased it to 500 in the
>>> py3k branch along with implementing a random replacement cache overflow
>>> policy.  It now randomly drops 20% of the compiled regular expression cache
>>> instead of simply dropping the entire cache on overflow.
>>>
>>> With the regex_v8 benchmark, the better cache replacement policy sped it up
>>> ~7% while raising the cache size on top of that (likely meaning the cache
>>> was never overflowing) sped it up ~25%.
>>>
>>> Random replacement without dropping everything at least means apps thrashing
>>> the cache degrade much more gracefully.
>>
>> The same algorithm should be helpful in ElementTree's ElementPath module.
>
> We recently added the old re cache-clearing strategy to
> fnmatch, because previously its cache would grow indefinitely.
> It sounds like this should be applied there as well.
>
> That's three...time to figure out how to share the code?

What about actually putting it visibly into the stdlib? Apart from files, I
didn't see much caching support there, which seems like a missing battery to
me. Why not handle it like the collections module and add things as they come in?
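
For illustration, the overflow policy Gregory describes in the quoted text (randomly drop 20% of the entries instead of clearing everything) could be sketched roughly as below. This is only a minimal sketch, not the actual code in the re module: the class name RandomEvictionCache, its get() method and the drop_fraction parameter are made up for this example, and only the numbers 500 and 20% come from the quoted message.

```python
import random


class RandomEvictionCache:
    """Hypothetical shared cache: on overflow, drop a random fraction of entries."""

    def __init__(self, maxsize=500, drop_fraction=0.2):
        self.maxsize = maxsize              # 500 is the size quoted for py3k
        self.drop_fraction = drop_fraction  # 0.2 matches the quoted "20%"
        self._data = {}

    def get(self, key, factory):
        """Return the cached value for key, building it with factory(key) on a miss."""
        try:
            return self._data[key]
        except KeyError:
            pass
        value = factory(key)
        if len(self._data) >= self.maxsize:
            self._shrink()
        self._data[key] = value
        return value

    def _shrink(self):
        # Evict a random subset instead of clearing the whole cache.
        n_drop = max(1, int(len(self._data) * self.drop_fraction))
        for key in random.sample(list(self._data), n_drop):
            del self._data[key]

    def __len__(self):
        return len(self._data)


# Possible usage with compiled regular expressions:
import re

pattern_cache = RandomEvictionCache(maxsize=500)
number_re = pattern_cache.get(r"\d+", re.compile)
```

Something along these lines could, in principle, be shared by re, fnmatch and ElementPath, since each of them only needs a key and a function that builds the cached value.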

Stefan


