On Thu, Jul 22, 2010 at 3:26 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
> On Fri, Jul 23, 2010 at 12:42 AM, Georg Brandl <g.brandl@gmx.net> wrote:
>> Sure -- I don't think this is a showstopper for regex. However if we don't
>> include regex in a future version, we might think about increasing MAXCACHE
>> a bit, and maybe not clear the cache when it reaches its max length, but
>> rather remove another element.
>
> Yikes, I didn't know it did that. That certainly sounds like it should
> be an RFE in its own right - some basic form of Least Recently Used
> accounting should be beneficial (although the extra bookkeeping might
> hurt scripts that aren't hitting the cache limit).

A max cache size of 100 was too small. I just increased it to 500 in the py3k branch and implemented a random replacement policy for cache overflow. On overflow it now randomly drops 20% of the compiled regular expression cache instead of simply dropping the entire cache.
With the regex_v8 benchmark, the better cache replacement policy sped it up ~7%, while raising the cache size on top of that (likely meaning the cache never overflowed) sped it up ~25%.
Random replacement without dropping everything at least means apps thrashing the cache degrade much more gracefully.
This change should be incorporated into MRAB's regex module in order to keep comparisons fair.
-gps