On Thu, Jul 22, 2010 at 7:42 AM, Georg Brandl <g.brandl@gmx.net> wrote:
On 22.07.2010 14:12, Nick Coghlan wrote:
On Thu, Jul 22, 2010 at 9:34 PM, Georg Brandl <g.brandl@gmx.net> wrote:
So, I thought there wasn't a performance difference for this use case (compiling a lot of regexes and matching most of them only a few times). However, I found that the regex cache matters a great deal here: re._MAXCACHE defaults to 100, while regex._MAXCACHE is 1024. When I set re._MAXCACHE to 1024 before running the test suite, I get times of around 18 (!) seconds for re.
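For anyone who wants to try the cache-size experiment without the full test suite, here is a rough sketch. It assumes re still exposes the private _MAXCACHE attribute (an implementation detail that may change or disappear); time_patterns and the generated pattern list are made up for illustration and are not the pygments workload discussed above.

```python
import re
import time


def time_patterns(patterns, text, rounds=3, maxcache=None):
    """Time re.match() over many distinct patterns.

    If maxcache is given, temporarily override the private re._MAXCACHE
    (assumed to exist; it is not part of the public API).
    """
    old = getattr(re, "_MAXCACHE", None)
    if maxcache is not None and old is not None:
        re._MAXCACHE = maxcache
    re.purge()  # start every measurement with an empty cache
    try:
        start = time.perf_counter()
        for _ in range(rounds):
            for pattern in patterns:
                re.match(pattern, text)
        return time.perf_counter() - start
    finally:
        # Always restore the original setting.
        if old is not None:
            re._MAXCACHE = old


# Hypothetical workload: more distinct patterns than a small cache can hold.
patterns = [r"a{%d}b" % n for n in range(1, 301)]
default_time = time_patterns(patterns, "aaab")
large_time = time_patterns(patterns, "aaab", maxcache=1024)
print("default cache: %.4fs, cache of 1024: %.4fs" % (default_time, large_time))
```

The numbers depend heavily on the Python version and on how many distinct patterns are in play, so treat them as a quick sanity check rather than a benchmark.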
It might be fun to do a pygments based macro benchmark. Just have it syntax highlight itself and time it.
Sure -- I don't think this is a showstopper for regex. However, if we don't include regex in a future version, we might think about increasing MAXCACHE a bit, and maybe not clearing the whole cache when it reaches its maximum size, but rather evicting a single entry.
+50 for the last idea. Collin encountered a problem two summers ago in Mondrian where we were relying on the regex cache and were surprised to find that it cleared itself after filling up, rather than using LRU or random eviction.

Reid
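To make the eviction idea concrete, here is a minimal sketch of a pattern cache that drops only the least recently used entry when it is full, instead of clearing everything. This is not how re is implemented; LRUPatternCache and its methods are hypothetical names used only for illustration, and the default size of 100 just mirrors the re._MAXCACHE value mentioned earlier in the thread.

```python
import re
from collections import OrderedDict


class LRUPatternCache:
    """Hypothetical compiled-pattern cache with LRU eviction."""

    def __init__(self, maxsize=100):
        self.maxsize = maxsize
        self._cache = OrderedDict()

    def _key(self, pattern, flags):
        # Include the type so str and bytes patterns never collide.
        return (type(pattern), pattern, flags)

    def compile(self, pattern, flags=0):
        key = self._key(pattern, flags)
        if key in self._cache:
            # Cache hit: mark this entry as most recently used.
            self._cache.move_to_end(key)
            return self._cache[key]
        compiled = re.compile(pattern, flags)
        self._cache[key] = compiled
        if len(self._cache) > self.maxsize:
            # Evict only the least recently used entry.
            self._cache.popitem(last=False)
        return compiled

    def has(self, pattern, flags=0):
        return self._key(pattern, flags) in self._cache

    def __len__(self):
        return len(self._cache)


cache = LRUPatternCache(maxsize=2)
cache.compile(r"\d+")
cache.compile(r"\w+")
cache.compile(r"\d+")   # hit: r"\d+" becomes most recently used
cache.compile(r"\s+")   # cache is full: r"\w+" is evicted, r"\d+" survives
```

Random eviction would be a small variation: pick an arbitrary key to delete instead of the oldest one, which avoids the bookkeeping on every hit.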