On 22.07.2010 14:12, Nick Coghlan wrote:
On Thu, Jul 22, 2010 at 9:34 PM, Georg Brandl <g.brandl@gmx.net> wrote:
So, I thought there was no performance difference for this use case (compiling a lot of regexes and matching most of them only a few times). However, I found that regex caching matters a great deal here: re._MAXCACHE defaults to 100, while regex._MAXCACHE is 1024. When I set re._MAXCACHE to 1024 before running the test suite, I get times of around 18 (!) seconds for re.
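A toy model (not the actual `re` implementation) suggests why the cache size matters so much for a test-suite-style workload. It assumes the cache is simply emptied once it holds `maxcache` entries, and it only counts how often the (simulated) compile step runs. The function `count_compiles` and the workload are hypothetical; note too that `re._MAXCACHE` is a private, version-dependent attribute.

```python
from typing import Dict, Iterable


def count_compiles(patterns: Iterable[str], maxcache: int) -> int:
    """Toy model of a pattern cache that is cleared entirely once full.

    Returns how many times the simulated compile step runs.
    """
    cache: Dict[str, object] = {}
    compiles = 0
    for pat in patterns:
        if pat in cache:
            continue  # cache hit: no compile needed
        # Cache miss: assume the whole cache is dropped when the limit is reached.
        if len(cache) >= maxcache:
            cache.clear()
        cache[pat] = object()  # stand-in for a compiled pattern
        compiles += 1
    return compiles


# Hypothetical workload: 300 distinct patterns, each used 3 times in turn.
workload = [f"pat{i}" for i in range(300)] * 3

print(count_compiles(workload, maxcache=100))   # 900: every use recompiles
print(count_compiles(workload, maxcache=1024))  # 300: each pattern compiled once
```

When the working set of patterns is larger than the cache, a clear-when-full cache gives almost no reuse; once the cache is large enough to hold the working set, each pattern is compiled only once.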
That still fits with the compile/match performance trade-off between re and regex, though. It does make clear that this isn't going to be a win across the board: things like test suites will have more one-off regex operations than a long-running web server or a filesystem or database scanning operation.
Sure, I don't think this is a showstopper for regex. However, if we don't include regex in a future version, we might consider increasing MAXCACHE a bit, and perhaps not clearing the whole cache when it reaches its maximum length, but evicting a single entry instead.

Georg
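The suggested alternative, evicting one entry instead of clearing everything, can be sketched as a least-recently-used (LRU) cache built on `collections.OrderedDict`. LRU is only one possible answer to "which element to remove" and is an assumption here, not something the proposal specifies; the function `count_compiles_with_policy` and the workload are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def count_compiles_with_policy(patterns: Iterable[str], maxcache: int, policy: str) -> int:
    """Count simulated compiles for a bounded pattern cache.

    policy="clear": drop every entry once the cache is full.
    policy="lru":   evict only the least recently used entry.
    """
    if policy not in ("clear", "lru"):
        raise ValueError(f"unknown policy: {policy!r}")
    cache: "OrderedDict[str, object]" = OrderedDict()
    compiles = 0
    for pat in patterns:
        if pat in cache:
            cache.move_to_end(pat)  # record the hit as the most recent use
            continue
        if len(cache) >= maxcache:
            if policy == "clear":
                cache.clear()
            else:
                cache.popitem(last=False)  # remove the least recently used entry
        cache[pat] = object()  # stand-in for a compiled pattern
        compiles += 1
    return compiles


# Hypothetical workload: 5 "hot" patterns reused on every iteration,
# interleaved with 1000 one-off patterns.
workload = []
for i in range(1000):
    workload += [f"hot{j}" for j in range(5)]
    workload.append(f"once{i}")

print(count_compiles_with_policy(workload, 100, "lru"))    # 1005
print(count_compiles_with_policy(workload, 100, "clear"))  # 1055
```

With a small set of frequently used patterns mixed with one-off ones, single-entry eviction keeps the hot patterns compiled, while clearing forces them to be recompiled after every clear.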