On 23.07.2010 11:16, Hrvoje Niksic wrote:
On 07/22/2010 01:34 PM, Georg Brandl wrote:
Timings (seconds to run the test suite):
re      26.689  26.015  26.008
regex   26.066  25.797  25.865
So I thought there was no difference in performance for this use case (which compiles a lot of regexes and matches most of them only a few times). However, I found that regex caching is very important in this case: re._MAXCACHE is set to 100 by default, and regex._MAXCACHE to 1024. When I set re._MAXCACHE to 1024 before running the test suite, I get times around 18 (!) seconds for re.
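For anyone who wants to try the same kind of experiment, here is a minimal sketch of overriding the cache size before timing a compile-heavy workload. It relies on re._MAXCACHE, which is a private CPython attribute whose value and eviction behaviour may differ between Python versions; the helper name time_with_cache and the pattern list are made up for illustration and are not the test suite measured above.

```python
import re
import timeit


def time_with_cache(size, patterns, rounds=3):
    """Time `rounds` passes over `patterns` with re._MAXCACHE set to `size`.

    re._MAXCACHE is a private implementation detail (an assumption of this
    sketch); the original value is restored afterwards.
    """
    saved = re._MAXCACHE
    re._MAXCACHE = size
    re.purge()  # start from an empty compiled-pattern cache
    try:
        def workload():
            # Each re.match call goes through the module-level cache.
            for pat in patterns:
                re.match(pat, "item0")

        return timeit.timeit(workload, number=rounds)
    finally:
        re._MAXCACHE = saved
        re.purge()


# Hypothetical workload: 300 distinct patterns, each used once per pass.
patterns = ["item%d$" % i for i in range(300)]

for size in (100, 1024):
    print(size, time_with_cache(size, patterns))
```

The absolute numbers depend on the machine and the Python version, so only the relative difference between the two cache sizes is of interest.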
This seems to point to re being significantly *faster* than regex, even in matching, and as such may be something the author would want to look into.
Nick writes:
That still fits with the compile/match performance trade-off changes between re and regex though.
The performance trade-off should make regex slower with a sufficiently small compiled-regex cache, when a lot of time is wasted on compilation. But as the cache gets larger (and, for fairness, is the same size in both implementations), regex should outperform re. Georg, would you care to measure if there is a difference in performance with an even larger cache?
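To illustrate why a small cache can waste so much time on compilation, here is a toy model that counts how often a pattern has to be recompiled when a fixed set of patterns is used in rotation. It assumes a cache that is cleared completely once it is full; that policy, the function name count_compiles and the pattern list are assumptions for illustration, not a description of what re or regex actually do.

```python
import re


def count_compiles(patterns, rounds, max_cache):
    """Cycle through `patterns` `rounds` times and count compilations.

    Toy cache policy (an assumption): when the cache is full, drop everything.
    """
    cache = {}
    compiles = 0
    for _ in range(rounds):
        for pat in patterns:
            if pat not in cache:
                if len(cache) >= max_cache:
                    cache.clear()
                cache[pat] = re.compile(pat)
                compiles += 1
            cache[pat].match("item0")
    return compiles


# Hypothetical workload: 300 distinct patterns, used in rotation.
patterns = ["item%d$" % i for i in range(300)]

small = count_compiles(patterns, rounds=3, max_cache=100)
large = count_compiles(patterns, rounds=3, max_cache=1024)
print(small, large)  # 900 300
```

With 300 patterns and room for only 100, every lookup misses and each pattern is compiled again on every pass; once the cache holds the whole working set, each pattern is compiled exactly once.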
I did measure that, and there are no significant differences in timing. I also did the check the other way around: restricting regex._MAXCACHE to 100 took the run time from 26 seconds to 42 seconds.

(Nick, is that enough data to calculate A and B now? ;)

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.