On Fri, Jul 23, 2010 at 8:16 PM, Hrvoje Niksic <hrvoje.niksic@avl.com> wrote:
The performance trade-off should make regex slower with sufficiently small compiled regex cache, when a lot of time is wasted on compilation. But as the cache gets larger (and, for fairness, of the same size in both implementations), regex should outperform re. Georg, would you care to measure if there is a difference in performance with an even larger cache?
That's not quite accurate. The figure that matters is "matches per compilation", and that is ultimately governed by the nature of the application once the cache is sufficiently large (e.g. an application that compiles 10 different regular expressions, but also only matches each one against a single string is going to be slowed down significantly by a switch to regex no matter how big the compilation cache may be). The total runtime for a particular re matching exercise is "(average compilation time)*(number of times compiled) + (average match time)*(number of matches performed)" We know that the new regex module speeds up matching at the cost of slowing down compilation. Whether that is a net increase for the application as a whole depends on 4 things: X: the average speed change in compilation for the application Y: the average speed change in matching for the application A: the proportion of time the application currently spends compiling regular expressions B: the proportion of time the application currently spends matching regular expressions The overall proportional speed change for re operations in a given application is then just X*A + Y*B. For regex to be an improvement for that application, the resulting figure needs to be less than 1. For example, suppose X=2, Y=0.5 (i.e. matching is twice as fast on average, but compilation also takes twice as long), then the overall speed change would be 2*A + 0.5*B. For that to be a wash, the original application needs to spend 1/3 of its expression matching time compiling the expressions and 2/3 actually matching them (the application with the new engine would then spend 2/3 of its time compiling and only 1/3 matching). That's going to be a tough assessment to make in practice, but I think Georg's pygments test suite example shows that there is real world code out there that already spends a lot of time on compilation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia