[Python-Dev] Revising RE docs

Guido van Rossum guido at python.org
Sun Sep 4 05:10:31 CEST 2005


On 9/2/05, Gareth McCaughan <gmccaughan at synaptics-uk.com> wrote:
> On Thursday 2005-09-01 18:09, Guido van Rossum wrote:
> 
> > They *are* cached and there is no cost to using the functions instead
> > of the methods unless you have so many regexps in your program that
> > the cache is cleared (the limit is 100).
> 
> Sure there is; the cost of looking them up in the cache.
> 
>     >>> import re,timeit
> 
>     >>> timeit.re=re
>     >>> timeit.Timer("""re.search(r"(\d*).*(\d*)", "abc123def456")""").timeit(1000000)
>     7.6042091846466064
> 
>     >>> timeit.r = re.compile(r"(\d*).*(\d*)")
>     >>> timeit.Timer("""r.search("abc123def456")""").timeit(1000000)
>     2.6358869075775146
> 
>     >>> timeit.Timer().timeit(1000000)
>     0.091850996017456055
> 
> So in this (highly artificial toy) application it's about 7.5/2.5 = 3 times
> faster to use the methods instead of the functions.

Yeah, but the cost is a constant -- it is not related to the cost of
compiling the re. (You should've shown how much it cost if you
included the compilation in each search.)
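
[For comparison, here is a sketch of the measurement Guido asks for, using the `timeit` setup parameter rather than the `timeit.re = re` namespace trick: the same search timed against the precompiled pattern, the cached module function, and a variant that calls `re.purge()` each iteration so the pattern is genuinely recompiled. Absolute numbers are machine-dependent; only the relative ordering matters.]

```python
import re
import timeit

N = 10_000

# 1. Method on a precompiled pattern (pattern built once, in setup).
t_method = timeit.timeit(
    "pat.search('abc123def456')",
    setup="import re; pat = re.compile(r'(\\d*).*(\\d*)')",
    number=N,
)

# 2. Module-level function: recompiles each call, but hits re's cache.
t_func = timeit.timeit(
    "re.search(r'(\\d*).*(\\d*)', 'abc123def456')",
    setup="import re",
    number=N,
)

# 3. Cache purged each time, so every search pays full compilation cost.
t_compile = timeit.timeit(
    "re.purge(); re.search(r'(\\d*).*(\\d*)', 'abc123def456')",
    setup="import re",
    number=N,
)

print("method:   %.3fs" % t_method)
print("function: %.3fs" % t_func)
print("compile:  %.3fs" % t_compile)
```

[The cache-lookup overhead (2 vs. 1) is a small constant per call; real recompilation (3) dwarfs both.]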

I haven't looked into this, but I bet the overhead you're measuring is
actually the extra Python function call, not the cache lookup itself.
I also notice that _compile() is needlessly written as a varargs
function -- all its uses pass it exactly two arguments.
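
[For readers unfamiliar with the caching being discussed, here is a minimal sketch -- not the stdlib source -- of the scheme the module-level functions use, per the behaviour described above: a dict keyed by (pattern, flags), dropped wholesale once it grows past a fixed limit. The helper name `cached_compile` is hypothetical; the real internal is `re._compile`, which takes exactly the two arguments mirrored here.]

```python
import re

_MAXCACHE = 100   # the limit mentioned earlier in the thread
_cache = {}

def cached_compile(pattern, flags=0):
    # Hypothetical stand-in for re._compile's two real arguments.
    key = (pattern, flags)
    try:
        return _cache[key]
    except KeyError:
        pass
    compiled = re.compile(pattern, flags)
    if len(_cache) >= _MAXCACHE:
        _cache.clear()   # the whole cache is dropped, not evicted LRU-style
    _cache[key] = compiled
    return compiled
```

[Repeated calls with the same pattern return the identical compiled object, so the per-call cost is just the dict lookup plus the Python function call.]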

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)