[Python-Dev] Revising RE docs
Guido van Rossum
guido at python.org
Sun Sep 4 05:10:31 CEST 2005
On 9/2/05, Gareth McCaughan <gmccaughan at synaptics-uk.com> wrote:
> On Thursday 2005-09-01 18:09, Guido van Rossum wrote:
>
> > They *are* cached and there is no cost to using the functions instead
> > of the methods unless you have so many regexps in your program that
> > the cache is cleared (the limit is 100).
>
> Sure there is; the cost of looking them up in the cache.
>
> >>> import re,timeit
>
> >>> timeit.re=re
> >>> timeit.Timer("""re.search(r"(\d*).*(\d*)", "abc123def456")""").timeit(1000000)
> 7.6042091846466064
>
> >>> timeit.r = re.compile(r"(\d*).*(\d*)")
> >>> timeit.Timer("""r.search("abc123def456")""").timeit(1000000)
> 2.6358869075775146
>
> >>> timeit.Timer().timeit(1000000)
> 0.091850996017456055
>
> So in this (highly artificial toy) application it's about 7.6/2.6 ≈ 3 times
> faster to use the methods instead of the functions.
Yeah, but the cost is a constant -- it is not related to the cost of
compiling the re. (You should also have shown how much it would have
cost had the compilation been included in each search.)
I haven't looked into this, but I bet the overhead you're measuring is
actually the extra Python function call, not the cache lookup itself.
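A quick way to check this is to time the two paths side by side. This is a sketch of my own (not from the thread); the variable names and iteration count are arbitrary:

```python
import re
import timeit

PATTERN = r"(\d*).*(\d*)"
TEXT = "abc123def456"
N = 100_000

# Module-level function: pays for the extra Python call and the cache lookup.
t_func = timeit.timeit(lambda: re.search(PATTERN, TEXT), number=N)

# Precompiled pattern object: skips both, going straight to the match engine.
compiled = re.compile(PATTERN)
t_method = timeit.timeit(lambda: compiled.search(TEXT), number=N)

print(f"re.search:      {t_func:.3f}s")
print(f"pattern.search: {t_method:.3f}s")
```

Whatever the absolute numbers, the gap between the two timings is a fixed per-call overhead, independent of how expensive the pattern is to compile.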
I also notice that _compile() is needlessly written as a varargs
function -- all its uses pass it exactly two arguments.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)