[Python-Dev] [Python-checkins] cpython: Issue #13165: stringbench is now available in the Tools/stringbench folder.

Terry Reedy tjreedy at udel.edu
Mon Apr 9 20:54:03 CEST 2012


Some comments...

On 4/9/2012 11:09 AM, antoine.pitrou wrote:
> http://hg.python.org/cpython/rev/704630a9c5d5
> changeset:   76179:704630a9c5d5
> user:        Antoine Pitrou<solipsis at pitrou.net>
> date:        Mon Apr 09 17:03:32 2012 +0200
> summary:
>    Issue #13165: stringbench is now available in the Tools/stringbench folder.
...

> diff --git a/Tools/stringbench/stringbench.py b/Tools/stringbench/stringbench.py
> new file mode 100755
> --- /dev/null
> +++ b/Tools/stringbench/stringbench.py
> @@ -0,0 +1,1483 @@
> +

Did you mean to start with a blank line?

> +# Various microbenchmarks comparing unicode and byte string performance
> +# Please keep this file both 2.x and 3.x compatible!

Which versions of 2.x? In particular

> +dups = {}

> +        dups[f.__name__] = 1

Is the use of a dict as a set a holdover that could be updated, or is it
intentional, for backward compatibility with 2.whatever and earlier?

> +# Try with regex
> +@uses_re
> +@bench('s="ABC"*33; re.compile(s+"D").search((s+"D")*300+s+"E")',
> +       "late match, 100 characters", 100)
> +def re_test_slow_match_100_characters(STR):
> +    m = STR("ABC"*33)
> +    d = STR("D")
> +    e = STR("E")
> +    s1 = (m+d)*300 + m+e
> +    s2 = m+e
> +    pat = re.compile(s2)
> +    search = pat.search
> +    for x in _RANGE_100:
> +        search(s1)

If regex is added to the stdlib as something other than a replacement for
re, we might want an option to use it instead of, or in addition to, the
current re.
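
One possible shape for such an option, as a sketch only: it assumes the third-party regex module from PyPI is what would be added, and that its compile/search calls match re's. regex_engines is a hypothetical helper, not something in stringbench.py.

```python
import re

try:
    # Third-party module from PyPI; assumed here to mirror re's compile/search API.
    import regex
except ImportError:
    regex = None


def regex_engines():
    """Return (name, module) pairs for every regex engine available to benchmark."""
    engines = [("re", re)]
    if regex is not None:
        engines.append(("regex", regex))
    return engines
```

The benchmark body could then loop over regex_engines() and call engine.compile(...) rather than re.compile(...) directly.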

> +#### Benchmark join
> +
> +def get_bytes_yielding_seq(STR, arg):
> +    if STR is BYTES and sys.version_info>= (3,):
> +        raise UnsupportedType
> +    return STR(arg)

> +@bench('"A".join("")',
> +       "join empty string, with 1 character sep", 100)

I am puzzled by this. Does str.join(iterable) internally branch on
whether the iterable is a str or not, so that these timings might
be different from equivalent timings with a list of strings?
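
One way to check would be to time the same characters passed as a str, a list and a tuple. The sketch below does that with timeit; compare_join_inputs is a hypothetical helper of mine, and the repeat counts are arbitrary.

```python
import timeit


def compare_join_inputs(sep, text, number=10000):
    """Time sep.join() over the same characters given as a str, a list and a tuple."""
    inputs = [("str", text), ("list", list(text)), ("tuple", tuple(text))]
    results = {}
    for label, seq in inputs:
        # All three must build the same string, or the timings are not comparable.
        assert sep.join(seq) == sep.join(text)
        # Best of three runs; each run calls join() `number` times.
        results[label] = min(
            timeit.repeat(lambda: sep.join(seq), number=number, repeat=3))
    return results
```

Calling compare_join_inputs("A", "") would cover the empty-string case quoted above, and compare_join_inputs("A", "ABC" * 33) a longer one.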

What might be interesting, especially for 3.3, is timing with non-ascii 
BMP and non-BMP chars both as joiner and joined.
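
Something along these lines is what I have in mind, as a sketch rather than a patch. It assumes Python 3 string literals, so it would need adapting for the 2.x-compatible file; SAMPLES and time_join_matrix are hypothetical names, and the sample characters are just one choice per range.

```python
import itertools
import timeit

# One sample character per range of interest.
SAMPLES = {
    "ascii": "A",
    "bmp": "\u20ac",          # EURO SIGN: non-ASCII, inside the BMP
    "non-bmp": "\U00010400",  # DESERET CAPITAL LETTER LONG I: outside the BMP
}


def time_join_matrix(count=100, number=1000):
    """Time every joiner/joined combination; returns {(joiner, joined): seconds}."""
    timings = {}
    pairs = itertools.product(SAMPLES.items(), repeat=2)
    for (sep_kind, sep), (item_kind, item) in pairs:
        items = [item] * count
        # Best of three runs; each run calls join() `number` times.
        timings[sep_kind, item_kind] = min(
            timeit.repeat(lambda: sep.join(items), number=number, repeat=3))
    return timings
```

That gives nine timings, one per joiner/joined pair, so the mixed cases (for example an ASCII joiner with non-BMP items) can be compared against the pure-ASCII baseline.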


tjr

