[Python-Dev] [Python-checkins] cpython: Issue #13165: stringbench is now available in the Tools/stringbench folder.
Terry Reedy
tjreedy at udel.edu
Mon Apr 9 20:54:03 CEST 2012
Some comments...
On 4/9/2012 11:09 AM, antoine.pitrou wrote:
> http://hg.python.org/cpython/rev/704630a9c5d5
> changeset: 76179:704630a9c5d5
> user: Antoine Pitrou<solipsis at pitrou.net>
> date: Mon Apr 09 17:03:32 2012 +0200
> summary:
> Issue #13165: stringbench is now available in the Tools/stringbench folder.
...
> diff --git a/Tools/stringbench/stringbench.py b/Tools/stringbench/stringbench.py
> new file mode 100755
> --- /dev/null
> +++ b/Tools/stringbench/stringbench.py
> @@ -0,0 +1,1483 @@
> +
Did you mean to start with a blank line?
> +# Various microbenchmarks comparing unicode and byte string performance
> +# Please keep this file both 2.x and 3.x compatible!
Which versions of 2.x? In particular:
> +dups = {}
> + dups[f.__name__] = 1
Is the use of a dict for a set a holdover that could be updated, or
intentional for back compatibility with 2.whatever and before?
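For comparison, here is a minimal sketch of the duplicate-name check written both ways. It assumes `dups` is only used for a membership test, which the quoted lines don't show. The `register_with_*` helpers are hypothetical. The built-in `set` type has existed since Python 2.4, so a dict would only be needed for 2.3 and earlier.

```python
# Hypothetical registration helpers illustrating the two idioms;
# assumes `dups` only exists to detect duplicate benchmark names.

# Old idiom: a dict whose values are ignored (works before Python 2.4).
dups_dict = {}

def register_with_dict(f):
    if f.__name__ in dups_dict:
        raise AssertionError("Multiple functions with same name: %r" % (f.__name__,))
    dups_dict[f.__name__] = 1
    return f

# Set idiom: the built-in set type is available from Python 2.4 onward.
dups_set = set()

def register_with_set(f):
    if f.__name__ in dups_set:
        raise AssertionError("Multiple functions with same name: %r" % (f.__name__,))
    dups_set.add(f.__name__)
    return f


@register_with_set
def example_benchmark(STR):
    pass
```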
> +# Try with regex
> +@uses_re
> +@bench('s="ABC"*33; re.compile(s+"D").search((s+"D")*300+s+"E")',
> + "late match, 100 characters", 100)
> +def re_test_slow_match_100_characters(STR):
> + m = STR("ABC"*33)
> + d = STR("D")
> + e = STR("E")
> + s1 = (m+d)*300 + m+e
> + s2 = m+e
> + pat = re.compile(s2)
> + search = pat.search
> + for x in _RANGE_100:
> + search(s1)
If the regex module is added to the stdlib as something other than a
replacement for re, we might want an option to benchmark it instead of,
or in addition to, the current re.
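A rough sketch of what such an option could look like, assuming the engine name comes from a command-line flag (the `load_regex_engine` helper is hypothetical, not part of stringbench). It relies only on the third-party `regex` module exposing `compile()` and `search()` like `re`, and it silently skips engines that aren't installed.

```python
import importlib


def load_regex_engine(name="re"):
    """Return the module to benchmark, or None if it isn't installed.

    Hypothetical helper: `name` would come from a command-line option.
    """
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def slow_match_100_characters(engine):
    # Same pattern and subject as the quoted re_test_slow_match_100_characters.
    m, d, e = "ABC" * 33, "D", "E"
    s1 = (m + d) * 300 + m + e
    pat = engine.compile(m + e)
    return pat.search(s1)


# Benchmark every requested engine that is actually importable.
engines = [mod for mod in map(load_regex_engine, ["re", "regex"]) if mod is not None]
for engine in engines:
    match = slow_match_100_characters(engine)
    print(engine.__name__, match.start())
```

With this design, the benchmark bodies take the engine as a parameter instead of hard-coding `re`, so adding a third engine means only extending the list of names.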
> +#### Benchmark join
> +
> +def get_bytes_yielding_seq(STR, arg):
> + if STR is BYTES and sys.version_info>= (3,):
> + raise UnsupportedType
> + return STR(arg)
> +@bench('"A".join("")',
> + "join empty string, with 1 character sep", 100)
I am puzzled by this. Does str.join(iterable) internally branch on
whether the iterable is a str, so that these timings might differ
from equivalent timings with a list of strings?
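One way to probe the question empirically is sketched below using plain `timeit` rather than the stringbench harness. The string size and repeat count are arbitrary choices, and no particular timings are claimed, since they depend on the interpreter and build. The last part shows why the quoted `get_bytes_yielding_seq` presumably refuses BYTES on 3.x: iterating a bytes object there yields ints, which `bytes.join()` rejects.

```python
import sys
import timeit

# Same separator and contents, passed once as a str (iterated char by char)
# and once as a pre-built list of one-character strings.
subject = "ABC" * 100
as_list = list(subject)

assert "A".join(subject) == "A".join(as_list)

t_str = timeit.timeit(lambda: "A".join(subject), number=10000)
t_list = timeit.timeit(lambda: "A".join(as_list), number=10000)
print("join(str):  %.4f s" % t_str)
print("join(list): %.4f s" % t_list)

# On 3.x, iterating bytes yields ints, so joining over a bytes object fails.
if sys.version_info >= (3,):
    try:
        b"A".join(b"ABC")
    except TypeError:
        print("bytes.join(bytes) raises TypeError on 3.x")
```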
What might be interesting, especially for 3.3, is timing with non-ASCII
BMP and non-BMP characters, both as the separator and as the joined strings.
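A minimal sketch of that matrix, for Python 3.3+ only. Under PEP 393, a str is stored with 1, 2, or 4 bytes per character depending on its widest character, so joining pieces of different widths may force the result into the wider representation. The sample characters, list length, and repeat count below are arbitrary choices for illustration.

```python
import timeit

# One sample character per storage width under PEP 393 (Python 3.3+):
# ASCII (1 byte/char), non-ASCII BMP (2 bytes/char), non-BMP (4 bytes/char).
SAMPLES = {
    "ascii": "A",
    "bmp": "\u20ac",          # EURO SIGN
    "non-bmp": "\U0001F600",  # a character outside the BMP
}


def time_join(sep, item, n=1000, number=1000):
    # Time sep.join() over a list of n copies of item.
    items = [item] * n
    return timeit.timeit(lambda: sep.join(items), number=number)


# Every combination of separator width and item width.
results = {}
for sep_kind, sep in SAMPLES.items():
    for item_kind, item in SAMPLES.items():
        results[(sep_kind, item_kind)] = time_join(sep, item)

for (sep_kind, item_kind), seconds in sorted(results.items()):
    print("sep=%-7s items=%-7s %.4f s" % (sep_kind, item_kind, seconds))
```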
tjr