String concatenation benchmarking weirdness

Terry Reedy tjreedy at udel.edu
Sat Jan 12 12:31:09 CET 2013


On 1/12/2013 3:38 AM, wxjmfauth at gmail.com wrote:
> from timeit import timeit, repeat
>
> size = 1000
>
> r = repeat("y = x + 'a'", setup = "x = 'a' * %i" % size)
> print('1:', r)
> r = repeat("y = x + 'é'", setup = "x = 'a' * %i" % size)
> print('2:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'a' * %i" % size)
> print('3:', r)
> r = repeat("y = x + '€'", setup = "x = 'a' * %i" % size)
> print('4:', r)
> r = repeat("y = x + '€'", setup = "x = '€' * %i" % size)
> print('5:', r)
> r = repeat("y = x + 'œ'", setup = "x = 'œ' * %i" % size)
> print('6:', r)
> r = repeat("y = é + 'œ'", setup = "é = 'œ' * %i" % size)
> print('7:', r)
> r = repeat("y = é + 'œ'", setup = "é = '€' * %i" % size)
> print('8:', r)
>
>
>
>> c:\python32\pythonw -u "vitesse3.py"
> 1: [0.3603178435286996, 0.42901157137281515, 0.35459694357592086]
> 2: [0.3576409223543202, 0.4272010951864649, 0.3590055732104662]
> 3: [0.3552022735516487, 0.4256544908828328, 0.35824546465278573]
> 4: [0.35488168890607774, 0.4271707696118834, 0.36109528098614074]
> 5: [0.3560675370237849, 0.4261538782668417, 0.36138160167082134]
> 6: [0.3570182634788317, 0.4270155971913008, 0.35770629956705324]
> 7: [0.3556977225493485, 0.4264969117143753, 0.3645634239700426]
> 8: [0.35511247834379844, 0.4259628665308437, 0.3580737510097034]
>> Exit code: 0
>> c:\Python33\pythonw -u "vitesse3.py"
> 1: [0.3053600256152646, 0.3306491917840535, 0.3044963374976518]
> 2: [0.36252767208680514, 0.36937298133086727, 0.3685573415262271]
> 3: [0.7666293438924097, 0.7653473991487574, 0.7630926729867262]
> 4: [0.7636680712265038, 0.7647586103955284, 0.7631395397838059]
> 5: [0.44721085450773934, 0.3863234021671369, 0.45664368355696094]
> 6: [0.44699700013114807, 0.3873974001136613, 0.45167383387335036]
> 7: [0.4465200615491014, 0.387050034441188, 0.45459690419205856]
> 8: [0.44760587465455437, 0.3875261853459726, 0.45421212384964704]
>> Exit code: 0
>
>
> The difference between a correct (coherent) unicode handling and ...

By 'correct' Jim means 'speedy' for a subset of string operations*,
rather than 'accurate'. In 3.2 and before, CPython does not handle 
extended plane characters correctly on Windows and other narrow builds. 
This is, by the way, true of many other languages. For instance, Tcl 8.5 
and before (not sure about the new 8.6) does not handle them at all. The 
same is true of Microsoft command windows.
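
To make the narrow-build problem concrete, here is a minimal sketch (not
part of the original benchmark). It checks sys.maxunicode and the length
of a single character from outside the Basic Multilingual Plane. The
helper name describe_build is made up for illustration.

```python
import sys

def describe_build():
    """Return 'narrow' or 'wide' for the running interpreter (hypothetical helper)."""
    # Narrow builds (CPython <= 3.2 on Windows, for example) report 0xFFFF;
    # wide builds and all of CPython >= 3.3 report 0x10FFFF.
    return "narrow" if sys.maxunicode == 0xFFFF else "wide"

# One character outside the Basic Multilingual Plane (U+1F600).
astral = "\U0001F600"

# UTF-16 needs two 16-bit code units (a surrogate pair) for this character.
utf16_units = len(astral.encode("utf-16-le")) // 2

print(describe_build())   # build type
print(len(astral))        # 1 on wide builds and 3.3+, 2 on narrow builds
print(utf16_units)        # 2
```

On a narrow build the string is stored as that surrogate pair, so len(),
indexing and slicing all see two items where there is one character.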

* Let's try another comparison:

from timeit import timeit
print(timeit("a.encode()", "a = 'a'*10000"))

3.2: 12.1 seconds
3.3:  0.7 seconds

3.3 is more than 15 times faster! (The factor increases with the length of a.)
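
To see how the timing depends on the length of a, something like the
following sketch can be run under each interpreter. The helper name
encode_time and the chosen lengths are my own. It uses number=1000 rather
than timeit's default of one million calls, so the absolute figures are
not comparable with the ones above.

```python
from timeit import timeit

def encode_time(length, number=1000):
    """Seconds taken by `number` calls of a.encode() on an ASCII string of `length` chars."""
    return timeit("a.encode()", "a = 'a' * %d" % length, number=number)

# Run this under 3.2 and under 3.3 and compare the figures line by line.
for length in (100, 1000, 10000, 100000):
    print(length, encode_time(length))
```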

A fairer comparison is the approximately 120 microbenchmarks in
Tools/stringbench/stringbench.py. Here are the results, uncensored, for
3.3.0 and 3.2.3. The script is in the Tools directory of some
distributions but not all (the Windows distribution, for one, lacks it).
It can be downloaded from
http://hg.python.org/cpython/file/6fe28afa6611/Tools/stringbench

In Firefox, right-click the stringbench.py link and choose 'Save Link
As...' to save it somewhere you can run it from.

 >>>
stringbench v2.0
3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:57:17) [MSC v.1600 64 bit (AMD64)]
2013-01-12 06:17:51.685781
bytes	unicode
(in ms)	(in ms)	%	comment
========== case conversion -- dense
0.41	0.43	95.2	("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.42	0.43	95.8	("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.41	0.43	95.8	("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.42	0.43	96.3	("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.83	1.95	94.1	s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.10	0.10	98.7	"Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.46	2.44	100.9	dna.count("AACT") (*10)
========== count newlines
0.77	0.75	103.6	...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.30	0.27	110.5	("A"*1000).find("A") (*1000)
0.45	0.06	750.5	"A" in "A"*1000 (*1000)
0.30	0.27	110.4	("A"*1000).index("A") (*1000)
0.24	0.22	107.2	("A"*1000).partition("A") (*1000)
0.33	0.29	116.6	("A"*1000).rfind("A") (*1000)
0.32	0.29	107.9	("A"*1000).rindex("A") (*1000)
0.20	0.21	94.1	("A"*1000).rpartition("A") (*1000)
0.42	0.45	93.4	("A"*1000).rsplit("A", 1) (*1000)
0.39	0.41	95.9	("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.32	0.27	121.1	("AB"*1000).find("AB") (*1000)
0.45	0.06	729.5	"AB" in "AB"*1000 (*1000)
0.30	0.27	111.2	("AB"*1000).index("AB") (*1000)
0.23	0.28	85.0	("AB"*1000).partition("AB") (*1000)
0.33	0.30	110.6	("AB"*1000).rfind("AB") (*1000)
0.33	0.30	110.5	("AB"*1000).rindex("AB") (*1000)
0.22	0.27	83.1	("AB"*1000).rpartition("AB") (*1000)
0.46	0.47	96.7	("AB"*1000).rsplit("AB", 1) (*1000)
0.44	0.48	90.9	("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.24	0.29	84.0	"Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.26	0.28	92.9	"Andrew".endswith("Anders") (*1000)
========== endswith single character
0.25	0.28	90.0	"Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A	0.67	0.0	"The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A	0.06	0.0	"A".join("") (*100)
========== join empty string, with 5 character sep
N/A	0.06	0.0	"ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
0.87	1.27	68.8	"A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.14	1.54	74.0	"ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.27	0.37	72.0	"A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.32	0.43	75.7	"ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A	1.30	0.0	"A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A	1.37	0.0	"ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
3.25	3.23	100.5	s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.79	2.78	100.4	s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.98	1.94	102.3	s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
3.24	3.23	100.3	s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
4.26	3.62	117.7	s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23	3.23	100.1	s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.32	2.32	100.1	s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23	3.21	100.8	s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58	3.57	100.4	s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.60	3.60	100.0	s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
3.60	3.56	101.2	s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.62	0.58	106.3	("AB"*300+"C").find("BC") (*1000)
0.92	0.82	111.8	("AB"*300+"CA").find("CA") (*1000)
0.73	0.33	218.8	"BC" in ("AB"*300+"C") (*1000)
0.61	0.60	101.0	("AB"*300+"C").index("BC") (*1000)
0.54	0.82	66.4	("AB"*300+"C").partition("BC") (*1000)
0.66	0.63	104.6	("C"+"AB"*300).rfind("CA") (*1000)
0.91	0.88	102.3	("BC"+"AB"*300).rfind("BC") (*1000)
0.65	0.62	105.1	("C"+"AB"*300).rindex("CA") (*1000)
0.53	0.56	94.5	("C"+"AB"*300).rpartition("CA") (*1000)
0.75	0.77	96.6	("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65	0.67	97.0	("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.89	0.87	102.3	("A"*1000).find("B") (*1000)
1.03	0.64	159.1	"B" in "A"*1000 (*1000)
0.67	0.68	98.7	("A"*1000).partition("B") (*1000)
0.87	0.85	102.8	("A"*1000).rfind("B") (*1000)
0.67	0.68	98.5	("A"*1000).rpartition("B") (*1000)
0.87	0.87	99.2	("A"*1000).rsplit("B", 1) (*1000)
0.86	0.85	101.5	("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.22	1.16	104.9	("AB"*1000).find("BC") (*1000)
1.93	2.02	95.2	("AB"*1000).find("CA") (*1000)
1.37	0.94	145.3	"BC" in "AB"*1000 (*1000)
1.39	2.14	65.1	("AB"*1000).partition("BC") (*1000)
2.32	2.31	100.7	("AB"*1000).rfind("BC") (*1000)
1.47	1.44	102.1	("AB"*1000).rfind("CA") (*1000)
2.26	2.27	99.7	("AB"*1000).rpartition("BC") (*1000)
2.46	2.45	100.2	("AB"*1000).rsplit("BC", 1) (*1000)
1.15	1.16	99.1	("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.13	0.12	105.0	("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12	0.12	105.2	("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08	0.10	80.6	"A"*10 (*1000)
========== repeat 1 character 1000 times
0.16	0.18	93.1	"A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11	0.13	84.4	"ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.39	0.41	94.8	"ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
2.02	2.36	85.6	"...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
3.12	3.23	96.6	dna.replace("ATC", "ATT") (*10)
========== replace single character
0.33	0.40	82.4	"This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.75	0.86	87.4	"...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.41	0.48	86.1	"When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.14	0.18	79.3	("Here are some words. "*2).partition(" ") (*1000)
0.11	0.14	75.1	("Here are some words. "*2).rpartition(" ") (*1000)
0.35	0.39	90.3	("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32	0.38	83.9	("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.74	2.02	86.3	"...text...".rsplit("\n") (*10)
1.69	1.97	85.5	"...text...".split("\n") (*10)
1.89	2.55	74.0	"...text...".splitlines() (*10)
========== split newlines
0.35	0.39	88.9	"this\nis\na\ntest\n".rsplit("\n") (*1000)
0.34	0.40	86.4	"this\nis\na\ntest\n".split("\n") (*1000)
0.32	0.40	80.7	"this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.28	2.30	99.1	dna.rsplit("ACTAT") (*10)
2.63	2.66	98.9	dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.55	0.69	79.0	"this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.58	0.70	82.9	"this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.51	2.12	71.4	human_text.rsplit() (*10)
1.51	2.05	73.6	human_text.split() (*10)
========== split whitespace (small)
0.48	0.68	70.1	("Here are some words. "*2).rsplit() (*1000)
0.48	0.64	74.9	("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.24	0.25	95.9	"Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.24	0.25	95.7	"Andrew".startswith("Anders") (*1000)
========== startswith single character
0.23	0.25	95.4	"Andrew".startswith("A") (*1000)
========== strip terminal newline
0.09	0.21	44.1	s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.09	0.12	74.0	"\nHello!".rstrip() (*1000)
0.09	0.12	74.0	"Hello!\n".rstrip() (*1000)
0.09	0.12	71.6	"\nHello!\n".strip() (*1000)
0.09	0.12	73.2	"\nHello!".strip() (*1000)
0.09	0.12	72.9	"Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.09	0.13	69.6	"\t   \tHello".rstrip() (*1000)
0.09	0.13	72.3	"Hello\t   \t".rstrip() (*1000)
0.07	0.08	86.8	"Hello\t   \t".strip() (*1000)
========== tab split
0.59	0.65	90.9	GFF3_example.rsplit("\t", 8) (*1000)
0.55	0.59	94.2	GFF3_example.rsplit("\t") (*1000)
0.52	0.57	90.7	GFF3_example.split("\t", 8) (*1000)
0.52	0.57	90.1	GFF3_example.split("\t") (*1000)
108.87	116.31	93.6	TOTAL
 >>>
stringbench v2.0
3.2.3 (default, Apr 11 2012, 07:12:16) [MSC v.1500 64 bit (AMD64)]
2013-01-12 06:23:05.994000
bytes	unicode
(in ms)	(in ms)	%	comment
========== case conversion -- dense
0.63	3.01	21.0	("WHERE IN THE WORLD IS CARMEN SAN DEIGO?"*10).lower() (*1000)
0.63	2.90	21.5	("where in the world is carmen san deigo?"*10).upper() (*1000)
========== case conversion -- rare
0.84	2.83	29.8	("Where in the world is Carmen San Deigo?"*10).lower() (*1000)
0.50	3.47	14.3	("wHERE IN THE WORLD IS cARMEN sAN dEIGO?"*10).upper() (*1000)
========== concat 20 strings of words length 4 to 15
1.82	1.75	103.9	s1+s2+s3+s4+...+s20 (*1000)
========== concat two strings
0.09	0.08	115.5	"Andrew"+"Dalke" (*1000)
========== count AACT substrings in DNA example
2.40	2.64	91.1	dna.count("AACT") (*10)
========== count newlines
0.77	0.75	101.6	...text.with.2000.newlines.count("\n") (*10)
========== early match, single character
0.19	0.18	101.9	("A"*1000).find("A") (*1000)
0.39	0.05	824.7	"A" in "A"*1000 (*1000)
0.19	0.19	96.3	("A"*1000).index("A") (*1000)
0.20	0.22	87.5	("A"*1000).partition("A") (*1000)
0.20	0.20	101.8	("A"*1000).rfind("A") (*1000)
0.20	0.20	101.2	("A"*1000).rindex("A") (*1000)
0.18	0.22	82.5	("A"*1000).rpartition("A") (*1000)
0.41	0.45	91.7	("A"*1000).rsplit("A", 1) (*1000)
0.42	0.43	99.0	("A"*1000).split("A", 1) (*1000)
========== early match, two characters
0.19	0.19	102.3	("AB"*1000).find("AB") (*1000)
0.39	0.05	781.6	"AB" in "AB"*1000 (*1000)
0.19	0.20	97.9	("AB"*1000).index("AB") (*1000)
0.23	0.33	71.1	("AB"*1000).partition("AB") (*1000)
0.20	0.20	101.6	("AB"*1000).rfind("AB") (*1000)
0.20	0.20	100.1	("AB"*1000).rindex("AB") (*1000)
0.22	0.31	70.4	("AB"*1000).rpartition("AB") (*1000)
0.47	0.53	90.0	("AB"*1000).rsplit("AB", 1) (*1000)
0.45	0.52	85.0	("AB"*1000).split("AB", 1) (*1000)
========== endswith multiple characters
0.18	0.18	97.6	"Andrew".endswith("Andrew") (*1000)
========== endswith multiple characters - not!
0.18	0.18	100.4	"Andrew".endswith("Anders") (*1000)
========== endswith single character
0.18	0.18	97.1	"Andrew".endswith("w") (*1000)
========== formatting a string type with a dict
N/A	0.53	0.0	"The %(k1)s is %(k2)s the %(k3)s."%{"k1":"x","k2":"y","k3":"z",} (*1000)
========== join empty string, with 1 character sep
N/A	0.05	0.0	"A".join("") (*100)
========== join empty string, with 5 character sep
N/A	0.05	0.0	"ABCDE".join("") (*100)
========== join list of 100 words, with 1 character sep
1.02	1.02	99.6	"A".join(["Bob"]*100)) (*1000)
========== join list of 100 words, with 5 character sep
1.25	1.48	84.4	"ABCDE".join(["Bob"]*100)) (*1000)
========== join list of 26 characters, with 1 character sep
0.31	0.25	122.9	"A".join(list("ABC..Z")) (*1000)
========== join list of 26 characters, with 5 character sep
0.36	0.41	88.4	"ABCDE".join(list("ABC..Z")) (*1000)
========== join string with 26 characters, with 1 character sep
N/A	1.06	0.0	"A".join("ABC..Z") (*1000)
========== join string with 26 characters, with 5 character sep
N/A	1.22	0.0	"ABCDE".join("ABC..Z") (*1000)
========== late match, 100 characters
2.52	2.68	94.0	s="ABC"*33; ((s+"D")*500+s+"E").find(s+"E") (*100)
2.35	3.06	76.9	s="ABC"*33; ((s+"D")*500+"E"+s).find("E"+s) (*100)
1.55	1.61	96.2	s="ABC"*33; (s+"E") in ((s+"D")*300+s+"E") (*100)
2.51	2.68	94.0	s="ABC"*33; ((s+"D")*500+s+"E").index(s+"E") (*100)
3.57	4.66	76.7	s="ABC"*33; ((s+"D")*500+s+"E").partition(s+"E") (*100)
3.23	3.24	99.8	s="ABC"*33; ("E"+s+("D"+s)*500).rfind("E"+s) (*100)
2.35	2.56	91.7	s="ABC"*33; (s+"E"+("D"+s)*500).rfind(s+"E") (*100)
3.23	3.24	99.8	s="ABC"*33; ("E"+s+("D"+s)*500).rindex("E"+s) (*100)
3.58	3.92	91.4	s="ABC"*33; ("E"+s+("D"+s)*500).rpartition("E"+s) (*100)
3.62	3.96	91.4	s="ABC"*33; ("E"+s+("D"+s)*500).rsplit("E"+s, 1) (*100)
2.89	3.38	85.4	s="ABC"*33; ((s+"D")*500+s+"E").split(s+"E", 1) (*100)
========== late match, two characters
0.52	0.52	99.5	("AB"*300+"C").find("BC") (*1000)
0.69	0.90	76.5	("AB"*300+"CA").find("CA") (*1000)
0.67	0.37	179.2	"BC" in ("AB"*300+"C") (*1000)
0.51	0.53	96.8	("AB"*300+"C").index("BC") (*1000)
0.48	0.81	59.3	("AB"*300+"C").partition("BC") (*1000)
0.55	0.55	101.5	("C"+"AB"*300).rfind("CA") (*1000)
0.85	0.85	100.0	("BC"+"AB"*300).rfind("BC") (*1000)
0.55	0.55	100.3	("C"+"AB"*300).rindex("CA") (*1000)
0.52	0.60	87.1	("C"+"AB"*300).rpartition("CA") (*1000)
0.78	0.82	95.4	("C"+"AB"*300).rsplit("CA", 1) (*1000)
0.65	0.72	91.2	("AB"*300+"C").split("BC", 1) (*1000)
========== no match, single character
0.77	0.77	100.6	("A"*1000).find("B") (*1000)
0.98	0.63	155.1	"B" in "A"*1000 (*1000)
0.66	0.66	99.7	("A"*1000).partition("B") (*1000)
0.77	0.77	100.4	("A"*1000).rfind("B") (*1000)
0.66	0.66	99.7	("A"*1000).rpartition("B") (*1000)
0.88	0.88	100.4	("A"*1000).rsplit("B", 1) (*1000)
0.88	0.87	101.2	("A"*1000).split("B", 1) (*1000)
========== no match, two characters
1.19	1.21	98.1	("AB"*1000).find("BC") (*1000)
1.79	2.51	71.2	("AB"*1000).find("CA") (*1000)
1.28	1.08	119.1	"BC" in "AB"*1000 (*1000)
1.10	2.11	52.1	("AB"*1000).partition("BC") (*1000)
2.37	2.37	100.0	("AB"*1000).rfind("BC") (*1000)
1.36	1.36	100.5	("AB"*1000).rfind("CA") (*1000)
2.25	2.26	99.9	("AB"*1000).rpartition("BC") (*1000)
2.38	2.62	90.7	("AB"*1000).rsplit("BC", 1) (*1000)
1.18	1.30	90.1	("AB"*1000).split("BC", 1) (*1000)
========== quick replace multiple character match
0.12	0.32	37.1	("A" + ("Z"*128*1024)).replace("AZZ", "BBZZ", 1) (*10)
========== quick replace single character match
0.12	0.30	37.9	("A" + ("Z"*128*1024)).replace("A", "BB", 1) (*10)
========== repeat 1 character 10 times
0.08	0.09	90.3	"A"*10 (*1000)
========== repeat 1 character 1000 times
0.16	0.19	82.2	"A"*1000 (*1000)
========== repeat 5 characters 10 times
0.11	0.12	98.3	"ABCDE"*10 (*1000)
========== repeat 5 characters 1000 times
0.40	0.58	67.9	"ABCDE"*1000 (*1000)
========== replace and expand multiple characters, big string
1.95	2.13	91.7	"...text.with.2000.newlines...replace("\n", "\r\n") (*10)
========== replace multiple characters, dna
2.93	3.25	90.3	dna.replace("ATC", "ATT") (*10)
========== replace single character
0.25	0.26	96.6	"This is a test".replace(" ", "\t") (*1000)
========== replace single character, big string
0.73	1.01	72.0	"...text.with.2000.lines...replace("\n", " ") (*10)
========== replace/remove multiple characters
0.30	0.34	89.0	"When shall we three meet again?".replace("ee", "") (*1000)
========== split 1 whitespace
0.12	0.13	93.3	("Here are some words. "*2).partition(" ") (*1000)
0.11	0.11	98.8	("Here are some words. "*2).rpartition(" ") (*1000)
0.32	0.37	86.5	("Here are some words. "*2).rsplit(None, 1) (*1000)
0.32	0.33	96.9	("Here are some words. "*2).split(None, 1) (*1000)
========== split 2000 newlines
1.76	2.19	80.5	"...text...".rsplit("\n") (*10)
1.72	2.10	81.9	"...text...".split("\n") (*10)
1.87	2.58	72.4	"...text...".splitlines() (*10)
========== split newlines
0.36	0.34	103.9	"this\nis\na\ntest\n".rsplit("\n") (*1000)
0.35	0.33	105.9	"this\nis\na\ntest\n".split("\n") (*1000)
0.31	0.34	89.7	"this\nis\na\ntest\n".splitlines() (*1000)
========== split on multicharacter separator (dna)
2.18	2.34	93.4	dna.rsplit("ACTAT") (*10)
2.50	2.64	94.5	dna.split("ACTAT") (*10)
========== split on multicharacter separator (small)
0.59	0.62	95.3	"this--is--a--test--of--the--emergency--broadcast--system".rsplit("--") (*1000)
0.55	0.59	93.1	"this--is--a--test--of--the--emergency--broadcast--system".split("--") (*1000)
========== split whitespace (huge)
1.54	2.34	65.5	human_text.rsplit() (*10)
1.51	2.22	68.3	human_text.split() (*10)
========== split whitespace (small)
0.46	0.60	76.5	("Here are some words. "*2).rsplit() (*1000)
0.45	0.51	87.6	("Here are some words. "*2).split() (*1000)
========== startswith multiple characters
0.18	0.18	97.3	"Andrew".startswith("Andrew") (*1000)
========== startswith multiple characters - not!
0.18	0.18	100.1	"Andrew".startswith("Anders") (*1000)
========== startswith single character
0.17	0.18	96.8	"Andrew".startswith("A") (*1000)
========== strip terminal newline
0.11	0.21	52.0	s="Hello!\n"; s[:-1] if s[-1]=="\n" else s (*1000)
0.06	0.07	92.1	"\nHello!".rstrip() (*1000)
0.06	0.07	92.2	"Hello!\n".rstrip() (*1000)
0.06	0.07	91.2	"\nHello!\n".strip() (*1000)
0.06	0.07	91.1	"\nHello!".strip() (*1000)
0.06	0.07	91.1	"Hello!\n".strip() (*1000)
========== strip terminal spaces and tabs
0.07	0.07	89.4	"\t   \tHello".rstrip() (*1000)
0.07	0.07	91.4	"Hello\t   \t".rstrip() (*1000)
0.04	0.05	88.7	"Hello\t   \t".strip() (*1000)
========== tab split
0.57	0.56	100.8	GFF3_example.rsplit("\t", 8) (*1000)
0.53	0.53	100.7	GFF3_example.rsplit("\t") (*1000)
0.49	0.49	101.2	GFF3_example.split("\t", 8) (*1000)
0.51	0.49	103.5	GFF3_example.split("\t") (*1000)
102.13	125.57	81.3	TOTAL
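
For anyone who wants to compare the two runs programmatically, here is a
rough sketch that parses result rows and checks the TOTAL lines. It
assumes the rows are tab-separated as pasted above. The helper names
parse_row and to_ms are made up for illustration and are not part of
stringbench. In these listings the % column matches the bytes time
expressed as a percentage of the unicode time.

```python
def to_ms(field):
    """Convert a timing field to float, mapping 'N/A' to None."""
    return None if field == "N/A" else float(field)

def parse_row(line):
    """Split one result row into (bytes_ms, unicode_ms, percent, comment)."""
    bytes_ms, unicode_ms, percent, comment = line.split("\t", 3)
    return to_ms(bytes_ms), to_ms(unicode_ms), float(percent), comment

# TOTAL rows copied from the two runs above.
total_33 = parse_row("108.87\t116.31\t93.6\tTOTAL")
total_32 = parse_row("102.13\t125.57\t81.3\tTOTAL")

# Recompute the % column from the two timings and show it next to the
# reported value.
for bytes_ms, unicode_ms, percent, _ in (total_33, total_32):
    print(round(100 * bytes_ms / unicode_ms, 1), percent)

# Ratio of the 3.2 unicode total to the 3.3 unicode total; a value above 1
# means 3.3 took less time overall on the unicode column.
print(round(total_32[1] / total_33[1], 2))   # 1.08
```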

-- 
Terry Jan Reedy




