String performance regression from python 3.2 to 3.3
Chris Angelico
rosuav at gmail.com
Thu Mar 14 02:47:12 EDT 2013
On Thu, Mar 14, 2013 at 3:05 PM, Steven D'Aprano
<steve+comp.lang.python at pearwood.info> wrote:
> That depends on how you use the strings. Because strings are immutable,
> there isn't really anything like "switching between widths" -- the width
> is set when the string is created, and then remains fixed.
The nearest thing to "switching" is where you repeatedly replace() or
append/slice to add/remove the one non-ASCII character that your
contrived test is using. Let's see...
Python 3.3.0 (v3.3.0:bd8afb90ebf2, Sep 29 2012, 10:55:48) [MSC v.1600
32 bit (Intel)] on win32
ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+'\u0034'","s='asdf'*10000",number=10000)
0.14999895238081962
ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='asdf'*10000",number=10000)
1.7513426985832012
BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+'\u1234'","s='\u1234sdf'*10000",number=10000)
0.22562895563542895
ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='asdf'*10000",number=10000)
1.9037101084076369
BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\u1234sdf'*10000",number=10000)
1.9659967956821163
SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+'\U00012345'","s='\U00012345sdf'*10000",number=10000)
0.7214749360603037
So there *is* a cost to "changing size". Trying them again on a Python 2.6 narrow build:
Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit
(Intel)] on win32
ASCII -> ASCII:
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
0.53506213778566547
ASCII -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
0.57752172412974268
BMP -> BMP:
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
0.53309121690045913
ASCII -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
0.55128347317885584
BMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
0.55610140394938412
SMP -> SMP:
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
0.6599570615818493
Much more consistent. (Note that the SMP timings are quite probably a
bit off as the string will continue to grow - I'm taking off one
16-bit character and putting on two.)
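To make that growth concrete, here is a rough sketch that models a narrow-build string as a list of 16-bit code units. It runs on a current Python 3; the helper name to_units and the short starting string are made up for illustration, and the list slicing only imitates what s[:-1] did on a narrow build.

```python
import struct

def to_units(s):
    """Return the UTF-16 code units of s, i.e. what a narrow build stored."""
    data = s.encode('utf-16-le')
    return list(struct.unpack('<%dH' % (len(data) // 2), data))

# An SMP character needs a surrogate pair: two 16-bit code units.
SMP = to_units('\U00012345')

units = to_units('asdf' * 3)
lengths = [len(units)]
for _ in range(5):
    # Imitates s = s[:-1] + u'\U00012345' on a narrow build:
    # one code unit comes off the end, two go on.
    units = units[:-1] + SMP
    lengths.append(len(units))

print(lengths)
```

Each pass leaves the string one code unit longer, so over 10000 iterations the narrow-build SMP cases are copying a steadily growing string.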
I don't have a 2.6 wide build on the same hardware, so the times below
don't truly compare to the ones above; they come from a slower machine.
Python 2.6.6 (r266:84292, Sep 15 2010, 15:52:39)
[GCC 4.4.5] on linux2
>>> timeit.timeit("s=s[:-1]+u'\u0034'","s=u'asdf'*10000",number=10000)
1.5774970054626465
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'asdf'*10000",number=10000)
1.5743560791015625
>>> timeit.timeit("s=s[:-1]+u'\u1234'","s=u'\u1234sdf'*10000",number=10000)
1.6072981357574463
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'asdf'*10000",number=10000)
1.6745591163635254
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\u1234sdf'*10000",number=10000)
1.6705770492553711
>>> timeit.timeit("s=s[:-1]+u'\U00012345'","s=u'\U00012345sdf'*10000",number=10000)
1.7078530788421631
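For anyone wanting to repeat the six cases on another build, they could be scripted roughly like this. It is only a sketch for Python 3: the names CASES and run_cases are made up here, and the appended character is passed in as a variable rather than written as a literal inside the timed statement, so the numbers will not match the sessions above exactly.

```python
import timeit

# (label, repeated starting chunk, character appended on each pass)
CASES = [
    ("ASCII -> ASCII", "asdf", "\u0034"),
    ("ASCII -> BMP", "asdf", "\u1234"),
    ("BMP -> BMP", "\u1234sdf", "\u1234"),
    ("ASCII -> SMP", "asdf", "\U00012345"),
    ("BMP -> SMP", "\u1234sdf", "\U00012345"),
    ("SMP -> SMP", "\U00012345sdf", "\U00012345"),
]

def run_cases(number=10000, repeat=3):
    """Time s = s[:-1] + tail for each case; keep the best of `repeat` runs."""
    results = {}
    for label, head, tail in CASES:
        # %a writes the strings as ASCII-only literals into the setup code
        setup = "s = %a * 10000; tail = %a" % (head, tail)
        timings = timeit.repeat("s = s[:-1] + tail", setup,
                                repeat=repeat, number=number)
        results[label] = min(timings)
    return results

if __name__ == "__main__":
    # A smaller iteration count than the sessions above, to keep this quick
    for label, seconds in run_cases(number=1000).items():
        print("%-15s %.4f" % (label, seconds))
```

Taking the minimum of a few repeats is a common way to cut down on noise from whatever else the machine is doing.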
Here's my reading of all these stats. Python 3.3's str is faster than
2.6's unicode when the copy can be done directly (i.e. when the size
isn't changing), but converting sizes costs a lot (which suggests that
memcpy is blazingly fast, no surprise there). Since MOST string
operations won't change the size, this is a benefit to most programs.
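One rough way to see the size change itself, assuming CPython 3.3 or later: sys.getsizeof reports the memory a string object occupies, so comparing a string with its doubled self gives an estimate of the bytes stored per character. The helper name approx_bytes_per_char is made up for this sketch, and getsizeof figures are an implementation detail, so treat the output as indicative only.

```python
import sys

def approx_bytes_per_char(s):
    """Estimate storage bytes per character of s (CPython-specific)."""
    # s + s has the same widest character as s, so the fixed object
    # header cancels out and only the per-character payload remains.
    return (sys.getsizeof(s + s) - sys.getsizeof(s)) / len(s)

base = 'asdf' * 10000
widths = {}
for label, tail in [('ASCII', '\u0034'), ('BMP', '\u1234'), ('SMP', '\U00012345')]:
    s = base[:-1] + tail   # same operation as in the timings
    widths[label] = approx_bytes_per_char(s)
    print(label, len(s), widths[label])
```

All three strings have the same length, yet a single wider character at the end should change the width used for every character, which is why the whole string has to be converted rather than block-copied.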
I expect that Python 3.2 will behave comparably to the 2.6 stats, but
I don't have a 3.2 build handy - can someone confirm, please?
ChrisA