[Python-ideas] Create a StringBuilder class and use it everywhere
M.-A. Lemburg
mal at egenix.com
Mon Aug 29 11:27:23 CEST 2011
Dirkjan Ochtman wrote:
> On Thu, Aug 25, 2011 at 11:45, M.-A. Lemburg <mal at egenix.com> wrote:
>> I think you should use cStringIO in your class implementation.
>> The list + join idiom is nice, but it has the disadvantage of
>> creating and keeping alive many small string objects (with all
>> the memory overhead and fragmentation that goes along with it).
>
> AFAIK using cStringIO just for string building is much slower than
> using list.append() + join(). IIRC we tested some micro-benchmarks on
> this for Mercurial output (where it was a significant part of the
> profile for some commands). That was on Python 2, of course, it may be
> better in io.StringIO and/or Python 3.
Turns out you're right (list.append must have gotten a lot faster
since I last tested this years ago, or I simply misremembered
the results).
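For reference, the list + join idiom under discussion could be wrapped in a class along these lines. This is only a minimal sketch: the StringBuilder name comes from the thread subject, while the append() and build() method names are assumptions made for illustration and are not taken from any existing proposal or library.

```python
class StringBuilder:
    """Hypothetical list-backed builder; not an existing stdlib class."""

    def __init__(self):
        # Every appended piece stays alive in this list until build().
        self._parts = []

    def append(self, text):
        self._parts.append(text)
        return self  # returning self allows chained calls

    def build(self):
        # A single join at the end produces the final string.
        return ''.join(self._parts)


sb = StringBuilder()
sb.append('spam').append(' ').append('eggs')
text = sb.build()
```

The list keeps each small string object alive until the final join, which is the memory trade-off mentioned in the quoted text.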
> python2.6 teststringbuilding.py array cstringio listappend
Running test array ...
669.68 ms
Running test cstringio ...
563.95 ms
Running test listappend ...
389.22 ms
> python2.7 teststringbuilding.py array cstringio listappend
Running test array ...
775.32 ms
Running test cstringio ...
679.88 ms
Running test listappend ...
375.19 ms
Here's the Python2 code:
"""
TIMEIT_N = 10
N = 1000000
SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
N_STRINGS = len(SIZES)
STRINGS = ['x' * SIZES[i] for i in range(N_STRINGS)]
REFERENCE = ''.join(STRINGS[i % N_STRINGS] for i in xrange(N))

def cstringio():
    import cStringIO
    s = cStringIO.StringIO()
    write = s.write
    for i in xrange(N):
        write(STRINGS[i % N_STRINGS])
    result = s.getvalue()
    assert result == REFERENCE

def array():
    import array
    s = array.array('c')
    write = s.fromstring
    for i in xrange(N):
        write(STRINGS[i % N_STRINGS])
    result = s.tostring()
    assert result == REFERENCE

def listappend():
    l = []
    append = l.append
    for i in xrange(N):
        append(STRINGS[i % N_STRINGS])
    result = ''.join(l)
    assert result == REFERENCE

if __name__ == '__main__':
    import sys, timeit
    for test in sys.argv[1:]:
        print 'Running test %s ...' % test
        t = timeit.timeit('%s()' % test,
                          'from __main__ import %s' % test,
                          number=TIMEIT_N)
        print ' %.2f ms' % (t / TIMEIT_N * 1e3)
"""
Aside: For some reason cStringIO and array got slower in Python 2.7.
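Since the quoted message wonders whether io.StringIO on Python 3 fares better, here is a rough Python 3 port of the script above for anyone who wants to try. It is a sketch, not a measured result: io.StringIO stands in for cStringIO, the array test is left out because the 'c' typecode no longer exists in Python 3, N and TIMEIT_N are reduced (an arbitrary choice to keep runs short), and the TESTS dict and run() helper are additions that are not in the original script.

```python
import io
import timeit

TIMEIT_N = 3      # reduced from the original 10 (assumption, for shorter runs)
N = 100000        # reduced from the original 1000000 (assumption)
SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
N_STRINGS = len(SIZES)
STRINGS = ['x' * size for size in SIZES]
REFERENCE = ''.join(STRINGS[i % N_STRINGS] for i in range(N))


def stringio():
    # io.StringIO takes the place of Python 2's cStringIO.
    s = io.StringIO()
    write = s.write
    for i in range(N):
        write(STRINGS[i % N_STRINGS])
    result = s.getvalue()
    assert result == REFERENCE
    return result


def listappend():
    l = []
    append = l.append
    for i in range(N):
        append(STRINGS[i % N_STRINGS])
    result = ''.join(l)
    assert result == REFERENCE
    return result


# Hypothetical registry of the available tests, keyed by command-line name.
TESTS = {'stringio': stringio, 'listappend': listappend}


def run(test_names):
    """Time each named test; return {name: milliseconds per call}."""
    timings = {}
    for name in test_names:
        t = timeit.timeit(TESTS[name], number=TIMEIT_N)
        timings[name] = t / TIMEIT_N * 1e3
    return timings


if __name__ == '__main__':
    import sys
    for name, ms in run(sys.argv[1:] or sorted(TESTS)).items():
        print('Running test %s ...' % name)
        print(' %.2f ms' % ms)
```

Run it the same way as the original, e.g. with the test names stringio and listappend as arguments; with no arguments it runs both.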
--
Marc-Andre Lemburg
eGenix.com