[Python-ideas] Create a StringBuilder class and use it everywhere

M.-A. Lemburg mal at egenix.com
Mon Aug 29 11:27:23 CEST 2011


Dirkjan Ochtman wrote:
> On Thu, Aug 25, 2011 at 11:45, M.-A. Lemburg <mal at egenix.com> wrote:
>> I think you should use cStringIO in your class implementation.
>> The list + join idiom is nice, but it has the disadvantage of
>> creating and keeping alive many small string objects (with all
>> the memory overhead and fragmentation that goes along with it).
> 
> AFAIK using cStringIO just for string building is much slower than
> using list.append() + join(). IIRC we tested some micro-benchmarks on
> this for Mercurial output (where it was a significant part of the
> profile for some commands). That was on Python 2, of course, it may be
> better in io.StringIO and/or Python 3.

Turns out you're right (list.append must have gotten a lot faster
since I last tested this years ago, or I simply misremembered
the results).

> python2.6 teststringbuilding.py array cstringio listappend
Running test array ...
   669.68 ms
Running test cstringio ...
   563.95 ms
Running test listappend ...
   389.22 ms

> python2.7 teststringbuilding.py array cstringio listappend
Running test array ...
   775.32 ms
Running test cstringio ...
   679.88 ms
Running test listappend ...
   375.19 ms

Here's the Python 2 code:

"""
TIMEIT_N = 10
N = 1000000
SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
N_STRINGS = len(SIZES)
STRINGS = ['x' * SIZES[i] for i in range(N_STRINGS)]
REFERENCE = ''.join(STRINGS[i % N_STRINGS] for i in xrange(N))

def cstringio():
    import cStringIO
    s = cStringIO.StringIO()
    write = s.write
    for i in xrange(N):
        write(STRINGS[i % N_STRINGS])
    result = s.getvalue()
    assert result == REFERENCE

def array():
    import array
    s = array.array('c')
    write = s.fromstring
    for i in xrange(N):
        write(STRINGS[i % N_STRINGS])
    result = s.tostring()
    assert result == REFERENCE

def listappend():
    l = []
    append = l.append
    for i in xrange(N):
        append(STRINGS[i % N_STRINGS])
    result = ''.join(l)
    assert result == REFERENCE

if __name__ == '__main__':
    import sys, timeit
    for test in sys.argv[1:]:
        print 'Running test %s ...' % test
        t = timeit.timeit('%s()' % test,
                          'from __main__ import %s' % test,
                          number=TIMEIT_N)
        print '   %.2f ms' % (t / TIMEIT_N * 1e3)
"""

Aside: For some reason cStringIO and array got slower in Python 2.7
(each by roughly 100-115 ms in the runs above), while listappend
stayed about the same.
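
For anyone who wants to check the Python 3 side of this (io.StringIO
was mentioned above as possibly behaving better), here is a rough port
of the same test. It is only a sketch: the reference(), stringio() and
run() names are my own, the functions take the iteration count as a
parameter so they can be tried with small inputs, and the array test is
left out because the 'c' typecode no longer exists in Python 3. I have
not included timings; they will depend on the interpreter and machine.

```python
import io
import timeit

SIZES = (2, 10, 23, 30, 33, 22, 15, 16, 27)
STRINGS = ['x' * size for size in SIZES]
N_STRINGS = len(STRINGS)


def reference(n):
    # Expected result: the n strings concatenated in round-robin order.
    return ''.join(STRINGS[i % N_STRINGS] for i in range(n))


def stringio(n):
    # Build the string by writing into an in-memory text buffer.
    s = io.StringIO()
    write = s.write
    for i in range(n):
        write(STRINGS[i % N_STRINGS])
    return s.getvalue()


def listappend(n):
    # Build the string with the list.append() + ''.join() idiom.
    parts = []
    append = parts.append
    for i in range(n):
        append(STRINGS[i % N_STRINGS])
    return ''.join(parts)


def run(tests, n=1000000, number=10):
    # Check each test once for correctness, then time it.
    expected = reference(n)
    for test in tests:
        assert test(n) == expected
        t = timeit.timeit(lambda: test(n), number=number)
        print('Running test %s ...' % test.__name__)
        print('   %.2f ms' % (t / number * 1e3))
```

Calling run([stringio, listappend]) prints output in the same format as
the Python 2 runs above; pass a smaller n for a quick sanity check.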

-- 
Marc-Andre Lemburg
eGenix.com



