Use of StringIO vs cStringIO in standard modules

Hrvoje Niksic hniksic at
Thu Jun 3 12:09:47 EDT 1999

Guido van Rossum <guido at CNRI.Reston.VA.US> writes:

> Hrvoje Niksic <hniksic at>:
> > I noticed that many standard modules use StringIO and not
> > cStringIO, although they don't need subclassing.  Is this
> > intentional?
> > 
> > For example, uses StringIO to implement encodestring()
> > and decodestring().  Since both functions write to output line by
> > line, I imagine the performance hit of StringIO vs cStringIO might
> > be non-negligible.
>
> Have you noticed any speed difference?

Yes, quite a bit.  Trivially replacing StringIO with cStringIO in makes encoding 2.3 times and decoding 3.6 times faster.
That's on my system (Ultra 2 under Solaris 2.6), measured repeatedly
with time.clock() and an approx. 1M sample string.  I can post the
script if there is interest.
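
A minimal sketch of such a measurement, not the script offered above.
It assumes a hypothetical helper named time_line_writes() and a
synthetic sample of roughly 1M characters written line by line.  On
interpreters without cStringIO it falls back to io.StringIO for both
sides, so the two timings will then simply match:

```python
import time

try:
    # Python 2 layout, as discussed in this thread.
    from cStringIO import StringIO as FastStringIO
    from StringIO import StringIO as SlowStringIO
except ImportError:
    # Assumption: on Python 3 both modules are gone, so io.StringIO
    # stands in for both and the harness still runs.
    from io import StringIO as FastStringIO
    SlowStringIO = FastStringIO

# time.clock() no longer exists in newer Pythons; prefer perf_counter().
timer = getattr(time, "perf_counter", None) or time.clock


def time_line_writes(factory, lines, repeat=5):
    """Return the best time of `repeat` runs writing `lines` one by one."""
    best = None
    for _ in range(repeat):
        out = factory()
        start = timer()
        for line in lines:
            out.write(line)
        out.getvalue()  # include building the final string in the timing
        elapsed = timer() - start
        if best is None or elapsed < best:
            best = elapsed
    return best


# Roughly 1M characters: 13000 lines of 76 characters plus a newline.
sample_lines = [("x" * 76) + "\n"] * 13000

slow = time_line_writes(SlowStringIO, sample_lines)
fast = time_line_writes(FastStringIO, sample_lines)
print("StringIO:  %.4f s" % slow)
print("cStringIO: %.4f s" % fast)
```

Taking the best of several runs is one way to reduce noise from other
processes; the absolute numbers depend on the machine and interpreter.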

Maybe the correct solution for would be to use something
like this at top-level:

    try:
        from cStringIO import StringIO
    except ImportError:
        from StringIO import StringIO

> cPickle, because calling it from C is much faster than calling
> StringIO from C; however I believe that for calls from Python,
> StringIO isn't that much slower.

I've looked at the code, and to me it seems that the slowness comes
from creating new strings on each write, where cStringIO just resizes
its internal buffer and creates the string only at the end.
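
The difference between the two strategies can be illustrated with two
toy writers.  This is only a sketch: the class names ConcatWriter and
BufferWriter and the helper fill() are hypothetical, and neither class
reproduces the actual code of StringIO or cStringIO.  A bytearray
stands in for the resizable internal buffer:

```python
class ConcatWriter:
    """Toy writer that builds a brand-new string object on every write."""

    def __init__(self):
        self.value = b""

    def write(self, data):
        # Each call allocates a new object holding old contents + data.
        self.value = self.value + data

    def getvalue(self):
        return self.value


class BufferWriter:
    """Toy writer that grows one buffer and builds the string at the end."""

    def __init__(self):
        self.buf = bytearray()

    def write(self, data):
        # The existing buffer is extended in place.
        self.buf.extend(data)

    def getvalue(self):
        # The immutable result is created only when it is asked for.
        return bytes(self.buf)


def fill(writer, lines):
    """Write every line to `writer` and return the accumulated value."""
    for line in lines:
        writer.write(line)
    return writer.getvalue()


lines = [b"line one\n", b"line two\n", b"line three\n"]
print(fill(ConcatWriter(), lines) == fill(BufferWriter(), lines))
```

Both writers produce the same result; they differ only in how much
copying happens along the way.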

> > Furthermore, is there a particular reason for maintaining two
> > parallel StringIO implementations?  If subclassing is the reason,
> > I assume it would be trivial to rewrite StringIO to encapsulate
> > cStringIO the same way that UserDict encapsulates dictionary
> > objects.
>
> That's not the reason; it's got more to do with not requiring a C
> extension where plain Python code will do.  Also to have a reference
> implementation.

But then you have to maintain both, *and* you get much slower code.
Is it worth it?
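
The encapsulation idea quoted above, a subclassable Python class that
wraps the fast object the way UserDict wraps a dictionary, might look
roughly like this.  It is a sketch under assumptions: the class names
WrappedStringIO and UpperCaseStringIO are hypothetical, and
io.StringIO stands in for cStringIO so the example runs where
cStringIO is not available:

```python
import io


class WrappedStringIO:
    """Hypothetical pure-Python shell around a fast buffer object."""

    def __init__(self):
        # Assumption: io.StringIO plays the role of cStringIO.StringIO.
        # The attribute is called `data`, following UserDict's convention.
        self.data = io.StringIO()

    def write(self, s):
        return self.data.write(s)

    def getvalue(self):
        return self.data.getvalue()

    def __getattr__(self, name):
        # Called only for attributes not found normally; forward the
        # rest (read, seek, tell, ...) to the wrapped buffer.
        if name == "data":
            raise AttributeError(name)
        return getattr(self.data, name)


class UpperCaseStringIO(WrappedStringIO):
    """Example subclass: upper-cases everything that is written."""

    def write(self, s):
        return WrappedStringIO.write(self, s.upper())


out = UpperCaseStringIO()
out.write("hello\n")
out.seek(0)
print(out.read())
```

Subclasses override methods on the wrapper while the wrapped object
does the actual buffering, at the cost of one extra Python-level call
per operation.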
