Use of StringIO vs cStringIO in standard modules
hniksic at srce.hr
Thu Jun 3 12:09:47 EDT 1999
Guido van Rossum <guido at CNRI.Reston.VA.US> writes:
> Hrvoje Niksic <hniksic at srce.hr>:
> > I noticed that many standard modules use StringIO and not
> > cStringIO, although they don't need subclassing. Is this
> > intentional?
> > For example, base64.py uses StringIO to implement encodestring()
> > and decodestring(). Since both functions write to output line by
> > line, I imagine the performance hit of StringIO vs cStringIO might
> > be non-negligible.
> Have you noticed any speed difference?
Yes, quite a bit. Trivially replacing StringIO with cStringIO in
base64.py makes encoding 2.3 times and decoding 3.6 times faster.
That's on my system (Ultra 2 under Solaris 2.6), measured repeatedly
with time.clock() and an approx. 1M sample string. I can post the
script if there is interest.
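A minimal sketch of that kind of timing script, assuming Python 3. time.clock() has since been removed (Python 3.8), so the sketch uses time.perf_counter(). io.StringIO stands in for cStringIO, and a hypothetical ConcatWriter stands in for a pure-Python StringIO that builds a new string on every write. The names encode_lines, ConcatWriter and best_time are illustrative. The ratio it prints depends on the machine and sample size and is not the figure reported above.

```python
import binascii
import io
import time

MAXBINSIZE = 57  # input bytes per 76-character output line, as in base64.py


def encode_lines(data, out):
    """Base64-encode `data` one output line at a time into the file-like
    object `out` (in the style of base64.encodestring) and return the text."""
    for i in range(0, len(data), MAXBINSIZE):
        chunk = data[i:i + MAXBINSIZE]
        out.write(binascii.b2a_base64(chunk).decode("ascii"))
    return out.getvalue()


class ConcatWriter:
    """Hypothetical pure-Python writer that builds a new string on every write."""

    def __init__(self):
        self.buf = ""

    def write(self, s):
        self.buf = self.buf + s  # copies the whole buffer each time

    def getvalue(self):
        return self.buf


def best_time(make_out, data, repeat=3):
    """Return the fastest of `repeat` runs, measured with time.perf_counter()."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        encode_lines(data, make_out())
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    sample = bytes(range(256)) * 512  # about 128 KB; enlarge to taste
    t_py = best_time(ConcatWriter, sample)
    t_c = best_time(io.StringIO, sample)
    print(f"pure Python: {t_py:.4f}s  io.StringIO: {t_c:.4f}s  "
          f"ratio: {t_py / t_c:.1f}")
```

Taking the best of several runs, rather than the mean, reduces noise from other activity on the machine.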
Maybe the correct solution for base64.py would be to use something
like this at top-level:
try:
    from cStringIO import StringIO
except ImportError:
    from StringIO import StringIO
> [...] cPickle, because calling it from C is much faster than calling
> StringIO from C; however I believe that for calls from Python,
> StringIO isn't that much slower.
I've looked at the code, and to me it seems that the slowness comes
from creating new strings on each write, whereas cStringIO just resizes
its internal buffer and creates the string only at the end.
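To make that mechanism concrete, here is a toy model of the two strategies. It is not the actual StringIO or cStringIO source. ConcatBuffer and PieceBuffer are hypothetical classes, and chars_copied is a counter added by the sketch to tally copying work, not something the real modules expose. PieceBuffer uses a Python list, which CPython grows by over-allocating, as a stand-in for cStringIO's resizable C buffer.

```python
class ConcatBuffer:
    """Builds a brand-new string on every write(), as described for StringIO."""

    def __init__(self):
        self.buf = ""
        self.chars_copied = 0  # models how many characters get copied

    def write(self, s):
        self.buf = self.buf + s  # old contents plus s copied into a new string
        self.chars_copied += len(self.buf)

    def getvalue(self):
        return self.buf


class PieceBuffer:
    """Keeps written pieces in a growable list and builds the string once."""

    def __init__(self):
        self.pieces = []
        self.chars_copied = 0

    def write(self, s):
        self.pieces.append(s)  # no string copying here

    def getvalue(self):
        result = "".join(self.pieces)  # every character copied exactly once
        self.chars_copied += len(result)
        return result


def fill(buf, lines=100, width=76):
    """Write `lines` lines of `width` characters (plus newline) and return the text."""
    for _ in range(lines):
        buf.write("x" * width + "\n")
    return buf.getvalue()


if __name__ == "__main__":
    a, b = ConcatBuffer(), PieceBuffer()
    assert fill(a) == fill(b)
    print(a.chars_copied, b.chars_copied)  # prints "388850 7700"
```

With n equal-sized writes, the first counter grows roughly with n squared while the second grows linearly in n, so in this model the gap widens as the output gets larger.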
> > Furthermore, is there a particular reason for maintaining two
> > parallel StringIO implementations? If subclassing is the reason,
> > I assume it would be trivial to rewrite StringIO to encapsulate
> > cStringIO the same way that UserDict encapsulates dictionary
> > objects.
> That's not the reason; it's got more to do with not requiring a C
> extension where plain Python code will do. Also to have a reference [...]
But then you have to maintain both, *and* you get much slower code.
Is it worth it?
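A minimal sketch of the UserDict-style encapsulation proposed earlier in the thread, assuming Python 3: io.StringIO plays the role of cStringIO, and the wrapped object is kept in an attribute named data, as collections.UserDict does. StringIOWrapper and UpperStringIO are hypothetical names. In Python 3, io.StringIO can also be subclassed directly, so the wrapper only illustrates the idea discussed here.

```python
import io


class StringIOWrapper:
    """Pure-Python, subclassable front end that delegates to a C-implemented
    in-memory file (io.StringIO here), in the manner of UserDict."""

    def __init__(self, initial=""):
        self.data = io.StringIO(initial)

    def write(self, s):
        return self.data.write(s)

    def getvalue(self):
        return self.data.getvalue()

    def __getattr__(self, name):
        # Called only when normal lookup fails: forward read(), seek(),
        # tell(), etc. to the wrapped C object.
        if name == "data":
            # Avoid infinite recursion if `data` was never set.
            raise AttributeError(name)
        return getattr(self.data, name)


class UpperStringIO(StringIOWrapper):
    """Example subclass that overrides a single method."""

    def write(self, s):
        return super().write(s.upper())


if __name__ == "__main__":
    f = UpperStringIO()
    f.write("hello, ")
    f.write("world\n")
    f.seek(0)
    print(f.read(), end="")  # prints "HELLO, WORLD"
```

Only the overridden method runs in Python; every delegated call still lands in the C object, which is the trade-off the question above is weighing.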