Use of StringIO vs cStringIO in standard modules
Michael P. Reilly
arcege at shore.net
Thu Jun 3 14:13:50 EDT 1999
Hrvoje Niksic <hniksic at srce.hr> wrote:
: Guido van Rossum <guido at CNRI.Reston.VA.US> writes:
:> Hrvoje Niksic <hniksic at srce.hr>:
:>
:> > I noticed that many standard modules use StringIO and not
:> > cStringIO, although they don't need subclassing. Is this
:> > intentional?
:> >
:> > For example, base64.py uses StringIO to implement encodestring()
:> > and decodestring(). Since both functions write to output line by
:> > line, I imagine the performance hit of StringIO vs cStringIO might
:> > be non-negligible.
:>
:> Have you noticed any speed difference?
: Yes, quite a bit. Trivially replacing StringIO with cStringIO in
: base64.py makes encoding 2.3 times and decoding 3.6 times faster.
: That's on my system (Ultra 2 under Solaris 2.6), measured repeatedly
: with time.clock() and an approx. 1M sample string. I can post the
: script if there is interest.
: Maybe the correct solution for base64.py would be to use something
: like this at top-level:
: try:
:     from cStringIO import StringIO
: except ImportError:
:     from StringIO import StringIO
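A sketch of how such a fallback import might be used by a line-by-line encoder. The function name encodestring and the 57-byte chunking follow base64.py's MAXLINESIZE/MAXBINSIZE constants, but this is a hypothetical stand-alone version, not the module's actual code. Because cStringIO is missing on Python 3, the fallback here is io.BytesIO (also implemented in C) rather than the pure-Python StringIO module used in the suggestion above.

```python
import binascii

try:
    from cStringIO import StringIO         # Python 2: C implementation
except ImportError:
    from io import BytesIO as StringIO     # Python 3: cStringIO is gone

MAXLINESIZE = 76                       # base64 characters per output line
MAXBINSIZE = (MAXLINESIZE // 4) * 3    # 57 input bytes encode to 76 characters

def encodestring(data):
    """Hypothetical encoder: base64-encode bytes, writing one line at a time."""
    out = StringIO()
    for i in range(0, len(data), MAXBINSIZE):
        # b2a_base64 appends a trailing newline to each encoded chunk
        out.write(binascii.b2a_base64(data[i:i + MAXBINSIZE]))
    return out.getvalue()
```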
:> cPickle, because calling it from C is much faster than calling
:> StringIO from C; however I believe that for calls from Python,
:> StringIO isn't that much slower.
: I've looked at the code, and to me it seems that the slowness comes
: from creating new strings on each write, where cStringIO just resizes
: its internal buffer and creates the string only at the end.
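To illustrate the difference described above, here are two hypothetical write-only buffers: one builds a new string on every write, the other keeps a list of chunks and joins them once in getvalue(). The chunk list is a Python-level substitute for cStringIO's resizable C buffer, not a description of how cStringIO itself works; seek() and read() are deliberately left out.

```python
class ConcatBuffer(object):
    """Builds a new string on every write."""

    def __init__(self):
        self.buf = ""

    def write(self, s):
        # Creates a new string holding everything written so far
        self.buf = self.buf + s

    def getvalue(self):
        return self.buf


class ChunkBuffer(object):
    """Collects chunks in a list and joins them only when asked."""

    def __init__(self):
        self.chunks = []

    def write(self, s):
        # Appending to a list does not copy the data already written
        self.chunks.append(s)

    def getvalue(self):
        value = "".join(self.chunks)
        self.chunks = [value]   # keep the joined result so repeated calls stay cheap
        return value
```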
:> > Furthermore, is there a particular reason for maintaining two
:> > parallel StringIO implementations? If subclassing is the reason,
:> > I assume it would be trivial to rewrite StringIO to encapsulate
:> > cStringIO the same way that UserDict encapsulates dictionary
:> > objects.
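A sketch of the UserDict-style arrangement suggested above, assuming Python 3's io.StringIO as the C-implemented object being wrapped. UserStringIO and UpperStringIO are hypothetical names. Attribute lookups the wrapper does not define are forwarded to the wrapped buffer, so a subclass only overrides what it needs.

```python
import io

class UserStringIO(object):
    """Hypothetical subclassable wrapper around a C-level string buffer."""

    def __init__(self, initial=""):
        self._buf = io.StringIO(initial)

    def __getattr__(self, name):
        # Only called when normal lookup fails; guard against recursion
        # if _buf has not been set yet (e.g. during copying).
        if name == "_buf":
            raise AttributeError(name)
        return getattr(self._buf, name)


class UpperStringIO(UserStringIO):
    """Example subclass: overrides write() and inherits everything else."""

    def write(self, s):
        return self._buf.write(s.upper())
```

Note that __getattr__ does not forward special methods such as __iter__, so iterating over the wrapper would need an explicit method.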
:>
:> That's not the reason; it's got more to do with not requiring a C
:> extension where plain Python code will do. Also to have a reference
:> implementation.
: But then you have to maintain both, *and* you get much slower code.
: Is it worth it?
There are some interface differences between cStringIO.StringIO and
StringIO.StringIO which, I think, make StringIO more "usable" in some
cases. Specifically, if you create an instance of cStringIO.StringIO
with initializing data, the instance is not writable:
>>> import cStringIO, StringIO
>>> f = StringIO.StringIO("Hi there")
>>> f.seek(0, 2)
>>> f.write("\n")
>>> f.getvalue()
'Hi there\012'
>>> g = cStringIO.StringIO("Hi there")
>>> g.seek(0, 2)
>>> g.write("\n")
Traceback (innermost last):
File "<stdin>", line 1, in ?
AttributeError: write
>>>
Replacing StringIO with cStringIO could therefore break some existing
code. I will mention, though, that:
f = StringIO.StringIO(str)
is equivalent to:
f = cStringIO.StringIO()
f.write(str)
f.seek(0)
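The equivalence above can be wrapped in a small helper. writable_stringio is a hypothetical name. It uses cStringIO when available and falls back to io.StringIO on Python 3, where io.StringIO(initial) is already writable and the helper is simply harmless.

```python
try:
    from cStringIO import StringIO as _FastStringIO   # Python 2
except ImportError:
    from io import StringIO as _FastStringIO          # Python 3

def writable_stringio(initial=""):
    """Return a buffer preloaded with `initial` that still accepts writes.

    cStringIO.StringIO(initial) returns a read-only object, so create an
    empty (writable) buffer instead, write the data, and rewind.
    """
    f = _FastStringIO()
    f.write(initial)
    f.seek(0)
    return f
```

Used like the session above: f = writable_stringio("Hi there"); f.seek(0, 2); f.write("\n").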
I would want to see cStringIO mimic StringIO before the two are merged.
And yes, Guido, I have noticed speed differences at times.
-Arcege