StringIO proposal: add __iadd__
aleax at mail.comcast.net
Sun Jan 29 23:59:12 CET 2006
Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
> aleax at mail.comcast.net (Alex Martelli) writes:
> > But why can't I have perfectly polymorphic "append a bunch of strings
> > together", just like I can now (with ''.join of a list of strings, or
> > StringIO), without caring whether the strings are Unicode or
> > bytestrings?
> I see that 'a' + u'b' = u'ab', which makes sense. I don't use Unicode
> much so haven't paid much attention to such things. Is there some
> sound reason cStringIO acts differently from StringIO? I'd expect
> them to both do the same thing.
I believe that cStringIO tries to optimize, while StringIO doesn't and
is thereby more general.
> > As for extending cStringIO.write I guess that's
> > possible, but not without breaking compatibility ... you'd
> > need instead to add another couple of methods, or wait for Py3k.
> We're already discussing adding another method, namely __iadd__.
> Maybe that's the place to put it.
Still need another method to 'getvalue' which can return a Unicode
string (currently, cStringIO.getvalue returns plain strings only, and it
might break something if that guarantee was removed).
That being said, if the only way to use a StringIO was to call += or
__iadd__ on it, I would switch my recommendation away from it and
towards "just join the sequence of strings". Taking your example:
    temp_buf = StringIO()
    for x in various_pieces_of_output():
        v = go_figure_out_some_string()
        temp_buf += v
    final_string = temp_buf.getvalue()
it's just more readable to me to express it
    final_string = ''.join(go_figure_out_some_string()
                           for x in various_pieces_of_output())
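
To make the comparison concrete, here is a minimal runnable sketch of both spellings. It is written for Python 3, where io.StringIO stands in for the StringIO and cStringIO modules discussed here. The AppendableStringIO subclass is hypothetical: it only illustrates what the proposed __iadd__ might look like and is not part of any library. The two helper functions are stand-ins as well, and go_figure_out_some_string takes an argument here purely so that the output varies.

```python
from io import StringIO


class AppendableStringIO(StringIO):
    """Hypothetical subclass illustrating the proposed += behaviour."""

    def __iadd__(self, text):
        self.write(text)  # append at the current position (the end, here)
        return self       # += rebinds the name, so return the buffer itself


def various_pieces_of_output():
    # Stand-in for whatever produces the work items.
    return range(5)


def go_figure_out_some_string(x):
    # Stand-in; takes an argument only to make each piece distinct.
    return "piece%d;" % x


# Spelling 1: accumulate into a buffer with +=.
temp_buf = AppendableStringIO()
for x in various_pieces_of_output():
    temp_buf += go_figure_out_some_string(x)
via_buffer = temp_buf.getvalue()

# Spelling 2: join a generator of the same pieces.
via_join = "".join(go_figure_out_some_string(x)
                   for x in various_pieces_of_output())

assert via_buffer == via_join
```

Both spellings build the same string; the difference is only in how much machinery has to be explained first.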
Being able to use temp_buf.write(v) [like today, but with StringIO, not
cStringIO] would still have me recommending it to newbies, but having to
explain that extra += just tips the didactical balance. It's already
hard enough to jump ahead to a standard library module in the middle of
an explanation of strings, just to explain how to concatenate a bunch...
Yes, I do understand your performance issues:
Nimue:~/pynut alex$ python2.4 -mtimeit -s'from StringIO import StringIO' \
    's=StringIO(); s.writelines(str(i) for i in range(33)); x=s.getvalue()'
1000 loops, best of 3: 337 usec per loop
Nimue:~/pynut alex$ python2.4 -mtimeit -s'from cStringIO import StringIO' \
    's=StringIO(); s.writelines(str(i) for i in range(33));
10000 loops, best of 3: 98.1 usec per loop
Nimue:~/pynut alex$ python2.4 -mtimeit \
    's=list(); s.extend(str(i) for i in range(33)); x="".join(s)'
10000 loops, best of 3: 99 usec per loop
but using += instead of writelines [[actually, how WOULD you express the
writelines equivalent???]] or abrogating plain-Python StringIO would not
speed up the cStringIO use (which is already just as fast as the ''.join