StringIO proposal: add __iadd__

Paul Rubin http
Sun Jan 29 18:57:57 EST 2006


aleax at mail.comcast.net (Alex Martelli) writes:
> > Is there some sound reason cStringIO acts differently from
> > StringIO?  I'd expect them to both do the same thing.
> 
> I believe that cStringIO tries to optimize, while StringIO doesn't and
> is thereby more general.

I'm not sure what optimizations make sense.  I'd thought the most
important difference was the ability to subclass StringIO, before
new-style classes arrived.  It's really ugly that .getvalue does
different things for StringIO and cStringIO, something that I didn't
realize and which amazes me.  I'd go as far as to say maybe .getvalue
should be deprecated in both modules, and replaced by .getstring
(returns regular or unicode string depending on contents) and
.getbytes (always returns a byte string).
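
To make the two-accessor idea concrete, here is a rough sketch. It is written against Python 3's io.StringIO (which later replaced both StringIO and cStringIO), so the "regular or unicode" distinction collapses into a single str type. The class name TextBuffer, the method bodies, and the encoding parameter are all assumptions made for illustration; only the names getstring and getbytes come from the suggestion above.

```python
import io


class TextBuffer(io.StringIO):
    """Hypothetical buffer exposing two explicit accessors."""

    def getstring(self):
        # Hypothetical method: return the accumulated text as a str.
        return self.getvalue()

    def getbytes(self, encoding="utf-8"):
        # Hypothetical method: always return a byte string. The encoding
        # parameter is an assumption, not part of the original suggestion.
        return self.getvalue().encode(encoding)


buf = TextBuffer()
buf.write("caf")
buf.write("\u00e9")      # a non-ASCII character
text = buf.getstring()   # str
raw = buf.getbytes()     # bytes, UTF-8 encoded by default
```

The caller states which type it wants, instead of relying on what the buffer happened to contain.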

> > We're already discussing adding another method, namely __iadd__.
> > Maybe that's the place to put it.
> 
> Still need another method to 'getvalue' which can return a Unicode
> string (currently, cStringIO.getvalue returns plain strings only,
> and it might break something if that guarantee was removed).

Yeah, replacing getvalue with explicit methods is preferable.  "Explicit
is better than implicit."

> That being said, if the only way to use a StringIO was to call += or
> __iadd__ on it, I would switch my recommendation away from it and
> towards "just join the sequence of strings".

Fixing getvalue takes care of it.  The ''.join idiom is IMO a total
monstrosity and should die, die, die, die, die.

> it's just more readable to me to express it
>    final_string = ''.join(go_figure_out_some_string()
>                                   for x in various_pieces_of_output())

OK for that example, maybe not for a more complex one.  Anyway I like
sum(...) even better (where sum promises to be O(n) in the number of
bytes), but clpy had THAT discussion a few days ago.
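
For reference, the sum spelling is a wish rather than something that runs: the built-in sum rejects a string start value with a TypeError. A small sketch of what happens (the variable names are arbitrary):

```python
pieces = ["ab", "cd", "ef"]

try:
    # The built-in sum refuses to concatenate strings.
    total = sum(pieces, "")
except TypeError as exc:
    total = None
    error = str(exc)

# The spelling that works for string concatenation.
joined = "".join(pieces)
```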

> Being able to use temp_buf.write(v) [like today, but with StringIO, not
> cStringIO] would still have me recommending it to newbies, but having to
> explain that extra += just tips the didactical balance.

I just can't for the life of me see += as harder to explain than the
''.join horror.  But yeah, the real problem is the incompatible
definitions of .getvalue between the two classes, so that should be
fixed, and .write would do the right thing.
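
A minimal sketch of what += could look like, assuming it is nothing more than an alias for .write. It subclasses Python 3's io.StringIO so that it runs on a current interpreter; the class name AppendableStringIO is made up, and this is not how either module actually behaves.

```python
import io


class AppendableStringIO(io.StringIO):
    """Hypothetical StringIO variant that accepts += as an alias for write."""

    def __iadd__(self, text):
        self.write(text)
        # Return self so that "buf += x" leaves buf bound to the same object.
        return self


buf = AppendableStringIO()
buf += "hello"
buf += ", "
buf.write("world")   # .write keeps working alongside +=
result = buf.getvalue()
```

Since __iadd__ returns self, += and .write can be mixed freely on the same buffer.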

> but using += instead of writelines [[actually, how WOULD you express the
> writelines equivalent???]] or abrogating plain-Python StringIO would not
> speed up the cStringIO use (which is already just as fast as the ''.join
> use).

''.join with a list (rather than a generator) arg may be plain worse
than Python StringIO.  Imagine building up a megabyte string one
character at a time, which means making a million-element list and a
million temporary one-character strings before joining them.
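
The two approaches being compared look roughly like this. It is a sketch using Python 3's io.StringIO, with made-up helper names, and it makes no timing claim; it only shows that the join version holds every one-character piece in a list until the end, while the buffer version hands each piece over as it is produced.

```python
import io


def one_char_pieces(n):
    # Hypothetical producer: yields n one-character strings.
    for i in range(n):
        yield chr(ord("a") + i % 26)


def build_with_join(chars):
    # Keeps every piece alive in a list until the final join.
    pieces = []
    for ch in chars:
        pieces.append(ch)
    return "".join(pieces)


def build_with_stringio(chars):
    # Writes each piece into the buffer as it arrives.
    buf = io.StringIO()
    for ch in chars:
        buf.write(ch)
    return buf.getvalue()


N = 100000
joined = build_with_join(one_char_pieces(N))
streamed = build_with_stringio(one_char_pieces(N))
```

Measuring which one wins on a given interpreter is a job for timeit.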


