StringIO proposal: add __iadd__

Alex Martelli aleax at mail.comcast.net
Sun Jan 29 19:56:22 EST 2006


Paul Rubin <http://phr.cx@NOSPAM.invalid> wrote:
   ...
> > Absolutely wrong: ''.join takes less time for a million items than
> > StringIO takes for 100,000.
> 
> That depends on how much RAM you have.  You could try a billion items.

Let's see you try it -- I have better things to do than to thrash around
checking assertions which I believe are false and which you're too lazy
to check yourself.
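(If anybody else wants to check, the timing is only a few lines -- here
is a rough sketch; the helper names and counts are just for this
example, and absolute numbers will of course vary with machine and
Python version:)

  import time
  import cStringIO

  def with_join(n):
      # build one big string from n one-character pieces
      return ''.join(['x'] * n)

  def with_stringio(n):
      sio = cStringIO.StringIO()
      for i in xrange(n):
          sio.write('x')
      return sio.getvalue()

  for func, n in [(with_join, 10**6), (with_stringio, 10**5)]:
      start = time.time()
      func(n)
      print func.__name__, n, '%.3f sec' % (time.time() - start)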

> > After all, how do you think StringIO is implemented internally?  A list
> > of strings and a ''.join at the end are the best way that comes to mind,
> 
> I'd have used the array module.

...and would that support plain byte strings and Unicode smoothly and
polymorphically?  You may recall, a few posts ago, expressing wonder at
what optimizations cStringIO might have that stop it from doing just
that...
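To make the list-plus-join idea concrete, here's a minimal sketch (mine,
not the stdlib's actual StringIO source) of a writer built on exactly
that approach -- since it leans only on list.append and ''.join, it
handles str and unicode alike:

  class SimpleStringIO(object):
      def __init__(self):
          self.parts = []            # accumulated fragments

      def write(self, s):
          self.parts.append(s)       # amortized O(1), no copying yet

      def getvalue(self):
          # str.join returns unicode if any fragment is unicode,
          # so str and unicode mix transparently here
          return ''.join(self.parts)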

> > As for sum, you'll recall I was its original proponent, and my first
> > implementation did specialcase strings (delegating right to ''.join).
> 
> You could imagine a really dumb implementation of ''.join that used
> a quadratic algorithm, and in fact
> 
>   http://docs.python.org/lib/string-methods.html
> 
> doesn't guarantee that join is linear.  Therefore, the whole ''.join
> idiom revolves around the programmer knowing some undocumented
> behavior of the implementation (i.e. that ''.join is optimized).  This

No more than StringIO.write "revolves around" the programmer knowing
exactly the same thing about the optimizations in StringIO: semantics
are guaranteed, performance characteristics are not.

> reliance on undocumented behavior seems totally bogus to me, but if

So I assume you won't be using StringIO.write any more, nor ANY other
way to join sequences of strings?  Because the performance of ALL of
them depends on such "undocumented behavior".

Personally, I don't consider depending on "undocumented behavior" *for
speed* to be bogus at all, particularly when there are no approaches
whose performance characteristics ARE documented and guaranteed.
Besides C++'s standard library, very few languages like to pin
themselves down by making any performance guarantees;-).


> How about making [].join(bunch_of_lists) analogous to ''.join, with a
> documented guarantee that both are linear?

I personally have no objection to adding a join method to lists or other
sequences, but of course the semantics should be similar to:

  def join(self, others):
      # 'others' is an iterable of sequences, as with str.join;
      # self is interleaved between them as the separator
      others = list(others)
      result = list()
      for other in others[:-1]:
          result.extend(other)
          result.extend(self)
      if others:
          result.extend(others[-1])
      return self.__class__(result)
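Grafted onto a list subclass just for illustration, those semantics
would give, e.g.:

  class L(list):
      join = join                  # attach the sketch above

  print L([0]).join([[1, 2], [3, 4], [5, 6]])
  # -> [1, 2, 0, 3, 4, 0, 5, 6]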

As for performance guarantees, I don't think we have them now even for
list.append, list.extend, dict.__getitem__, and other similarly
fundamental methods.  I assume any such guarantees would have to
weaselword regarding costs of memory allocation (including, possibly,
garbage collection), since such allocation may of course be needed and
its performance can easily be out of Python's control; and similarly
for the costs of iterating over the items of 'others', of indexing it,
and so on (e.g., for list.sort, the cost of comparisons; for
dict.__getitem__, the cost of hashing the key; and so on, and so forth).
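To illustrate that last point with a toy example: even a nominally
constant-time dict lookup is entirely at the mercy of the key's own
__hash__, which is in user hands:

  import time

  class SlowKey(object):
      def __init__(self, v):
          self.v = v
      def __hash__(self):
          time.sleep(0.1)          # deliberately expensive hash
          return hash(self.v)
      def __eq__(self, other):
          return self.v == other.v

  d = {SlowKey(1): 'marker'}
  # every lookup pays for __hash__ in full, so no guarantee about
  # dict.__getitem__ can abstract away the key's hashing cost
  print d[SlowKey(1)]              # takes at least 0.1 seconds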

I don't think it's worth my time doing weaselwording for this purpose,
but if any otherwise-idle sea-lawyers want to volunteer (starting with
the existing methods of existing built-in types, I assume, rather than
by adding others), the offer might be welcome on python-dev (I assume
that a large effort will have to be devoted to examining the actual
performance characteristics of at least the reference implementation, in
order to prove that the purported guarantees are indeed met).


Alex


