[pypy-dev] [Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

Steven D'Aprano steve at pearwood.info
Mon Mar 30 23:10:32 EDT 2020

On Mon, Mar 30, 2020 at 12:37:48PM -0700, Andrew Barnert via Python-ideas wrote:
> On Mar 30, 2020, at 12:00, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> > Roughly speaking, to support efficient appending, one needs to
> > be ready to over-allocate string storage, and maintain bookkeeping for
> > this. Another known optimization CPython does is for stuff like "s =
> > s[off:]", which requires maintaining another "offset" pointer. Even
> > with this simplistic consideration, internal structure of "str" would
> > be about the same as "io.StringIO" (which also needs to over-allocate
> > and maintain "current offset" pointer). But why, if there's io.StringIO
> > in the first place?
> Because io.StringIO does _not_ need to do that.

The same comment can be made that str does not need to implement the 
in-place concat optimization either. And yet it does, in CPython if not 
any other interpreter.
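
For concreteness, here is a minimal sketch of the three idioms being compared. The function names are made up for illustration. Only the first depends on the CPython-specific in-place concat optimization to avoid quadratic behaviour; the other two should behave reasonably on any interpreter.

```python
import io

def build_with_str(parts):
    # Repeated += on a str: fast on CPython thanks to the in-place
    # concat optimization, potentially quadratic elsewhere.
    out = ""
    for p in parts:
        out += p
    return out

def build_with_stringio(parts):
    # The portable spelling available today: write() into a buffer.
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()

def build_with_join(parts):
    # The other commonly recommended idiom: collect, then join once.
    return "".join(parts)
```

All three return the same string for the same input; they differ only in how the cost of building it scales.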

It seems to me that Paul makes a good case that, unlike the string 
concat optimization, just about every interpreter could add this to 
StringIO without difficulty or great cost. Perhaps they could even get 
together and agree to all do so.
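
To give a rough idea of how small the change might be, here is a sketch in pure Python. The subclass name is hypothetical, and I am assuming that += would simply behave like write() at the current position; the actual proposal may differ in detail.

```python
import io

class AppendableStringIO(io.StringIO):
    """Hypothetical subclass where `buf += s` behaves like buf.write(s)."""

    def __iadd__(self, other):
        if not isinstance(other, str):
            return NotImplemented
        # Assumption: += writes at the current position, which is the
        # end of the buffer as long as nobody has called seek().
        self.write(other)
        return self

buf = AppendableStringIO()
for word in ("spam", "eggs", "ham"):
    buf += word
result = buf.getvalue()
```

An interpreter would presumably do the equivalent inside its own StringIO implementation rather than in a subclass, but the bookkeeping involved is the same as for write().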

But unless CPython does so too, it won't do them much good, because 
hardly anyone will take advantage of it. When one platform dominates 90% 
of the ecosystem, one can sensibly write code that depends on that 
platform's specific optimizations, but going the other way, not so much.

The question that comes to my mind is not whether StringIO *needs* to do 
this, but whether there is any significant cost to doing so.

Of course there is *some* cost: somebody has to do the work, and it 
won't be me. But once done, is there any significant maintenance cost 
beyond what there would be without it? Is there any downside?

> And it doesn’t allow you to do random-access seeks to arbitrary 
> character positions.

Sorry, I don't see why random access to arbitrary positions is relevant 
to a discussion about concatenation. What am I missing?

