[pypy-dev] [Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)
steve at pearwood.info
Mon Mar 30 23:10:32 EDT 2020
On Mon, Mar 30, 2020 at 12:37:48PM -0700, Andrew Barnert via Python-ideas wrote:
> On Mar 30, 2020, at 12:00, Paul Sokolovsky <pmiscml at gmail.com> wrote:
> > Roughly speaking, to support efficient appending, one needs to
> > be ready to over-allocate string storage, and maintain bookkeeping for
> > this. Another known optimization CPython does is for stuff like "s =
> > s[off:]", which requires maintaining another "offset" pointer. Even
> > with this simplistic consideration, internal structure of "str" would
> > be about the same as "io.StringIO" (which also needs to over-allocate
> > and maintain "current offset" pointer). But why, if there's io.StringIO
> > in the first place?
> Because io.StringIO does _not_ need to do that.
The same comment can be made that str does not need to implement the
in-place concat optimization either. And yet it does, in CPython if not
any other interpreter.
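
For anyone following along, here is a minimal sketch of the three common
ways to build up a string piece by piece. The function names are made up
for illustration. All three give the same result; what differs is how
much their speed depends on the interpreter. As far as I know, CPython
can sometimes resize the left operand of += in place when nothing else
holds a reference to it, but that is an implementation detail and not a
language guarantee.

```python
import io

def build_concat(parts):
    # Repeated +=. May be fast on CPython thanks to the in-place
    # optimization; other interpreters may copy the string each time.
    s = ""
    for p in parts:
        s += p
    return s

def build_stringio(parts):
    # Write into an in-memory text buffer, then extract the result once.
    buf = io.StringIO()
    for p in parts:
        buf.write(p)
    return buf.getvalue()

def build_join(parts):
    # The portable idiom: collect the pieces and join them at the end.
    return "".join(parts)

parts = [str(i) for i in range(1000)]
assert build_concat(parts) == build_stringio(parts) == build_join(parts)
```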
It seems to me that Paul makes a good case that, unlike the string
concat optimization, just about every interpreter could add this to
StringIO without difficulty or great cost. Perhaps they could even get
together and agree to all do so.
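
To make the idea concrete, here is a rough sketch of what += on a
StringIO might look like from the user's side. This is only my guess at
the semantics: the class name AppendableStringIO is hypothetical, nothing
like it exists in the stdlib, and a real implementation would presumably
live inside io.StringIO itself rather than in a subclass.

```python
import io

class AppendableStringIO(io.StringIO):
    """Hypothetical StringIO with +=; not part of the standard library."""

    def __iadd__(self, text):
        # Assumption: += always appends at the end of the buffer,
        # regardless of the current stream position.
        self.seek(0, io.SEEK_END)
        self.write(text)
        return self

buf = AppendableStringIO()
for word in ("spam", " ", "eggs"):
    buf += word
result = buf.getvalue()
assert result == "spam eggs"
```

Returning self from __iadd__ is what keeps the name bound to the same
buffer object, so no new object is created on each append.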
But unless CPython does so too, it won't do them much good, because
hardly anyone will take advantage of it. When one platform dominates 90%
of the ecosystem, one can sensibly write code that depends on that
platform's specific optimizations, but going the other way, not so much.
The question that comes to my mind is not whether StringIO *needs* to do
this, but whether there is any significant cost to doing it.
Of course there is *some* cost: somebody has to do the work, and it
won't be me. But once done, is there any significant maintenance cost
beyond what there would be without it? Is there any downside?
> And it doesn’t allow you to do random-access seeks to arbitrary
> character positions.
Sorry, I don't see why random access to arbitrary positions is relevant
to a discussion about concatenation. What am I missing?