On Mar 30, 2020, at 12:00, Paul Sokolovsky <pmiscml@gmail.com> wrote:

Roughly speaking, to support efficient appending, one needs to
be ready to over-allocate string storage and maintain bookkeeping for
this. Another known optimization CPython does is for stuff like "s =
s[off:]", which requires maintaining another "offset" pointer. Even
with this simplistic consideration, the internal structure of "str" would
be about the same as "io.StringIO" (which also needs to over-allocate
and maintain a "current offset" pointer). But why, if there's io.StringIO
in the first place?

Because io.StringIO does _not_ need to do that. It’s documented to act like a TextIOWrapper around a BytesIO. And the pure-Python implementation (as used by some non-CPython implementations of Python) is actually implemented that way: https://github.com/python/cpython/blob/3.8/Lib/_pyio.py#L2637. Every read and write to a StringIO passes through the incremental newline processor and the incremental UTF-8 codec on its way to a BytesIO. That’s not remotely optimal. And it doesn’t allow you to do random-access seeks to arbitrary character positions.
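To make the layering concrete, here's a minimal sketch that builds the same kind of stack by hand. The constructor arguments (encoding="utf-8", errors="surrogatepass", newline="\n") are my assumption of roughly what _pyio.StringIO passes to TextIOWrapper; make_layered_stringio is just an illustrative name, not anything in the stdlib.

```python
import io

def make_layered_stringio() -> io.TextIOWrapper:
    # Hypothetical helper: an assumed approximation of _pyio.StringIO, i.e. a
    # text layer (newline handling plus an incremental UTF-8 codec) sitting on
    # top of an in-memory byte buffer.
    return io.TextIOWrapper(
        io.BytesIO(), encoding="utf-8", errors="surrogatepass", newline="\n"
    )

layered = make_layered_stringio()
layered.write("h\u00e9llo")        # 5 characters, one of them non-ASCII
layered.flush()                     # push pending text through the encoder
encoded = layered.buffer.getvalue() # what actually got stored: UTF-8 bytes
cookie = layered.tell()             # opaque position cookie, not a character index

layered.seek(0)
roundtrip = layered.read()          # decoded back out of the UTF-8 bytes

native = io.StringIO()
native.write("h\u00e9llo")
native_pos = native.tell()          # CPython's C StringIO reports a character index
```

On CPython, encoded comes out as b'h\xc3\xa9llo' and cookie as 6 (it happens to be the byte offset), while native_pos is 5. That gap is exactly why you can't treat the layered version's positions as character indices.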

It’s true that the C accelerator for io.StringIO used by CPython uses a dynamic overallocated array of UCS4 instead, but you can’t rely on that portably any more than you can rely on CPython’s str.__iadd__ optimization portably. Plus, it’s optimized for typical file-like usage, not for typical string-like usage, so the resize rules aren’t the same; there’s no attempt to optimize storage for all-Latin-1 or all-BMP text; and so on. Plus, it still has to deal with file-ish things like universal newline support, which you not only don’t need but explicitly don’t want.

(*) Instead, there are various practical hacks to implement it, as
both the 2006 thread and this one show.

No, there is one idiomatic way to do it: create a list of strings and join them. That’s not a “hack” any more than using a string builder class or a string stream/file class is a “hack”. The fact that the standard Python idiom, the standard Java idiom, and the standard C++ idiom for building strings are all different is not a defect in any of those three languages; they’re all perfectly reasonable. And changing Python to have two standard idioms instead of one (with the new one less efficient and more complicated) would not be an improvement.
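For reference, a minimal sketch of the list-and-join idiom, next to the string-stream alternative for comparison. The function names and the numbered-list output format are purely illustrative.

```python
import io

def render_with_join(items):
    # Idiomatic Python: collect the pieces in a list, then join once at the end.
    parts = []
    for n, item in enumerate(items, start=1):
        parts.append(f"{n}. {item}\n")
    return "".join(parts)

def render_with_stringio(items):
    # The file-like alternative: write pieces to an in-memory text stream.
    buf = io.StringIO()
    for n, item in enumerate(items, start=1):
        buf.write(f"{n}. {item}\n")
    return buf.getvalue()
```

Both produce the same string, e.g. render_with_join(["spam", "eggs"]) gives "1. spam\n2. eggs\n". The join version relies on nothing implementation-specific: each append is amortized O(1) on the list, and the final join sizes its result once.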