Re: Explicitly defining a string buffer object (aka StringIO += operator)

On Tue, Mar 31, 2020 at 12:39:56AM -0700, Christopher Barker wrote:
Indeed it would, and if I ask for a nutcracker to crack open peanuts, a nuclear-powered 200-tonne bulldozer would satisfy the requirements too.

What Paul asked for: https://media.qcsupply.com/media/catalog/product/cache/122b61bfb663175d7f1bb...

What we're talking about instead: http://media.firebox.com/pic/p1861_column_grid_12.jpg

If you want a mutable string type, you have to support a rather extensive string API that includes at least ten operators: + * % == != in < <= >= > plus slicing and about 45 methods. (There may be some things I have missed.) Paul asked for *one* operator, `+=`.
Yes, rather like the way you have to call `''.join(buf)` if you use a list. If your point is a criticism of the StringIO proposal, it is equally a criticism of the recommended list+join idiom; but we still tell people to use list+join, so it can't be a very important issue. And if it's not a valid criticism of the list+join idiom, or at least only a very minor, unimportant one, then precisely the same applies to StringIO.
So how exactly does this meet the use case of being able to drop it into code that’s written to use strings?
This has been covered at least twice, once by Paul and once by me. When you are refactoring the "string concatenation" idiom to list append, you have to change three things:

1. the buffer initialisation: `buf = ''` --> `buf = []`;
2. add a conversion at the end: `result = ''.join(buf)`;
3. and potentially dozens of instances of `buf += s` to `buf.append(s)`.

The third may be scattered around multiple functions. It is not always an easy refactoring. We should not assume that every append to the buffer will be in a single place or even a single function.

Under Paul's suggestion, you change the buffer initialisation, add the conversion, and nothing else needs to be touched. You also get a performance boost (by my testing, using StringIO is about halfway in speed between string concat and list+join), and potentially a memory saving relative to the list version. (Although Paul's calculations on that have been disputed, and I don't think we have a definite answer on that one just yet.)
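A minimal sketch of the refactorings under discussion (the function and variable names are mine, not from the thread; the third version uses today's `write()` spelling, since the proposed `+=` support doesn't exist):

```python
from io import StringIO

# Before: the plain string-concatenation idiom.
def render_concat(items):
    buf = ''
    for item in items:
        buf += f'{item}\n'       # quadratic in the worst case
    return buf

# list+join refactoring: every append site must change (step 3).
def render_list(items):
    buf = []                     # step 1: initialisation changes
    for item in items:
        buf.append(f'{item}\n')  # step 3: each `buf += s` becomes `buf.append(s)`
    return ''.join(buf)          # step 2: conversion added at the end

# The StringIO version: under the proposal, only the two ends would
# change and `buf += s` could stay; today each site needs .write().
def render_builder(items):
    buf = StringIO()
    for item in items:
        buf.write(f'{item}\n')
    return buf.getvalue()
```

All three produce identical output; the difference is only in how many call sites the refactoring touches.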
BTW, what about folks that concatenate strings with plain + ? You couldn’t drop StringIO in there reasonably either.
Do you mean people who write `buf = buf + substring`? Maybe that's an argument to support `+` as well. Perhaps it would be weird and surprising to support augmented assignment for an operator without supporting the operator itself? But I don't have a strong feeling either way. Or maybe that's just an argument that no solution is going to solve *every* problem. What do we do about people who write this inside a loop:

    buf = f'{buf}{substring}'

We can't fix everyone's code with one change.
It's not really. The point is to minimise how many things need to change. Instead of changing every `buf += s` into `buf.append(s)`, you just leave them alone. In that regard, it's *the same way* of doing it; the only difference is that the buffer changes from an immutable string to a builder. We should be careful to avoid exaggerating tiny differences, especially when that involves a double standard. Is it a problem for the recommended list+join idiom that it is "a whole other way of doing it"? If no, then it's not a problem for StringIO either. If the answer is yes, then StringIO is no worse than what people are already told to do, so it's not really much of a problem.
And the list and join() method uses two of the most common builtins— there is a real advantage to that.
Again, we must beware of double standards. This list often says: "Not everything needs to be a builtin, just put it in a module." Also this list: "Being a builtin is a real advantage, we can't use StringIO because it comes from a module that needs importing." We can't have it both ways. In this case, using StringIO does involve one extra import. Okay, that's fine; the standard library is full of things that require one extra import but nevertheless are a common, or even sometimes recommended, way to do it. itertools comes to mind especially: using iterators in Python is ubiquitous, and yet we still require people to import a module to do something as fundamental as slice an iterator. But maybe we can consider an alternative that doesn't require that import.
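The iterator-slicing point above refers to `itertools.islice`; a brief illustration of the "one extra import" that nobody objects to:

```python
from itertools import islice

# Generators don't support subscripting, so `gen[:3]` raises TypeError;
# slicing one requires importing itertools, and that is considered fine.
gen = (n * n for n in range(10))
first_three = list(islice(gen, 3))
print(first_three)  # [0, 1, 4]
```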
Anyway, I’m not proposing a mutable string type. It was really just an example of: if this was really needed, it would have been done.
Ah, so by that argument, *every* new proposal can be immediately rejected. If this were really needed, it would have already been done. Python 3.8 is the optimal language with literally every single desirable feature, nothing to be added, and no unneeded cruft to be removed. I trust that's not what you intended to say :-) -- Steven

On Thu, Apr 2, 2020 at 3:27 PM Steven D'Aprano <steve@pearwood.info> wrote:
I don't know whether your point was that this is bad code and can't be optimized, or that it's good code but still can't be optimized by this proposal. But if the former, then I put it to you that this isn't actually bad code.

    text = ""
    for thing in stuff:
        # Option 1:
        text += f"{thing.id}: {thing.name} ({thing.cat})\n"
        # Option 2:
        text = f"{text}{thing.id}: {thing.name} ({thing.cat})\n"

Which is going to be (a) faster, and (b) more memory-efficient? What if you change interpreters? Does it make a difference whether the amount added per iteration is large or small? What if most of the content is ASCII but there's one single non-ASCII character somewhere? It's entirely viable for both forms to exist in the wild, and to be justifiable. If your argument was that this code is perfectly fine and there's just too many ways to write good code and we can't hope to optimize them all, then I apologize, and this post is irrelevant :) ChrisA

On Thu, Apr 02, 2020 at 03:37:34PM +1100, Chris Angelico wrote:
Neither. It's that anyone building a string like that isn't the target of this proposal. I could have given various examples, I just happened to pick an f-string:

    buf = '%s%s' % (buf, substring)
    buf = '{}{}'.format(buf, substring)
    buf = ''.join(itertools.chain(buf, substring))

Put any of them into a loop, and they are likely to be exceedingly slow for large N. But fixing them is not part of Paul's proposal.
But if the former, then I put it to you that this isn't actually bad code.
Repeated string concatenation doesn't suddenly become efficient just because you wave a magic f-string at it. That appends substring to the buffer each time through the loop, giving quadratic performance. On my computer, appending a single character 'a' to the buffer each time, I get:

    10_000 loops:   0.4 seconds (actual time)
    50_000 loops:   9.8 seconds  # expect 5*0.4 = 2 seconds
    100_000 loops:   39 seconds  # expect 10*0.4 = 4 seconds
    200_000 loops:  160 seconds  # expect 20*0.4 = 8 seconds

The expected times assume that the time is proportional to the number of elements, i.e. O(N). If we instead assume O(N**2), the expected times would be 10, 40 and 160 seconds, so this is a textbook example of quadratic slowdown. (At least on my computer.) -- Steven
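A minimal harness along these lines reproduces the effect (this is my sketch, not Steven's actual script, and the loop counts are reduced so it finishes quickly; exact timings will vary by machine and interpreter):

```python
import time

def time_fstring_concat(n):
    """Time building an n-character string via f-string rebinding."""
    start = time.perf_counter()
    buf = ''
    for _ in range(n):
        # Rebinding via an f-string builds a brand-new string each
        # iteration, copying the whole buffer, so the total work
        # grows roughly as O(n**2). CPython's in-place `+=` refcount
        # optimisation does not apply to this form.
        buf = f'{buf}a'
    return time.perf_counter() - start

for n in (10_000, 20_000, 40_000):
    print(f'{n:>6} loops: {time_fstring_concat(n):.3f} seconds')
```

Doubling n should roughly quadruple the reported time if the behaviour is quadratic.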

On Thu, Apr 2, 2020 at 5:58 PM Steven D'Aprano <steve@pearwood.info> wrote:
And they'd all be equivalent to the consideration I'm talking about (it's not f-string specific).
This is true; however, they're all optimizable in different ways. CPython happens to have an optimization for the += case, but it's equally possible for some other form to have an optimization.
Yep. Because CPython doesn't optimize any of them. So it's very interpreter-specific to take advantage of +=, and would be identically interpreter-specific to take advantage of any other form. This proposal continues to favour the += spelling by making it higher performing. ChrisA