[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

April 2, 2020

      On Wed, Apr 01, 2020 at 09:25:46PM -0400, Kyle Stanley wrote:
...
> While I agree that it's sometimes okay to go outside the strict bounds of
> "only one way to do it"
The Zen of Python was invented as a joke, not holy writ, and as a series 
of koans intended to guide thought, not shut it down. Unfortunately, and 
with the greatest respect to Tim Peters, in practice that's not how it 
is used, *particularly* the "One Way" koan, which is almost invariably 
used as a thought-terminating cliche.

1. The Zen doesn't mandate *only one way*; that is a total canard about 
Python invented by the Perl community as a criticism.

2. Even if it did say "only one way", even a moment's glance at the 
language would show that it is not true. 

And moreover it *cannot* be true in any programming language. Given any 
task but the most basic, there will always be multiple possible 
implementations or algorithms, usually an *infinite* number of ways to 
do most things. (Not all of which will be efficient or sensible.)

3. Of all the koans in the Zen, the "One Way" koan is probably intended 
the most to be an ironic joke, not taken too seriously. Instead the 
Python community treats it as the most serious of all.

In Tim Peters' own words:

    In writing a line about "only one way to do it", I used a device (em 
    dash) for which at least two ways to do it (with spaces, without 
    spaces) are commonly used, neither of which is obvious -- and 
    deliberately picked a third way just to rub it in.

https://bugs.python.org/issue3364

Let's look at what the koan actually says:

There should be one-- and preferably only one --obvious way to do it.

Adding emphasis:

"There SHOULD BE ONE OBVIOUS WAY to do it."

with only a *preference* for one way, not a hard rule. And given that 
Tim wrote it as a joke, having the koan intentionally go against its own 
advice, I think we should treat that preference as pretty low.

So... what is "it", and what counts as "obvious"? This is where the koan 
is supposed to open our minds to new ideas, not shoot them down.

In this case, "it" can be:

1. I want to build a string as efficiently as possible.

2. I want to build a string in as easy and obvious a way as possible.

(There may be other "its", but these are the two that stand out.)

For option 1, there is one recommended way (which may or may not be the 
most efficient way -- that's a quality of implementation detail): use 
list plus join. But it's not "obvious" until you have been immersed in 
Python culture for a long time.
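
As a rough sketch of the list+join idiom (the function name 
`build_with_join` and the sample data are made up for illustration):

```python
def build_with_join(parts):
    """Collect the pieces in a list, then join them once at the end."""
    chunks = []
    for part in parts:
        chunks.append(str(part))  # accumulate substrings; no string is built yet
    return "".join(chunks)        # a single join builds the final string


result = build_with_join(["spam", "eggs", "ham"])
```

Note that `join` is called on the separator string (here the empty 
string), not on the list.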

For option 1, Paul's proposal changes nothing. If list+join is the 
fastest and most efficient method (I shall grant this for the sake of 
the argument) then nothing need change. Keep doing what you are doing.

The koan isn't satisfied in this case: there is One Way but it isn't 
Obvious. But Paul's proposal is not about fixing that.

-----

For option 2, "it" cares more about readable, self-documenting code 
which is clear and obvious to more than just Pythonistas who have been 
immersed in the language for years. The beauty of Python is that it 
ought to be readable by everyone, including scientists and hobbyists who 
use the language from time to time, students, sys admins, and coders 
from other languages.

Ask a beginner, or someone who has immigrated from another language, 
what the obvious way to build a string is, and very few of them will say 
"build a list, then call a string method to join the list".

Some of them might guess that they need to build a list, then call a 
*list* method to build a string: `list.join('')`. Why Python doesn't do 
that is even a FAQ.

Beginners will probably say "add the strings together". People coming 
from other OOP languages will probably say "Use a String Builder", and 
possibly even stumble across StringIO as the closest thing to a builder. 
It's a bit odd that you have to call "write", but it builds a string out 
of substrings.
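
For illustration, using StringIO as a builder today might look like the 
following minimal sketch. `io.StringIO`, `write` and `getvalue` are the 
existing stdlib API; the function name `build_with_stringio` and the 
sample data are made up for the example:

```python
import io


def build_with_stringio(parts):
    """Build a string by writing substrings into an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)      # append each substring to the buffer
    return buf.getvalue()    # retrieve everything written so far


result = build_with_stringio(["spam", "eggs", "ham"])
```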

(Later, in another post, I will give evidence that StringIO is already 
used as a string builder, and has been for a long time.)

A significant sector of the community know the list+join idiom, but 
dislike it so strongly that they are willing to give up some efficiency 
to avoid it.

Whatever the cause, there is a significant segment of the Python 
community who either don't know, don't care about, or actively dislike, 
the list+join idiom. For them, it is not Obvious and never will be; the 
Obvious Way is to concatenate strings into a String Builder or a bare 
string.

This segment, the people who use string concatenation and either don't 
know better, don't care to change, or actively refuse to change, is the 
focus of this proposal. For this segment, the One Obvious Way is to 
concatenate strings using `+=`, and they aren't going to change for the 
sake of other interpreters.
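
The pattern in question is roughly the following (the function name 
`build_with_concat` is made up for illustration). CPython can often 
make repeated `+=` on a string cheap, but that is an implementation 
detail which other interpreters may not share:

```python
def build_with_concat(parts):
    """Build a string by repeated concatenation onto a bare string."""
    result = ""
    for part in parts:
        result += part   # may copy the whole string each time on some interpreters
    return result


result = build_with_concat(["spam", "eggs", "ham"])
```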

And that's a problem for other interpreters. Hence Paul's RFC.

[...]
...
> Regarding the proposal in general though, I actually like the main idea of
> having "StringBuffer/StringBuilder"-like behavior, *assuming* it provides
> substantial benefits to alternative Python implementations compared to
> ``"".join()``. As someone who regularly uses other languages with something
> similar, I find the syntax to be appealing, but not strong enough on its
> own to justify a stdlib version (mainly since a wrapper would be very
> trivial to implement).
Surely the fact that the wrapper is "trivial" should count as a point in 
its favour, not against it?
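
To give a sense of scale, a pure-Python wrapper of the kind being 
discussed could be sketched as below. The class name `StringBuilder` 
and its methods are hypothetical, chosen only for this example; this is 
not the proposed implementation and not an existing stdlib class:

```python
import io


class StringBuilder:
    """Hypothetical wrapper around io.StringIO that supports `+=`."""

    def __init__(self, initial=""):
        self._buf = io.StringIO()
        self._buf.write(initial)

    def __iadd__(self, text):
        # `sb += text` appends to the underlying buffer in place
        self._buf.write(text)
        return self

    def __str__(self):
        # str(sb) returns everything appended so far
        return self._buf.getvalue()


sb = StringBuilder("spam")
sb += "eggs"
sb += "ham"
result = str(sb)
```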

The greater the burden of an enhancement request, the greater the 
benefit it must give to justify it. If your enhancement requires a 
complete overhaul of the entire language and interpreter and will 
obsolete vast swathes of documentation, the benefit has to be very 
compelling to justify it.

But if your enhancement requires an extra dozen lines of C code, one or 
two tests, and an extra couple of lines of documentation, the benefit 
can be correspondingly smaller in order for the cost:benefit ratio to 
come up in its favour.

The cost here is tiny. This thread alone has probably exceeded by a 
factor of 100 the cost of implementing the change. The benefit to 
CPython is probably small, but to the broader Python ecosystem (Paul 
mentioned nine other interpreters, I can think of at least two actively 
maintained ones that he missed) it is rather larger.
...
> But, I'm against the idea of adding this to the existing StringIO class,
> largely for the reasons cited above, of it being outside of the scope of
> its intended use case. There's also a significant discoverability factor to
> consider. Based on the name and its use case in existing versions of
> Python, I don't think a substantial number of users will even consider
> using it for the purpose of building strings. As it stands, the only people
> who could end up benefiting from it would be the alternative
> implementations and their users, assuming they spend time *actively
> searching* for a way to build strings with reduced memory usage. So I would
> greatly prefer to see it as a separate class with a more informative name,
> even if it ends up being effectively implemented as a subset of StringIO
> with much of the same logic.
As I mentioned above, in another post to follow I will demonstrate that 
people already do know and use StringIO for concatenation.

Nevertheless, you do make a good point. It may be that StringIO is not 
the right place for this. That can be debated without dismissing the 
entire idea.

-- 
Steven

Steven D'Aprano