[Python-ideas] Re: Explicitly defining a string buffer object (aka StringIO += operator)

April 2, 2020

Hello,

On Wed, 1 Apr 2020 21:25:46 -0400
Kyle Stanley <aeros167@gmail.com> wrote:
...
Paul Sokolovsky wrote:
...
Roughly speaking, the answer would be about the same in idea as
answers to the following questions:
[snip]
I would say the difference between this proposal so far and the ones
listed are that they emphasized concrete, real-world examples from
Well, but those are "done" changes which were backed by official PEPs
(except for unary +, which hopefully has been there forever). While I
kind of tried to flex my arms at what it would take to write a PEP-like
text, it is certainly nowhere near there after my spending a couple of
hours on it, and collecting more evidence would take more time.
...
existing code either in the stdlib or "out in the wild", showing
I would hardly target the CPython stdlib at this stage, given the
feedback that "".join() is the fastest, and that the CPython
implementation clearly optimizes for "speed where it can be gotten with
whatever we have on our hands (which isn't much due to the lack of a
JIT), even if those are tricks".

I might be able to show "out in the wild" code (which happens to be the
stdlib of another Python implementation); it just needs to be properly
refactored from the .write() approach I partially succumbed to earlier.

[]
...
...
Please let people learn
computer science inside Python, not learn bag of tricks to then
escape in awe and make up haikus along the lines of:
A language, originally for kids,
Now for grown-up noobs.
Considering the current widespread usage of Python in the software
development industry and others, characterizing it as a language for
"grown-up noobs" seems rather disingenuous (even if partially in
I do hope that you and other readers trust me that I picked up that
"haiku" somewhere and did not make it up here on the spot. Otherwise, I
do spend a lot of time studying criticism of Python, and keep an eye on
other languages too, because I see a clear pattern of people
abandoning advanced Python projects (compilers, JITs, etc.) and moving
to other languages. And I always have that feeling at the back of my
mind that maybe I'm wasting my time too and should just jump into those
goes, julias, rusts, haskells, etc. But so far I keep seeing Python as
the best - not the best language, but the best-compromise language.

[]
...
Also, while I can see that blindly relying on "str += part" can be
sidestepping the underlying computer science to some degree, I find
that appending the parts to a list and joining the elements is very
conceptually similar to using a string buffer/builder; even if the
syntax differs significantly from how other languages do it.
Don't get me wrong - I love the l.append/"".join(l) pattern. To me, it
looks like a twisted mirror of LISP's CONS function. But that was a
language where CONS was the only way to build a container! And Python
even lacks a linked list/cons in the first place. Bottom line: I see
myself using l.append/"".join(l) about as frequently as I use cons
(which is rarely).
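For readers comparing the two idioms discussed above, here is a minimal
sketch that builds the same string both ways. The function names are
made up for illustration and are not taken from the thread or from any
library; only list.append, str.join and io.StringIO are real APIs.

```python
import io

def build_with_join(parts):
    """Collect the pieces in a list, then join them once at the end."""
    chunks = []
    for part in parts:
        chunks.append(part)
    return "".join(chunks)

def build_with_stringio(parts):
    """Write the pieces into an in-memory text buffer."""
    buf = io.StringIO()
    for part in parts:
        buf.write(part)
    return buf.getvalue()

pieces = ["spam", " ", "eggs"]
joined = build_with_join(pieces)
buffered = build_with_stringio(pieces)
```

Both functions return the same string; the difference under discussion
is how the intermediate pieces are held, not the result.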

[]
...
But, I'm against the idea of adding this to the existing StringIO
class,
That's quite expected feedback; I foresaw it and mentioned it in the
"Further Ideas (aka Scope Creep)" section of the original RFC. For a
compiled language, a separate class would be a natural choice (if you
don't use it, you don't get it in your binary), but interpreted
languages have the implication, surprising to some, that adding more
stuff burdens everyone. Where I come from (implementing a language - a
small subset of Python), adding more and more stuff is definitely an
anti-pattern.

So, my interest lies in finding ways to extend already available
functionality in a *natural way* (subject to debate) to cover more
interesting use cases.

To not raise any worry, let me give an example of what I consider a
"natural" and an "unnatural" way. In a language which already has an
OrderedDict type, I would never ever have "extended" the dict type,
which corresponds to the Computer Science notion of an unordered hash
table, to be ordered as well (as that is already handled by
OrderedDict).

[]
...
Also, on the point of memory usage: I'd very much like to see some
real side-by-side comparisons of the ``''.join(parts)`` memory usage
across Python implementations compared to ``StringIO.write()``. I
saw some earlier in the thread, but the results were inaccurate since
they relied entirely on ``sys.getsizeof()``, as mentioned earlier.
IMO, having accurate memory benchmarks is critical to this proposal.
As Chris Angelico mentioned, this can be observed through monitoring
the before and after RSS (or equivalent on platforms without it). On
I would still find that too crude an approach. If it came to that, I
would prefer to actually study the internal implementation(s) in
detail, and patch up sys.getsizeof() to provide actual information. As
you may imagine, that's time-consuming, and would be "too early" (if
needed at all), given that the discussion oscillates between the
vertices of a triangle:

1. "Not needed" ("".join() to rule them all).
2. "+= isn't suitable for StringIO".
3. "We can do much more" (mutable string/+= for all streams/separate
class).

[]

-- 
Best regards,
 Paul                          mailto:pmiscml@gmail.com