
Sorry, this has been sitting in my drafts for a while, and maybe this conversation is over. But since I wrote it, I figured I might as well post it.

On Fri, Apr 3, 2020 at 4:24 AM Paul Sokolovsky <pmiscml@gmail.com> wrote:
the idea of adding += to the File protocol -- for all file-like objects. I like the compactness of:
    with open(filename, 'w') as outfile:
        a_loop_of_somesort:
            outfile += something
That's again similar to the feedback received in the 2006 thread: https://mail.python.org/pipermail/python-list/2006-January/396021.html, and just like the original author there, my response is "that's not what I'm talking about".
I know. This is python-ideas -- we're not ONLY talking about your proposal :-)
So, why do I think it's a good idea to add "+=" to BytesIO/StringIO, but not to other streams (like file objects)? Are BytesIO/StringIO somehow special?
My answer is yes, they are special.
<snip>
So, stream and buffer protocols are very important, powerful, and deep notions in Python.
Streams, yes (although the docs and CS-educated folks use the word "stream", in the broader community we usually think more in terms of "file-like object" -- at least those of us who have been around for a long time).

As for "buffer", if you search the docs, you see that the word is used to describe the Buffer Protocol, which is a whole different concept. It also shows up in various other places, describing internal behavior (like readline, or pickle, or ...). In the context of streams, it's used to describe the difference between Binary I/O and Raw I/O, but again, mostly as an implementation detail for streams.

All that is a long way of saying that most folks are not thinking in terms of buffers, which is why most folks aren't going to think: "I need to build up a big string from little parts -- I wonder if the io module has what I'm looking for?" -- nor search for "stream" or "buffer" to find what they want.

It's my idea that BytesIO/StringIO is the closest thing Python has to this buffer/stream "cross-object", and actually, it already does enough to be *the* cross-object.

Sure -- I'll agree with that.

So, just think about it - BytesIO allows you to construct data using the stream API, and then get that data as a buffer (using an extension method outside the stream API). Sure, BytesIO can also do other things - you don't have to use .getvalue().

In fact, the entire reason it exists is to be a file-like object (i.e. the stream API).

But the fact that BytesIO can do different things is exactly the motivation for proposing to add another operator, +=, that's not going to change the status quo of "does different things" that much. And the operator += being added isn't random - it's buffer's append method,
Well, no. It's a sequence's extend() method.
added to BytesIO to make it more of a "cross" between buffer and stream.
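For reference, here is a minimal sketch of the two behaviors being compared above: building up text through the stream API and then pulling it out with .getvalue(), and += on a mutable sequence acting like extend(). The variable names are just for illustration; nothing here is part of the proposal itself.

```python
import io

# Build up text through the stream (file-like) API ...
buf = io.StringIO()
for part in ["spam", " ", "eggs"]:
    buf.write(part)

# ... then fetch the accumulated text with getvalue(), which is an
# extra method on StringIO/BytesIO rather than part of the generic
# stream API.
result = buf.getvalue()

# On a mutable sequence, += extends the existing object in place,
# the same way extend() does.
lst = [1, 2, 3]
alias = lst
lst += [4, 5, 6]
same_object = alias is lst

ext = [1, 2, 3]
ext.extend([4, 5, 6])
```

After this runs, `result` holds the joined text, and `lst` and `ext` hold the same items, with `lst` still being the original list object.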
The thing is: Python is all about "duck typing" or "protocols" or "ABCs", whatever you want to call them. And there is not, in fact, a standard "buffer" (as you are using the term here) protocol to follow: there are Sequences, and strings, and there are streams. And StringIO is already an implementation of the stream protocol (that's its whole point).

So IIUC your idea here, you think it would be good to have an efficient way of building strings that follows the string protocol. Actually, + and += in this context is really the sequence protocol:

    In [11]: lst = [1,2,3]
    In [12]: lst += [4,5,6]
    In [13]: lst
    Out[13]: [1, 2, 3, 4, 5, 6]

    In [14]: tup = (1,2,3)
    In [15]: tup += (4,5,6)
    In [16]: tup
    Out[16]: (1, 2, 3, 4, 5, 6)

    In [17]: strin = "123"
    In [18]: strin += "456"
    In [19]: strin
    Out[19]: '123456'

Which is why I suggested that the way to get what you want would be a mutable string, rather than a single, out-of-place addition to StringIO. And a StringBuilder class would be another way. Either way, I think you'd want it to behave as much like a string as it could, rather than like a stream with one added feature.

However: as it happens, strings are unique in Python: I *think* they are the only built-in "ABC" with only one implementation (that is, not the only type with no ABC :-) ): they are not duck-typed at all (disregarding the days of Python 2 with str and unicode). And as far as I know, they are not in any commonly used third-party library either. This is not the case even with numbers: we have integers and floats, and other numbers in third-party libs, such as numpy (and had the __index__ dunder added to better support those).

So there is a lot of code, at least in CPython, that is expecting a str, and exactly a str, in various places, right down to the binary representation. And CPython implementation aside, thanks to strings being a sequence of themselves, they are often type checked in user code as well (to distinguish between a string and, e.g., a list of strings). I know in my code, checking for strings is the ONLY type checking I ever do.

So that means a StringBuilder may be the way to go: a focused use case, and no other type could really be used in place of strings very much anyway.

This all made me think: *why* do I type check strings, and virtually nothing else? And it's because in most other places, if I want a given type, I can just try to make it from the input:

    x = float(x)
    i = int(i)
    arr = np.asarray(arr)

but:

    st = str(st)

doesn't work because ANY object can be made into a string.

Makes me wonder if we could have an "asstr()" that acted like numpy's asarray: if it's an array already, it just passes it through; if it's not, then it tries to build an array out of it. Of course, there are currently essentially no objects that duck-type strings now, so, well, no use case.

Side note: I just noticed that Python 2 actually HAD a MutableString:

https://docs.python.org/2.0/lib/module-UserString.html

Which was clearly (from the docs) not actually designed to be used :-)

-CHB

--
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython