On Mon, 28 Dec 2020 14:27:00 +0100, Ronald Oussoren via Python-Dev wrote:
On 28 Dec 2020, at 14:00, Inada Naoki wrote:
On Mon, Dec 28, 2020 at 8:52 PM Phil Thompson wrote:
I would have thought that an object was defined by its behaviour rather than by any particular implementation detail.
As I understand it, the policy "an object was defined by its behavior..." doesn't mean "put an unlimited amount of implementation behind one concrete type." The policy means APIs shouldn't limit input to one concrete type without a reason. In other words, duck typing and structural subtyping are good.
For example, we can try making io.TextIOWrapper accept not only Unicode objects (including subclasses) but any object implementing some protocol. We already have __index__ for integers and the buffer protocol for bytes-like objects. Those are examples of the policy.
I agree that that would be the cleanest approach, although I worry about how long it will take until third-party code is converted to the new protocol. That’s why I wrote earlier that adding this feature to PyUnicode_Type is the most pragmatic solution ;-)
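To make the two existing protocols concrete, here is a small sketch. The class name Ten is made up for illustration; everything else is standard library behaviour as far as I know.

```python
import operator
from array import array


class Ten:
    """Hypothetical integer-like object; it is not an int subclass."""

    def __index__(self):
        return 10


# Integer-consuming APIs accept any object that implements __index__.
letters = "abcdefghijklmnopqrstuvwxyz"
sliced = letters[:Ten()]        # slicing calls __index__
as_int = operator.index(Ten())  # explicit conversion through the protocol
as_hex = hex(Ten())             # hex() also goes through __index__

# Bytes-consuming APIs accept any object that exposes the buffer protocol,
# not just bytes itself.
joined = b"".join([bytearray(b"spam"), memoryview(b"eggs"), array("B", [33])])
```

Neither API checks for a concrete type here; both only ask whether the object supports the protocol.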
But the "pragmatic" solution will make a performance-critical type (PyUnicode) more complicated and therefore potentially larger/slower. I think Inada's concerns are valid here.
There are two clear options for a new protocol:
1. Add something similar to __index__ or __fspath__, but for “string-like” objects
2. Add an extension to the buffer protocol
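As a rough illustration of option 1 only, something modelled on os.fspath() could look like the sketch below. The hook name __text__, the helper as_text() and the class Utf16String are all invented for this example; nothing like them exists in CPython, and a real design would likely differ.

```python
import io


def as_text(obj):
    """Return a real str for a str, or for any object with a __text__ hook.

    __text__ is a hypothetical protocol method, by analogy with __fspath__.
    """
    if isinstance(obj, str):
        return obj
    # Look the hook up on the type, as os.fspath() does for __fspath__.
    hook = getattr(type(obj), "__text__", None)
    if hook is None:
        raise TypeError(
            f"expected str or object with __text__, not {type(obj).__name__}"
        )
    result = hook(obj)
    if not isinstance(result, str):
        raise TypeError("__text__ must return a str")
    return result


class Utf16String:
    """Hypothetical third-party string type that stores UTF-16 internally."""

    def __init__(self, text):
        self._data = text.encode("utf-16-le")

    def __text__(self):
        return self._data.decode("utf-16-le")


# A text API would call as_text() on its argument instead of requiring str.
out = io.StringIO()
out.write(as_text(Utf16String("héllo")))
```

Note that this version converts to a str on every call, so it copies the data; avoiding that copy is part of what the other proposals are about.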
The third option is to add a distinct "string view" protocol. There are peculiarities (such as the fact that different objects may have different internal representations - some utf8, some utf16...) that make the buffer protocol suboptimal for this. Also, we probably don't want unicode-like objects to start being usable in contexts where a buffer-like object is required (such as writing to a binary file, or zlib-compressing a bunch of bytes).

Regards

Antoine.
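A small sketch of both concerns, using bytearray as a stand-in for a string-like object that exposes its internal storage through the buffer protocol (that stand-in is my assumption for the example, not part of any proposal):

```python
import io
import zlib

text = "héllo"

# The same text under two internal representations yields different buffers,
# so a consumer of the raw buffer would have to know which encoding it holds.
utf8_storage = bytearray(text.encode("utf-8"))
utf16_storage = bytearray(text.encode("utf-16-le"))
same_bytes = bytes(utf8_storage) == bytes(utf16_storage)

# Anything exposing a buffer is silently accepted where bytes are expected.
sink = io.BytesIO()
written = sink.write(utf16_storage)       # binary write succeeds
compressed = zlib.compress(utf16_storage)  # so does compression
```

Neither the binary write nor the compression raises an error, which is the kind of accidental acceptance described above.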