On Sun, Dec 27, 2020 at 3:19 AM Ronald Oussoren wrote:
On 26 Dec 2020, at 18:43, Guido van Rossum wrote:
On Sat, Dec 26, 2020 at 3:54 AM Phil Thompson via Python-Dev <python-dev@python.org> wrote:
It's worth comparing the situation with byte arrays. There is no problem of translating different representations of an element, but there is still the issue of who owns the memory. The Python buffer protocol usually solves this problem, so something similar for unicode "arrays" might suffice.
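For readers less familiar with the buffer protocol, here is a minimal Python-level sketch of the ownership handling mentioned above. It uses only the built-in `bytearray` and `memoryview`; the function name `demo_buffer_ownership` is made up for illustration, and a protocol for unicode "arrays" would not necessarily look like this.

```python
# Illustrative sketch: how the buffer protocol deals with memory ownership
# for byte arrays. Only built-in types are used; the function name is
# hypothetical.

def demo_buffer_ownership():
    data = bytearray(b"hello")      # the owner of the memory
    view = memoryview(data)         # borrows the owner's memory, no copy

    view[0] = ord("J")              # writes go straight through to the owner
    shared = bytes(data)

    # While a view is exported, the owner refuses to resize itself,
    # because resizing could move or free the memory the view points at.
    try:
        data.extend(b"!")
        resize_blocked = False
    except BufferError:
        resize_blocked = True

    view.release()                  # the consumer is done with the memory
    data.extend(b"!")               # resizing is allowed again
    return shared, resize_blocked, bytes(data)


shared, resize_blocked, final = demo_buffer_ownership()
print(shared, resize_blocked, final)
```

The point of the sketch is that the exporter keeps ownership and the consumer signals when it is finished, which is the part a string variant would presumably need to reproduce.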
Exactly my thought on the matter. I have no doubt that between all of us we could design a decent protocol.
The practical problem would be to convince enough people that this is worth doing to actually get the code changed (str being one of the most popular data types traveling across C API boundaries), in the CPython core (which surely has a lot of places to modify) as well as in the vast collection of affected 3rd party modules. Like many migrations it's an endless slog for the developers involved, and in open source it's hard to assign resources for such a project.
That’s a problem indeed. An 80% solution could be reached by teaching PyArg_Parse* about the new protocol: it already uses the buffer protocol for bytes-like objects and could be taught a variant of the protocol for strings. That would require the implementation of the new variant to return a pointer in the Py_buffer that can still be used after the view is released, but that’s already a restriction on the use of new-style buffers in the PyArg_Parse* APIs.
That wouldn’t be a solution for code using the PyUnicode_* APIs of course, nor Python code explicitly checking for the str type.
In the end a new string “kind” (next to the 1, 2 and 4 byte variants) where callbacks are used to provide data might be the most pragmatic. That will still break code peeking directly into the PyUnicodeObject struct, but anyone doing that should know that it is not a stable API.
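To make the "kind" idea concrete, here is a rough Python sketch. `pep393_kind` mirrors the rule that picks the 1, 2 or 4 byte variant from the widest code point in the string. `CallbackString` is purely hypothetical: its name, constructor arguments and `materialize` method are assumptions for illustration, not an existing or proposed CPython API.

```python
from typing import Callable


def pep393_kind(s: str) -> int:
    """Bytes per character needed to store s, based on its widest code point."""
    widest = max(map(ord, s), default=0)
    if widest < 0x100:
        return 1        # fits in Latin-1
    if widest < 0x10000:
        return 2        # fits in the Basic Multilingual Plane
    return 4            # needs the full code point range


class CallbackString:
    """Hypothetical string "kind" whose characters come from a callback."""

    def __init__(self, length: int, char_at: Callable[[int], str]):
        self._length = length
        self._char_at = char_at     # supplied by the foreign owner of the data

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> str:
        if not 0 <= index < self._length:
            raise IndexError(index)
        return self._char_at(index)

    def materialize(self) -> str:
        """Copy the data into an ordinary str, for code that needs one."""
        return "".join(self._char_at(i) for i in range(self._length))


# Stand-in for text owned by another runtime or toolkit.
foreign = ["h", "\u00e9", "\u20ac", "\U0001F600"]
lazy = CallbackString(len(foreign), foreign.__getitem__)
text = lazy.materialize()
print(len(lazy), pep393_kind(text))
```

In this sketch, code that only indexes or measures the string never forces a copy, while code that needs the flat representation pays for `materialize` once.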
That's an attractive idea. I've personally never had to peek inside the implementation, and I suspect there's not that much code that does so (even in the CPython code base itself, outside the PyUnicode implementation of course).

--Guido van Rossum (python.org/~guido)