[Python-Dev] Re: Moving away from _Py_IDENTIFIER().

7 Feb 2022

On 2/5/2022 4:09 PM, Guido van Rossum wrote:
...
On Sat, Feb 5, 2022 at 5:18 AM Steve Dower <steve.dower@python.org> wrote:
The gap this has with what I'd like to see is that it will only work for
compile-time strings. If instead it could work for an arbitrary uint8_t
pointer, rather than an embedded array, that allows even runtime strings
to be very cheaply passed into the interpreter (and yes, the caller has
to manage the lifetime, but that's pretty easy).
What's the use case for that, that isn't covered by PyUnicode_FromString()?
No dynamic memory allocation (and hence no deallocation) in any case
where the original string's memory is already managed elsewhere, which
is really quite common. Access to the memory manager is what requires
thread affinity (i.e. holding the GIL) for primitive object
construction, and string copies, especially with transcoding, are the
expensive part.

For cases where strings are just passed around but never manipulated
(e.g. a lot of filesystem operations, or runtime/interpreter
configuration), the string may never have to be decoded at all. It's
almost as good as a tagged pointer, but without breaking our existing
object model (assuming all the PyUnicode_* functions learn how to handle
these strings, which would be necessary).
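To make the idea concrete, here is a minimal C sketch under loudly stated assumptions: `BorrowedStr`, `borrowed_str_from_buffer()` and `borrowed_str_equal()` are hypothetical names invented for illustration and are not part of the CPython C API. The handle only records a pointer and a length into a UTF-8 buffer whose lifetime the caller manages, so construction allocates nothing, and an operation that only compares or forwards the bytes never decodes them.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, for illustration only: a non-owning string handle.
 * The caller owns `data` and must keep it alive while the handle is used. */
typedef struct {
    const uint8_t *data;  /* borrowed UTF-8 bytes, never copied */
    size_t len;           /* length in bytes */
} BorrowedStr;

/* Wrap an existing buffer: no malloc, no copy, no transcoding. */
static BorrowedStr borrowed_str_from_buffer(const uint8_t *data, size_t len)
{
    assert(data != NULL || len == 0);
    BorrowedStr s = { data, len };
    return s;
}

/* Compare two handles byte-for-byte; neither string is decoded.
 * The len == 0 check avoids calling memcmp with a possibly NULL pointer. */
static bool borrowed_str_equal(BorrowedStr a, BorrowedStr b)
{
    return a.len == b.len &&
           (a.len == 0 || memcmp(a.data, b.data, a.len) == 0);
}
```

The sketch deliberately ignores the hard parts, such as what happens when code later needs a decoded representation or when the caller frees the buffer too early.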

But for most users it's purely a transparent optimisation, with an added
opportunity for those using native APIs, and probably a fair amount of
complexity for us as maintainers. There are a lot of edge cases to
handle that I'm sure people will use to shoot down the idea, which is
why, rather than debate details here, I'd rather build it and define the
boundaries of its usefulness. For now, though, there's plenty of stuff
I'd rather do than either of those, so it remains an idea :)

Cheers,
Steve