
On 04Feb2022 2303, Guido van Rossum wrote:
You *can* allocate unicode objects statically. We do it in deepfreeze, and Eric's PR under discussion here (https://github.com/python/cpython/pull/30928) does it. I wonder if a better solution than that PR wouldn't be to somehow change the implementation of _Py_IDENTIFIER() to do that, and make the special 'Id' APIs just aliases for the corresponding unicode-object-based APIs? It wouldn't be ABI compatible, but none of those APIs are in the stable ABI.
Indeed. I'm sure you can appreciate how I skimmed over that bit :) (Plus I've mostly kept up with this by chatting with Eric, rather than reviewing all the code.)

The guts of the code in question, for those who don't want to find it:

#define STRUCT_FOR_ASCII_STR(LITERAL) \
    struct { \
        PyASCIIObject _ascii; \
        uint8_t _data[sizeof(LITERAL)]; \
    }

This is then used inside another struct to statically allocate all the objects.

The gap between this and what I'd like to see is that it only works for compile-time strings. If instead it could work with an arbitrary uint8_t pointer, rather than an embedded array, then even runtime strings could be passed into the interpreter very cheaply (yes, the caller has to manage the lifetime, but that's pretty easy).

So this is perfect for the needs we have for our internal API, but I wouldn't want it to become _the_ public API. (I also wouldn't want to have two public APIs that are subtly different, because in a few years' time everyone will have forgotten why and be very confused about it all.)

Not sure I'm motivated enough to build it myself, but I'm getting there ;) so perhaps I'll put something together that does what I'd like and gives us concrete things to discuss.
(Is there a requirement that an Id only contain ASCII characters (i.e., 7-bit)?)
I don't think it's unreasonable to require that for these internal identifier strings, but it would be a shame to do so for arbitrary strings.

Cheers,
Steve