
On 04Feb2022 2303, Guido van Rossum wrote:
You *can* allocate unicode objects statically. We do it in deepfreeze, and Eric's PR under discussion here (https://github.com/python/cpython/pull/30928) does it. I wonder if a better solution than that PR wouldn't be to somehow change the implementation of _Py_IDENTIFIER() to do that, and make the special 'Id' APIs just aliases for the corresponding unicode-object-based APIs? It wouldn't be ABI compatible, but none of those APIs are in the stable ABI.
Indeed. I'm sure you can appreciate how I skimmed over that bit :) (Plus I've mostly kept up with this by chatting with Eric, rather than reviewing all the code.)

The guts of the code in question, for those who don't want to find it:

#define STRUCT_FOR_ASCII_STR(LITERAL) \
    struct { \
        PyASCIIObject _ascii; \
        uint8_t _data[sizeof(LITERAL)]; \
    }

This is then used inside another struct to statically allocate all the objects.

The gap between this and what I'd like to see is that it only works for compile-time strings. If instead it could work with an arbitrary uint8_t pointer, rather than an embedded array, then even runtime strings could be passed into the interpreter very cheaply (yes, the caller has to manage the lifetime, but that's pretty easy).

So this is perfect for the needs we have for our internal API, but I wouldn't want it to become _the_ public API. (I also wouldn't want to have two public APIs that are subtly different, because in a few years' time everyone will have forgotten why and be very confused about it all.)

Not sure I'm motivated enough to build it myself, but I'm getting there ;) so perhaps I'll put something together that does what I'd like and gives us concrete things to discuss.
(Is there a requirement that an Id only contain ASCII characters (i.e., 7-bit)?)
I don't think it's unreasonable to require that for these internal identifier strings, but it would be a shame to do so for arbitrary strings.

Cheers,
Steve