
On 2/5/2022 4:09 PM, Guido van Rossum wrote:
On Sat, Feb 5, 2022 at 5:18 AM Steve Dower <steve.dower@python.org> wrote:
The gap this has with what I'd like to see is that it will only work for compile-time strings. If instead it could work for an arbitrary uint8_t pointer, rather than an embedded array, that allows even runtime strings to be very cheaply passed into the interpreter (and yes, the caller has to manage the lifetime, but that's pretty easy).
What's the use case for that, that isn't covered by PyUnicode_FromString()?
No dynamic memory allocation (and hence no deallocation) in any case where the original string is already managed, which is really quite common. Access to the memory manager is what requires thread affinity/the GIL for primitive object construction, and string copies - especially with transcoding - are the expensive part. For cases where strings are just passed around but never manipulated (e.g. a lot of filesystem operations, or runtime/interpreter configuration), the string may never have to be decoded at all.

It's almost as good as a tagged pointer, but without breaking our existing object model (assuming all the PyUnicode_* functions learn how to handle such strings, which is necessary). It's purely a transparent optimisation for most users, with an added opportunity for those using native APIs, and probably decent complexity for us as maintainers.

There are a lot of edge cases to handle, and I'm sure people will use them to shoot down the idea. That's why, rather than debate details here, I'd rather build it and define its boundaries of usefulness. For now, though, there's plenty of stuff I'd rather do than either of those, so it remains an idea :)

Cheers,
Steve
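
To make the copy-versus-borrow distinction above concrete, here is a minimal standalone sketch in plain C. It does not use the CPython API, and `sketch_str` and every `sketch_str_*` function are hypothetical names invented for illustration, not anything that exists or has been proposed in CPython. It assumes the caller keeps the borrowed bytes alive for as long as the string object is in use.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical string object: either owns a heap copy of the bytes or
 * merely points at bytes that the caller manages. */
typedef struct {
    const uint8_t *data;   /* raw bytes, not necessarily NUL-terminated */
    size_t         len;    /* length in bytes */
    int            owned;  /* 1: we allocated data; 0: borrowed from caller */
} sketch_str;

/* Copying constructor: allocates and copies, which is roughly the cost
 * profile of the existing "from C string" path. Returns 0 on success. */
static int sketch_str_copy(sketch_str *out, const uint8_t *bytes, size_t len)
{
    uint8_t *buf = malloc(len ? len : 1);
    if (buf == NULL) {
        return -1;
    }
    if (len) {
        memcpy(buf, bytes, len);
    }
    out->data = buf;
    out->len = len;
    out->owned = 1;
    return 0;
}

/* Borrowing constructor: no allocation and no copy. The caller must keep
 * `bytes` valid until the string is released. */
static sketch_str sketch_str_borrow(const uint8_t *bytes, size_t len)
{
    assert(bytes != NULL || len == 0);
    sketch_str s = { bytes, len, 0 };
    return s;
}

/* Release: frees only what we allocated; borrowed bytes are left alone. */
static void sketch_str_release(sketch_str *s)
{
    if (s->owned) {
        free((void *)s->data);
    }
    s->data = NULL;
    s->len = 0;
    s->owned = 0;
}

/* Byte-wise comparison that never decodes the contents. */
static int sketch_str_equal(const sketch_str *a, const sketch_str *b)
{
    return a->len == b->len &&
           (a->len == 0 || memcmp(a->data, b->data, a->len) == 0);
}
```

In this sketch a caller holding, say, a path buffer would call `sketch_str_borrow(buf, n)`, pass the result around, and call `sketch_str_release` before the buffer goes away. The borrowed path never touches the allocator, which is the property the message above is after. A real implementation would also have to decide what happens when a borrowed string outlives its buffer, which is one of the edge cases mentioned.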