[Python-Dev] The future of the wchar_t cache
Steve Dower
steve.dower at python.org
Mon Oct 22 09:48:06 EDT 2018
On 22Oct2018 0928, Victor Stinner wrote:
>> Also, I'm
>> proposing keeping the 'kind' as UCS-2 when the string is created from
>> UCS-2 data that is likely to be used as UCS-2.
>
> Oh. That's a major change in the PEP 393 design. You would have to
> modify many functions in CPython. Currently, the PEP 393 requires that
> a string always use the most efficient storage, and many optimizations
> and code paths rely on that assumption.
I don't know that it requires that many modifications. Those functions
already have to handle UCS-2 content anyway (e.g. if I get a path from
scandir() that includes a non-ASCII character), and they only use the
most-efficient-storage assumption to determine the storage size of a
string operation's result. I'm proposing that the result should also be
UCS-2 when the source strings are UCS-2, since that's the best indicator
we have that it'll be used as UCS-2 later (as well as being what the
current implementation does :) ).
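To make the trade-off concrete, here is a minimal Python model of the two rules being debated. `most_compact_width` approximates the PEP 393 rule CPython applies today: pick the narrowest of 1, 2 or 4 bytes per code point that fits the string's largest code point. `proposed_width` is a hypothetical helper, not anything in CPython, modelling the suggestion above that a result derived from UCS-2 sources stays UCS-2. Both names are invented for illustration, and the model ignores CPython details such as the separate ASCII flag.

```python
import sys

def most_compact_width(s: str) -> int:
    """Bytes per code point under PEP 393's 'most efficient storage' rule.

    ASCII and Latin-1 both use the 1-byte kind; CPython tracks ASCII with
    an extra flag, which this sketch does not model.
    """
    max_cp = max(map(ord, s), default=0)
    if max_cp < 0x100:
        return 1  # 1-byte kind (ASCII / Latin-1)
    if max_cp < 0x10000:
        return 2  # UCS-2
    return 4      # UCS-4

def proposed_width(result: str, source_widths: list[int]) -> int:
    """Hypothetical rule: if any source was stored as UCS-2, keep the
    result at UCS-2 (widening to UCS-4 only when the content needs it)."""
    width = most_compact_width(result)
    if 2 in source_widths:
        return max(width, 2)
    return width

s = "\u4e2d" + "abc"   # U+4E2D forces UCS-2 storage
tail = s[1:]           # "abc": only ASCII code points remain
print(most_compact_width(s), most_compact_width(tail))  # 2 1
print(proposed_width(tail, [most_compact_width(s)]))    # 2
# On CPython the real slice is re-narrowed to the 1-byte kind, so it
# should be smaller than a 3-character UCS-2 string.
print(sys.getsizeof(tail) < sys.getsizeof("\u4e2d" * 3))
```

The slice is where the two rules disagree: today CPython rescans the characters and stores `tail` in the narrowest kind, while the proposed rule would keep it at UCS-2 on the guess that it is headed back to a wide-character API.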
> I'm against this change.
>
> Moreover, it's hard to guess how a string will be used later...
Agreed. There are some heuristics we can use, but it's definitely only a
guess. That's the nature of this problem - guessing that it *won't* be
used as UCS-2 later on is also a guess.
Cheers,
Steve