[Python-Dev] The future of the wchar_t cache
Steve Dower
steve.dower at python.org
Sat Oct 20 11:58:58 EDT 2018
On 20Oct2018 0901, Stefan Behnel wrote:
> I'd be happy to get rid of it. But regarding the use under Windows, I
> wonder if there's interest in keeping it as a special Windows-only feature,
> e.g. to speed up the data exchange with the Win32 APIs. I guess it would
> have to provide a visible (performance?) advantage to justify such special
> casing over the code removal.
I think these cases would be just as well served by maintaining the
original UCS-2 representation, even if the maximum character fits into
UCS-1, and only doing the conversion when Python copies the string into a
new location.
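A minimal Python model of that idea, assuming the OS hands back UTF-16-LE bytes as the Win32 wide-character APIs do. This is not how CPython actually stores strings; `Ucs2BackedString` and `compact_kind` are hypothetical names. Handing the string back to the OS reuses the original buffer, and only an explicit copy pays for the scan that picks the compact PEP 393 kind.

```python
def compact_kind(text: str) -> int:
    """Bytes per character PEP 393 would pick for this text: 1, 2 or 4."""
    widest = max(map(ord, text), default=0)
    if widest < 0x100:
        return 1  # UCS-1 (Latin-1 range)
    if widest < 0x10000:
        return 2  # UCS-2 (BMP)
    return 4      # UCS-4


class Ucs2BackedString:
    """Hypothetical model: keep the wide buffer from the OS, defer narrowing."""

    def __init__(self, wide: bytes) -> None:
        self._wide = wide     # UTF-16-LE code units, as returned by Win32
        self.conversions = 0  # how many times we paid for a scan/conversion

    def to_os(self) -> bytes:
        # Passing the string back to the OS reuses the original buffer as-is.
        return self._wide

    def copy(self) -> tuple:
        # Only copying into a new location triggers decoding and the kind scan.
        self.conversions += 1
        text = self._wide.decode("utf-16-le")
        return text, compact_kind(text)
```

In this model, repeated `to_os()` calls cost nothing and never touch `conversions`; `copy()` is the only place where the narrower kind is computed.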
I don't have numbers, but my instinct says the most affected operations
would be retrieving collections of strings from the OS (avoiding a
scan/conversion for each one), comparing against these collections
(hashing and comparing strings of mismatched KIND in memory), and passing
some of these strings back to the OS (converting back into UCS-2). This
is basically a glob/fnmatch/stat sequence, and it is the main real-world
scenario I can think of where Python's overhead might become noticeable.
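A rough sketch of that workload using only standard-library calls; `stat_matching` is a hypothetical helper name. Each of the three steps corresponds to one of the operations listed above: names come back from the OS, are matched in Python, and the matches are passed back to the OS (on Windows, `os.stat` converts each `str` path to a wide string).

```python
import fnmatch
import os


def stat_matching(directory: str, pattern: str) -> dict:
    """Hypothetical helper: listdir -> fnmatch -> stat over one directory."""
    sizes = {}
    # 1. Retrieve a collection of names from the OS, one str per entry.
    names = os.listdir(directory)
    # 2. Compare those names against a pattern (hashing/comparison in Python).
    for name in fnmatch.filter(names, pattern):
        # 3. Pass each match back to the OS as a path.
        sizes[name] = os.stat(os.path.join(directory, name)).st_size
    return sizes
```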
Another option that might be useful is some way to allow the UCS-1/4 <->
UCS-2 conversion to occur outside the GIL. Most of the time when we need
to convert, we're about to release the GIL (or have just reacquired it),
so even without the cache we could probably recover some of the lost
performance through parallelism. (That said, these conversions are often
tied up in conditions and generated code, so this may not be as easy to
do as retaining the original format.)
Some sort of tracing would be interesting, to see how often the cache is
reused after being generated, and how often it is generated for a string
that was originally in UCS-2 but that we narrowed to UCS-1.
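As a rough illustration of what such tracing could report, here is a hypothetical tally over a trace of cache requests. The trace format and the `summarize_cache_trace` name are assumptions, not existing CPython instrumentation; the model only assumes the cache is per string object, so the first request for a string generates it and later requests reuse it.

```python
from collections import Counter


def summarize_cache_trace(requests):
    """Tally a hypothetical trace of wchar_t cache requests.

    Each request is (string_id, was_ucs2_before_narrowing).
    """
    cached = set()
    counts = Counter()
    for string_id, was_ucs2 in requests:
        if string_id in cached:
            counts["reused"] += 1
        else:
            cached.add(string_id)
            counts["generated"] += 1
            if was_ucs2:
                # We narrowed UCS-2 to UCS-1, then paid to widen it again.
                counts["generated_after_narrowing"] += 1
    return counts
```

The ratio of `reused` to `generated` answers the first question, and `generated_after_narrowing` answers the second.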
Cheers,
Steve