[Python-Dev] The future of the wchar_t cache

Mon Oct 22 09:24:22 EDT 2018

On 22Oct2018 0913, Victor Stinner wrote:
> Le lun. 22 oct. 2018 à 15:08, Steve Dower <steve.dower at python.org> a écrit :
>> Agreed the cache is useless here, but since the listdir() result came in
>> as wchar_t we could keep it that way (assuming we'd only be changing it
>> to char), and then there wouldn't have to be a conversion when we
>> immediately pass it back to open().
> 
> Serhiy wants to remove the cache which should *reduce* Python memory
> footprint on Windows.
> 
> You are proposing to fill the cache eagerly, that would increase the
> Python memory footprint :-/ Your proposed change is an optimisation, a
> benchmark is needed to see the benefit. I expect no significant
> difference on benchmarks of https://pyperformance.readthedocs.io/ ...

Yes, that's true. But "should reduce ... footprint" is also an 
optimisation that deserves a benchmark by that standard. Also, I'm 
proposing keeping the 'kind' as UCS-2 when the string is created from 
UCS-2 data that is likely to be used as UCS-2. We would not create the 
UCS-1 version in this case, so it's not the same as prefilling the 
cache, but it would cost a bit of memory in exchange for CPU. If slicing 
and concatenation between matching kinds also preserved the kind, a lot 
of path handling code could avoid back-and-forth conversions.
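
To put a rough number on "a bit of memory": the sketch below estimates how
many bytes per character CPython uses for a string of a given kind. It is
only an illustration, assuming CPython 3.3+ (PEP 393 storage) and that
sys.getsizeof() reflects the character buffer; bytes_per_char() and the
sample paths are hypothetical names made up for this example.

```python
import sys


def bytes_per_char(sample: str) -> int:
    """Estimate per-character storage for strings of the same kind as `sample`.

    Two repetitions of different length are compared so that the fixed
    object header cancels out. Assumes CPython 3.3+ (PEP 393).
    """
    short = sample * 100
    long = sample * 200
    extra_bytes = sys.getsizeof(long) - sys.getsizeof(short)
    extra_chars = len(long) - len(short)
    return extra_bytes // extra_chars


# Hypothetical example paths: the first is pure ASCII, the second contains
# Cyrillic characters, which forces the 2-byte kind for the whole string.
ascii_path = "C:\\Users\\example\\file.txt"
bmp_path = "C:\\Users\\example\\файл.txt"

print(bytes_per_char(ascii_path))
print(bytes_per_char(bmp_path))
```

Under those assumptions, keeping an ASCII-only path in the 2-byte kind would
roughly double its character storage, which is the memory side of the
trade-off.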

The import benchmarks ought to be improved on Windows by this new 
optimisation, as this is a prime case where we regularly convert strings 
from what the OS gave us into UCS-1 and back into what the OS expects. 
But if you don't run the benchmarks on all OSes, then sure, you won't 
see any difference :)
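
A rough Python-level analogue of that round trip, for illustration only. The
real conversions happen in C inside CPython, not through str.encode(), and
os_to_str() / str_to_os() are hypothetical helpers invented for this sketch;
the file name is just an example.

```python
def os_to_str(raw: bytes) -> str:
    """Stand-in for converting wchar_t data from Windows into a Python str."""
    return raw.decode("utf-16-le")


def str_to_os(name: str) -> bytes:
    """Stand-in for converting a Python str back into wchar_t data."""
    return name.encode("utf-16-le")


# Pretend this buffer is a name handed to us by a directory listing.
raw_from_os = "importlib\\_bootstrap.py".encode("utf-16-le")

name = os_to_str(raw_from_os)   # first conversion: OS data -> str
raw_for_os = str_to_os(name)    # second conversion: str -> OS data

# The bytes passed back to the OS are identical to what it gave us,
# so both conversions were pure overhead for this name.
print(raw_for_os == raw_from_os)
```

If the string kept its 2-byte representation, both steps could in principle
be skipped for names like this one.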

Cheers,
Steve