[Python-Dev] The future of the wchar_t cache

Serhiy Storchaka storchaka at gmail.com
Mon Oct 22 10:07:10 EDT 2018


22.10.18 16:24, Steve Dower wrote:
> Yes, that's true. But "should reduce ... footprint" is also an 
> optimisation that deserves a benchmark by that standard. Also, I'm 
> proposing keeping the 'kind' as UCS-2 when the string is created from 
> UCS-2 data that is likely to be used as UCS-2. We would not create the 
> UCS-1 version in this case, so it's not the same as prefilling the 
> cache, but it would cost a bit of memory in exchange for CPU. If slicing 
> and concatenation between matching kinds also preserved the kind, a lot 
> of path handling code could avoid back-and-forth conversions.

Oh, I am afraid this will complicate the whole code of unicodeobject.c 
(and several other files) a lot and can introduce many subtle bugs.

For example, when you search for a UCS2 string in a UCS1 string, the 
current code returns the result fast, because a UCS1 string can't 
contain codes > 0xff, and a UCS2 string must contain at least one code 
> 0xff, so it can never be found there. And there are many such 
assumptions.
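
To make the search example concrete, here is a rough model in pure Python. 
It is only a sketch, not the code from unicodeobject.c: min_kind() and 
find() are hypothetical helpers, and the kind is passed explicitly as a 
width in bytes (1, 2 or 4) so that a non-canonical kind can be simulated.

```python
def min_kind(s: str) -> int:
    """Smallest width in bytes (1, 2 or 4) able to hold every code point in s."""
    top = max(map(ord, s), default=0)
    if top <= 0xFF:
        return 1
    if top <= 0xFFFF:
        return 2
    return 4


def find(haystack: str, haystack_kind: int, needle: str, needle_kind: int) -> int:
    """Toy search that trusts the kinds it is given (hypothetical helper)."""
    if needle_kind > haystack_kind:
        # Shortcut: correct only if needle_kind is the minimal kind, because
        # then the needle holds a code point the haystack cannot store.
        return -1
    return haystack.find(needle)


haystack = "path/to"   # all codes <= 0xff, so its kind is 1
needle = "to"          # also fits in kind 1

# Kinds are minimal: the shortcut is not taken and the match is found.
canonical = find(haystack, min_kind(haystack), needle, min_kind(needle))

# The needle "remembers" that it came from UCS-2 data and keeps kind 2:
# the shortcut now reports "not found" although the text is there.
preserved = find(haystack, 1, needle, 2)

# A needle that really needs kind 2 is rejected correctly by the shortcut.
really_wide = find(haystack, 1, "\u0442\u043e", min_kind("\u0442\u043e"))
```

With minimal kinds the search gives the right answer; with a preserved 
UCS2 kind on ASCII-only data the same shortcut silently gives a wrong 
one. Every such shortcut would have to be found and reviewed.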
