[Python-Dev] The future of the wchar_t cache

Mon Oct 22 09:08:33 EDT 2018

On 22Oct2018 0413, Victor Stinner wrote:
> For code like "for name in os.listdir(): open(name): ...." (replace
> listdir with scandir if you want to get file metadata), the cache is
> useless, since the fresh string has to be converted to wchar_t*
> anyway, and the cache is destroyed at the end of the loop iteration,
> whereas the cache has never been used...

Agreed, the cache is useless here. But since the listdir() result came
in as wchar_t, we could keep it that way (assuming the only thing we'd
otherwise do is change it to char), and then no conversion would be
needed when we immediately pass it back to open().
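
To make the pass-through case concrete, here is a minimal sketch (the
function name read_sizes is only for illustration). Each name comes
back from the directory scan and goes into open() exactly once, so a
wchar_t copy cached on the string object would never be reused.

```python
import os


def read_sizes(directory):
    """Open every regular file in `directory` once and record its size."""
    sizes = {}
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file():
                continue
            # The path string is used a single time and then discarded,
            # which is the "single-use pass-through" pattern.
            with open(entry.path, "rb") as f:
                sizes[entry.name] = len(f.read())
    return sizes
```

If the name stayed in the form the OS returned it in, this loop would
need no conversion at all on the way back into open().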

That said, I spent some time yesterday converting the importlib cache to
use scandir, with separate caches for directories and files (to avoid
the stat calls), and it made very little overall difference. I have to
assume the string manipulation dominates. (Making DirEntry lazily
calculate its .path had a bigger impact. Also, I didn't try to make
Windows flush its own stat cache, and accessing warm files is much
faster than accessing cold ones.)
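
For anyone who wants to picture the experiment, here is a rough sketch
of the idea in pure Python. It is not the actual importlib code or my
patch: the names LazyEntry and DirCache are made up, and it ignores
details such as case-insensitive file systems. It fills separate
dir/file tables from one scandir() pass and only builds the joined
path when someone asks for it.

```python
import os


class LazyEntry:
    """Hypothetical directory entry whose .path is built on first access."""

    __slots__ = ("_dir", "name", "_path")

    def __init__(self, directory, name):
        self._dir = directory
        self.name = name
        self._path = None

    @property
    def path(self):
        # Defer the string join until the path is actually needed.
        if self._path is None:
            self._path = os.path.join(self._dir, self.name)
        return self._path


class DirCache:
    """Hypothetical per-directory cache filled by a single scandir() pass."""

    def __init__(self, directory):
        self.directory = directory
        self.dirs = {}
        self.files = {}
        with os.scandir(directory) as entries:
            for entry in entries:
                # is_dir() can usually be answered from the scan itself;
                # symlinks and some file systems still need a stat call.
                table = self.dirs if entry.is_dir() else self.files
                table[entry.name] = LazyEntry(directory, entry.name)

    def find_file(self, name):
        """Return the entry for a file called `name`, or None."""
        return self.files.get(name)

    def find_dir(self, name):
        """Return the entry for a directory called `name`, or None."""
        return self.dirs.get(name)
```

A lookup such as DirCache(d).find_file("mod.py") then costs a dict
lookup, and no path string is created for entries nobody touches.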

> I'm not saying that the cache is useless. I just doubt that it's so
> common that it really provides any performance benefit.

I think it is mostly useless, but if we can transparently keep many
strings at "native" size, that will handle many of the useful cases,
such as the single-use pass-through scenario above.
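
As background on string "size", the sketch below (CPython-specific; the
helper name bytes_per_char is made up) measures how many bytes a str
uses per code point under the flexible representation from PEP 393. It
does not touch the wchar_t cache itself. It only shows that the storage
width already varies per string, which is why some strings can match
the platform's wchar_t width and others cannot.

```python
import sys


def bytes_per_char(ch, n=100):
    """Estimate per-code-point storage by growing a string by one character.

    Relies on CPython's sys.getsizeof() reporting the real object size.
    """
    return sys.getsizeof(ch * (n + 1)) - sys.getsizeof(ch * n)


if __name__ == "__main__":
    for label, ch in (("ASCII", "a"),
                      ("BMP", "\u0100"),
                      ("non-BMP", "\U00010000")):
        print(label, bytes_per_char(ch))
```

A string made of BMP characters is stored two bytes per code point,
the same width as wchar_t on Windows. ASCII strings would need
widening, and non-BMP strings would need surrogate pairs.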

Cheers,
Steve