[Python-Dev] casefolding in pathlib (PEP 428)

Ronald Oussoren ronaldoussoren at mac.com
Fri Apr 12 15:10:42 CEST 2013


On 12 Apr, 2013, at 15:00, Christian Heimes <christian at python.org> wrote:

> Am 12.04.2013 14:43, schrieb Ronald Oussoren:
>> At least for OSX the kernel will normalize names for you, at least for HFS+,
>> and therefore two names that don't compare equal with '==' can refer to the
>> same file (for example the NFKD and NFKC forms of Löwe). 
>> 
>> Isn't unicode fun :-)
> 
> Seriously, the OSX kernel normalizes unicode forms? It's a cool feature
> and makes sense for the user's POV but ... WTF?

IIRC only for HFS+ filesystems; files on an NFS share, for example, can use a
filename encoding that isn't UTF-8.
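A quick illustration of the Löwe example at the string level, using only the stdlib unicodedata module. It does not query any filesystem. For "Löwe" the NFKC/NFKD forms coincide with the NFC/NFD forms, because ö has only a canonical decomposition, so the sketch uses NFC/NFD. The helper name same_name is hypothetical, and normalizing to NFD is an assumption (HFS+ is known to store names in a decomposed form close to NFD; either normal form works for a plain equality test).

```python
import unicodedata

# "Löwe" spelled with a precomposed ö (U+00F6) ...
composed = "L\u00f6we"
# ... and with a plain o followed by COMBINING DIAERESIS (U+0308)
decomposed = "Lo\u0308we"

print(composed == decomposed)          # False: different code point sequences
print(len(composed), len(decomposed))  # 4 5

def same_name(a: str, b: str) -> bool:
    """Hypothetical helper: compare two names the way a normalizing
    filesystem might, by normalizing both to NFD first."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(same_name(composed, decomposed))  # True
```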

> 
> Perhaps we should use the platform's API for the job. Does OSX offer an
> API function to create a case folded and canonical form of a path?
> Windows has PathCchCanonicalizeEx().

This would have to be done per path element, because every directory in a
file's path could be on a separate filesystem with different conventions
(HFS+, case-sensitive HFS+, an NFS-mounted Unix filesystem).

I have found sample code that can determine whether a directory is on a case-sensitive
filesystem (attached to <http://lists.apple.com/archives/darwin-dev/2007/Apr/msg00036.html>).
It doesn't work in a 64-bit binary, but I haven't checked yet why it doesn't work there.
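The Apple sample code linked above asks the system directly and is not reproduced here. As a swapped-in alternative, a purely empirical probe is sketched below: create a temporary file whose name contains lowercase letters, then check whether the upper-cased spelling refers to the same file. Assumptions: the process can write to the directory, and the answer only holds for the filesystem that directory lives on. The function name is_case_sensitive is hypothetical.

```python
import os
import tempfile

def is_case_sensitive(directory: str) -> bool:
    """Empirically probe whether names in `directory` are case sensitive.

    Creates a uniquely named temporary file (the prefix guarantees
    lowercase letters), then checks whether the upper-cased name
    resolves to that same file. Requires write access to `directory`;
    the file is removed when the with-block exits.
    """
    with tempfile.NamedTemporaryFile(prefix="casetest_", dir=directory) as f:
        head, tail = os.path.split(f.name)
        swapped = os.path.join(head, tail.upper())
        try:
            # Case-insensitive filesystem: the upper-case name finds our file.
            return not os.path.samefile(f.name, swapped)
        except FileNotFoundError:
            # Case-sensitive filesystem: the upper-case name does not exist.
            return True
```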

I don't know if there is a function to determine the filesystem encoding. I guess
assuming that the special casing is only needed for HFS+ variants could work, but
I'd have to test that.

Ronald
