[Python-Dev] PEP 277 (unicode filenames): please review
Tue, 13 Aug 2002 18:54:49 +0200
Guido van Rossum wrote:
>>Guido van Rossum wrote:
>>>But if you pass the normalized string (or the Latin-1 string) to
>>>open(), will it find the file?
>>I tried opening a file using both "o\xcc\x88" and "\xc3\xb6". Both
>>result in the same file being opened.
>>>I.e. if the filesystem has the
>>>unnormalized name stored in its directory, will filesystem requests
>>>normalize filenames before comparing them?
>>It could be that Apple is decomposing the filenames before comparing
>>them. Either way works.
The recommended way of doing normalization is to use
Normalization Form C (NFC): Canonical Decomposition,
followed by Canonical Composition.
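As a rough illustration, here is a minimal sketch. It assumes a Python whose standard library provides unicodedata.normalize and that the two strings quoted above are UTF-8 byte sequences; the variable names are mine. The first sequence decodes to the decomposed spelling and the second to the composed one, and Form C maps both to the same string:

```python
import unicodedata

# Assumption: the two quoted strings are UTF-8 encoded bytes.
decomposed = b"o\xcc\x88".decode("utf-8")  # 'o' followed by U+0308 COMBINING DIAERESIS
composed = b"\xc3\xb6".decode("utf-8")     # single code point U+00F6

# Canonically equivalent, but not equal code point by code point.
differ_before = decomposed != composed

# Form C: canonical decomposition followed by canonical composition.
nfc_decomposed = unicodedata.normalize("NFC", decomposed)
nfc_composed = unicodedata.normalize("NFC", composed)
equal_after = nfc_decomposed == nfc_composed
```

This says nothing about what the filesystem does internally; it only shows that the two spellings compare equal once both are normalized to the same form.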
Note that for proper collation support, Unicode strings must first be
normalized. See http://www.unicode.org/unicode/reports/tr10/#Main_Algorithm
> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
You can easily normalize it again (provided you have a normalization
lib at hand).
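A sketch of what that could look like, assuming unicodedata.normalize is the normalization lib at hand; normalize_names is a hypothetical helper, not part of any existing API, and the sample names are made up:

```python
import unicodedata

def normalize_names(names, form="NFC"):
    """Return the given Unicode filenames normalized to the given form.

    Hypothetical helper: 'names' stands in for whatever a directory
    listing returned as Unicode strings.
    """
    return [unicodedata.normalize(form, name) for name in names]

# A listing as a decomposing filesystem might return it (made-up names).
listing = ["o\u0308.txt", "plain.txt"]
normalized = normalize_names(listing)
```

In practice the input would come from something like os.listdir() called with a Unicode path. Note that the normalized name is only guaranteed to open the same file if the filesystem itself compares names after normalization, as observed earlier in this thread.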
CEO eGenix.com Software GmbH
eGenix.com -- Makers of the Python mx Extensions: mxDateTime,mxODBC,...
Python Consulting: http://www.egenix.com/
Python Software: http://www.egenix.com/files/python/