[Python-Dev] PEP 277 (unicode filenames): please review

Brian Quinlan brian@sweetapp.com
Tue, 13 Aug 2002 09:50:14 -0700


Guido van Rossum wrote:
> > It could be that Apple is decomposing the filenames before comparing
> > them. Either way works.
>
> Hm, that sucks (either way) -- because you get unnormalized Unicode
> out of directory listings, which is harder to turn into local
> encodings.
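The two concerns above can be illustrated with a minimal sketch in modern Python 3, assuming Latin-1 stands in for "a local encoding." The helpers `same_filename` and `to_local_encoding` are hypothetical illustrations, not part of PEP 277. Comparing names after decomposing both sides approximates what a decomposing file system does. A decomposed name such as e + U+0301 cannot be encoded to Latin-1 until it is recomposed (NFC).

```python
import unicodedata

def same_filename(a: str, b: str) -> bool:
    # Hypothetical helper: compare names after decomposing both to NFD,
    # roughly what a decomposing file system does before comparing them.
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

def to_local_encoding(name: str, encoding: str = "latin-1") -> bytes:
    # Hypothetical helper: recompose to NFC first. Latin-1 (assumed here as
    # the "local encoding") has a precomposed U+00E9 but no combining acute
    # accent U+0301, so a decomposed name cannot be encoded as-is.
    return unicodedata.normalize("NFC", name).encode(encoding)

listed = "cafe\u0301"   # decomposed form, as a directory listing might return it
typed = "caf\u00e9"     # precomposed form, as a user might type it

print(listed == typed)               # False
print(same_filename(listed, typed))  # True

try:
    listed.encode("latin-1")
except UnicodeEncodeError:
    print("decomposed name is not encodable as Latin-1")

print(to_local_encoding(listed))     # b'caf\xe9'
```

The extra normalization step in `to_local_encoding` is exactly the cost of getting unnormalized Unicode back from the file system.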

Here is a relevant URI:
http://developer.apple.com/techpubs/macosx/Essentials/SystemOverview/FileSystem/File_Encodings_and_Fonts.html

"""In addition, all code that calls BSD system routines should ensure
that the const *char parameters of these routines are in UTF-8 encoding.
All BSD system functions expect their string parameters to be in UTF-8
encoding and nothing else. An additional caveat is that string
parameters for files, paths, and other file-system entities must be in
canonical UTF-8. In a canonical UTF-8 Unicode string, all decomposable
characters are decomposed; for example, é (0x00E9) is represented as e
(0x0065) + ´ (0x0301). To put things in canonical UTF-8 encoding, use the
"file-system representation" APIs defined in Cocoa and Carbon (including
Core Foundation). For example, to get a canonical UTF-8 character string
in Cocoa, use NSString's fileSystemRepresentation method; for
noncanonical UTF-8 strings, use NSString's UTF8String method"""

Cheers,
Brian