[Pythonmac-SIG] Unicode Filenames on the Mac
Bob Ippolito
bob at redivi.com
Thu Jul 14 06:24:03 CEST 2005
On Jul 13, 2005, at 6:05 PM, Nick Matsakis wrote:
>
> What is the best way to deal with non-ASCII paths when working with the
> Python standard library? Specifically, when using functions like open()
> and the os and glob modules, what should be passed in? What should I
> expect out?
If you pass unicode in, you get unicode out:
>>> import os
>>> set(map(type, os.listdir('.')))
set([<type 'str'>])
>>> set(map(type, os.listdir(u'.')))
set([<type 'unicode'>])
Otherwise, you pass and receive byte strings (str). On Mac OS X, the
encoding of those byte strings is fixed:
>>> import sys
>>> sys.getfilesystemencoding()
'utf-8'
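For readers on Python 3, where `str` takes the role of Python 2's `unicode` and `bytes` the role of the old `str`, the same rule still holds. The following is a minimal sketch, assuming Python 3; the temporary directory and the `example.txt` file name are hypothetical and exist only for illustration. It shows that `os.listdir` returns names of whatever type its argument has, and that decoding the byte names with `os.fsdecode` (which uses `sys.getfilesystemencoding()`) recovers the text names:

```python
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file, created only so the listing is non-empty.
    open(os.path.join(tmp, "example.txt"), "w").close()

    text_names = os.listdir(tmp)                # str path in   -> str names out
    byte_names = os.listdir(os.fsencode(tmp))   # bytes path in -> bytes names out

    print({type(n) for n in text_names})        # {<class 'str'>}
    print({type(n) for n in byte_names})        # {<class 'bytes'>}

    # os.fsdecode decodes using sys.getfilesystemencoding(),
    # so it turns the byte names back into the text names.
    decoded = sorted(os.fsdecode(n) for n in byte_names)
    print(sys.getfilesystemencoding(), decoded == sorted(text_names))
```

On Mac OS X the printed encoding is `utf-8`, matching the Python 2 session above.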
> In experimenting with it, it appears that these libraries accept str
> objects containing UTF-8 encoded bytes and similarly that is what they
> return. It would seem better to me if they could be made to accept and
> return unicode objects, but I could see that that might cause backwards
> compatibility problems. Still, are UTF-8 encoded strs really a safe bet?
> Are there circumstances, including non-HFS filesystems, where it will
> bite me if I make this assumption?
HFS+ actually stores names as UTF-16 internally, but the POSIX layer is
UTF-8. Assuming UTF-8 will bite you if you expect the code to work on
other platforms, because not all platforms use UTF-8 as their filesystem
encoding.
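A minimal sketch of a more portable approach, again assuming Python 3: rather than hard-coding `'utf-8'`, let `os.fsdecode` and `os.fsencode` apply whatever `sys.getfilesystemencoding()` reports. The byte string below is a hypothetical Latin-1 file name that is not valid UTF-8. The round-trip check is limited to POSIX systems, where Python pairs the filesystem encoding with the `surrogateescape` error handler so that undecodable bytes survive.

```python
import os

# Hypothetical raw file name: "café.txt" encoded as Latin-1, which is not valid UTF-8.
raw_name = b"caf\xe9.txt"

# Hard-coding UTF-8 fails on such a name.
try:
    raw_name.decode("utf-8")
    hardcoded_ok = True
except UnicodeDecodeError:
    hardcoded_ok = False

print("hard-coded UTF-8 decode succeeded:", hardcoded_ok)  # False

if os.name == "posix":
    # On POSIX, os.fsdecode uses the filesystem encoding with surrogateescape,
    # so the undecodable byte is preserved and os.fsencode restores the original.
    text_name = os.fsdecode(raw_name)
    round_trips = os.fsencode(text_name) == raw_name
    print("fsdecode/fsencode round trip:", round_trips)  # True
```

The design choice here is to treat file names as opaque and only convert them at the edges with the platform's own rules, instead of baking one platform's encoding into the code.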
-bob