[Pythonmac-SIG] Unicode Filenames on the Mac

Bob Ippolito bob at redivi.com
Thu Jul 14 06:24:03 CEST 2005


On Jul 13, 2005, at 6:05 PM, Nick Matsakis wrote:

>
> What is the best way to deal with non-ASCII paths when working with the
> python standard library? Specifically, when using functions like open()
> and the os and glob modules, what should be passed in?  What should I
> expect out?

If you pass unicode in, you get unicode out:

 >>> import os
 >>> set(map(type, os.listdir('.')))
set([<type 'str'>])
 >>> set(map(type, os.listdir(u'.')))
set([<type 'unicode'>])

Otherwise you pass and receive byte strings.  On Mac OS X the encoding
of those byte strings is fixed, and you can ask Python what it is:

 >>> import sys
 >>> sys.getfilesystemencoding()
'utf-8'
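
If you do end up holding byte strings, one way to get back to unicode is
to decode them with that encoding.  A minimal sketch, assuming Python 2.6+
or 3 (where the name `bytes` exists); `decode_filename` is a hypothetical
helper, not something in the standard library:

```python
import sys


def decode_filename(name, encoding=None):
    """Return `name` as text, decoding byte strings if necessary.

    `encoding` defaults to whatever Python reports as the filesystem
    encoding; falling back to 'utf-8' when that is None is an assumption.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding() or 'utf-8'
    if isinstance(name, bytes):
        # Byte string: decode it with the filesystem encoding.
        return name.decode(encoding)
    # Already text (unicode), so hand it back unchanged.
    return name
```

For example, `decode_filename(b'caf\xc3\xa9.txt', 'utf-8')` gives back the
unicode name `u'caf\xe9.txt'`.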

> In experimenting with it, it appears that these libraries accept str
> objects containing UTF-8 encoded bytes and similarly that is what they
> return.  It would seem better to me if they could be made to accept and
> return unicode objects, but I could see that that might cause backwards
> compatibility problems.  Still, is UTF-8 encoded strs really a safe bet?
> Are there circumstances, including non HFS filesystems, where it will
> bite me if I make this assumption?

HFS+ actually stores names as UTF-16 internally, but the POSIX layer
presents them as UTF-8.  The UTF-8 assumption will bite you if you expect
the code to work on other platforms, because not all platforms use UTF-8
for their filesystem encoding.
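
One way to sidestep the encoding question on any platform is to hand the
os functions unicode paths and let Python do the conversion.  A rough
sketch, again assuming Python 2.6+ or 3; `list_text_names` is a
hypothetical helper name, and on Python 2 a name that cannot be decoded
may still come back as a byte string:

```python
import os
import sys


def list_text_names(directory):
    """List `directory`, asking for text (unicode) names.

    A byte-string path is decoded with the reported filesystem encoding
    first, so that os.listdir() is always called with a text path.
    """
    if isinstance(directory, bytes):
        # Fallback to 'utf-8' when no encoding is reported is an assumption.
        encoding = sys.getfilesystemencoding() or 'utf-8'
        directory = directory.decode(encoding)
    # Text path in, text names out.
    return os.listdir(directory)
```

Code written this way never hard-codes UTF-8, so it should not care what
the platform's filesystem encoding happens to be.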

-bob


