[Python-3000] Unicode strings, identifiers, and import

James Y Knight foom at fuhm.net
Fri May 18 01:24:21 CEST 2007


On May 17, 2007, at 7:04 PM, Giovanni Bajo wrote:

> On 13/05/2007 21.31, Guido van Rossum wrote:
>
>> The answer to all of this is the filesystem encoding, which is already
>> supported. Doesn't appear particularly difficult to me.
>
> sys.getfilesystemencoding() is None on most Linux computers I have  
> access to.
> How is the problem solved there?
>
> In fact, I have a question about this. Can anybody show me a valid
> multi-platform Python code snippet that, given a filename as a
> *unicode* string, creates a file with that name, possibly adjusting
> the name to work around an encoding problem (so that the function
> *always* succeeds)?
>
> def dump_to_file(unicode_filename):
>      ...

Something like

    unicode_filename.encode(sys.getfilesystemencoding() or 'ascii',
                            'xmlcharrefreplace')

would work, although I don't think I've seen a platform where
sys.getfilesystemencoding() is None.

If I unset LANG/LANGUAGE/LC_*, Python reports 'ANSI_X3.4-1968' (an alias
for ASCII). But normally on my system it reports 'UTF-8', since I have
LANG=en_US.UTF-8.
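
As a rough illustration of the encode-with-fallback idea, here is a minimal sketch written for a current Python 3 (an assumption; this thread predates it). The helper names encode_filename and dump_to_file, and the data, directory and encoding parameters, are hypothetical and only there to make the example self-contained. It does not truly "always" succeed: names containing a path separator or a NUL byte, for example, would still fail.

```python
import os
import sys


def encode_filename(unicode_filename, encoding=None):
    """Encode a text filename to bytes.

    Characters the target encoding cannot represent are replaced with
    XML character references (e.g. '&#233;') instead of raising.
    """
    # Fall back to ASCII if the interpreter reports no filesystem encoding.
    encoding = encoding or sys.getfilesystemencoding() or 'ascii'
    return unicode_filename.encode(encoding, 'xmlcharrefreplace')


def dump_to_file(unicode_filename, data=b'', directory='.', encoding=None):
    """Create a file named after unicode_filename and return its bytes path."""
    name = encode_filename(unicode_filename, encoding)
    # Join as bytes so the encoded name is passed to the OS unchanged.
    path = os.path.join(os.fsencode(directory), name)
    with open(path, 'wb') as f:
        f.write(data)
    return path
```

With an ASCII-only encoding, a name such as 'caf\u00e9.txt' ends up on disk as 'caf&#233;.txt', so the adjusted name stays readable but is no longer the name the caller asked for.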

The *really* tricky thing is that on Unix systems, if you want to be
able to access all the files on the disk, you have to use the
byte-string API, as not all filenames are convertible to unicode. But on
Windows, if you want to be able to access all the files on the disk,
you *CANNOT* use the byte-string API, because not all filenames
(which are unicode on disk) are convertible to byte strings via the
"mbcs" encoding (which is what getfilesystemencoding() reports). It's
quite a pain in the ass really.
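
A minimal sketch of that platform split, again assuming a current Python 3 rather than the interpreter discussed in this thread. list_all_names is a hypothetical helper: it asks for byte-string names on Unix-like systems and unicode names on Windows, which is the combination described above as lossless on each platform. Later Python releases changed how undecodable names are handled, so treat this as an illustration of the trade-off, not a recommendation.

```python
import os
import sys


def list_all_names(directory):
    """List directory entries using the API that can name every file.

    Returns a list of str on Windows and a list of bytes elsewhere.
    """
    if sys.platform == 'win32':
        # Filenames are unicode on disk, so ask for text names.
        return os.listdir(os.fsdecode(directory))
    # Filenames are arbitrary byte strings, so ask for bytes names.
    return os.listdir(os.fsencode(directory))
```

The caller has to cope with two different return types, which is exactly the awkwardness being complained about.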

James

