[Python-3000] Unicode strings, identifiers, and import
James Y Knight
foom at fuhm.net
Fri May 18 01:24:21 CEST 2007
On May 17, 2007, at 7:04 PM, Giovanni Bajo wrote:
> On 13/05/2007 21.31, Guido van Rossum wrote:
>
>> The answer to all of this is the filesystem encoding, which is already
>> supported. Doesn't appear particularly difficult to me.
>
> sys.getfilesystemencoding() is None on most Linux computers I have
> access to.
> How is the problem solved there?
>
> In fact, I have a question about this. Can anybody show me a valid
> multi-platform Python code snippet that, given a filename as a *unicode*
> string, creates a file with that name, possibly adjusting the name so as
> to ignore an encoding problem (so that the function *always* succeeds)?
>
> def dump_to_file(unicode_filename):
> ...

unicode_filename.encode(sys.getfilesystemencoding() or 'ascii', 'xmlcharrefreplace')
would work, although I don't think I've seen a platform where
sys.getfilesystemencoding() is None.

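For illustration, the one-liner above can be wrapped into the dump_to_file() helper Giovanni asked for. This is a minimal sketch in modern Python 3, not code from the thread: make_safe_filename() is a hypothetical helper name, and taking the file contents as bytes is an assumption. The name is encoded with 'xmlcharrefreplace' and then decoded back, so open() still receives a str that the chosen encoding can represent.

```python
import os
import sys
import tempfile


def make_safe_filename(unicode_filename, encoding=None):
    """Return unicode_filename adjusted so that it survives `encoding`.

    Characters the encoding cannot represent become XML character
    references (e.g. U+00E9 -> '&#233;'), so this never raises
    UnicodeEncodeError.
    """
    # Fall back to ASCII when no filesystem encoding is reported, as in
    # the original suggestion; Python 3.2+ never returns None here.
    enc = encoding or sys.getfilesystemencoding() or "ascii"
    # Encode with the replacement handler, then decode back to str so
    # open() receives text that round-trips through the encoding.
    return unicode_filename.encode(enc, "xmlcharrefreplace").decode(enc)


def dump_to_file(unicode_filename, data, directory="."):
    """Write `data` (bytes) to a file named after unicode_filename."""
    path = os.path.join(directory, make_safe_filename(unicode_filename))
    with open(path, "wb") as f:
        f.write(data)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(dump_to_file("caf\u00e9.txt", b"hello", tmp))
```

On a UTF-8 system the name passes through unchanged; with an ASCII filesystem encoding, "café.txt" would be written as "caf&#233;.txt".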
If I unset LANG/LANGUAGE/LC_*, Python reports 'ANSI_X3.4-1968' (that is,
ASCII). But normally on my system it reports 'UTF-8', since I have
LANG=en_US.UTF-8.

The *really* tricky thing is that on Unix systems, if you want to be
able to access all the files on the disk, you have to use the
byte-string API, as not all filenames are convertible to unicode. But on
Windows, if you want to be able to access all the files on the disk,
you *CANNOT* use the byte-string API, because not all filenames
(which are unicode on disk) are convertible to bytestrings via the
"mbcs" encoding (which is what getfilesystemencoding() reports). It's
quite a pain in the ass really.

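The platform split described above can be sketched as a hypothetical list_all_files() helper that asks os.listdir() for bytes names on POSIX and for str names on Windows. This is an illustration in modern Python 3 (it relies on os.fsencode/os.fsdecode, which postdate this thread), and it assumes os.name == "nt" is an adequate test for Windows.

```python
import os
import tempfile


def list_all_files(directory):
    """Return every entry name in `directory` without lossy conversion.

    On POSIX, names are arbitrary bytes on disk, so request bytes.
    On Windows, names are Unicode (UTF-16) on disk, so request str.
    """
    if os.name == "nt":
        # A str path makes os.listdir return str names.
        return os.listdir(os.fsdecode(directory))
    # A bytes path makes os.listdir return the raw bytes names.
    return os.listdir(os.fsencode(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        open(os.path.join(tmp, "a.txt"), "wb").close()
        print(list_all_files(tmp))
```

Callers then have to handle both bytes and str names, which is exactly the awkwardness being complained about.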
James