[Python-3000] Unicode strings, identifiers, and import

Fri May 18 07:26:09 CEST 2007

>> The answer to all of this is the filesystem encoding, which is already
>> supported. Doesn't appear particularly difficult to me.
> 
> sys.getfilesystemencoding() is None on most Linux computers I have access to. 

That's strange. Is LANG not set?

> How is the problem solved there?

A default needs to be applied. In 2.x, the default is the system
encoding. Not sure whether the notion of a Python system encoding
will be preserved for 3.x, but it should be safe, on Unix, to default
to UTF-8 for the file system encoding unless LANG specifies something
different.
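
A minimal sketch of such a fallback, for illustration only. The helper
name filesystem_encoding is hypothetical, and this is not how the
interpreter itself picks the encoding:

```python
import sys

def filesystem_encoding(default="utf-8"):
    """Return the reported file system encoding, or `default` if none is reported."""
    # On 2.x, sys.getfilesystemencoding() could return None when the locale
    # gave no usable answer; fall back to the supplied default in that case.
    reported = sys.getfilesystemencoding()
    return reported if reported else default

print(filesystem_encoding())
```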

> In fact, I have a question about this. Can anybody show me a valid
> multi-platform Python code snippet that, given a filename as a *unicode*
> string, creates a file with that name, possibly adjusting the name so as
> to ignore an encoding problem (so that the function *always* succeeds)?

That's not really a python-dev or py3k question. If you want to support
*arbitrary* Unicode strings, you clearly cannot map them to file names
directly: what if the Unicode string contains the directory separator,
or other characters not allowed in file names (such as : or * on
Windows)?
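
To illustrate the point, here is a rough check for such characters. The
name unsafe_characters and the WINDOWS_RESERVED set are assumptions made
for this sketch; the set covers only the common punctuation Windows
rejects, not control characters or reserved device names:

```python
import os

# Punctuation that Windows rejects in file names (a subset of its rules).
WINDOWS_RESERVED = set('<>:"/\\|?*')

def unsafe_characters(name):
    """Return the characters in `name` that cannot appear in a single file name.

    The check is deliberately conservative: it combines the Windows
    punctuation above with NUL and the local directory separators.
    """
    forbidden = set(WINDOWS_RESERVED) | {"\0", os.sep}
    if os.altsep:
        forbidden.add(os.altsep)
    return {ch for ch in name if ch in forbidden}

print(sorted(unsafe_characters("a:b*c")))
```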

If you need to guarantee that any Unicode string can map to a file
name, I suggest

f = open(filename.encode("utf-8").encode("hex"), "w")
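
The line above relies on the 2.x "hex" codec. A rough equivalent for a
Python where str is Unicode and bytes has a hex() method might look like
the sketch below. The helper name hex_filename and the temporary
directory are assumptions for the example, and the input is assumed to be
well-formed text (no lone surrogates):

```python
import os
import tempfile

def hex_filename(name):
    """Map a str to a file name made only of the characters 0-9 and a-f."""
    return name.encode("utf-8").hex()

# Create the file in a scratch directory so the example has no side effects
# on the current working directory.
directory = tempfile.mkdtemp()
path = os.path.join(directory, hex_filename("dir/na:me*"))
with open(path, "w") as f:
    f.write("data")

print(os.path.basename(path))
```

Each input character becomes two or more hex digits, so the resulting
name is at least twice as long as the original string.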

> I attempted this a couple of times without being satisfied at all by the 
> solutions.

That's probably because you failed to specify all the requirements you
need for satisfaction. If you specified them explicitly, you would
likely find that they conflict, that no solution can possibly
satisfy all of them, and that this has nothing to do with Unicode.

Note that my solution above meets the *specified* needs: it supports
all Unicode strings, always succeeds, and possibly adjusts the file
name to ignore an encoding problem. Of course, interpreting the file
name in a file explorer is somewhat tedious...
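
Reading such a name back does not have to be done by eye. A small
sketch, assuming the name was produced by hex-encoding UTF-8 bytes as
suggested; decode_hex_filename is a hypothetical helper name:

```python
def decode_hex_filename(hex_name):
    """Recover the original text from a hex-encoded UTF-8 file name."""
    # bytes.fromhex raises ValueError if the name is not valid hex.
    return bytes.fromhex(hex_name).decode("utf-8")

print(decode_hex_filename("6469722f66696c65"))
```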

Regards,
Martin