[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?

James Y Knight foom at fuhm.net
Tue Sep 30 18:20:00 CEST 2008


On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:

>> Except...that one over there. That's the whole point of UTF-8b:
>> correctly encoded names get decoded correctly and readably, and the
>> other cases get decoded into something unique that cannot possibly
>> conflict.
>
> Sure.  But there are lots of other operations besides encoding and
> decoding that we do with filenames.  How do you display a filename?
> How about concatenating them to make paths?  What do you do when you
> want to mix a filename with other, well-formed strings?  If you keep
> the filenames internally in UTF-8b, you're going to need what amounts
> to a whole string API for dealing with them, aren't you?  If you're
> not doing that, how is UTF-8b represented?

No, you keep the filenames internally in a PyUnicode object. All that  
stuff *works* in Python today, with a UTF-8b decoded string.

Displaying a filename is encoding it into some other encoding. Like  
this:
 >>> '\x90\x90'.decode('utf-8b')
u'\udc90\udc90'
 >>> u'\udc90\udc90'.encode('utf-8')
'\xed\xb2\x90\xed\xb2\x90'

So, that seems to work okay. Maybe I should try to display that in a  
web browser. Shows up as 2 "unknown character" glyphs. Perfect.
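
For anyone trying this on Python 3, where no codec is literally named
'utf-8b': the 'surrogateescape' error handler (PEP 383) produces the same
mapping of undecodable bytes to lone surrogates. The following is only a
sketch under that assumption, not the codec used in the session above; the
variable names are made up for illustration.

```python
# Python 3 sketch; 'surrogateescape' stands in for the 'utf-8b' codec above.
raw = b'\x90\x90'  # two bytes that are not valid UTF-8

# Undecodable bytes become lone surrogates in the range U+DC80..U+DCFF.
name = raw.decode('utf-8', errors='surrogateescape')

# Encoding with the same handler restores the original bytes exactly.
restored = name.encode('utf-8', errors='surrogateescape')

# Strict UTF-8 encoding of a lone surrogate raises on Python 3, so the
# Python 2 output shown above needs the 'surrogatepass' handler instead.
passed = name.encode('utf-8', errors='surrogatepass')

# For display, a lossy handler is usually enough.
shown = name.encode('utf-8', errors='replace')
escaped = name.encode('ascii', errors='backslashreplace')
```

The point of the sketch is that decoding and re-encoding are lossless, while
display is a separate, deliberately lossy step.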

If you want to mix a filename with other strings, you append them  
together, or use os.path, same as always. You don't need any new  
string API.
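
A small sketch of that claim, again assuming Python 3 with the
'surrogateescape' handler standing in for utf-8b. posixpath is used instead
of os.path only so the result is the same on every platform; the directory
and suffix are arbitrary examples.

```python
import posixpath

raw_name = b'\x90\x90'  # a filename that is not valid UTF-8
name = raw_name.decode('utf-8', errors='surrogateescape')

# Ordinary string operations work on the decoded name.
backup = name + '.bak'
path = posixpath.join('/tmp', backup)

# Encoding the combined path recovers the original bytes in place.
raw_path = path.encode('utf-8', errors='surrogateescape')
```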

Since everything I've tried seems to work, I'd really like to hear  
from the opponents of utf-8b what, precisely, fails.

And again: if utf-8b isn't acceptable, because it does break things in  
some unknown-to-me way, I really can't imagine anything working but  
just going back to byte-string access as the only API. It's really not  
okay for the "obvious" APIs to be totally broken by unexpected input.  
Think os.getcwd(), sys.argv, os.environ. You can't just ignore bad  
files and call it done.
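
One way to check the "obvious API" requirement on a given system is to test
whether arbitrary bytes survive a trip through the str-based interface. The
sketch below assumes Python 3 on a POSIX system, where os.fsdecode and
os.fsencode use the filesystem encoding with the 'surrogateescape' handler;
the helper name roundtrips is hypothetical.

```python
import os

def roundtrips(raw: bytes) -> bool:
    """Return True if raw survives os.fsdecode followed by os.fsencode."""
    return os.fsencode(os.fsdecode(raw)) == raw

# Assumption: POSIX only. Other platforms use a different error handler,
# so the check is skipped there rather than guessed at.
if os.name == 'posix':
    ok = roundtrips(b'\x90\x90')
else:
    ok = None
```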

James

