[Python-3000] [Python-Dev] Filename as byte string in python 2.6 or 3.0?
James Y Knight
foom at fuhm.net
Tue Sep 30 18:20:00 CEST 2008
On Sep 29, 2008, at 11:11 PM, Stephen J. Turnbull wrote:
>> Except...that one over there. That's the whole point of UTF-8b:
>> correctly encoded names get decoded correctly and readably, and the
>> other cases get decoded into something unique that cannot possibly
> Sure. But there are lots of other operations besides encoding and
> decoding that we do with filenames. How do you display a filename?
> How about concatenating them to make paths? What do you do when you
> want to mix a filename with other, well-formed strings? If you keep
> the filenames internally in UTF-8b, you're going to need what amounts
> to a whole string API for dealing with them, aren't you? If you're
> not doing that, how is UTF-8b represented?
No, you keep the filenames internally in a PyUnicode object. All that
stuff *works* in Python today, with a UTF-8b decoded string.
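
To make that concrete, here is a minimal sketch. It assumes a later Python 3 in which the `surrogateescape` error handler gives UTF-8b-style decoding; that handler, the sample names, and the helpers `decode_name` and `encode_name` are illustrative assumptions, not anything defined in this thread.

```python
# Assumption: errors="surrogateescape" (Python 3.1+) stands in for UTF-8b.
raw_good = "caf\u00e9.txt".encode("utf-8")  # well-formed UTF-8 name
raw_bad = b"caf\xe9.txt"                     # Latin-1 byte, invalid as UTF-8


def decode_name(raw: bytes) -> str:
    """Decode a filename; each undecodable byte becomes a lone surrogate."""
    return raw.decode("utf-8", errors="surrogateescape")


def encode_name(name: str) -> bytes:
    """Inverse of decode_name: lone surrogates turn back into the raw bytes."""
    return name.encode("utf-8", errors="surrogateescape")


good = decode_name(raw_good)  # ordinary, readable str
bad = decode_name(raw_bad)    # ordinary str holding U+DCE9 for byte 0xE9

# Both results are plain str objects, and both round-trip to the exact bytes.
assert isinstance(good, str) and isinstance(bad, str)
assert encode_name(good) == raw_good
assert encode_name(bad) == raw_bad
# The malformed name cannot collide with the well-formed one.
assert good != bad
```

The point of the sketch is only that the decoded value is an ordinary string object and that the original bytes remain recoverable.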
Displaying a filename is encoding it into some other encoding. Like
So, that seems to work okay. Maybe I should try to display that in a
web browser. Shows up as 2 "unknown character" glyphs. Perfect.
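
One way to sketch the display step, again assuming the `surrogateescape` handler of a later Python 3 as a stand-in for UTF-8b. The function `display_form` and the sample bytes are hypothetical; the sketch swaps each escaped byte for U+FFFD, the usual "unknown character" glyph.

```python
# Assumption: escaped bytes live in U+DC80..U+DCFF, as surrogateescape produces.
def display_form(name: str) -> str:
    """Return a printable copy of name with escaped bytes shown as U+FFFD."""
    return "".join(
        "\ufffd" if "\udc80" <= ch <= "\udcff" else ch
        for ch in name
    )


# Hypothetical name with two bytes that are not valid UTF-8.
name = b"ab\xff\xfe.txt".decode("utf-8", errors="surrogateescape")
shown = display_form(name)

# The printable form encodes cleanly in a strict encoder, unlike the raw name.
shown.encode("utf-8")
assert shown.count("\ufffd") == 2
```

The display form is lossy by design, so it is only for showing to a user; the undisplayed string is the one to keep for opening the file.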
If you want to mix a filename with other strings, you append them
together, or use os.path, same as always. You don't need any new
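
A small sketch of mixing such a name with ordinary strings, under the same assumption that `surrogateescape` in a later Python 3 approximates UTF-8b. The directory names and the sample filename are made up for illustration.

```python
import os.path

# Hypothetical on-disk name containing one byte (0xFF) that is not valid UTF-8.
raw_name = b"report-\xff.txt"
name = raw_name.decode("utf-8", errors="surrogateescape")

# Plain string operations and os.path need nothing special.
label = "copy of " + name
path = os.path.join("backup", "2008", name)
stem, ext = os.path.splitext(name)

# The escaped byte survives the trip through os.path and encodes back unchanged.
recovered = os.path.basename(path).encode("utf-8", errors="surrogateescape")
assert recovered == raw_name
```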
Since things seem to work in everything I've tried, I'd really like to
hear from the opponents of UTF-8b precisely what does fail.
And again: if UTF-8b isn't acceptable because it does break things in
some unknown-to-me way, I really can't imagine anything working but
just going back to byte-string access as the only API. It's really not
okay for the "obvious" APIs to be totally broken by unexpected input.
Think os.getcwd(), sys.argv, os.environ. You can't just ignore bad
files and call it done.