[Python-3000] Unicode and OS strings
"Martin v. Löwis"
martin at v.loewis.de
Fri Sep 14 14:32:59 CEST 2007
> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?
For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes.
> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?
They can then easily round-trip the string to the encoding it should have had:
good_string = sys.argv[bad_string_index].\
    encode(sys.argv_encoding, "pua-replace").decode(real_encoding)
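Neither sys.argv_encoding nor a "pua-replace" error handler exists in Python; both names are hypothetical here. As a rough, runnable sketch of the same round trip, the standard "surrogateescape" error handler (Python 3.1 and later) can stand in for "pua-replace". The sample bytes and the two encoding names below are assumptions chosen only for illustration:

```python
# Sketch only: "surrogateescape" stands in for the hypothetical
# "pua-replace" handler; the encodings below are illustrative assumptions.
raw = b"L\xf6wis"             # bytes as the OS delivered them (Latin-1 text)
assumed_encoding = "utf-8"    # what the runtime guessed (hypothetical)
real_encoding = "latin-1"     # what the application author knows

# Decoding with the wrong encoding keeps the undecodable byte as a
# lone surrogate instead of raising an error or dropping it.
bad_string = raw.decode(assumed_encoding, "surrogateescape")

# Encoding with the same codec and handler restores the original bytes,
# which can then be decoded with the encoding they should have had.
good_string = bad_string.encode(assumed_encoding, "surrogateescape") \
    .decode(real_encoding)
```

The point is only that the wrongly decoded string loses no information, so the application can still recover the intended text once it knows the real encoding.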
However, we are talking about borderline cases here - in most cases,
Python will just do the right thing. Special cases aren't special enough
to break the rules.
Regards,
Martin