[Python-3000] Unicode and OS strings

"Martin v. Löwis" martin at v.loewis.de
Fri Sep 14 14:32:59 CEST 2007


> Are you sure that "strings in an unknown encoding" are conceptually
> strings and not rather bytes?

For file names, most definitely. For command line arguments, I am
fairly sure: the argc/argv calling convention does not allow for
arbitrary bytes (each argument is a NUL-terminated C string, so it
cannot contain an embedded NUL byte).

> And what if we skillfully conserve unknown bytes in a private use or
> surrogate area and the application author actually knows the encoding
> and wants correctly decoded strings?

They can then easily round-trip it to the encoding it should have had:

good_string = sys.argv[bad_string_index].\
   encode(sys.argv_encoding, "pua-replace").decode(real_encoding)
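
For illustration, here is a rough sketch of the same round trip using
the "surrogateescape" error handler that later Python 3 releases ship
(PEP 383). It stands in for the "pua-replace" handler named above and
is a swapped-in technique, not the handler under discussion. The helper
name redecode, the assumed encodings, and the sample bytes are all
assumptions made for this sketch.

```python
def redecode(bad_string: str, assumed_encoding: str, real_encoding: str) -> str:
    """Undo a wrong-but-lossless decode and apply the right encoding.

    Hypothetical helper: it assumes bad_string was produced by decoding
    with assumed_encoding and the "surrogateescape" error handler, so
    every undecodable byte survives as a lone surrogate (U+DC80..U+DCFF).
    """
    # Step 1: recover the original bytes exactly.
    raw = bad_string.encode(assumed_encoding, "surrogateescape")
    # Step 2: decode them with the encoding the application knows is right.
    return raw.decode(real_encoding)


# Simulate an argument that arrived as Latin-1 bytes while the
# interpreter assumed UTF-8 (sample data, chosen for this sketch).
raw_arg = "L\u00f6wis".encode("latin-1")                 # b'L\xf6wis'
bad_string = raw_arg.decode("utf-8", "surrogateescape")  # 0xF6 is escaped
good_string = redecode(bad_string, "utf-8", "latin-1")
```

The point carries over unchanged: as long as the undecodable bytes are
preserved somewhere in the string, an application that knows the real
encoding can always get back to correctly decoded text.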

However, we are talking about borderline cases here - in most cases,
Python will just do the right thing. Special cases aren't special enough
to break the rules.

Regards,
Martin

