[Python-3000] C API cleanup str
"Martin v. Löwis"
martin at v.loewis.de
Sun Aug 5 17:48:06 CEST 2007
> IMO at the C level all conversions between bytes and Unicode that
> don't specify a conversion should use UTF-8. That's what most of the
> changes made so far do.
I agree. We should specify that somewhere, so we have a recorded
guideline to use in case of doubt.
One function that misbehaves under this spec is
PyUnicode_FromString[AndSize], which assumes the input is Latin-1
(i.e. it maps each byte to the code point with the same value).
As a consequence, once it follows the UTF-8 rule it can fail because
of encoding errors (which it previously couldn't, since every byte
sequence is valid Latin-1).
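To illustrate why the failure mode is new, here is a minimal standalone
sketch. It is not CPython's implementation, and the names decode_latin1
and decode_utf8 are hypothetical helpers invented for this example. It
only shows that a Latin-1 decoder accepts any byte string, whereas a
strict UTF-8 decoder has to reject malformed input.

```c
#include <stddef.h>
#include <stdint.h>

/* Latin-1: every byte maps to the code point with the same value,
 * so decoding cannot fail. Returns the number of code points written. */
static size_t decode_latin1(const unsigned char *in, size_t n, uint32_t *out)
{
    for (size_t i = 0; i < n; i++)
        out[i] = in[i];
    return n;
}

/* Strict UTF-8: returns the number of code points written to out,
 * or -1 if the input is malformed. out must have room for n entries. */
static long decode_utf8(const unsigned char *in, size_t n, uint32_t *out)
{
    /* Smallest code point allowed for each sequence length (rejects
     * overlong encodings), indexed by the number of continuation bytes. */
    static const uint32_t min_cp[4] = {0x0, 0x80, 0x800, 0x10000};
    size_t i = 0;
    long count = 0;

    while (i < n) {
        unsigned char b = in[i];
        uint32_t cp;
        size_t need;  /* continuation bytes expected after the lead byte */

        if (b < 0x80)                { cp = b;        need = 0; }
        else if ((b & 0xE0) == 0xC0) { cp = b & 0x1F; need = 1; }
        else if ((b & 0xF0) == 0xE0) { cp = b & 0x0F; need = 2; }
        else if ((b & 0xF8) == 0xF0) { cp = b & 0x07; need = 3; }
        else return -1;  /* stray continuation byte or invalid lead byte */

        if (n - i - 1 < need)
            return -1;   /* sequence truncated at end of input */

        for (size_t k = 1; k <= need; k++) {
            unsigned char c = in[i + k];
            if ((c & 0xC0) != 0x80)
                return -1;  /* not a continuation byte */
            cp = (cp << 6) | (c & 0x3F);
        }

        /* Reject overlong forms, surrogates, and values past U+10FFFF. */
        if (cp < min_cp[need] || cp > 0x10FFFF ||
            (cp >= 0xD800 && cp <= 0xDFFF))
            return -1;

        out[count++] = cp;
        i += need + 1;
    }
    return count;
}
```

For the single byte 0xE9, decode_latin1 yields U+00E9, while decode_utf8
reports an error because 0xE9 is a lead byte with no continuation bytes
following it. Callers that never had to check for a decoding failure
under the Latin-1 behaviour would have to start doing so.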
> An exception should be made for stuff that explicitly handles
> filenames; there the filesystem encoding should obviously be used.
In most cases, this still follows the rule, as the filename encoding
is specified explicitly. I agree this should also be specified, in
particular when the import code gets fixed (where strings typically
denote file names).
Regards,
Martin