[Python-3000] C API cleanup str

Sun Aug 5 17:48:06 CEST 2007

> IMO at the C level all conversions between bytes and Unicode that
> don't specify a conversion should use UTF-8. That's what most of the
> changes made so far do.

I agree. We should specify that somewhere, so we have a recorded
guideline to use in case of doubt.

One function that misbehaves under this spec is
PyUnicode_FromString[AndSize], which assumes the input is Latin-1
(i.e. it performs a codepoint-per-codepoint conversion).

Once it decodes UTF-8 instead, it can fail because of encoding errors
(which it previously couldn't, since every byte is valid Latin-1).
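
To illustrate the difference, here is a small Python-level sketch of the
two codecs. It is not the C function itself, and the sample bytes are
made up for the example; it only shows why a Latin-1 conversion can never
fail while a UTF-8 conversion can.

```python
# Hypothetical sample input: "caf" followed by the single byte 0xE9,
# which is "e with acute accent" in Latin-1.
raw = b"caf\xe9"

# Latin-1 maps each byte 0x00-0xFF to the code point of the same value,
# so decoding cannot fail for any input.
as_latin1 = raw.decode("latin-1")

# UTF-8 rejects byte sequences that are not well-formed. Here 0xE9 opens
# a three-byte sequence that is never completed, so decoding raises.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Every possible byte value survives a Latin-1 round trip.
all_bytes = bytes(range(256))
latin1_roundtrip = all_bytes.decode("latin-1").encode("latin-1") == all_bytes
```

Callers that previously passed arbitrary bytes and never checked the
result would need to handle the error case under the UTF-8 rule.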

> An exception should be made for stuff that explicitly handles
> filenames; there the filesystem encoding should obviously be used.

In most cases, this still follows the rule, as the filename encoding
is specified explicitly. I agree this should also be specified, in
particular when the import code gets fixed (where strings typically
denote file names).
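
As a rough Python-level sketch of the filename exception (assuming a
Python version that provides os.fsencode and os.fsdecode; the file name
is hypothetical), conversions go through the filesystem encoding rather
than an implicit default:

```python
import os
import sys

# The encoding used for file names on this platform; the value differs
# between systems, so it is only looked up here, not assumed.
fs_encoding = sys.getfilesystemencoding()

# Hypothetical file name used purely for illustration.
name = "example.txt"

# Convert str -> bytes -> str using the filesystem encoding explicitly.
encoded = os.fsencode(name)
decoded = os.fsdecode(encoded)
```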

Regards,
Martin