[Python-Dev] Multilingual programming article on the Red Hat Developer blog

Chris Angelico rosuav at gmail.com
Tue Sep 16 17:27:44 CEST 2014


On Wed, Sep 17, 2014 at 1:00 AM, R. David Murray <rdmurray at bitdance.com> wrote:
> That isn't the case in the email package.  The smuggled bytes are not
> errors[*], they are literally smuggled bytes.

But they're not characters, which is what Stephen and I were saying -
and contrary to what Jim said about treating them as characters. At
best, they represent characters but in some encoding other than the
one you're using, and you have no idea how many bytes form a character
or anything. So you can't, for instance, word-wrap the text, because
you can't know how wide these unknown bytes are, whether they
represent spaces (wrap points), or newlines, or anything like that.
You can't treat them as characters, so while you have them in your
string, you can't treat it as a pure Unicode string - it's a Unicode
string with smuggled bytes.
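
To make that concrete, here's a rough sketch using the stdlib
"surrogateescape" error handler, which I'm assuming is close enough to
what the email package does to illustrate the point. The sample text and
the names (raw, text, smuggled) are made up for the example.

```python
# Hypothetical input: UTF-8 bytes that get decoded as if they were ASCII.
raw = "na\u00efve caf\u00e9".encode("utf-8")  # 10 characters, 12 bytes

# Each undecodable byte becomes one lone surrogate in U+DC80..U+DCFF.
text = raw.decode("ascii", errors="surrogateescape")

# The string is 12 code points long even though the real text is 10
# characters: nothing in the string says which smuggled bytes belong
# together, so width and wrap points are unknowable.
smuggled = [c for c in text if "\udc80" <= c <= "\udcff"]

# The bytes survive a round trip through the same error handler...
roundtrip = text.encode("ascii", errors="surrogateescape")

# ...but a strict encode refuses, because this is not pure Unicode.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

print(len(text), len(smuggled), roundtrip == raw, strict_ok)
```

So the smuggled code points are good for carrying the bytes through
unharmed, and not much else.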

ChrisA

