[Python-Dev] Bytes path support

Chris Angelico rosuav at gmail.com
Fri Aug 22 23:04:20 CEST 2014


On Sat, Aug 23, 2014 at 6:17 AM, Glenn Linderman <v+python at g.nevcal.com> wrote:
> "cp1251 of utf-8 encoding" is non-sensical. Either it is cp1251 or it is
> utf-8, but it is not both. Maybe you meant "or" instead of "of".

I'd assume "or" meant there, rather than "of", it's a common typo.

Not sure why 1251, specifically, but it's not uncommon for boundary
code to attempt a decode that consists of something like "attempt
UTF-8 decode, and if that fails, attempt an eight-bit decode". For my
MUD clients, that's pretty much required; one of the servers I
frequent is completely bytes-oriented, so whatever encoding one client
uses will be dutifully echoed to every other client. There are some
that correctly use UTF-8, but others use whatever they feel like; and
since those naughty clients are mainly on Windows, I can reasonably
guess that they'll be using CP-1252. So that's what I do: UTF-8,
fall-back on 1252. (It's also possible some clients will be using
Latin-1, but 1252 is a superset of that.)

But it's important to note that this is a method of handling junk.
It's not a design intention; this is for a situation where I really
want to cope with any byte stream and attempt to display it as text.
And if I get something that's neither UTF-8 nor CP-1252, I will
display it wrongly, and there's nothing can be done about that.

ChrisA


More information about the Python-Dev mailing list