[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Thu Jan 7 08:25:33 EST 2016

On Thu, Jan 7, 2016 at 11:59 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> On Thu, Jan 07, 2016 at 08:49:55PM +1100, Chris Angelico wrote:
>
>> It makes sense, but I disagree with the suggestion. Having "Latin-1 or
>> UTF-8" as the effective default encoding is not a good idea, IMO;
>
> I'm curious what your reasoning is. That seems to be fairly common
> behaviour with some email clients; for example, I seem to recall that
> Thunderbird will try encoding emails as US-ASCII, if that fails,
> Latin-1, and only send UTF-8 if the other two don't work.
>
> I'm not defending this tactic, but wondering what you have against it.
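A minimal sketch of the fallback chain described above. It assumes the order ASCII, then Latin-1, then UTF-8, and the name `encode_with_fallback` is hypothetical: it is not part of http.client or any email client. The function returns the codec it picked as well as the bytes, because the receiver needs that label. The same call can produce Latin-1 bytes or UTF-8 bytes depending on the characters in the string.

```python
from typing import Tuple

# Assumed order, following the email-client behaviour described above.
FALLBACK_CODECS = ("ascii", "latin-1", "utf-8")


def encode_with_fallback(text: str) -> Tuple[bytes, str]:
    """Return (encoded bytes, codec name) for the first codec that succeeds."""
    for codec in FALLBACK_CODECS:
        try:
            return text.encode(codec), codec
        except UnicodeEncodeError:
            continue  # this codec can't represent some character; try the next
    # UTF-8 handles every str except lone surrogates, which end up here.
    raise UnicodeError("no fallback codec could encode the text")


for sample in ("hello", "café", "€"):
    data, codec = encode_with_fallback(sample)
    print(f"{sample!r}: {codec} -> {data!r}")
```

"café" comes out as b'caf\xe9' (Latin-1), which is not valid UTF-8. "€" falls through to UTF-8. A receiver that isn't told which codec was chosen cannot tell which rule applied.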

An application is free to do that if it likes, although personally I
wouldn't bother. For a library, I'd much rather the rules be as simple
as possible. Maybe "ASCII or UTF-8" (since ASCII is a strict subset of
UTF-8), but not "ASCII or Latin-1 or UTF-8". I'd prefer something
extremely simple: if you don't specify an encoding, it has one
default. That corresponds to a function signature that says
encoding="UTF-8", and you can be 100% confident that omitting the
encoding parameter will do the same thing as passing "UTF-8".
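A minimal sketch of the single-default signature described above. It uses a hypothetical `encode_body` helper, not the real http.client API. Omitting `encoding` behaves exactly like passing "utf-8". If the chosen codec cannot represent the text, the call raises an error instead of silently switching to another codec.

```python
def encode_body(text: str, encoding: str = "utf-8") -> bytes:
    """Encode `text` with exactly one codec; no fallback chain."""
    # Leaving out `encoding` behaves identically to passing "utf-8".
    # If the codec cannot represent a character, str.encode raises
    # UnicodeEncodeError rather than quietly trying another codec.
    return text.encode(encoding)


# Pure-ASCII text produces identical bytes under ASCII and UTF-8,
# so "ASCII or UTF-8" never yields two different byte sequences.
print(encode_body("hello") == "hello".encode("ascii"))
print(encode_body("café") == encode_body("café", encoding="utf-8"))

try:
    encode_body("€", encoding="latin-1")
except UnicodeEncodeError as exc:
    print("latin-1 cannot encode:", exc.object[exc.start:exc.end])
```

Because there is only one default, a caller never has to inspect the text to predict which bytes will go out.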

ChrisA