[Python-ideas] Fall back to encoding unicode strings in utf-8 if latin-1 fails in http.client

Emil Stenström em at kth.se
Thu Jan 7 08:11:01 EST 2016


On 2016-01-07 13:59, Steven D'Aprano wrote:
> On Thu, Jan 07, 2016 at 08:49:55PM +1100, Chris Angelico wrote:
>
>> It makes sense, but I disagree with the suggestion. Having "Latin-1 or
>> UTF-8" as the effective default encoding is not a good idea, IMO;
>
> I'm curious what your reasoning is. That seems to be fairly common
> behaviour with some email clients, for example I seem to recall that
> Thunderbird will try encoding emails as US-ASCII, if that fails,
> Latin-1, and only send UTF-8 if the other two don't work.
>
> I'm not defending this tactic, but wondering what you have against it.

I'm fine with either tactic: defaulting to utf-8, or trying the
encodings one after the other. The important thing for me is that the
API works the way most people expect.
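
To make the "one after the other" tactic concrete, here is a minimal
sketch. It is not http.client's actual code; the function name
encode_body and its encodings parameter are made up for illustration,
and the default order (latin-1 first, then utf-8) is just the fallback
proposed in the subject line.

```python
def encode_body(text, encodings=("latin-1", "utf-8")):
    """Try each encoding in order; return (bytes, name of encoding used).

    Hypothetical helper, for illustration only.
    """
    for encoding in encodings:
        try:
            return text.encode(encoding), encoding
        except UnicodeEncodeError:
            # This encoding cannot represent the text; try the next one.
            continue
    raise ValueError("none of the encodings can represent the text")


# Fits in latin-1, so the first encoding wins.
body, used = encode_body("caf\u00e9")

# U+2603 (snowman) is outside latin-1, so this falls back to utf-8.
body2, used2 = encode_body("snowman \u2603")
```

Passing encodings=("utf-8",) gives the other tactic, where utf-8 is
simply the default and nothing else is tried.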

My main reason for not changing the default was that it would break
backwards compatibility, but only for the case where people sent latin-1
strings as if they were unicode strings.
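
A small illustration of where that break would show up, assuming the
only change is which codec is used to encode a str body. Pure ASCII
text is unaffected; the bytes only differ for characters in the
U+0080 to U+00FF range, which latin-1 encodes as one byte and utf-8 as
two.

```python
text = "\u00e9"  # e with acute accent, representable in both encodings

as_latin1 = text.encode("latin-1")
as_utf8 = text.encode("utf-8")

print(as_latin1)  # b'\xe9'
print(as_utf8)    # b'\xc3\xa9'

# ASCII-only text produces identical bytes under both encodings.
ascii_same = "hello".encode("latin-1") == "hello".encode("utf-8")
print(ascii_same)  # True

# A server that still decodes as latin-1 would see two characters
# instead of one if the client switched to utf-8.
misread = as_utf8.decode("latin-1")
```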

If the reading of the spec that led to using latin-1 is incorrect, that
really makes me question whether having latin-1 there was a good idea in
the first place.

So I'm definitely in favour of switching to utf-8 as the default, as it
would make the API work the way many (including me) would expect.

/Emil

