[Python-ideas] Python 3000 TIOBE -3%

Wed Feb 15 00:35:11 CET 2012

MRAB wrote:
> On 14/02/2012 21:43, Jim Jewett wrote:
>> On Tue, Feb 14, 2012 at 6:39 AM, Carl M. Johnson
>> <cmjohnson.mailinglist at gmail.com>  wrote:
>>
>>>  OK, so concrete proposals: update the docs and maybe make a
>>>  synonym for Latin-1 that makes it more semantically obvious that
>>>  you're not really using it as Latin-1, just as an easy to pass through
>>>  encoding. Anything else? Any bike shedding on the synonym?
>>
>> encoding="ascii-ish"  # gets the sloppiness right
>> encoding="passthrough"  # I would like "ignore", if it wouldn't cause
>> confusion with the errorhandler

"Ignore" won't do. Ignore what? Everything? Don't actually run an encoder? 
That doesn't even make sense!

"Passthrough" is bad too, because it perpetuates the idea that ASCII 
characters are "plain text" which are bytes. Unicode strings, even those that 
are purely ASCII, are not strings of bytes (except in the sense that every 
data structure is a string of bytes). You can't just "pass bytes through" to 
turn them into Unicode.
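
A minimal sketch of that distinction in Python 3. The sample values are 
illustrative only:

```python
# Illustrative values only; nothing here comes from a real file.
data = b"hello"               # five bytes
text = data.decode("ascii")   # five code points, produced by a decoder

# Iterating bytes yields integers; iterating str yields 1-character strings.
byte_items = list(data)       # [104, 101, 108, 108, 111]
char_items = list(text)       # ['h', 'e', 'l', 'l', 'o']

# Mixing the two types without an explicit codec is an error.
try:
    text + data
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Even for pure ASCII, getting from bytes to str requires an explicit decode 
step; there is nothing to "pass through".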

>> encoding="binpass"
>> encoding="rawbytes"
>>
> encoding="mojibake" # :-)

You have a smiley, but I think that's the best name I've seen yet. It's 
explicit in what you get -- mojibake.

The only downside is that it's a little obscure. Not everyone knows what 
mojibake is called, or calls it mojibake, although I suppose we could add 
aliases to other terms such as Buchstabensalat and Krähenfüße if German users 
complain <wink>

But remind me again, why are we doing this? If you have to teach people the 
recipe

     open(filename, encoding='mojibake')

why not just teach them the very slightly more complex recipe

     open(filename, encoding='ascii', errors='surrogateescape')

which captures the user's intent ("I want ASCII, with some way of escaping 
errors so I don't have to deal with them") much more accurately. Sometimes 
brevity is *not* a virtue.
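
For comparison, here is a rough sketch of what the two recipes do with the 
same input. The file contents and the temporary file are made up for the 
example, and Latin-1 stands in for the proposed synonym, which does not exist:

```python
import os
import tempfile

# Hypothetical sample: ASCII text plus two non-ASCII bytes (UTF-8 for "é").
raw = b"caf\xc3\xa9 menu\n"

with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(raw)
    path = tmp.name

try:
    # Recipe 1: ASCII, with undecodable bytes escaped as lone surrogates.
    with open(path, encoding="ascii", errors="surrogateescape") as f:
        text = f.read()
    # Recipe 2: Latin-1, which maps every byte to some character.
    with open(path, encoding="latin-1") as f:
        latin = f.read()
finally:
    os.remove(path)

# Each escaped byte 0xNN shows up as the code point U+DCNN.
escaped = [hex(ord(c)) for c in text if "\udc80" <= c <= "\udcff"]

# Encoding with the same error handler restores the original bytes.
round_trip = text.encode("ascii", errors="surrogateescape")
```

Both results can be written back out byte-for-byte, but the surrogateescape 
version keeps the non-ASCII bytes visibly marked as "not really text", while 
the Latin-1 version silently turns them into plausible-looking characters.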

-- 
Steven