MRAB wrote:
On 14/02/2012 21:43, Jim Jewett wrote:
On Tue, Feb 14, 2012 at 6:39 AM, Carl M. Johnson <cmjohnson.mailinglist@gmail.com> wrote:
OK, so concrete proposals: update the docs and maybe make a synonym for Latin-1 that makes it more semantically obvious that you're not really using it as Latin-1, just as an easy pass-through encoding. Anything else? Any bike shedding on the synonym?
encoding="ascii-ish" # gets the sloppiness right
encoding="passthrough" # I would like "ignore", if it wouldn't cause confusion with the errorhandler
"Ignore" won't do. Ignore what? Everything? Don't actually run an encoder? That doesn't even make sense! "Passthrough" is bad too, because it perpetuates the idea that ASCII characters are "plain text" and therefore bytes. Unicode strings, even those that are purely ASCII, are not strings of bytes (except in the sense that every data structure is a string of bytes). You can't just "pass bytes through" to turn them into Unicode; they have to be decoded.
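To make the bytes-versus-text point concrete, here is a minimal sketch for Python 3. The variable names are made up for illustration; only built-in behaviour is used.

```python
# Bytes and text are different types, even when the content is pure ASCII.
data = b"plain"                # bytes as read from a file or socket
text = data.decode("ascii")    # an explicit decode is what produces a str

first_byte = data[0]           # indexing bytes gives an int
first_char = text[0]           # indexing str gives a one-character str

# Mixing the two is a TypeError, not a silent pass-through.
try:
    combined = text + data
except TypeError:
    combined = None
```

Going the other way needs an explicit text.encode("ascii"), which is why there is no such thing as an encoding that does nothing.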
encoding="binpass"
encoding="rawbytes"
encoding="mojibake" # :-)
You have a smiley, but I think that's the best name I've seen yet. It's explicit about what you get: mojibake. The only downside is that it's a little obscure. Not everyone knows what mojibake is called, or calls it mojibake, although I suppose we could add aliases to other terms such as Buchstabensalat and Krähenfüße if German users complain <wink>

But remind me again, why are we doing this? If you have to teach people the recipe

    open(filename, encoding='mojibake')

why not just teach them the very slightly more complex recipe

    open(filename, encoding='ascii', errors='surrogateescape')

which captures the user's intent ("I want ASCII, with some way of escaping errors so I don't have to deal with them") much more accurately?

Sometimes brevity is *not* a virtue.

-- Steven
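For anyone who wants to compare the two recipes side by side, here is a rough sketch. It assumes Python 3; the sample bytes and file name are made up, and since 'mojibake' is only a name proposed in this thread, the sketch uses 'latin-1', the codec it would have been an alias for.

```python
import os
import tempfile

# Hypothetical sample: mostly ASCII, plus bytes in an unknown encoding.
raw = b"caf\xe9 na\xc3\xafve\n"

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, "sample.txt")
    with open(path, "wb") as f:
        f.write(raw)

    # Recipe 1: ASCII, with undecodable bytes escaped as lone surrogates.
    with open(path, encoding="ascii", errors="surrogateescape", newline="") as f:
        text_escaped = f.read()

    # Recipe 2: Latin-1, where every byte silently becomes some character.
    with open(path, encoding="latin-1", newline="") as f:
        text_latin1 = f.read()

# Both recipes give the original bytes back when encoded the same way.
roundtrip_escaped = text_escaped.encode("ascii", errors="surrogateescape")
roundtrip_latin1 = text_latin1.encode("latin-1")
```

Both round-trip losslessly. The difference is in what the intermediate string looks like: with surrogateescape the non-ASCII bytes show up as lone surrogates (U+DC80 to U+DCFF), which are clearly not real text, while with Latin-1 they become ordinary-looking characters that may well be wrong.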