[Python-ideas] Python 3000 TIOBE -3%

Stephen J. Turnbull stephen at xemacs.org
Wed Feb 15 08:46:18 CET 2012


Carl M. Johnson writes:

 > If I can I would like to offer one argument for surrogateescape
 > over latin-1 as the newbie approach.

This isn't the newbie approach.  What should be recommended to newbies
is to use the default (which is locale-dependent, and therefore
"usually" "good enough"), and live with the risk of occasional
exceptions.  If they get exceptions, or must avoid them, they should
learn about encodings or consult with someone who already knows.[1]
*Neither* of the approaches discussed here is reliable for tasks like
automatically processing email or uploaded files on the web, and
neither should be recommended to people who aren't already used to
encoding-agnostic processing in the Python 2 "str" style.
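
To make "live with the risk of occasional exceptions" concrete, here is
a minimal sketch.  The helper name try_decode and the sample bytes are
made up for illustration, and "utf-8" stands in for whatever the locale
default happens to be on a given machine (often UTF-8 on current
systems, but that is an assumption, not a guarantee).

```python
import locale

# The encoding open() typically falls back to when none is given.
# The value depends on the machine's locale settings.
default_encoding = locale.getpreferredencoding(False)


def try_decode(data, encoding):
    """Hypothetical helper: strict decode, returning (text, error)."""
    try:
        return data.decode(encoding), None
    except UnicodeDecodeError as exc:
        return None, exc


# Pure ASCII decodes under any ASCII-compatible default.
ok_text, ok_err = try_decode(b"plain ASCII", "utf-8")

# A Latin-1 encoded e-acute is not valid UTF-8, so the strict
# default raises; this is the "occasional exception".
bad_text, bad_err = try_decode(b"caf\xe9", "utf-8")
```

In the common case nothing goes wrong; the exception only appears when
the data and the assumed encoding disagree, which is the point at which
the user has to learn about encodings or ask someone.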

So, now that you mention "newbies", I don't know what other people are
discussing, but what I've been discussing here is an approach for
people who are comfortable working around (or never experience!) the
defects of Python 2's ASCII-compatible approach to handling varied
encodings in a single program, and want a workalike for Python 3.

The choice between the two is task-dependent.  The encoding='latin1'
method is for tasks where a little mojibake can be tolerated, but an
exception would stop the show.  The errors='surrogateescape' method
is for tasks where any mojibake at all is a disaster, but occasional
exceptions can be handled as they arise.
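
A rough sketch of the trade-off, using only bytes.decode and str.encode
from the standard library.  The sample bytes are invented for
illustration: a Latin-1 e-acute followed by a UTF-8 euro sign, i.e.
mixed encodings in one byte string.

```python
# Invented sample: Latin-1 e-acute (0xE9), a space, then a UTF-8
# encoded euro sign (0xE2 0x82 0xAC).
raw = b"caf\xe9 \xe2\x82\xac"

# Approach 1: encoding='latin1'.  Every byte maps to one code point,
# so decoding never raises and the bytes round-trip exactly, but the
# euro sign comes out as three unrelated characters (mojibake).
as_latin1 = raw.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1")

# Approach 2: errors='surrogateescape' on top of UTF-8.  Valid UTF-8
# is decoded properly; the undecodable byte 0xE9 is smuggled through
# as the lone surrogate U+DCE9.
as_escaped = raw.decode("utf-8", errors="surrogateescape")
escaped_roundtrip = as_escaped.encode("utf-8", errors="surrogateescape")

# The price: encoding without the error handler refuses the lone
# surrogate, so an exception can show up at output time.
try:
    as_escaped.encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
```

Both approaches reproduce the original bytes on output as long as the
same codec and error handler are used in both directions.  They differ
in what the intermediate str looks like: wrong but harmless characters
in the first case, correct characters plus surrogates that can raise
later in the second.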


Footnotes: 
[1]  When this damned term is over in a few weeks, I'll take a look at
the tutorial-level docs and see if I can come up with a gentle
approach for those who are finding out for the first time that the
locale-dependent default isn't good enough for them.



