Carl M. Johnson writes:
> If I can I would like to offer one argument for surrogateescape over latin-1 as the newbie approach.
This isn't the newbie approach. What should be recommended to newbies is to use the default (which is locale-dependent, and therefore "usually" "good enough") and live with the risk of occasional exceptions. If they get exceptions, or must avoid them, they should learn about encodings or consult someone who already knows.[1]

*Neither* of the approaches discussed here is reliable for tasks like automatically processing email or files uploaded over the web, and neither should be recommended to people who aren't already used to encoding-agnostic processing in the Python 2 "str" style.

So, now that you mention "newbies": I don't know what other people are discussing, but what I've been discussing here is an approach for people who are comfortable working around (or never experience!) the defects of Python 2's ASCII-compatible approach to handling varied encodings in a single program, and who want a workalike in Python 3.

The choice between the two is task-dependent. The encoding='latin1' method is for tasks where a little mojibake can be tolerated, but an exception would stop the show. The errors='surrogateescape' method is for tasks where any mojibake at all is a disaster, but occasional exceptions can be handled as they arise.

Footnotes:
[1] When this damned term is over in a few weeks, I'll take a look at the tutorial-level docs and see if I can come up with a gentle approach for those who are finding out for the first time that the locale-dependent default isn't good enough for them.
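To make the latin-1 versus surrogateescape trade-off concrete, here is a minimal Python 3 sketch. It assumes a hypothetical input: valid UTF-8 for "café" followed by the byte 0xFF, which never occurs in well-formed UTF-8. UTF-8 as the codec paired with surrogateescape is also an assumption, and the helper names (`via_latin1`, `via_surrogateescape`, `encodes_strictly`) are illustrative only.

```python
# Hypothetical input: UTF-8 "café" followed by 0xFF, a byte that
# never occurs in well-formed UTF-8.
RAW = b"caf\xc3\xa9 \xff"


def via_latin1(data: bytes) -> str:
    # Latin-1 maps each of the 256 byte values to one code point,
    # so this decode can never raise.
    return data.decode("latin-1")


def via_surrogateescape(data: bytes) -> str:
    # Valid UTF-8 decodes normally; each undecodable byte 0xNN becomes
    # the lone surrogate U+DCNN. (UTF-8 is an assumption of this sketch.)
    return data.decode("utf-8", errors="surrogateescape")


def encodes_strictly(text: str, encoding: str = "utf-8") -> bool:
    # Hypothetical helper: True if the text can be written out with the
    # default strict error handler.
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


latin1_text = via_latin1(RAW)
# Mojibake: the two UTF-8 bytes of "é" became "Ã©", and 0xFF became "ÿ".
assert latin1_text == "caf\u00c3\u00a9 \u00ff"
# Byte-exact round trip, with no exception anywhere...
assert latin1_text.encode("latin-1") == RAW
# ...but writing it as UTF-8 silently double-encodes instead of failing.
assert encodes_strictly(latin1_text)
assert latin1_text.encode("utf-8") == b"caf\xc3\x83\xc2\xa9 \xc3\xbf"

escaped_text = via_surrogateescape(RAW)
# "é" decoded correctly; the stray byte is carried along as U+DCFF.
assert escaped_text == "caf\u00e9 \udcff"
# The round trip works when the same handler is used on the way out...
assert escaped_text.encode("utf-8", errors="surrogateescape") == RAW
# ...while a strict write refuses, so the bad byte surfaces as an exception.
assert not encodes_strictly(escaped_text)
```

Neither function guesses the encoding. The choice only decides whether bytes that don't fit show up later as garbled text (latin-1) or as an exception at the point of output (surrogateescape).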