[Python-ideas] Py3 unicode impositions

Stephen J. Turnbull stephen at xemacs.org
Mon Feb 13 06:24:54 CET 2012


Nick Coghlan writes:

 > Yeah, it didn't take long for me to come back around to that point of
 > view, so I morphed http://bugs.python.org/issue13997 into a docs bug
 > about clearly articulating the absolute bare minimum knowledge of
 > Unicode needed to process text in a robust cross-platform manner in
 > Python 3 instead.

+1

I think (as I've said more verbosely elsewhere) that there are two
common use cases, corresponding to two different definitions of
"robust text processing".

(1) Use cases where you would rather risk occasionally corrupting
    non-ASCII text than risk *any* UnicodeErrors at all *anywhere*.

    These use encoding='latin-1', which maps every byte to a code
    point and so never fails to decode.

(2) Use cases where you do not want to deal with encodings just to
    "pass through" non-ASCII text, but do want that text preserved
    enough to be willing to risk (rare) UnicodeErrors or validation
    errors from pedantic Unicode-oriented modules.

    These use encoding='ascii', errors='surrogateescape', which
    carries each non-ASCII byte through as a lone surrogate.
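
To make the trade-off concrete, here is a minimal sketch of both
choices applied to the same bytes. The helper names and the sample
data are mine, purely for illustration; only the codec and error
handler names come from the two cases above.

```python
# Sample input: ASCII text followed by the UTF-8 bytes for the euro sign.
# (Hypothetical data, chosen so that case (1) visibly corrupts it.)
RAW = b"price: 5 \xe2\x82\xac"


def decode_never_fail(raw):
    """Case (1): latin-1 maps every byte to a code point, so no errors."""
    return raw.decode("latin-1")


def encode_never_fail(text):
    """Inverse of decode_never_fail for text that was not modified."""
    return text.encode("latin-1")


def decode_passthrough(raw):
    """Case (2): ASCII decodes normally, other bytes become lone surrogates."""
    return raw.decode("ascii", errors="surrogateescape")


def encode_passthrough(text):
    """Inverse of decode_passthrough: surrogates turn back into the bytes."""
    return text.encode("ascii", errors="surrogateescape")


# Both round-trip untouched text byte for byte.
assert encode_never_fail(decode_never_fail(RAW)) == RAW
assert encode_passthrough(decode_passthrough(RAW)) == RAW

# Case (1) risk: text operations treat the bytes as Latin-1 letters.
# Byte 0xE2 decodes to a lowercase letter, and upper() rewrites it to 0xC2.
corrupted = encode_never_fail(decode_never_fail(RAW).upper())
assert corrupted != RAW.replace(b"price", b"PRICE")

# Case (2) preserves the bytes: lone surrogates have no case mapping.
preserved = encode_passthrough(decode_passthrough(RAW).upper())
assert preserved == RAW.replace(b"price", b"PRICE")

# Case (2) risk: a strict encoder refuses the lone surrogates.
try:
    decode_passthrough(RAW).encode("utf-8")
    strict_encode_failed = False
except UnicodeEncodeError:
    strict_encode_failed = True
assert strict_encode_failed
```

The last check is the kind of "pedantic" error meant in (2): the text
survives pass-through, but any code that insists on valid Unicode
will complain about it.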



