[Python-ideas] Py3 unicode impositions
Stephen J. Turnbull
stephen at xemacs.org
Mon Feb 13 06:24:54 CET 2012
Nick Coghlan writes:
> Yeah, it didn't take long for me to come back around to that point of
> view, so I morphed http://bugs.python.org/issue13997 into a docs bug
> about clearly articulating the absolute bare minimum knowledge of
> Unicode needed to process text in a robust cross-platform manner in
> Python 3 instead.
+1
I think (as I've said more verbosely elsewhere) that there are two
common use cases, corresponding to two different definitions of
"robust text processing".

(1) Use cases where you would rather risk occasionally corrupting
    non-ASCII text than risk *any* UnicodeErrors at all *anywhere*.
    They use encoding='latin-1'.

(2) Use cases where you do not want to deal with encodings just to
    "pass through" non-ASCII text, but do want that text preserved
    enough to be willing to risk (rare) UnicodeErrors or validation
    errors from pedantic Unicode-oriented modules.
    They use encoding='ascii', errors='surrogateescape'.
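
A minimal sketch of how the two choices behave, not part of the original
message. The sample bytes and all variable names are made up for
illustration. It assumes the input is UTF-8 text with one stray invalid
byte, and only uses bytes.decode and str.encode.

```python
# Hypothetical input: UTF-8 for "café", a space, then a byte that is invalid UTF-8.
raw = b"caf\xc3\xa9 \xff"

# (1) latin-1 maps every byte to a code point, so decoding never raises.
as_latin1 = raw.decode("latin-1")
latin1_roundtrip = as_latin1.encode("latin-1")        # same bytes back

# The risk: the text is already mojibake, and writing it out with a
# different codec succeeds silently while producing different bytes.
silently_corrupted = as_latin1.encode("utf-8")

# (2) ascii + surrogateescape turns each non-ASCII byte into a lone surrogate.
as_escaped = raw.decode("ascii", errors="surrogateescape")
escaped_roundtrip = as_escaped.encode("ascii", errors="surrogateescape")

# The risk: a strict encoder refuses the lone surrogates instead of
# writing out something wrong.
try:
    as_escaped.encode("utf-8")
except UnicodeEncodeError:
    strict_failed = True
else:
    strict_failed = False
```

Both decodes round-trip the original bytes when the same codec and error
handler are used on output. They differ in what happens elsewhere: (1)
never raises but can hand corrupted text to other code unnoticed, while
(2) raises as soon as a strict encoder sees the escaped bytes.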