[Python-ideas] Python 3000 TIOBE -3%

Wed Feb 15 05:12:58 CET 2012

Nick Coghlan writes:

 > using "ascii+surrogateescape" for your own I/O and setting
 > "backslashreplace" on sys.stdout should cover you (and any
 > exceptions you get will be warning you about cases where your
 > original assumptions about not caring about Unicode validity have
 > been proven wrong).

Are you saying you know more than the user about her application?

 > If the logging module doesn't do it already, it should probably be
 > defaulting to backslashreplace when encoding messages, too

See, *you* don't know whether it will raise either, and that is for an
important stdlib module.  Why should somebody who is not already a
Unicode geek, and is just using a module downloaded from PyPI, be
required to audit its I/O foibles?
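
To make the failure mode concrete, here is a minimal sketch of what
happens when text decoded with "ascii+surrogateescape" reaches code that
encodes strictly.  The sample bytes and variable names are mine, chosen
for illustration; nothing here is taken from the logging module itself.

```python
# Bytes of unknown encoding: ASCII plus one stray high byte (illustrative data).
raw = b"caf\xe9 ok"

# surrogateescape smuggles the undecodable byte through as a lone surrogate.
text = raw.decode("ascii", errors="surrogateescape")

# Round-tripping with the same handler restores the original bytes.
roundtrip = text.encode("ascii", errors="surrogateescape")

# Any code that encodes strictly (the default) raises on the lone surrogate.
try:
    text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# backslashreplace never raises; it emits an escape sequence instead.
safe = text.encode("utf-8", errors="backslashreplace")
```

The point is that the caller cannot tell from outside which of the last
two behaviours a downloaded module will exhibit.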

Really, I think use of 'latin1' in this context is covered by
"consenting adults."  We *should* provide an alias that says "all we
know about this string is that the ASCII codes represent ASCII
characters," and document that even if your own code is ASCII-
compatible (i.e., treats runs of non-ASCII as opaque, atomic blobs),
third party modules may corrupt the text.  And use the word "corrupt";
all UnicodelyRightThinking folks will run away screaming.
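
A rough sketch of both halves of that claim, assuming the input happens
to be UTF-8 that the program does not know about.  The function
third_party_lower is hypothetical; it stands in for any module that
treats the string as real Unicode text.

```python
# Bytes whose encoding the program does not know (here they happen to be UTF-8).
raw = b"na\xc3\xafve caf\xc3\xa9"

# latin-1 maps every byte to U+0000..U+00FF, so decoding never fails
# and encoding back is lossless.
text = raw.decode("latin-1")
assert text.encode("latin-1") == raw

# ASCII-compatible processing: split and join on ASCII only, leaving
# runs of non-ASCII untouched.  The original bytes survive.
words = text.split(" ")
rejoined = "-".join(words).encode("latin-1")


def third_party_lower(s):
    """Hypothetical third-party helper that assumes s is real Unicode text."""
    return s.lower()


# U+00C3 (the stand-in for byte 0xC3) lowercases to U+00E3, so the
# bytes written back out are no longer the ones that were read in.
corrupted = third_party_lower(text).encode("latin-1")
```

Here rejoined still contains the original UTF-8 sequences, while
corrupted has had each 0xC3 lead byte silently changed to 0xE3.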

That statement about corrupting text is already true of Python 2 and
of pre-PEP-393 Python 3 (on Windows, and on UCS-2 builds elsewhere),
since those builds can silently slice a surrogate pair in half.
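
Current Pythons index by code point, so the effect cannot be reproduced
with a plain str slice.  The sketch below simulates a narrow build by
working on UTF-16 code units directly; the sample string and names are
illustrative assumptions.

```python
import struct

# One non-BMP character between two ASCII ones.
s = "a\U0001F600b"

# On a PEP 393 build this is 3 code points; a narrow build stored
# 4 UTF-16 code units and indexed by those instead.
units = s.encode("utf-16-le")
code_units = struct.unpack("<%dH" % (len(units) // 2), units)

# What s[:2] amounted to on a narrow build: the first two code units.
narrow_slice = code_units[:2]

# The slice ends on a high surrogate whose low half was cut off.
dangling_high = 0xD800 <= narrow_slice[-1] <= 0xDBFF

# Strict decoding of the sliced units fails, because the pair is incomplete.
sliced_bytes = struct.pack("<%dH" % len(narrow_slice), *narrow_slice)
try:
    sliced_bytes.decode("utf-16-le")
    decodes_cleanly = True
except UnicodeDecodeError:
    decodes_cleanly = False
```

On the narrow builds themselves the slice raised nothing; the damage
only surfaced later, when the string was encoded or displayed.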