[Python-ideas] Unicode surrogateescape [was: Re: Python 3000 TIOBE -3%]
cs at zip.com.au
Thu Feb 16 00:07:49 CET 2012
On 14Feb2012 10:17, Carl M. Johnson <cmjohnson.mailinglist at gmail.com> wrote:
| On Feb 14, 2012, at 10:04 AM, Jim Jewett wrote:
| > But is there a good reason not to change the default errorhandler to
| > errors="surrogateescape"?
| It's a conflict in the Zen:
| > Errors should never pass silently.
| > Unless explicitly silenced.
| OK, so default to strict. But:
| > Although practicality beats purity.
| Hmm, so maybe do use surrogates. Then again:
No. Adding errors="surrogateescape" when needed is easy enough not to be
(Also, it clearly flags in the code that we won't always get what we
| > In the face of ambiguity, refuse the temptation to guess.
| Grr, I'm not nearly Dutch enough to make sense of this logical conflict!
I'm not Dutch either (I can never remember which way P and V go in
semaphore operations, for example). However, the logic I would use is:
I should know the encoding of these bytes.
If I don't, and I merely have to suck them in and spit them back out again
as bytes undamaged (such as when reading filesystem filenames, which can
often be treated as opaque tokens), use errors="surrogateescape".
Otherwise, arrange to know the encoding (or have enough fiat to declare
one, preferably utf-8).
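
A minimal sketch of that "suck them in and spit them back out" round trip, assuming Python 3. The byte string is a made-up filename chosen because 0xff can never occur in valid UTF-8; the variable names are illustrative only:

```python
# Hypothetical filename bytes that are not valid UTF-8.
raw = b"caf\xff.txt"

# The default handler (errors="strict") refuses to guess and raises.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# surrogateescape maps each undecodable byte to a lone surrogate in
# the range U+DC80..U+DCFF instead of raising.
name = raw.decode("utf-8", errors="surrogateescape")
print(ascii(name))  # 'caf\udcff.txt'

# Encoding with the same handler restores the original bytes exactly.
roundtrip = name.encode("utf-8", errors="surrogateescape")
print(roundtrip == raw)  # True
```

The decoded string is only safe to treat as an opaque token: encoding it with the default strict handler, or printing it to a UTF-8 terminal, raises UnicodeEncodeError because lone surrogates are not valid in UTF-8.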
errors="surrogateescape" is for lossless but usually "blind"
decode/encode. The rest of the time it would be better to know what
Cameron Simpson <cs at zip.com.au> DoD#743
We don't just *borrow* words; on occasion, English has pursued other
languages down alleyways to beat them unconscious and rifle their pockets for
new vocabulary. - James D. Nicoll