[Python-Dev] PEP 383 update: utf8b is now the error handler
MRAB
google at mrabarnett.plus.com
Tue May 5 19:45:45 CEST 2009
Stephen J. Turnbull wrote:
> MRAB writes:
>
> > > I don't think "people shouldn't be using non-ASCII-compatible
> > > encodings for locale encodings" is a sufficient rationale for a hard
> > > error here. I mean, of course they *should* be using UTF-8. Maybe
> > > Python 3.1 should just go ahead and error on any other encoding on
> > > POSIX platforms? <wink>
> > >
> > I don't see why the error handler couldn't in principle be used with
> > encodings other than UTF-8, although in that case all of the low
> > surrogates should be open to use.
>
> I should have been more clear here, I guess. The error handler *can*,
> and in the PEP *will be* by default, used with all "sane" locale
> encodings on POSIX.
>
> It occurs to me that the PEP maybe should say that it is an error
> to have your POSIX locale set to UTF-16 or something like that.
>
> What "sane" means in this context is
>
> 1. ASCII NUL is the bytearray terminator, and can't be used as a byte
> in a file name. This rules out UTF-16, UTF-32, and widechar EUC
> encodings, as well as some very rare ones.
>
[snip]
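To illustrate point 1 in the quoted text, here is a minimal sketch of why
UTF-16 and UTF-32 fail the "sane" test. The helper name contains_nul is
made up for this example, and the codec list is just a sample:

```python
def contains_nul(name, codec):
    """Return True if encoding `name` with `codec` yields a 0x00 byte."""
    return b"\x00" in name.encode(codec)

if __name__ == "__main__":
    # An ordinary ASCII-only file name, chosen only for illustration.
    name = "README.txt"
    for codec in ("utf-8", "latin-1", "utf-16-le", "utf-32-le"):
        print(codec, contains_nul(name, codec))
```

With the wide encodings every ASCII character carries at least one zero
byte, so a NUL-terminated file name would be cut off after the first
character.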
It might be slightly OT, but strict UTF-8 is sometimes violated by
encoding U+0000 as the 2-byte overlong sequence 0xC0 0x80, so that 0x00
can still be used as a terminator. I think I read that Microsoft sometimes
does this.
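
A rough sketch of how that 2-byte form behaves in Python 3, assuming the
error handler discussed in this thread is the one spelled "surrogateescape"
in released versions. The helper names is_strict_utf8 and
decode_with_overlong_nul are hypothetical, and the replace trick relies on
the fact that 0xC0 never occurs in well-formed UTF-8:

```python
OVERLONG_NUL = b"\xc0\x80"  # non-strict 2-byte encoding of U+0000

def is_strict_utf8(data):
    """Return True if `data` decodes under the strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True

def decode_with_overlong_nul(data):
    """Decode UTF-8 in which U+0000 may appear as 0xC0 0x80.

    0xC0 is never a valid byte in strict UTF-8, so mapping the pair
    back to 0x00 before decoding cannot corrupt other characters.
    """
    return data.replace(OVERLONG_NUL, b"\x00").decode("utf-8")

if __name__ == "__main__":
    raw = b"a" + OVERLONG_NUL + b"b"
    print(is_strict_utf8(raw))                 # strict decoder rejects it
    print(repr(decode_with_overlong_nul(raw)))
    # The error handler keeps the undecodable bytes as lone surrogates,
    # so they survive a round trip back to bytes.
    escaped = raw.decode("utf-8", "surrogateescape")
    print(escaped.encode("utf-8", "surrogateescape") == raw)
```

So a strict decoder treats 0xC0 0x80 as two undecodable bytes rather than
as U+0000, and with the error handler they would round-trip unchanged.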