[Python-3000] PEP 3131 accepted
Stephen J. Turnbull
stephen at xemacs.org
Fri May 25 12:45:39 CEST 2007
"Martin v. Löwis" writes:
> > If people can agree on a method for specifying, 'ascii only', 'ascii +
> > character sets X, Y, Z', and it actually becomes an accepted part of the
> > proposal, gets implemented, etc., I will grumble to myself at home, but
> > I will stop trying to raise a stink here.
> I think you can stop now - this is supported as a side effect of
> PEP 263, and implemented for years.
That seems not to be the case. PEP 263 allows you to specify a coding
system, not a character set. Whether that will restrict the character
set depends on how the coding system is implemented. For example,
ISO-2022-JP is implicitly a (near) UCS since it does not forbid
designations, so you can't know what a given implementation accepts
(XEmacs implements it as a UCS; I'm not sure what GNU Emacs does),
while ISO-2022-JP-2 is explicitly a UCS because it explicitly permits
designations. And how about C1 code
points in ISO 2022-conformant 8-bit coding systems (including all ISO
8859 systems)? Do they pass, or not? Any restriction is simply a
side effect of the codec throwing an exception because it doesn't
recognize the input. So this requires that users know how the
relevant codec is implemented.
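
To make the C1 question concrete, here is a minimal sketch that asks
a few of CPython's bundled codecs whether they accept the bytes
0x80-0x9F. The helper name `decodes` is made up for this
illustration, and the results describe only these particular codec
implementations, not the ISO standards themselves.

```python
def decodes(data: bytes, encoding: str) -> bool:
    """Return True if `data` decodes under `encoding` without an exception."""
    try:
        data.decode(encoding)
    except UnicodeDecodeError:
        return False
    return True


# The 32 C1 control positions of an 8-bit ISO 2022-conformant coding system.
c1_bytes = bytes(range(0x80, 0xA0))

for name in ("ascii", "latin-1", "iso8859_5"):
    # Whether the C1 range "passes" is decided entirely by the codec tables.
    print(name, decodes(c1_bytes, name))
```

The only signal available is whether the codec raised, which is the
side effect described above.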
Second, this also removes your ability to use literal strings and
comments outside that coding system. (Of course Unicode escapes will
still be available, but they are hardly acceptable for string
literals, and completely out of the question for comments.)
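
A small sketch of that effect, assuming a recent CPython: the same
three-line source is compiled under two PEP 263 declarations. The
helper `compiles` and the sample sources are invented for this
illustration. The only non-ASCII text is inside a comment.

```python
def compiles(source: bytes) -> bool:
    """Return True if `source` compiles under its own coding declaration."""
    try:
        compile(source, "<example>", "exec")
    except (SyntaxError, ValueError):
        # Decoding failures surface as an exception at compile time.
        return False
    return True


# b"caf\xc3\xa9" is "café" encoded as UTF-8; it appears only in a comment.
ascii_plain = b"# -*- coding: ascii -*-\n# plain comment\nx = 1\n"
ascii_accent = b"# -*- coding: ascii -*-\n# caf\xc3\xa9\nx = 1\n"
utf8_accent = b"# -*- coding: utf-8 -*-\n# caf\xc3\xa9\nx = 1\n"

for src in (ascii_plain, ascii_accent, utf8_accent):
    print(compiles(src))
```

The declaration applies to the whole file, so it cannot distinguish
identifiers from comments or string literals.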
Third, it also has the defect of requiring you to use a legacy coding
system, does it not? I.e., if I want to restrict to ASCII + Cyrillic, I
can use ISO-8859-5 or KOI8-R but *not* UTF-8.
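
As a rough illustration using CPython's codecs (the helper
`encodable` and the sample strings are made up here): the legacy
Cyrillic codecs reject text outside their repertoire, while UTF-8
accepts everything and so restricts nothing.

```python
def encodable(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


samples = {
    "cyrillic": "привет",
    "accented latin": "naïve",
    "box drawing": "\u2500",  # BOX DRAWINGS LIGHT HORIZONTAL
}

for encoding in ("koi8_r", "iso8859_5", "utf-8"):
    for label, text in samples.items():
        print(encoding, label, encodable(text, encoding))
```

Note that KOI8-R also admits box-drawing characters, so even the
legacy route gives "whatever the codec table contains" rather than
exactly ASCII + Cyrillic.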
Finally, it does not make it easy to create unions or subsets. One has
to write a codec to do that.
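
For contrast, here is a purely hypothetical sketch of what a union
looks like when the restriction is stated on characters rather than
on a coding system. Nothing below is provided by PEP 263 or PEP 3131;
the names `in_union`, `ASCII`, `CYRILLIC` and `GREEK` are invented,
and Unicode block ranges stand in for "character sets".

```python
# Code point ranges standing in for character sets (an assumption of this
# sketch: real character sets need not coincide with Unicode blocks).
ASCII = range(0x0000, 0x0080)
CYRILLIC = range(0x0400, 0x0500)  # Unicode "Cyrillic" block
GREEK = range(0x0370, 0x0400)     # Unicode "Greek and Coptic" block


def in_union(text: str, *charsets: range) -> bool:
    """Return True if every character of `text` lies in at least one set."""
    return all(any(ord(ch) in cs for cs in charsets) for ch in text)


print(in_union("x = 'привет'", ASCII, CYRILLIC))
print(in_union("λ = 1", ASCII, CYRILLIC))
print(in_union("λ = 1", ASCII, CYRILLIC, GREEK))
```

Here a union is just one more argument and the file can stay in
UTF-8, which the codec-based route cannot offer without a new codec.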