[Python-3000] Support for PEP 3131

Wed Jun 13 00:09:17 CEST 2007

On 6/11/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Jim Jewett writes:

>  > Of course, I wouldn't type them if I knew they were wrong.  With an
>  > ASCII-only install, I would get that error-check because the
>  > (remaining original uses) were in Cyrillic.  With an "any unicode
>  > character" install, ... well, I might figure out my problem the next
>  > morning.

> But this is something that only a small subset of developers-of-Python
> seem to be concerned about.

(Almost) no one ever cares about typos (or fire escapes, for that
matter) in advance; if non-ASCII characters were common enough (in
your local environment) that people expected and recognized them, then
they wouldn't be a problem.

That is why I have no objection to using Japanese on systems configured for it.

That is also why I want even systems configured for Japanese to be
able to still get warnings about Latin-1 (beyond ASCII).

I figure the difference between ì and i may be as subtle to them as
the difference between (two of their letters that happen to look
similar) is to me, so they might appreciate the heads-up to look
carefully.
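
A minimal sketch of that kind of warning, assuming the check runs as a
separate pass over the source text. The function name and the
allowed_prefixes parameter are hypothetical, and matching on Unicode
character-name prefixes is only a rough stand-in for a real script
check:

```python
import io
import tokenize
import unicodedata

def unexpected_chars(source, allowed_prefixes=()):
    """Yield (line, identifier, char_name) for each non-ASCII character
    in an identifier whose Unicode name does not start with one of
    allowed_prefixes (a hypothetical, per-install setting)."""
    readline = io.StringIO(source).readline
    for tok in tokenize.generate_tokens(readline):
        if tok.type != tokenize.NAME:
            continue
        for ch in tok.string:
            if ord(ch) < 128:
                continue  # plain ASCII is always accepted
            name = unicodedata.name(ch, "UNKNOWN")
            if not name.startswith(tuple(allowed_prefixes)):
                yield tok.start[0], tok.string, name

# A Japanese-configured install still hears about the Latin-1 letter.
src = "\u5909\u6570 = 1\n\u00ec = 2\n"
for hit in unexpected_chars(src, ("CJK", "HIRAGANA", "KATAKANA")):
    print(hit)
```

With an empty allowed_prefixes (the ASCII-only case), every non-ASCII
identifier character is reported.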

> But I see no reason why that auditor program can't be run as a PEP 263
> codec.  AFAICS, the following objections could be raised, and answered:

This can of course be turned around.

The "codec does a bit more than you expect" option has been available
since 2.3 for people who want an expanded ID charset.  (Just
transliterate the extra characters into the moral equivalent of an
escape.)  It doesn't seem to have been used.
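
For illustration only, the transliteration step might look like the
sketch below. It is not a codec, and the _uXXXX_ escape format and the
function name are made up here; a real PEP 263 codec would have to
apply something like this to every identifier while decoding the file:

```python
def escape_identifier(name):
    """Rewrite each non-ASCII character as an ASCII-only escape.

    The _uXXXX_ spelling is a hypothetical convention, chosen only
    because the result is still a legal ASCII identifier.
    """
    return "".join(
        ch if ord(ch) < 128 else "_u%04x_" % ord(ch)
        for ch in name
    )

print(escape_identifier("na\u00efve"))
```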

I'll freely agree that it hasn't been used in part because the
expanded charset is aimed largely at people not ready to use the
"write or at least install a codec that cheats" level of magic.  It is
also partly because the use of non-ASCII IDs is expected to stay small
in widely distributed code.

But the same facts argue against silently allowing unrecognized
characters; the use will be rare enough that people won't be expecting
it, and the level of magic required to write (or even know to install)
such a codec ... tends to come after someone has already found a
workaround for "strange characters".

> That doesn't mollify those who think I should not be allowed to use
> non-ASCII identifiers at all.

There is a subtle distinction there.  I am among those who think you
should not use non-ASCII identifiers *without an explicit
declaration.*

Putting that declaration at the top of the file itself would be fine.
(Modulo possible security issues, such as the "coding" with a Cyrillic
c.)
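
A rough sketch of that concern, using the declaration pattern given in
PEP 263. The helper name is hypothetical, and reporting every non-ASCII
character on the line is just one possible policy:

```python
import re
import unicodedata

# Pattern for the encoding declaration, as given in PEP 263.
PEP263 = re.compile(r"^[ \t\f]*#.*?coding[:=][ \t]*([-_.a-zA-Z0-9]+)")

def check_declaration(line):
    """Return (encoding or None, names of non-ASCII characters on the line)."""
    match = PEP263.match(line)
    odd = [unicodedata.name(ch, "UNKNOWN") for ch in line if ord(ch) > 127]
    return (match.group(1) if match else None), odd

# The second line looks identical in many fonts, but starts its
# keyword with U+0441 rather than an ASCII "c".
print(check_declaration("# -*- coding: utf-8 -*-"))
print(check_declaration("# -*- \u0441oding: utf-8 -*-"))
```

The look-alike line is not recognized as a declaration at all, so a
reader sees a declaration that the tokenizer silently ignores.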

-jJ