[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

Tue Jun 5 13:06:53 CEST 2007

On 6/5/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> I'd love to get rid of full-width ASCII and halfwidth kana (via
> compatibility decomposition).

If you do forbid compatibility characters in identifiers, then they
should be flagged as an error, not converted silently. NFC, on the
other hand, should be applied silently. The reason is that, in
Unicode, two strings are canonically equivalent exactly when their NFC
forms are identical code point for code point. Adding extra
equivalences on top of that (whether it's "FoO" == "foo",
"ｶｷ" == "カキ" or "Ａ１２３" == "A123") is surprising.
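
To make the distinction concrete, here is a small sketch using the
standard unicodedata module. The helper names nfc and nfkc and the
sample strings are only illustrative; the point is that NFC merges
canonically equivalent spellings while leaving compatibility
characters alone, whereas NFKC folds them into their ASCII or
full-size counterparts.

```python
import unicodedata

def nfc(s):
    # Canonical composition: only merges canonically equivalent spellings.
    return unicodedata.normalize("NFC", s)

def nfkc(s):
    # Compatibility composition: additionally folds compatibility characters.
    return unicodedata.normalize("NFKC", s)

# Two canonically equivalent spellings of e-acute.
composed = "\u00e9"        # precomposed character
decomposed = "e\u0301"     # "e" followed by COMBINING ACUTE ACCENT
canonical_equal = nfc(composed) == nfc(decomposed)

# Compatibility characters from the examples above.
fullwidth = "\uff21\uff11\uff12\uff13"   # full-width "A123"
halfwidth_kana = "\uff76\uff77"          # half-width "kaki"
fullsize_kana = "\u30ab\u30ad"           # ordinary katakana "kaki"

nfc_keeps_fullwidth = nfc(fullwidth) == fullwidth
nfkc_folds_fullwidth = nfkc(fullwidth) == "A123"
nfkc_folds_kana = nfkc(halfwidth_kana) == fullsize_kana
```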

In short, I would like this function either to return 'OK' or to be
rejected as a syntax error; it should not fail at run time or return
anything else:

def test():
    if 'A' == 'Ａ': return 'OK'
    A = 'O'
    Ａ = 'K' # as tested above, 'A' and 'Ａ' are not the same thing
    return locals()['A']+locals()['Ａ']

Note that 'A' == 'Ａ' should be false (no automatic NFKC for strings,
please).
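
The policy I am arguing for (silent NFC, hard error on compatibility
characters) could be sketched roughly as follows. check_identifier is
a hypothetical helper, not anything the parser provides, and comparing
the NFKC form against the NFC form is only an approximation of "contains
a compatibility character".

```python
import unicodedata

def check_identifier(name):
    """Hypothetical check: normalize to NFC, reject compatibility characters."""
    # Canonically equivalent spellings are merged silently.
    canonical = unicodedata.normalize("NFC", name)
    # If NFKC would change the name further, it contains compatibility
    # characters: flag that as an error instead of converting it.
    if unicodedata.normalize("NFKC", canonical) != canonical:
        raise SyntaxError("compatibility character in identifier: %r" % name)
    return canonical
```

With this sketch, check_identifier("e\u0301") quietly yields the
precomposed form, check_identifier("\uff21") raises SyntaxError, and
the string comparison "A" == "\uff21" is untouched and stays false.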