[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?
Jim Jewett
jimjjewett at gmail.com
Tue Jun 5 19:14:59 CEST 2007
On 6/5/07, "Martin v. Löwis" <martin at v.loewis.de> wrote:
> > I'd love to get rid of full-width ASCII and halfwidth kana (via
> > compatibility decomposition). Native Japanese speakers often use them
> > interchangeably with the "proper" versions when correcting typos and
> > updating numbers in a series. Ugly, to say the least. I don't think
> > that native Japanese would care, as long as the decomposition is done
> > internally to Python.
> Not sure what the proposal is here. If people say "we want the PEP to do
> NFKC", I understand that as "instead of saying NFC, it should say
> NFKC", which in turn means "all identifiers are converted into the
> normal form NFKC while parsing".
I would prefer that.
> With that change, the full-width ASCII characters would still be
> allowed in source - they just wouldn't be different from the regular
> ones anymore when comparing identifiers.
I *think* that would be OK; so long as they mean the same thing, it is
just a quirk, like using a different font. I am slightly concerned
that it would mean "string as string" and "string as identifier" have
different tests for equality.
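To make the concern concrete, here is a small sketch (using the stdlib
unicodedata module, which exposes the same normalization forms the PEP
discusses) of how full-width characters compare as raw strings versus
after NFKC folding, which is what identifier comparison would see:

```python
import unicodedata

# Full-width Latin capital A (U+FF21) vs plain ASCII "A" (U+0041).
fullwidth_name = "\uFF21bc"
ascii_name = "Abc"

# As plain strings they are unequal...
assert fullwidth_name != ascii_name

# ...but NFKC (compatibility) normalization folds them together,
# so as identifiers they would collide.
assert unicodedata.normalize("NFKC", fullwidth_name) == ascii_name

# NFC alone does not fold compatibility characters, so under the
# PEP's current NFC rule the two names stay distinct.
assert unicodedata.normalize("NFC", fullwidth_name) != ascii_name
```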
> Another option would be to require that the source is in NFKC already,
> where I then ask again what precisely that means in presence of
> non-UTF source encodings.
My own opinion is that it would be reasonable to put those in NFKC
form as part of the parser's internal translation to unicode. (But I
agree that it makes sense to do that for all encodings, if it is done
for any.)
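The translation step described above might be sketched roughly as
follows (decode_identifier_source is a hypothetical name for
illustration, not anything in the actual parser):

```python
import unicodedata

def decode_identifier_source(raw: bytes, encoding: str) -> str:
    # Hypothetical sketch: decode the source bytes using the declared
    # source encoding, then apply NFKC, so identifiers come out the
    # same regardless of which encoding the file used.
    return unicodedata.normalize("NFKC", raw.decode(encoding))

# Halfwidth katakana KA (U+FF76) folds to the fullwidth form (U+30AB):
assert decode_identifier_source("\uFF76".encode("utf-8"), "utf-8") == "\u30AB"
```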
-jJ