[Python-3000] Unicode IDs -- why NFC? Why allow ligatures?

"Martin v. Löwis" martin at v.loewis.de
Sat Jun 9 09:55:42 CEST 2007


> On another note, I have no idea how Martin's name (in the Cc line) ended
> up as:
> 
> """
> L$(D+S(Bwis"
> """
> 
> If I knew, it *might* have a bearing on what sorts of
> canonicalizations should be performed, and what sorts of warnings the
> parser ought to emit for likely corrupted text.

That results from a faulty iso-2022-jp-1 conversion. ESC $ ( D switches
to JIS X 0212-1990 (which apparently includes ö at code position 0x2B53,
i.e. the two bytes "+S"); ESC ( B switches back to ASCII.
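
The effect can probably be reproduced with a short Python sketch. It
assumes Python's iso2022_jp_1 codec, and it assumes the fault was simply
that the ESC bytes were lost while the remaining bytes were shown as
ASCII; the variable names are made up for illustration.

```python
ESC = "\x1b"

# 'ö' is in neither ASCII nor JIS X 0208, so the codec has to switch
# to JIS X 0212 for that one character and back to ASCII afterwards.
encoded = "Löwis".encode("iso2022_jp_1")

# Assumed fault: the 7-bit byte stream is displayed as ASCII and the
# ESC control characters are dropped along the way.
garbled = encoded.decode("ascii").replace(ESC, "")

# Decoding the intact bytes with the right codec restores the name.
restored = encoded.decode("iso2022_jp_1")

print(repr(encoded))
print(garbled)
print(restored)
```

If the assumption holds, the second line printed matches the garbled
Cc entry quoted above, and the third is the original name.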

I don't think this has anything to do with normalization.

Regards,
Martin

