[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

Fri Jun 20 06:08:37 CEST 2014

Ezio Melotti added the comment:

> I'm not sure what the "Other_ID_Start property" mentioned in [1] and
> [2] means, though. Can we get someone with more in-depth knowledge of
> unicode to help with this? 

See http://www.unicode.org/reports/tr31/#Backward_Compatibility.
Basically they were considered valid ID_Start characters in previous versions of Unicode, but they are no longer valid.  I think it's safe to leave them out (perhaps they could/should be removed from the Python parser too), but if you want to add them the list includes only 4 characters (there are 12 more for Other_ID_Continue).

> The real question is how to do this *fast*, since HyperParser does a
> *lot* of these checks. Do you think caching would be a good approach?

I think it would be enough to check explicitly for ASCII chars, since most of them will be ASCII anyway.  If they are not ASCII you can use unicodedata.category (or .isidentifier() if it does the right thing).

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21765>
_______________________________________