[issue21765] Idle: make 3.x HyperParser work with non-ascii identifiers.

Ezio Melotti report at bugs.python.org
Fri Jun 20 06:08:37 CEST 2014


Ezio Melotti added the comment:

> I'm not sure what the "Other_ID_Start property" mentioned in [1] and
> [2] means, though. Can we get someone with more in-depth knowledge of
> unicode to help with this? 

See http://www.unicode.org/reports/tr31/#Backward_Compatibility.
Basically, these characters were valid ID_Start characters in earlier versions of Unicode but no longer qualify under the current general-category rules, so Other_ID_Start keeps them in for backward compatibility.  I think it's safe to leave them out (perhaps they could/should be removed from the Python parser too), but if you want to add them, the list includes only 4 characters (there are 12 more for Other_ID_Continue).
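
As a rough illustration of why a check based on unicodedata.category alone would miss these characters, the sketch below looks at two of the Other_ID_Start characters, U+2118 and U+212E. The names CANDIDATES and describe are made up for this example, and the result assumes a CPython 3 build whose str.isidentifier() follows XID_Start:

```python
import unicodedata

# Two of the Other_ID_Start characters (SCRIPT CAPITAL P, ESTIMATED SYMBOL).
# CANDIDATES and describe() are hypothetical names used only for this sketch.
CANDIDATES = ["\u2118", "\u212e"]


def describe(ch):
    """Return (code point label, general category, isidentifier() result)."""
    return ("U+%04X" % ord(ch), unicodedata.category(ch), ch.isidentifier())


rows = [describe(ch) for ch in CANDIDATES]
for label, category, ok in rows:
    # The category is a symbol category, not a letter category, yet the
    # character can still be accepted as the start of an identifier.
    print(label, category, ok)
```

Neither character falls in a letter category, so a category-based test would reject them unless they are listed explicitly.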

> The real question is how to do this *fast*, since HyperParser does a
> *lot* of these checks. Do you think caching would be a good approach?

I think it would be enough to check explicitly for ASCII chars first, since most of them will be ASCII anyway.  For non-ASCII chars you can fall back to unicodedata.category() (or str.isidentifier(), if it does the right thing).
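
A minimal sketch of that idea, assuming str.isidentifier() is acceptable for the non-ASCII case. The helper names is_id_start and is_id_char and the two lookup sets are hypothetical, not HyperParser's actual code:

```python
import string

# Hypothetical lookup sets for the ASCII fast path.
_ASCII_ID_START = frozenset(string.ascii_letters + "_")
_ASCII_ID_CONTINUE = _ASCII_ID_START | frozenset(string.digits)


def is_id_start(ch):
    """Return True if the single character ch can start an identifier."""
    if ord(ch) < 128:
        # Fast path: most source characters are ASCII.
        return ch in _ASCII_ID_START
    # Slow path: defer to the interpreter's own identifier rules.
    return ch.isidentifier()


def is_id_char(ch):
    """Return True if the single character ch can appear inside an identifier."""
    if ord(ch) < 128:
        return ch in _ASCII_ID_CONTINUE
    # A continuation character (e.g. a combining mark) is not a valid
    # identifier by itself, so test it after a known-good start character.
    return ("a" + ch).isidentifier()
```

Prefixing a known start character in is_id_char is one way to test continuation characters without consulting unicodedata.category directly; whether this is fast enough for HyperParser would still need to be measured.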

----------

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue21765>
_______________________________________

