Re: [Python-ideas] Support Unicode code point notation

29 Jul 2013


On 28/07/2013 18:29, Steven D'Aprano wrote:
...
On 28/07/13 23:06, Nick Coghlan wrote:
...
It would also be more consistent if unicodedata.lookup() was updated
to handle numeric code point names. Something like:
>>> import unicodedata
>>> def enhanced_lookup(name):
...     if name.startswith("U+"):
...         return chr(int(name[2:], 16))
...     return unicodedata.lookup(name)
...
>>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
'α'
>>> enhanced_lookup("U+03B1")
'α'
Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters.
What I said was:

"""I think the point of "\N{U+03C0}" is that it lets you name all of the
codepoints, even those that are as yet unnamed."""

Whether unicodedata.name() could have a fallback is something I've
never considered. Until now... :-)
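
For what it's worth, a fallback of that kind could be sketched roughly as
below. The helper name name_or_codepoint is hypothetical (it is not part of
unicodedata); the sketch only relies on unicodedata.name() accepting a
default value to return when a character has no name.

```python
import unicodedata

def name_or_codepoint(c):
    """Return the Unicode name of c, or its U+ notation if it has none."""
    # Hypothetical helper for illustration; not an existing unicodedata API.
    fallback = 'U+{:04X}'.format(ord(c))
    # unicodedata.name() returns the default instead of raising ValueError
    # when the character is unnamed.
    return unicodedata.name(c, fallback)

print(name_or_codepoint('\u03b1'))  # GREEK SMALL LETTER ALPHA
print(name_or_codepoint('\x00'))    # U+0000 (control characters have no name)
```

Wrapping name() this way leaves its current behaviour untouched, so existing
callers that expect a ValueError would not be affected.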
...
I think it would be better to have a separate unicodedata function to 
return the code point, and leave the current behaviour of name() alone.
def codepoint(c):
    return 'U+{:04X}'.format(ord(c))
This should always succeed for any character.
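
As a rough check of that claim, here is a sketch that pairs codepoint() with
an inverse and round-trips a few characters, including an unnamed control
character and the highest code point. The name parse_codepoint is
hypothetical, made up for this example.

```python
def codepoint(c):
    """Return the U+ notation for a single character."""
    return 'U+{:04X}'.format(ord(c))

def parse_codepoint(notation):
    """Hypothetical inverse of codepoint(): 'U+03B1' -> 'α'."""
    if not notation.startswith('U+'):
        raise ValueError('expected U+ notation, got {!r}'.format(notation))
    return chr(int(notation[2:], 16))

# Round-trip named, unnamed and non-BMP characters.
for c in ('\x00', 'a', '\u03b1', '\U0001F600', '\U0010FFFF'):
    assert parse_codepoint(codepoint(c)) == c

print(codepoint('\u03b1'))      # U+03B1
print(codepoint('\U0001F600'))  # U+1F600
```

Note that {:04X} only pads to a minimum of four hex digits, so code points
above U+FFFF come out with five or six digits, which matches the usual
notation.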