[Python-ideas] Support Unicode code point notation
Steven D'Aprano
steve at pearwood.info
Sun Jul 28 19:29:45 CEST 2013
On 28/07/13 23:06, Nick Coghlan wrote:
> It would also be more consistent if unicodedata.lookup() was updated
> to handle numeric code point names. Something like:
>
>     >>> import unicodedata
>     >>> def enhanced_lookup(name):
>     ...     if name.startswith("U+"):
>     ...         return chr(int(name[2:], 16))
>     ...     return unicodedata.lookup(name)
>     ...
>     >>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
>     'α'
>     >>> enhanced_lookup("U+03B1")
>     'α'
Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters. I think it would be better to have a separate unicodedata function to return the code point, and leave the current behaviour of name() alone.
    def codepoint(c):
        return 'U+{:04X}'.format(ord(c))
This should always succeed for any character, named or not, since ord() is defined for every code point.
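To illustrate the difference, here is a rough sketch comparing such a function with the existing name(). The codepoint() helper is only the hypothetical function proposed above, not anything that exists in unicodedata today, and the '<no name>' default is just a placeholder I picked for the demonstration.

```python
import unicodedata

def codepoint(c):
    # Hypothetical helper from this proposal; not part of unicodedata.
    return 'U+{:04X}'.format(ord(c))

# A named letter, a control character, and a private-use code point.
# unicodedata.name() has no name for the last two, so we pass a default
# to avoid the ValueError it would otherwise raise.
for ch in ['α', '\n', '\ue000']:
    print(codepoint(ch), unicodedata.name(ch, '<no name>'))

# The U+ form also round-trips back to the character.
s = codepoint('α')
print(chr(int(s[2:], 16)) == 'α')
```

Under these assumptions codepoint() gives an answer for all three characters, while name() only has something to say about the first.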
--
Steven