[Python-ideas] Support Unicode code point notation

Steven D'Aprano steve at pearwood.info
Sun Jul 28 19:29:45 CEST 2013


On 28/07/13 23:06, Nick Coghlan wrote:

> It would also be more consistent if unicodedata.lookup() was updated
> to handle numeric code point names. Something like:
>
> >>> import unicodedata
> >>> def enhanced_lookup(name):
> ...     if name.startswith("U+"):
> ...         return chr(int(name[2:], 16))
> ...     return unicodedata.lookup(name)
> ...
> >>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
> 'α'
> >>> enhanced_lookup("U+03B1")
> 'α'


Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters. I think it would be better to have a separate unicodedata function to return the code point, and leave the current behaviour of name() alone.

def codepoint(c):
    return 'U+{:04X}'.format(ord(c))

This should always succeed for any single character, including the unnamed ones where name() raises ValueError.
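
To illustrate how the pieces would fit together, here is a minimal sketch. Note that codepoint() and enhanced_lookup() are only the proposals from this thread, not existing unicodedata functions, and describe() is a hypothetical helper I've added purely to show the fallback MRAB wanted without changing name() itself:

```python
import unicodedata

def codepoint(c):
    # Proposed helper from this thread; not part of unicodedata.
    return 'U+{:04X}'.format(ord(c))

def enhanced_lookup(name):
    # Nick's sketch from the quoted message; also not a real API.
    if name.startswith("U+"):
        return chr(int(name[2:], 16))
    return unicodedata.lookup(name)

def describe(c):
    # Hypothetical convenience wrapper: use the character name if one
    # exists, otherwise fall back to U+ notation. name() accepts a
    # default that is returned instead of raising ValueError.
    return unicodedata.name(c, '') or codepoint(c)

if __name__ == '__main__':
    for ch in ('α', '\x00', '\U0001F600'):
        # The U+ form round-trips through enhanced_lookup().
        assert enhanced_lookup(codepoint(ch)) == ch
        print(codepoint(ch), describe(ch))
```

With this split, name() keeps its current behaviour, and callers who want the fallback can ask for it explicitly.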



-- 
Steven

