[Python-ideas] Support Unicode code point notation

MRAB python at mrabarnett.plus.com
Sun Jul 28 20:07:21 CEST 2013


On 28/07/2013 18:29, Steven D'Aprano wrote:
> On 28/07/13 23:06, Nick Coghlan wrote:
>
>> It would also be more consistent if unicodedata.lookup() was updated
>> to handle numeric code point names. Something like:
>>
>> >>> import unicodedata
>> >>> def enhanced_lookup(name):
>> ...     if name.startswith("U+"):
>> ...         return chr(int(name[2:], 16))
>> ...     return unicodedata.lookup(name)
>> ...
>> >>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
>> 'α'
>> >>> enhanced_lookup("U+03B1")
>> 'α'
>
>
> Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters.

What I said was:

"""I think the point of "\N{U+03C0}" is that it lets you name all of the
codepoints, even those that are as yet unnamed."""

Whether unicodedata.name() could have a fallback is something I've
never considered. Until now... :-)
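
For illustration only, a fallback along those lines might look something
like the rough sketch below. The name name_or_codepoint is made up for
this example and is not part of unicodedata; it relies only on the
documented behaviour that unicodedata.name() raises ValueError when a
character has no name.

```python
import unicodedata

def name_or_codepoint(c):
    # Hypothetical helper, not an existing unicodedata function.
    try:
        # Normal case: the character has an assigned name.
        return unicodedata.name(c)
    except ValueError:
        # Unnamed character (e.g. a control character): fall back
        # to the U+XXXX notation instead of raising.
        return 'U+{:04X}'.format(ord(c))

print(name_or_codepoint('α'))
print(name_or_codepoint('\n'))
```

The first call should give the usual name and the second the U+ form,
since control characters such as newline have no name.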

> I think it would be better to have a separate unicodedata function to
> return the code point, and leave the current behaviour of name() alone.
>
> def codepoint(c):
>       return 'U+{:04X}'.format(ord(c))
>
> This should always succeed for any character.
>
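
As a quick sanity check, the two functions quoted in this thread can be
put side by side to show that they round-trip. This is only a sketch:
codepoint() and enhanced_lookup() are the proposals quoted above, written
here as plain module-level functions rather than anything that exists in
unicodedata, and the sample characters are arbitrary.

```python
import unicodedata

def codepoint(c):
    # Format the character's code point in U+XXXX notation.
    return 'U+{:04X}'.format(ord(c))

def enhanced_lookup(name):
    # Accept either U+XXXX notation or a regular character name.
    if name.startswith("U+"):
        return chr(int(name[2:], 16))
    return unicodedata.lookup(name)

# Named, unnamed (control) and astral characters all round-trip.
for c in ['α', '\n', '\U0001F600']:
    assert enhanced_lookup(codepoint(c)) == c
```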


