On 28/07/2013 18:29, Steven D'Aprano wrote:
On 28/07/13 23:06, Nick Coghlan wrote:
It would also be more consistent if unicodedata.lookup() was updated to handle numeric code point names. Something like:
>>> import unicodedata
>>> def enhanced_lookup(name):
...     if name.startswith("U+"):
...         return chr(int(name[2:], 16))
...     return unicodedata.lookup(name)
...
>>> enhanced_lookup("GREEK SMALL LETTER ALPHA")
'α'
>>> enhanced_lookup("U+03B1")
'α'
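One caveat with the session above: `int(x, 16)` is more permissive than the U+ notation, since it also accepts strings such as "0x3B1", " 3B1", "-1" or "3_B1". The sketch below is one possible tightening, assuming that a U+ name should consist of 4 to 6 plain hex digits in the range up to U+10FFFF, and that bad names should raise `KeyError` the way `unicodedata.lookup()` does. `enhanced_lookup` is still a hypothetical helper, not part of `unicodedata`.

```python
import string
import unicodedata


def enhanced_lookup(name):
    """Like unicodedata.lookup(), but also accepts "U+XXXX" code point names.

    Hypothetical helper for illustration; it is not part of unicodedata.
    """
    if name.startswith("U+"):
        digits = name[2:]
        # int(x, 16) alone would also accept "0x3B1", " 3B1", "-1" or "3_B1",
        # so insist on 4 to 6 plain hex digits, as in the usual U+ notation.
        if 4 <= len(digits) <= 6 and all(d in string.hexdigits for d in digits):
            value = int(digits, 16)
            # Reject values beyond the last Unicode code point.
            if value <= 0x10FFFF:
                return chr(value)
        # Mirror unicodedata.lookup(), which raises KeyError for unknown names.
        raise KeyError("undefined character name '%s'" % name)
    return unicodedata.lookup(name)
```

With this version, `enhanced_lookup("U+1F600")` returns the supplementary-plane character, while `enhanced_lookup("U+0x3B1")` and `enhanced_lookup("U+110000")` both raise `KeyError`.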
Earlier, MRAB suggested that unicodedata.name() could return the U+ code point in the case of unnamed characters.
What I said was: """I think the point of "\N{U+03C0}" is that it lets you name all of the codepoints, even those that are as yet unnamed.""" Whether unicodedata.name() could have a fallback is something I've never considered. Until now... :-)
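For comparison, a fallback of that kind can be layered on top of the existing API without changing `name()`, because `unicodedata.name()` already takes an optional default that it returns instead of raising `ValueError` for unnamed characters. The function name `name_or_codepoint` below is hypothetical. A trade-off of this approach is that callers can no longer tell from the result alone whether the character really has a Unicode name.

```python
import unicodedata


def name_or_codepoint(c):
    """Return the Unicode name of c, or its U+ code point if it has none.

    Hypothetical helper; unicodedata.name() itself is left unchanged.
    """
    # unicodedata.name() accepts an optional default, returned instead of
    # raising ValueError when the character has no name.
    name = unicodedata.name(c, None)
    if name is None:
        return 'U+{:04X}'.format(ord(c))
    return name
```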
I think it would be better to have a separate unicodedata function to return the code point, and leave the current behaviour of name() alone.
def codepoint(c):
    return 'U+{:04X}'.format(ord(c))
This should always succeed for any character.
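The claim holds because `ord()` accepts any single character, named or not, and the `:04X` format spec sets a minimum width of four hex digits without truncating longer values, so supplementary-plane characters come out with five or six digits. The sketch below repeats `codepoint` with comments and adds a hypothetical inverse, `from_codepoint`, purely to show that the string round-trips through `chr()`.

```python
def codepoint(c):
    # ord() accepts exactly one character (it raises TypeError otherwise),
    # so this works for every code point, whether or not it has a name.
    # ':04X' pads to at least four uppercase hex digits but never truncates.
    return 'U+{:04X}'.format(ord(c))


def from_codepoint(s):
    # Hypothetical inverse for illustration: strip "U+" and parse the hex.
    return chr(int(s[2:], 16))
```

For example, `codepoint('a')` gives `'U+0061'`, while `codepoint('\U0001F600')` gives `'U+1F600'`.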