[Python-ideas] Unicode Name Aliases keyword argument abbreviation in unicodedata.name for missing names

Wed Jul 11 22:03:07 EDT 2018

unicodedata.name raises ValueError for a few Unicode characters such as '\0' or '\n'. Although the documentation is very clear about this behavior, it is often not what people want, i.e. a string describing the character.

In Python 3.3, name aliases became accepted by unicodedata.lookup and by \N{...} escapes, so unicodedata.lookup('NULL') works and '\N{NULL}' == '\N{NUL}'.

One could expect lookup(name(x)) == x to hold for every Unicode character, but this property fails for the few characters that have no name (mainly control characters).
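To make the broken round trip concrete, here is a small sketch. The helper roundtrips is a hypothetical name used only for illustration; it relies on nothing beyond unicodedata.name and unicodedata.lookup.

```python
import unicodedata

def roundtrips(ch):
    """Return True if lookup(name(ch)) gives back ch, False if ch has no name."""
    try:
        return unicodedata.lookup(unicodedata.name(ch)) == ch
    except ValueError:
        # unicodedata.name raises ValueError when the character has no name.
        return False

print(roundtrips('A'))     # a named character round-trips
print(roundtrips('\x00'))  # a control character has no name to look up

# The reverse direction already works through the alias:
print(unicodedata.lookup('NULL') == '\x00')
```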

Raising the exception is, however, still useful when the code point is unassigned or belongs to a version of Unicode newer than the one Python knows about.

In NameAliases.txt (https://www.unicode.org/Public/6.3.0/ucd/NameAliases.txt) one can see that some characters have multiple aliases, so there are multiple ways to map a character to a name.
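For readers who have not opened the file: each data line has three semicolon-separated fields (code point in hex, alias, alias type). The sketch below parses that layout. SAMPLE is a short hand-copied excerpt rather than the full file, and parse_name_aliases is a hypothetical helper, not something unicodedata exposes.

```python
from collections import defaultdict

# Hand-copied excerpt in the NameAliases.txt layout: code point;alias;type
SAMPLE = """\
# code point;alias;type
0000;NULL;control
0000;NUL;abbreviation
000A;LINE FEED;control
000A;LF;abbreviation
000A;NL;abbreviation
000A;EOL;abbreviation
"""

def parse_name_aliases(text):
    """Map each character to its (alias, type) pairs, in file order."""
    aliases = defaultdict(list)
    for line in text.splitlines():
        line = line.split('#', 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        code, alias, kind = line.split(';')
        aliases[chr(int(code, 16))].append((alias, kind))
    return dict(aliases)

parsed = parse_name_aliases(SAMPLE)
abbreviations = [alias for alias, kind in parsed['\n'] if kind == 'abbreviation']
print(abbreviations)
```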

I propose adding a keyword argument to unicodedata.name that would select one of several useful behaviors when the character has no name.

One simple behavior would be to choose the name from the "abbreviation" list. Currently all characters listed with an abbreviation, except three, have exactly one, so that would be a good pick. I'd imagine name('\x00', abbreviation=True) == 'NUL'.

The three characters in NameAliases.txt that have more than one abbreviation are:

'\n' with ['LF', 'NL', 'EOL']
'\t' with ['HT', 'TAB']
'\ufeff' with ['BOM', 'ZWNBSP']

In case multiple abbreviations exist, one could take the one first introduced to Unicode (for backward compatibility across Python versions). If there is a tie, one could take the first in the list. If the character has no name and no abbreviation, unicodedata.name raises an error or returns default as usual.
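A rough pure-Python sketch of the proposed behavior, under some loud assumptions: name_with_abbreviation and the abbreviation keyword are hypothetical (nothing like them exists in unicodedata today), the table only holds the four characters quoted in this message, and ties are broken by simply taking the first abbreviation listed instead of the one first introduced to Unicode.

```python
import unicodedata

# Abbreviations quoted in this message, in the order listed (not the full file).
_ABBREVIATIONS = {
    '\x00': ['NUL'],
    '\t': ['HT', 'TAB'],
    '\n': ['LF', 'NL', 'EOL'],
    '\ufeff': ['BOM', 'ZWNBSP'],
}

_MISSING = object()  # sentinel so that default=None stays a valid default

def name_with_abbreviation(ch, default=_MISSING, *, abbreviation=False):
    """Hypothetical wrapper: like unicodedata.name, with an abbreviation fallback."""
    try:
        return unicodedata.name(ch)
    except ValueError:
        if abbreviation and ch in _ABBREVIATIONS:
            # Simplification: first abbreviation in the list wins.
            return _ABBREVIATIONS[ch][0]
        if default is not _MISSING:
            return default
        raise  # no name, no abbreviation, no default: same error as today

print(name_with_abbreviation('\x00', abbreviation=True))
print(name_with_abbreviation('A', abbreviation=True))
```

With this fallback, lookup(name_with_abbreviation(x, abbreviation=True)) == x holds for the characters in the table, because lookup already accepts the aliases. A character that already has a name, such as '\ufeff', keeps returning that name.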

lookup(name(x)) == x for all x is natural, isn't it?