unicode name for \u000a

Ken Beesley ken.beesley at xrce.xerox.com
Sun Aug 22 17:37:33 CEST 2004

>"Martin v. Löwis" <martin at v.loewis.de> writes:
>>No. <control> is not a character name. The unicodedata.name function
>>returns the official character name, so it MUST NOT return an alias
>>(which rules out your second alternative).
> Tor Iver Wilhemsen responds:

> Then why not return None or the empty string instead of raising an
> exception?

Now that we understand that a number of Unicode characters
do not have official names, the intended solution would seem
to be the use of the optional second argument to
unicodedata.name(unichr[, default]), which the documentation
describes as follows:

"If no name is defined, default is returned."
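
For U+000A itself, the difference between the one-argument and the
two-argument call can be seen in a few lines. This is only a minimal
sketch, written for Python 3 as an assumption (in the Python 2 of this
thread the argument would be a unicode string such as u"\n" and print
would be a statement); name_or_default is a hypothetical helper, not
part of unicodedata.

```python
import unicodedata

def name_or_default(ch, default="No Name"):
    """Return the official Unicode name of ch, or default if it has none."""
    # name_or_default is a hypothetical wrapper; the real work is done by
    # the optional second argument of unicodedata.name.
    return unicodedata.name(ch, default)

# U+000A (line feed) is a control character with no official name, so
# the one-argument form raises ValueError.
try:
    unicodedata.name("\n")
    raised = False
except ValueError:
    raised = True

print(raised)                 # True
print(name_or_default("\n"))  # No Name
print(name_or_default("A"))   # LATIN CAPITAL LETTER A
```

Passing None or "" as the default gives exactly the behaviour asked
for in the quoted question, without changing what unicodedata.name
does when no default is supplied. The same default argument handles
every unnamed character in a larger input.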

The following script shows this: it reads a UTF-8 file and prints,
for each character, its position, its code point value in hex, and
its name (or "No Name" if it has none).

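import sys, codecs, unicodedata

# Read the whole file named on the command line, decoding it as UTF-8.
fp = codecs.open(sys.argv[1], "r", "utf-8")
ustr = fp.read()
fp.close()

# Print position, code point (hex) and name; characters without an
# official name (e.g. control characters) get "No Name".
for pos, char in enumerate(ustr):
    print "%d %04x %s" % (pos, ord(char), unicodedata.name(char, "No Name"))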
