unicode name for \u000a

Ken Beesley ken.beesley at xrce.xerox.com
Sun Aug 22 17:37:33 CEST 2004


>"Martin v. Löwis" <martin at v.loewis.de> writes:
>
>>No. <control> is not a character name. The unicodedata.name function
>>returns the official character name, so it MUST NOT return an alias
>>(which rules out your second alternative).
>
> Tor Iver Wilhemsen responds:
>
> Then why not return None or the empty string instead of raising an
> exception?

Now that we understand that a number of Unicode characters
do not have official names, the intended solution would seem
to be the use of the optional second argument to
unicodedata.name(unichr [, default]) 

"If no name is defined, default is returned."
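
To make the difference between the two call forms concrete, here is a
small sketch. It assumes a Python 3 interpreter (where every str is
already Unicode and unichr is spelled chr); the helper name safe_name is
made up for this illustration and is not part of unicodedata.

```python
import unicodedata

def safe_name(char, default="No Name"):
    """Hypothetical helper: official name of char, or default if it has none."""
    return unicodedata.name(char, default)

# Without a default, a character that has no official name (here the
# control character U+000A) makes unicodedata.name raise ValueError.
try:
    unicodedata.name("\n")
    raised = False
except ValueError:
    raised = True

print(raised)            # whether the one-argument call raised
print(safe_name("\n"))   # falls back to the default
print(safe_name("a"))    # a character that does have a name
```

Passing None or "" as the default should give the behaviour Tor asks
about, without changing what the one-argument form does.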

The following script illustrates this: it reads a UTF-8 file and
prints, for each character, its position, its code point value in hex,
and its name (if any) or "No Name".


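import sys, codecs, unicodedata

# Decode the whole input file (named on the command line) from UTF-8.
fp = codecs.open(sys.argv[1], "r", "utf-8")
ustr = fp.read()
fp.close()

# Print position, code point (hex) and official name; characters
# without a name (e.g. control characters) get "No Name".
for pos, char in enumerate(ustr):
    print "%d %04x %s" % (pos, ord(char), unicodedata.name(char, "No Name"))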
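
For anyone running a Python 3 interpreter, the script above might be
ported roughly as follows. This is only a sketch under that assumption;
the function names describe and describe_file are invented here, and the
built-in open with an encoding argument stands in for codecs.open.

```python
import unicodedata

def describe(text):
    """Return one 'position codepoint name' line per character of text."""
    return [
        "%d %04x %s" % (pos, ord(char), unicodedata.name(char, "No Name"))
        for pos, char in enumerate(text)
    ]

def describe_file(path):
    """Read a UTF-8 file and describe every character in it."""
    # newline="" turns off newline translation, so line endings are
    # reported exactly as they appear in the file.
    with open(path, "r", encoding="utf-8", newline="") as fp:
        return describe(fp.read())

# Small demonstration on an in-memory string.
for line in describe("A\n"):
    print(line)
```

To process a file named on the command line, as the original script
does, one would call describe_file(sys.argv[1]) and print each returned
line.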


