unicode name for \u000a
Ken Beesley
ken.beesley at xrce.xerox.com
Sun Aug 22 11:37:33 EDT 2004
> "Martin v. Löwis" <martin at v.loewis.de> writes:
>
>> No. <control> is not a character name. The unicodedata.name function
>> returns the official character name, so it MUST NOT return an alias
>> (which rules out your second alternative).
>
> Tor Iver Wilhemsen responds:
>
> Then why not return None or the empty string instead of raising an
> exception?
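The distinction Martin draws can be checked in a modern interpreter. This is a minimal sketch that assumes Python 3.3 or later (the thread itself predates Python 3). unicodedata.name() raises ValueError for U+000A, while unicodedata.lookup(), which maps names to characters, has accepted formal name aliases such as "LINE FEED" since Python 3.3. name() never returns such an alias.

```python
import unicodedata

# U+000A is listed in the Unicode Character Database only as <control>,
# so name() without a default raises ValueError.
try:
    unicodedata.name("\n")
    name_raised = False
except ValueError:
    name_raised = True
print("name() raised ValueError:", name_raised)

# lookup() goes the other way (name to character). Since Python 3.3 it also
# accepts formal name aliases such as "LINE FEED", but name() never returns them.
lf = unicodedata.lookup("LINE FEED")
print(repr(lf))  # prints '\n'
```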
Now that we understand that a number of Unicode characters
do not have official names (U+000A, for example, is listed in
the Unicode Character Database only as <control>), the intended
solution would seem to be the optional second argument of
unicodedata.name(unichr[, default]). As the documentation puts it,
"If no name is defined, default is returned."
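A quick check of the default argument, written for Python 3 (an assumption; the original post used Python 2). Passing None or "" as the default gives exactly the behavior Tor Iver asked for, without any exception:

```python
import unicodedata

# With a default supplied, name() returns it instead of raising ValueError.
lf_none = unicodedata.name("\n", None)       # None
lf_empty = unicodedata.name("\n", "")        # "" (empty string)
a_name = unicodedata.name("A", "No Name")    # "LATIN CAPITAL LETTER A"

print(lf_none, repr(lf_empty), a_name)
```

The default is only used when the character has no name; a named character such as "A" still returns its official name.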
The following script, for example, reads a UTF-8 file named on the
command line and prints, for each character, its position, its code
point value in hex, and its name (or "No Name" if it has none).
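import sys, codecs, unicodedata

# Read the whole file named on the command line, decoding it as UTF-8.
with codecs.open(sys.argv[1], "r", "utf-8") as fp:
    ustr = fp.read()

# For each character, print its position, its code point in hex, and its
# name, falling back to "No Name" when the character has none.
for pos, char in enumerate(ustr):
    print("%d %04x %s" % (pos, ord(char), unicodedata.name(char, "No Name")))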
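A Python 3 sketch of the same idea, factored into a hypothetical describe() helper (not part of the original post) so the logic can be reused or tested without a file. It assumes Python 3's built-in open() with an encoding argument in place of codecs.open; newline="" is passed so that "\r\n" line endings are not translated and positions match the characters actually in the file.

```python
import sys
import unicodedata


def describe(text, default="No Name"):
    """Return (position, code point, name) for each character in text.

    Characters without an official name get `default` instead of
    causing unicodedata.name() to raise ValueError.
    """
    return [(pos, ord(ch), unicodedata.name(ch, default))
            for pos, ch in enumerate(text)]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python describe.py input.txt
    # newline="" keeps "\r\n" as two characters, as codecs.open does.
    with open(sys.argv[1], encoding="utf-8", newline="") as fp:
        for pos, cp, name in describe(fp.read()):
            print("%d %04x %s" % (pos, cp, name))
```

For instance, describe("a\n") yields one entry for LATIN SMALL LETTER A and one for U+000A with the "No Name" default.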