[Python-bugs-list] [Bug #117524] The unicodedata db does not know about the special ranges.

Thu, 2 Nov 2000 14:57:32 -0800

Bug #117524, was updated on 2000-Oct-23 13:06
Here is a current snapshot of the bug.

Project: Python
Category: Core
Status: Open
Resolution: None
Bug Group: None
Priority: 6
Summary: The unicodedata db does not know about the special ranges.

Details: There seems to be a bug in the unicodedata module. The special ranges described under "Field Formats" are not handled correctly.

   ftp://ftp.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.html

The two characters below both fall inside one of those ranges, so they should have identical properties:

Python 2.0 (#8, Oct 16 2000, 17:27:58) [MSC 32 bit (Intel)] on win32
Type "copyright", "credits" or "license" for more information.
>>> import unicodedata
>>> unicodedata.category(u"\u3400")
'Lo'
>>> unicodedata.category(u"\u3401")
'Cn'
>>> u"\u3400".isalpha()
1
>>> u"\u3401".isalpha()
0
>>>
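
The session above checks only two code points. A minimal sketch of a check over the whole range follows, assuming the range is CJK Ideograph Extension A as listed for Unicode 3.0 (U+3400 through U+4DB5). It is written in current Python 3 syntax (chr and set comprehensions), not the Python 2.0 syntax of the session, and the helper names are made up for illustration.

```python
import unicodedata

# Assumed range: CJK Ideograph Extension A as given for Unicode 3.0
# (U+3400 through U+4DB5). Adjust for other database versions.
FIRST, LAST = 0x3400, 0x4DB5


def categories_in_range(first, last):
    """Return the set of general categories seen across first..last inclusive."""
    return {unicodedata.category(chr(cp)) for cp in range(first, last + 1)}


def range_is_uniform(first, last):
    """True when every code point in the range reports the same category."""
    return len(categories_in_range(first, last)) == 1


if __name__ == "__main__":
    print(sorted(categories_in_range(FIRST, LAST)))
    print(range_is_uniform(FIRST, LAST))
```

With the behaviour reported in this bug, categories_in_range would contain both 'Lo' and 'Cn'; once the ranges are handled, only one category should remain.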

Follow-Ups:

Date: 2000-Oct-26 09:02
By: lemburg

Comment:
Looks like we'll have to add a few special if()s to the various APIs.

I hadn't realized that the Unicode database defines properties for
code points which do not appear in the listed UnicodeData.txt file...
one little sentence in the webpage you mention makes the difference - sigh.
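
To make the point about unlisted code points concrete: UnicodeData.txt gives only the first and last entry of each special range, with names ending in ", First>" and ", Last>". The sketch below fills in such a range while reading the file, which is one possible alternative to special-casing the lookup APIs. The sample records are assumed from the Unicode 3.0 file and should be checked against it. The function name expand_ranges is hypothetical, and only the general category field is kept.

```python
# Illustrative records in the UnicodeData.txt layout (semicolon-separated).
# The two range records are assumed from the Unicode 3.0 data file.
SAMPLE = """\
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
3400;<CJK Ideograph Extension A, First>;Lo;0;L;;;;;N;;;;;
4DB5;<CJK Ideograph Extension A, Last>;Lo;0;L;;;;;N;;;;;
"""


def expand_ranges(text):
    """Map code point -> general category, filling in First/Last ranges."""
    table = {}
    range_start = None
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = line.split(";")
        code, name, category = int(fields[0], 16), fields[1], fields[2]
        if name.endswith(", First>"):
            # Remember where the range begins; the Last record closes it.
            range_start = code
        elif name.endswith(", Last>"):
            if range_start is None:
                raise ValueError("Last record without First: " + line)
            for cp in range(range_start, code + 1):
                table[cp] = category
            range_start = None
        else:
            table[code] = category
    return table


if __name__ == "__main__":
    table = expand_ranges(SAMPLE)
    print(table[0x3400], table[0x3401], table[0x4DB5])
```

This sketch copies the category of the Last record to every code point in the range and ignores all other fields. A real generator would need to carry over the full record.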

-------------------------------------------------------

Date: 2000-Nov-02 14:57
By: none

Comment:
Forget about the if statements -- the compression
method used for the unicode tables doesn't really
care if you have lots of undefined character entries,
or lots of character entries all having the same data.

Or to put it another way, I just added 38642 new chars
to the database, and it's still exactly the same size
as in 2.0.
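
A toy sketch of why that can happen, assuming a two-level table that stores each distinct block of records only once. This is an illustration only, not necessarily the actual scheme used for the tables. The block size, the record values 0 and 1, and all function names are made up.

```python
BLOCK_SIZE = 256       # assumed block size, chosen only for illustration
CODE_SPACE = 0x10000   # 16-bit code space


def compress(records):
    """Split a flat record list into (index, blocks) with shared blocks."""
    blocks = []   # unique blocks, each a tuple of BLOCK_SIZE records
    seen = {}     # block contents -> position in `blocks`
    index = []    # one entry per block of the code space
    for start in range(0, len(records), BLOCK_SIZE):
        block = tuple(records[start:start + BLOCK_SIZE])
        if block not in seen:
            seen[block] = len(blocks)
            blocks.append(block)
        index.append(seen[block])
    return index, blocks


def lookup(index, blocks, cp):
    """Fetch the record for code point cp through the two levels."""
    return blocks[index[cp // BLOCK_SIZE]][cp % BLOCK_SIZE]


def table_size(index, blocks):
    """Number of stored entries: the index plus every unique block."""
    return len(index) + len(blocks) * BLOCK_SIZE


def build(fill_range):
    """Record 0 means undefined, 1 means 'same data as the range'."""
    records = [0] * CODE_SPACE
    if fill_range:
        for cp in range(0x3400, 0x4DB5 + 1):
            records[cp] = 1
    else:
        # Only the two endpoints are defined, as in the raw data file.
        records[0x3400] = 1
        records[0x4DB5] = 1
    return records


sparse = compress(build(fill_range=False))
filled = compress(build(fill_range=True))

if __name__ == "__main__":
    print(table_size(*sparse), table_size(*filled))
```

In this toy setup both tables end up with three unique blocks, so filling in the range changes what lookup returns for U+3401 without changing the stored size. Whether sizes match exactly in general depends on block size and on where the ranges begin and end.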

Patch coming later.

</F>
-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=117524&group_id=5470