[Patches] [ python-Patches-626485 ] Support Unicode normalization

Sat, 23 Nov 2002 14:08:53 -0800

Patches item #626485, was opened at 2002-10-21 21:02
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=626485&group_id=5470

Category: Core (C code)
Group: None
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Martin v. Löwis (loewis)
Summary: Support Unicode normalization

Initial Comment:
This patch adds support for the normalization forms
NFC, NFKC, NFD, NFKD. It passes the
NormalizationTest-3.2.0.txt tests.
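A quick way to see what the four forms do is to apply each one to a precomposed letter followed by a compatibility ligature. This is a minimal sketch that uses `unicodedata.normalize(form, unistr)` as it exists in current Python 3 releases. It assumes that function corresponds to the API this patch introduced. The helper `show_forms` and the sample characters are illustrative choices, not part of the patch.

```python
import unicodedata

# The four normalization forms named in the patch description.
FORMS = ("NFC", "NFD", "NFKC", "NFKD")


def show_forms(text):
    """Map each form name to the code points of the normalized text."""
    return {
        form: [f"U+{ord(ch):04X}" for ch in unicodedata.normalize(form, text)]
        for form in FORMS
    }


# U+00E9 is "e with acute" as one precomposed code point;
# U+FB01 is the "fi" ligature, which has only a compatibility decomposition.
sample = "\u00e9\ufb01"
for form, code_points in show_forms(sample).items():
    print(form, code_points)
```

NFC and NFD affect only the accented letter: NFD splits it into `e` plus a combining acute accent. NFKC and NFKD also replace the ligature with a plain `f` and `i`.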

----------------------------------------------------------------------

>Comment By: Martin v. Löwis (loewis)
Date: 2002-11-23 23:08

Message:
Logged In: YES 
user_id=21627

Thanks! Committed as

libunicodedata.tex 1.4
test_normalization.py 1.1
NEWS 1.541
unicodedata.c 2.24
unicodedata_db.h 1.7
makeunicodedata.py 1.15

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-11-23 22:50

Message:
Logged In: YES 
user_id=38388

Looks good (I don't have time to review the patch
in detail, though). Please check it in.

Thanks.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-11-23 16:19

Message:
Logged In: YES 
user_id=21627

This version changes the indentation to 4 spaces. Are any
further changes needed?

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2002-10-25 17:03

Message:
Logged In: YES 
user_id=21627

This patch addresses your issues as follows:

- single API: done.

- add _getrecord_ex: done. Rename to getunicoderecord: since
  this is a static function in unicodedata.c, renaming it
  would not add much information, so not done.

- #ifdef Py_UNICODE_WIDE: I could not spot any place where
  this is necessary.

- Drop -Latest: done.

- adjust skip message: done.

- reformat to 4 spaces: not done; I think PEP 7 should be
  followed.
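With a single API, the form is an ordinary string argument. A caller can therefore choose it at run time, and an unrecognized name is reported as an error instead of being a missing function. This sketch again assumes the `unicodedata.normalize(form, unistr)` signature of current Python 3 releases, which raises `ValueError` for names other than NFC, NFD, NFKC and NFKD. The wrapper `normalize_as` is hypothetical.

```python
import unicodedata


def normalize_as(form, text):
    """Normalize text with a form name chosen at run time (e.g. from a config file).

    Hypothetical helper: returns None instead of raising when the
    form name is not one that unicodedata.normalize accepts.
    """
    try:
        return unicodedata.normalize(form, text)
    except ValueError:
        # Raised by unicodedata.normalize for an invalid form name.
        return None


# U+00BD VULGAR FRACTION ONE HALF has a compatibility decomposition.
print(repr(normalize_as("NFKD", "\u00bd")))
print(repr(normalize_as("NFX", "\u00bd")))
```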

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-10-23 12:36

Message:
Logged In: YES 
user_id=38388

One more minor nit: the indentation in the C file is 4
chars; please reindent your code accordingly.

----------------------------------------------------------------------

Comment By: M.-A. Lemburg (lemburg)
Date: 2002-10-23 12:27

Message:
Logged In: YES 
user_id=38388

The patch looks OK except for a few nits:

* I'd prefer a single API, normalize(form), which takes
  the form as a string argument, instead of separate functions
  NFKD, etc.

* __getrecord should be renamed to _getrecord_ex;
  perhaps both should use a different name altogether,
  e.g. getunicoderecord 

* I think you have to add some #ifdef Py_UNICODE_WIDE
  in the code to avoid compiler warnings on narrow builds
  about non-constant if expressions that are always true
  due to the limited range of the data type.

* The filenames you are using should not include the '-Latest'
  suffix. If you download the files from unicode.org via FTP
  they don't have this extension.

* The skip test message should include a reference to where
  to get the test file, i.e.
  ftp://ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt
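The invariants that the Unicode test file encodes can be checked with a small harness. This is a sketch, not the committed test_normalization.py. The local filename (without a '-Latest' suffix), the helper names `parse_line` and `check_columns`, and the use of unittest's `skipTest` are assumptions. The column invariants follow the header of the data file: c2 == NFC(c1), c3 == NFD(c1), c4 == NFKC(c1), c5 == NFKD(c1), together with the related equalities for the other columns. If the file is absent, the test is skipped with a message pointing to the download location quoted above.

```python
import os
import unicodedata
import unittest
from functools import partial

# Assumed local copy of the data file; download location as quoted above.
TESTDATA = "NormalizationTest.txt"
TESTDATA_URL = "ftp://ftp.unicode.org/Public/UNIDATA/NormalizationTest.txt"

nfc = partial(unicodedata.normalize, "NFC")
nfd = partial(unicodedata.normalize, "NFD")
nfkc = partial(unicodedata.normalize, "NFKC")
nfkd = partial(unicodedata.normalize, "NFKD")


def parse_line(line):
    """Return the five columns c1..c5 of a data line as strings, or None."""
    data = line.split("#", 1)[0].strip()
    if not data or data.startswith("@"):
        return None  # blank line, comment, or "@Part" header
    fields = data.split(";")[:5]
    # Each column is a space-separated list of hex code points.
    return ["".join(chr(int(cp, 16)) for cp in field.split()) for field in fields]


def check_columns(c1, c2, c3, c4, c5):
    """Check the invariants stated in the NormalizationTest.txt header."""
    columns = (c1, c2, c3, c4, c5)
    return (
        c2 == nfc(c1) == nfc(c2) == nfc(c3)
        and c4 == nfc(c4) == nfc(c5)
        and c3 == nfd(c1) == nfd(c2) == nfd(c3)
        and c5 == nfd(c4) == nfd(c5)
        and all(nfkc(c) == c4 for c in columns)
        and all(nfkd(c) == c5 for c in columns)
    )


class NormalizationConformanceTest(unittest.TestCase):
    def test_data_file(self):
        if not os.path.exists(TESTDATA):
            # Tell the user where to fetch the file instead of failing.
            self.skipTest(f"{TESTDATA} not found; download it from {TESTDATA_URL}")
        with open(TESTDATA, encoding="utf-8") as f:
            for line in f:
                columns = parse_line(line)
                if columns is not None:
                    self.assertTrue(check_columns(*columns), line.strip())
```

Run it with `python -m unittest`. If NormalizationTest.txt is in the working directory, every data line is checked; otherwise the test is reported as skipped. Part 1 of the file also implies that characters not listed there are unchanged by all four forms, and this sketch does not check that.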

Thanks for working on this!

----------------------------------------------------------------------
