[Python-bugs-list] [ python-Bugs-834676 ] segfault in unicodedata module (hangul syllables)

SourceForge.net noreply at sourceforge.net
Mon Nov 3 05:09:30 EST 2003


Bugs item #834676, was opened at 2003-11-02 20:28
Message generated for change (Comment added) made by lemburg
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=834676&group_id=5470

Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
>Assigned to: Martin v. Löwis (loewis)
Summary: segfault in unicodedata module (hangul syllables)

Initial Comment:
[forwarded from http://bugs.debian.org/218697]

start an interactive python2.3 interpreter. run the
following command, twice if necessary: 
 
__import__('unicodedata').normalize('NFC',u'\ud55c\uae00') 
 
this reliably segfaults python2.3 on both i686 and
powerpc. 
 
although my testing has not been very extensive
(unicode is somewhat large and slightly complex), so
far i have seen the crash only when processing
pre-composed hangul syllables. decomposing them into
combining jamos before calling unicodedata.normalize
seems to avoid the crash, and i've attached a wrapper
that does just that to this report. 
 
unfortunately, unicodedata.normalize is used internally
by encodings.idna, so processing some internationalized
korean domain names can likely crash any python program
with support for internationalized domain names. 
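
for illustration, the affected code path can be reached through the idna codec, which prepares each label with nameprep (including a normalization step) before encoding it. the following is only a sketch in present-day python 3 syntax, not the 2.3 syntax used in the report, and the domain name is a made-up example:

```python
# Hypothetical domain name: the precomposed syllables from the report
# plus the reserved ".example" top-level domain.
name = "\ud55c\uae00.example"

# Encoding runs nameprep on each label, which normalizes the text
# before the label is converted to its ASCII-compatible form.
encoded = name.encode("idna")

# Decoding converts the ASCII-compatible form back to unicode.
decoded = encoded.decode("idna")

print(encoded, decoded == name)
```

on an unaffected interpreter the round trip returns the original name.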
 
please do let me know if you would like more details,
or if there's anything further i can do to help! 

workaround: 
 
as a workaround in my own python programs, i wrapped
unicodedata.normalize like this (see attachment).
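
the attachment itself is not reproduced in this message. a minimal sketch of such a wrapper, assuming the arithmetic hangul syllable decomposition defined by the unicode standard, could look like the following. it is written in present-day python 3 syntax rather than 2.3, and the names decompose_hangul and safe_normalize are hypothetical, not taken from the attachment:

```python
import unicodedata

# Constants for arithmetic Hangul syllable decomposition (Unicode Standard).
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28
N_COUNT = V_COUNT * T_COUNT      # 588 syllables per leading consonant
S_COUNT = L_COUNT * N_COUNT      # 11172 precomposed syllables in total


def decompose_hangul(text):
    """Replace each precomposed Hangul syllable with its combining jamos."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            out.append(chr(L_BASE + s_index // N_COUNT))               # leading consonant
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))   # vowel
            t_index = s_index % T_COUNT
            if t_index:                                                # optional trailing consonant
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return "".join(out)


def safe_normalize(form, text):
    """Hypothetical wrapper: decompose Hangul first, then normalize."""
    return unicodedata.normalize(form, decompose_hangul(text))
```

with this sketch, safe_normalize('NFC', ...) is called in place of unicodedata.normalize('NFC', ...), so the normalizer only ever receives combining jamos as input.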


----------------------------------------------------------------------

>Comment By: M.-A. Lemburg (lemburg)
Date: 2003-11-03 11:09

Message:
Logged In: YES 
user_id=38388

Martin wrote this part. Assigning to him.

----------------------------------------------------------------------

Comment By: Benjamin C. W. Sittler (bsittler)
Date: 2003-11-02 21:21

Message:
Logged In: YES 
user_id=645359

fyi, i ran into this while adding an encoding similar to
IDNA [i believe it's an IDNA superset], but capable of
handling free text -- see
http://xent.com/~bsittler/icb_ace.py -- and my very first
test data was the word 한글, written in precomposed form.

----------------------------------------------------------------------
