[Python-bugs-list] [ python-Bugs-834676 ] segfault in unicodedata module (hangul syllables)

SourceForge.net noreply at sourceforge.net
Sun Nov 2 14:28:43 EST 2003


Bugs item #834676, was opened at 2003-11-02 19:28
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=834676&group_id=5470

Category: Unicode
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Matthias Klose (doko)
Assigned to: M.-A. Lemburg (lemburg)
Summary: segfault in unicodedata module (hangul syllables)

Initial Comment:
[forwarded from http://bugs.debian.org/218697]

start an interactive python2.3 interpreter. run the
following command, twice if necessary: 
 
__import__('unicodedata').normalize('NFC',u'\ud55c\uae00') 
 
this reliably segfaults python2.3 on both i686 and
powerpc. 
 
although my testing has not been very extensive
(unicode is somewhat large and slightly complex), so
far i have seen the crash only when processing
pre-composed hangul syllables. decomposing them into
conjoining jamos before calling unicodedata.normalize
seems to avoid the crash, and i've included a wrapper
that does just that in this report.
 
unfortunately, unicodedata.normalize is used internally
by encodings.idna, so processing some internationalized
korean domain names can likely crash any python program
with support for internationalized domain names.
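
to illustrate the idna path, here is a minimal sketch that
pushes the same two syllables through the built-in idna codec.
the host name is made up for illustration, and the snippet is
written for a current interpreter; on an affected python2.3
build the encode call would presumably reach the same crash,
though that is an assumption and not something shown in this
report.

```python
# hypothetical host name, chosen only for illustration
name = u'\ud55c\uae00.example'

# the idna codec runs nameprep on each label, which normalizes
# the text before the punycode step
encoded = name.encode('idna')

# decoding the result should give back the original name
decoded = encoded.decode('idna')
```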
 
please do let me know if you would like more details,
or if there's anything further i can do to help! 

workaround: 
 
as a workaround in my own python programs, i wrapped
unicodedata.normalize like this (see attachment).
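
the attached wrapper is not reproduced in this message. the
following is only a sketch of the same idea, not the attachment
itself: it assumes the arithmetic hangul decomposition from the
unicode standard, and the names decompose_hangul and
safe_normalize are hypothetical. it is written for a current
interpreter; python2.3 would need unichr and u'' literals.

```python
import unicodedata

# constants of the arithmetic hangul decomposition (unicode standard)
S_BASE, L_BASE, V_BASE, T_BASE = 0xAC00, 0x1100, 0x1161, 0x11A7
V_COUNT, T_COUNT = 21, 28
N_COUNT = V_COUNT * T_COUNT   # 588 syllables per leading consonant
S_COUNT = 19 * N_COUNT        # 11172 pre-composed syllables in total


def decompose_hangul(text):
    """Replace each pre-composed hangul syllable with its jamos."""
    out = []
    for ch in text:
        s_index = ord(ch) - S_BASE
        if 0 <= s_index < S_COUNT:
            # leading consonant and vowel are always present
            out.append(chr(L_BASE + s_index // N_COUNT))
            out.append(chr(V_BASE + (s_index % N_COUNT) // T_COUNT))
            # trailing consonant is optional (index 0 means none)
            t_index = s_index % T_COUNT
            if t_index:
                out.append(chr(T_BASE + t_index))
        else:
            out.append(ch)
    return ''.join(out)


def safe_normalize(form, text):
    """Hypothetical wrapper: decompose syllables, then normalize."""
    return unicodedata.normalize(form, decompose_hangul(text))
```

calling safe_normalize('NFC', u'\ud55c\uae00') then hands
unicodedata.normalize only jamos, never pre-composed syllables,
which is the case the report describes as avoiding the crash.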

