[Patches] [ python-Patches-1454485 ] patch for SIGSEGV in arraymodule.c

Wed Mar 29 09:52:47 CEST 2006

Patches item #1454485, was opened at 2006-03-20 04:44
Message generated for change (Comment added) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=1454485&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Core (C code)
Group: Python 2.4
Status: Open
Resolution: None
Priority: 7
Submitted By: Baris Metin (tbmetin)
Assigned to: Neal Norwitz (nnorwitz)
Summary: patch for SIGSEGV in arraymodule.c

Initial Comment:
The array module fails to handle UTF-8 strings and crashes with a
SIGSEGV. The attached patch seems to do the trick.

gdb> run   
(no debugging symbols found)   
(no debugging symbols found)   
[Thread debugging using libthread_db enabled]   
[New Thread -1480337216 (LWP 31303)]   
Python 2.4.2 (#1, Mar 20 2006, 12:08:06)   
[GCC 3.4.5] on linux2   
Type "help", "copyright", "credits" or "license" for   
more information.   
>>> import array   
>>> x = array.array("u")   
>>> x.append(u"barış")
Traceback (most recent call last):   
  File "<stdin>", line 1, in ?   
TypeError: array item must be unicode character   
>>> x.append("barış")
>>> x   

Program received signal SIGSEGV, Segmentation fault.   
[Switching to Thread -1480337216 (LWP 31303)]   
Error while running hook_stop:   
Invalid type combination in ordering comparison.   
0xa7ee0799 in PyUnicodeUCS4_FromUnicode ()   
from /usr/lib/libpython2.4.so.1.0   

----------------------------------------------------------------------

>Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-28 23:52

Message:
Logged In: YES 
user_id=33168

Attached is an updated patch to only do the (unsigned) cast
in unicodeobject.c.  The test included in the patch still
crashes the interpreter, this time in unicodectype.c.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2006-03-26 13:36

Message:
Logged In: YES 
user_id=21627

The second part of the patch (checking that *u is not
negative) is definitely right.

The first part (requiring an even number of bytes for a u#
argument) probably requires discussion on python-dev (or
this patch should be assigned to MAL): I don't think it
should be allowed to pass a non-Unicode object to u# in the
first place.

In particular, if you pass a byte string, there would be an
implicit assumption that the byte encoding is the same
internal representation as a Py_UNICODE. This is bad -
Python normally assumes the encoding of a string is the
system encoding, which normally is ASCII.

Of course, changing the call to a type error for 2.4.3
probably won't work, either, because it might break existing
code.

Anyway, I believe the latter fix alone should fix the crash:
the current getargs implementation will round down to the
next multiple of sizeof(Py_UNICODE), thanks to the integer
division. u_setitem will then refuse the call if the length
is not 1. IOW, it is possible to append between 4 and 7
bytes to a Unicode array.
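
The rounding described above can be sketched roughly as follows.
This is only a model in present-day Python, not CPython source:
ITEM_SIZE, unicode_item_count and append_is_accepted are
hypothetical names, and a 4-byte Py_UNICODE (UCS4 build) is assumed.

```python
# Hypothetical model of the length handling described above; not CPython source.
ITEM_SIZE = 4  # assumption: sizeof(Py_UNICODE) on a UCS4 build


def unicode_item_count(byte_length, item_size=ITEM_SIZE):
    """Whole Py_UNICODE items in a buffer; integer division rounds down."""
    return byte_length // item_size


def append_is_accepted(byte_length):
    """Model of the u_setitem check: exactly one item is required."""
    return unicode_item_count(byte_length) == 1


accepted_lengths = [n for n in range(10) if append_is_accepted(n)]
print(accepted_lengths)  # [4, 5, 6, 7]

# The UTF-8 byte string from the report ("bar" plus two 2-byte characters).
report_bytes = "bar\u0131\u015f".encode("utf-8")
print(len(report_bytes), append_is_accepted(len(report_bytes)))  # 7 True
```

Under these assumptions the 7-byte string from the report counts as
one item, so the append is accepted.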

I wonder why the patch fixes the problem: *u should be an
unsigned, and comparing an unsigned with a signed should
convert the signed to unsigned, no?
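
One way the cast could matter is sketched below. It assumes,
hypothetically, that Py_UNICODE is a signed 32-bit type on the
affected build; the thread does not confirm this. The bit pattern
is illustrative (the bytes "bar\xc4" read as a little-endian word),
and ctypes only stands in for the C types here.

```python
import ctypes

# Illustrative bit pattern: b"bar\xc4" read as a little-endian 32-bit word.
BITS = 0xC4726162

# Assumption: a signed 32-bit Py_UNICODE would see these bits as negative.
as_signed = ctypes.c_int32(BITS).value
# The same bits after an (unsigned) cast.
as_unsigned = ctypes.c_uint32(BITS).value

# A "value < 256" test passes for the signed view, although the value
# is not a valid index into a 256-entry table.
print(as_signed < 0, as_signed < 256)  # True True
# With the unsigned view the same test fails, so no table lookup happens.
print(as_unsigned < 256)  # False
```

If *u really is unsigned, as expected above, both views agree and
the cast changes nothing.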

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-24 23:29

Message:
Logged In: YES 
user_id=33168

Verified for 2.4 and head.  The problem could probably exist w/ucs2
also if you use 'bar' (I think).

I agree with Nick, this patch doesn't really solve the
problem.  The attached patch fixes the crash more generally,
but I think there is a better solution.  I hope Martin has
time to review this and suggest a better fix.

Martin, the change in getargs ensures that we don't try to
convert an 8-bit string of length 5 to unicode.  The change
in unicodeobject ensures that we don't reference the array
with a negative offset as happens if the buffer conversion
succeeds with an invalid unicode character.
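
The two checks can be illustrated with a small model in present-day
Python. It is a sketch, not the patch itself:
passes_whole_item_check is a hypothetical name, a 4-byte
Py_UNICODE and little-endian byte order are assumed, and the
signed reading assumes Py_UNICODE is a signed type on that build.

```python
import struct

ITEM_SIZE = 4  # assumption: sizeof(Py_UNICODE) on a UCS4 build


def passes_whole_item_check(data, item_size=ITEM_SIZE):
    """Model of the getargs change: length must be a whole number of items."""
    return len(data) % item_size == 0


# An 8-bit string of length 5: "bar" plus one 2-byte UTF-8 character.
five_bytes = "bar\u0131".encode("utf-8")
print(len(five_bytes), passes_whole_item_check(five_bytes))  # 5 False

# Without that check, the first 4 bytes are read as one array item.
first_item = five_bytes[:ITEM_SIZE]  # b"bar\xc4"
(signed_value,) = struct.unpack("<i", first_item)
(unsigned_value,) = struct.unpack("<I", first_item)

print(signed_value < 0)           # True: would act as a negative offset
print(unsigned_value > 0x10FFFF)  # True: not a valid Unicode code point
```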

----------------------------------------------------------------------

Comment By: Nick Coghlan (ncoghlan)
Date: 2006-03-24 06:43

Message:
Logged In: YES 
user_id=1038590

To get the effect of the patch, it should be sufficient to
just change the format character to an uppercase U.

That doesn't seem like the right fix though - the actual
explosion isn't happening until later when the array
elements are being converted to Unicode for output.

----------------------------------------------------------------------

Comment By: Baris Metin (tbmetin)
Date: 2006-03-24 01:19

Message:
Logged In: YES 
user_id=1045504

I'm able to reproduce the bug with 2.5a0 SVN (r43289).   

Please try with --enable-unicode=ucs4 

The patch is against SVN too.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2006-03-23 22:11

Message:
Logged In: YES 
user_id=33168

With the stock 2.4.2 on my linux box I was able to reproduce
this.  I couldn't reproduce with 2.4.3c1.  Can you verify
this is fixed in 2.4.3?

Thanks.

----------------------------------------------------------------------
