[Python-bugs-list] [ python-Bugs-556025 ] list(xrange(1e9)) --> seg fault

noreply@sourceforge.net noreply@sourceforge.net
Tue, 13 Aug 2002 04:43:58 -0700


Bugs item #556025, was opened at 2002-05-14 08:41
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=556025&group_id=5470

Category: Python Interpreter Core
Group: None
>Status: Closed
Resolution: Accepted
Priority: 5
Submitted By: Raymond Hettinger (rhettinger)
Assigned to: Jason Tishler (jlt63)
Summary: list(xrange(1e9))  -->  seg fault

Initial Comment:
From c.lang.py:

'''
 Python 2.2.1 (#2, Apr 21 2002, 22:22:55) 
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.62mdk)] 
on linux2
Type "help", "copyright", "credits" or "license" for 
more information.
>>> list(xrange(1e9))
Segmentation fault
'''

I've reproduced the same fault on Windows ME using 
Py2.2.0 and Py2.3a.

----------------------------------------------------------------------

>Comment By: Jason Tishler (jlt63)
Date: 2002-08-13 03:43

Message:
Logged In: YES 
user_id=86216

Committed as Lib/test/test_b1.py 1.51.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-12 05:38

Message:
Logged In: YES 
user_id=33168

You should probably add that it fails due to a bug in newlib
and not python.  Go ahead and check it in and close the bug
report.  Thanks.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-08-12 03:26

Message:
Logged In: YES 
user_id=86216

rhettinger wrote:
> My thought is to close the bug, but add a
> unittest that says in effect:  if os is cygwin
> and version(cygwin) >= 1.3.13 and the bug still
> exists, then fail with a message saying that SF
> 556025 was not successfully resolved.

Do we really want to add cruft that is not only
Cygwin specific but Cygwin version specific?

nnorwitz wrote:
> I'm not sure if the test should fail, be
> skipped, or not run for cygwin < 1.3.13.

Agreed.

> But let's at least put a comment in the test and
> close this bug report.  Jason, can you do that?

Yes, but I only have pre-approved commit access to
the CVS repository. Can you approve the attached
patch?

> Good persistence Jason!

Thanks.


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-10 06:27

Message:
Logged In: YES 
user_id=33168

I'm not sure if the test should fail, be skipped, or not run
for cygwin < 1.3.13.  But let's at least put a comment in
the test and close this bug report.  Jason, can you do that?

Good persistence Jason!

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-08-09 19:08

Message:
Logged In: YES 
user_id=80475

My thought is to close the bug, but add a unittest that 
says in effect:  if os is cygwin and version(cygwin) >= 
1.3.13 and the bug still exists, then fail with a message 
saying that SF 556025 was not successfully resolved.

This way, we can close the bug (since it is not a python 
bug) and still get a regression test to raise the concern if 
the expected solution either doesn't materialize or 
sometime later dematerializes.


----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-08-09 17:02

Message:
Logged In: YES 
user_id=86216

I guess that my perseverance paid off:

http://sources.redhat.com/ml/newlib/2002/msg00391.html

Hence, this bug will be fixed in Cygwin 1.3.13.

Can we close this bug now? Or, should we wait until
Cygwin 1.3.13 is released?

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-08-09 10:09

Message:
Logged In: YES 
user_id=86216

Thanks for the sympathy.

I've tried, tried again:

http://sources.redhat.com/ml/newlib/2002/msg00390.html

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-08 16:56

Message:
Logged In: YES 
user_id=33168

I looked at the links.  I don't know what I can do to help.
It seems like you proposed a reasonable solution and even
if it wasn't perfect, you still demonstrated a problem.  I
suppose I can only commiserate with you.

----------------------------------------------------------------------

Comment By: Jason Tishler (jlt63)
Date: 2002-08-08 04:11

Message:
Logged In: YES 
user_id=86216

> Jason, can you test/replicate this?

Yes, I've already been working on this one. See
the following mailing list threads for the details:

http://cygwin.com/ml/cygwin-developers/2002-07/msg00124.html
http://sources.redhat.com/ml/newlib/2002/msg00369.html

To summarize the above, the problem is actually
in newlib which provides Cygwin's libc (and
libm).  Unfortunately, Chris Faylor (the Cygwin
lead developer) has not been able to convince
the newlib maintainer to fix this problem.
Additionally, my first patch has been rejected.

I will continue my efforts to get this problem
resolved.  Any assistance will be greatly
appreciated.  I never expected to become an
expert in Doug Lea's malloc routines.  Sigh...

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-07 12:11

Message:
Logged In: YES 
user_id=33168

Actually, I think Jason may be more appropriate, since this
is a cygwin problem.  Jason, can you test/replicate this?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-08-07 10:20

Message:
Logged In: YES 
user_id=33168

Hmmm, Tim can you reproduce this?  I luckily don't have a
windows box. :-)

----------------------------------------------------------------------

Comment By: Steve Holden (holdenweb)
Date: 2002-08-07 09:28

Message:
Logged In: YES 
user_id=88157

I hope re-opening this is the right thing to do (I'm new here).

Current CVS fails under Win2000+Cygwin with a 
segmentation fault in the updated test_b1.py. Easily 
reproduced:

$ ./python.exe
Python 2.3a0 (#1, Aug  7 2002, 12:18:38)
[GCC 2.95.3-5 (cygwin special)] on cygwin
Type "help", "copyright", "credits" or "license" for more 
information.
>>> import sys
>>> list(xrange(sys.maxint/4))
Segmentation fault (core dumped)

This does seem to be size-related, as:

>>> s = sys.maxint/8
>>> list(xrange(s))
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
MemoryError

which is as expected in test_b1.py


----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-22 15:20

Message:
Logged In: YES 
user_id=33168

Checked in as:
  listobject.c 2.106
  test_b1.py 1.46

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-21 04:59

Message:
Logged In: YES 
user_id=80475

Good plan!  Thx for squashing this bug. 

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-21 04:52

Message:
Logged In: YES 
user_id=33168

How about using sys.maxint / 4?  Does that make more sense
than 1e9?  This assumption is a little better, that the data
and address sizes are the same.  I can add a comment to this
effect.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-20 19:37

Message:
Logged In: YES 
user_id=80475

When you load the fix, please commit the regression test 
patch also.  Thx,  Raymond

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-20 18:51

Message:
Logged In: YES 
user_id=31435

Yup, lookin' better.  Python does assume 8-bit bytes in 
several places, and also 2's-complement integers.  Since 
size_t is guaranteed (by C) to be an unsigned type, the 
largest value of type size_t is more easily expressed as

(~(size_t)0)

The C part of the patch looks fine then.  The test is a 
little dubious:  who says the machine can't create a 
billion-integer list?  The idea that 1e9 necessarily 
overflows in this context is a 32-bit address-space 
assumption.  But I'm willing to delay fixing that until a 
machine with a usable larger address space appears <wink>.
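
For illustration only, here is a minimal sketch of the bound written
with (~(size_t)0).  The names MAX_SIZE_T, max_items and request_fits
are made up for this example and are not part of the patch.  One reason
to prefer this form over the shift-based expression in the 2002-05-20
07:43 comment: that expression shifts a plain int constant, so with a
32-bit int a shift by 31 (or by 63 where size_t is 64 bits) is
undefined behavior in C.

```c
#include <stddef.h>

/* Largest value a size_t can hold: all bits set, whatever the width. */
#define MAX_SIZE_T (~(size_t)0)

/* Hypothetical helper: the largest element count whose byte size
   (count * elem_size) still fits in a size_t.
   elem_size must be nonzero. */
static size_t max_items(size_t elem_size)
{
    return MAX_SIZE_T / elem_size;
}

/* Returns 1 when n items of elem_size bytes can be requested without
   the multiplication wrapping around, 0 otherwise. */
static int request_fits(size_t n, size_t elem_size)
{
    return n <= max_items(elem_size);
}
```

Dividing the maximum by the element size, rather than multiplying the
count by it, keeps every intermediate value inside the range of size_t.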

So marked Accepted and assigned to you for checkin.  Thanks!

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-20 07:43

Message:
Logged In: YES 
user_id=33168

Ok, there were other problems, too:
  * Need to divide by the size of the type, 
    not >> 4 which was completely broken.
  * There was a missing PyErr_NoMemory().

I uploaded a new patch.

I'm not sure the size_t fix is correct.
I hope we can at least assume 8-bit machines: :-)

if (_new_size <= ((1 << (sizeof(size_t)*8 - 1)) / sizeof(type)))



----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-19 20:48

Message:
Logged In: YES 
user_id=31435

The code patch has some problems:  you can't assume any 
relation between a size_t and an unsigned long; C simply 
doesn't define how big size_t is, and relative sizes do 
vary on 64-bit platforms.  However that gets fixed, if you 
decide it's "too big", var should be set to NULL (not 0 -- 
this is a "Guido thing" <wink>), and no exception should be 
set.  It's the caller's responsibility to check var for 
NULL after the macro is invoked, and set an appropriate 
exception.  listobject.c sometimes doesn't check the result 
for NULL, but that should only be when it knows it's 
*shrinking* a memory area, so that realloc can't fail.
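
The contract described above might look roughly like the following
sketch.  RESIZE_OR_NULL, IntList and intlist_resize are hypothetical
names invented for this example; this is not the real PyMem_RESIZE or
the listobject.c code.  The macro only sets var to NULL on failure, and
the caller checks for NULL and reports the error (in CPython that would
be a call such as PyErr_NoMemory()).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical macro modelled on the review comments above.
   If the request is too big, var becomes NULL and nothing else
   happens; no error is reported from inside the macro. */
#define RESIZE_OR_NULL(var, type, n)                                  \
    do {                                                              \
        size_t n_ = (n);                                              \
        if (n_ > (~(size_t)0) / sizeof(type))                         \
            (var) = NULL;                                             \
        else                                                          \
            (var) = (type *)realloc((var),                            \
                                    n_ ? n_ * sizeof(type) : 1);      \
    } while (0)

/* Hypothetical container standing in for a list object. */
typedef struct {
    int *items;
    size_t len;
} IntList;

/* Returns 0 on success and -1 on failure.  The NULL check here is the
   caller-side responsibility described in the comment above. */
static int intlist_resize(IntList *list, size_t new_len)
{
    int *tmp = list->items;   /* work on a copy so the old block survives */
    RESIZE_OR_NULL(tmp, int, new_len);
    if (tmp == NULL)
        return -1;            /* the caller reports the error here */
    list->items = tmp;
    list->len = new_len;
    return 0;
}
```

Resizing through a temporary means a failed request leaves the original
pointer and length untouched.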

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-17 16:12

Message:
Logged In: YES 
user_id=80475

Added a patch to add this bug to the testbank.

Shallow code review:  Patch compiles okay (applied to 
Py2.3a0). Fixes the original problem.  Passes the smell 
test.  Macro style good (only the "var" operand is used 
more than once; no side-effects except setting "var" to 
zero upon a resize error).  Passes the standard regression 
tests.  Passes regression testing versus my personal (real 
code) testbank.

Will give it a deeper look this week-end.

One other thought: should the ValueError be replaced with a 
MemoryError to make the message consistent with PyList_New?



----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-17 13:03

Message:
Logged In: YES 
user_id=33168

Ok, this time I have a patch.  The patch only fixes listobject.

I looked over the other uses of PyMem_R{ESIZE,ALLOC}() and
they don't appear to be nearly as problematic as list.  For
example, if the grammar has 1e9 nodes, there are going to be
other problems well before then (ie, memory will probably be
gone).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-17 12:25

Message:
Logged In: YES 
user_id=33168

Oops, sorry about that last comment.  That was something I
was playing with.  The CVS version is fine for [x]range(float).

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2002-05-17 12:22

Message:
Logged In: YES 
user_id=33168

Note, this problem is even more generic in CVS version:

>>> range(1.1)
Segmentation fault (core dumped)
>>> xrange(1.1)
Segmentation fault (core dumped)

[x]range(float) work fine in 2.2.1.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2002-05-14 16:44

Message:
Logged In: YES 
user_id=80475

Would there be some merit to converting PyMem_RESIZE to a 
function with an overflow check?  

I would guess that the time spent on a realloc dwarfs the 
overhead of a short wrapper function.

OTOH, I'll bet this core dump only occurs in toy examples 
and never in real code.
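
As a rough sketch of the wrapper-function idea, assuming nothing about
how it would actually be integrated: checked_resize below is a
hypothetical name, not an existing Python C API function.  It refuses
any request whose byte count cannot be represented and otherwise
defers to realloc.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper, not part of the Python C API.
   Returns NULL when n_items * item_size would not fit in a size_t,
   or when realloc itself fails.  In both cases the block pointed to
   by p is left untouched. */
static void *checked_resize(void *p, size_t n_items, size_t item_size)
{
    if (item_size != 0 && n_items > SIZE_MAX / item_size)
        return NULL;                     /* the product would wrap */
    size_t nbytes = n_items * item_size;
    return realloc(p, nbytes ? nbytes : 1);   /* avoid realloc(p, 0) */
}
```

The added cost per call is one comparison and one division, which
supports the guess that it would be small next to the realloc itself.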



----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-05-14 14:15

Message:
Logged In: YES 
user_id=31435

Heh.  This is an instance of a general flaw:  the 
PyMem_RESIZE macro doesn't check for int overflow in its

    (n) * sizeof(type)

subexpression. The basic deal is that 1000000000 fits in an 
int, but 4 times that (silently) overflows.  In more 
detail, for this specific test case, listobject.c's 
roundupsize rounds 1e9 up to 0x40000000, which silently 
overflows to 0 (!) when PyMem_RESIZE multiplies it by 4.

Hard to know how to fix this in general; PyMem_RESIZE 
callers don't generally worry about overflow now.
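
The arithmetic can be reproduced in isolation.  This is only a sketch:
bytes_needed_32 is a made-up helper that mimics a 32-bit byte count,
and it uses unsigned arithmetic so that the wraparound is well defined
in C (the signed int overflow in the real macro is not).

```c
#include <stdint.h>

/* Hypothetical helper: the byte count as a platform with a 32-bit
   size would compute it.  The cast makes the result wrap modulo 2^32
   regardless of the width of int on the build machine. */
static uint32_t bytes_needed_32(uint32_t n_items, uint32_t item_size)
{
    return (uint32_t)(n_items * item_size);
}
```

With these definitions, bytes_needed_32(0x40000000u, 4u) is 0, because
0x40000000 * 4 is exactly 2^32.  A resize to that many 4-byte items
therefore asks the allocator for zero bytes.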

----------------------------------------------------------------------
