[Python-bugs-list] [ python-Bugs-705231 ] Assertion failed, python aborts

Sat, 24 May 2003 16:04:58 -0700

Bugs item #705231, was opened at 2003-03-17 21:49
Message generated for change (Comment added) made by anze
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=705231&group_id=5470

Category: Python Interpreter Core
Group: Python 2.2.2
>Status: Closed
Resolution: None
Priority: 5
Submitted By: Anze Slosar (anze)
Assigned to: Nobody/Anonymous (nobody)
Summary: Assertion  failed, python aborts

Initial Comment:
This bug is reproducible with python 2.2.1, although it
fails only occasionally as the flow depends on random
numbers. 
It aborts by saying:

python: Objects/floatobject.c:582: float_pow: Assertion
`(*__errno_location ()) == 34' failed.
Aborted

I tried python 2.2.2 but as I try to install rpms I run
into an ever-growing list of dependencies. I couldn't
pin down the exact cause of the bug, but it is triggered
by the following simple code, which tries to invent
expressions for numbers using a genetic algorithm. (The
code itself is buggy in the Kill method, but I have
trouble debugging it because python crashes.)

makeeq.py:

#!/usr/bin/env python
# Make equations using rpn and genetic algorithms

from random import *
from math import *
import rpn

def RanPosP(list):
    return int(uniform(0,len(list))+1)

def RanPos(list):
    return int(uniform(0,len(list)))

def AddUnary(list):
    a1=RanPosP(list)
    a2=RanPos(Unary)
    list=list[:a1]+[Unary[a2]]+list[a1:]
    return list

def AddBinary(list):
    a1=RanPosP(list)
    a2=RanPos(Binary)
    num=int(uniform(0,10))
    #print "Add binary:",list,num,rpn.Binary()[a2]
    list=list[:a1]+[num]+[Binary[a2]]+list[a1:]
    #print 'Add binary:',list
    return list

class RPNGen:
    def __init__(self,target):
        self.pool=[[1]]
        self.rpn=[1.0]
        self.target=target

    def GetRPN(self):
        self.rpn=map(rpn.SolveRPN,self.pool)

    def Grow(self,N):
        for x in range(N):
            ihave=[]
            while rpn.SolveRPN(ihave)==None:
                ml=len(self.pool)
                #print self.pool
                ii=int(uniform(0,ml))
                action=int(uniform(0,4))
                #print action
                if action==0:
                    ihave=(AddUnary(self.pool[ii]))

                elif action==1:
                    ihave=(AddBinary(self.pool[ii]))

                elif action==2:
                    jj=int(uniform(0,len(self.pool)))
                    bit=self.pool[jj]
                    a1=int(uniform(0,len(bit)))
                    a2=int(uniform(0,len(bit)))
                    if a2>a1:
                        bit=bit[a1:a2]
                    else:
                        bit=bit[a2:a1]
                    a3=int(uniform(0,len(self.pool[ii])))
                    ihave=(self.pool[ii][:a3]+bit+self.pool[ii][a3:])

                elif action==3:
                    bit=self.pool[ii]
                    a1=int(uniform(0,len(bit)))
                    a2=int(uniform(0,len(bit)))
                    ihave=(self.pool[ii][:a1]+self.pool[ii][a2:])
            self.pool.append(ihave)
            self.rpn.append(rpn.SolveRPN(ihave))

        #print self.pool,self.rpn
        deletelist=[]
        for cc in range(len(self.pool)):
            if self.rpn[cc]==None:
                deletelist.append(cc)

        while len(deletelist)>0:
            cc=deletelist.pop()
            self.rpn.pop(cc)
            self.pool.pop(cc)

    def Kill(self,N):
        TODO=N
        print "TODO:",TODO
        difs=map(lambda x,y:abs(x-self.target)-len(self.pool)/10.0,
                 self.rpn,self.pool)
        dict={}
        for x in range(N):
            dict[difs[x]]=x
            mn=min(dict.keys())
        for x in range(N+1,len(difs)):
            print 'dict:',dict
            if difs[x]>mn:
                del dict[mn]
                dict[difs[x]]=x
                mn=min(dict.keys())
        list=dict.values()
        list.sort()
        TODO-=len(list)

        for cc in range(len(list)):
            dd=list.pop()
            #print "asd", dd,
            self.rpn.pop(dd)
            self.pool.pop(dd)

Test=RPNGen(137.03599976)    
Binary=rpn.Binary()
Unary=rpn.Unary()

for i in range(100):
    Test.Grow(100)
    #print len(Test.pool)

for i in range(100):
    Test.Grow(100)
    Test.Kill(100)
    print len(Test.pool)    

for i in range(99):
    Test.Kill(200)
    Test.Grow(100)
    print len(Test.pool)    

for i in range(99):
    Test.Kill(1)
    print len(Test.pool),Test.rpn

    #print len(Test.pool),Test.pool, Test.rpn

print Test.pool
print Test.rpn

-----------------------------------------------
rpn.py:

#module for rpn
from math import *

def Unary():
    return ['sin','cos','tan','asin','acos','atan','neg']

def Binary():
    return ['+','-','*','/','^']

def SolveRPN(rpnl):
    stack=[]
    for each in rpnl:
        try:
            num=float(each)
            stack.append(num)
        except:
            try:
                #must be an operator then.
                if each=='+':
                    stack.append(stack.pop()+stack.pop())
                elif each=='-':
                    a1=stack.pop()
                    a2=stack.pop()
                    stack.append(a2-a1)
                elif each=='*':
                    stack.append(stack.pop()*stack.pop())
                elif each=='/':
                    a1=stack.pop()
                    a2=stack.pop()
                    stack.append(a2/a1)
                elif each=='^':
                    a1=stack.pop()
                    a2=stack.pop()
                    stack.append(a2**a1)
                elif each=='cos':
                    stack[-1]=cos(stack[-1])
                elif each=='sin':
                    stack[-1]=sin(stack[-1])
                elif each=='tan':
                    stack[-1]=tan(stack[-1])
                elif each=='acos':
                    stack[-1]=acos(stack[-1])
                elif each=='asin':
                    stack[-1]=asin(stack[-1])
                elif each=='atan':
                    stack[-1]=atan(stack[-1])
                elif each=='neg':
                    stack[-1]=-1.0*stack[-1]
                else:
                    print "Unknown operation",each
            except:
                return None
    if len(stack)<>1:
        #print "Stack ended non-empty:",stack
        return None
    return stack[0]

----------------------------------------------------------------------

>Comment By: Anze Slosar (anze)
Date: 2003-05-24 23:04

Message:
Logged In: YES 
user_id=447507

>to produce the platform result.  If someone can test this on 
>a failing box (test_pow.py should cover it now), please 
>close this bug.

I pulled the cvs version of python and it seems to work now
(see below). So, I will close this report. However, I think
that the exponentiation by repeated multiplication sounds
like quite a screw-up so maybe someone should report this
to glibc people...

Python 2.3b1+ (#1, May 24 2003, 23:56:20) 
[GCC 3.2 20020903 (Red Hat Linux 8.0 3.2-7)] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> a=1.3213112244281147e+252
>>> b=-1.0
>>> b**a
1.0
>>> import math
>>> b**a
1.0
>>> 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-24 20:48

Message:
Logged In: YES 
user_id=31435

I'm also unclear on what the assembler is doing.  I'll raise 
another caution about it anyway:  the part after

/* First see whether `y' is a natural number.  In this case we
   can use a more precise algorithm.  */

in the loop between 6 and "jnz 6b" appears to be doing 
exponentiation by repeated multiplication.  Unless those 
repeated multiplies are being done in an extended 
precision (which they well may be -- but it depends on how 
the Pentium's precision-control flags are set), that's 
actually less precise than a careful log+exp based 
approach.  The latter can guarantee strictly less than 1 ulp 
error in the final result; multiplication using the result's 
precision can introduce a new 0.5 ulp with each multiply.
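
The precision argument above can be explored with a small experiment. This is only a sketch under assumptions: the helper names (pow_by_repeated_multiply, ulp, error_in_ulps) are made up for illustration, the loop is a plain multiply-in-double loop rather than the glibc assembler being discussed, and the base 1.1 and exponent 100 are arbitrary. It measures how far the repeated-multiply result lands from the exactly computed power, in units in the last place.

```python
import math
from fractions import Fraction

def pow_by_repeated_multiply(x, n):
    """Compute x**n for a non-negative int n, rounding after every multiply."""
    result = 1.0
    for _ in range(n):
        result *= x          # each product is rounded to double precision
    return result

def ulp(x):
    """Spacing of doubles near a finite, normal, nonzero x (hypothetical helper)."""
    _, e = math.frexp(x)     # x == m * 2**e with 0.5 <= |m| < 1
    return math.ldexp(1.0, e - 53)

def error_in_ulps(approx, exact):
    """Distance between a double and an exact Fraction, measured in ulps."""
    return abs(Fraction(approx) - exact) / Fraction(ulp(approx))

x, n = 1.1, 100
exact = Fraction(x) ** n     # exact power of the double nearest to 1.1
print("repeated multiply:", float(error_in_ulps(pow_by_repeated_multiply(x, n), exact)))
print("math.pow:         ", float(error_in_ulps(math.pow(x, n), exact)))
```

Each rounded multiply can add up to half an ulp, so the first figure is bounded by roughly n ulps; how the second figure compares depends on the platform libm.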

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-24 20:26

Message:
Logged In: YES 
user_id=31435

Wormed around in:

Lib/test/test_pow.py 1.19
Misc/NEWS 1.772
Objects/floatobject.c 2.123

__builtin__.pow() should produce the correct result 
instead of the platform result now.  math.pow() will continue 
to produce the platform result.  If someone can test this on 
a failing box (test_pow.py should cover it now), please 
close this bug.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-24 18:12

Message:
Logged In: YES 
user_id=31435

Hmm -- no attachment got attached.

For contrast, check out KC Ng's implementation from fdlibm:

http://www.netlib.org/fdlibm/e_pow.c

In English, the line

   if(iy>=0x43400000) yisint = 2; /* even integer y */

says "if the true exponent is >= 53, it's an even integer".
Because the implied leading 1 bit in a 754 double is to the 
left of the radix point, and 52 bits follow it, that's exactly 
right:  any (finite) 754 double with a true exponent >= 53 is 
integral and even.  If the true exponent is in [0, 52], it may 
or may not be integral, and if integral may be even or odd.
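
The classification described above can be sketched in Python. This is a loose illustration of the idea, not a transliteration of e_pow.c: the function names true_exponent and yisint are chosen here for convenience, and the return codes (0 = not an integer, 1 = odd integer, 2 = even integer) follow the comment in the quoted line.

```python
import math
import struct

def true_exponent(y):
    """Unbiased binary exponent of a finite double, read from its bit pattern."""
    bits = struct.unpack(">Q", struct.pack(">d", y))[0]
    return ((bits >> 52) & 0x7FF) - 1023

def yisint(y):
    """Return 0 if y is not an integer, 1 if it is odd, 2 if it is even."""
    if math.isinf(y) or math.isnan(y):
        return 0
    if y == 0.0:
        return 2
    e = true_exponent(y)
    if e >= 53:
        return 2             # adjacent doubles are at least 2 apart here
    if e < 0:
        return 0             # 0 < |y| < 1 (subnormals land here too)
    # 0 <= e <= 52: the exponent alone does not decide, inspect the value
    if y != math.floor(y):
        return 0
    return 1 if math.fmod(y, 2.0) != 0.0 else 2

print(yisint(1.3213112244281147e+252))   # the exponent from this report
```

With the exponent reported in this thread the exponent test alone settles the question, which is the point being made about large doubles.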

This is still subtle, and I wish standards would spell it out 
more clearly.  I was working on KSR's libm at the time KC 
Ng was working on fdlibm & corresponded with him about 
this stuff, and was also active in NCEG (the Numerical C 
Extension Group) at the time.  The members of the 754 
committee seemed to believe that these kinds of things 
were obvious to the most casual observer, and continued 
754's tradition of (IMO) writing requirements in language 
very easy for non-754 weenies to misinterpret.

I expect the glibc authors read "integer value" here as if it 
had something to do with C's concrete integral types.  That 
wasn't the intent.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-05-24 17:32

Message:
Logged In: YES 
user_id=21627

I missed the point that the exponent is indeed integral.

Looking at the implementation, I see that it invokes
__ieee754_pow, and then computes errno. If __ieee754_pow
returns NaN, it sets EDOM.

If anybody is interested, I attach the implementation of
__ieee754_pow. I have difficulties following the code, but
it appears that the detection "exponent is a natural number"
uses long-long conversion. If that fails, the exponent is
believed to be non-integral.

I have submitted a glibc bug report.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-24 16:08

Message:
Logged In: YES 
user_id=31435

That's the relevant bit, yes.  The subtlety is that a *is* an 
integer:  any double with a very large exponent is an exact 
integer.  Python checks for this by seeing whether
a == floor(a); a semantically equivalent check would be to call
modf(a) and see whether the fractional part returned is exactly 
0.0, same-as whether the integer part returned is exactly a 
(which it is, for any fp # w/ a sufficiently large exponent -- 
unless this math library's implementation of modf is buggy 
too).
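
Both integrality checks mentioned above are easy to try from Python. A minimal sketch, assuming finite inputs only; the function names are invented for this example and the value of a is the one reported earlier in the thread.

```python
import math

a = 1.3213112244281147e+252

def is_integral_floor(x):
    """x is an integer iff it equals its own floor (finite x only)."""
    return math.isfinite(x) and x == math.floor(x)

def is_integral_modf(x):
    """x is an integer iff modf returns a zero fractional part (finite x only)."""
    if not math.isfinite(x):
        return False
    frac, whole = math.modf(x)
    return frac == 0.0 and whole == x

print(is_integral_floor(a), is_integral_modf(a))
```

The two functions should agree for every finite double unless the platform modf is itself broken.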

Consider the alternative (which is what *this* platform pow 
appears to do):  raising -1 to 1 works fine, to 1e1 works 
fine, to 1e2 works fine, to 1e3 works fine, to 1e4 works 
fine, to 1e5 works fine, ..., but at some senseless point it 
blows up with a domain error.  As n increases, when does 
10.0**n stop being "an integer"?  Of course it doesn't, at 
least not before n is so large that 10.0**n overflows by 
itself. Note IEEE 754 does not view doubles as "fuzzy 
approximations" -- it always takes them exactly at face 
value, and computes the best possible result based on 
that.

Python does this check explicitly to worm around other 
bugs in other libms, in order to raise ValueError when x < 0 
and y is not an integer, just as the standard says.  The only 
exception that should be possible in the cases Python 
passes on to the libm pow is overflow, and this platform
pow() is wrong in this case.  It's too hard to check for overflow 
a priori in a platform-independent way, and that's why we 
leave that part up to the platform pow().
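
As an illustration of the division of labour described above, here is a sketch of a wrapper that decides the negative-base cases itself and leaves everything else, including overflow, to the platform pow. It is an assumption-laden example, not the code in Objects/floatobject.c: the name checked_pow is hypothetical, and handling a base of exactly -1.0 without calling libm is just one possible way to sidestep a pow() that reports a domain error for huge integral exponents.

```python
import math

def checked_pow(x, y):
    """pow for floats that settles the negative-base domain question up front."""
    if x < 0.0 and math.isfinite(x) and math.isfinite(y):
        if y != math.floor(y):
            # The only legitimate domain error: negative base, non-integer power.
            raise ValueError("negative number cannot be raised to a fractional power")
        if x == -1.0:
            # (-1)**y is -1 for odd y and +1 for even y; no libm call needed.
            return -1.0 if math.fmod(y, 2.0) != 0.0 else 1.0
    # Everything else goes to the platform pow via math.pow, which may
    # still raise OverflowError.
    return math.pow(x, y)

print(checked_pow(-1.0, 1.3213112244281147e+252))
```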

Note that the OP is complaining about an assertion error, 
which means he's running a debug-build Python.  I think it's 
thoroughly appropriate to whine about platform bugs too in 
a debug build.  In a release build, the assert doesn't exist, 
and Python would raise a Python exception instead.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-05-24 11:55

Message:
Logged In: YES 
user_id=21627

That error value (33) is EDOM. It seems to me that this is
the correct result, according to C99 7.12.7.4p2:

 The pow functions compute x raised to the power y.  A
 domain error occurs if x is finite and negative and y is
 finite and not an integer value.

----------------------------------------------------------------------

Comment By: Anze Slosar (anze)
Date: 2003-05-23 22:48

Message:
Logged In: YES 
user_id=447507

Of course, it doesn't work, it says:

errno after: 33
result: nan

But my platform really isn't that special: it's the Redhat
8.0 which is a very common system! At least this should be
reported to glibc (?? not sure) people. Moreover, on redhat
7.3 the python 1.5 doesn't fail this test while the python
2.0 does.
It's a very standard setup (a.out is the c code you suggested)

[anze@as280 anze]$ ldd a.out 
        libm.so.6 => /lib/i686/libm.so.6 (0x4002d000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[anze@as280 anze]$ rpm -qf /lib/i686/libm.so.6
glibc-2.2.93-5
[anze@as280 anze]$ 

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-05-23 16:30

Message:
Logged In: YES 
user_id=31435

Please try this C program on your box:

"""
#include <math.h>
#include <stdio.h>
#include <errno.h>

int main() {
    double b = -1.0, a = 1.32e252;
    double c;

    errno = 0;
    c = pow(b, a);
    printf("errno after: %d\n", errno);
    printf("result: %g\n", c);

    return 0;
}
 """

It should display this:

"""
errno after: 0
result: 1
"""

If it doesn't display that, it's a bug in your platform math 
library, and should be reported to them.
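
For anyone without a C compiler at hand, roughly the same probe can be run from Python. This is a sketch that assumes math.pow() hands finite arguments to the platform pow(); the function name platform_pow_ok is made up for this example.

```python
import math

def platform_pow_ok():
    """Return True if pow(-1.0, 1.32e252) comes back as 1.0 without an error."""
    try:
        result = math.pow(-1.0, 1.32e252)
    except (ValueError, OverflowError):
        # A domain or range error here points at the libm behaviour
        # discussed in this report.
        return False
    return result == 1.0

print(platform_pow_ok())
```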

If we don't get reports of many platforms with this libm bug, 
I'm not inclined to complicate Python to work around a library 
bug on just one platform.

----------------------------------------------------------------------

Comment By: Anze Slosar (anze)
Date: 2003-05-23 10:08

Message:
Logged In: YES 
user_id=447507

Here we go:

[anze@as280 numbers]$ python
Python 2.2.1 (#1, Aug 30 2002, 12:15:30) 
[GCC 3.2 20020822 (Red Hat Linux Rawhide 3.2-4)] on linux2
Type "help", "copyright", "credits" or "license" for more
information.
>>> a=1.3213112244281147e+252
>>> b=-1.0
>>> b**a
python: Objects/floatobject.c:582: float_pow: Assertion
`(*__errno_location ()) == 34' failed.
Aborted
[anze@as280 numbers]$ 

Hope this helps!

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-05-22 21:23

Message:
Logged In: YES 
user_id=33168

Anze, any update on this?  2.2.3 is almost ready to go out.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-03-21 17:20

Message:
Logged In: YES 
user_id=31435

OK, that's some progress.  We don't really have any use for a 
traceback -- it's clear where the code is dying.  The platform 
pow() is setting an unexpected errno value on a call to pow().  
What we need to know:

1. What were the inputs to pow()?
2. What is errno's value?  We know it's not 0 and we know it's 
not ERANGE.  I can't think of any other value that makes 
sense (so I'm asserting too <wink>).

Note that this must be triggered by your code line:

    stack.append(a2**a1) 

so you could just

    print repr(a2), repr(a1)

before that line, and the last output before the program dies 
must show the inputs the platform pow() is choking on.

----------------------------------------------------------------------

Comment By: Anze Slosar (anze)
Date: 2003-03-20 12:43

Message:
Logged In: YES 
user_id=447507

Crashes with python 2.2.2 as well, but seems to work under
Solaris.

Here's what gdb says:

(gdb) [anze@APPCH numbers]> gdb `which python2` core.1406
GNU gdb Red Hat Linux (5.2-2)
Copyright 2002 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public
License, and you are
welcome to change it and/or distribute copies of it under
certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show
warranty" for details.
This GDB was configured as "i386-redhat-linux"...
(no debugging symbols found)...
Core was generated by `python2 ./makeeq.py'.
Program terminated with signal 6, Aborted.
Reading symbols from /lib/libdl.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libdl.so.2
Reading symbols from /lib/i686/libpthread.so.0...
(no debugging symbols found)...done.
Loaded symbols for /lib/i686/libpthread.so.0
Reading symbols from /lib/libutil.so.1...(no debugging
symbols found)...done.
Loaded symbols for /lib/libutil.so.1
Reading symbols from /lib/i686/libm.so.6...(no debugging
symbols found)...done.
Loaded symbols for /lib/i686/libm.so.6
Reading symbols from /lib/i686/libc.so.6...(no debugging
symbols found)...done.
Loaded symbols for /lib/i686/libc.so.6
Reading symbols from /lib/ld-linux.so.2...(no debugging
symbols found)...done.
Loaded symbols for /lib/ld-linux.so.2
Reading symbols from
/usr/lib/python2.2/lib-dynload/structmodule.so...
(no debugging symbols found)...done.
Loaded symbols for
/usr/lib/python2.2/lib-dynload/structmodule.so
Reading symbols from
/usr/lib/python2.2/lib-dynload/_codecsmodule.so...
(no debugging symbols found)...done.
Loaded symbols for
/usr/lib/python2.2/lib-dynload/_codecsmodule.so
Reading symbols from
/usr/lib/python2.2/lib-dynload/mathmodule.so...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/python2.2/lib-dynload/mathmodule.so
Reading symbols from
/usr/lib/python2.2/lib-dynload/timemodule.so...
(no debugging symbols found)...done.
Loaded symbols for /usr/lib/python2.2/lib-dynload/timemodule.so
#0  0x42029331 in kill () from /lib/i686/libc.so.6
(gdb) w
Ambiguous command "w": watch, whatis, where, while,
while-stepping, ws.
(gdb) whe
#0  0x42029331 in kill () from /lib/i686/libc.so.6
#1  0x40030bdb in raise () from /lib/i686/libpthread.so.0
#2  0x4202a8c2 in abort () from /lib/i686/libc.so.6
#3  0x42022ecb in __assert_fail () from /lib/i686/libc.so.6
#4  0x080befeb in float_pow ()
#5  0x080af00f in ternary_op ()
#6  0x080af6fc in PyNumber_Power ()
#7  0x08077dda in eval_frame ()
#8  0x0807b49c in PyEval_EvalCodeEx ()
#9  0x0807c4fe in fast_function ()
#10 0x0807a367 in eval_frame ()
#11 0x0807b49c in PyEval_EvalCodeEx ()
#12 0x0807c4fe in fast_function ()
#13 0x0807a367 in eval_frame ()
#14 0x0807b49c in PyEval_EvalCodeEx ()
#15 0x08077491 in PyEval_EvalCode ()
#16 0x080970a1 in run_node ()
#17 0x08096176 in PyRun_SimpleFileExFlags ()
#18 0x08095b9f in PyRun_AnyFileExFlags ()
#19 0x08053c42 in Py_Main ()
#20 0x08053393 in main ()
#21 0x42017589 in __libc_start_main () from /lib/i686/libc.so.6
(gdb) 

----------------------------------------------------------------------

Comment By: Anze Slosar (anze)
Date: 2003-03-20 11:14

Message:
Logged In: YES 
user_id=447507

Operating system is RedHat 8.0 with custom 2.4.20 kernel. I
did the following:

 [anze@as280 anze]$ ldd `which python`
        libdl.so.2 => /lib/libdl.so.2 (0x4002d000)
        libpthread.so.0 => /lib/i686/libpthread.so.0 (0x40031000)
        libutil.so.1 => /lib/libutil.so.1 (0x40061000)
        libm.so.6 => /lib/i686/libm.so.6 (0x40064000)
        libc.so.6 => /lib/i686/libc.so.6 (0x42000000)
        /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
[anze@as280 anze]$ rpm -qf /lib/i686/libm.so.6
glibc-2.2.93-5
[anze@as280 anze]$ 

So it seems to me that libm is from glibc-2.2.93-5. Compiler
is stock redhat gcc-3.2, but I haven't compiled anything
myself...

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-03-17 22:06

Message:
Logged In: YES 
user_id=31435

Which operating system and C compiler?  Since the assert() 
is checking the errno result from your platform libm's pow() 
function, the resolution of this is going to depend on which C 
library you're using.

----------------------------------------------------------------------
