[Python-bugs-list] [ python-Bugs-625698 ] Errors with recursive objects

SourceForge.net noreply@sourceforge.net
Tue, 21 Jan 2003 19:09:52 -0800


Bugs item #625698, was opened at 2002-10-19 15:27
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=625698&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: Accepted
Priority: 5
Submitted By: Erik Andersén (erik_andersen)
Assigned to: Jeremy Hylton (jhylton)
Summary: Errors with recursive objects

Initial Comment:
List and dictionaries put Python into an infinite loop if they contain 
more than one reference to itself

>>> d = {}
>>> d[1] = 
d
>>> d == d  # OK Python can handle one 
recursion
1
>>> d[2] = d 
>>> d == d  # Crash with 
two

Lists are similar

>>> l=[]
>>> 
l.append(l)
>>> l==l  # OK
1
>>> l.append(l)
>>> 
l==l # Crash

>>> l>l  # Also crash

Tested with 
ActivePython 2.2.1 under Windows 98
and (the first part) 
Python 2.2.2 under Linux


----------------------------------------------------------------------

>Comment By: Tim Peters (tim_one)
Date: 2003-01-21 22:09

Message:
Logged In: YES 
user_id=31435

llit, you should study the current code before suggesting an 
algorithm.  The current code would be blazing fast if it were 
looking for recursion from the start too.  But the very 
process of building that dict with the ids of the objects as 
keys (which the current algorithm also does) costs more 
than most comparisons cost in real life.  That's why the 
current code doesn't even try to catch recursion until the 
comparison level exceeds 20, and that's where the sloth 
comes from.  Stick a gimmick like that in your algorithm, 
and you'll have similar speed problems.

Also note that id(a)  == id(b) doesn't imply a == b (see 
the other note about NaN for an example).

----------------------------------------------------------------------

Comment By: Till Plewe (llit)
Date: 2003-01-21 01:01

Message:
Logged In: YES 
user_id=540660

How about checking for equality by trying to build a
bisimulation
which "proves" equality. 

Time complexity is O(nodes(a) x nodes(b))  

SAMPLE PYTHON CODE
===============================
#! /usr/bin/env python

from types import *
SeqType=[ListType,DictType,TupleType]

def isequal(a,b):
    if id(a)==id(b): return True
    compare={(id(a),id(b)):(a,b)}
    bisimilar={}
    
    while compare:
        (ida,idb),(oba,obb)=compare.popitem()
        if ida==idb:
            continue
        elif type(oba)==type(obb):
            if type(oba) not in SeqType:
                if oba!=obb:
                    return False
                else:
                    continue
            elif len(oba)!=len(obb):
                return False
            elif (ida,idb) not in bisimilar:
                bisimilar[(ida,idb)]=1
                if type(oba)==DictType:
                    for x in oba: #oba and obb have the same
number of keys
                        if x not in obb:
                            return False
                        else:
                           
compare[(id(oba[x]),id(obb[x]))]=(oba[x],obb[x])
                else:
                    for x,y in zip(oba,obb):
                        compare[(id(x),id(y))]=(x,y)
        else:
            return False
    return bisimilar


----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-20 15:44

Message:
Logged In: YES 
user_id=31435

Cool.  He's right that they can take a horrendously long time.  
The 2-element list example takes c*2**20 units of time, for 
some suitable constant c, and if it were extended to 3 
elements, it would take c*3**20 units of time, thousands of 
times longer.

As Erik's latest example shows, the outcome isn't always 
particularly well defined either.  An alternative to speeding this 
silliness is to raise an exception instead when recursive 
objects are detected.  There was some hack value in doing 
the graph isomorphism bit, but no real practical value I can 
see.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-20 15:20

Message:
Logged In: YES 
user_id=33168

Sorry Tim, I was going to reply to one of your first msgs
and forgot.  Erik sent me mail which said there wasn't a
crash, just that "the statements take horrendously long time
to execute."

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-20 15:09

Message:
Logged In: YES 
user_id=31435

BTW, Erik, what did you mean by "crash"?  Nobody else has 
seen a crash here.  You started the report by saying "infinite 
loop", which isn't the same thing as a crash.  There isn't an 
infinite loop here either, although the algorithm can take 
astronomical amounts of time to finish (so may appear to be 
in an infinite loop).

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-20 14:50

Message:
Logged In: YES 
user_id=31435

Equailty for self-consistent recursive objects does a graph 
isomorphism computation.

In your self-inconsistent example, it so happens that False is 
returned, although True would be returned if the compile-time 
constant (in object.c) NESTING_LIMIT were odd instead of 
even.

But if you worry about this, you need to get out more <wink>.

----------------------------------------------------------------------

Comment By: Erik Andersén (erik_andersen)
Date: 2003-01-20 14:24

Message:
Logged In: YES 
user_id=364358

Excuse me for interupting the efficiency discussion, but what is the 
definition of equality for recursive containers.  I am particularly worrided 
about cases like 

>>> class A:
... 	def __eq__(self,other): 
return not self.u==other.u
... 	
>>> a=A()
>>> l=[a]
>>> 
a.u=l
>>> l==l # ????

If l==l then a!=a and l[0]!=l[0], so 
l!=l.
If l!=l then a==a and l[0]==l[0], so l==l !!!!!!!

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-20 11:12

Message:
Logged In: YES 
user_id=31435

I deleted my patch -- it didn't make sense.

Martin, AFAIK the docs say nothing about the relationship (if 
any) between "is" and "==".  It was deliberate intent that "is" 
not imply "==" for rich comparisons, though, in part so that 
IEEE-754 rules for NaN could be implemented by users.

>>> class NaN:
...     def __eq__(self, other): return False
...
>>> n = NaN()
>>> n is n
True
>>> n == n
False
>>>

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-20 03:20

Message:
Logged In: YES 
user_id=21627

Where does the documentation say that "is" may not imply ==?
The check for recursion assumes this implication, anyway: if
we meet the very same object, we infer that they are equal.

What is the purpose of clear_inprogress_dict in your patch?

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2003-01-19 23:28

Message:
Logged In: YES 
user_id=31435

The patch assumes "is" implies "==", which isn't necessarily 
so for rich comparisons.

The attached patch cuts the time about in half, via:

1. Getting rid of the per-thread inprogress dicts, in favor of 
a shared inprogress dict.

2. Restoring the intent of the code that tuples be exempted 
(tuples can't be recursive).  Rev 2.65 of tupleobject.c gave 
tuples a tp_as_mapping slot, breaking the intent of the 
code.

3. Changing the inprogress dict to map a tuple to the 
smallest value of compare_nesting at which the tuple was 
seen.  delete_token() was changed to leave the token alone 
if the current compare_nesting value is greater than that.  
This allows tuples to stay in the dict longer.

4. Cutting NESTING_LIMIT from 20 to 19.  This has a huge 
effect, because the algorithm is still basically exponential-
time.

The problem remaining is that the inprogress dict is still 
useless (and empty) so long as compare_nesting is less 
than NESTING_LIMIT.  This can leave an exponential 
amount of work to be done to fight back up to 
NESTING_LIMIT.

Leaving stuff in inprogress longer, and consulting inprogress 
regardless of the value of compare_nesting (provided that's 
gone over NESTING_LIMIT at least once) slashes the time 
to triviality.  However, it isn't safe:  we can't know that the 
addresses in the dict keys refer to the same objects then.  
We could if started storing into inprogress from the very 
start, but that would be a big speed hit for usual-case 
comparisons.  I'd be happy to knock NESTING_LIMIT down 
more, though.

Assigned to Jeremy, as IIRC this was his code originally.

----------------------------------------------------------------------

Comment By: Martin v. Löwis (loewis)
Date: 2003-01-19 12:30

Message:
Logged In: YES 
user_id=21627

The patch is fine, except that a comment needs to be added
explaining what this does and why it does that.

It should be understood that this can't fix the very similar
case

a = []
a.append(a)
a.append(a)

b = []
b.append(b)
b.append(b)

a==b

This would still take a long time, right?

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-19 11:05

Message:
Logged In: YES 
user_id=33168

Attached is a patch which fixes the problem for me.

----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-01-07 19:59

Message:
Logged In: YES 
user_id=33168

Erik, do you still have this problem with 2.2.2?  I can't
reproduce your problem under Linux.  mly can't reproduce the
problems with 2.2.1 on Windows (2000).  Do you only have the
problem on Windows?  

----------------------------------------------------------------------

Comment By: Magnus Lyckå (mly)
Date: 2002-10-26 19:12

Message:
Logged In: YES 
user_id=95217

I tried the above under 

Python 2.2.1 (#34, Sep 27 2002, 18:37:42) [MSC 32 bit
(Intel)] on win32
Type "help", "copyright", "credits" or "license" for more
information.
(ActiveState on Win 2000)

and 

Python 2.1.1 (#1, Aug 20 2001, 20:23:29)
[GCC 2.96 20000731 (Mandrake Linux 8.1 2.96-0.60mdk)] on
linux-i386

In both cases it works correctly. No crash.
It takes a lot of time though, Several seconds
on Duron 700 w/ Win2k, and tens of seconds 
on K6-233 w/ Linux.



----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=625698&group_id=5470