
While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher? -- Vladimir MARANGOZOV | Vladimir.Marangozov@inrialpes.fr http://sirac.inrialpes.fr/~marangoz | tel:(+33-4)76615277 fax:76615252

Vladimir Marangozov wrote:
While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher?
Hey, what a good idea. You know what's happening? We are moving towards tail recursion. If we do this everywhere, Python converges towards Stackless Python. and-most-probably-a-better-one-than-mince - ly y'rs - chris -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com

"CT" == Christian Tismer <tismer@tismer.com> writes:
CT> Vladimir Marangozov wrote:
While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher?
CT> Hey, what a good idea. CT> You know what's happening? We are moving towards tail recursion. CT> If we do this everywhere, Python converges towards Stackless CT> Python. It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one. I have tried to come up with a solution in the same style as the repr solution. repr maintains a list of objects currently being repred. If it encounters a recursive request to repr the same object, it just prints "...". (There are better solutions, but this one is fairly simple.) I always get hung up on a cmp that works this way because at some point you discover a recursive cmp of two objects and you need to decide what to do. You can't just print "..." :-). So the real problem is defining some reasonable semantics for comparison of recursive objects. I checked what Scheme and Common Lisp, thinking that these languages must have dealt with the issue before. The answer, at least in Scheme, is infinite loop. R5RS notes: "'Equal?' may fail to terminate if its arguments are circular data structures. " http://www-swiss.ai.mit.edu/~jaffer/r5rs_8.html#SEC49 For eq? and eqv?, the answer is #f. The issue was also discussed in some detail by the ANSI commitee X3J13. A summary of the discussion is at here: http://www.xanalys.com/software_tools/reference/HyperSpec/Issues/iss143-writ... The result was to "Clarify that EQUAL and EQUALP do not descend any structures or data types other than the ones explicitly specified here:" [both descend for cons, bit-vectors, and strings; equalp has some special rules for hashtables and arrays] I believe this means that Common Lisp behaves the same way that Scheme does: comparison of circular data structures does not terminate. I don't think an infinite loop is any better than a core dump. At least with the core dump, you can inspect the core file and figure out what went wrong. In the infinite loop case, you'd wonder for a while why your program doesn't terminate, then kill it and inspect the core file anway :-). I think the comparison ought to return false or raise a ValueError. I'm not sure which is right. It seems odd to me that comparing two builtin lists could ever raise an exception, but it may be more Pythonic to raise an exception in the face of ambiguity. As the X3J13 committee noted: Object equality is not a concept for which there is a uniquely determined correct algorithm. The appropriateness of an equality predicate can be judged only in the context of the needs of some particular program. So, in the end, I propose ValueError. Jeremy

Jeremy Hylton wrote:
"CT" == Christian Tismer <tismer@tismer.com> writes:
CT> Vladimir Marangozov wrote:
While I'm at it, maybe the same recursion control logic could be used to remedy (most probably in PyObject_Compare) PR#7: "comparisons of recursive objects" reported by David Asher?
CT> Hey, what a good idea.
CT> You know what's happening? We are moving towards tail recursion. CT> If we do this everywhere, Python converges towards Stackless CT> Python.
It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one.
Well, I actually didn't read PR#7 before replying. Thought it was about comparing deeply nested structures. What about this? For one, we do an improved comparison, which is of course towards tail recursion, since we push part of the work after the "return". Second, we can guess the number of actually existing objects, and limit the number of comparisons by this. If we need more comparisons than we have objects, then we raise an exception. Might still take some time, but a bit less than infinite. ciao - chris (sub-cantor-set-minded) -- Christian Tismer :^) <mailto:tismer@appliedbiometrics.com> Applied Biometrics GmbH : Have a break! Take a ride on Python's Kaunstr. 26 : *Starship* http://starship.python.net 14163 Berlin : PGP key -> http://wwwkeys.pgp.net PGP Fingerprint E182 71C7 1A9D 66E9 9D15 D3CC D4D7 93E2 1FAE F6DF where do you want to jump today? http://www.stackless.com

Just after I sent the previous message, I realized that the "trashcan" approach is needed in addition to some application-specific logic for what to do when recursive traversals of objects occur. This is true for repr and for a compare that fixes PR#7. Current recipe for repr coredump: original = l = [] for i in range(1000000): new = [] l.append(new) l = new l.append(original) repr(l) Jeremy

[Jeremy Hylton]>
It doesn't seem like tail-recursion is the issue, rather we need to define some rules about when to end the recursion. If I understand what is being suggest, it is to create a worklist of subobjects to compare instead of making recursive calls to compare. This change would turn the core dump into an infinite loop; I guess that's an improvement, but not much of one.
...
So the real problem is defining some reasonable semantics for comparison of recursive objects.
I think this is exactly a graph isomorphism problem, since Python always compares "by value" (so isomorphism is the natural generalization). This isn't hard (!= tedious, alas) to define or to implement naively, but a straightforward implementation would be very expensive at runtime compared to the status quo. That's why "real languages" <wink> would rather suffer an infinite loop. It's expensive because there's no cheap way to know whether you have a loop in an object. An anal compromise would be to run comparisons full speed without trying to detect loops, but if the recursion got "too deep" break out and start over with an expensive alternative that does check for loops. The later requires machinery similar to copy.deepcopy's.
... I think the comparison ought to return false or raise a ValueError.
After a = [] b = [] a.append(a) b.append(b) it certainly "ought to be" the case that a == b in Python. "false" makes no sense. ValueError makes no sense either unless we incur the expense of proving first that at least one object does contain a loop (as opposed to that it's just possibly nested thousands of levels deep) -- but then we may as well implement an isomorphism discriminator.
I'm not sure which is right. It seems odd to me that comparing two builtin lists could ever raise an exception, but it may be more Pythonic to raise an exception in the face of ambiguity. As the X3J13 committee noted:
Lisps have more notions of "equality" than Python 1.6 has flavors of strings <wink>. Python has only one notion of equality (conveniently ignoring that it actually has two <wink>). The thing the Lisp people argue about is which of the three or four notions of equality to apply at varying levels when trying to compute one of their *other* notions of equality -- there *can't* be a universally "good" answer to that mess. Python's life is easier here. in-concept-if-not-in-implementation-ly y'rs - tim

"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> [Jeremy Hylton]>
So the real problem is defining some reasonable semantics for comparison of recursive objects.
TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization). I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" <wink> would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object. My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases). A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also use a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's. It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is. Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", ---
"__repr__ returned non-string (type %s)",
276c276 < "__str__ returned non-string (type %.200s)", ---
"__str__ returned non-string (type %s)",
300a301,328
static PyObject *cmp_state_key = NULL;
static PyObject* cmp_state_make_pair(v, w) PyObject *v, *w; { PyObject *pair = PyTuple_New(2); if (pair == NULL) return NULL; if ((long)v <= (long)w) { PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); } else { PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); } return pair; }
void cmp_state_clear_pair(dict, key) PyObject *dict, *key; { PyDict_DelItem(dict, key); Py_DECREF(key); }
305a334,336
PyObject *tstate_dict, *cmp_dict, *pair; int result;
311a343,376
tstate_dict = PyThreadState_GetDict(); if (tstate_dict == NULL) { PyErr_BadInternalCall(); return -1; } /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, v->ob_type->tp_name, (long)w, w->ob_type->tp_name); */ /* XXX should initialize elsewhere */ if (cmp_state_key == NULL) { cmp_state_key = PyString_InternFromString("compare_state"); cmp_dict = PyDict_New(); if (cmp_dict == NULL) return -1; PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); } else { cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); if (cmp_dict == NULL) return NULL; PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); }
pair = cmp_state_make_pair(v, w); if (pair == NULL) { PyErr_BadInternalCall(); return -1; } if (PyDict_GetItem(cmp_dict, pair)) { /* already comparing these objects. assume they're equal until shown otherwise */ Py_DECREF(pair); return 0; }
316a382,384
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { return -1; }
317a386
cmp_state_clear_pair(cmp_dict, pair);
329a399,401
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { return -1; }
344a417
cmp_state_clear_pair(cmp_dict, pair);
350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } ---
cmp_state_clear_pair(cmp_dict, pair); if (PyUnicode_Check(v) || PyUnicode_Check(w)) return PyUnicode_Compare(v, w);
372c433,434 < if (vtp->tp_compare == NULL) ---
if (vtp->tp_compare == NULL) { cmp_state_clear_pair(cmp_dict, pair);
374c436,439 < return (*vtp->tp_compare)(v, w); ---
} result = (*vtp->tp_compare)(v, w); cmp_state_clear_pair(cmp_dict, pair); return result;

On Thu, 13 Apr 2000, Jeremy Hylton wrote:
"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> [Jeremy Hylton]>
So the real problem is defining some reasonable semantics for comparison of recursive objects.
There is a "right" way to do this, i believe, and my friend Mark Miller implemented it in E. He tells me his algorithm is inspired by the method for unification of cyclic structures in Prolog III. It's available in the E source code (in the file elib/tables/Equalizer.java). See interesting stuff on equality and cyclic data structures at http://www.erights.org/javadoc/org/erights/e/elib/tables/Equalizer.html http://www.erights.org/elang/same-ref.html http://www.erights.org/elang/blocks/defVar.html http://www.eros-os.org/~majordomo/e-lang/0698.html There is also a thread about equality issues in general at: http://www.eros-os.org/~majordomo/e-lang/0000.html It's long, but worth perusing. Here is my rough Python translation of the code in the E Equalizer. Python 1.4 (Mar 25 2000) [C] Copyright 1991-1997 Stichting Mathematisch Centrum, Amsterdam Python Console v1.4 by Ka-Ping Yee <ping@lfw.org> >>> def same(left, right, sofar={}): ... hypothesis = (id(left), id(right)) ... if left is right or sofar.has_key(hypothesis): return 1 ... if type(left) is not type(right): return 0 ... if type(left) is type({}): ... left, right = left.items(), right.items() ... if type(left) is type([]): ... sofar[hypothesis] = 1 ... try: ... for i in range(len(left)): ... if not same(left[i], right[i], sofar): return 0 ... return 1 ... finally: ... del sofar[hypothesis] ... return left == right ... ... >>> same([3],[4]) 0 >>> same([3],[3]) 1 >>> a = [1,2,3] >>> b = [1,2,3] >>> c = [1,2,3] >>> same(a,b) 1 >>> a[1] = a >>> same(a,a) 1 >>> same(a,b) 0 >>> b[1] = b >>> same(a,b) 1 >>> b[1] = c >>> b [1, [1, 2, 3], 3] >>> same(a,b) 0 >>> c[1] = b >>> same(a,b) 1 >>> same(b,c) 1 >>> I would like to see Python's comparisons work this way (i.e. "correct" as opposed to "we give up"). -- ?!ng

As a reference, here is the corresponding cyclic-structure-comparison example from a message about E: ? define tight := [1, tight, "x"] # value: [1, ***CYCLE***, x] ? define loose := [1, [1, loose, "x"], "x"] # value: [1, ***CYCLE***, x] ? tight == loose # value: true ? def map := [tight => "foo"] # value: [[1, ***CYCLE***, x] => foo] ? map[loose] # value: foo Internally, tight and loose have very different representations. However, when both cycles are unwound, they represent the same infinite tree. One could say that tight's representation of this tree is more tightly wound than loose's representation. However, this difference is only in the implementation, not in the semantics. The value of tight and loose is only the infinite tree they represent. If these trees are the same, then tight and loose are ==. Notice that loose prints out according to the tightest winding of the tree it represents, not according to the cycle by which it represents this tree. Only the tightest winding is finite and canonical. -- ?!ng

Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important.
tight = [1, None, "x"] tight[1] = tight tight [1, [...], 'x'] loose = [1, [1, None, "x"], "x"] loose[1][1] = loose loose [1, [1, [...], 'x'], 'x'] tight [1, [...], 'x'] tight == loose 1
Jeremy

On Thu, 13 Apr 2000, Jeremy Hylton wrote:
Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important.
Very, very cool. Well done. Say, when did printing get fixed?
tight = [1, None, "x"] tight[1] = tight tight [1, [...], 'x']
-- ?!ng

"KPY" == Ka-Ping Yee <ping@lfw.org> writes:
KPY> On Thu, 13 Apr 2000, Jeremy Hylton wrote:
Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important.
KPY> Very, very cool. Well done. Say, when did printing get fixed? Looks like the repr checkin was pre-1.5.1. I glanced at the sameness code in E, and it looks like it is doing exactly the same thing. It keeps a mapping of comparisons seen sofar and returns true for them. It seems that E's types don't define their own methods for sameness, though. The same methods seem to understand the internals of the various E types. Or is it just a few special ones. Jeremy

On Thu, 13 Apr 2000, Jeremy Hylton wrote:
Looks like the proposed changed to PyObject_Compare matches E for your example. The printed representation doesn't match, but I'm not sure that is as important.
tight = [1, None, "x"] tight[1] = tight tight [1, [...], 'x'] loose = [1, [1, None, "x"], "x"] loose[1][1] = loose loose [1, [1, [...], 'x'], 'x'] tight [1, [...], 'x'] tight == loose 1
Actually, i thought about this a little more and realized that the above *is* exactly the correct behaviour. In E, [] makes an immutable list. To make it mutable you then have to "flex" it. A mutable empty list is written "[] flex" (juxtaposition means a method call). In the above, the identities of the inner and outer lists of "loose" are different, and so should be printed separately. They are equal but not identical: >>> loose == loose[1] 1 >>> loose is loose[1] 0 >>> loose is loose[1][1] 1 >>> loose.append(4) >>> loose [1, [1, [...], 'x'], 'x', 4] -- ?!ng "In the sciences, we are now uniquely privileged to sit side by side with the giants on whose shoulders we stand." -- Gerald Holton

[Jeremy Hylton]
I'm not familiar with any algorithms for the graph isomorphism problem,
Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children).
but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail.
Bingo! That's the key trick.

On Thu, 13 Apr 2000, Tim Peters wrote:
Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children).
Ordered? What about dictionaries? -- Moshe Zadka <mzadka@geocities.com>. http://www.oreilly.com/news/prescod_0300.html http://www.linux.org.il -- we put the penguin in .com

[Tim]
Well, while an instance of graph isomorphism, this one is a relatively simple special case (because "the graphs" here are rooted, directed, and have ordered children).
[Moshe Zadka]
Ordered? What about dictionaries?
An ordering of a dict's kids is forced in the context of comparison (see dict_compare in dictobject.c).
participants (6)
-
Christian Tismer
-
Jeremy Hylton
-
Ka-Ping Yee
-
Moshe Zadka
-
Tim Peters
-
Vladimir.Marangozov@inrialpes.fr