"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> [Jeremy Hylton]>
So the real problem is defining some reasonable semantics for comparison of recursive objects.
TP> I think this is exactly a graph isomorphism problem, since TP> Python always compares "by value" (so isomorphism is the natural TP> generalization). I'm not familiar with any algorithms for the graph isomorphism problem, but I took a stab at a simple comparison algorithm. The idea is to detect comparisons that would cross back-edges in the object graphs. Instead of starting a new comparison, assume they are the same. If, in fact, the objects are not the same, they must differ in some other way; some other part of the comparison will fail. TP> This isn't hard (!= tedious, alas) to define or to implement TP> naively, but a straightforward implementation would be very TP> expensive at runtime compared to the status quo. That's why TP> "real languages" <wink> would rather suffer an infinite loop. TP> It's expensive because there's no cheap way to know whether you TP> have a loop in an object. My first attempt at implementing this is expensive. I maintain a dictionary that contains all the object pairs that are currently being compared. Specifically, the dictionary is used to implement a set of object id pairs. Every call to PyObject_Compare will add a new pair to the dictionary when it is called and remove it when it returns (except for a few trivial cases). A naive patch is included below. It does seem to involve a big performance hit -- more than 10% slower on pystone. It also uses a lot of extra space. Note that the patch has all its initialization code inline in PyObject_Compare; moving that elsewhere will help a little. It also use a bunch of function calls where macros would be more efficient. TP> An anal compromise would be to run comparisons full speed TP> without trying to detect loops, but if the recursion got "too TP> deep" break out and start over with an expensive alternative TP> that does check for loops. The later requires machinery similar TP> to copy.deepcopy's. It looks like the anal compromise might be necessary. I'll re-implement the patch more carefully and see what the real effect on performance is. Jeremy Index: object.c =================================================================== RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v retrieving revision 2.67 diff -r2.67 object.c 239c239 < "__repr__ returned non-string (type %.200s)", ---
"__repr__ returned non-string (type %s)",
276c276 < "__str__ returned non-string (type %.200s)", ---
"__str__ returned non-string (type %s)",
300a301,328
static PyObject *cmp_state_key = NULL;
static PyObject* cmp_state_make_pair(v, w) PyObject *v, *w; { PyObject *pair = PyTuple_New(2); if (pair == NULL) return NULL; if ((long)v <= (long)w) { PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v)); PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w)); } else { PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w)); PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v)); } return pair; }
void cmp_state_clear_pair(dict, key) PyObject *dict, *key; { PyDict_DelItem(dict, key); Py_DECREF(key); }
305a334,336
PyObject *tstate_dict, *cmp_dict, *pair; int result;
311a343,376
tstate_dict = PyThreadState_GetDict(); if (tstate_dict == NULL) { PyErr_BadInternalCall(); return -1; } /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, v->ob_type->tp_name, (long)w, w->ob_type->tp_name); */ /* XXX should initialize elsewhere */ if (cmp_state_key == NULL) { cmp_state_key = PyString_InternFromString("compare_state"); cmp_dict = PyDict_New(); if (cmp_dict == NULL) return -1; PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); } else { cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key); if (cmp_dict == NULL) return NULL; PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict); }
pair = cmp_state_make_pair(v, w); if (pair == NULL) { PyErr_BadInternalCall(); return -1; } if (PyDict_GetItem(cmp_dict, pair)) { /* already comparing these objects. assume they're equal until shown otherwise */ Py_DECREF(pair); return 0; }
316a382,384
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { return -1; }
317a386
cmp_state_clear_pair(cmp_dict, pair);
329a399,401
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) { return -1; }
344a417
cmp_state_clear_pair(cmp_dict, pair);
350,364c423,425 < else if (PyUnicode_Check(v) || PyUnicode_Check(w)) { < int result = PyUnicode_Compare(v, w); < if (result == -1 && PyErr_Occurred() && < PyErr_ExceptionMatches(PyExc_TypeError)) < /* TypeErrors are ignored: if Unicode coercion < fails due to one of the arguments not < having the right type, we continue as < defined by the coercion protocol (see < above). Luckily, decoding errors are < reported as ValueErrors and are not masked < by this technique. */ < PyErr_Clear(); < else < return result; < } ---
cmp_state_clear_pair(cmp_dict, pair); if (PyUnicode_Check(v) || PyUnicode_Check(w)) return PyUnicode_Compare(v, w);
372c433,434 < if (vtp->tp_compare == NULL) ---
if (vtp->tp_compare == NULL) { cmp_state_clear_pair(cmp_dict, pair);
374c436,439 < return (*vtp->tp_compare)(v, w); ---
} result = (*vtp->tp_compare)(v, w); cmp_state_clear_pair(cmp_dict, pair); return result;