Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7)
Jeremy Hylton
jeremy@cnri.reston.va.us
Thu, 13 Apr 2000 15:08:12 -0400 (EDT)
>>>>> "TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> [Jeremy Hylton]>
>> So the real problem is defining some reasonable semantics for
>> comparison of recursive objects.
TP> I think this is exactly a graph isomorphism problem, since
TP> Python always compares "by value" (so isomorphism is the natural
TP> generalization).
I'm not familiar with any algorithms for the graph isomorphism
problem, but I took a stab at a simple comparison algorithm. The idea
is to detect comparisons that would cross back-edges in the object
graphs. Instead of starting a new comparison, assume they are the
same. If, in fact, the objects are not the same, they must differ in
some other way; some other part of the comparison will fail.
TP> This isn't hard (!= tedious, alas) to define or to implement
TP> naively, but a straightforward implementation would be very
TP> expensive at runtime compared to the status quo. That's why
TP> "real languages" <wink> would rather suffer an infinite loop.
TP> It's expensive because there's no cheap way to know whether you
TP> have a loop in an object.
My first attempt at implementing this is expensive. I maintain a
dictionary that contains all the object pairs that are currently being
compared. Specifically, the dictionary is used to implement a set of
object id pairs. Every call to PyObject_Compare will add a new
pair to the dictionary when it is called and remove it when it returns
(except for a few trivial cases).
A naive patch is included below. It does seem to involve a big
performance hit -- more than 10% slower on pystone. It also uses a
lot of extra space.
Note that the patch has all its initialization code inline in
PyObject_Compare; moving that elsewhere will help a little. It also
use a bunch of function calls where macros would be more efficient.
TP> An anal compromise would be to run comparisons full speed
TP> without trying to detect loops, but if the recursion got "too
TP> deep" break out and start over with an expensive alternative
TP> that does check for loops. The later requires machinery similar
TP> to copy.deepcopy's.
It looks like the anal compromise might be necessary. I'll
re-implement the patch more carefully and see what the real
effect on performance is.
Jeremy
Index: object.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v
retrieving revision 2.67
diff -r2.67 object.c
239c239
< "__repr__ returned non-string (type %.200s)",
---
> "__repr__ returned non-string (type %s)",
276c276
< "__str__ returned non-string (type %.200s)",
---
> "__str__ returned non-string (type %s)",
300a301,328
> static PyObject *cmp_state_key = NULL;
>
> static PyObject*
> cmp_state_make_pair(v, w)
> PyObject *v, *w;
> {
> PyObject *pair = PyTuple_New(2);
> if (pair == NULL)
> return NULL;
> if ((long)v <= (long)w) {
> PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v));
> PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w));
> } else {
> PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w));
> PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v));
> }
> return pair;
> }
>
> void
> cmp_state_clear_pair(dict, key)
> PyObject *dict, *key;
> {
> PyDict_DelItem(dict, key);
> Py_DECREF(key);
> }
>
>
305a334,336
> PyObject *tstate_dict, *cmp_dict, *pair;
> int result;
>
311a343,376
> tstate_dict = PyThreadState_GetDict();
> if (tstate_dict == NULL) {
> PyErr_BadInternalCall();
> return -1;
> }
> /* fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v,
> v->ob_type->tp_name, (long)w, w->ob_type->tp_name);
> */
> /* XXX should initialize elsewhere */
> if (cmp_state_key == NULL) {
> cmp_state_key = PyString_InternFromString("compare_state");
> cmp_dict = PyDict_New();
> if (cmp_dict == NULL)
> return -1;
> PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);
> } else {
> cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key);
> if (cmp_dict == NULL)
> return NULL;
> PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);
> }
>
> pair = cmp_state_make_pair(v, w);
> if (pair == NULL) {
> PyErr_BadInternalCall();
> return -1;
> }
> if (PyDict_GetItem(cmp_dict, pair)) {
> /* already comparing these objects. assume they're
> equal until shown otherwise
> */
> Py_DECREF(pair);
> return 0;
> }
316a382,384
> if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {
> return -1;
> }
317a386
> cmp_state_clear_pair(cmp_dict, pair);
329a399,401
> if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {
> return -1;
> }
344a417
> cmp_state_clear_pair(cmp_dict, pair);
350,364c423,425
< else if (PyUnicode_Check(v) || PyUnicode_Check(w)) {
< int result = PyUnicode_Compare(v, w);
< if (result == -1 && PyErr_Occurred() &&
< PyErr_ExceptionMatches(PyExc_TypeError))
< /* TypeErrors are ignored: if Unicode coercion
< fails due to one of the arguments not
< having the right type, we continue as
< defined by the coercion protocol (see
< above). Luckily, decoding errors are
< reported as ValueErrors and are not masked
< by this technique. */
< PyErr_Clear();
< else
< return result;
< }
---
> cmp_state_clear_pair(cmp_dict, pair);
> if (PyUnicode_Check(v) || PyUnicode_Check(w))
> return PyUnicode_Compare(v, w);
372c433,434
< if (vtp->tp_compare == NULL)
---
> if (vtp->tp_compare == NULL) {
> cmp_state_clear_pair(cmp_dict, pair);
374c436,439
< return (*vtp->tp_compare)(v, w);
---
> }
> result = (*vtp->tp_compare)(v, w);
> cmp_state_clear_pair(cmp_dict, pair);
> return result;