Re: Comparison of cyclic objects (was RE: [Python-Dev] trashcan and PR#7)

13 Apr 2000

      ...
...
...
...
...
"TP" == Tim Peters <tim_one@email.msn.com> writes:
TP> [Jeremy Hylton]>
...
...
So the real problem is defining some reasonable semantics for
comparison of recursive objects.
TP> I think this is exactly a graph isomorphism problem, since
  TP> Python always compares "by value" (so isomorphism is the natural
  TP> generalization).

I'm not familiar with any algorithms for the graph isomorphism
problem, but I took a stab at a simple comparison algorithm.  The idea
is to detect comparisons that would cross back-edges in the object
graphs.  Instead of starting a new comparison, assume they are the
same.  If, in fact, the objects are not the same, they must differ in
some other way; some other part of the comparison will fail.

  TP> This isn't hard (!= tedious, alas) to define or to implement
  TP> naively, but a straightforward implementation would be very
  TP> expensive at runtime compared to the status quo.  That's why
  TP> "real languages" <wink> would rather suffer an infinite loop.
  TP> It's expensive because there's no cheap way to know whether you
  TP> have a loop in an object.

My first attempt at implementing this is expensive.  I maintain a
dictionary that contains all the object pairs that are currently being 
compared.  Specifically, the dictionary is used to implement a set of
object id pairs.  Every call to PyObject_Compare will add a new
pair to the dictionary when it is called and remove it when it returns 
(except for a few trivial cases).

A naive patch is included below.  It does seem to involve a big
performance hit -- more than 10% slower on pystone.  It also uses a
lot of extra space.

Note that the patch has all its initialization code inline in
PyObject_Compare; moving that elsewhere will help a little.  It also
use a bunch of function calls where macros would be more efficient.

  TP> An anal compromise would be to run comparisons full speed
  TP> without trying to detect loops, but if the recursion got "too
  TP> deep" break out and start over with an expensive alternative
  TP> that does check for loops.  The later requires machinery similar
  TP> to copy.deepcopy's.

It looks like the anal compromise might be necessary.  I'll
re-implement the patch more carefully and see what the real
effect on performance is.

Jeremy

Index: object.c
===================================================================
RCS file: /projects/cvsroot/python/dist/src/Objects/object.c,v
retrieving revision 2.67
diff -r2.67 object.c
239c239
<                                    "__repr__ returned non-string (type %.200s)",
---
...
"__repr__ returned non-string (type %s)",
276c276
<                            "__str__ returned non-string (type %.200s)",
---
...
"__str__ returned non-string (type %s)",
300a301,328
...
static PyObject *cmp_state_key = NULL;
static PyObject*
cmp_state_make_pair(v, w)
      PyObject *v, *w;
{
      PyObject *pair = PyTuple_New(2);
      if (pair == NULL)
              return NULL;
      if ((long)v <= (long)w) {
              PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)v));
              PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)w));
      } else {
              PyTuple_SET_ITEM(pair, 0, PyInt_FromLong((long)w));
              PyTuple_SET_ITEM(pair, 1, PyInt_FromLong((long)v));
      }
      return pair;
}
void
cmp_state_clear_pair(dict, key)
      PyObject *dict, *key;
{
      PyDict_DelItem(dict, key);
      Py_DECREF(key);
}
305a334,336
...
PyObject *tstate_dict, *cmp_dict, *pair;
      int result;
311a343,376
...
tstate_dict = PyThreadState_GetDict();
      if (tstate_dict == NULL) {
              PyErr_BadInternalCall();
              return -1;
      } 
/*    fprintf(stderr, "PyObject_Compare(%X: %s, %X: %s)\n", (long)v, 
              v->ob_type->tp_name, (long)w, w->ob_type->tp_name);
*/
      /* XXX should initialize elsewhere */
      if (cmp_state_key == NULL) {
              cmp_state_key = PyString_InternFromString("compare_state"); 
              cmp_dict = PyDict_New();
              if (cmp_dict == NULL)
                      return -1;
              PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);
      } else {
              cmp_dict = PyDict_GetItem(tstate_dict, cmp_state_key);
              if (cmp_dict == NULL)
                      return NULL;
              PyDict_SetItem(tstate_dict, cmp_state_key, cmp_dict);
      }
pair = cmp_state_make_pair(v, w);
      if (pair == NULL) {
              PyErr_BadInternalCall();
              return -1;
      }
      if (PyDict_GetItem(cmp_dict, pair)) {
              /* already comparing these objects.  assume they're
                 equal until shown otherwise 
              */
              Py_DECREF(pair);
              return 0;
      }
316a382,384
...
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {
                      return -1;
              }
317a386
...
cmp_state_clear_pair(cmp_dict, pair);
329a399,401
...
if (PyDict_SetItem(cmp_dict, pair, pair) == -1) {
              return -1;
      }
344a417
...
cmp_state_clear_pair(cmp_dict, pair);
350,364c423,425
<               else if (PyUnicode_Check(v) || PyUnicode_Check(w)) {
<                       int result = PyUnicode_Compare(v, w);
<                       if (result == -1 && PyErr_Occurred() && 
<                           PyErr_ExceptionMatches(PyExc_TypeError))
<                               /* TypeErrors are ignored: if Unicode coercion
<                               fails due to one of the arguments not
<                               having the right type, we continue as
<                               defined by the coercion protocol (see
<                               above). Luckily, decoding errors are
<                               reported as ValueErrors and are not masked
<                               by this technique. */
<                               PyErr_Clear();
<                       else
<                               return result;
<               }
---
...
cmp_state_clear_pair(cmp_dict, pair);
              if (PyUnicode_Check(v) || PyUnicode_Check(w))
                      return PyUnicode_Compare(v, w);
372c433,434
<       if (vtp->tp_compare == NULL)
---
...
if (vtp->tp_compare == NULL) {
              cmp_state_clear_pair(cmp_dict, pair);
374c436,439
<       return (*vtp->tp_compare)(v, w);
---
...
}
      result = (*vtp->tp_compare)(v, w);
      cmp_state_clear_pair(cmp_dict, pair);
      return result;