Automatic debugging of copy by reference errors?

Carl Banks pavlovevidence at gmail.com
Mon Dec 11 00:48:17 EST 2006


Niels L Ellegaard wrote:
> Marc 'BlackJack' Rintsch wrote:
> > In <1165666093.996141.244760 at n67g2000cwd.googlegroups.com>, Niels L
> > Ellegaard wrote:
> > > I have been using scipy for some time now, but in the beginning I made
> > > a few mistakes with copying by reference.
> > But "copying by reference" is the way Python works.  Python never copies
> > objects unless you explicitly ask for it.  So what you want is a warning
> > for *every* assignment.
>
> Maybe I am on the wrong track here, but just to clarify myself:
>
> I wanted  a each object to know whether or not it was being referred to
> by a living object, and I wanted to warn the user whenever he tried to
> change an object that was being refered to by a living object.

This really wouldn't work, trust us.  Objects do not know who
references them, and are not notified when bound to a symbol or added
to a container.  However, I do think you're right about one thing: it
would be nice to have a tool that can catch errors of this sort, even
if it's imperfect (as it must be).

ISTM the big catch for Fortran programmers is when a mutable container
is referenced from multiple places; thus a change via one reference
will confusingly show up via the other one.  (This, of course, is
something that happens all the time in Python, but it's not something
people writing glue for numerical code need to do a lot.)  Instead of
the interpreter doing it automatically, it might be better (read:
possible) to use an explicit check.  Here is a simple, untested
example:


class DuplicateReference(Exception):
    pass

def _visit_container(var,cset):
    if id(var) in cset:
        raise DuplicateReference("container referenced twice")
    cset.add(id(var))

def assert_no_duplicate_container_refs(datalist,cset=None):
    if cset is None:
        cset = set()
    for var in data:
        if isinstance(var,(list,set)):
            _visit_container(var,cset)
            assert_no_duplicate_container_refs(var,cset)
        elif isinstance(var,dict):
            _visit_container(var,cset)
            assert_no_duplicate_container_refs(var.itervalues(),cset)


Then, at important points along the way, call this function with data
you provide yourself.  For example, we might modify your example as
follows:

import os
output=[]
firstlines =[0,0]
for filename in os.listdir('.'):
    try:
        firstlines[0] = open(filename,"r").readlines()[0]
        firstlines[1] = open(filename,"r").readlines()[1]
        output.append((filename,firstlines))
    except (IOError, IndexError):
        # please don't use bare except unless reraising exception
        continue
assert_no_duplicate_container_refs([output,firstlines])
print output


We passed the function output and firstlines because those are the two
names you defined yourself.  It'll check to see if any containers are
referenced more than once.  And it turns out there are--firstlines is
referenced many times.  It'll raise DuplicateReference on you.

There's plenty of room for improvement in the recipe, for sure.  It'll
catch real basic stuff, but it doesn't account for all containers, and
doesn't show you the loci of the duplicate references.

Now, to be honest, the biggest benefit of this check is it gives
newbies a chance to learn about references some way other than the hard
way.  It's not meant to catch a common mistake, so much as a
potentially very confusing one.  (It's really not a common mistake for
programmers who are aware of references.)  Once the lesson is learned,
this check should be put aside, as it unnecessarily restricts
legitimate use of referencing.


Carl Banks




More information about the Python-list mailing list