[Tutor] Re: Testing if a number occurs more than once [dict version / What's the difference between has_key() and 'in']

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Wed Dec 4 22:10:00 2002


> > After rereading some stuff about dictionaries, I think the line
> >
> > 	if number in numbersfound:
> >
> > probably should be
> >
> > 	if numbersfound.has_key(number):


Hi Scot,

Ooops!  I forgot to mention that both statements:

    if number in numbersfound:

and

    if numbersfound.has_key(number):

are functionally equivalent for dictionaries as of Python 2.2.  The hook
that lets 'in' work with arbitrary objects (and not just with lists and
strings) is older; I think it came in around Python 1.6:

    http://python.org/1.6.1/

If we look for the part about '__contains__', we'll see a short blurb
about it.  They're interchangeable for the most part; the 'in' version uses
special syntax, while 'has_key()' goes through the normal route that any
other Python method call takes.  Otherwise, there's no harm in using either
of them.
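
For anyone who wants to poke at this from the interpreter, here's a small
sketch of the hook.  The Bag class and its names are made up for
illustration, and the has_key() call is guarded because that method only
exists on Python 2 dictionaries.

```python
# Hypothetical example: Bag is an invented class, not part of Python.
class Bag:
    """A tiny container that supports 'in' through __contains__."""

    def __init__(self, items):
        self._counts = {}
        for item in items:
            self._counts[item] = self._counts.get(item, 0) + 1

    def __contains__(self, item):
        # 'item in bag' ends up here.
        return item in self._counts


numbersfound = {3: 1, 7: 2}

# The 'in' spelling and the explicit hook give the same answer.
via_in = 7 in numbersfound
via_hook = numbersfound.__contains__(7)

# has_key() only exists on Python 2 dictionaries, so guard the call.
if hasattr(numbersfound, "has_key"):
    via_has_key = numbersfound.has_key(7)
else:
    via_has_key = via_in

bag = Bag([1, 2, 2, 5])
print(via_in, via_hook, via_has_key, 2 in bag, 4 in bag)
```

The point is only that 'in' is routed through __contains__, whether the
object is a built-in dictionary or something we wrote ourselves.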




[The following is just for people who like reading Python's C
implementation for fun.]

But is there a difference between them?  If we're familiar with C, we
might like to delve in to see what really is so different between them.
If we look at the Python source code:

http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/python/python/dist/src/Objects/dictobject.c?rev=2.134&content-type=text/vnd.viewcvs-markup

we can see that has_key() and __contains__() have suspiciously similar
definitions, with one very slight difference: their return values are of
different types:

/******/
static PyObject *
dict_has_key(register dictobject *mp, PyObject *key)
{
	long hash;
	register long ok;
	if (!PyString_CheckExact(key) ||
	    (hash = ((PyStringObject *) key)->ob_shash) == -1) {
		hash = PyObject_Hash(key);
		if (hash == -1)
			return NULL;
	}
	ok = (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
	return PyBool_FromLong(ok);
}

static int
dict_contains(dictobject *mp, PyObject *key)
{
	long hash;

	if (!PyString_CheckExact(key) ||
	    (hash = ((PyStringObject *) key)->ob_shash) == -1) {
		hash = PyObject_Hash(key);
		if (hash == -1)
			return -1;
	}
	return (mp->ma_lookup)(mp, key, hash)->me_value != NULL;
}
/******/

In the first case, the implementation of the has_key() method returns a
Python object, a PyBool object (PyBool_FromLong hands back a reference to
the shared True or False object).  This isn't too much of a surprise, since
when we use the Python/C API, all of the methods of an object have
to return some kind of Python object.

The second case is a little different: the C-level protocol behind
__contains__ says that it must return a plain C int (1 for true, 0 for
false, -1 on error), so the low-level code doesn't have to go through the
effort of handing back full-blown Python objects.  So, when we use the
Python/C API layer, there are some shortcuts built into the system that
don't quite parallel what we do in normal Python code.
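
We can see a shadow of this from plain Python, too.  Here's a minimal
sketch with a made-up class called Sloppy: its __contains__ returns a
string rather than True or False.  The 'in' operator still gives back a
real boolean, while calling the method directly gives back whatever the
method returned.

```python
# Hypothetical example: Sloppy is an invented class for illustration.
class Sloppy:
    def __contains__(self, item):
        # Deliberately return a non-boolean value.
        if item == 42:
            return "yes"   # truthy
        return ""          # falsy


s = Sloppy()

direct_hit = s.__contains__(42)    # the raw return value: "yes"
operator_hit = 42 in s             # 'in' reduces it to True
operator_miss = 7 in s             # "" is falsy, so this is False

print(repr(direct_hit), operator_hit, operator_miss)
```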

The implementers could have just as easily defined 'in' in terms of
has_key(), but it looks like they took the effort to fine-tune
their implementations, taking advantage of the low-level hook for 'in'.
Even though there's a little duplication involved, the implementers
apparently felt it was worth it; we might suspect that the second version
goes a hairline faster than the first, since it skips the method lookup and
call and doesn't have to hand back a Python object.
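
If we'd rather measure than suspect, the timeit module can do a rough
comparison.  This is only a sketch: has_key() is gone from Python 3, so the
explicit numbersfound.__contains__(key) call stands in for "an ordinary
method call" here, and the dictionary size and repeat count are arbitrary
choices of mine.  The numbers will vary from machine to machine, so no
expected output is shown.

```python
import timeit

numbersfound = dict.fromkeys(range(1000), 1)
key = 500
REPEATS = 200000  # arbitrary; raise it for steadier numbers


def use_operator():
    return key in numbersfound


def use_method():
    # Stand-in for has_key(): a normal attribute lookup plus method call.
    return numbersfound.__contains__(key)


operator_time = timeit.timeit(use_operator, number=REPEATS)
method_time = timeit.timeit(use_method, number=REPEATS)

print("in operator : %.4f seconds" % operator_time)
print("method call : %.4f seconds" % method_time)
```

Whatever the gap turns out to be, it's small enough that readability is the
better reason to pick one spelling over the other.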