[Tutor] hash table

Danny Yoo dyoo@hkn.eecs.berkeley.edu
Fri, 21 Dec 2001 22:14:20 -0800 (PST)

On Fri, 21 Dec 2001, Scott wrote:
> Hmm.  I was thinking that a hashtable was like a dictionary, except
> that it was more of an index "decoupled" from the data items.  But
> since, in Python, variables don't designate an actual memory address,
> but instead reference data located *somewhere* in memory, this becomes
> a distinction without a difference, as they say.  Do I have that about
> right?

A dictionary is a container that makes it very quick to look up particular
things.  The "quick lookup" part is the big reason why we have

Let's pretend, for the moment, that we didn't have dictionaries at all:
Would we still be able to do without them?  Let's try it:

defns = [('slang', 'informal language'),
         ('jargon', 'denotes "slangy" language peculiar to hackers'),
         ('techspeak', 'formal technical vocabulary of programming')]

Here is a list of terms and their definitions.  We could have done this
with a dictionary:

defns_as_a_dict = { 'slang' : 'informal language',
                    'jargon' : 'denotes "slangy" ...'
                    'techspeak' : 'formal technical vocabulary' }

but let's keep pretending for a moment that we don't know about
dictionaries yet.  There's one main thing we really need to do with
dictionaries: we need to know how to look things up with them, if we have
some sort of keyword.

How can we look up things in this 'defns' list?

def lookup(target, key_value_list):
    for key, value in key_value_list:
        if key == target:
            return value
    raise KeyError, "Can't find %s in key_value_list." % target

(Don't worry about the last line if you haven't seen exception handling
yet.)  Now that we have this definition, let's try it out, just to make
sure it sorta works:

>>> lookup('slang', defns)                                                      
'informal language'                                                             
>>> lookup('foobar', defns)
Traceback (innermost last):
  File "<stdin>", line 1, in ?
  File "<stdin>", line 5, in lookup
KeyError: Can't find foobar in key_value_list.

So looking things up works.  And it's not too hard to cook up something
that can add an entry in this structure.  (I have the overwhelming
temptation to say "Exercise for the Reader".  *grin*)

So on a first glance, we don't need dictionaries to record keys and their
values.  It's only after we've played around with these key-value-pair
lists a bit that we notice something: it takes quite a long time to find
anything in there!  In an average sense, when we lookup() through a
key-value-pair list, we have to march through half of the elements to find
anything.  Very very slow.

One strategy we can use to make looking up things faster is to organize
and make sense out of the chaos.  A language dictionary, for example,
organizes things by alphabetical order.  Many English dictionaries, in
fact, make section divisions based on the first letter of a word: there's
the section for words starting with 'A', words starting with 'B', and so
on.  This makes looking up definitions a bit easier, since we have less to
search through.

By putting words in different sections, we can put reasonable limits on
where we can find something.  In normal language, dictionaries have
sections that differ based on their first letter.  In techspeak, Python
dictionaries have "bins" that differ based on their "hash" value.

> Hmm.  I was thinking that a hashtable was like a dictionary, except
> that it was more of an index "decoupled" from the data items.  But

Yes, the index 'keys' don't have to be directly coupled with the data item
'values'.  But the same idea goes with lists:

l1 = [0, 1, 2, 3, 4]
l2 = ['four', 'three', 'two', 'one', 'zero']

In the first list, it does appear that there's a relationship between
l1[0] and 0 itself, but that's just a coincidence, since in l2, if we grab
at l[1], we get the string 'three'.

I believe that the term "hashtable" and "dictionary" is synonymous; I
can't think of anything that distinguishes one from the other.

> I'm really enjoying tinkering with Python, but sometimes these little
> distinctions get me downright confused!

Keep asking questions on tutor; we'll try to make sure it gets less
confusing over time.  *grin*