[Tutor] Clarification questions about how Python uses references.

Cameron Simpson cs at cskk.id.au
Thu Jun 24 04:09:30 EDT 2021


On 23Jun2021 23:13, boB Stepp <robertvstepp at gmail.com> wrote:
>Hashes/hashing have not made it into my education yet.  I am hoping
>that this intro CSc book I am reading will eventually address this.
>Plus it seems that there are other contexts/meanings as in things like
>checksums using MD5 hashing or some such, et al.

They're related terms.

I give a poor introduction below:

A hash function takes a source value (eg a string such as you might use 
to index a dict) or, really, anything else, and computes a value, 
normally an integer. It is always the same for a given source value.

The usual objective is that the various source values are evenly 
distributed over some range.

In the case of a "hash table", this is what gives dicts and sets O(1) 
lookup times. When you store things in a hash table, the table has an 
array of "slots" where things get stored according to their key.
You compute a hash value which dicates the slot a particular item is 
stored in. Then to see if an item is present, you only need to to look 
at the items stores in that particular slot, not at every item in the 
table as a whole.

For maximum speed the number of slots is around the number of items 
stored, so that the number of items in a slot is around 1. Obviously 
that depends a bit on the source values, and this is why hash functions 
are chosen to produce evenly distributes values - to get a fairly even 
spread of slot usage regardless of the source values.

Imagine you're storing strings. You might compute a hash function by 
summing the ordinals of the characters in the string. Then take that 
modulo the number of slots in the hash table. Then your hash function 
computes an index into the table.

This function is actually not a great one, because string values are 
often short and often have very skewed character frequencies. But it 
illustrates the idea: you're computing a number from the source value 
(the string) - it is always the same for a given source value.

Suppose "abcdef" results in 6. The if you stored something keyed by 
"abcdef" it would be kept in slot 6, and to find thing thing keyed by 
"abcdef" you only need to look in slot 6.

Cheers,
Cameron Simpson <cs at cskk.id.au>


More information about the Tutor mailing list