[Tutor] Clarification questions about how Python uses references.
Cameron Simpson
cs at cskk.id.au
Thu Jun 24 04:09:30 EDT 2021
On 23Jun2021 23:13, boB Stepp <robertvstepp at gmail.com> wrote:
>Hashes/hashing have not made it into my education yet. I am hoping
>that this intro CSc book I am reading will eventually address this.
>Plus it seems that there are other contexts/meanings as in things like
>checksums using MD5 hashing or some such, et al.
They're related terms.
I give a poor introduction below:
A hash function takes a source value (eg a string such as you might use
to index a dict) or, really, anything else, and computes a value,
normally an integer. It is always the same for a given source value.
The usual objective is that the various source values are evenly
distributed over some range.
In the case of a "hash table", this is what gives dicts and sets O(1)
lookup times. When you store things in a hash table, the table has an
array of "slots" where things get stored according to their key.
You compute a hash value which dicates the slot a particular item is
stored in. Then to see if an item is present, you only need to to look
at the items stores in that particular slot, not at every item in the
table as a whole.
For maximum speed the number of slots is around the number of items
stored, so that the number of items in a slot is around 1. Obviously
that depends a bit on the source values, and this is why hash functions
are chosen to produce evenly distributes values - to get a fairly even
spread of slot usage regardless of the source values.
Imagine you're storing strings. You might compute a hash function by
summing the ordinals of the characters in the string. Then take that
modulo the number of slots in the hash table. Then your hash function
computes an index into the table.
This function is actually not a great one, because string values are
often short and often have very skewed character frequencies. But it
illustrates the idea: you're computing a number from the source value
(the string) - it is always the same for a given source value.
Suppose "abcdef" results in 6. The if you stored something keyed by
"abcdef" it would be kept in slot 6, and to find thing thing keyed by
"abcdef" you only need to look in slot 6.
Cheers,
Cameron Simpson <cs at cskk.id.au>
More information about the Tutor
mailing list