[Tutor] Clarification questions about how Python uses references.

Richard Damon Richard at Damon-Family.org
Fri Jun 25 20:59:44 EDT 2021


On 6/25/21 8:20 PM, boB Stepp wrote:
> On Fri, Jun 25, 2021 at 6:21 PM Dennis Lee Bieber <wlfraed at ix.netcom.com> wrote:
>> On Fri, 25 Jun 2021 14:48:00 -0500, boB Stepp <robertvstepp at gmail.com>
>> declaimed the following:
>>
>>> The Wikipedia entry on MD5 states, "The MD5 message-digest algorithm
>>> is a widely used *hash function* [my emphasis] producing a 128-bit
>>> hash value."  Looking up "hash function", the Wikipedia article on it
>>> states, "A hash function is any function that can be used to map data
>>> of arbitrary size to fixed-size values. The values returned by a hash
>>> function are called hash values, hash codes, digests, or simply
>>> hashes. The values are usually used to index a fixed-size table called
>>> a hash table."  This seems to fit in perfectly well with what Cameron
>>> stated and my usage above seems to be correct.  The "index" in this
>>> instance would be for the entire file that the MD5 value was computed
>>> for.  This may be (ignorant?) quibbling on my part, but it seems that
>>> we spend much of our time on these mailing lists trying to be
>>> uber-precise in our language usage.  I guess I am either falling into
>>> this trap or am engaging in a good thing?  ~(:>))
>>>
>>         But there is no /table/ being indexed by the MD5 hash! So how do you
>> locate the original file if given the MD5 hash? File systems that use
>> hashes use the file name, and don't hash the file contents (any edit of the
>> contents would invalidate the MD5 hash, and require regenerating the hash
>> value). The file name stays the same regardless of the edits to the file
>> itself, so its hash also stays the same.
> I see your point better now; however, the hash function definition
> above does say, "...The values are *usually* [my emphasis] used to
> index..."  I guess my (ignorant?) quibbling point is that MD5 is still
> by definition a hash function.  But this nit is not worth picking.
> Everything you say sounds eminently practical and sane!
>
> Cheers!
> boB Stepp

Note that there are TWO major, distinct uses for hash functions.

The first uses a hash function to ultimately get a fairly small
number, which serves as an index so that lookup of values in a
container is O(1) on average.
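As a rough illustration of that first use, here is a minimal sketch of a
chained hash table. TinyHashTable and its methods are hypothetical names
made up for this example; this is not how Python's dict is actually
implemented, only the general idea of turning a hash into a small index.

```python
class TinyHashTable:
    """Toy hash table with chaining (hypothetical, for illustration only)."""

    def __init__(self, n_buckets=8):
        # A fixed number of buckets; each bucket is a list of (key, value).
        self.buckets = [[] for _ in range(n_buckets)]

    def _index(self, key):
        # Reduce the (possibly huge) hash to a small bucket number.
        return hash(key) % len(self.buckets)

    def put(self, key, value):
        bucket = self.buckets[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))

    def get(self, key):
        # Only one bucket is searched, not the whole container.
        for k, v in self.buckets[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)


table = TinyHashTable()
table.put("spam", 1)
table.put("eggs", 2)
table.put("spam", 3)
print(table.get("spam"), table.get("eggs"))
```

Keys that land in the same bucket are told apart with ==, which is why
the hash itself does not have to be unique.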

The second uses cryptographically secure hashes to verify a file (this
is where MD5 is used). Hashes for this use generate BIG numbers.
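A small sketch of that second use, with the standard hashlib module. The
helper name md5_hex and the sample byte strings are made up for the
example. Note that MD5 is no longer considered collision-resistant, and
some restricted Python builds may not provide it, so treat this as an
illustration of the idea rather than a recommendation.

```python
import hashlib


def md5_hex(data: bytes) -> str:
    """Return the MD5 digest of data as a hex string (hypothetical helper)."""
    return hashlib.md5(data).hexdigest()


original = b"contents of some file\n"
edited = b"contents of some file!\n"

# Any edit to the contents gives a different digest.
print(md5_hex(original))
print(md5_hex(edited))

# The digest is 16 bytes, i.e. a 128-bit number.
print(hashlib.md5().digest_size * 8)
```

For a real file you would read it in binary mode and feed it to the hash
object in chunks with update() instead of loading it all at once.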

Python uses the first type of hash for sets and dictionaries, and for
that use you want a hash that is quick to compute. It doesn't need to
guarantee that two distinct objects will always have different hash
values, but you do want that to be the normally expected case, and you
may want it to be hard to deliberately generate values that collide, to
avoid Denial of Service attacks. The second type has very different
requirements: you don't want it to be practical for someone given a
hash value to create a different file that gives that value, and you
generally don't mind it taking a degree of effort to compute the hash.
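To see that the built-in hash() tolerates collisions, here is a short
sketch. It relies on a CPython implementation detail (hash(-1) is -2
because -1 is reserved internally as an error value), so other
interpreters may behave differently; the dictionary part works
everywhere.

```python
import sys

# CPython-specific assumption: these two distinct ints share a hash value.
print(hash(-1), hash(-2))

# The dictionary still keeps them apart, because keys are compared
# with == after the hash has narrowed down where to look.
d = {-1: "minus one", -2: "minus two"}
print(d[-1], d[-2])

# Width in bits of the built-in hash, far smaller than a 128-bit MD5 digest.
print(sys.hash_info.width)
```

On the Denial of Service point: str and bytes hashes are salted per
process by default in current Python 3 (see PYTHONHASHSEED), so
hash("abc") normally differs from one run to the next.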

One generic tool, two very different specific versions for different
applications.

-- 
Richard Damon


