Best practice for caching hash
Cameron Simpson
cs at cskk.id.au
Tue Mar 15 17:32:37 EDT 2022
On 12Mar2022 21:45, Marco Sulla <Marco.Sulla.Python at gmail.com> wrote:
>I have a custom immutable object, and I added a cache for its hash
>value. The problem is the object can be composed of mutable or
>immutable objects, so the hash can raise TypeError.
Is it sensible to compute the hash only from the immutable parts?
Bearing in mind that usually you need an equality function as well and
it may have the same stability issues.
>In this case I currently cache the value -1. The subsequent calls to
>__hash__() will check if the value is -1. If so, a TypeError is
>immediately raised.
This will also make these values behave badly in dicts/sets, as they all
hash to the same bucket. The performance of a hash index is pretty
dependent on having roughly even distribution of objects amongst the
hash slots.
>The problem is the first time I get an error with details, for example:
>
>TypeError: unhashable type: 'list'
>
>The subsequent times I simply raise a generic error:
>
>TypeError
You could, you know, cache the original exception. That does keep links
to the traceback and therefore all the associates stack frames, so that
isn't cheap (it is peerfectly fast - just the reference, t just prevents
those things from being reclaimed).
>Ok, I can improve it by raising, for example, TypeError: not all
>values are hashable. But do you think this is acceptable? Now I'm
>thinking about it and it seems a little hacky to me.
That's a policy issue. "Acceptable" depends on the imagined use cases.
Im' just having a read of your intro:
https://github.com/Marco-Sulla/python-frozendict/blob/35611f4cd869383678104dc94f82aa636c20eb24/README.md
So your objective is that a frozendict can itself be hashable, allowing,
say, sets of frozendicts?
In that case I would be inclined to never raise TypeError at all. I'd
compute the hash entirely from the keys of the dict and compute equality
in the normal fashion: identical keys and then equal corresponding
values. That removes the requirement that values be immutable and/or
hashable.
It that workable?
Cheers,
Cameron Simpson <cs at cskk.id.au>
More information about the Python-list
mailing list