[Python-Dev] Dataclasses and correct hashability

Eric V. Smith eric at trueblade.com
Fri Feb 2 10:08:43 EST 2018


On 2/2/2018 12:33 AM, Nick Coghlan wrote:
> For 3.7, I think we should seriously consider just straight up
> disallowing the "hash=True, frozen=False" combination, and instead
> require folks to provide their own hash function in that case.
> "Accidentally hashable" (whether by identity or field hash) isn't a
> thing that data classes should be allowing to happen.
> 
> If we did that, then the public "hash" parameter could potentially be
> dropped entirely for the time being - the replacement for "hash=True"
> would be a "def __hash__: ..." in the body of the class definition,
> and the replacement for "hash=False" would be "__hash__ = None" in the
> class body.

attrs has the same behavior (if you ignore how dataclasses handles the 
cases where __hash__ or __eq__ already exist in the class definition). 
Here's what attrs says about adding __hash__ via hash=True:

"Although not recommended, you can decide for yourself and force attrs 
to create one (e.g. if the class is immutable even though you didn’t 
freeze it programmatically) by passing True or not. Both of these cases 
are rather special and should be used carefully."

The problem with dropping hash=True is: how would you write __hash__ 
yourself? It seems like a bug magnet if you're adding fields to the 
class and forget to update __hash__, especially in the presence of 
per-field hash=False and eq=False settings. And you'd need to make sure 
it matches the generated __eq__ (if 2 objects are equal, they need to 
have the same hash value).
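
To make the matching requirement concrete, here is a minimal sketch. It 
assumes the dataclasses API as it was eventually released (where the 
per-field comparison flag is spelled compare=); the Point class and its 
fields are hypothetical. The hand-written __hash__ includes a field 
that the generated __eq__ ignores, so two equal objects can end up with 
different hashes:

```python
from dataclasses import dataclass, field


@dataclass
class Point:
    x: int
    y: int
    # Hypothetical field that the generated __eq__ skips.
    label: int = field(default=0, compare=False)

    # Hand-written hash that mistakenly includes the non-compared field.
    # dataclass leaves an explicit __hash__ in the class body untouched.
    def __hash__(self):
        return hash((self.x, self.y, self.label))


a = Point(1, 2, label=10)
b = Point(1, 2, label=20)

equal = (a == b)                  # generated __eq__ ignores label
same_hash = (hash(a) == hash(b))  # hand-written __hash__ does not
```

Here a == b holds while the two hashes differ, which is exactly the 
mismatch that breaks dict and set lookups.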

If we're going to start disallowing things, how about the per-field 
hash=True, eq=False case?
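
As a rough illustration of why that per-field combination is suspect, 
the sketch below uses the API names from the released dataclasses 
module, which differ from the spellings in this thread: the class-level 
flag is unsafe_hash= and the per-field flag is compare=. The Item class 
is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class Item:
    key: int
    # Included in the generated __hash__, ignored by the generated __eq__.
    revision: int = field(default=0, hash=True, compare=False)


first = Item(1, revision=1)
second = Item(1, revision=2)

items_equal = (first == second)               # revision is not compared
hashes_equal = (hash(first) == hash(second))  # revision is hashed
```

The two instances compare equal but hash differently, so the generated 
methods disagree with each other without any hand-written code 
involved.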

However, I don't feel very strongly about this. As I've said, I expect 
the use cases for hash=True to be very, very rare. And now that we allow 
overriding __hash__ in the class body without setting hash=False, there 
aren't a lot of uses for hash=False, either. But we would need to think 
through how you'd get the behavior of hash=False with multiple 
inheritance, if that's what you wanted. Again, a very, very rare case.
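
For reference, a small sketch of the two class-body spellings discussed 
here, assuming the behavior of the released dataclasses module; the 
class names and the is_hashable helper are hypothetical. It does not 
attempt to cover the multiple-inheritance case.

```python
from dataclasses import dataclass


@dataclass
class Keyed:
    value: int

    # Explicit __hash__ in the class body; dataclass leaves it in place
    # even though eq=True and frozen=False.
    def __hash__(self):
        return hash(self.value)


@dataclass
class Unhashable:
    value: int

    # Explicitly mark instances as unhashable. There is no annotation,
    # so this is not treated as a field.
    __hash__ = None


def is_hashable(obj):
    """Return True if hash(obj) succeeds, False if it raises TypeError."""
    try:
        hash(obj)
    except TypeError:
        return False
    return True
```

With these definitions, is_hashable(Keyed(1)) is true and 
is_hashable(Unhashable(1)) is false.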

In all, I think we're better off documenting best practices and making 
them the default, like attrs does, and leaving it to the programmer to 
follow them. I realize we're handing out footguns, but the alternatives 
seem even more complex and more limiting.

Eric

