[Python-Dev] Dataclasses and correct hashability

Guido van Rossum guido at python.org
Mon Feb 5 00:49:50 EST 2018


Looks like this is turning into a major flamewar regardless of what I say.
:-(

I really don't want to lose the ability to add a hash function to a mutable
dataclass by flipping a flag in the decorator. I'll explain below. But I am
fine if this flag has a name that clearly signals it's an unsafe thing to
do.

I propose to replace the existing (as of 3.7.0b1) hash= keyword for the
@dataclass decorator with a simpler flag named unsafe_hash=. This would be
a simple bool (not a tri-state flag like the current hash=None|False|True).
The default would be False, and the behavior then would be to add a hash
function automatically only if it's safe (using the same rules as for
hash=None currently). With unsafe_hash=True, a hash function would always
be generated that takes all fields into account except those declared using
field(hash=False). If the class already contains a `def __hash__`, I don't
much care what the decorator does; maybe it should raise rather than quietly
doing nothing or quietly overwriting it.
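
To make the proposal concrete, here is a minimal sketch of how the flag
would be used. It assumes the dataclasses module as released in Python 3.7,
where the unsafe_hash= spelling was adopted; the class names (Plain, Point,
Conflicting) are made up for illustration.

```python
from dataclasses import dataclass, field

# Default (unsafe_hash=False): a mutable class with eq=True gets no hash.
@dataclass
class Plain:
    x: int

# unsafe_hash=True: a hash is generated from every field except
# those declared with field(hash=False).
@dataclass(unsafe_hash=True)
class Point:
    x: int
    y: int
    label: str = field(default="", hash=False)

a = Point(1, 2, "a")
b = Point(1, 2, "b")

print(Plain.__hash__ is None)   # True: Plain instances are unhashable
print(hash(a) == hash(b))       # True: label is left out of the hash
print(a == b)                   # False: label still takes part in ==

# An explicit __hash__ combined with unsafe_hash=True is rejected
# (TypeError in the released module).
try:
    @dataclass(unsafe_hash=True)
    class Conflicting:
        x: int

        def __hash__(self):
            return 0
except TypeError:
    conflict_rejected = True
else:
    conflict_rejected = False
```

Note that excluding a field from the hash but not from comparison is still
consistent: equal objects hash equal, and unequal objects are merely allowed
to collide.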

Here's my use case.

A frozen class requires a lot of discipline, since you have to compute the
values of all fields before calling the constructor. A mutable class allows
other initialization patterns, e.g. manually setting some fields after the
instance has been constructed, or having a separate non-dunder init()
method. There may be good reasons for using these patterns, e.g. the object
may be part of a cycle (e.g. parent/child links in a tree). Or you may just
use one of these patterns because you're a pretty casual coder. Or you're
modeling something external.

My point is that once you have one of those patterns in place, changing
your code to avoid them may be difficult. And yet your code may treat the
objects as essentially immutable after the initialization phase (e.g. a
parse tree). So if you create a dataclass and start coding like that for a
while, and much later you need to put one of these into a set or use it as
a dict key, switching to frozen=True may not be a quick option. And writing
a __hash__ method by hand may feel like a lot of busywork. So this is where
[unsafe_]hash=True would come in handy.
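
A rough sketch of the kind of code I have in mind, assuming the released
Python 3.7 dataclasses module. The Node class, its fields and its add()
method are hypothetical. The parent/child links are excluded from comparison
(and therefore from the generated hash), since they form a cycle and a list
is not hashable anyway.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass(unsafe_hash=True)
class Node:
    kind: str
    text: str = ""
    # compare=False also keeps these fields out of the generated hash,
    # which avoids recursing around the parent/child cycle.
    parent: Optional["Node"] = field(default=None, compare=False, repr=False)
    children: List["Node"] = field(default_factory=list, compare=False,
                                   repr=False)

    def add(self, child: "Node") -> "Node":
        # Links are filled in after construction, which frozen=True
        # would not allow without extra work.
        child.parent = self
        self.children.append(child)
        return child

# Initialization phase: build the tree by mutating the nodes.
root = Node("expr")
lhs = root.add(Node("name", "x"))
rhs = root.add(Node("num", "1"))

# Afterwards the nodes are treated as immutable and can go into a set.
visited = {root, lhs}
print(lhs in visited, rhs in visited)   # True False
```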

I think naming the flag unsafe_hash should take away most objections, since
it will be clear that this is not a safe thing to do. People who don't
understand the danger are likely to copy a worse solution from
StackOverflow anyway. The docs can point to frozen=True and explain the
danger.
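
For the record, the danger looks roughly like this. This is a sketch against
the released Python 3.7 dataclasses module; the Token class is hypothetical.

```python
from dataclasses import dataclass

@dataclass(unsafe_hash=True)
class Token:
    text: str

tok = Token("a")
seen = {tok}

# Mutating a hashed field after the object is stored changes its hash,
# but the set still files it under the old one.
tok.text = "b"

still_there = any(t is tok for t in seen)   # the object is still stored
found_by_new_value = tok in seen            # lookup uses the new hash
found_by_old_value = Token("a") in seen     # old hash matches, == does not

print(still_there, found_by_new_value, found_by_old_value)
```

The object is still in the set, yet neither its new value nor its old value
finds it. With frozen=True the assignment would have raised instead.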

-- 
--Guido van Rossum (python.org/~guido)