[Python-Dev] Immutability vs. hashability

Steven D'Aprano steve at pearwood.info
Mon Feb 5 20:17:56 EST 2018


On Mon, Feb 05, 2018 at 12:09:52AM -0600, Chris Barker wrote:

> But a bit more detail -- I'm commenting on the API, not the capability -
> that is, since users often equate hashable and immutability, they will
> expect that if they say hash=True, they will get an immutable, and if they
> say frozen=True, they will get something hashable (as long as the fields
> are hashable, just like a tuple).
> 
> That is, even though these concepts are independent, the defaults shouldn't
> reflect that.

I'm not happy about the concept of pandering to the least capable, most 
ignorant programmers by baking a miscomprehension into an important 
standard library API. The fact is that mutability and hashability ARE 
independent qualities, and the API ought to reflect reality, not 
ignorance. That's why there are two separate switches, frozen and hash, 
not just one "frozen_hashable" switch.

(Things would be different if we just outright banned mutable+hashable, 
but I don't think anyone wants that.)
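A minimal sketch of that independence, using only built-in behaviour. The class name Point is a hypothetical example and is not part of the proposal under discussion.

```python
# Mutable but hashable: a plain class hashes by identity by default,
# so changing its attributes does not change its hash.
class Point:  # hypothetical example class
    def __init__(self, x, y):
        self.x = x
        self.y = y

p = Point(1, 2)
hash_before = hash(p)
p.x = 99                                   # mutation is allowed
hash_unchanged = hash(p) == hash_before    # identity hash ignores the fields

# Immutable but unhashable: the tuple itself cannot be changed,
# yet hashing it fails because one of its items is a list.
t = (1, [2, 3])
try:
    hash(t)
    tuple_hashable = True
except TypeError:
    tuple_hashable = False
```

Here the Point instance is mutable yet hashable, while the tuple is immutable yet unhashable, so neither property implies the other.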

Fortunately, I also believe that the number of programmers who would 
fail to draw the right conclusion from the existence of separate 
switches will actually be pretty small in practice. The fact that there 
are two separate switches is a pretty big clue that mutability and 
hashability can be controlled separately.

I believe that the proposed API is much simpler to understand than your 
revision. We have:

- frozen and hash both default to False;
- if you explicitly set one, the other uses the default.

This corresponds to a very common, Pythonic pattern that nearly 
everyone is familiar with:

    def spam(frozen=False, hash=False):
        ...

which is easy to understand and easy to explain. Versus your proposal:

- if you set neither, then frozen and hash both default to False;
- but if you explicitly set one, the other uses True, namely the 
  opposite of the standard default.

which corresponds to something harder to describe and much less common:

    def spam(frozen=None, hash=None):
        if frozen is hash is None:
            frozen = hash = False
        elif frozen is None:
            frozen = True
        elif hash is None:
            hash = True
        ...

"frozen and hash default to True, unless neither are set, in which case 
they default to False."
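The two schemes can be compared side by side with a runnable sketch. The function names resolve_independent and resolve_linked are hypothetical and simply return the resolved pair of flags; the parameter name hash shadows the builtin only to mirror the snippets above.

```python
def resolve_independent(frozen=False, hash=False):
    """Fixed defaults: each flag is False unless explicitly set."""
    return frozen, hash


def resolve_linked(frozen=None, hash=None):
    """Linked defaults: an unset flag becomes True once the other is set."""
    if frozen is hash is None:
        frozen = hash = False
    elif frozen is None:
        frozen = True
    elif hash is None:
        hash = True
    return frozen, hash


# Resolved (frozen, hash) pairs for a few calls.
independent_hash_only = resolve_independent(hash=True)
linked_hash_only = resolve_linked(hash=True)
linked_hash_false = resolve_linked(hash=False)
```

One consequence of the linked scheme as written: explicitly passing hash=False leaves frozen unset, so frozen resolves to True.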


Let's look at the two possible scenarios you are worried about:

(1) I set frozen=True thinking that makes the class hashable so I can 
use it in a set or as a dict key. The first time I actually do so, I get 
an explicit and obvious TypeError. Problem solved.[1]

(2) I set hash=True thinking that makes the class frozen. This scenario 
is more problematic, because there's no explicit and obvious error when 
I get it wrong. Instead, my program could silently do the wrong thing if 
my instances are quietly mutated.
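Both scenarios can be reproduced with hand-written classes. This is a sketch under stated assumptions: FrozenNoHash and MutableWithHash are hypothetical stand-ins for what the two switches would produce, not the output of the proposed decorator.

```python
class FrozenNoHash:
    """Stand-in for scenario (1): read-only, has __eq__, no __hash__."""

    def __init__(self, value):
        # Bypass our own __setattr__ once, during construction.
        object.__setattr__(self, "value", value)

    def __setattr__(self, name, val):
        raise AttributeError("instances are read-only")

    def __eq__(self, other):
        return isinstance(other, FrozenNoHash) and self.value == other.value
    # Defining __eq__ without __hash__ makes Python set __hash__ to None.


class MutableWithHash:
    """Stand-in for scenario (2): hash depends on a field that can change."""

    def __init__(self, value):
        self.value = value

    def __eq__(self, other):
        return isinstance(other, MutableWithHash) and self.value == other.value

    def __hash__(self):
        return hash(self.value)


# Scenario (1): the mistake surfaces immediately as a TypeError.
try:
    {FrozenNoHash(1)}
    scenario_1_error = None
except TypeError as exc:
    scenario_1_error = exc

# Scenario (2): no error, but the set can no longer find the item.
item = MutableWithHash(1)
seen = {item}
item.value = 2                 # silent mutation after insertion
still_found = item in seen     # lookup now uses the new hash
```

The first mistake announces itself on first use; the second leaves still_found False without raising anything, which is the silent failure described above.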

The first error is self-correcting, and so I believe that the second is 
the only one we should worry about. There are two questions:

- how much should we worry? (how often will this happen?);

- what do we do about it?

I think the answers ought to be, not much and nothing. Or *at most*, 
raise a *warning* when hash=True is set without also explicitly setting 
frozen. But even that seems unnecessary to me.
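If such a warning were wanted, it might look something like the sketch below. The helper check_flags is hypothetical, and it assumes frozen uses None as a sentinel so that "not explicitly set" can be told apart from an explicit frozen=False.

```python
import warnings


def check_flags(frozen=None, hash=False):
    """Hypothetical helper: warn when hash=True is given and frozen is unset."""
    if hash and frozen is None:
        warnings.warn(
            "hash=True without an explicit frozen=...; instances stay mutable",
            UserWarning,
            stacklevel=2,
        )
    # An unset frozen still resolves to False, as in the proposed defaults.
    return bool(frozen), hash
```

Passing frozen=False explicitly alongside hash=True silences the warning, since the caller has then stated their intent.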

I think that the intersection of events needed for this to be a real 
problem will be fairly small:

- people using DataClasses;
- who want a frozen, hashable class;
- and believe that the two are equivalent;
- and who weren't clued in by the existence of separate switches;
- and set hash=True without frozen=True;
- and don't write a unit test to confirm that their data is immutable;
- and accidentally mutate an instance which they thought was immutable;
- in such a way as to cause a silent failure.

I don't think this is a failure mode that we need to be concerned with. 
We can't protect everyone from everything.
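The unit test mentioned in the list above is only a few lines. This sketch assumes the dataclasses module as eventually released in Python 3.7, where assigning to a field of a frozen instance raises FrozenInstanceError; the Config class is a hypothetical example.

```python
from dataclasses import dataclass, FrozenInstanceError


@dataclass(frozen=True)
class Config:  # hypothetical example class
    name: str
    retries: int


def test_config_is_immutable():
    """Fail loudly if a Config instance can be mutated."""
    cfg = Config("db", 3)
    try:
        cfg.retries = 5
    except FrozenInstanceError:
        return True
    raise AssertionError("Config instance was mutated")
```

If frozen=True were left off by mistake, the assignment would succeed and the test would fail with the AssertionError instead of letting the mistake pass silently.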



[1] Yes, I'm glossing over the possible annoyance if not difficulty of 
actually solving the problem: somebody has to raise a bug report, 
someone has to fix the bug which in principle could involve a lot of 
disruption, there should be regression tests and maybe a new release of 
the application, etc. But this is par for the course for *any* bug -- 
there's no need to imagine that this specific bug is so terrible that 
the standard library needs to protect programmers from the possibility 
of ordinary, run-of-the-mill bugs.

-- 
Steve

