[Datetime-SIG] Another approach to 495's glitches

Sun Sep 6 08:11:33 CEST 2015

[Tim]
> ...
> FYI, I'm most concerned about how glibly I "sold" the idea that it
> really does solve the hash problem.  It seems obvious to me that it
> does, but ... hash problems have a way of popping up in unexpected
> ways in unconsidered contexts :-(

So, after thinking about this for a few days, it's obvious after all ;-)

Consider two aware datetimes that compare equal.  The task is to prove
they have the same hash.  The subtlety is that while __eq__ and
__hash__ both use a notion of "UTC equivalent", they're not always the
same notion.  __eq__ always uses the given values of `fold`, while
__hash__ always forces fold=0.
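A toy model may make the two notions concrete. This is a sketch under stated assumptions, not CPython's datetime: the zone, its offsets, and the helper names (`toy_utcoffset`, `utc_equivalent`, `eq_notion`, `hash_notion`) are hypothetical. Local times are whole hours, and the clock falls back from 2:00 (UTC-4) to 1:00 (UTC-5), so local hour 1 is ambiguous.

```python
# Hypothetical toy zone (not a real tzinfo): whole-hour local times; the
# clock falls back from 2:00 (UTC-4) to 1:00 (UTC-5), so hour 1 is ambiguous.
def toy_utcoffset(local, fold):
    """Return the UTC offset, in hours, of a local hour in the toy zone."""
    if local < 1:
        return -4                        # before the transition
    if local < 2:
        return -4 if fold == 0 else -5   # ambiguous hour: fold picks the reading
    return -5                            # after the transition

def utc_equivalent(local, fold):
    """UTC hour for a toy-zone local hour, honoring the given fold."""
    return local - toy_utcoffset(local, fold)

def eq_notion(local, fold):
    # The notion __eq__ uses: the given value of fold.
    return utc_equivalent(local, fold)

def hash_notion(local, fold):
    # The notion __hash__ uses: fold forced to 0.
    return utc_equivalent(local, 0)

# The two notions disagree only for fold=1 in the ambiguous hour.
disagreements = [(local, fold) for local in range(4) for fold in (0, 1)
                 if eq_notion(local, fold) != hash_notion(local, fold)]
print(disagreements)  # [(1, 1)]
```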

1. Same zone.

.utcoffset() isn't used for equality in this case; it's only used by
hash.  Equality implies they differ at most in `fold`.  Since hash()
forces fold=0, hash's calls to .utcoffset() see exactly the same stuff
for both, so hash's force-fold-to-0 UTC equivalents are the same.
Same UTC equivalents, same hashes.

2. Different zones.

Equality implies fold=0 for both (under the new rule, a cross-zone
comparison involving fold=1 is never equal), and that both map to the
same UTC time.  Since we know fold=0 for both, we know __eq__ and
__hash__ use the same notion of UTC equivalent for both, so __hash__
sees the same UTC equivalents __eq__ already saw and judged equal.
Same UTC equivalents, same hashes.
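Both cases can be spot-checked by brute force in a toy model. A minimal sketch, assuming a hypothetical "toy" zone that falls back from UTC-4 to UTC-5 (local hour 1 ambiguous) plus a UTC zone, with aware times as `(local, fold, zone)` tuples; `proposed_eq` and `proposed_hash` are hypothetical stand-ins that follow the rules described in this message, not CPython's implementation. It searches every equal pair for one whose hashes differ.

```python
from itertools import product

# Hypothetical zones for illustration only; times are (local, fold, zone).
def utcoffset(zone, local, fold):
    if zone == "utc":
        return 0
    # "toy": clock falls back from 2:00 (UTC-4) to 1:00 (UTC-5).
    if local < 1:
        return -4
    if local < 2:
        return -4 if fold == 0 else -5
    return -5

def utc_equiv(t, fold):
    local, _, zone = t
    return local - utcoffset(zone, local, fold)

def proposed_eq(a, b):
    if a[2] == b[2]:
        # Case 1, same zone: compare local clock values; fold is ignored.
        return a[0] == b[0]
    if a[1] == 1 or b[1] == 1:
        # Cross-zone comparison involving fold=1: never equal.
        return False
    # Case 2, different zones, both fold=0: compare UTC equivalents.
    return utc_equiv(a, a[1]) == utc_equiv(b, b[1])

def proposed_hash(t):
    # Hash the UTC equivalent computed with fold forced to 0.
    return hash(utc_equiv(t, 0))

times = [(local, fold, zone)
         for zone in ("toy", "utc")
         for local in range(-2, 10)
         for fold in (0, 1)]

# Any pair that compares equal but hashes differently would land here.
violations = [(a, b) for a, b in product(times, repeat=2)
              if proposed_eq(a, b) and proposed_hash(a) != proposed_hash(b)]
print(violations)  # []
```

Same-zone pairs pass because equality and hashing both ignore fold there; cross-zone pairs pass because only fold=0 operands can compare equal, and for those the two notions of UTC equivalent coincide.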

Where it failed before:  `later` is the later reading of an ambiguous
time, so it has fold=1.  `ulater` is its UTC equivalent (with fold=0).  They
compared equal before.  But hash(later) computed the hash based on the
force-fold-to-0 UTC equivalent, which is not the same as the fold=1
UTC equivalent `ulater`.  hash(ulater) and hash(later) had no more in
common than hash(math.pi) and hash("hash").

And they still won't.  But in the new world later != ulater (at least
one has fold=1 in a cross-zone comparison), so it no longer matters
that the hashes differ.
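The `later`/`ulater` failure can be replayed in a toy model. The zone and helper names here are hypothetical, assumed for illustration: `old_eq` compares cross-zone UTC equivalents using the given folds, as before; `new_eq` applies the rule that a cross-zone comparison involving fold=1 is unequal; both worlds hash with fold forced to 0.

```python
# Hypothetical toy zone: clock falls back from 2:00 (UTC-4) to 1:00 (UTC-5),
# so local hour 1 is ambiguous. Aware times are (local, fold, zone) tuples.
def utcoffset(zone, local, fold):
    if zone == "utc":
        return 0
    if local < 1:
        return -4
    if local < 2:
        return -4 if fold == 0 else -5
    return -5

def utc_equiv(t, fold):
    local, _, zone = t
    return local - utcoffset(zone, local, fold)

def old_eq(a, b):
    # Before: cross-zone equality used the given folds.
    if a[2] == b[2]:
        return a[0] == b[0]
    return utc_equiv(a, a[1]) == utc_equiv(b, b[1])

def new_eq(a, b):
    # Proposed: a cross-zone comparison involving fold=1 is never equal.
    if a[2] == b[2]:
        return a[0] == b[0]
    if a[1] == 1 or b[1] == 1:
        return False
    return utc_equiv(a, 0) == utc_equiv(b, 0)

def fold0_hash(t):
    # __hash__ in both worlds: UTC equivalent with fold forced to 0.
    return hash(utc_equiv(t, 0))

later = (1, 1, "toy")                     # later reading of the ambiguous hour
ulater = (utc_equiv(later, 1), 0, "utc")  # its UTC equivalent: (6, 0, "utc")

print(old_eq(later, ulater), fold0_hash(later) == fold0_hash(ulater))  # True False
print(new_eq(later, ulater))                                           # False
```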