[Datetime-SIG] Another approach to 495's glitches
tim.peters at gmail.com
Sun Sep 6 08:11:33 CEST 2015
> FYI, I'm most concerned about how glibly I "sold" the idea that it
> really does solve the hash problem. It seems obvious to me that it
> does, but ... hash problems have a way of popping up in unexpected
> ways in unconsidered contexts :-(
So, after thinking about this for a few days, it's obvious after all ;-)
Consider two aware datetimes that compare equal. The task is to prove
they have the same hash. The subtlety is that while __eq__ and
__hash__ both use a notion of "UTC equivalent", they're not always the
same notion. __eq__ always uses the given values of `fold`, while
__hash__ always forces fold=0.
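A small Python sketch can make the two notions concrete. `ToyZone`, `TOY`, `utc_equiv`, `eq_utc` and `hash_utc` are hypothetical names invented for illustration and are not part of the datetime module. The toy zone mimics a US-Eastern-style fall-back on 2015-11-01, where local 01:00-02:00 happens twice. The sketch needs Python 3.6+ for `fold`.

```python
from datetime import datetime, timedelta, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 2015-11-01 01:00 local, UTC-5 from
    02:00 local on; local 01:00-02:00 happens twice."""
    def utcoffset(self, dt):
        local = dt.replace(tzinfo=None, fold=0)
        if local < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if local < datetime(2015, 11, 1, 2):
            # Ambiguous hour: fold=0 is the earlier (UTC-4) occurrence,
            # fold=1 the later (UTC-5) one.
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "TOY"

TOY = ToyZone()

def utc_equiv(dt):
    """Naive UTC equivalent of an aware datetime, honoring dt.fold."""
    return dt.replace(tzinfo=None) - dt.utcoffset()

def eq_utc(dt):
    # The notion __eq__ uses: the given value of fold.
    return utc_equiv(dt)

def hash_utc(dt):
    # The notion __hash__ uses: fold forced to 0.
    return utc_equiv(dt.replace(fold=0))

later = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)
print(eq_utc(later))    # 2015-11-01 06:30:00
print(hash_utc(later))  # 2015-11-01 05:30:00
```

The two notions can only disagree when fold=1 on an ambiguous time; with fold=0 they are the same computation.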
1. Same zone.
.utcoffset() isn't used for equality in this case; it's only used by
hash. Equality implies they differ at most in `fold`. Since hash()
forces fold=0, hash's calls to .utcoffset() see exactly the same stuff
for both, so hash's force-fold-to-0 UTC equivalents are the same.
Same UTC equivalents, same hashes.
2. Different zones.
Equality implies fold=0 for both, and that both map to the same UTC
time. Since we know fold=0 for both, we know __eq__ and __hash__ use
the same notion of UTC equivalent for both, so __hash__ sees the same
UTC equivalents __eq__ already saw and judged equal. Same UTC
equivalents, same hashes.
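Both cases can be checked mechanically against a model of the rules as stated in this post. `proposed_eq` and `proposed_hash` are hypothetical functions that model the proposal; they are not CPython's actual `datetime.__eq__`/`__hash__`. The brute-force loop at the end tries every pair from a small grid of times, zones and folds, and looks for equal pairs with different hashes.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 2015-11-01 01:00 local, UTC-5 from
    02:00 local on; local 01:00-02:00 happens twice."""
    def utcoffset(self, dt):
        local = dt.replace(tzinfo=None, fold=0)
        if local < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if local < datetime(2015, 11, 1, 2):
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "TOY"

TOY = ToyZone()

def utc_equiv(dt):
    """Naive UTC equivalent of an aware datetime, honoring dt.fold."""
    return dt.replace(tzinfo=None) - dt.utcoffset()

def proposed_eq(a, b):
    """Equality as described in the post (a model, not CPython's code)."""
    if a.tzinfo is b.tzinfo:
        # Same zone: compare wall-clock fields; fold is ignored.
        return a.replace(tzinfo=None, fold=0) == b.replace(tzinfo=None, fold=0)
    if a.fold or b.fold:
        # Different zones: fold=1 on either side means "not equal".
        return False
    return utc_equiv(a) == utc_equiv(b)  # both folds are 0 here

def proposed_hash(dt):
    # Hash always forces fold=0 before finding the UTC equivalent.
    return hash(utc_equiv(dt.replace(fold=0)))

# Case 1: same zone, differing only in fold.
a0 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=0)
a1 = a0.replace(fold=1)
assert proposed_eq(a0, a1) and proposed_hash(a0) == proposed_hash(a1)

# Case 2: different zones, both fold=0, same UTC time.
u = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)
assert proposed_eq(a0, u) and proposed_hash(a0) == proposed_hash(u)

# Brute force: equal under proposed_eq implies equal proposed_hash.
zones = [TOY, timezone.utc, timezone(timedelta(hours=-5))]
start = datetime(2015, 11, 1)
samples = [(start + timedelta(minutes=15 * i)).replace(tzinfo=z, fold=f)
           for i in range(16) for z in zones for f in (0, 1)]
for x in samples:
    for y in samples:
        if proposed_eq(x, y):
            assert proposed_hash(x) == proposed_hash(y)
print("no counterexamples in", len(samples) ** 2, "pairs")  # 9216 pairs
```

A grid search is not a proof, but it exercises exactly the two cases above, including the ambiguous hour with both fold values.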
Where it failed before: `later` is the later of the two occurrences of an
ambiguous local time, so it has fold=1. `ulater` is its UTC equivalent
(with fold=0). They
compared equal before. But hash(later) computed the hash based on the
force-fold-to-0 UTC equivalent, which is not the same as the fold=1
UTC equivalent `ulater`. hash(ulater) and hash(later) had no more in
common than hash(math.pi) and hash("hash").
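The old failure can be reproduced with a hypothetical model of the pre-proposal rules. `ToyZone`, `old_eq` and `hash_key` are illustrative names, not datetime API. `old_eq` compares cross-zone UTC equivalents using each operand's given fold. `hash_key` is the UTC time a hash would be computed from, with fold forced to 0. The sketch compares keys rather than raw hash values, because two different keys could in principle collide.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 2015-11-01 01:00 local, UTC-5 from
    02:00 local on; local 01:00-02:00 happens twice."""
    def utcoffset(self, dt):
        local = dt.replace(tzinfo=None, fold=0)
        if local < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if local < datetime(2015, 11, 1, 2):
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "TOY"

TOY = ToyZone()

def utc_equiv(dt):
    """Naive UTC equivalent of an aware datetime, honoring dt.fold."""
    return dt.replace(tzinfo=None) - dt.utcoffset()

def old_eq(a, b):
    """Model of the old rule: cross-zone equality uses the given folds."""
    if a.tzinfo is b.tzinfo:
        return a.replace(tzinfo=None, fold=0) == b.replace(tzinfo=None, fold=0)
    return utc_equiv(a) == utc_equiv(b)

def hash_key(dt):
    # The UTC time the hash is computed from: fold forced to 0.
    return utc_equiv(dt.replace(fold=0))

later = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)
ulater = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)

print(old_eq(later, ulater))              # True
print(hash_key(later), hash_key(ulater))  # 2015-11-01 05:30:00 2015-11-01 06:30:00
```

Equal objects hashed from different keys break the rule that `a == b` must imply `hash(a) == hash(b)`.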
And they still won't. But in the new world later != ulater (a
cross-zone comparison in which either operand has fold=1 is never
equal), so it no longer matters that the hashes differ.
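The same pair can be run through the proposed rule, again using a hypothetical model (`ToyZone`, `proposed_eq` and `hash_key` are illustrative names, not datetime API). The troublesome pair stops comparing equal. The fold=0 occurrence and its UTC equivalent still compare equal and still share a hash key.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 2015-11-01 01:00 local, UTC-5 from
    02:00 local on; local 01:00-02:00 happens twice."""
    def utcoffset(self, dt):
        local = dt.replace(tzinfo=None, fold=0)
        if local < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if local < datetime(2015, 11, 1, 2):
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return None

    def tzname(self, dt):
        return "TOY"

TOY = ToyZone()

def utc_equiv(dt):
    """Naive UTC equivalent of an aware datetime, honoring dt.fold."""
    return dt.replace(tzinfo=None) - dt.utcoffset()

def proposed_eq(a, b):
    """Model of the proposed rule: cross-zone fold=1 is never equal."""
    if a.tzinfo is b.tzinfo:
        return a.replace(tzinfo=None, fold=0) == b.replace(tzinfo=None, fold=0)
    if a.fold or b.fold:
        return False
    return utc_equiv(a) == utc_equiv(b)

def hash_key(dt):
    # The UTC time the hash is computed from: fold forced to 0.
    return utc_equiv(dt.replace(fold=0))

later = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)
ulater = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)
print(proposed_eq(later, ulater))           # False
print(hash_key(later) == hash_key(ulater))  # False, which is now harmless

earlier = later.replace(fold=0)
uearlier = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)
print(proposed_eq(earlier, uearlier))           # True
print(hash_key(earlier) == hash_key(uearlier))  # True
```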