suggested at first: reset fold to 0 before calling utcoffset() in
__hash__.
A rare hash collision is a small price to pay for having datetimes with
different timezones in the same dictionary.
[Tim]
Ya, I can live with that. In effect, we give up on converting to UTC
correctly for purposes of computing hash(), but only in rare cases.
hash() doesn't really care, and it remains true that datetime equality
(which does care) still implies hash equality. The later and earlier
of ambiguous times will simply land on the same hash chain.
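
A minimal sketch of the same-zone case, for illustration only. AlwaysFolded is a hypothetical toy tzinfo (not a real zone) in which every wall time is ambiguous, and the sketch assumes an interpreter that implements PEP 495 (Python 3.6 or later) with __hash__ following the fold=0 rule discussed here:

```python
from datetime import datetime, timedelta, tzinfo


class AlwaysFolded(tzinfo):
    """Hypothetical toy zone: fold=0 means UTC-4, fold=1 means UTC-5."""

    def utcoffset(self, dt):
        return timedelta(hours=-5 if dt.fold else -4)

    def dst(self, dt):
        return timedelta(hours=0 if dt.fold else 1)

    def tzname(self, dt):
        return "STD" if dt.fold else "DST"


tz = AlwaysFolded()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=tz)  # earlier 01:30
dt2 = dt1.replace(fold=1)                      # later 01:30

# The two spellings map to different UTC offsets ...
offsets_differ = dt1.utcoffset() != dt2.utcoffset()
# ... yet compare equal within one zone, because fold is ignored there ...
same_zone_equal = dt1 == dt2
# ... so their hashes must agree, which the fold=0 rule guarantees.
same_hash = hash(dt1) == hash(dt2)
print(offsets_differ, same_zone_equal, same_hash)
```

Both keys end up on the same hash chain, and a dict tells them apart only if equality does.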
Nope, you wore me out prematurely ;-)
It's getting late in my TZ, but what you are saying below sounds like a
complaint that if you put t = the second 01:30 as a key in the dictionary, you
cannot later retrieve it by looking up t.astimezone(timezone.utc). Sorry,
but PEP 495 has never promised you that: "instances that differ only by the
value of fold will compare as equal. Applications that need to
differentiate between such instances should check the value of fold or
convert them to a timezone that does not have ambiguous times."
<https://www.python.org/dev/peps/pep-0495/#temporal-arithmetic>
Maybe if we decide to do something with the arithmetic, we will be able to
fix this wart as well.
Consider datetimes dt1 and dt2 representing the earlier & later of an
ambiguous time in their common zone (whatever it may be - doesn't
matter). Then all fields are identical except for `fold`. Assume
__hash__ forces `fold` to 0 before obtaining the UTC offset. Then we
have:
    dt1 == dt2
    hash(dt1) == hash(dt2)
Fine so far as it goes. Now do:
    u1 = dt1.astimezone(timezone.utc)
    u2 = dt2.astimezone(timezone.utc)
At this point we have:
    u1 == dt1 == dt2 == u2 and u1 < u2
    hash(dt1) == hash(dt2) == hash(u1)
(Parenthetically, note that despite the chain of equalities in the
first of those lines, we do _not_ have u1 == u2 - transitivity fails,
which is a bit of a wart by itself.)
Since u1 == dt1, and hash(u1) == hash(dt1), no problem there either.
But u1 isn't at all the same as u2, so hash(u2) can be the same as
hash(u1) only by (unlikely) accident. hash(u2) is off in a world of
its own. Therefore hash(dt2) can be the same as hash(u2) only by (the
same unlikely) accident, despite that dt2 == u2.
So, in all, __hash__ forcing fold=0 at the start hides the problem for
ambiguous times in the same zone, but doesn't really touch the problem
for cross-zone equivalent spellings of such times (not even if one of
the zones is UTC, which is likely the most important case).
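
The argument above can be modeled with a short sketch. Everything named here is hypothetical: ToyEastern is a made-up zone with a single fall-back transition, utc_instant() stands in for "what cross-zone equality looks at", and fold0_hash() stands in for the proposed __hash__. The sketch deliberately avoids datetime's own cross-zone == and hash(), since this thread is about a proposal and the interpreter's behavior may differ from what is assumed in this message.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical zone: UTC-4 until 2015-11-01 06:00 UTC, then UTC-5.
    Local wall times 01:00 through 01:59 on that day occur twice."""

    SWITCH_UTC = datetime(2015, 11, 1, 6, 0)  # naive UTC time of the change
    DST, STD = timedelta(hours=-4), timedelta(hours=-5)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.SWITCH_UTC + self.STD:      # before 01:00 local
            return self.DST
        if wall >= self.SWITCH_UTC + self.DST:     # 02:00 local or later
            return self.STD
        return self.STD if dt.fold else self.DST   # the ambiguous hour

    def dst(self, dt):
        return self.utcoffset(dt) - self.STD

    def tzname(self, dt):
        return "EST" if self.utcoffset(dt) == self.STD else "EDT"


def utc_instant(dt):
    """Naive UTC time of dt, honoring fold (hypothetical helper)."""
    return dt.replace(tzinfo=None) - dt.utcoffset()


def fold0_hash(dt):
    """Sketch of the proposed __hash__: force fold=0, then hash the UTC time."""
    return hash(utc_instant(dt.replace(fold=0)))


tz = ToyEastern()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=tz)  # earlier 01:30
dt2 = dt1.replace(fold=1)                      # later 01:30
u1 = dt1.astimezone(timezone.utc)
u2 = dt2.astimezone(timezone.utc)

# u1 and u2 are distinct UTC times, one hour apart.
utc_ordered = u1 < u2
# dt1, dt2 and u1 all hash alike under the fold=0 rule.
hashes_agree = fold0_hash(dt1) == fold0_hash(dt2) == fold0_hash(u1)
# dt2 and u2 name the same instant, so cross-zone equality would hold ...
same_instant = utc_instant(dt2) == utc_instant(u2)
# ... but the hash of dt2 is built from a different UTC time than u2's.
hash_inputs_differ = utc_instant(dt2.replace(fold=0)) != utc_instant(u2)
print(utc_ordered, hashes_agree, same_instant, hash_inputs_differ)
```

Because the hash inputs for dt2 and u2 differ by an hour, fold0_hash(dt2) and fold0_hash(u2) can match only by accident, which is exactly the cross-zone problem described above.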
One way to fix that is to have datetime.__hash__() _always_ return, say,