On Mon, Aug 31, 2015 at 7:55 PM, Tim Peters <tim.peters@gmail.com> wrote:
>
> [Alex]
> >> After some thought, I believe the way to fix the implementation is what I
> >> suggested at first: reset fold to 0 before calling utcoffset() in __hash__.
> >> A rare hash collision is a small price to pay for having datetimes with
> >> different timezones in the same dictionary.
>
> [Tim]
> > Ya, I can live with that. In effect, we give up on converting to UTC
> > correctly for purposes of computing hash(), but only in rare cases.
> > hash() doesn't really care, and it remains true that datetime equality
> > (which does care) still implies hash equality. The later and earlier
> > of ambiguous times will simply land on the same hash chain.
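A minimal sketch of the fold-reset hash proposed above. It assumes a hypothetical toy zone, `ToyEastern` (not real tz data), whose wall-clock hour 01:00-02:00 on 2015-11-01 occurs twice, and hypothetical helpers `fold0_utc_key` / `fold0_hash` that stand in for `__hash__`: force fold to 0, subtract the resulting UTC offset, and hash that. The earlier and later 01:30 compare equal and land on the same hash.

```python
from datetime import datetime, timedelta, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical toy zone, not real tz data: UTC-4, then UTC-5 once the
    wall-clock hour 01:00-02:00 on 2015-11-01 has repeated."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if wall >= datetime(2015, 11, 1, 2):
            return timedelta(hours=-5)
        # In the repeated hour, fold=0 is the earlier (EDT) reading, fold=1 the later (EST).
        return timedelta(hours=-5 if dt.fold else -4)

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=5)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def fold0_utc_key(dt):
    """Hypothetical helper: the naive UTC time a fold-0 __hash__ would hash.
    Aware datetimes only."""
    t = dt.replace(fold=0)  # reset fold before asking for the offset
    return t.replace(tzinfo=None) - t.utcoffset()


def fold0_hash(dt):
    return hash(fold0_utc_key(dt))


TOY = ToyEastern()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY)          # earlier 01:30 (EDT)
dt2 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)  # later 01:30 (EST)

print(dt1 == dt2)                          # True: only fold differs
print(fold0_hash(dt1) == fold0_hash(dt2))  # True: same hash chain
```

Both readings are hashed as if they were the earlier one (05:30 UTC), which is exactly the "rare hash collision" being traded away.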
>
> Nope, you wore me out prematurely ;-)
>
It's getting late in my TZ, but what you are saying below sounds like a complaint that if you put t = the second 01:30 in a dictionary as a key, you cannot later retrieve it by looking up t.astimezone(timezone.utc). Sorry, but PEP 495 has never promised you that: "instances that differ only by the value of fold will compare as equal. Applications that need to differentiate between such instances should check the value of fold or convert them to a timezone that does not have ambiguous times."
Maybe if we decide to do something with the arithmetic, we will be able to fix this wart as well.
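A short sketch of the PEP 495 guarantee quoted above, again assuming a hypothetical toy zone `ToyEastern` (not real tz data) with a repeated 01:00-02:00 hour on 2015-11-01. Instances differing only in fold compare equal; `fold` itself tells them apart; a dictionary keyed on the local time is not promised to find the entry via its UTC conversion, while keying on UTC (a zone without ambiguous times) does keep the two readings distinct. The lookup miss shown is what CPython 3.6+ does with its real `datetime` hash and equality.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical toy zone, not real tz data: UTC-4, then UTC-5 once the
    wall-clock hour 01:00-02:00 on 2015-11-01 has repeated."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if wall >= datetime(2015, 11, 1, 2):
            return timedelta(hours=-5)
        # In the repeated hour, fold=0 is the earlier (EDT) reading, fold=1 the later (EST).
        return timedelta(hours=-5 if dt.fold else -4)

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=5)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


TOY = ToyEastern()
t = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)  # the second 01:30
first = t.replace(fold=0)                              # the first 01:30

print(first == t)          # True: instances differing only in fold compare equal
print(first.fold, t.fold)  # 0 1: check fold to tell them apart

d = {t: "second 01:30"}
print(t.astimezone(timezone.utc) in d)  # False: no promise of lookup via UTC

# The other suggestion: key on a zone without ambiguous times.
by_utc = {t.astimezone(timezone.utc): "second 01:30"}
print(by_utc[t.astimezone(timezone.utc)])        # second 01:30
print(first.astimezone(timezone.utc) in by_utc)  # False: a different UTC instant
```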
>
> Consider datetimes dt1 and dt2 representing the earlier & later of an
> ambiguous time in their common zone (whatever it may be - doesn't
> matter). Then all fields are identical except for `fold`. Assume
> __hash__ forces `fold` to 0 before obtaining the UTC offset. Then we
> have:
>
> dt1 == dt2
> hash(dt1) == hash(dt2)
>
> Fine so far as it goes. Now do:
>
> u1 = dt1.astimezone(timezone.utc)
> u2 = dt2.astimezone(timezone.utc)
>
> At this point we have:
>
> u1 == dt1 == dt2 == u2 and u1 < u2
> hash(dt1) == hash(dt2) == hash(u1)
>
> (Parenthetically, note that despite the chain of equalities in the
> first of those lines, we do _not_ have u1 == u2 - transitivity fails,
> which is a bit of a wart by itself.)
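The chain of equalities and the transitivity failure can be reproduced with a small model. This is a sketch under stated assumptions: `ToyEastern` is a hypothetical toy zone (not real tz data), and `proposed_eq` is a hypothetical function encoding the rules under discussion here (intrazone comparison ignores fold; interzone comparison goes through UTC, which honors fold). The model is written out explicitly instead of relying on `==`, because the `datetime` module's actual comparison rules may differ from the ones being debated.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical toy zone, not real tz data: UTC-4, then UTC-5 once the
    wall-clock hour 01:00-02:00 on 2015-11-01 has repeated."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if wall >= datetime(2015, 11, 1, 2):
            return timedelta(hours=-5)
        # In the repeated hour, fold=0 is the earlier (EDT) reading, fold=1 the later (EST).
        return timedelta(hours=-5 if dt.fold else -4)

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=5)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def proposed_eq(a, b):
    """Hypothetical model of the comparison rules discussed in this thread
    (aware datetimes only)."""
    if a.tzinfo is b.tzinfo:
        # Intrazone: compare wall-clock fields, ignoring fold.
        return a.replace(tzinfo=None, fold=0) == b.replace(tzinfo=None, fold=0)
    # Interzone: compare as UTC; each conversion honors its operand's fold.
    return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)


TOY = ToyEastern()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY)          # earlier 01:30
dt2 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)  # later 01:30
u1 = dt1.astimezone(timezone.utc)  # 05:30 UTC
u2 = dt2.astimezone(timezone.utc)  # 06:30 UTC

chain = [proposed_eq(u1, dt1), proposed_eq(dt1, dt2), proposed_eq(dt2, u2)]
print(chain)                # [True, True, True]
print(u1 < u2, u2 - u1)     # True 1:00:00
print(proposed_eq(u1, u2))  # False: transitivity fails
```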
>
> Since u1 == dt1, and hash(u1) == hash(dt1), no problem there either.
>
> But u1 isn't at all the same as u2, so hash(u2) can be the same as
> hash(u1) only by (unlikely) accident. hash(u2) is off in a world of
> its own. Therefore hash(dt2) can be the same as hash(u2) only by (the
> same unlikely) accident, despite that dt2 == u2.
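To see why hash(dt2) and hash(u2) part ways, it helps to look at the value a fold-0 hash would actually hash. A minimal sketch, assuming a hypothetical toy zone `ToyEastern` (not real tz data) and a hypothetical helper `fold0_utc_key` that returns the naive UTC time obtained after forcing fold to 0. dt1, dt2 and u1 all reduce to 05:30, while u2 reduces to 06:30, even though u2 is the UTC spelling of dt2. The example compares these keys rather than the hash values themselves, since distinct keys only differ in hash with overwhelming likelihood, not by guarantee.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical toy zone, not real tz data: UTC-4, then UTC-5 once the
    wall-clock hour 01:00-02:00 on 2015-11-01 has repeated."""

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < datetime(2015, 11, 1, 1):
            return timedelta(hours=-4)
        if wall >= datetime(2015, 11, 1, 2):
            return timedelta(hours=-5)
        # In the repeated hour, fold=0 is the earlier (EDT) reading, fold=1 the later (EST).
        return timedelta(hours=-5 if dt.fold else -4)

    def dst(self, dt):
        return self.utcoffset(dt) + timedelta(hours=5)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def fold0_utc_key(dt):
    """Hypothetical helper: the naive UTC time a fold-0 __hash__ would hash."""
    t = dt.replace(fold=0)  # reset fold before asking for the offset
    return t.replace(tzinfo=None) - t.utcoffset()


TOY = ToyEastern()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY)          # earlier 01:30
dt2 = datetime(2015, 11, 1, 1, 30, tzinfo=TOY, fold=1)  # later 01:30
u1 = dt1.astimezone(timezone.utc)
u2 = dt2.astimezone(timezone.utc)

# dt1, dt2 and u1 share one key (05:30), so they share one hash chain.
print(fold0_utc_key(dt1), fold0_utc_key(dt2), fold0_utc_key(u1))
print(fold0_utc_key(u2))                        # 2015-11-01 06:30:00
print(dt2.astimezone(timezone.utc) == u2)       # True: dt2 is the same instant as u2
print(fold0_utc_key(dt2) == fold0_utc_key(u2))  # False: dt2 is hashed as 05:30
```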
>
> So, in all, __hash__ forcing fold=0 at the start hides the problem for
> ambiguous times in the same zone, but doesn't really touch the problem
> for cross-zone equivalent spellings of such times (not even if one of
> the zones is UTC, which is likely the most important case).
>
> One way to fix that is to have datetime.__hash__() _always_ return, say, 0 ;-)