[Datetime-SIG] Another round on error-checking

Tim Peters tim.peters at gmail.com
Tue Sep 1 01:55:10 CEST 2015

>> After some thought, I believe the way to fix the implementation is what I
>> suggested at first: reset fold to 0 before calling utcoffset() in __hash__.
>> A rare hash collision is a small price to pay for having datetimes with
>> different timezones in the same dictionary.

> Ya, I can live with that.  In effect, we give up on converting to UTC
> correctly for purposes of computing hash(), but only in rare cases.
> hash() doesn't really care, and it remains true that datetime equality
> (which does care) still implies hash equality.  The later and earlier
> of ambiguous times will simply land on the same hash chain.

Nope, you wore me out prematurely ;-)

Consider datetimes dt1 and dt2 representing the earlier & later of an
ambiguous time in their common zone (whatever it may be - doesn't
matter).  Then all fields are identical except for `fold`.  Assume
__hash__ forces `fold` to 0 before obtaining the UTC offset.  Then we have:

    dt1 == dt2
    hash(dt1) == hash(dt2)

Fine so far as it goes.  Now do:

    u1 = dt1.astimezone(timezone.utc)
    u2 = dt2.astimezone(timezone.utc)

At this point we have:

    u1 == dt1 == dt2 == u2 and u1 < u2
    hash(dt1) == hash(dt2) == hash(u1)

(Parenthetically, note that despite the chain of equalities in the
first of those lines, we do _not_ have u1 == u2 - transitivity fails,
which is a bit of a wart by itself.)

Since u1 == dt1, and hash(u1) == hash(dt1), no problem there either.

But u1 isn't at all the same as u2, so hash(u2) can be the same as
hash(u1) only by (unlikely) accident.  hash(u2) is off in a world of
its own.  Therefore hash(dt2) can be the same as hash(u2) only by (the
same unlikely) accident, despite that dt2 == u2.

So, in all, __hash__ forcing fold=0 at the start hides the problem for
ambiguous times in the same zone, but doesn't really touch the problem
for cross-zone equivalent spellings of such times (not even if one of
the zones is UTC, which is likely the most important case).
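Here's a minimal Python sketch of the whole scenario, assuming Python 3.6+ (for `fold`).  `ToyZone`, `to_utc`, `proposed_eq` and `proposed_hash` are hypothetical names made up for illustration.  The toy zone stands in for any zone with a fall-back transition, and the two `proposed_*` functions model the semantics being discussed here (fold-aware UTC conversion for cross-zone equality, `__hash__` forcing fold=0).  They are not the datetime module's actual `__eq__` / `__hash__`:

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone with one fall-back transition: UTC-4 until
    2015-11-01 02:00 local, then UTC-5.  Wall times in [01:00, 02:00) that
    day occur twice; fold=0 picks the earlier, fold=1 the later (PEP 495)."""

    AMBIG_START = datetime(2015, 11, 1, 1)
    AMBIG_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)  # naive comparisons ignore fold
        if wall < self.AMBIG_START:
            return timedelta(hours=-4)
        if wall < self.AMBIG_END:
            # Ambiguous hour: fold decides which of the two offsets applies.
            return timedelta(hours=-5 if dt.fold else -4)
        return timedelta(hours=-5)

    def dst(self, dt):
        return None  # not needed for this sketch

    def tzname(self, dt):
        return "TOY"


def to_utc(dt):
    """Naive UTC instant for an aware datetime; utcoffset() honors fold."""
    return dt.replace(tzinfo=None) - dt.utcoffset()


def proposed_eq(a, b):
    """Modeled equality: same tzinfo object compares wall-clock fields
    (fold ignored); different zones compare UTC instants (fold honored)."""
    if a.tzinfo is b.tzinfo:
        return a.replace(tzinfo=None) == b.replace(tzinfo=None)
    return to_utc(a) == to_utc(b)


def proposed_hash(dt):
    """Modeled fix: force fold to 0 before obtaining the UTC offset."""
    return hash(to_utc(dt.replace(fold=0)))


tz = ToyZone()
dt1 = datetime(2015, 11, 1, 1, 30, tzinfo=tz)  # earlier 01:30, UTC-4
dt2 = dt1.replace(fold=1)                      # later 01:30, UTC-5
u1 = dt1.astimezone(timezone.utc)              # 05:30 UTC
u2 = dt2.astimezone(timezone.utc)              # 06:30 UTC

print(proposed_eq(dt1, dt2), proposed_hash(dt1) == proposed_hash(dt2))  # True True
print(proposed_eq(u1, dt1), proposed_eq(dt2, u2), proposed_eq(u1, u2))  # True True False
print(proposed_hash(u1) == proposed_hash(dt1))  # True
print(proposed_hash(u2) == proposed_hash(dt2))  # False, barring a fluke collision
```

Under this model dt2 and u2 compare equal while their hashes differ, so a dict keyed by dt2 would generally fail to find it when looked up by u2.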

One way to fix that is to have datetime.__hash__() _always_ return, say, 0 ;-)
