[Datetime-SIG] Another round on error-checking

Tim Peters tim.peters at gmail.com
Mon Aug 31 22:17:27 CEST 2015

>>     def __hash__(self):
>>         if self._hashcode == -1:
>>             tzoff = self.utcoffset()
>>             if tzoff is None:
>>                 self._hashcode = hash(
>>                     self.replace(first=True)._getstate()[0])
>>             else:
>>                 days = _ymd2ord(self.year, self.month, self.day)
>>                 seconds = (self.hour * 3600 + self.minute * 60 +
>>                            self.second)
>>                 self._hashcode = hash(
>>                     timedelta(days, seconds, self.microsecond) - tzoff)
>>         return self._hashcode
>>
>> So it's the case that two datetimes that compare true may have
>> different hashes, when they represent the earlier and later times in a
>> fold.  I didn't say "it's a puzzle" lightly ;-)

> Yes, it looks like I have a bug there, but isn't fixing it just a matter of
> moving self.replace(first=True) up two lines?  Is there a bigger puzzle?
> Certainly x == y ⇒ hash(x) == hash(y) is the implication that I intend to
> preserve in all cases.

Yes, there's a bigger puzzle:  datetimes expressed in different
timezones can also compare equal.  Conceptually, they're converted to
UTC before comparison - and so also, to maintain the crucial hash
invariant, before being hashed.  That can't work right without using
their actual UTC offsets (i.e., `first` can't be ignored for interzone
equality, but would be ignored for hashes if forcing `first` to 1 were
done before extracting the offset).
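
To make the interzone point concrete, here is a small sketch using only
the stdlib.  The fixed-offset zones and the helper name `utc_delta` are
made up for illustration; `utc_delta` just mirrors the arithmetic in the
quoted __hash__ (using the public toordinal() in place of the private
_ymd2ord) and is not the actual implementation.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset zones stand in for real tzinfos (illustrative only).
eastern = timezone(timedelta(hours=-4), "EDT")
central = timezone(timedelta(hours=-5), "CDT")

x = datetime(2015, 8, 31, 16, 17, tzinfo=eastern)
y = datetime(2015, 8, 31, 15, 17, tzinfo=central)

# Different wall-clock fields, same instant (20:17 UTC), so they compare equal.
same_instant = (x == y)

# Equal objects must hash equal; that requires each object's own UTC offset.
same_hash = (hash(x) == hash(y))


def utc_delta(dt):
    """Hypothetical helper: local time as a timedelta, minus the UTC offset."""
    local = timedelta(days=dt.toordinal(),
                      seconds=dt.hour * 3600 + dt.minute * 60 + dt.second,
                      microseconds=dt.microsecond)
    return local - dt.utcoffset()
```

If the offset were taken from some altered copy of the object instead
of the object itself, utc_delta(x) and utc_delta(y) could disagree for
times in a fold even though x == y.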

The real problem here is that this stuff just barely managed to work
from the start ;-)  In effect, for the purpose of hashing, _all_
datetimes are converted to UTC first now.  That didn't interfere with
the "naive time" view before because all possible insanities were
blithely ignored in all contexts before.

The easiest way out of this particular puzzle is, I believe, to say
that two datetimes identical except for `fold` do _not_ compare equal.
`fold` breaks the tie in the obvious way (the one with fold==1 is
"greater").  Then __hash__ can continue using the real UTC offset,
just as before.  If a user doesn't force `fold` to 1, then no existing
code will change behavior, at least until they start using PEP 495
tzinfos.  Then `fold=1` can start appearing "by magic" via .fromutc().
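
A rough sketch of the tie-breaking rule, written as a sort key rather
than as a change to datetime itself.  It assumes a Python whose datetime
already carries a `fold` attribute (3.6+), and `fold_aware_key` is a
hypothetical name; this only illustrates the proposal above and is not
how the stdlib compares datetimes.

```python
from datetime import datetime


def fold_aware_key(dt):
    """Hypothetical key: wall-clock fields first, then fold breaks the tie."""
    # Strip tzinfo and fold so the first element is the bare wall-clock time.
    return (dt.replace(tzinfo=None, fold=0), dt.fold)


# Two naive datetimes identical except for fold.
earlier = datetime(2015, 11, 1, 1, 30, fold=0)
later = earlier.replace(fold=1)

# Under the proposed rule the keys differ, and the fold=1 one is "greater".
keys_differ = fold_aware_key(earlier) != fold_aware_key(later)
fold_one_is_greater = fold_aware_key(earlier) < fold_aware_key(later)
ordered_folds = [d.fold for d in sorted([later, earlier], key=fold_aware_key)]
```

With unequal objects there is no hash invariant to violate, which is
why __hash__ could keep using the real UTC offset under this rule.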

