[Datetime-SIG] Another round on error-checking

Alexander Belopolsky alexander.belopolsky at gmail.com
Tue Sep 1 17:59:15 CEST 2015


On Mon, Aug 31, 2015 at 9:01 PM, Tim Peters <tim.peters at gmail.com> wrote:

> [Alex]
> > It's getting late in my TZ, but what you are saying below sounds like a
> > complaint that if you put t=second 01:30 as a key in the dictionary, you
> > cannot later retrieve it by looking up t.astimezone(timezone.utc).
>
> I don't grasp that.  What I am saying should be very clear:  under all
> schemes so far, there are datetimes x and y such that x == y but
> hash(x) != hash(y).  You see that or you don't.  If you don't, I'll
> keep trying until you do ;-)  So do you see that?
>
>
I do (in the morning.)

> ..
>
[Alex]

>
> > Maybe if we decide to do something with the arithmetic, we will be able
> to
> > fix this wart as well.
>

[Tim]

> Doubt it - this has nothing to do with arithmetic I can see.  It's a
> consequence of wanting to ignore `fold` in contexts where it really
> does make a difference.  __hash__() is one such place.
>

Arithmetic and comparisons are intertwined as long as you require that
not(a - b) ⇔ a==b.  The main problem for hash as I see it is that x == y
may or may not call x.utcoffset() depending on the value of y.  This is a
problem for hash(x) which should decide whether

>
> Like I said at the start, it's a puzzle.
>

Let's formulate the puzzle: Define datetime.__hash__ so that given PEP 495
rules for datetime.__eq__, x == y implies hash(x) == hash(y).

First (trivial) observation:  a solution exists, e.g. hash(x) == 0.  This
is not a very practical solution, but shows that the puzzle is not a
logical impossibility.

Second observation: We cannot improve on hash(x) == 0 without some
knowledge of what timezones are known to the system. Proof: let u1 < u2 be
two arbitrary UTC times.   We can always construct a timezone (FoldZone)
where u1 and u2 map to the same local time.  All we need to do is to create
a fold of size u2 - u1 at some time u between u1 and u2.  Let t1 =
u1.astimezone(FoldZone)
and t2 = u2.astimezone(FoldZone).  By construction, t1 == t2, t1.fold = 0
and t2.fold = 1.  If x == y implies hash(x) == hash(y), then u1 == t1
implies hash(u1) == hash(t1) and similarly u2 == t2 implies hash(u2) ==
hash(t2) and t1 == t2 implies hash(t1) == hash(t2).   Since hash values are
integers and == is transitive for integers, from a chain hash(u1) ==
hash(t1), hash(t1) == hash(t2), hash(t2) == hash(u2 )we conclude that
hash(u1) == hash(u2) and therefore the only solution is hash(x) == const.

This sounds discouraging, but note that the FoldZone that we constructed is
rather unrealistic because depending on the values of u1 and u2, the size
of the fold can range from microseconds to centuries.

Third observation:  If we have only one variable offset timezone (Local),
then we can solve the problem by defining datetime.__hash__(self) as for
example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1,
tzinfo=Local)).  Note that in the last expression, hash is taken of a
timedelta object, so the definition is not circular.  (A proof that x == y
implies hash(x) == hash(y) in this case is left as an exercise for the
reader.:-)

Fourth observation:  A solution for one variable offset timezone
generalizes to the case of an arbitrary number of such timezones.  A
theoretical construction is to simply iterate x =
x.astimezone(Zone).replace(fold=0) over all the zones known to the system,
but certainly a more efficient algorithm can be devised to to achieve the
same result in a single lookup in a specially crafted table.

So the puzzle is not unsolvable, but how much of it has to be solved in PEP
495?  I would say not much.  I agree with Tim that non-transitivity of ==
and the violation of the hash invariant need to be mentioned in the PEP.
However, since PEP 495 by itself does not introduce any new tzinfo
implementations and the existing fixed offset timezones don't suffer from
this problem, I think we can leave the final resolution to the
timezone.local or the zoneinfo PEP.

An important lesson is in the second observation.  To solve the hash
puzzle, we need to have a global view of the totality of timezones that
will be supported by the system.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/datetime-sig/attachments/20150901/cda02d1f/attachment.html>


More information about the Datetime-SIG mailing list