[Datetime-SIG] Another round on error-checking
Guido van Rossum
guido at python.org
Tue Sep 1 18:12:53 CEST 2015
I could not accept a PEP that leads to different datetime being considered
== but having a different hash (*unless* due to a buggy tzinfo subclass
implementation -- however no historical timezone data should ever depend on
such a bug).
I'm much less concerned about < being intransitive in edge cases. I also
don't particularly care about == following from the difference being zero.
Still, unless we're constrained by backward compatibility, I would rather
not add equivalence between *any* two datetimes whose tzinfo is not the
same object -- even if we can infer that they both must refer to the same
instant.
On Tue, Sep 1, 2015 at 8:59 AM, Alexander Belopolsky <
alexander.belopolsky at gmail.com> wrote:
>
>
> On Mon, Aug 31, 2015 at 9:01 PM, Tim Peters <tim.peters at gmail.com> wrote:
>
>> [Alex]
>> > It's getting late in my TZ, but what you are saying below sounds like a
>> > complaint that if you put t=second 01:30 as a key in the dictionary, you
>> > cannot later retrieve it by looking up t.astimezone(timezone.utc).
>>
>> I don't grasp that. What I am saying should be very clear: under all
>> schemes so far, there are datetimes x and y such that x == y but
>> hash(x) != hash(y). You see that or you don't. If you don't, I'll
>> keep trying until you do ;-) So do you see that?
>>
>>
> I do (in the morning.)
>
>> ..
>>
> [Alex]
>
>>
>> > Maybe if we decide to do something with the arithmetic, we will be able
>> to
>> > fix this wart as well.
>>
>
> [Tim]
>
>> Doubt it - this has nothing to do with arithmetic I can see. It's a
>> consequence of wanting to ignore `fold` in contexts where it really
>> does make a difference. __hash__() is one such place.
>>
>
> Arithmetic and comparisons are intertwined as long as you require that
> not(a - b) ⇔ a==b. The main problem for hash as I see it is that x == y
> may or may not call x.utcoffset() depending on the value of y. This is a
> problem for hash(x) which should decide whether
>
>>
>> Like I said at the start, it's a puzzle.
>>
>
> Let's formulate the puzzle: Define datetime.__hash__ so that given PEP 495
> rules for datetime.__eq__, x == y implies hash(x) == hash(y).
>
> First (trivial) observation: a solution exists, e.g. hash(x) == 0. This
> is not a very practical solution, but shows that the puzzle is not a
> logical impossibility.
>
> Second observation: We cannot improve on hash(x) == 0 without some
> knowledge of what timezones are known to the system. Proof: let u1 < u2 be
> two arbitrary UTC times. We can always construct a timezone (FoldZone)
> where u1 and u2 map to the same local time. All we need to do is to create
> a fold of size u2 - u1 at some time u between u1 and u2. Let t1 =
> u1.astimezone(FoldZone)
> and t2 = u2.astimezone(FoldZone). By construction, t1 == t2, t1.fold = 0
> and t2.fold = 1. If x == y implies hash(x) == hash(y), then u1 == t1
> implies hash(u1) == hash(t1) and similarly u2 == t2 implies hash(u2) ==
> hash(t2) and t1 == t2 implies hash(t1) == hash(t2). Since hash values are
> integers and == is transitive for integers, from a chain hash(u1) ==
> hash(t1), hash(t1) == hash(t2), hash(t2) == hash(u2 )we conclude that
> hash(u1) == hash(u2) and therefore the only solution is hash(x) == const.
>
> This sounds discouraging, but note that the FoldZone that we constructed
> is rather unrealistic because depending on the values of u1 and u2, the
> size of the fold can range from microseconds to centuries.
>
> Third observation: If we have only one variable offset timezone (Local),
> then we can solve the problem by defining datetime.__hash__(self) as for
> example, hash(self.astimezone(Local).replace(fold=0) - datetime(1, 1, 1,
> tzinfo=Local)). Note that in the last expression, hash is taken of a
> timedelta object, so the definition is not circular. (A proof that x == y
> implies hash(x) == hash(y) in this case is left as an exercise for the
> reader.:-)
>
> Fourth observation: A solution for one variable offset timezone
> generalizes to the case of an arbitrary number of such timezones. A
> theoretical construction is to simply iterate x =
> x.astimezone(Zone).replace(fold=0) over all the zones known to the system,
> but certainly a more efficient algorithm can be devised to to achieve the
> same result in a single lookup in a specially crafted table.
>
> So the puzzle is not unsolvable, but how much of it has to be solved in
> PEP 495? I would say not much. I agree with Tim that non-transitivity of
> == and the violation of the hash invariant need to be mentioned in the
> PEP. However, since PEP 495 by itself does not introduce any new tzinfo
> implementations and the existing fixed offset timezones don't suffer from
> this problem, I think we can leave the final resolution to the
> timezone.local or the zoneinfo PEP.
>
> An important lesson is in the second observation. To solve the hash
> puzzle, we need to have a global view of the totality of timezones that
> will be supported by the system.
>
> _______________________________________________
> Datetime-SIG mailing list
> Datetime-SIG at python.org
> https://mail.python.org/mailman/listinfo/datetime-sig
> The PSF Code of Conduct applies to this mailing list:
> https://www.python.org/psf/codeofconduct/
>
--
--Guido van Rossum (python.org/~guido)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/datetime-sig/attachments/20150901/8fd613c7/attachment.html>
More information about the Datetime-SIG
mailing list