[Datetime-SIG] Another approach to 495's glitches

Tim Peters tim.peters at gmail.com
Sun Sep 6 03:22:04 CEST 2015

Thinking out loud.  Right now, we're making interzone arithmetic
consistent at the expense of making intrazone operations baffling in
some fold edge cases.  I'd like to see if we could reverse that.
Partly because datetime "shouldn't have" supported by-magic interzone
arithmetic to begin with.  But mostly because, outside of Python's
test suite, I've never seen an instance of by-magic interzone
comparison or subtraction (it's certain none of my code ever used it,
and I've never seen it elsewhere in real code I can recall).

So, compared to what Python does today:

1. Intrazone.

Go back to what the first 495 stab did:  ignore fold entirely (act as
if it were always 0), including in hash().

2. Interzone.

A. Subtraction.  Change nothing.

B. Comparison.
B1. __eq__.  If either operand has fold=1, return False.
B2. __ne__.  If either operand has fold=1, return True.
B3. The others.  Change nothing.
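
To make rules 1 and 2B concrete, here is a rough sketch of them as plain
functions.  Everything named here is made up for illustration:
`ToyZone` is a hypothetical fold-aware tzinfo with a single repeated
hour, and `proposed_eq` / `proposed_ne` are stand-ins for what
`__eq__` / `__ne__` would do, not datetime's real methods.  It assumes
Python 3.6+ (where `fold` exists) and that both operands are aware.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 02:00 wall time on 2015-11-01,
    UTC-5 afterwards, so wall times 01:00-01:59 occur twice."""
    FOLD_START = datetime(2015, 11, 1, 1)
    FOLD_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return -4 * HOUR
        if wall >= self.FOLD_END or dt.fold:
            return -5 * HOUR
        return -4 * HOUR  # first pass through the repeated hour

    def dst(self, dt):
        return self.utcoffset(dt) + 5 * HOUR

    def tzname(self, dt):
        return "ToyZone"

def proposed_eq(a, b):
    if a.tzinfo is b.tzinfo:
        # Rule 1 (intrazone): naive comparison, fold plays no part.
        return a.replace(tzinfo=None) == b.replace(tzinfo=None)
    if a.fold or b.fold:
        # Rule 2B1 (interzone): fold=1 is never equal to anything.
        return False
    # Otherwise compare as instants, via interzone subtraction (2A).
    return a - b == timedelta(0)

def proposed_ne(a, b):
    # Rule 2B2 falls out as the negation of proposed_eq.
    return not proposed_eq(a, b)

zone = ToyZone()
early = datetime(2015, 11, 1, 1, 30, tzinfo=zone)
late = early.replace(fold=1)
uearly = early.astimezone(timezone.utc)
ulate = late.astimezone(timezone.utc)
```

With these, `proposed_eq(uearly, early)` and `proposed_eq(early, late)`
are true, while `proposed_eq(late, ulate)` is false.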

The hash problem goes away, because equality transitivity is restored
in the cases it matters for the hash problem (under 2B1 a datetime
with fold=1 never compares equal to any datetime in a different zone).
Before (first 495 stab) we had, where `early` and `late` are the same
except for `fold`:

    uearly = early.astimezone(utc)
    ulate = late.astimezone(utc)

and then:

    uearly == early == late == ulate
    uearly < ulate
    hash(uearly) == hash(early) == hash(late)
    hash(ulate) almost certainly != to those,
        despite late == ulate

That made a high-quality & correct hash() exceedingly painful.  Now
(current 495 stab) we have:

    uearly == early < late == ulate
    hash(uearly) == hash(early)
    hash(ulate) == hash(late)

No problem there, but "early < late" within the zone is so at odds
with "naive time" that various kinds of endcase backward
incompatibility snuck in (some of which were explained in great detail
in messages between Carl and me).  It "looks nice" because we _are_
favoring by-magic interzone consistency at the expense of everything
else.  In endcases sticking within the zone, it doesn't always "look
nice" at all.

Under 2B1 and 2B2:

    uearly == early == late != ulate
    uearly < ulate
    hash(uearly) == hash(early) == hash(late)
    hash(ulate) almost certainly != to those,
        but that's fine since late != ulate,
                              early != ulate, and
                              uearly != ulate
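
A small sketch of a hash that ignores fold, as rule 1 asks, may help
show why the three equal values above hash alike.  `proposed_hash` and
`ToyZone` are hypothetical names for illustration only; the sketch
assumes Python 3.6+ and aware operands.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 02:00 wall time on 2015-11-01,
    UTC-5 afterwards, so wall times 01:00-01:59 occur twice."""
    FOLD_START = datetime(2015, 11, 1, 1)
    FOLD_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return -4 * HOUR
        if wall >= self.FOLD_END or dt.fold:
            return -5 * HOUR
        return -4 * HOUR

    def dst(self, dt):
        return self.utcoffset(dt) + 5 * HOUR

    def tzname(self, dt):
        return "ToyZone"

def proposed_hash(dt):
    # Act as if fold were always 0, then hash the resulting UTC time.
    base = dt.replace(fold=0)
    return hash(base.replace(tzinfo=None) - base.utcoffset())

zone = ToyZone()
early = datetime(2015, 11, 1, 1, 30, tzinfo=zone)
late = early.replace(fold=1)
uearly = early.astimezone(timezone.utc)
ulate = late.astimezone(timezone.utc)

same = proposed_hash(uearly) == proposed_hash(early) == proposed_hash(late)
```

Here `same` is true.  `ulate` maps to a different UTC time, so its hash
will almost certainly differ, which is harmless because nothing
compares equal to it across zones.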

What we lose is:

A. trichotomy in interzone comparison in rare cases.  Right above, we
have late != ulate, but we do _not_ have late < ulate or late > ulate
either.  We're forcing __eq__ to say they're not equal, despite that
otherwise comparison logic would say they are equal.

B. equivalence between interzone comparison and interzone subtraction
in rare cases.  Right above, we have late - ulate == 0 despite that
late != ulate.

C. equality transitivity in rare cases that don't affect the hash
problem.  Right above, `late` has fold=1 so 2B2 says it's not equal to
`uearly` or `ulate` (it's "not equal" to _any_ datetime in UTC).
However, we also have uearly == early == late, from which we could
normally infer uearly == late.

D. zone conversion isn't wholly order-preserving.  Right above, the
ambiguous times compare equal in their own zone, but map to != values
in UTC.  `early` and `late` are equal in their own zone but not in any
other zone where neither ends up with fold=1.
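
Losses A, B and D can be poked at directly.  This sketch uses a
hypothetical `ToyZone` tzinfo with one repeated hour, relies on the
ordering and subtraction behavior of Python 3.6+, and models the
forced inequality of 2B2 with a plain fold check rather than a real
`__ne__`.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 02:00 wall time on 2015-11-01,
    UTC-5 afterwards, so wall times 01:00-01:59 occur twice."""
    FOLD_START = datetime(2015, 11, 1, 1)
    FOLD_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return -4 * HOUR
        if wall >= self.FOLD_END or dt.fold:
            return -5 * HOUR
        return -4 * HOUR

    def dst(self, dt):
        return self.utcoffset(dt) + 5 * HOUR

    def tzname(self, dt):
        return "ToyZone"

zone = ToyZone()
early = datetime(2015, 11, 1, 1, 30, tzinfo=zone)
late = early.replace(fold=1)
uearly = early.astimezone(timezone.utc)
ulate = late.astimezone(timezone.utc)

# A: 2B2 forces late != ulate, yet neither ordering holds.
forced_unequal = bool(late.fold or ulate.fold)
no_order = not (late < ulate) and not (late > ulate)

# B: interzone subtraction still says they are the same instant.
zero_gap = (late - ulate) == timedelta(0)

# D: equal within the zone, unequal after conversion to UTC.
equal_in_zone = early == late
unequal_in_utc = uearly != ulate
```

All five names end up true in this setup.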

So, until I find something I missed ;-) , all the rare endcase
surprises are pushed into interzone operations I doubt are used much
(if at all).  Seems better than putting them in routinely used
intrazone operations.

For the docs, the spiel would be along the lines that fold=1 is a new
case, and for technical reasons an aware datetime with fold=1 can't
compare equal to any datetime in any other zone.  That's "really" all
this amounts to.  Apps that need interzone comparison or subtraction
should convert to UTC instead.  Then everything will work fine.  I'd
also say that by-magic interzone comparison and subtraction may be
deprecated someday.  Something to discourage its use.  Especially
because, in fact, I bet it's barely (if ever) used now.
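
The "convert to UTC instead" advice might look like this in an app.
`ToyZone` and `utc_key` are hypothetical names, and the sketch assumes
Python 3.6+ with aware datetimes throughout.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 before 02:00 wall time on 2015-11-01,
    UTC-5 afterwards, so wall times 01:00-01:59 occur twice."""
    FOLD_START = datetime(2015, 11, 1, 1)
    FOLD_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.FOLD_START:
            return -4 * HOUR
        if wall >= self.FOLD_END or dt.fold:
            return -5 * HOUR
        return -4 * HOUR

    def dst(self, dt):
        return self.utcoffset(dt) + 5 * HOUR

    def tzname(self, dt):
        return "ToyZone"

def utc_key(dt):
    # Do the zone conversion explicitly, then compare within UTC.
    return dt.astimezone(timezone.utc)

zone = ToyZone()
early = datetime(2015, 11, 1, 1, 30, tzinfo=zone)
late = early.replace(fold=1)
# 07:30 at UTC+1 is 06:30 UTC, the same instant as `late`.
other = datetime(2015, 11, 1, 7, 30, tzinfo=timezone(HOUR))

same_instant = utc_key(late) == utc_key(other)
earlier = utc_key(early) < utc_key(other)
gap = utc_key(other) - utc_key(early)
```

Once both sides are in UTC, equality, ordering and subtraction agree
with each other: `same_instant` and `earlier` are true, and `gap` is
one hour.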

Someone else's turn now ;-)

PS:  not quite yet.  All the examples above assumed PEP 495-compliant
tzinfos were in use.  As detailed in a message with Carl, there are
also "backward compatibility" issues to consider after 495 is
implemented but pre-495 tzinfos are used.  Making early < late can cause
endcase surprises there too.  Under the idea here, as in the first 495
stab, those surprises go away again, because _nothing_ within a zone
will "see fold=1", not even the tzinfo (remember, it's a pre-495
tzinfo in this case).
