[Tim]
>> Speaking of which, the current tzinfo API has no way to ask "is this
>> an ambiguous time?"
[Alexander Belopolsky]
> I was hoping that we would agree on the name of the flag before
> someone asks this question. :-)
You doubtless noted that I called it "first" near the end of my
message without putting up any stink at all ;-)
> With my proposal, a naive datetime t is ambiguous in timezone tz if
>
> tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
>
> or "is this an invalid (missing) time?"
Unless I'm missing your intent entirely, that's a fine illustration of
my "The logic is bound to be annoying enough that we'd want to
concentrate it in tzstrict". The problem I see is that the expression
you gave can never be true. The math is indeed trivial, but a key
part works "the opposite" of how even people who've thought a lot
about it "instinctively" believe.
Two cases:
1. t.first is False. Then the expression obviously returns false (the
LHS and RHS are applied to two datetimes all of whose components -
including .first - are the same, so both utcoffset()s return the same
value, and "<" is false because they're equal). But if t.first
(False) is telling the truth, t _is_ the later of two ambiguous times.
So we wanted an expression that returned true in that subcase. The
result is correct only when t.first being False is lying (i.e., when t
is not the later of two ambiguous times, but t.first is False despite
that).
2. t.first is True.
2a. And t is not an ambiguous time. Then I expect the two
utcoffset()s return the same value, and the expression correctly
returns False.
2b. And t is an ambiguous time. Then t is the earlier of the two
times (that's what t.first is True means in this case), and the
constructed datetime is the later. Obviously the earlier time should
compare less than the later time, but that's not what's being
compared. The offsets _from_ UTC are being compared, and it's the
earlier time that has the _greater_ offset (that's the part 90% of
people "instinctively" get backwards). So again the expression
returns False incorrectly (although would be correct in this case if
">" were used instead - but then 90% of people would instinctively
think the logic is backwards).
So in all cases the expression computes False - unless I'm missing
your intent entirely (in which case I trust you won't be shy about
enlightening me :-) ).
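The case analysis above can be checked mechanically. What follows is only a minimal sketch, assuming a toy zone with a single fall-back transition in which the local hour 01:00-02:00 occurs twice (first at UTC-4, then at UTC-5). The utcoffset(t, first) helper, its constants, and proposed_test() are hypothetical stand-ins for tz.utcoffset(t) on a datetime carrying a `first` flag; none of them is a real API.

```python
from datetime import datetime, timedelta

# Hypothetical toy zone: on 2015-11-01 the local hour 01:00-02:00 happens twice.
FOLD_START = datetime(2015, 11, 1, 1)
FOLD_END = datetime(2015, 11, 1, 2)
DAYLIGHT = timedelta(hours=-4)   # offset in effect before the transition
STANDARD = timedelta(hours=-5)   # offset in effect after the transition


def utcoffset(t, first):
    """Stand-in for tz.utcoffset(t) where t carries a `first` flag."""
    if FOLD_START <= t < FOLD_END:
        return DAYLIGHT if first else STANDARD
    return DAYLIGHT if t < FOLD_START else STANDARD


def proposed_test(t, first):
    # Models: tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
    return utcoffset(t, first) < utcoffset(t, False)


ambiguous = datetime(2015, 11, 1, 1, 30)
ordinary = datetime(2015, 11, 1, 12, 0)

results = {
    "case 1 (ambiguous, first=False)": proposed_test(ambiguous, False),
    "case 2a (ordinary, first=True)": proposed_test(ordinary, True),
    "case 2b (ambiguous, first=True)": proposed_test(ambiguous, True),
}
for label, value in results.items():
    print(label, value)
```

In this toy model every case prints False, while swapping "<" for ">" would make case 2b (and only that case) come out True.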
Why do people get this backwards? I've pondered that off & on for a
long time. I think it goes like this: at a given UTC time u, then,
say, u+1 is obviously an earlier time than u+2. So the greater the
offset the later the time. That's intuitively obvious. What it
wholly misses is that it's got nothing to do with what we're _trying_
to ask ;-)
We're trying to ask about how times act on a non-UTC clock. In the
bogus reasoning, u+1 and u+2 look an hour apart on the UTC clock, so
are irrelevant to the real question. When looking at a non-UTC clock,
the offsets have to be _subtracted_ from that clock's idea of time to
determine corresponding UTC time, and it's the negation that reverses
the sense of the comparison needed. For an ambiguous local time T:
offset1 < offset2
    if and only if (negate, which also reverses the direction of comparison)
-offset1 > -offset2
    if and only if (add T to both sides)
T - offset1 > T - offset2
    if and only if (and now we have the UTC equivalents)
UTCtime1 > UTCtime2
So we can't expect most people to get this right.
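The same chain with concrete numbers may help. The values below are made up for illustration only: an ambiguous local time T of 01:30 and candidate offsets of -5 and -4 hours.

```python
from datetime import datetime, timedelta

T = datetime(2015, 11, 1, 1, 30)   # ambiguous local wall-clock time (assumed)
offset1 = timedelta(hours=-5)
offset2 = timedelta(hours=-4)

# UTC time = local time minus the offset from UTC.
utc1 = T - offset1
utc2 = T - offset2

print(offset1 < offset2)   # the smaller offset ...
print(utc1 > utc2)         # ... corresponds to the later UTC time
print(utc1, utc2)
```

Both comparisons hold at once, which is exactly the reversal the derivation describes.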
Wouldn't this work? t is ambiguous if and only if
tz.utcoffset(t) != tz.utcoffset(t.replace(first=not t.first))
That is, t is ambiguous iff the value of t.first makes a difference to
the offset. I expect people _could_ get that right most of the time,
but may have trouble remembering "the trick". But nobody could screw
up what, say, a new tz.is_ambiguous(t) means.
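A sketch of what such an is_ambiguous() could look like follows. It rests on several assumptions: the `first` flag under discussion does not exist, so the sketch borrows the `fold` attribute that datetime objects have in Python 3.6 and later, treating fold=0 as playing the role of first=True. ToyZone and its single hard-coded repeated hour are hypothetical, and is_ambiguous() is written as a free function rather than a tzinfo method.

```python
from datetime import datetime, timedelta, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone with one repeated hour: 01:00-02:00 on 2015-11-01."""

    FOLD_START = datetime(2015, 11, 1, 1)
    FOLD_END = datetime(2015, 11, 1, 2)

    def utcoffset(self, dt):
        naive = dt.replace(tzinfo=None)
        if self.FOLD_START <= naive < self.FOLD_END:
            # fold=0 is the first occurrence, fold=1 the second.
            return timedelta(hours=-4 if dt.fold == 0 else -5)
        return timedelta(hours=-4 if naive < self.FOLD_START else -5)

    def dst(self, dt):
        return None   # not modelled in this sketch

    def tzname(self, dt):
        return "TOY"


def is_ambiguous(tz, t):
    """True iff flipping the flag changes the offset reported for t."""
    return tz.utcoffset(t) != tz.utcoffset(t.replace(fold=1 - t.fold))


tz = ToyZone()
print(is_ambiguous(tz, datetime(2015, 11, 1, 1, 30)))          # earlier reading
print(is_ambiguous(tz, datetime(2015, 11, 1, 1, 30, fold=1)))  # later reading
print(is_ambiguous(tz, datetime(2015, 11, 1, 12, 0)))          # ordinary time
```

Because the test uses "!=" and flips whatever value the flag already has, it gives the same answer for both readings of an ambiguous time.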
> I was hoping to sneak in a rule that for an invalid time t
>
> tz.utcoffset(t) > tz.utcoffset(t.replace(first=False))
I don't want to try to figure out what that _really_ does, although as
noted at the end of case 2b above that expression returns True when
t.first is True and t is in fact the earlier of two ambiguous times.
Because local "missing times" have no spelling in UTC, I doubt there's
any way for simple .utcoffset() expressions to detect one reliably.
IIRC, the Python docs say nothing whatsoever about how missing times
are, or "should be", handled in conversion.
But if the tzinfo class has any intelligence about the rules it's
implementing, it should be easy for a new tz.is_missing_time(t) method
to apply that intelligence.
Or, say, just a single new tz.classify(t) method returning, say, an
or'ing of flags from these two sets:
# set 1 - exactly one will be in the result
TZ_HAPPY_TIME = 1
TZ_MISSING_TIME = 2
TZ_AMBIGUOUS_TIME = 4
# set 2 - at most one will be in the result, and none with TZ_HAPPY_TIME
TZ_DUE_TO_DST_TRANSITION = 64
TZ_DUE_TO_BASE_OFFSET_TRANSITION = 128
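As a rough illustration of how a classify() built on these flags might behave, here is a minimal sketch. The GAP and FOLD intervals are hypothetical rules for one year of a toy zone, and classify() is a free function here rather than a tzinfo method; a real implementation would consult whatever rules the tzinfo class actually knows.

```python
from datetime import datetime

# set 1 - exactly one will be in the result
TZ_HAPPY_TIME = 1
TZ_MISSING_TIME = 2
TZ_AMBIGUOUS_TIME = 4
# set 2 - at most one will be in the result, and none with TZ_HAPPY_TIME
TZ_DUE_TO_DST_TRANSITION = 64
TZ_DUE_TO_BASE_OFFSET_TRANSITION = 128

# Hypothetical rules, expressed as local wall-clock intervals [start, end).
GAP = (datetime(2015, 3, 8, 2), datetime(2015, 3, 8, 3))     # skipped hour
FOLD = (datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2))  # repeated hour


def classify(t):
    """Return one flag from set 1, or'ed with at most one flag from set 2."""
    if GAP[0] <= t < GAP[1]:
        return TZ_MISSING_TIME | TZ_DUE_TO_DST_TRANSITION
    if FOLD[0] <= t < FOLD[1]:
        return TZ_AMBIGUOUS_TIME | TZ_DUE_TO_DST_TRANSITION
    return TZ_HAPPY_TIME


kind = classify(datetime(2015, 3, 8, 2, 30))
print(bool(kind & TZ_MISSING_TIME), bool(kind & TZ_DUE_TO_DST_TRANSITION))
```

Callers test individual bits with "&", so a result can say both what kind of time it is and why.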
> (I really don't want tz.utcoffset(t) to ever raise an exception)
Me neither.
> and of course, for most of the times
>
> tz.utcoffset(t) == tz.utcoffset(t.replace(first=False))
Agreement at last ;-) Although I'd spell it
tz.utcoffset(t) == tz.utcoffset(t.replace(first=not t.first))
as a pretty direct translation of "the value of t.first makes no difference".
>> The most important new question callers will want to resolve is "what should
>> `first` (aka is_dst) be now?".
> I want most callers
Gloss: by "callers" I mean not just Python users, but also people
_implementing_ the new stuff. Perhaps you do too.
> to be able to get away with not knowing that
> `first` exists and consistently get the earlier time from an ambiguous
> input and some "nearby" time from an invalid input.
In the case of a missing time, it's reasonable to guess they
definitely intended a time later than the closest preceding (on the
local clock) valid time. It's also reasonable to guess they
definitely intended a time earlier than the closest succeeding (on the
local clock) valid time. Happily, there is no possible local time
satisfying both ;-) But there's no sensible way to compute either
without knowing "the rules" (did DST cause us to miss an hour? an
hour and 30 minutes? just 15 minutes? did a politician decree we
lost 2 hours? in any case, how long ago did time _start_ to go
missing? and when will it stop going missing?). Seems again a case
that requires some intelligence _in_ the tzstrict class, not heroic
efforts by callers restricted to utcoffset() alone.
> A careful application will have to call tz.utcoffset() with both values of the
> flag and either warn about the default choice or ask the user for an
> additional input.
As above, how can one programmatically pick a valid default when faced
with a missing time? The zoneinfo-like databases express this stuff
by giving parameters for a specific algorithm. One relatively simple
rule can cover a vast span of time. "It's easy" for code that _knows_
that stuff about the timezone. If all the programmer can know is
.utcoffset() results at specific instants of time, I expect the best
they can do is loop, incrementing (or decrementing) by a naive minute
at a time (Python restricts UTC offsets to multiples of a minute),
until they find a roundtrip fixed point (i.e., the nearest local time
that "gets itself back" when converted to UTC and back again).
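The minute-at-a-time search for a roundtrip fixed point can be sketched as follows. Everything here is an assumption made for illustration: the toy zone has a single spring-forward transition (02:00 jumps to 03:00 local), utcoffset() reports the pre-transition offset for times inside the gap, and from_utc() is a hypothetical stand-in for the UTC-to-local conversion.

```python
from datetime import datetime, timedelta

MINUTE = timedelta(minutes=1)
TRANSITION_UTC = datetime(2015, 3, 8, 7)   # clocks jump 02:00 -> 03:00 local
GAP_END_LOCAL = datetime(2015, 3, 8, 3)
STANDARD, DAYLIGHT = timedelta(hours=-5), timedelta(hours=-4)


def utcoffset(local):
    """What a caller sees: the offset reported for a naive local time."""
    return STANDARD if local < GAP_END_LOCAL else DAYLIGHT


def from_utc(utc):
    """UTC -> local conversion for the toy zone."""
    return utc + (STANDARD if utc < TRANSITION_UTC else DAYLIGHT)


def roundtrips(local):
    """True iff local -> UTC -> local gets the same local time back."""
    return from_utc(local - utcoffset(local)) == local


def nearest_valid(local, step=MINUTE, limit=24 * 60):
    """Walk one naive minute at a time until a roundtrip fixed point is found."""
    for _ in range(limit):
        if roundtrips(local):
            return local
        local += step
    raise ValueError("no valid time found within the search limit")


missing = datetime(2015, 3, 8, 2, 30)
later = nearest_valid(missing)              # search forward
earlier = nearest_valid(missing, -MINUTE)   # search backward
print(earlier, later)
```

The loop needs an arbitrary search limit because a caller without the rules cannot know how long the gap is.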
Isaac earlier sketched a mathematical framework for a different
approach to computing UTC offsets, which explicitly materialized that
it's a function made up of a sequence of continuous monotonically
increasing functions ("jumps in time" are discontinuities in the
range, and that's what separates one function from the next). The
start and end of each function's domain is explicit, and so then also
are the start and end of each function's image. This makes pretty
much all conceivable elementary questions solvable by, at worst, forms
of binary search (e.g., "is this a missing time", "if so, what's the
next closest valid time (in either direction)", "how long until the
next transition of any kind?", "how many transitions of any kind
occurred in the past 1000 years?" ...).
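To give a flavor of how binary search answers such questions, here is a loose sketch. It is one possible reading of the idea, not Isaac's actual formulation. The PIECES table is hypothetical data, each piece is reduced to a constant offset over a UTC interval, and the sketch assumes that the images of only adjacent pieces can overlap.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Each piece: (UTC start of its domain, offset in effect). Hypothetical data.
# A piece's domain ends where the next one starts; the last one is unbounded.
PIECES = [
    (datetime(2014, 11, 2, 6), timedelta(hours=-5)),
    (datetime(2015, 3, 8, 7), timedelta(hours=-4)),
    (datetime(2015, 11, 1, 6), timedelta(hours=-5)),
]
# Local time at which each piece's image starts (sorted for this data).
IMAGE_STARTS = [start + off for start, off in PIECES]


def image(i):
    """Local interval [lo, hi) covered by piece i; hi is None if unbounded."""
    start, off = PIECES[i]
    hi = PIECES[i + 1][0] + off if i + 1 < len(PIECES) else None
    return start + off, hi


def pieces_containing(local):
    """Indices of pieces whose image contains `local` (0, 1 or 2 of them)."""
    j = bisect_right(IMAGE_STARTS, local) - 1
    hits = []
    for i in (j - 1, j):   # only adjacent images are assumed to overlap
        if 0 <= i < len(PIECES):
            lo, hi = image(i)
            if lo <= local and (hi is None or local < hi):
                hits.append(i)
    return hits


def next_valid(local):
    """Earliest valid local time at or after `local`."""
    if pieces_containing(local):
        return local
    return IMAGE_STARTS[bisect_right(IMAGE_STARTS, local)]


print(pieces_containing(datetime(2015, 3, 8, 2, 30)))   # missing time
print(pieces_containing(datetime(2015, 11, 1, 1, 30)))  # ambiguous time
print(next_valid(datetime(2015, 3, 8, 2, 30)))
```

An empty result means the local time is missing, two hits mean it is ambiguous, and both questions cost one bisect over the table.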
But it's far more general than needed for any real-world time zone -
and there's no code for that either ;-)