Re: [Datetime-SIG] Calendar vs timespan calculations...
[Tim]
Speaking of which, the current tzinfo API has no way to ask "is this an ambiguous time?"
[Alexander Belopolsky]
I was hoping that we would agree on the name of the flag before someone asks this question. :-)
You doubtless noted that I called it "first" near the end of my message without putting up any stink at all ;-)
With my proposal, a naive datetime t is ambiguous in timezone tz if
tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
or "is this an invalid (missing) time?"
Unless I'm missing your intent entirely, that's a fine illustration of my "The logic is bound to be annoying enough that we'd want to concentrate it in tzstrict". The problem I see is that the expression you gave can never be true. The math is indeed trivial, but a key part works "the opposite" of how even people who've thought a lot about it "instinctively" believe. Two cases:

1. t.first is False. Then the expression obviously returns false (the LHS and RHS are applied to two datetimes all of whose components - including .first - are the same, so both utcoffset()s return the same value, and "<" is false because they're equal). But if t.first (False) is telling the truth, t _is_ the later of two ambiguous times. So we wanted an expression that returned true in that subcase. The result is correct only when t.first being False is lying (i.e., when t is not the later of two ambiguous times, but t.first is False despite that).

2. t.first is True.

2a. And t is not an ambiguous time. Then I expect the two utcoffset()s return the same value, and the expression correctly returns False.

2b. And t is an ambiguous time. Then t is the earlier of the two times (that's what t.first is True means in this case), and the constructed datetime is the later. Obviously the earlier time should compare less than the later time, but that's not what's being compared. The offsets _from_ UTC are being compared, and it's the earlier time that has the _greater_ offset (that's the part 90% of people "instinctively" get backwards). So again the expression returns False incorrectly (although would be correct in this case if ">" were used instead - but then 90% of people would instinctively think the logic is backwards).

So in all cases the expression computes False - unless I'm missing your intent entirely (in which case I trust you won't be shy about enlightening me :-) ).

Why do people get this backwards? I've pondered that off & on for a long time.
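The case analysis above can be checked by brute force. What follows is only a sketch under stated assumptions: datetime has no `first` attribute, so the flag is passed as a separate argument to a hypothetical `utcoffset(t, first)` function; the zone is a made-up one with a single one-hour fall-back; and the flag is assumed to be ignored outside the ambiguous hour.

```python
from datetime import datetime, timedelta

# Hypothetical toy zone: daylight offset -4h until the local clock first
# reaches 02:00 on 2015-11-01, then the clock falls back to 01:00 at -5h.
DST, STD = timedelta(hours=-4), timedelta(hours=-5)
FOLD_START = datetime(2015, 11, 1, 1, 0)   # first ambiguous local reading
FOLD_END = datetime(2015, 11, 1, 2, 0)     # first local reading after the fold


def utcoffset(t, first):
    """Offset for naive local time t; `first` matters only inside the fold."""
    if t < FOLD_START:
        return DST
    if t >= FOLD_END:
        return STD
    return DST if first else STD


def proposed_test(t, first):
    # Stand-in for: tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
    return utcoffset(t, first) < utcoffset(t, False)


def flipped_test(t, first):
    # The same expression with ">" in place of "<".
    return utcoffset(t, first) > utcoffset(t, False)


# Every minute of the transition day, paired with both values of the flag.
day = [datetime(2015, 11, 1) + timedelta(minutes=m) for m in range(24 * 60)]
cases = [(t, first) for t in day for first in (True, False)]

# The "<" form is never true, whatever the local time or flag value.
assert not any(proposed_test(t, first) for t, first in cases)

# The ">" form is true exactly for the earlier of two ambiguous times.
hits = [(t, first) for t, first in cases if flipped_test(t, first)]
assert all(first and FOLD_START <= t < FOLD_END for t, first in hits)
assert len(hits) == 60
```

The toy says nothing about missing times; it only covers the ambiguous and ordinary cases walked through above.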
I think it goes like this: at a given UTC time u, then, say, u+1 is obviously an earlier time than u+2. So the greater the offset the later the time. That's intuitively obvious. What it wholly misses is that it's got nothing to do with what we're _trying_ to ask ;-) We're trying to ask about how times act on a non-UTC clock. In the bogus reasoning, u+1 and u+2 look an hour apart on the UTC clock, so are irrelevant to the real question.

When looking at a non-UTC clock, the offsets have to be _subtracted_ from that clock's idea of time to determine corresponding UTC time, and it's the negation that reverses the sense of the comparison needed. For an ambiguous local time T:

offset1 < offset2
# if and only if (negate, which also reverses the direction of comparison)
- offset1 > - offset2
# if and only if (add T to both sides)
T - offset1 > T - offset2
# if and only if (and now we have the UTC equivalents)
UTCtime1 > UTCtime2

So we can't expect most people to get this right.

Wouldn't this work? t is ambiguous if and only if

tz.utcoffset(t) != tz.utcoffset(t.replace(first=not t.first))

That is, t is ambiguous iff the value of t.first makes a difference to the offset. I expect people _could_ get that right most of the time, but may have trouble remembering "the trick". But nobody could screw up what, say, a new tz.is_ambiguous(t) means.
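The chain of inequalities for an ambiguous local time T can be checked with concrete numbers. The values below are assumptions chosen for illustration (a US-Eastern-style fall-back, with offsets of -5h and -4h); nothing depends on them beyond the two offsets being different.

```python
from datetime import datetime, timedelta

# Illustrative values only: 01:30 on a fall-back day is read twice on the
# local clock, once at -4h (daylight) and once at -5h (standard).
T = datetime(2015, 11, 1, 1, 30)
offset1 = timedelta(hours=-5)
offset2 = timedelta(hours=-4)

assert offset1 < offset2            # the smaller offset ...
assert -offset1 > -offset2          # negating reverses the comparison
assert T - offset1 > T - offset2    # adding T to both sides preserves it

utc1 = T - offset1                  # 06:30 UTC
utc2 = T - offset2                  # 05:30 UTC
assert utc1 > utc2                  # ... belongs to the later UTC instant
```

So of the two readings of 01:30, the one with the greater offset (-4h) is the earlier instant, which is the sense most people get backwards.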
I was hoping to sneak in a rule that for an invalid time t
tz.utcoffset(t) > tz.utcoffset(t.replace(first=False))
I don't want to try to figure out what that _really_ does, although as noted at the end of case 2b above that expression returns True when t.first is True and t is in fact the earlier of two ambiguous times.

Because local "missing times" have no spelling in UTC, I doubt there's any way for simple .utcoffset() expressions to detect one reliably. IIRC, the Python docs say nothing whatsoever about how missing times are, or "should be", handled in conversion. But if the tzinfo class has any intelligence about the rules it's implementing, it should be easy for a new tz.is_missing_time(t) method to apply that intelligence. Or, say, just a single new tz.classify(t) method returning, say, an or'ing of flags from these two sets:

# set 1 - exactly one will be in the result
TZ_HAPPY_TIME = 1
TZ_MISSING_TIME = 2
TZ_AMBIGUOUS_TIME = 4

# set 2 - at most one will be in the result, and none with TZ_HAPPY_TIME
TZ_DUE_TO_DST_TRANSITION = 64
TZ_DUE_TO_BASE_OFFSET_TRANSITION = 128
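A classify() along those lines might look like the sketch below. It is purely hypothetical: no such method exists on tzinfo, the rule table describes one made-up zone, and the function is written as a free function rather than a method to keep the example short. The flag names and values are the ones listed above.

```python
from datetime import datetime

# set 1 - exactly one will be in the result
TZ_HAPPY_TIME = 1
TZ_MISSING_TIME = 2
TZ_AMBIGUOUS_TIME = 4

# set 2 - at most one will be in the result, and none with TZ_HAPPY_TIME
TZ_DUE_TO_DST_TRANSITION = 64
TZ_DUE_TO_BASE_OFFSET_TRANSITION = 128

# Hypothetical rule table for one made-up zone: half-open spans
# [start, end) on the local clock, each with the flags that apply to it.
TROUBLED_SPANS = [
    (datetime(2015, 3, 8, 2, 0), datetime(2015, 3, 8, 3, 0),
     TZ_MISSING_TIME | TZ_DUE_TO_DST_TRANSITION),
    (datetime(2015, 11, 1, 1, 0), datetime(2015, 11, 1, 2, 0),
     TZ_AMBIGUOUS_TIME | TZ_DUE_TO_DST_TRANSITION),
]


def classify(t):
    """Return the or'ed flags for naive local time t in the toy zone."""
    for start, end, flags in TROUBLED_SPANS:
        if start <= t < end:
            return flags
    return TZ_HAPPY_TIME


flags = classify(datetime(2015, 3, 8, 2, 30))
assert flags & TZ_MISSING_TIME
assert flags & TZ_DUE_TO_DST_TRANSITION
assert classify(datetime(2015, 6, 1, 12, 0)) == TZ_HAPPY_TIME
```

The point of the sketch is only that code which knows the transition rules answers the question with a range check, with no need to probe utcoffset().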
(I really don't want tz.utcoffset(t) to ever raise an exception)
Me neither.
and of course, for most of the times
tz.utcoffset(t) == tz.utcoffset(t.replace(first=False))
Agreement at last ;-) Although I'd spell it
tz.utcoffset(t) == tz.utcoffset(t.replace(first=not t.first))
as a pretty direct translation of "the value of t.first makes no difference".
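For readers who want to run the "does the flag make a difference" test: the `first` flag is only a proposal in this thread, so the sketch below substitutes the `fold` attribute that Python 3.6+ datetimes carry (PEP 495), which has the opposite sense (fold=0 is the first of two ambiguous readings). `ToyZone` is a hypothetical tzinfo for a made-up zone with a single fall-back, and `flag_matters` is an illustrative helper name.

```python
from datetime import datetime, timedelta, tzinfo


class ToyZone(tzinfo):
    """Hypothetical zone: 02:00 at -4h falls back to 01:00 at -5h."""

    FOLD_START = datetime(2015, 11, 1, 1, 0)
    FOLD_END = datetime(2015, 11, 1, 2, 0)

    def utcoffset(self, dt):
        naive = dt.replace(tzinfo=None)
        if naive < self.FOLD_START:
            return timedelta(hours=-4)
        if naive >= self.FOLD_END or dt.fold:
            return timedelta(hours=-5)
        return timedelta(hours=-4)   # first pass through the repeated hour


def flag_matters(tz, t):
    # Analog of: tz.utcoffset(t) != tz.utcoffset(t.replace(first=not t.first))
    return tz.utcoffset(t) != tz.utcoffset(t.replace(fold=1 - t.fold))


tz = ToyZone()
assert flag_matters(tz, datetime(2015, 11, 1, 1, 30))           # earlier reading
assert flag_matters(tz, datetime(2015, 11, 1, 1, 30, fold=1))   # later reading
assert not flag_matters(tz, datetime(2015, 11, 1, 0, 59))
assert not flag_matters(tz, datetime(2015, 11, 1, 2, 0))
```

One caveat: in this toy the flag is ignored everywhere except the repeated hour. A tzinfo that also consults the flag for missing times (as PEP 495 later specified for `fold`) would make this test true in gaps as well, so there it would not by itself separate ambiguous from missing times.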
The most important new question callers will want to resolve is "what should `first` (aka is_dst) be now?".
I want most callers
Gloss: by "callers" I mean not just Python users, but also people _implementing_ the new stuff. Perhaps you do too.
to be able to get away with not knowing that `first` exists and consistently get the earlier time from an ambiguous input and some "nearby" time from an invalid input.
In the case of a missing time, it's reasonable to guess they definitely intended a time later than the closest preceding (on the local clock) valid time. It's also reasonable to guess they definitely intended a time earlier than the closest succeeding (on the local clock) valid time. Happily, there is no possible local time satisfying both ;-) But there's no sensible way to compute either without knowing "the rules" (did DST cause us to miss an hour? an hour and 30 minutes? just 15 minutes? did a politician decree we lost 2 hours? in any case, how long ago did time _start_ to go missing? and when will it stop going missing?). Seems again a case that requires some intelligence _in_ the tzstrict class, not heroic efforts by callers restricted to utcoffset() alone.
A careful application will have to call tz.utcoffset() with both values of the flag and either warn about the default choice or ask the user for an additional input.
As above, how can one programmatically pick a valid default when faced with a missing time? The zoneinfo-like databases express this stuff by giving parameters for a specific algorithm. One relatively simple rule can cover a vast span of time. "It's easy" for code that _knows_ that stuff about the timezone. If all the programmer can know is .utcoffset() results at specific instants of time, I expect the best they can do is loop, incrementing (or decrementing) by a naive minute at a time (Python restricts UTC offsets to multiples of a minute), until they find a roundtrip fixed point (i.e., the nearest local time that "gets itself back" when converted to UTC and back again).

Isaac earlier sketched a mathematical framework for a different approach to computing UTC offsets, which explicitly materialized that it's a function made up of a sequence of continuous monotonically increasing functions ("jumps in time" are discontinuities in the range, and that's what separates one function from the next). The start and end of each function's domain is explicit, and so then also are the start and end of each function's image. This makes pretty much all conceivable elementary questions solvable by, at worst, forms of binary search (e.g., "is this a missing time", "if so, what's the next closest valid time (in either direction)", "how long until the next transition of any kind?", "how many transitions of any kind occurred in the past 1000 years?" ...). But it's far more general than needed for any real-world time zone - and there's no code for that either ;-)
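The minute-at-a-time search for a roundtrip fixed point could be sketched as follows. Everything here is an assumption made for illustration: the zone is a made-up one with a single spring-forward gap, it is modeled with two plain functions instead of a tzinfo, and it arbitrarily reports the standard offset for missing times. Only `roundtrips` and `nearest_valid` stand for what a caller restricted to offset lookups would write.

```python
from datetime import datetime, timedelta

# Hypothetical zone: -5h standard, -4h daylight, with the local clock
# jumping from 02:00 to 03:00 on 2015-03-08.
STD, DST = timedelta(hours=-5), timedelta(hours=-4)
GAP_START = datetime(2015, 3, 8, 2, 0)   # first missing local minute
GAP_END = datetime(2015, 3, 8, 3, 0)     # first valid local minute after it
SWITCH_UTC = GAP_START - STD             # the transition instant, in UTC


def utcoffset_local(t):
    """Offset reported for naive local time t (missing times get STD)."""
    return STD if t < GAP_END else DST


def utcoffset_utc(u):
    """Offset in effect at naive UTC time u."""
    return STD if u < SWITCH_UTC else DST


def roundtrips(t):
    """True if local time t survives conversion to UTC and back."""
    u = t - utcoffset_local(t)
    return u + utcoffset_utc(u) == t


def nearest_valid(t, step):
    """Walk from t in increments of `step` until a roundtrip fixed point."""
    while not roundtrips(t):
        t += step
    return t


missing = datetime(2015, 3, 8, 2, 30)
one_minute = timedelta(minutes=1)
assert not roundtrips(missing)
assert nearest_valid(missing, one_minute) == GAP_END
assert nearest_valid(missing, -one_minute) == GAP_START - one_minute
```

The loop terminates here because the toy zone has valid times on both sides of the gap; it makes no attempt to bound the search, which a real implementation would probably want.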
participants (1)
-
Tim Peters