Mailman 3 August 2015 - Datetime-SIG

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Ethan Furman 01 Aug '15

On 08/01/2015 11:18 AM, Alexander Belopolsky wrote:
> On Sat, Aug 1, 2015 at 1:52 PM, Ethan Furman wrote:
>> I'm happy to concede that counting backwards to get the start time is less
>> frequent, and the times this hits/crosses a time shift are even less
>> frequent, but that is all the more reason to refuse the temptation to guess.
>
> My goal here is to minimize the impact on the programs that are
> already written and deployed.

Why is this even a problem? Already written programs will not be using the new strict tzinfo.

--
~Ethan~

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Ethan Furman 01 Aug '15

On 08/01/2015 08:27 AM, Alexander Belopolsky wrote:
> In the case of the US-style spring jump from 01:59 to 03:00 AM, for t
> = 02:30 AM, u0 is such that L(u0) = 03:30 AM (this is the "what I
> meant when I said 02:30" time) and L(u1) = 01:30 AM.

The problem here is that if somebody is counting backwards to get that 1:30, then the time they need is 12:30, not 2:30.

As a case in point: today I have a veterinary appointment for my cat to check his medication levels; the appointment is at 14:30, and needs to be in the window of 4 to 6 hours after he takes his meds. Counting backwards from 14:30 gives me a window of 8:30 to 10:30 to administer his meds.

I'm happy to concede that counting backwards to get the start time is less frequent, and the times this hits/crosses a time shift are even less frequent, but that is all the more reason to refuse the temptation to guess.

--
~Ethan~

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Alexander Belopolsky 01 Aug '15

On Sat, Aug 1, 2015 at 10:36 AM, Alexander Belopolsky <alexander.belopolsky(a)gmail.com> wrote:
> So how do you represent three outcomes [], [u] or [u0, u1] in a way that xG(t)
> always works? My solution:
>
> [] -> [u1, u0]
> [u] -> [u, u]
> [u0, u1] -> [u0, u1]

Let me clarify what I propose to return for the local time in a gap: the two values u1 and u0 are *not* solutions to L(u) = t. For t in a gap, no such solutions exist. Instead, u0 is the solution for L0(u) = t, where L0 is L linearly extrapolated forward from the times before the gap, and u1 is the solution for L1(u) = t, where L1 is L linearly extrapolated back from the times after the gap.

In the case of the US-style spring jump from 01:59 to 03:00 AM, for t = 02:30 AM, u0 is such that L(u0) = 03:30 AM (this is the "what I meant when I said 02:30" time) and L(u1) = 01:30 AM.

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Alexander Belopolsky 01 Aug '15

On Sat, Aug 1, 2015 at 1:16 AM, Tim Peters <tim.peters(a)gmail.com> wrote:
>> With my proposal, a naive datetime t is ambiguous in timezone tz if
>>
>> tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
>>
>> or "is this an invalid (missing) time?"
>
> Unless I'm missing your intent entirely, that's a fine illustration of
> my "The logic is bound to be annoying enough that we'd want to
> concentrate it in tzstrict". The problem I see is that the expression
> you gave can never be true.

You are absolutely right and this is the intent.

The challenge that I tried to solve was that the local-to-global function (G(t)) can have 0, 1 or 2 values if defined as the mathematical inverse of the global-to-local (L(u)) function. (Purists would say that this means that local-to-global is not a function, but I find it convenient to say that a function has multiple values when it returns a variable-length list.)

At the same time, I wanted naive code u = G(t) to (a) work for all values of t; (b) produce the correct result when L(u) = t has only one solution; (c) produce one of the "correct" results when L(u) = t has two solutions; and (d) produce a "useful" result when L(u) = t has no solutions.

This ruled out the obvious design where G(t) would return [], [u] or [u0, u1], because all naive code that used u = G(t) would have to be rewritten as u = G(t)[0] and you would still face an index error when t is in the gap. (I've recently learned this useful terminology: the interval of non-existent local times that occurs when you move the clock forward is called a "gap" and the interval of ambiguous local times that occurs when you move the clock back is called a "fold".)

The other solution was to give G(t) an additional argument so that you could specify the index into the returned list upfront:

    def xG(t, which=0):
        return G(t)[which]

This makes the naive u = xG(t) code work in 99.99% of the cases, but you still face an occasional index error.

So how do you represent three outcomes [], [u] or [u0, u1] in a way that xG(t) always works? My solution:

    [] -> [u1, u0]
    [u] -> [u, u]
    [u0, u1] -> [u0, u1]

Note that this solution satisfies all my design criteria including (d). The results produced from the time in a gap are "useful" because the default xG(t) result is what most people mean when they specify the time in the gap: they do it because they are unaware of the time change and expect 02:30 AM to be 150 minutes after midnight, not knowing that it will be called 03:30 AM. The other solution is also useful because it allows you to detect the time in the gap without calling L(u) on the result.
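A minimal sketch of the three-outcome mapping described above, under stated assumptions: the zone is a toy US Eastern for 2015 only, times are naive datetimes, and the names EST, EDT, DST_START and DST_END are hypothetical helpers. The message's L, G and xG are kept as names. In a gap, this sketch takes the default result to be the one computed with the pre-gap offset (the "150 minutes after midnight" reading); that ordering is one interpretation of the message, not a confirmed design.

```python
from datetime import datetime, timedelta

EST = timedelta(hours=-5)
EDT = timedelta(hours=-4)
# Assumed toy rules: 2015 US Eastern transitions, expressed in naive UTC.
DST_START = datetime(2015, 3, 8, 7, 0)   # 02:00 EST jumps to 03:00 EDT
DST_END = datetime(2015, 11, 1, 6, 0)    # 02:00 EDT falls back to 01:00 EST


def L(u):
    """Global-to-local: naive UTC in, naive local wall time out."""
    return u + (EDT if DST_START <= u < DST_END else EST)


def G(t):
    """Exact inverse of L: every u with L(u) == t (0, 1 or 2 values)."""
    return sorted(u for u in {t - EST, t - EDT} if L(u) == t)


def xG(t, which=0):
    """Always-defined local-to-global lookup; `which` is 0 or 1."""
    exact = G(t)
    if len(exact) == 2:
        pair = exact                      # fold: [earlier, later]
    elif len(exact) == 1:
        pair = exact * 2                  # ordinary time: [u, u]
    else:
        # Gap: no exact solution. Extrapolate with each offset and put the
        # pre-gap-offset result (the later UTC value) first.
        pair = sorted((t - EST, t - EDT), reverse=True)
    return pair[which]


gap = datetime(2015, 3, 8, 2, 30)
fold = datetime(2015, 11, 1, 1, 30)
print(L(xG(gap)), L(xG(gap, 1)))      # local readings of the two gap results
print(xG(gap, 0) > xG(gap, 1))        # reversed order marks a gap
print(xG(fold, 0) < xG(fold, 1))      # normal order marks a fold
```

With this ordering, comparing xG(t, 0) against xG(t, 1) distinguishes the three cases without calling L on the result: equal for an ordinary time, ascending for a fold, descending for a gap.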

Re: [Datetime-SIG] Calendar vs timespan calculations...
by ISAAC J SCHWABACHER 01 Aug '15

> Isaac earlier sketched a mathematical framework for a different
> approach to computing UTC offsets, which explicitly materialized that
> it's a function made up of a sequence of continuous monotonically
> increasing functions ("jumps in time" are discontinuities in the
> range, and that's what separates one function from the next). The
> start and end of each function's domain is explicit, and so then also
> are the start and end of each function's image. This makes pretty
> much all conceivable elementary questions solvable by, at worst, forms
> of binary search.(e.g.,"is this a missing time", "if so, what's the
> next closest valid time (in either direction)", "how long until the
> next transition of any kind?", "how many transitions of any kind
> occurred in the past 1000 years?" ...).

*Almost* everything can be accomplished with tz.first_transition_after(dt) and tz.last_transition_at_or_before(dt) returning appropriate (trans_utc, before_info, after_info) tuples, but not quite. But yes, I think it would be valuable to expose the transition times in some way, though preferably not as a list since that would preclude POSIX-style time zones (which have an infinite number of such transitions). Does anyone else have a better idea for this API?

> But it's far more general than needed for any real-world time zone -
> and there's no code for that either ;-)

Not yet. I had finally gotten to work on it and realized that the API I was going to propose was insufficient to the task.

ijs
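As a rough illustration of the two lookups named above, here is a sketch that answers them by binary search. The function names and the (trans_utc, before_info, after_info) tuple shape come from the message; everything else is an assumption: the functions are module-level rather than tz methods, each info is a hypothetical (utcoffset, name) pair, and the transitions live in a finite list, which is exactly what a POSIX-style zone with unbounded transitions could not use.

```python
import bisect
from datetime import datetime, timedelta

# Assumed toy data: 2015 US Eastern transitions in naive UTC.
# Each entry is (trans_utc, before_info, after_info).
EST_INFO = (timedelta(hours=-5), "EST")
EDT_INFO = (timedelta(hours=-4), "EDT")
TRANSITIONS = [
    (datetime(2015, 3, 8, 7), EST_INFO, EDT_INFO),
    (datetime(2015, 11, 1, 6), EDT_INFO, EST_INFO),
]
_KEYS = [entry[0] for entry in TRANSITIONS]  # sorted transition instants


def first_transition_after(u):
    """First transition strictly after UTC instant u, or None."""
    i = bisect.bisect_right(_KEYS, u)
    return TRANSITIONS[i] if i < len(TRANSITIONS) else None


def last_transition_at_or_before(u):
    """Latest transition at or before UTC instant u, or None."""
    i = bisect.bisect_right(_KEYS, u)
    return TRANSITIONS[i - 1] if i > 0 else None


now = datetime(2015, 8, 1, 12)
nxt = first_transition_after(now)
print(nxt[0] - now)                          # time until the next transition
print(last_transition_at_or_before(now)[2])  # info in effect at `now`
```

A rule-based zone would have to generate the neighbouring transitions on demand instead of indexing a stored list.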

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Ethan Furman 01 Aug '15

On 07/31/2015 10:16 PM, Tim Peters wrote:
> [Alexander Belopolsky]
>> (I really don't want tz.utcoffset(t) to ever raise an exception)
> [Tim]
> Me neither.

Why not? If the programmer is using strict tzinfos how would they end up with an invalid t? I only see two ways:

- constructing from a literal (in which case an exception should be raised)
- t is using a non-strict or missing tzinfo, possibly from an addition or subtraction (in which case we can't know which direction they were going and should not guess -- so raise an exception)

--
~Ethan~

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Tim Peters 01 Aug '15

[Łukasz Rekucki]
>> What happens then when you subtract a datetime with *strict* tzinfo
>> and a *naive* one? Would A - B == - (B - A) still be true ?

[Guido]
> [re-adding the list]
>
> That's for the authors of the new PEP to decide, really, but I think it
> could be made to follow the strict rules in both cases, since clearly the
> code isn't an old program requiring backward compatibility (how would such a
> program end up with a strict tzinfo?).

This one solves itself: it's _already_ the case that subtraction of aware datetime objects uses timeline arithmetic _unless_ both datetimes share a .tzinfo member. If one uses a tzstrict instance and the other does not, it's impossible that they both use the same instance. So timeline arithmetic will be used in any such case.

Maybe some sketchy pseudocode will make it more obvious:

    class datetime:
        ...
        def __sub__(x, y):
            # assume x and y are both aware datetimes
            if x.tzinfo is y.tzinfo:
                # compute the difference using classic arithmetic
            else:
                # compute the difference using timeline arithmetic

It's been like that forever. Note that the order of the operands is irrelevant to which kind of arithmetic is used. The same applies, mutatis mutandis, to datetime comparison operations.

What does need to change is that "x.tzinfo is y.tzinfo" needs more qualification, so that two datetimes sharing the same tzstrict instance don't end up using classic subtraction. I'm sure the docs will become even more pleasant to read ;-)
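The existing behaviour described above can be observed directly with the standard datetime module. The sketch below assumes a hypothetical ToyEastern tzinfo that implements only the 2015 US Eastern rules; two separate instances of it stand in for "different tzinfo members".

```python
from datetime import datetime, timedelta, tzinfo


class ToyEastern(tzinfo):
    """Hypothetical tzinfo: US Eastern rules for 2015 only."""

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        start = datetime(2015, 3, 8, 2)   # local wall-clock start of DST
        end = datetime(2015, 11, 1, 2)    # local wall-clock end of DST
        return timedelta(hours=1) if start <= naive < end else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


tz_a = ToyEastern()
tz_b = ToyEastern()   # same rules, but a different instance

x = datetime(2015, 3, 8, 12, tzinfo=tz_a)        # noon after the spring jump
y_same = datetime(2015, 3, 7, 12, tzinfo=tz_a)   # shares x's tzinfo object
y_other = datetime(2015, 3, 7, 12, tzinfo=tz_b)  # does not share it

classic = x - y_same     # offsets ignored: wall-clock difference
timeline = x - y_other   # offsets applied: elapsed time on the UTC timeline

print(classic)    # 1 day, 0:00:00
print(timeline)   # 23:00:00
print(y_other - x == -timeline)   # operand order does not change the rule
```

The one-hour difference between the two results is the hour skipped by the spring transition.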

Re: [Datetime-SIG] Calendar vs timespan calculations...
by Tim Peters 01 Aug '15

[Tim]
>> Speaking of which, the current tzinfo API has no way to ask "is this
>> an ambiguous time?"

[Alexander Belopolsky]
> I was hoping that we would agree on the name of the flag before
> someone asks this question. :-)

You doubtless noted that I called it "first" near the end of my message without putting up any stink at all ;-)

> With my proposal, a naive datetime t is ambiguous in timezone tz if
>
>    tz.utcoffset(t) < tz.utcoffset(t.replace(first=False))
>
> or "is this an invalid (missing) time?"

Unless I'm missing your intent entirely, that's a fine illustration of my "The logic is bound to be annoying enough that we'd want to concentrate it in tzstrict". The problem I see is that the expression you gave can never be true. The math is indeed trivial, but a key part works "the opposite" of how even people who've thought a lot about it "instinctively" believe.

Two cases:

1. t.first is False. Then the expression obviously returns false (the LHS and RHS are applied to two datetimes all of whose components - including .first - are the same, so both utcoffset()s return the same value, and "<" is false because they're equal). But if t.first (False) is telling the truth, t _is_ the later of two ambiguous times. So we wanted an expression that returned true in that subcase. The result is correct only when t.first being False is lying (i.e., when t is not the later of two ambiguous times, but t.first is False despite that).

2. t.first is True.

   2a. And t is not an ambiguous time. Then I expect the two utcoffset()s return the same value, and the expression correctly returns False.

   2b. And t is an ambiguous time. Then t is the earlier of the two times (that's what t.first is True means in this case), and the constructed datetime is the later. Obviously the earlier time should compare less than the later time, but that's not what's being compared. The offsets _from_ UTC are being compared, and it's the earlier time that has the _greater_ offset (that's the part 90% of people "instinctively" get backwards). So again the expression returns False incorrectly (although would be correct in this case if ">" were used instead - but then 90% of people would instinctively think the logic is backwards).

So in all cases the expression computes False - unless I'm missing your intent entirely (in which case I trust you won't be shy about enlightening me :-) ).

Why do people get this backwards? I've pondered that off & on for a long time. I think it goes like this: at a given UTC time u, then, say, u+1 is obviously an earlier time than u+2. So the greater the offset the later the time. That's intuitively obvious.

What it wholly misses is that it's got nothing to do with what we're _trying_ to ask ;-) We're trying to ask about how times act on a non-UTC clock. In the bogus reasoning, u+1 and u+2 look an hour apart on the UTC clock, so are irrelevant to the real question. When looking at a non-UTC clock, the offsets have to be _subtracted_ from that clock's idea of time to determine corresponding UTC time, and it's the negation that reverses the sense of the comparison needed.

For an ambiguous local time T:

    offset1 < offset2
        # if and only if (negate, which also reverses the direction of comparison)
    - offset1 > - offset2
        # if and only if (add T to both sides)
    T - offset1 > T - offset2
        # if and only if (and now we have the UTC equivalents)
    UTCtime1 > UTCtime2

So we can't expect most people to get this right. Wouldn't this work? t is ambiguous if and only if

    tz.utcoffset(t) != tz.utcoffset(t.replace(first=not t.first))

That is, t is ambiguous iff the value of t.first makes a difference to the offset. I expect people _could_ get that right most of the time, but may have trouble remembering "the trick". But nobody could screw up what, say, a new tz.is_ambiguous(t) means.
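The case analysis above can be checked mechanically. This is only a sketch: datetime has no `first` attribute, so a hypothetical two-argument utcoffset(t, first) stands in for tz.utcoffset(t) applied to a t carrying that flag, and the zone is a toy US Eastern for 2015 in which the flag is ignored outside the fold.

```python
from datetime import datetime, timedelta

EST, EDT = timedelta(hours=-5), timedelta(hours=-4)
# Assumed toy rules, in local wall-clock time (2015 US Eastern).
DST_LOCAL_START = datetime(2015, 3, 8, 3)
FOLD_START = datetime(2015, 11, 1, 1)
FOLD_END = datetime(2015, 11, 1, 2)


def utcoffset(t, first=True):
    """Hypothetical stand-in for tz.utcoffset(t) with a `first` flag on t."""
    if FOLD_START <= t < FOLD_END:
        return EDT if first else EST      # earlier reading has the greater offset
    return EDT if DST_LOCAL_START <= t < FOLD_START else EST


def less_than_test(t, first):
    """The proposed test: offset(t) < offset(t.replace(first=False))."""
    return utcoffset(t, first) < utcoffset(t, False)


def is_ambiguous(t, first):
    """The suggested fix: does flipping `first` change the offset?"""
    return utcoffset(t, first) != utcoffset(t, not first)


fold = datetime(2015, 11, 1, 1, 30)
plain = datetime(2015, 7, 1, 12, 0)
for t in (fold, plain):
    for first in (True, False):
        print(t, first, less_than_test(t, first), is_ambiguous(t, first))

# The earlier reading (greater offset) maps to the smaller UTC value.
print(fold - utcoffset(fold, True) < fold - utcoffset(fold, False))
```

In this toy zone the "<" form is False for every input, while the "!=" form is True exactly for the fold time, regardless of the flag's value.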
> I was hoping to sneak in a rule that for an invalid time t
>
>    tz.utcoffset(t) > tz.utcoffset(t.replace(first=False))

I don't want to try to figure out what that _really_ does, although as noted at the end of case 2b above that expression returns True when t.first is True and t is in fact the earlier of two ambiguous times.

Because local "missing times" have no spelling in UTC, I doubt there's any way for simple .utcoffset() expressions to detect one reliably. IIRC, the Python docs say nothing whatsoever about how missing times are, or "should be", handled in conversion.

But if the tzinfo class has any intelligence about the rules it's implementing, it should be easy for a new tz.is_missing_time(t) method to apply that intelligence. Or, say, just a single new tz.classify(t) method returning, say, an or'ing of flags from these two sets:

    # set 1 - exactly one will be in the result
    TZ_HAPPY_TIME = 1
    TZ_MISSING_TIME = 2
    TZ_AMBIGUOUS_TIME = 4

    # set 2 - at most one will be in the result, and none with TZ_HAPPY_TIME
    TZ_DUE_TO_DST_TRANSITION = 64
    TZ_DUE_TO_BASE_OFFSET_TRANSITION = 128

> (I really don't want tz.utcoffset(t) to ever raise an exception)

Me neither.

> and of course, for most of the times
>
>    tz.utcoffset(t) == tz.utcoffset(t.replace(first=False))

Agreement at last ;-) Although I'd spell it

    tz.utcoffset(t) == tz.utcoffset(t.replace(first=not t.first))

as a pretty direct translation of "the value of t.first makes no difference".

>> The most important new question callers will want to resolve is "what should
>> `first` (aka is_dst) be now?".

> I want most callers

Gloss: by "callers" I mean not just Python users, but also people _implementing_ the new stuff. Perhaps you do too.

> to be able to get away with not knowing that
> `first` exists and consistently get the earlier time from an ambiguous
> input and some "nearby" time from an invalid input.
In the case of a missing time, it's reasonable to guess they definitely intended a time later than the closest preceding (on the local clock) valid time. It's also reasonable to guess they definitely intended a time earlier than the closest succeeding (on the local clock) valid time. Happily, there is no possible local time satisfying both ;-)

But there's no sensible way to compute either without knowing "the rules" (did DST cause us to miss an hour? an hour and 30 minutes? just 15 minutes? did a politician decree we lost 2 hours? in any case, how long ago did time _start_ to go missing? and when will it stop going missing?). Seems again a case that requires some intelligence _in_ the tzstrict class, not heroic efforts by callers restricted to utcoffset() alone.

> A careful application will have to call tz.utcoffset() with both values of the
> flag and either warn about the default choice or ask the user for an
> additional input.

As above, how can one programmatically pick a valid default when faced with a missing time?

The zoneinfo-like databases express this stuff by giving parameters for a specific algorithm. One relatively simple rule can cover a vast span of time. "It's easy" for code that _knows_ that stuff about the timezone. If all the programmer can know is .utcoffset() results at specific instants of time, I expect the best they can do is loop, incrementing (or decrementing) by a naive minute at a time (Python restricts UTC offsets to multiples of a minute), until they find a roundtrip fixed point (i.e., the nearest local time that "gets itself back" when converted to UTC and back again).

Isaac earlier sketched a mathematical framework for a different approach to computing UTC offsets, which explicitly materialized that it's a function made up of a sequence of continuous monotonically increasing functions ("jumps in time" are discontinuities in the range, and that's what separates one function from the next). The start and end of each function's domain is explicit, and so then also are the start and end of each function's image. This makes pretty much all conceivable elementary questions solvable by, at worst, forms of binary search (e.g., "is this a missing time", "if so, what's the next closest valid time (in either direction)", "how long until the next transition of any kind?", "how many transitions of any kind occurred in the past 1000 years?" ...). But it's far more general than needed for any real-world time zone - and there's no code for that either ;-)
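The minute-stepping loop mentioned above might look roughly like the following. It is a sketch under assumptions: a toy US Eastern zone for 2015, naive datetimes, and hypothetical helper names (utcoffset, utc_to_local, round_trips, nearest_valid). The caller is only allowed to ask for offsets and convert back from UTC, never to inspect the rules.

```python
from datetime import datetime, timedelta

EST, EDT = timedelta(hours=-5), timedelta(hours=-4)
# Assumed toy rules for 2015 US Eastern.
DST_START_UTC = datetime(2015, 3, 8, 7)
DST_END_UTC = datetime(2015, 11, 1, 6)


def utc_to_local(u):
    """Naive UTC in, naive local wall time out."""
    return u + (EDT if DST_START_UTC <= u < DST_END_UTC else EST)


def utcoffset(t):
    """Offset reported for naive local time t (the only query available)."""
    in_dst = datetime(2015, 3, 8, 3) <= t < datetime(2015, 11, 1, 2)
    return EDT if in_dst else EST


def round_trips(t):
    """True if local -> UTC -> local gets t back (t is a fixed point)."""
    return utc_to_local(t - utcoffset(t)) == t


def nearest_valid(t, step=timedelta(minutes=1), limit=24 * 60):
    """Walk from t in `step` increments until a round-trip fixed point."""
    for _ in range(limit):
        if round_trips(t):
            return t
        t += step
    raise ValueError("no valid local time found within the search limit")


missing = datetime(2015, 3, 8, 2, 30)
print(nearest_valid(missing))                          # searching forward
print(nearest_valid(missing, timedelta(minutes=-1)))   # searching backward
```

The linear scan is what makes this approach unattractive next to a class that knows its own transition rules and can answer the same question directly.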
