[Python-Dev] Status on PEP-431 Timezones

Sat Jul 25 01:06:36 CEST 2015

Well, I was going to stay silent, but math is something I can do without wasting anyone's time or embarrassing myself. I don't think this mail answers Lennart's concerns, but I do want to get it out there to compete with the comment in `datetime.py`. I apologize if the LaTeX density is too high; I don't trust that my e-mail client would transmit the message faithfully were I to render it myself.

I disagree with the view Tim had of time zones when he wrote that comment (and that code). It sounds like he views US/Eastern and US/Central as time zones (which they are), but thinks of the various America/Indiana zones as switching back and forth between them, rather than being time zones in their own right. I think the right perspective is that a time zone *is* the function that its `fromutc()` method implements, although of course we need additional information in order to actually compute (rather than merely mathematically  define) its inverse. Daylight Saving Time is a red herring, and assumptions 2 and 4 in that exposition are just wrong from this point of view. In the worst case, Asia/Riyadh's two years of solar time completely shatter these assumptions.

I'm convinced that the right viewpoint on this is to view local time and UTC time each as isomorphic to $\RR$ (i.e., effectively as UNIX timestamps, minus the oft-violated guarantee that timestamps are in UTC), and to consider the time zone as
    \[ fromutc : \RR \to \RR. \]
(Leap seconds are a headache for this perspective, but it can still support them with well-placed epicycles.) Then our assumptions (inspired by zoneinfo) about the nature of this map are as follows:

* $fromutc$ is piecewise defined, with each piece being continuous and strictly monotonic increasing.  Let us call the set of discontinuities $\{ utc_i \in \RR | i \in \ZZ \}$, where the labels are in increasing order, and define $fromutc_i$ to be the $i$-th  piece. (The theoretical treatment doesn't suffer if there are only finitely many discontinuities, since we can place additional piece boundaries at will where no discontinuities exist; obviously, an implementation would not take this view.) 
* The piece $fromutc_i : [utc_i, utc_{i+1}) \to [local_{start, i}, local_{end, i})$ and its inverse, which we will call $fromlocal_i$, are both readily computable.  In particular, this means that $local_{start, i} = fromutc(utc_i)$ and $local_{end, i}$ is the limit of $fromutc(t)$ as $t$ approaches $utc_{i+1}$ from the left, and that these values are known.  Note that the (tzfile(5))[1] format and (zic(8)[2]) both assume that $fromutc_i$ is of the form $t \mapsto t + off_i$, where $off_i$ is a constant.  This assumption is true in practice, but is stronger than we actually need.
* The sequences $\{ local_{start, i} | i \in \ZZ \}$ and $\{ local_{end, i} | i \in \ZZ \}$ are strictly increasing, and $local_{end, i-1} < local_{start, i+1}$ for all $i \in \ZZ$.  This final condition is enough to guarantee that the preimage of any local time under $fromutc$ contains at most two UTC times.  This assumption would be violated if, for example, some jurisdiction decided to fall back two hours by falling back one hour and then immediately falling back a second hour.  I recommend the overthrow of any such jurisdiction and its (annexation by the Netherlands)[3].

Without the third assumption, it's impossible to specify a UTC time by a (local time, time zone, DST flag) triple since there may be more than two UTC times corresponding to the same local time, and computing $fromlocal$ becomes more complicated, but the problem can still be solved by replacing the DST flag by an index into the preimage.  (Lennart, I think this third assumption is the important part of your "no changes within 48 hours of each other" assumption, which is violated by Asia/Riyadh. Is it enough?)

Once we take this view, computing $fromutc(t)$ is trivial: find $i$ with $utc_i \le t < utc_{i+1}$ by binary search (presumably optimized to an $O(1)$ average case by using a good initial guess), and compute $fromutc_i(t)$.

Computing $fromlocal(t)$ is somewhat more difficult.  The first thing to address is that, as written, $fromlocal$ is not a function; in order to make it one, we need to pass it more information.  We could define $fromlocal(t, i) = fromlocal_i(t)$, but that's too circular to be useful.  Likewise with my (silly) earlier proposal to store $(local, offset)$ pairs-- then $fromlocal(t, off) = t - off$.  What we really need is a (partial, which is better than multi-valued!) function $fromlocal : \RR \times \{True, False\} \to \RR$ that takes a local time and a DST flag and returns a UTC time.  We define $fromlocal(local, flag)$ to be the first $utc \in \RR$ such that $fromutc(utc) = local$ when $flag$ is $True$ and the last such $utc$ when $flag$ is $False$.  (Our implementation will presumably also allow $flag$ to be $None$, in which case we require $utc$ to be unique.)

To compute $fromlocal$, we'll begin with a lemma:
    If $t < utc_i$, then $fromutc(t) < local_{end, i-1}$; and likewise, if $utc_i \le t$, then $local_{start, i} \le fromutc(t)$.
If $t < utc_i$, then we must have $utc_j \le t < utc_{j+1}$ for some $j \le i-1$, so $local_{start, j} \le fromutc(t) < local_{end, j} \le local_{end, i-1}$ because $local_{end, i}$ is increasing in $i$.  The proof of the second part is similar, using the fact that $local_{start, i}$ is increasing.

Now we compute $fromlocal(t, True)$ by finding the minimal $i$ such that $t < local_{end, i}$. Then for $s < utc_i$ we must have by the lemma that $fromutc(s) < local_{end, i-1} < t$, so $s < fromlocal(t, True)$ if the latter exists. For $s \ge utc_i$ the lemma gives us $fromutc(s) \ge local_{start, i}$, so if $t < local_{start, i}$ we see that there is no $s$ with $fromutc(s) = t$, so $fromlocal(t, True)$ is undefined (which we will implement as a NonexistentTimeError). If on the other hand $t \ge local_{start, i}$, we have $t \in [local_{start, i}, local_{end, i})$, so that $fromutc(fromlocal_i(t)) = t$.  Because $fromlocal_i$ is monotonic on this interval, $fromlocal(t, True) = fromlocal_i(t)$ is minimal.  An analogous argument establishes that if $i$ is maximal with $local_{start, i} \le t$, then $fromlocal(t, False) = fromlocal_i(t)$ if $t < local_{end, i}$ and $fromlocal(t, False)$ is undefined otherwise.  All of these computations can be accomplished by searches of ordered lists and applications of $fromlocal_i$.

We can also include the option for $fromlocal(t, None)$ to require uniqueness by computing $fromlocal(t, True)$ and verifying that $t < local_{start, i+1}$ where $i$ is as in the computation of $fromlocal(t, True)$, raising an AmbiguousTimeError if this condition is not met.

Notice that the definition of time zone that I've given here does not mention Daylight Saving Time, but this isn't a problem in most cases because ambiguity happens almost exclusively at "fall back" transitions, in which the first period is DST and the second period is STD.  I argue that the rare ambiguities for which this does not hold are best resolved by divorcing our DST flag from Daylight Saving Time; that is, by defining the flag to mean "choose the first ambiguous time"; otherwise, we fail to handle jurisdictional transitions not involving DST.

With this perspective, arithmetic becomes "translate to UTC, operate, translate back", which is as it should be.  The only arithmetic that is performed outside of this framework is in the implementation of  $fromutc_i$ and $fromlocal_i$, which operate on naive timestamps. But IIUC what Lennart is complaining about is the fact that the DST flag isn't part of and can't be embedded into a local time, so it's impossible to fold the second parameter to $fromlocal$ into $t$.  Without that, a local time isn't rich enough to designate a single point in time and the whole edifice breaks.

ijs

[1]:  http://linux.die.net/man/5/tzfile
[2]: http://linux.die.net/man/8/zic
[3]: https://what-if.xkcd.com/53/

Top-posted from Microsoft Outlook Web App; may its designers be consigned for eternity to that circle of hell in which their dog food is consumed.

From: Python-Dev <python-dev-bounces+ischwabacher=wisc.edu at python.org> on behalf of Alexander Belopolsky <alexander.belopolsky at gmail.com>
Sent: Thursday, July 23, 2015 20:28
To: Lennart Regebro
Cc: Python-Dev
Subject: Re: [Python-Dev] Status on PEP-431 Timezones

On Thu, Jul 23, 2015 at 12:22 PM, Lennart Regebro  <regebro at gmail.com> wrote:

It turns out it's very complex to solve this when internally storing
the time as the local time. Basically you have to normalize the time
(ie check if daylight savings have changed) when doing arithmetic, but
normalize is doing arithmetic, and you get infinite recursion.   
This is not true.  Tim's analysis immortalized [1] at the end of the datetime.py file,
shows that UTC to local mapping can be unambiguously recovered from the
local to UTC rules using a simple finite algorithm.  Tim assumes [2] that standard (non-DST)
time offset is constant throughout the history, but this requirement can be relaxed to offset
changing no more than once in any 48 hour period (if you generously allow timezones
from -24 to 24 hours).

Actually, it looks like I am repeating what I wrote back in April, so I'll stop with a
reference [3] to that post.

[1]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1935
[2]: https://hg.python.org/cpython/file/v3.5.0b1/Lib/datetime.py#l1948
[3]: https://mail.python.org/pipermail/python-dev/2015-April/139171.html