Re: [Datetime-SIG] Trivial vs easy: .utcoffset()

On Aug 29, 2015, at 2:50 PM, Tim Peters <tim.peters@gmail.com> wrote:
So how can .utcoffset() be computed efficiently in a zoneinfo world using "hybrid" tzinfo classes (tzinfos that are smart enough to figure out the appropriate offset all on their own)?
As I am learning more about Olson/IANA database, I am more and more convinced that Python approach is better than that of UNIX. The Python approach is to provide effectively local to UTC mapping via utcoffset() while UNIX approach is to provide UTC to local mapping via the localtime() function. Python then supplies fromutc() which is real simple in regular cases and I think implementable in a general case while UNIX supplies its mktime which is a "poke six times and hope it is enough" mess.
The reason I think Python API is superior is because with exception of leap seconds, all transitions in Olson database are given in local time in the raw files. The raw files then get "compiled" so that localtime() can be implemented efficiently and Olson never supplies his own mktime as far as I can tell.
A familiar example where DST rules are simpler when formulated in local time are the US rules. In local time, all three (or four?) US zones have exactly the same rule - fall-back at 2am on first Sunday in November and spring-forward at 2am on the second Sunday in March. Expressed in UTC, the transitions will be all at a different hour and may not even happen on the same day.
I think the future of TZ support in Python is to come up with some automatic way to translate from raw Olson files to utcoffset()/dst()/tzname() implementations and invent some clever fromutc() algorithm that will correctly "invert" y = x - x.utcoffset() in all cases. For the latter task, I have a solution in my prototype branch, but it requires up to six calls to utcoffset() which may indeed be the best one can do and it is not a coincidence that the number of calls in the worst case is the same as the number of pokes in mktime.

[Alexander Belopolsky <alexander.belopolsky@gmail.com>]
As I am learning more about Olson/IANA database, I am more and more convinced that Python approach is better than that of UNIX.
While I'm leaning more & more to the opposite conclusion ;-) That is, you can't fight crushing success. Like IEEE-754 was for binary floating point, zoneinfo is a "category killer". It seems very likely that no competing approach will ever attract enough interest to get anywhere. The number of people who truly care enough to even try can be counted on two middle fingers. Since the zoneinfo data so strongly favors UTC->local conversions, the only sane way to play along with it is to view .fromutc() as the primary tzinfo method and .utcoffset() as a possibly horridly slow afterthought. And then there's dateutil's wrapping. Amazingly enough, it inherits the default .fromutc(), even though zoneinfo data makes that direction hard _not_ to get right in all cases.
The Python approach is to provide effectively local to UTC mapping via utcoffset() while UNIX approach is to provide UTC to local mapping via the localtime() function.
I expect that's mostly because the UNIX tradition strongly favors setting the system clock to use UTC. UTC->local conversions may be needed countless times each day just in ordinary use by people who couldn't care less about timezones (except that they want to see their own local time). Windows solves that problem by running the system clock _in_ local time. What could possibly go wrong? ;-)
Python then supplies fromutc() which is real simple in regular cases and I think implementable in a general case
Except "it sucks" when a system-supplied function doesn't handle all cases. I spent most of my career working for computer design companies. More than once, the HW guys and the bosses would come with questions like "oops! we missed a gate in the ALU, and sometimes addition may not propagate a carry from bit 12 into bit 13 - will that be a problem for you guys?". When HW product releases and millions of dollars are on the line, it's real tempting to say "hey, no problem - ship it! if they really care, they can cross-check their additions with an abacus" ;-)
while UNIX supplies its mktime which is a "poke six times and hope it is enough" mess.
Which is my original puzzle: _given_ that the zoneinfo world apparently doesn't care much about local->UTC conversions, is mktime the best that can be done?
The reason I think Python API is superior is because with exception of leap seconds, all transitions in Olson database are given in local time in the raw files.
Well, they don't have to be in the plain text data files, that's just the default. And overwhelmingly most common. There are also ways to say "but this time is in the zone's 'standard time' regardless of is_dst", and "but this time is in UTC". I believe the _intent_ of all this is to specify the rules using whatever scheme the political authority announcing the rules used (to reduce errors, and to ease independent verification against source materials).
The raw files then get "compiled" so that localtime() can be implemented efficiently
Yes, the _explicit_ transition lists are all converted to POSIX timestamps (UTC seconds-from-the-epoch). But all the POSIX TZ rules generated in versions > 1 I've seen use the POSIX "local wall clock time" convention.
and Olson never supplies his own mktime as far as I can tell.
This old implementation has his name on it:
http://www.opensource.apple.com/source/ntp/ntp-13/ntp/libntp/mktime.c
I kinda like it. It doesn't try hard to be clever. At heart, it does a binary search over all possible time_t values, calling localtime() on each until it finally manages to reproduce the input. But a comment notes that it failed at first in some cases because it didn't take is_dst into account. That was repaired by assuming DST transitions are all exactly one hour, and:
    /*
     * So, if you live somewhere in the world where dst is not 60 minutes offset,
     * and your vendor doesn't supply mktime(), you'll have to edit this variable
     * by hand. Sorry about that.
     */
Alas, I'm still capable of being embarrassed ;-)
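For readers who want to see the shape of that idea, here is a minimal Python sketch of a binary-search mktime. It is not the code at the URL above: it compares naive datetime objects rather than struct tm fields, it ignores is_dst entirely, and toy_localtime(), SWITCH and search_mktime() are hypothetical names invented for illustration, using a made-up zone with a single spring-forward transition.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

# Hypothetical single transition: the toy zone moves from UTC-5 to UTC-4
# at 2015-03-08 07:00:00 UTC.
SWITCH = int((datetime(2015, 3, 8, 7) - EPOCH).total_seconds())

def toy_localtime(ts):
    """Stand-in for localtime(): POSIX timestamp -> naive local datetime."""
    offset = -5 * 3600 if ts < SWITCH else -4 * 3600
    return EPOCH + timedelta(seconds=ts + offset)

def search_mktime(target, localtime, lo=0, hi=2**31 - 1):
    """Binary-search the timestamp range for ts with localtime(ts) == target.

    Returns None when no timestamp reproduces the target, which is what
    happens for a wall-clock time skipped by a spring-forward.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        local = localtime(mid)
        if local == target:
            return mid
        if local < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

With this toy zone, search_mktime(datetime(2015, 3, 8, 1, 30), toy_localtime) lands 30 minutes before SWITCH, while 02:30 on the same day does not exist on the local clock and yields None. Around a fall-back the local clock repeats an hour, so a search like this can only return one of the two matching timestamps unless it is also told which side of the transition is wanted.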
A familiar example where DST rules are simpler when formulated in local time are the US rules. In local time, all three (or four?) US zones
There are four major US zones. But people keep forgetting US places like Hawaii and Alaska and far east Maine (Atlantic Standard Time). I believe there are 9(!) "US" zones now.
have exactly the same rule - fall-back at 2am on first Sunday in November and spring-forward at 2am on the second Sunday in March. Expressed in UTC, the transitions will be all at a different hour and may not even happen on the same day.
Well, the _writer_ of zoneinfo rules gets to use local times, so it's no problem for them. The UTC transition times in the binary tzfiles are indeed an irregular mess. But having an exhaustive list of transitions makes many tasks easy to code. zoneinfo seems determined to make UTC->local quick & reliable regardless of data-space burden.
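To make the local-versus-UTC contrast concrete, here is a small sketch that states the US rule once in local wall-clock time and then derives the UTC instants for the four major contiguous zones. The helper names (nth_sunday, STANDARD_OFFSETS, us_transitions_utc) are made up for this illustration, only the rule in force in 2015 is modeled, and the other US zones are left out.

```python
from datetime import date, datetime, time, timedelta

def nth_sunday(year, month, n):
    """Return the date of the n-th Sunday (n >= 1) of the given month."""
    first = date(year, month, 1)
    # date.weekday(): Monday == 0 ... Sunday == 6
    days_to_sunday = (6 - first.weekday()) % 7
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))

# Standard-time UTC offsets, in hours, for the four major contiguous US zones.
STANDARD_OFFSETS = {"Eastern": -5, "Central": -6, "Mountain": -7, "Pacific": -8}

def us_transitions_utc(year):
    """Map zone name -> (spring_forward, fall_back) as naive UTC datetimes."""
    # The rule itself is identical for every zone when written in local time.
    spring_local = datetime.combine(nth_sunday(year, 3, 2), time(2, 0))
    fall_local = datetime.combine(nth_sunday(year, 11, 1), time(2, 0))
    result = {}
    for name, hours in STANDARD_OFFSETS.items():
        std = timedelta(hours=hours)
        dst = std + timedelta(hours=1)
        # 2am wall time is standard time just before spring-forward,
        # and daylight time just before fall-back.
        result[name] = (spring_local - std, fall_local - dst)
    return result

if __name__ == "__main__":
    for zone, (spring, fall) in us_transitions_utc(2015).items():
        print(zone, spring.isoformat(), fall.isoformat())
```

One local-time rule fans out into a different UTC hour for each zone, which is the irregularity that shows up in the compiled transition lists.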
I think the future of TZ support in Python is to come up with some automatic way to translate from raw Olson files to utcoffset()/dst()/tzname() implementations and invent some clever fromutc() algorithm that will correctly "invert" y = x - x.utcoffset() in all cases.
Which I'm afraid is backwards, since the overwhelmingly most important source of timezone data makes UTC->local easy, at least until 2038. That's why ".utcoffset()" is in the Subject line: zoneinfo hands us .fromutc() on a silver platter. It's .utcoffset() that's the puzzle now. After zoneinfo is wrapped in the post-495 world, it's possible nobody will ever write a tzinfo again :-)
For the latter task, I have a solution in my prototype branch, but it requires up to six calls to utcoffset() which may indeed be the best one can do and it is not a coincidence that the number of calls in the worst case is the same as the number of pokes in mktime.
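A rough sketch of the asymmetry, assuming a tiny hand-written transition table in the spirit of a compiled tzfile (the table, utc_to_local() and local_to_utc_candidates() are hypothetical, loosely modeled on US Eastern in 2015). UTC->local is a single bisect; local->UTC here is done by naive probing of each known offset, which is just one possible approach and not a claim about what mktime or any tzinfo implementation does.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Sorted UTC instants at which a new offset takes effect.
TRANSITIONS_UTC = [datetime(2015, 3, 8, 7), datetime(2015, 11, 1, 6)]
# OFFSETS[i] applies before TRANSITIONS_UTC[i]; the last entry applies after all.
OFFSETS = [timedelta(hours=-5), timedelta(hours=-4), timedelta(hours=-5)]

def utc_to_local(utc_dt):
    """The easy direction: one binary search over the UTC transition list."""
    idx = bisect_right(TRANSITIONS_UTC, utc_dt)
    return utc_dt + OFFSETS[idx]

def local_to_utc_candidates(local_dt):
    """The awkward direction: try every distinct offset and keep the
    guesses that round-trip. Yields 0 results in a gap, 2 in a fold."""
    found = []
    for offset in sorted(set(OFFSETS)):
        utc_dt = local_dt - offset
        if utc_to_local(utc_dt) == local_dt:
            found.append(utc_dt)
    return sorted(found)
```

For example, local_to_utc_candidates(datetime(2015, 11, 1, 1, 30)) returns two UTC instants an hour apart, and the same call for 02:30 on 2015-03-08 returns an empty list.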
In which case it's about equally messy either way, yes? I haven't stared at mktime() in anger. Is there anything it could be _told_ about the local zone that could ease its job? For example, being told in advance the largest possible difference between adjacent UTC offsets? The smallest granularity of differences between adjacent UTC offsets? A list of all possible deltas between adjacent UTC offsets? Anything along those lines. tzfiles don't answer those questions directly, but it's easy to compute things like that while loading the file.
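As a sketch of how cheap those summaries would be to compute at load time, assume the loader has already produced a chronological list of UTC offsets, one per period, as timedelta objects. The function name and the returned keys are invented for illustration, and "granularity" is interpreted here as the greatest common divisor of the deltas in whole seconds, which is only one possible reading of the question.

```python
from datetime import timedelta
from functools import reduce
from math import gcd

def offset_delta_stats(offsets):
    """Summarize the jumps between adjacent UTC offsets.

    offsets: chronological list of timedelta UTC offsets (whole seconds).
    Returns None if the offset never changes.
    """
    deltas = [abs(b - a) for a, b in zip(offsets, offsets[1:]) if a != b]
    if not deltas:
        return None
    seconds = [int(d.total_seconds()) for d in deltas]
    return {
        "largest": max(deltas),
        "granularity": timedelta(seconds=reduce(gcd, seconds)),
        "distinct": sorted(set(deltas)),
    }

if __name__ == "__main__":
    # Made-up offset history: two one-hour shifts, then a half-hour shift.
    history = [timedelta(hours=-5), timedelta(hours=-4),
               timedelta(hours=-5), timedelta(hours=-4, minutes=-30)]
    print(offset_delta_stats(history))
```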