[Datetime-SIG] Trivial vs easy: .utcoffset()

Sun Aug 30 03:49:58 CEST 2015

[Alexander Belopolsky <alexander.belopolsky at gmail.com>]
> As I am learning more about Olson/IANA database, I am
> more and more convinced that Python approach is better
> than that of UNIX.

While I'm leaning more & more to the opposite conclusion ;-)  That is,
you can't fight crushing success.  Like IEEE-754 was for binary
floating point, zoneinfo is a "category killer".  It seems very likely
that no competing approach will ever attract enough interest to get
anywhere.  The number of people who truly care enough to even try can
be counted on two middle fingers.

Since the zoneinfo data so strongly favors UTC->local conversions, the
only sane way to play along with it is to view .fromutc() as the
primary tzinfo method and .utcoffset() as a possibly horridly slow
afterthought.
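
A minimal sketch of that asymmetry, assuming a hand-written transition
list for a single zone (US Eastern, 2015 only).  The names
INITIAL_OFFSET, TRANSITIONS_UTC and utcoffsets() are hypothetical and
not part of any tzinfo API; the data is typed in by hand rather than
read from a tzfile.  UTC->local is one bisect, while local->UTC here
is a brute-force "try every offset the zone has ever used":

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical hand-written data (US Eastern, 2015 only).
# Each entry: (UTC instant a new offset takes effect, offset after it).
INITIAL_OFFSET = timedelta(hours=-5)  # EST before the first transition
TRANSITIONS_UTC = [
    (datetime(2015, 3, 8, 7, 0), timedelta(hours=-4)),   # -> EDT
    (datetime(2015, 11, 1, 6, 0), timedelta(hours=-5)),  # -> EST
]
_KEYS = [when for when, _ in TRANSITIONS_UTC]
ALL_OFFSETS = {INITIAL_OFFSET} | {off for _, off in TRANSITIONS_UTC}


def fromutc(utc_naive):
    """Naive UTC datetime -> naive local datetime, via one bisect."""
    i = bisect_right(_KEYS, utc_naive)
    offset = INITIAL_OFFSET if i == 0 else TRANSITIONS_UTC[i - 1][1]
    return utc_naive + offset


def utcoffsets(wall):
    """All offsets that round-trip for naive local time `wall`.

    Empty for a time in a gap, two entries for an ambiguous time.
    """
    return sorted(off for off in ALL_OFFSETS
                  if fromutc(wall - off) == wall)
```

With this toy data, utcoffsets() returns an empty list for 02:30 on
2015-03-08 (the spring-forward gap) and two offsets for 01:30 on
2015-11-01 (the fall-back overlap).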

And then there's dateutil's wrapping.  Amazingly enough, it inherits
the default .fromutc(), even though zoneinfo data makes that
direction hard _not_ to get right in all cases.

> The Python approach is to provide effectively local to UTC mapping
> via utcoffset() while UNIX approach is to provide UTC to local mapping
> via the localtime() function.

I expect that's mostly because the UNIX tradition strongly favors
setting the system clock to use UTC.  UTC->local conversions may be
needed countless times each day just in ordinary use by people who
couldn't care less about timezones (except that they want to see their
own local time).  Windows solves that problem by running the system
clock _in_ local time.  What could possibly go wrong? ;-)

> Python then supplies fromutc() which is real simple in regular cases
> and I think implementable in a general case

Except "it sucks" when a system-supplied function doesn't handle all
cases.  I spent most of my career working for computer design
companies.  More than once, the HW guys and the bosses would come with
questions like "oops!  we missed a gate in the ALU, and sometimes
addition may not propagate a carry from bit 12 into bit 13 - will that
be a problem for you guys?".  When HW product releases and millions of
dollars are on the line, it's real tempting to say "hey, no problem -
ship it!  if they really care, they can cross-check their additions
with an abacus" ;-)

> while UNIX supplies its mktime which is a poke six times and hope
> it is enough mess.

Which is my original puzzle:  _given_ that the zoneinfo world
apparently doesn't care much about local->UTC conversions, is mktime
the best that can be done?

> The reason I think Python API is superior is because with exception
> of leap seconds, all transitions in Olson database are given in local
> time in the raw files.

Well, they don't have to be in the plain text data files, that's just
the default.  And overwhelmingly most common.  There are also ways to
say "but this time is in the zone's 'standard time' regardless of
is_dst", and "but this time is in UTC".  I believe the _intent_ of all
this is to specify the rules using whatever scheme the political
authority announcing the rules used (to reduce errors, and to ease
independent verification against source materials).

> The raw files then get "compiled" so that localtime() can be implemented
> efficiently

Yes, the _explicit_ transition lists are all converted to POSIX
timestamps (UTC seconds-from-the-epoch).  But all the POSIX TZ rules
I've seen generated in tzfile versions > 1 use the POSIX "local wall
clock time" convention.

> and Olson never supplies his own mktime as far as I can tell.

This old implementation has his name on it:

    http://www.opensource.apple.com/source/ntp/ntp-13/ntp/libntp/mktime.c

I kinda like it.  It doesn't try hard to be clever.  At heart, it does
a binary search over all possible time_t values, calling localtime()
on each until it finally manages to reproduce the input.  But a
comment notes that it failed at first in some cases because it didn't
take is_dst into account.  That was repaired by assuming DST
transitions are all exactly one hour, and:

/*
 * So, if you live somewhere in the world where dst is not 60 minutes offset,
 * and your vendor doesn't supply mktime(), you'll have to edit this variable
 * by hand.  Sorry about that.
 */

Alas, I'm still capable of being embarrassed ;-)
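
The binary-search idea can be sketched in Python.  This is not a
port of the linked mktime.c: toy_localtime() is a hypothetical
stand-in for localtime() that knows only the 2015 US Eastern rules,
and the search ignores is_dst entirely, so an ambiguous time yields
whichever matching timestamp the search happens to hit, and a time in
a spring-forward gap yields None.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)


def toy_localtime(ts):
    """Hypothetical localtime(): US Eastern rules for 2015 only."""
    utc = EPOCH + timedelta(seconds=ts)
    in_dst = datetime(2015, 3, 8, 7) <= utc < datetime(2015, 11, 1, 6)
    return utc + timedelta(hours=-4 if in_dst else -5)


def search_mktime(local, lo=0, hi=2**31 - 1):
    """Binary search for a timestamp whose local time equals `local`.

    `local` is a naive datetime with whole seconds.  Returns None if
    no timestamp in [lo, hi] reproduces the input.
    """
    while lo <= hi:
        mid = (lo + hi) // 2
        got = toy_localtime(mid)
        if got == local:
            return mid
        if got < local:
            lo = mid + 1
        else:
            hi = mid - 1
    return None
```

Each probe costs one localtime() call, so a 31-bit search range needs
at most 31 of them.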

> A familiar example where DST rules are simpler when formulated
> in local time are the US rules.  In local time, all three (or four?) US
> zones

There are four major US zones.  But people keep forgetting US places
like Hawaii and Alaska and far east Maine (Atlantic Standard Time).  I
believe there are 9(!) "US" zones now.

> have exactly the same rule - fall-back at 2am on first Sunday in November
> and spring-forward at the 2am  on the second Sunday in March.  Expressed
> in UTC, the transitions will be all at a different hour and may not even
> happen on the same day.

Well, the _writer_ of zoneinfo rules gets to use local times, so it's
no problem for them.  The UTC transition times in the binary tzfiles
are indeed an irregular mess.  But having an exhaustive list of
transitions makes many tasks easy to code.  zoneinfo seems determined
to make UTC->local quick & reliable regardless of data-space burden.
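
To illustrate how one local-time rule fans out into different UTC
instants, here is a small sketch restricted to the spring-forward rule
quoted above and to the four major contiguous zones.  The helper names
and the STANDARD_OFFSETS table are made up for the example; nothing
here is read from zoneinfo.

```python
from datetime import datetime, timedelta


def nth_sunday(year, month, n):
    """Midnight on the n-th Sunday (n >= 1) of the given month."""
    first = datetime(year, month, 1)
    days_to_sunday = (6 - first.weekday()) % 7  # Monday == 0, Sunday == 6
    return first + timedelta(days=days_to_sunday + 7 * (n - 1))


# Standard-time UTC offsets, in hours, for the four major zones.
STANDARD_OFFSETS = {
    "Eastern": -5,
    "Central": -6,
    "Mountain": -7,
    "Pacific": -8,
}


def spring_forward_utc(year):
    """UTC instant of the 2am-local spring-forward, per zone."""
    # The rule is stated in local standard time: 2am, second Sunday
    # in March.  Subtracting the offset converts it to UTC.
    local = nth_sunday(year, 3, 2) + timedelta(hours=2)
    return {zone: local - timedelta(hours=hours)
            for zone, hours in STANDARD_OFFSETS.items()}
```

For 2015 the single local rule "2am on March 8" becomes 07:00 UTC for
Eastern and 10:00 UTC for Pacific, with the other two in between.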

> I think the future of TZ support in Python is to come up with some
> automatic way to translate from raw Olson files to
> utcoffset()/dst()/tzname() implementations and invent some clever
> fromutc() algorithm that will correctly "invert" y = x - x.utcoffset() in all cases.

Which I'm afraid is backwards, since the overwhelmingly most important
source of timezone data makes UTC->local easy, at least until 2038.
That's why ".utcoffset()" is in the Subject line:  zoneinfo hands us
.fromutc() on a silver platter.  It's .utcoffset() that's the puzzle
now.  After zoneinfo is wrapped in the post-495 world, it's possible
nobody will ever write a tzinfo again :-)

> For the later task, I have a solution in my prototype branch, but it
> requires up to six calls to utcoffset() which may indeed be the best
> on can do and it is not a coincidence that the number of calls in the
> worst case is the same as the number of pokes in mktime.

In which case it's about equally messy either way, yes?

I haven't stared at mktime() in anger.  Is there anything it could be
_told_ about the local zone that could ease its job?  For example,
being told in advance the largest possible difference between adjacent
UTC offsets?  The smallest granularity of differences between adjacent
UTC offsets?  A list of all possible deltas between adjacent UTC
offsets?

Anything along those lines.  tzfiles don't answer those questions
directly, but it's easy to compute things like that while loading the
file.
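
As a rough sketch of the kind of load-time computation meant here,
assume the loader has already produced the zone's UTC offsets in
chronological order as a list of timedeltas.  The function name and
the returned keys are hypothetical, and "granularity" is interpreted
as the greatest common divisor of the deltas, which is only one
possible reading.

```python
from datetime import timedelta
from functools import reduce
from math import gcd


def offset_delta_summary(offsets):
    """Summarize the jumps between adjacent UTC offsets.

    `offsets` holds one timedelta per period, in chronological order.
    """
    # Absolute size of each change; periods that keep the same offset
    # (for example a pure abbreviation change) are skipped.
    deltas = [abs(b - a) for a, b in zip(offsets, offsets[1:]) if a != b]
    seconds = [int(d.total_seconds()) for d in deltas]
    return {
        "largest": max(deltas, default=timedelta(0)),
        "granularity": timedelta(seconds=reduce(gcd, seconds, 0)),
        "distinct": sorted(set(deltas)),
    }
```

A local->UTC search could then, for instance, limit its probes to the
deltas listed under "distinct" instead of guessing blindly.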