[Datetime-SIG] Trivial vs easy: .utcoffset()

Sat Aug 29 20:50:33 CEST 2015

Timezone conversion is mathematically trivial, but that doesn't mean
it's obvious or easy.  Details can really bite.

tzinfo supplies .utcoffset(), which made converting _to_ UTC dead
obvious.  But how to convert _from_ UTC remained clear as mud.  The
default .fromutc() "gets it right" (as far as is possible without a
fold/is_dst flag), but _only_ handles DST transitions that strictly
alternate between "on" (although with a DST adjustment that may change
each time) and "off" (a DST adjustment of exactly 0).  Nothing fancier
than that; e.g., no base offset changes.  It was amazingly annoying to
craft an efficient, correct (so far as it goes) implementation of just
that much.  Even then, hand-written tzinfo implementations had to
express DST transition points in "standard time" for it to always
work, instead of in natural local wall-clock times (end-of-DST is
where that makes a difference).
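
To make that concrete, here's a minimal sketch (not code from the
docs).  Eastern2015 is a made-up toy tzinfo that's only valid for 2015,
and to_utc() / from_utc() are hypothetical helper names.  The
transition points are expressed in standard time, so converting to UTC
is a single subtraction, while converting from UTC leans entirely on
the inherited default .fromutc():

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)

class Eastern2015(tzinfo):
    """Toy US/Eastern rules for 2015 only (illustration, not real data handling)."""
    # Both transition points are in *standard* time.
    DST_START = datetime(2015, 3, 8, 2)   # 2:00 standard time
    DST_END = datetime(2015, 11, 1, 1)    # 1:00 standard == 2:00 daylight

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else ZERO

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

def to_utc(dt):
    """Local -> UTC: dead obvious once .utcoffset() exists."""
    return dt.replace(tzinfo=None) - dt.utcoffset()

def from_utc(utc_naive, tz):
    """UTC -> local: delegate to the default tzinfo.fromutc()."""
    return tz.fromutc(utc_naive.replace(tzinfo=tz)).replace(tzinfo=None)

tz = Eastern2015()
print(to_utc(datetime(2015, 7, 1, 12, tzinfo=tz)))   # noon EDT as UTC
print(from_utc(datetime(2015, 11, 1, 5, 30), tz))    # just before end-of-DST
```

If DST_END were written as the natural wall-clock 2:00 instead, the
default .fromutc() would be working from a different rule near
end-of-DST, which is the case where the standard-time convention
matters.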

As Alex noted elsewhere, unlike the hand-written .utcoffset()
implementations shown in the Python docs, most timezone sources
(chiefly Olson - zoneinfo) effectively supply a .fromutc()
implementation instead.  Which makes converting from UTC dead obvious,
but - surprise ;-) - leaves how to convert _to_ UTC (how to implement
.utcoffset()) clear as mud instead.
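
A rough sketch of why the from-UTC direction is dead obvious with that
kind of data.  The table is a hand-typed slice of US/Eastern in the
"list of transitions in UTC" shape, and the names UTC_TRANSITIONS,
offset_at_utc() and from_utc() are made up for illustration; this is
not how any particular library stores or reads tzfiles:

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# (UTC instant at which an offset takes effect, UTC offset from then on)
UTC_TRANSITIONS = [
    (datetime(2014, 11, 2, 6), timedelta(hours=-5)),  # EST
    (datetime(2015, 3, 8, 7), timedelta(hours=-4)),   # EDT
    (datetime(2015, 11, 1, 6), timedelta(hours=-5)),  # EST
]
UTC_KEYS = [when for when, _ in UTC_TRANSITIONS]

def offset_at_utc(utc_naive):
    """Binary search for the last transition at or before utc_naive."""
    i = bisect_right(UTC_KEYS, utc_naive) - 1
    if i < 0:
        raise ValueError("before the first transition in this toy table")
    return UTC_TRANSITIONS[i][1]

def from_utc(utc_naive):
    """UTC -> local wall clock: one lookup, one addition."""
    return utc_naive + offset_at_utc(utc_naive)

# Two different UTC times land on the same wall-clock time at end-of-DST,
# which is exactly what makes the inverse (.utcoffset()) the hard part.
print(from_utc(datetime(2015, 11, 1, 5, 30)))
print(from_utc(datetime(2015, 11, 1, 6, 30)))
```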

In a zoneinfo world, referring back to Guido's diagram, a local
datetime is staring at a chart with _no_ visible diagonal lines when
looking right from the Y (local) axis; they're only visible when
looking up from the X (UTC) axis.  The hand-written tzinfo classes in
the Python docs had the opposite problem, but implicitly left it to
the default .fromutc() to figure out the invisible part so "the
problem" isn't apparent in the docs.

Stewart noted before that always using fixed-offset classes in pytz
effectively supplies the missing is_dst bit, but it does more than
just that:  it effectively stores the datetime's current UTC offset
too.  The transition charts a pytz tzinfo sees always have a single,
continuous diagonal line, visible from both axes.  Easy peasy.  In
return, any operation on the datetime object that creates a new
datetime but just copies the original tzinfo into the result may end
up with a tzinfo that's no longer correct (lying about the UTC offset
that's _appropriate_ for the new date and time).  Hence the need to
call .normalize() all over the place.  If .normalize() were applied
magically instead by Python internals, that need would go away, but
then timeline arithmetic is "the natural" result - it's unclear to me
that classic arithmetic _could_ be implemented if the result of every
relevant operation added a "convert to UTC and back again, to get the
appropriate current UTC offset" step at the end (not to mention how
much slower a lot of code would become).
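
The stale-tzinfo effect can be mimicked with nothing but stdlib
fixed-offset timezones.  This is only a sketch of the idea, not pytz's
implementation: eastern_for_utc() is a hypothetical lookup that's
valid for 2015 only, and the normalize() here is a stand-in that does
the "convert to UTC and back again" step by hand:

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset stand-ins for what gets attached to each datetime.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

def eastern_for_utc(utc_naive):
    """Hypothetical lookup, 2015 only: which fixed offset applies at this UTC time?"""
    dst_on = datetime(2015, 3, 8, 7) <= utc_naive < datetime(2015, 11, 1, 6)
    return EDT if dst_on else EST

def normalize(dt):
    """Convert to UTC and back again, picking up the appropriate offset."""
    utc_naive = dt.astimezone(timezone.utc).replace(tzinfo=None)
    return dt.astimezone(eastern_for_utc(utc_naive))

noon = datetime(2015, 10, 31, 12, tzinfo=EDT)
stale = noon + timedelta(days=1)   # tzinfo copied as-is: still claims EDT
fixed = normalize(stale)           # same instant, relabeled with the EST offset

print(stale)   # 12:00 with a -04:00 offset, on a day when EST is in effect
print(fixed)   # 11:00 with a -05:00 offset
```

Note that the normalized result is 24 elapsed hours after the
original, i.e. timeline arithmetic, not "same wall-clock time
tomorrow".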

So how can .utcoffset() be computed efficiently in a zoneinfo world
using "hybrid" tzinfo classes (tzinfos that are smart enough to figure
out the appropriate offset all on their own)?  It's like re-inventing
the default .fromutc() all over again, but in the other direction in a
much lumpier world.

Of course there are many ideas.  Rather than drone on about them, I'd
like to put the puzzle out there in case a correct "duh - it's
obvious, you moron" reply is just waiting for an invitation - but do
note the "correct" ;-)

BTW, after 15 minutes I wasn't able to convince myself I understood
what dateutil's zoneinfo-wrapping's .utcoffset() was doing; and if I
don't understand what it's doing, there's no way I can guess whether
it's always correct.  One obvious idea for a zoneinfo
exhaustive-list-of-transitions-in-UTC world:  precompute another
exhaustive list of transitions, but expressed in local time (including
"fold") mapping to the correct UTC offset at each point.  That could
pretty obviously work, but is essentially a way of implementing "poke
and hope" in a simple, uniform way (via binary search).
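
A minimal sketch of that precomputed-list idea, under some loudly
stated assumptions: the table is a hand-typed slice of US/Eastern,
"fold" is passed as an explicit argument, times inside a gap are
given the pre-transition offset when fold=0 and the post-transition
offset when fold=1 (a convention picked for this sketch), and
transitions are assumed far enough apart that the local keys stay
sorted.  All the names are made up for illustration:

```python
from bisect import bisect_right
from datetime import datetime, timedelta

INITIAL_OFFSET = timedelta(hours=-4)   # in effect before the first transition
UTC_TRANSITIONS = [
    (datetime(2014, 11, 2, 6), timedelta(hours=-5)),  # EST
    (datetime(2015, 3, 8, 7), timedelta(hours=-4)),   # EDT
    (datetime(2015, 11, 1, 6), timedelta(hours=-5)),  # EST
]

def build_local_tables(initial_offset, utc_transitions):
    """Re-express each UTC transition as two local-time keys, one per fold value."""
    keys = ([], [])            # keys[0] for fold=0, keys[1] for fold=1
    offsets = [initial_offset]
    old = initial_offset
    for when_utc, new in utc_transitions:
        keys[0].append(when_utc + max(old, new))  # fold=0 keeps the old offset longer
        keys[1].append(when_utc + min(old, new))  # fold=1 switches at the earliest wall time
        offsets.append(new)
        old = new
    return keys, offsets

LOCAL_KEYS, OFFSETS = build_local_tables(INITIAL_OFFSET, UTC_TRANSITIONS)

def utcoffset(local_naive, fold=0):
    """Local wall clock (+ fold) -> UTC offset via one binary search."""
    i = bisect_right(LOCAL_KEYS[fold], local_naive)  # number of transitions already passed
    return OFFSETS[i]

ambiguous = datetime(2015, 11, 1, 1, 30)
print(utcoffset(ambiguous, fold=0))   # first 1:30 (EDT)
print(utcoffset(ambiguous, fold=1))   # second 1:30 (EST)
```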

There's also the problem that exhaustive lists of transition points
are a doomed approach over time.  zoneinfo supplies them through 2037
for the benefit of legacy clients, but modern clients are expected to
use a POSIX TZ rule (stored in version 2 tzfiles) too.  pytz and dateutil
both ship with version 2 (or maybe version 3) tzfiles, but neither
goes beyond using the version 1 exhaustive-list portion of tzfiles.
So more fun is waiting there ;-)
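
For a taste of what rule-based clients have to do, here's a small
sketch of evaluating the "Mm.w.d" date form used in POSIX TZ rules
such as "EST5EDT,M3.2.0,M11.1.0".  It only computes the date for one
year; it doesn't parse the rule string or handle the other date forms,
and posix_m_rule_date() is a made-up name:

```python
from datetime import date

def posix_m_rule_date(year, month, week, weekday):
    """Date for POSIX 'Mm.w.d'.

    weekday: 0=Sunday .. 6=Saturday.
    week: 1..5, where 5 means "the last such weekday in the month".
    """
    first = date(year, month, 1)
    # date.weekday() is Monday=0 .. Sunday=6; shift to POSIX Sunday=0.
    first_wd = (first.weekday() + 1) % 7
    day = 1 + (weekday - first_wd) % 7      # first matching weekday in the month
    day += 7 * (week - 1)
    # Length of the month, via the first day of the following month.
    next_first = date(year + (month == 12), month % 12 + 1, 1)
    days_in_month = (next_first - first).days
    while day > days_in_month:              # week 5 overshot: back up to the last one
        day -= 7
    return date(year, month, day)

# US rules: second Sunday of March, first Sunday of November.
print(posix_m_rule_date(2015, 3, 2, 0))
print(posix_m_rule_date(2015, 11, 1, 0))
```

Unlike an exhaustive list, the same call works for any year, which
is the point of the rule.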