Trivial vs easy: .utcoffset()
Timezone conversion is mathematically trivial, but that doesn't mean it's obvious or easy. Details can really bite.

tzinfo supplies .utcoffset(), which made converting _to_ UTC dead obvious. But how to convert _from_ UTC remained clear as mud.

The default .fromutc() "gets it right" (as far as is possible without a fold/is_dst flag), but _only_ handles DST transitions that strictly alternate between "on" (although with a DST adjustment that may change each time) and "off" (a DST adjustment of exactly 0). Nothing fancier than that; e.g., no base offset changes. It was amazingly annoying to craft an efficient, correct (so far as it goes) implementation of just that much. Even then, hand-written tzinfo implementations had to express DST transition points in "standard time" for it to always work, instead of in natural local wall-clock times (end-of-DST is where that makes a difference).

As Alex noted elsewhere, unlike the hand-written .utcoffset() implementations shown in the Python docs, most timezone sources (chiefly Olson - zoneinfo) effectively supply a .fromutc() implementation instead. Which makes converting from UTC dead obvious, but - surprise ;-) - leaves how to convert _to_ UTC (how to implement .utcoffset()) clear as mud instead.

In a zoneinfo world, referring back to Guido's diagram, a local datetime is staring at a chart with _no_ visible diagonal lines when looking right from the Y (local) axis; they're only visible when looking up from the X (UTC) axis. The hand-written tzinfo classes in the Python docs had the opposite problem, but implicitly left it to the default .fromutc() to figure out the invisible part so "the problem" isn't apparent in the docs.

Stewart noted before that always using fixed-offset classes in pytz effectively supplies the missing is_dst bit, but it does more than just that: it effectively stores the datetime's current UTC offset too.
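To make the contrast concrete, here's a minimal sketch, not anything taken from the docs or from pytz. Eastern2015 is a made-up hand-written tzinfo that hard-codes the 2015 US Eastern rules, with the end-of-DST point expressed in standard time so the default .fromutc() can cope. The stdlib's fixed-offset timezone instances (called EDT and EST here) merely stand in for pytz's fixed-offset classes; that's an assumption about the spirit of pytz, not its actual implementation.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)

class Eastern2015(tzinfo):
    """Hypothetical hand-written "hybrid" tzinfo; only valid for 2015."""
    # Both transition points are in *standard* time, which is what the
    # default .fromutc() needs.  DST ends at 02:00 wall-clock EDT, which
    # is 01:00 standard time.
    DST_START = datetime(2015, 3, 8, 2)
    DST_END = datetime(2015, 11, 1, 1)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

eastern = Eastern2015()

# Two different UTC instants, one hour apart, straddling the end of DST.
utc1 = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)
utc2 = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)

# Hybrid tzinfo + default .fromutc(): both land on wall-clock 01:30, and
# without a fold/is_dst flag the two results can't be told apart.
h1 = utc1.astimezone(eastern)
h2 = utc2.astimezone(eastern)

# Fixed-offset tzinfos: the offset travels with each datetime, so the
# same two wall-clock readings stay distinct.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
first = utc1.astimezone(EDT)
second = utc2.astimezone(EST)

print(h1, h2, h1 == h2)
print(first, second, first == second)
```

The hybrid results compare equal and both claim the EST offset, so converting the first one back to UTC gives the wrong hour; the fixed-offset pair round-trips exactly.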
The transition charts a pytz tzinfo sees always have a single, continuous diagonal line, visible from both axes. Easy peasy. In return, any operation on the datetime object that creates a new datetime but just copies the original tzinfo into the result may end up with a tzinfo that's no longer correct (lying about the UTC offset that's _appropriate_ for the new date and time). Hence the need to call .normalize() all over the place.

If .normalize() were applied magically instead by Python internals, that need would go away, but then timeline arithmetic is "the natural" result - it's unclear to me that classic arithmetic _could_ be implemented if the result of every relevant operation added a "convert to UTC and back again, to get the appropriate current UTC offset" step at the end (not to mention how much slower much code would become).

So how can .utcoffset() be computed efficiently in a zoneinfo world using "hybrid" tzinfo classes (tzinfos that are smart enough to figure out the appropriate offset all on their own)? It's like re-inventing the default .fromutc() all over again, but in the other direction in a much lumpier world.

Of course there are many ideas. Rather than drone on about them, I'd like to put the puzzle out there in case a correct "duh - it's obvious, you moron" reply is just waiting for an invitation - but do note the "correct" ;-)

BTW, after 15 minutes I wasn't able to convince myself I understood what dateutil's zoneinfo-wrapping's .utcoffset() was doing; and if I don't understand what it's doing, there's no way I can guess whether it's always correct.

One obvious idea for a zoneinfo exhaustive-list-of-transitions-in-UTC world: precompute another exhaustive list of transitions, but expressed in local time (including "fold") mapping to the correct UTC offset at each point. That could pretty obviously work, but is essentially a way of implementing "poke and hope" in a simple, uniform way (via binary search).
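A rough sketch of that last idea, under stated assumptions: build_local_table() and utcoffset() are names I made up, the input is a sorted list of (UTC transition time, offset in effect from then on) pairs using naive datetimes, and "fold" is taken to mean 0 for the first reading of an ambiguous wall-clock time and 1 for the second (with fold=0 picking the pre-transition offset inside a gap). It also assumes transitions are far enough apart that the local-time lists stay sorted.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

def build_local_table(initial_offset, utc_transitions):
    """Precompute local-time transition lists from UTC transitions.

    Returns (wall, offsets), where wall[fold] is a sorted list of local
    wall-clock times and offsets[i] is the UTC offset in effect after
    i transitions.
    """
    wall = ([], [])
    offsets = [initial_offset]
    before = initial_offset
    for when_utc, after in utc_transitions:
        # fold=0 keeps the old offset as long as possible; fold=1
        # switches to the new offset as early as possible.
        wall[0].append(when_utc + max(before, after))
        wall[1].append(when_utc + min(before, after))
        offsets.append(after)
        before = after
    return wall, offsets

def utcoffset(table, local, fold=0):
    """UTC offset for a naive local datetime, found by binary search."""
    wall, offsets = table
    return offsets[bisect_right(wall[fold], local)]

EST = timedelta(hours=-5)
EDT = timedelta(hours=-4)

# US Eastern transitions for 2015, expressed in UTC.
table = build_local_table(EST, [
    (datetime(2015, 3, 8, 7, 0), EDT),   # 02:00 EST -> 03:00 EDT
    (datetime(2015, 11, 1, 6, 0), EST),  # 02:00 EDT -> 01:00 EST
])

ambiguous = datetime(2015, 11, 1, 1, 30)
print(utcoffset(table, ambiguous, fold=0))  # first 01:30, still EDT
print(utcoffset(table, ambiguous, fold=1))  # second 01:30, back on EST
```

Each lookup costs one binary search, the same as the from-UTC direction, and the table is twice the size of the UTC list.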
There's also the problem that exhaustive lists of transition points are a doomed approach over time. zoneinfo supplies them through 2037 for the benefit of legacy clients, but they expect modern clients to use a POSIX TZ rule (stored in version 2 tzfiles) too. pytz and dateutil both ship with version 2 (or maybe version 3) tzfiles, but neither goes beyond using the version 1 exhaustive-list portion of tzfiles. So more fun is waiting there ;-)
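For a feel of what using the rule involves, here's a small sketch of just the date part of the common "Mm.w.d" form of a POSIX TZ rule, as in a US Eastern style rule like EST5EDT,M3.2.0,M11.1.0. The function name posix_m_rule_date() is made up, and the sketch ignores the rule's time-of-day field and the Julian-day forms entirely.

```python
import calendar
from datetime import date

def posix_m_rule_date(year, month, week, weekday):
    """Date selected by the POSIX rule form Mm.w.d for a given year.

    weekday uses the POSIX convention 0 = Sunday .. 6 = Saturday.
    week is 1..5, where 5 means "the last such weekday in the month".
    """
    # date.weekday() has Monday = 0; shift so that Sunday = 0.
    first_dow = (date(year, month, 1).weekday() + 1) % 7
    # Day of month of the first occurrence of the requested weekday.
    day = 1 + (weekday - first_dow) % 7
    day += 7 * (week - 1)
    # Week 5 may overshoot the month; back up to the last occurrence.
    days_in_month = calendar.monthrange(year, month)[1]
    while day > days_in_month:
        day -= 7
    return date(year, month, day)

# A year past the exhaustive list: second Sunday of March and first
# Sunday of November 2038.
print(posix_m_rule_date(2038, 3, 2, 0))
print(posix_m_rule_date(2038, 11, 1, 0))
```

Nothing here touches a tzfile; reading the rule string out of a version 2 file is the part that pytz and dateutil currently skip.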
participants (1)
-
Tim Peters