[Python-Dev] Status on PEP-431 Timezones

Mon Jul 27 23:10:12 CEST 2015

[Paul Moore]
> ...
> I think the following statements are true. If they aren't, I'd
> appreciate clarification. I'm going to completely ignore leap seconds
> in the following - I hope that's OK, I don't understand leap seconds
> *at all* and I don't work in any application areas where they are
> relevant (to my knowledge) so I feel that for my situation, ignoring
> them (and being able to) is reasonable.

Guido will never allow any aspect of "leap seconds" into the core,
although it's fine by him if someone wants to write their own tzinfo
class to try to model them.

> Note that I'm not talking about internal representations - this is
> purely about user-visible semantics.
>
> 1. "Naive" datetime arithmetic means treating a day as 24 hours, an
> hour as 60 minutes, etc. Basically base-24/60/60 arithmetic.

It also means that any tzinfo member(s) are ignored.  So leap seconds
are not the only thing ignored:

1. Possible DST transitions are ignored.
2. Possible changes to the base UTC offset are ignored.
3. Possible changes to the name of the time zone (even if "the rules"
don't change) are ignored.
4. Everything else whatsoever that could be learned from the tzinfo
member is ignored.

Note that in "aware" arithmetic, the current fromutc() implementation
is only strong enough to account reliably for #1.
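
To make "ignored" concrete, here is a minimal sketch.  ToyLondon is a
made-up tzinfo subclass (not anything in the stdlib) that hard-codes the
2015 British Summer Time dates and is deliberately crude about the
repeated hour, which the example stays away from.  Adding a day leaves
the wall clock alone even though the offset and the zone name both
change underneath it.

```python
from datetime import datetime, timedelta, tzinfo


class ToyLondon(tzinfo):
    """Hypothetical stand-in for Europe/London, valid for 2015 only."""

    DST_START = datetime(2015, 3, 29, 1)   # wall-clock start of BST
    DST_END = datetime(2015, 10, 25, 2)    # wall-clock end of BST

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= wall < self.DST_END
        return timedelta(hours=1 if in_dst else 0)

    def utcoffset(self, dt):
        # The base offset is zero here, so the total offset is just dst().
        return self.dst(dt)

    def tzname(self, dt):
        return "BST" if self.dst(dt) else "GMT"


london = ToyLondon()
before = datetime(2015, 10, 24, 16, 0, tzinfo=london)

# "Naive" arithmetic: the tzinfo member is carried along, never consulted.
after = before + timedelta(days=1)

print(before, before.tzname())
print(after, after.tzname())
```

The wall clock reads 16:00 on both days, while utcoffset() and tzname()
report different values for the two results.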

> 2. If you're only working in a single timezone that's defined as UTC
> or a fixed offset from UTC, naive arithmetic is basically all there
> is.

Yup!

> 3. Converting between (fixed offset) timezones is a separate issue
> from calculation - but it's nothing more than applying the relevant
> offsets.

Yup!  Although that can't be exploited by Python:  there's nothing in
a tzinfo instance Python can query to discover the rules it
implements.
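
A small sketch of "nothing more than applying the relevant offsets",
using the stdlib's datetime.timezone for the fixed offsets.  The names
cest and est and the sample timestamp are only illustrative.  The
by-hand version works solely because we built both zones ourselves and
so know their offsets.

```python
from datetime import datetime, timedelta, timezone

cest = timezone(timedelta(hours=2), "CEST")
est = timezone(timedelta(hours=-5), "EST")

sent = datetime(2015, 7, 27, 23, 10, 12, tzinfo=cest)

# Let the library do the conversion.
converted = sent.astimezone(est)

# Do the same thing by hand: strip the old offset, apply the new one.
# timezone.utcoffset() ignores its argument, so None is fine here.
wall = sent.replace(tzinfo=None)
by_hand = (wall - cest.utcoffset(None) + est.utcoffset(None)).replace(tzinfo=est)

print(converted)
print(by_hand)
```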

> 4. Calculations involving 2 different timezones (fixed-offset ones as
> above) are like any other exercise involving values on different
> scales. Convert both values to a common scale (in this case, a common
> timezone) and do the calculation there. Simple enough.

Yup.
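
For instance, subtracting two aware datetimes that carry different
tzinfo objects already does the "convert to a common scale" step for
you.  A sketch with two arbitrary fixed offsets (plus_two and minus_four
are just illustrative names):

```python
from datetime import datetime, timedelta, timezone

plus_two = timezone(timedelta(hours=2))
minus_four = timezone(timedelta(hours=-4))

noon_plus_two = datetime(2015, 7, 27, 12, 0, tzinfo=plus_two)
noon_minus_four = datetime(2015, 7, 27, 12, 0, tzinfo=minus_four)

# Different tzinfo objects: both operands are first adjusted to UTC.
gap = noon_minus_four - noon_plus_two

# The same calculation with the common scale made explicit.
manual = (noon_minus_four.astimezone(timezone.utc)
          - noon_plus_two.astimezone(timezone.utc))

print(gap, manual)
```

Both wall clocks read noon, but the two instants are six hours apart.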

> 5. The problems all arise *only* with timezones whose UTC offset
> varies depending on the actual time (e.g., timezones that include the
> transition to DST and back).

Yup.

> Are we OK to this point? This much comprises what I would class as a
> "naive" (i.e. 99% of the population ;-)) understanding of datetimes.
>
> The stdlib datetime module handles naive datetime values, and
> fixed-offset timezones, fine, as far as I can see.

It ignores the possibility called #3 above (that some bureaucrat
changed the name of a fixed-offset time zone even though the offset
didn't change).  Everyone ignores #4, and always will ;-)

> (I'm not sure that the original implementation included fixed-offset tzinfo
> objects, but the 3.4 docs say they are there now, so that's fine).

The original implementation supplied no tzinfo objects, only an
abstract tzinfo base class.
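
So in the early days a fixed-offset zone meant writing your own
subclass.  A sketch of what that looks like next to today's
datetime.timezone; the FixedOffset class here is a hypothetical
hand-rolled example, not something the stdlib ships:

```python
from datetime import datetime, timedelta, timezone, tzinfo


class FixedOffset(tzinfo):
    """Hand-rolled fixed-offset zone (illustrative only)."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)   # a fixed-offset zone never observes DST

    def tzname(self, dt):
        return self._name


hand_rolled = datetime(2015, 7, 27, 23, 10, tzinfo=FixedOffset(2, "CEST"))
built_in = datetime(2015, 7, 27, 23, 10,
                    tzinfo=timezone(timedelta(hours=2), "CEST"))

# Same wall clock, same offset, so they name the same instant.
print(hand_rolled == built_in)
```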

> Looking at the complicated cases, the only ones I'm actually aware of
> in practice are the ones that switch to DST and back, so typically
> have two offsets that differ by an hour,

Some number of minutes, anyway (not all DST transitions move by whole hours).

> switching between the two at some essentially arbitrary points. If there are
> other more complex forms of timezone, I'd like to never need to know about
> them, please ;-)

#2 above is common enough, although there's not a _lot_ of
base-offset-changing going on in current times.

> The timezones we're talking about here are things like
> "Europe/London", not "GMT" or "BST" (the latter two are fixed-offset).
>
> There are two independent issues with complex timezones:
>
> 1. Converting to and from them. That's messy because the conversion to
> UTC needs more information than just the date & time (typically, for
> example, there is a day when 01:45:00 maps to 2 distinct UTC times).
> This is basically the "is_dst" bit that Tim discussed in an earlier
> post. The semantic issue here is that users typically say "01:45" and
> it never occurs to them to even think about *which* 01:45 they mean.
> So recovering that extra information is hard (it's like dealing with
> byte streams where the user didn't provide details of the text
> encoding used).

"Flatly impossible" is more on target than "hard".  In the case of
text encoding, it's often possible to guess correctly by statistical
analysis of the bytes.  01:45:00 in isolation gives no clue at all
about whether standard or daylight time was intended.  A similar point
applies to some ambiguous cases when the base ("standard") UTC offset
changes.
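
A sketch of why nothing can be recovered, using the BST and GMT fixed
offsets mentioned above and the night the UK clocks went back in 2015
(25 October).  Two UTC instants an hour apart land on the very same
wall-clock reading; the variable names are only illustrative.

```python
from datetime import datetime, timedelta, timezone

bst = timezone(timedelta(hours=1), "BST")
gmt = timezone(timedelta(0), "GMT")

# Two distinct instants, one hour apart in UTC.
first = datetime(2015, 10, 25, 0, 45, tzinfo=timezone.utc).astimezone(bst)
second = datetime(2015, 10, 25, 1, 45, tzinfo=timezone.utc).astimezone(gmt)

# Strip the zone, which is all a user typing "01:45" gives you.
first_wall = first.replace(tzinfo=None)
second_wall = second.replace(tzinfo=None)

print(first_wall, second_wall)   # identical readings
print(second - first)            # but the instants differ
```

Once the zone is stripped there is nothing left in the value to tell
the two apart.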

> Once we have the extra information, though, doing
> conversions is just a matter of applying a set of rules.

Yup, and it's easy.

> 2. Arithmetic within a complex timezone. Theoretically, this is simple
> enough (convert to UTC, do the calculation naively, and convert back).
> But in practice, that approach doesn't always match user expectations.
> So you have 2 mutually incompatible semantic options - 1 day after 4pm
> is 3pm the following day, or adding 1 day adds 25 hours - either is a
> viable choice, and either will confuse *some* set of users. This, I
> think, is the one where all the debate is occurring, and the one that
> makes my head explode.

Stick to naive time, and your head won't even hurt ;-)  There is no
"right" or "wrong" answer to this one:  different apps can _need_
different behaviors for this.  Python picked one to make dead easy
("naive"), and intended to make the other _possible_ via longer-winded
(but conceptually straightforward) code.
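
A sketch of the longer-winded route: convert to UTC, add, convert back.
ToyLondon is a made-up tzinfo subclass that hard-codes the 2015 BST
dates and relies on the default fromutc(); add_elapsed is a hypothetical
helper name.  The example stays clear of the repeated hour, where this
toy zone would not be trustworthy.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyLondon(tzinfo):
    """Hypothetical stand-in for Europe/London, valid for 2015 only."""

    DST_START = datetime(2015, 3, 29, 1)   # wall-clock start of BST
    DST_END = datetime(2015, 10, 25, 2)    # wall-clock end of BST

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= wall < self.DST_END
        return timedelta(hours=1 if in_dst else 0)

    def utcoffset(self, dt):
        return self.dst(dt)   # base offset is zero

    def tzname(self, dt):
        return "BST" if self.dst(dt) else "GMT"


def add_elapsed(dt, delta):
    """Add real elapsed time: go to UTC, add, come back to dt's zone."""
    utc_result = dt.astimezone(timezone.utc) + delta
    return utc_result.astimezone(dt.tzinfo)


london = ToyLondon()
start = datetime(2015, 10, 24, 16, 0, tzinfo=london)

same_wall_clock = start + timedelta(days=1)             # the dead-easy one
real_24_hours = add_elapsed(start, timedelta(days=1))   # the other one

print(same_wall_clock, same_wall_clock.tzname())
print(real_24_hours, real_24_hours.tzname())
```

The first result is 4pm the next day; the second is 3pm the next day,
exactly 24 elapsed hours later.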

> It seems to me that the problem is that for this latter issue, it's
> the *timedelta* object that's not rich enough. You can't say "add 1
> day, and by 1 day I mean keep the same time tomorrow" as opposed to
> "add 1 day, and by that I mean 24 hours"[1]. In some ways, it's
> actually no different from the issue of adding 1 month to a date
> (which is equally ill-defined, but people "know what they mean" to
> just as great an extent). Python bypasses the latter by not having a
> timedelta for "a month". C (and the time module) bypasses the former
> by limiting all time offsets to numbers of seconds - datetime gave us
> a richer timedelta object and hence has extra problems.

There's more to it than that.  "Naive time" also wants, e.g.,
"01:45:00 tomorrow minus 01:45:00 today" to return 24 hours.  Maybe
the same thing in disguise, though.
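
A sketch of the subtraction side.  ToyLondon is again a made-up tzinfo
subclass with the 2015 BST dates hard-coded, and noon is used rather
than 01:45 to stay out of the repeated hour.  When both operands share
the same tzinfo object, the difference is computed on the wall clocks
alone.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class ToyLondon(tzinfo):
    """Hypothetical stand-in for Europe/London, valid for 2015 only."""

    DST_START = datetime(2015, 3, 29, 1)   # wall-clock start of BST
    DST_END = datetime(2015, 10, 25, 2)    # wall-clock end of BST

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= wall < self.DST_END
        return timedelta(hours=1 if in_dst else 0)

    def utcoffset(self, dt):
        return self.dst(dt)   # base offset is zero

    def tzname(self, dt):
        return "BST" if self.dst(dt) else "GMT"


london = ToyLondon()
today = datetime(2015, 10, 24, 12, 0, tzinfo=london)
tomorrow = datetime(2015, 10, 25, 12, 0, tzinfo=london)

# Same tzinfo object on both sides: naive wall-clock difference.
naive_gap = tomorrow - today

# Forcing both through UTC shows the time that actually elapsed.
elapsed_gap = tomorrow.astimezone(timezone.utc) - today.astimezone(timezone.utc)

print(naive_gap, elapsed_gap)
```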

> I don't have any solutions to this final issue. But hopefully the
> above analysis (assuming it's accurate!) helps clarify what the actual
> debate is about, for those bystanders like me who are interested in
> following the discussion. With luck, maybe it also gives the experts
> an alternative perspective from which to think about the problem - who
> knows?
>
> Paul
>
> [1] Well, you can, actually - you say that a timedelta of "1 day"
> means "the same time tomorrow" and if you want 24 hours, you say "24
> hours" not "1 day". So timedelta(days=1) != timedelta(hours=24) even
> though they give the same result for every case except arithmetic
> involving complex timezones.

While perhaps that _could_ have been said at the start, it's a decade
too late to say that now ;-)
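
As things stand, a timedelta keeps no record of how it was spelled.  A
quick check:

```python
from datetime import timedelta

one_day = timedelta(days=1)
twenty_four_hours = timedelta(hours=24)

# Both are normalized to the same (days, seconds, microseconds) triple.
print(one_day == twenty_four_hours)
print(twenty_four_hours.days, twenty_four_hours.seconds)
```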

> Is that what Lennart has been trying to say in his posts?

Have to leave that to him to say.  Various date-and-time
implementations have all sorts of gimmicks.  Possibilities raised in
this thread so far kind of scratch the surface :-(