Re: [Python-Dev] Status on PEP-431 Timezones

27 Jul 2015

      [Paul Moore]
...
I think the following statements are true. If they aren't, I'd
appreciate clarification. I'm going to completely ignore leap seconds
in the following - I hope that's OK, I don't understand leap seconds
*at all* and I don't work in any application areas where they are
relevant (to my knowledge) so I feel that for my situation, ignoring
them (and being able to) is reasonable.
Guido will never allow any aspect of "leap seconds" into the core,
although it's fine by him if someone wants to write their own tzinfo
class to try to model them.
...
Note that I'm not talking about internal representations - this is
purely about user-visible semantics.
1. "Naive" datetime arithmetic means treating a day as 24 hours, an
hour as 60 minutes, etc. Basically base-24/60/60 arithmetic.
It also means that the tzinfo(s) member (if any) is(are) ignored.  So
not only leap seconds are ignored:

1. Possible DST transitions are ignored.
2. Possible changes to the base UTC offset are ignored.
3. Possible changes to the name of the time zone (even if "the rules"
don't change) are ignored.
4. Everything else whatsoever that could be learned from the tzinfo
member is ignored.

Note that in "aware" arithmetic, the current fromutc() implementation
is only strong enough to account reliably for #1.
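As a rough illustration of "the tzinfo member is ignored", here is a minimal sketch. The London2015 class is a hypothetical toy written for this example (hard-coded to the 2015 British Summer Time dates, which are an assumption of the sketch); it is not part of the stdlib and not anyone's proposed implementation.

```python
from datetime import datetime, timedelta, tzinfo

class London2015(tzinfo):
    """Hypothetical toy model of Europe/London, valid for 2015 only."""

    # Assumed wall-clock bounds of British Summer Time in 2015.
    DST_START = datetime(2015, 3, 29, 1)
    DST_END = datetime(2015, 10, 25, 2)

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        in_dst = self.DST_START <= wall < self.DST_END
        return timedelta(hours=1) if in_dst else timedelta(0)

    def utcoffset(self, dt):
        # The base offset is zero (GMT), so the total offset is the DST shift.
        return self.dst(dt)

    def tzname(self, dt):
        return "BST" if self.dst(dt) else "GMT"

london = London2015()
before = datetime(2015, 10, 24, 12, 0, tzinfo=london)  # noon, BST in force
after = before + timedelta(days=1)                      # noon, GMT in force

# Arithmetic within one zone never consults the tzinfo: the wall clock is
# preserved, and the difference comes out as exactly 24 hours.
wall_clock_difference = after - before

# Subtracting the UTC offsets by hand shows that 25 real hours elapsed.
elapsed = ((after.replace(tzinfo=None) - after.utcoffset())
           - (before.replace(tzinfo=None) - before.utcoffset()))
```

Here wall_clock_difference is 24 hours while elapsed is 25 hours, because the DST transition between the two noons was ignored by the arithmetic.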
...
2. If you're only working in a single timezone that's defined as UTC
or a fixed offset from UTC, naive arithmetic is basically all there
is.
Yup!
...
3. Converting between (fixed offset) timezones is a separate issue
from calculation - but it's nothing more than applying the relevant
offsets.
Yup!  Although that can't be exploited by Python:  there's nothing in
a tzinfo instance Python can query to discover the rules it
implements.
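A small sketch of "nothing more than applying the relevant offsets", using the stdlib's fixed-offset datetime.timezone class. The two offsets and the date are arbitrary values chosen for illustration.

```python
from datetime import datetime, timedelta, timezone

# Two arbitrary fixed-offset zones, chosen only for illustration.
west = timezone(timedelta(hours=-5))
east = timezone(timedelta(hours=1))

noon_west = datetime(2015, 7, 27, 12, 0, tzinfo=west)

# Let the library convert...
converted = noon_west.astimezone(east)

# ...or apply the offsets by hand: remove the source offset to reach UTC,
# then add the target offset.
by_hand = (noon_west.replace(tzinfo=None)
           - west.utcoffset(None)
           + east.utcoffset(None))
```

Both routes land on 18:00 wall-clock time in the +01:00 zone.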
...
4. Calculations involving 2 different timezones (fixed-offset ones as
above) are like any other exercise involving values on different
scales. Convert both values to a common scale (in this case, a common
timezone) and do the calculation there. Simple enough.
Yup.
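For what it's worth, the "convert to a common scale" step is what datetime already does when two aware values with different tzinfo members are subtracted. A short sketch, with made-up times and offsets:

```python
from datetime import datetime, timedelta, timezone

west = timezone(timedelta(hours=-5))
east = timezone(timedelta(hours=1))

departure = datetime(2015, 7, 27, 9, 0, tzinfo=west)   # 14:00 UTC
arrival = datetime(2015, 7, 27, 23, 0, tzinfo=east)    # 22:00 UTC

# Mixed-zone subtraction acts as if both values were converted to UTC first.
duration = arrival - departure

# The same calculation with the common scale made explicit.
explicit = arrival.astimezone(timezone.utc) - departure.astimezone(timezone.utc)
```

Both duration and explicit are 8 hours, not the 14 hours that the raw wall-clock readings would suggest.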
...
5. The problems all arise *only* with timezones whose UTC offset
varies depending on the actual time (e.g., timezones that include the
transition to DST and back).
Yup.
...
Are we OK to this point? This much comprises what I would class as a
"naive" (i.e. 99% of the population ;-)) understanding of datetimes.
The stdlib datetime module handles naive datetime values, and
fixed-offset timezones, fine, as far as I can see.
It ignores the possibility called #3 above (that some bureaucrat
changed the name of a fixed-offset time zone even though the offset
didn't change).  Everyone ignores #4, and always will ;-)
...
(I'm not sure that the original implementation included fixed-offset tzinfo
objects, but the 3.4 docs say they are there now, so that's fine).
The original implementation supplied no tzinfo objects, only an
abstract tzinfo base class.
...
Looking at the complicated cases, the only ones I'm actually aware of
in practice are the ones that switch to DST and back, so typically
have two offsets that differ by an hour,
Some number of minutes, anyway (not all DST transitions move by whole hours).
...
switching between the two at some essentially arbitrary points. If there are
other more complex forms of timezone, I'd like to never need to know about
them, please ;-)
#2 above is common enough, although there's not a _lot_ of
base-offset-changing going on in current times.
...
The timezones we're talking about here are things like
"Europe/London", not "GMT" or "BST" (the latter two are fixed-offset).
There are two independent issues with complex timezones:
1. Converting to and from them. That's messy because the conversion to
UTC needs more information than just the date & time (typically, for
example, there is a day when 01:45:00 maps to 2 distinct UTC times).
This is basically the "is_dst" bit that Tim discussed in an earlier
post. The semantic issue here is that users typically say "01:45" and
it never occurs to them to even think about *which* 01:45 they mean.
So recovering that extra information is hard (it's like dealing with
byte streams where the user didn't provide details of the text
encoding used).
"Flatly impossible" is more on target than "hard".  In the case of
text encoding, it's often possible to guess correctly by statistical
analysis of the bytes.  01:45:00 in isolation gives no clue at all
about whether standard or daylight time was intended.  A similar point
applies to some ambiguous cases when the base ("standard") UTC offset
changes.
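To make the ambiguity concrete, here is a sketch using only fixed-offset zones. It assumes the 2015 London transition, where clocks fell back from BST to GMT at 01:00 UTC on 25 October; the BST and GMT objects are plain datetime.timezone instances created for this example.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(0), "GMT")

# Two distinct UTC instants, one on each side of the assumed transition.
first = datetime(2015, 10, 25, 0, 45, tzinfo=timezone.utc).astimezone(BST)
second = datetime(2015, 10, 25, 1, 45, tzinfo=timezone.utc).astimezone(GMT)

# Strip the zone and the two values are indistinguishable: both read 01:45.
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)

# Yet they are a full hour apart in real time.
gap = second - first
```

Once the zone label is dropped, nothing in "01:45" itself says which of the two instants was meant.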
...
Once we have the extra information, though, doing
conversions is just a matter of applying a set of rules.
Yup, and it's easy.
...
2. Arithmetic within a complex timezone. Theoretically, this is simple
enough (convert to UTC, do the calculation naively, and convert back).
But in practice, that approach doesn't always match user expectations.
So you have 2 mutually incompatible semantic options - 1 day after 4pm
is 3pm the following day, or adding 1 day adds 25 hours - either is a
viable choice, and either will confuse *some* set of users. This, I
think, is the one where all the debate is occurring, and the one that
makes my head explode.
Stick to naive time, and your head won't even hurt ;-)  There is no
"right" or "wrong" answer to this one:  different apps can _need_
different behaviors for this.  Python picked one to make dead easy
("naive"), and intended to make the other _possible_ via longer-winded
(but conceptually straightforward) code.
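The two behaviors can be put side by side in a minimal sketch. The london_zone_at and add_elapsed helpers below are hypothetical, written only for this example, and the hard-coded transition instant (01:00 UTC on 25 October 2015) is an assumption that covers just that one autumn.

```python
from datetime import datetime, timedelta, timezone

BST = timezone(timedelta(hours=1), "BST")
GMT = timezone(timedelta(0), "GMT")

# Assumption: London fell back from BST to GMT at 01:00 UTC on 25 Oct 2015.
FALL_BACK_UTC = datetime(2015, 10, 25, 1, 0, tzinfo=timezone.utc)

def london_zone_at(utc_instant):
    """Hypothetical helper: zone in force in London (autumn 2015 only)."""
    return BST if utc_instant < FALL_BACK_UTC else GMT

def add_elapsed(local_dt, delta):
    """Convert to UTC, add naively there, then convert back."""
    utc_result = local_dt.astimezone(timezone.utc) + delta
    return utc_result.astimezone(london_zone_at(utc_result))

start = datetime(2015, 10, 24, 16, 0, tzinfo=BST)

# Option 1, naive: the same wall-clock time tomorrow (25 real hours later).
same_time_tomorrow = start.replace(tzinfo=None) + timedelta(days=1)

# Option 2, via UTC: exactly 24 elapsed hours later.
after_24_hours = add_elapsed(start, timedelta(days=1))

print(same_time_tomorrow.time())                        # 16:00:00
print(after_24_hours.time(), after_24_hours.tzname())   # 15:00:00 GMT
```

Neither result is wrong; an alarm clock wants the first and a stopwatch wants the second.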
...
It seems to me that the problem is that for this latter issue, it's
the *timedelta* object that's not rich enough. You can't say "add 1
day, and by 1 day I mean keep the same time tomorrow" as opposed to
"add 1 day, and by that I mean 24 hours"[1]. In some ways, it's
actually no different from the issue of adding 1 month to a date
(which is equally ill-defined, but people "know what they mean" to
just as great an extent). Python bypasses the latter by not having a
timedelta for "a month". C (and the time module) bypasses the former
by limiting all time offsets to numbers of seconds - datetime gave us
a richer timedelta object and hence has extra problems.
There's more to it than that.  "Naive time" also wants, e.g.,
"01:45:00 tomorrow minus 01:45:00 today" to return 24 hours.  Maybe
the same thing in disguise, though.
...
I don't have any solutions to this final issue. But hopefully the
above analysis (assuming it's accurate!) helps clarify what the actual
debate is about, for those bystanders like me who are interested in
following the discussion. With luck, maybe it also gives the experts
an alternative perspective from which to think about the problem - who
knows?
Paul
[1] Well, you can, actually - you say that a timedelta of "1 day"
means "the same time tomorrow" and if you want 24 hours, you say "24
hours" not "1 day". So timedelta(days=1) != timedelta(hours=24) even
though they give the same result for every case except arithmetic
involving complex timezones.
While perhaps that _could_ have been said at the start, it's a decade
too late to say that now ;-)
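The reason it is too late can be seen directly: timedelta normalizes its arguments, so the two spellings are already the same object value. A quick check:

```python
from datetime import timedelta

one_day = timedelta(days=1)
twenty_four_hours = timedelta(hours=24)

# Both spellings normalize to days=1, seconds=0, microseconds=0.
are_equal = one_day == twenty_four_hours
normalized = (twenty_four_hours.days, twenty_four_hours.seconds)
```

Since are_equal is True, existing code cannot distinguish "1 day" from "24 hours", and changing that would break a decade of programs relying on it.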
...
Is that what Lennart has been trying to say in his posts?
Have to leave that to him to say.  Various date-and-time
implementations have all sorts of gimmicks.  Possibilities raised in
this thread so far kind of scratch the surface :-(

Tim Peters