[Python-Dev] Status on PEP-431 Timezones

Mon Jul 27 16:59:53 CEST 2015

On 27 July 2015 at 14:59, R. David Murray <rdmurray at bitdance.com> wrote:
> I have a feeling that I'm completely misunderstanding things, since
> tzinfo is still a bit of a mystery to me.

You're not the only one :-)

I think the following statements are true. If they aren't, I'd
appreciate clarification. I'm going to completely ignore leap seconds
in the following - I hope that's OK, I don't understand leap seconds
*at all* and I don't work in any application areas where they are
relevant (to my knowledge) so I feel that for my situation, ignoring
them (and being able to) is reasonable.

Note that I'm not talking about internal representations - this is
purely about user-visible semantics.

1. "Naive" datetime arithmetic means treating a day as 24 hours, an
hour as 60 minutes, etc. Basically base-24/60/60 arithmetic.
2. If you're only working in a single timezone that's defined as UTC
or a fixed offset from UTC, naive arithmetic is basically all there
is.
3. Converting between (fixed offset) timezones is a separate issue
from calculation - but it's nothing more than applying the relevant
offsets.
4. Calculations involving 2 different timezones (fixed-offset ones as
above) is like any other exercise involving values on different
scales. Convert both values to a common scale (in this case, a common
timezone) and do the calculation there. Simple enough.
5. The problems all arise *only* with timezones whose UTC offset
varies depending on the actual time (e.g., timezones that include the
transition to DST and back).

Are we OK to this point? This much comprises what I would class as a
"naive" (i.e. 99% of the population ;-)) understanding of datetimes.

The stdlib datetime module handles naive datetime values, and
fixed-offset timezones, fine, as far as I can see. (I'm not sure that
the original implementation included fixed-offset tzinfo objects, but
the 3.4 docs say they are there now, so that's fine).

Looking at the complicated cases, the only ones I'm actually aware of
in practice are the ones that switch to DST and back, so typically
have two offsets that differ by an hour, switching between the two at
some essentially arbitrary points. If there are other more complex
forms of timezone, I'd like to never need to know about them, please
;-)

The timezones we're talking about here are things like
"Europe/London", not "GMT" or "BST" (the latter two are fixed-offset).

There are two independent issues with complex timezones:

1. Converting to and from them. That's messy because the conversion to
UTC needs more information than just the date & time (typically, for
example, there is a day when 01:45:00 maps to 2 distinct UTC times).
This is basically the "is_dst" bit that Tim discussed in an earlier
post. The semantic issue here is that users typically say "01:45" and
it never occurs to them to even think about *which* 01:45 they mean.
So recovering that extra information is hard (it's like dealing with
byte streams where the user didn't provide details of the text
encoding used). Once we have the extra information, though, doing
conversions is just a matter of applying a set of rules.

2. Arithmetic within a complex timezone. Theoretically, this is simple
enough (convert to UTC, do the calculation naively, and convert back).
But in practice, that approach doesn't always match user expectations.
So you have 2 mutually incompatible semantic options - 1 day after 4pm
is 3pm the following day, or adding 1 day adds 25 hours - either is a
viable choice, and either will confuse *some* set of users. This, I
think, is the one where all the debate is occurring, and the one that
makes my head explode.

It seems to me that the problem is that for this latter issue, it's
the *timedelta* object that's not rich enough. You can't say "add 1
day, and by 1 day I mean keep the same time tomorrow" as opposed to
"add 1 day, and by that I mean 24 hours"[1]. In some ways, it's
actually no different from the issue of adding 1 month to a date
(which is equally ill-defined, but people "know what they mean" to
just as great an extent). Python bypasses the latter by not having a
timedelta for "a month". C (and the time module) bypasses the former
by limiting all time offsets to numbers of seconds - datetime gave us
a richer timedelta object and hence has extra problems.

I don't have any solutions to this final issue. But hopefully the
above analysis (assuming it's accurate!) helps clarify what the actual
debate is about, for those bystanders like me who are interested in
following the discussion. With luck, maybe it also gives the experts
an alternative perspective from which to think about the problem - who
knows?

Paul

[1] Well, you can, actually - you say that a timedelta of "1 day"
means "the same time tomorrow" and if you want 24 hours, you say "24
hours" not "1 day". So timedelta(days=1) != timedelta(hours=24) even
though they give the same result for every case except arithmetic
involving complex timezones. Is that what Lennart has been trying to
say in his posts?