... I think the following statements are true. If they aren't, I'd appreciate clarification. I'm going to completely ignore leap seconds in the following - I hope that's OK, I don't understand leap seconds *at all* and I don't work in any application areas where they are relevant (to my knowledge) so I feel that for my situation, ignoring them (and being able to) is reasonable.
Guido will never allow any aspect of "leap seconds" into the core, although it's fine by him if someone wants to write their own tzinfo class to try to model them.
Note that I'm not talking about internal representations - this is purely about user-visible semantics.
- "Naive" datetime arithmetic means treating a day as 24 hours, an
hour as 60 minutes, etc. Basically base-24/60/60 arithmetic.
It also means that the tzinfo(s) member (if any) is(are) ignored. So not only leap seconds are ignored:
1. Possible DST transitions are ignored.
2. Possible changes to the base UTC offset are ignored.
3. Possible changes to the name of the time zone (even if "the rules" don't change) are ignored.
4. Everything else whatsoever that could be learned from the tzinfo member is ignored.
Note that in "aware" arithmetic, the current fromutc() implementation is only strong enough to account reliably for #1.
- If you're only working in a single timezone that's defined as UTC
or a fixed offset from UTC, naive arithmetic is basically all there is.
- Converting between (fixed offset) timezones is a separate issue
from calculation - but it's nothing more than applying the relevant offsets.
Yup! Although that can't be exploited by Python: there's nothing in a tzinfo instance Python can query to discover the rules it implements.
- Calculations involving 2 different timezones (fixed-offset ones as
above) are like any other exercise involving values on different scales. Convert both values to a common scale (in this case, a common timezone) and do the calculation there. Simple enough.
- The problems all arise *only* with timezones whose UTC offset
varies depending on the actual time (e.g., timezones that include the transition to DST and back).
Are we OK to this point? This much comprises what I would class as a "naive" (i.e. 99% of the population ;-)) understanding of datetimes.
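For bystanders, here is a minimal sketch of the points so far, using only the stdlib `datetime.timezone` fixed-offset class. The particular dates and the +02:00 / -05:00 offsets are made up for illustration; nothing here touches DST.

```python
from datetime import datetime, timedelta, timezone

# Naive arithmetic: plain base-24/60/60 carrying, no tzinfo involved.
start = datetime(2015, 7, 27, 23, 30)
later = start + timedelta(hours=1, minutes=45)   # rolls over into the next day

# Two fixed-offset zones (illustrative offsets, not from the thread).
plus2 = timezone(timedelta(hours=2))
minus5 = timezone(timedelta(hours=-5))

a = datetime(2015, 7, 27, 18, 0, tzinfo=plus2)   # 16:00 UTC
b = datetime(2015, 7, 27, 9, 0, tzinfo=minus5)   # 14:00 UTC

# Conversion between fixed-offset zones is just applying the offsets.
a_in_minus5 = a.astimezone(minus5)

# Mixed-zone calculation: both values are put on a common scale (UTC)
# before subtracting, so the result is the real elapsed time.
elapsed = a - b

print(later)         # 2015-07-28 01:15:00
print(a_in_minus5)   # 2015-07-27 11:00:00-05:00
print(elapsed)       # 2:00:00
```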
The stdlib datetime module handles naive datetime values, and fixed-offset timezones, fine, as far as I can see.
It ignores the possibility called #3 above (that some bureaucrat changed the name of a fixed-offset time zone even though the offset didn't change). Everyone ignores #4, and always will ;-)
(I'm not sure that the original implementation included fixed-offset tzinfo objects, but the 3.4 docs say they are there now, so that's fine).
The original implementation supplied no tzinfo objects, only an abstract tzinfo base class.
Looking at the complicated cases, the only ones I'm actually aware of in practice are the ones that switch to DST and back, so typically have two offsets that differ by an hour,
Some number of minutes, anyway (not all DST transitions move by whole hours).
switching between the two at some essentially arbitrary points. If there are other more complex forms of timezone, I'd like to never need to know about them, please ;-)
#2 above is common enough, although there's not a _lot_ of base-offset-changing going on in current times.
The timezones we're talking about here are things like "Europe/London", not "GMT" or "BST" (the latter two are fixed-offset).
There are two independent issues with complex timezones:
- Converting to and from them. That's messy because the conversion to
UTC needs more information than just the date & time (typically, for example, there is a day when 01:45:00 maps to 2 distinct UTC times). This is basically the "is_dst" bit that Tim discussed in an earlier post. The semantic issue here is that users typically say "01:45" and it never occurs to them to even think about *which* 01:45 they mean. So recovering that extra information is hard (it's like dealing with byte streams where the user didn't provide details of the text encoding used).
"Flatly impossible" is more on target than "hard". In the case of text encoding, it's often possible to guess correctly by statistical analysis of the bytes. 01:45:00 in isolation gives no clue at all about whether standard or daylight time was intended. A similar point applies to some ambiguous cases when the base ("standard") UTC offset changes.
Once we have the extra information, though, doing conversions is just a matter of applying a set of rules.
Yup, and it's easy.
- Arithmetic within a complex timezone. Theoretically, this is simple
enough (convert to UTC, do the calculation naively, and convert back). But in practice, that approach doesn't always match user expectations. So you have 2 mutually incompatible semantic options - 1 day after 4pm is 3pm the following day, or adding 1 day adds 25 hours - either is a viable choice, and either will confuse *some* set of users. This, I think, is the one where all the debate is occurring, and the one that makes my head explode.
Stick to naive time, and your head won't even hurt ;-) There is no "right" or "wrong" answer to this one: different apps can _need_ different behaviors for this. Python picked one to make dead easy ("naive"), and intended to make the other _possible_ via longer-winded (but conceptually straightforward) code.
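The two issues above can be sketched with nothing but fixed offsets. This assumes a US-Eastern-style zone (EDT = UTC-4, EST = UTC-5) that falls back on 2021-11-07; the names, dates and offsets are illustrative assumptions, not something from the thread. Note that the offset is chosen by hand on every line, which is exactly the extra information a bare "01:45" doesn't carry.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")   # assumed daylight offset
EST = timezone(timedelta(hours=-5), "EST")   # assumed standard offset
UTC = timezone.utc

# Issue 1: one wall-clock reading, two distinct UTC times.
first = datetime(2021, 11, 7, 1, 45, tzinfo=EDT)    # 05:45 UTC
second = datetime(2021, 11, 7, 1, 45, tzinfo=EST)   # 06:45 UTC
same_wall_clock = first.replace(tzinfo=None) == second.replace(tzinfo=None)
gap = second - first   # different tzinfo objects, so compared via UTC

# Issue 2: two incompatible meanings of "1 day after 4pm".
start = datetime(2021, 11, 6, 16, 0, tzinfo=EDT)

# (a) 24 elapsed hours: go to UTC, add, come back in the offset then in force.
elapsed_24h = (start.astimezone(UTC) + timedelta(hours=24)).astimezone(EST)

# (b) same wall-clock time tomorrow: add naively, then attach the new offset.
same_time = (start.replace(tzinfo=None) + timedelta(days=1)).replace(tzinfo=EST)

print(same_wall_clock, gap)   # True 1:00:00
print(elapsed_24h)            # 2021-11-07 15:00:00-05:00
print(same_time - start)      # 1 day, 1:00:00
```

Neither (a) nor (b) is wrong; they answer different questions, which is the point made above.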
It seems to me that the problem is that for this latter issue, it's the *timedelta* object that's not rich enough. You can't say "add 1 day, and by 1 day I mean keep the same time tomorrow" as opposed to "add 1 day, and by that I mean 24 hours". In some ways, it's actually no different from the issue of adding 1 month to a date (which is equally ill-defined, but people "know what they mean" to just as great an extent). Python bypasses the latter by not having a timedelta for "a month". C (and the time module) bypasses the former by limiting all time offsets to numbers of seconds - datetime gave us a richer timedelta object and hence has extra problems.
There's more to it than that. "Naive time" also wants, e.g., "01:45:00 tomorrow minus 01:45:00 today" to return 24 hours. Maybe the same thing in disguise, though.
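To see that behavior in Python itself, here is a rough sketch with a deliberately crude tzinfo subclass. `ToyZone` and its single hard-coded transition are hypothetical, made up purely for this illustration. Arithmetic between two datetimes sharing the same tzinfo object ignores the tzinfo, so the answer is 24 hours even though 25 hours elapse in UTC.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyZone(tzinfo):
    """Hypothetical zone: UTC-4 until wall clock 2021-11-07 02:00, then UTC-5."""

    FALL_BACK = datetime(2021, 11, 7, 2, 0)   # made-up single transition

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        return timedelta(hours=1) if wall < self.FALL_BACK else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "TOY"

zone = ToyZone()
today = datetime(2021, 11, 6, 16, 0, tzinfo=zone)
tomorrow = datetime(2021, 11, 7, 16, 0, tzinfo=zone)

# Same tzinfo object on both sides: the tzinfo is ignored ("naive" arithmetic).
naive_diff = tomorrow - today
plus_one_day = today + timedelta(days=1)

# Going through UTC explicitly gives the elapsed-time answer instead.
utc_diff = tomorrow.astimezone(timezone.utc) - today.astimezone(timezone.utc)

print(naive_diff)                 # 1 day, 0:00:00
print(plus_one_day == tomorrow)   # True
print(utc_diff)                   # 1 day, 1:00:00
```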
I don't have any solutions to this final issue. But hopefully the above analysis (assuming it's accurate!) helps clarify what the actual debate is about, for those bystanders like me who are interested in following the discussion. With luck, maybe it also gives the experts an alternative perspective from which to think about the problem - who knows?
Well, you can, actually - you say that a timedelta of "1 day" means "the same time tomorrow" and if you want 24 hours, you say "24 hours" not "1 day". So timedelta(days=1) != timedelta(hours=24) even though they give the same result for every case except arithmetic involving complex timezones.
While perhaps that _could_ have been said at the start, it's a decade too late to say that now ;-)
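For reference, a quick check of what timedelta does today: it normalizes its arguments into days, seconds and microseconds, so "1 day" and "24 hours" are indistinguishable once constructed, and there is no keyword for months at all. The variable names are just for this sketch.

```python
from datetime import timedelta

one_day = timedelta(days=1)
day_in_hours = timedelta(hours=24)

# Both are stored identically, so they compare equal.
are_equal = one_day == day_in_hours
stored = (day_in_hours.days, day_in_hours.seconds)

# "A month" cannot be expressed: timedelta has no such keyword.
try:
    timedelta(months=1)
    accepts_months = True
except TypeError:
    accepts_months = False

print(are_equal, stored, accepts_months)   # True (1, 0) False
```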
Is that what Lennart has been trying to say in his posts?
Have to leave that to him to say. Various date-and-time implementations have all sorts of gimmicks. Possibilities raised in this thread so far kind of scratch the surface :-(