[Python-Dev] Status on PEP-431 Timezones
ISAAC J SCHWABACHER
ischwabacher at wisc.edu
Tue Jul 28 00:38:34 CEST 2015
Responses to several partial messages follow.
> Then we can't implement timezones in a reasonable way with the current
> API, but have to have something like pytz's normalize() function or
> I'm sorry I've wasted everyones time with this PEP.
I think that integrating pytz into the stdlib, which is what the PEP proposes, would be valuable even without changing datetime arithmetic. But I see ways to accomplish the latter without breaking backward compatibility. The dream ain't dead! See below.
> 2. Arithmetic within a complex timezone. Theoretically, this is simple
> enough (convert to UTC, do the calculation naively, and convert back).
> But in practice, that approach doesn't always match user expectations.
> So you have 2 mutually incompatible semantic options - 1 day after 4pm
> is 3pm the following day, or adding 1 day adds 25 hours - either is a
> viable choice, and either will confuse *some* set of users. This, I
> think, is the one where all the debate is occurring, and the one that
> makes my head explode.
> It seems to me that the problem is that for this latter issue, it's
> the *timedelta* object that's not rich enough.
Yes! This is the heart of the matter. We can solve *almost* all the problems by having multiple competing timedelta classes-- which we already have. Do you care about what will happen after a fixed amount of elapsed time? Use `numpy.timedelta64` or `pandas.Timedelta`. Want this time tomorrow, come hell, high water, or DST transition? Use `dateutil.relativedelta.relativedelta` or `mx.DateTime.RelativeDateTime`. As long as the timedelta objects we're using are rich enough, we can make `dt + delta` say what we mean. There's no reason we can't have both naive and aware arithmetic in the stdlib at once.
All the stdlib needs is an added timedelta class that represents elapsed atomic clock time, and voila!
The biggest problem that this can't solve is subtraction. Which timedelta type do you get by subtracting two datetimes? Sure, you can have a `datetime.elapsed_since(self, other: datetime, **kwargs) -> some_timedelta_type` that determines what you want from the kwargs, but `datetime.__sub__` doesn't have that luxury. I think the right answer is that subtraction should yield the elapsed atomic clock time, but that would be a backward-incompatible change so I don't put a high probability on it happening any time soon. See the last message (below) for more on this.
> You can't say "add 1
> day, and by 1 day I mean keep the same time tomorrow" as opposed to
> "add 1 day, and by that I mean 24 hours". In some ways, it's
> actually no different from the issue of adding 1 month to a date
> (which is equally ill-defined, but people "know what they mean" to
> just as great an extent). Python bypasses the latter by not having a
> timedelta for "a month". C (and the time module) bypasses the former
> by limiting all time offsets to numbers of seconds - datetime gave us
> a richer timedelta object and hence has extra problems.
Because of the limits on the values of its members, `datetime.timedelta` is effectively just a counter of microseconds. It can't distinguish between 1 day, 24 hours, 1440 minutes or 86400 seconds. They're all normalized to the same value. So it's not actually richer; it only appears so.
> I don't have any solutions to this final issue. But hopefully the
> above analysis (assuming it's accurate!) helps clarify what the actual
> debate is about, for those bystanders like me who are interested in
> following the discussion. With luck, maybe it also gives the experts
> an alternative perspective from which to think about the problem - who
>  Well, you can, actually - you say that a timedelta of "1 day"
> means "the same time tomorrow" and if you want 24 hours, you say "24
> hours" not "1 day". So timedelta(days=1) != timedelta(hours=24) even
> though they give the same result for every case except arithmetic
> involving complex timezones. Is that what Lennart has been trying to
> say in his posts?
I thought for a long time that this would be sufficient, and I still think it's a good spelling that makes it clear what the user wants most of the time, but I have wanted things like "the first time the clock shows 1 hour later than it shows right now" enough times that I no longer think this is quite sufficient. (I *think* you can do that with `dt + dateutil.relativedelta.relativedelta(hour=dt.hour+1, minute=0, second=0, microsecond=0)`, but I'm not sure.)
> Ah, but it already happens that way - because the builtin datetime
> arithmetic is "naive". The docs have always promised this:
> datetime2 = datetime1 + timedelta (1)
> datetime2 = datetime1 - timedelta (2)
> 1) datetime2 is a duration of timedelta removed from datetime1, moving
> forward in time if timedelta.days > 0, or backward if timedelta.days <
> 0. The result has the same tzinfo attribute as the input datetime, and
> datetime2 - datetime1 == timedelta after. OverflowError is raised if
> datetime2.year would be smaller than MINYEAR or larger than MAXYEAR.
> Note that no time zone adjustments are done even if the input is an
> aware object.
> 2) Computes the datetime2 such that datetime2 + timedelta ==
> datetime1. As for addition, the result has the same tzinfo attribute
> as the input datetime, and no time zone adjustments are done even if
> the input is aware. This isn’t quite equivalent to datetime1 +
> (-timedelta), because -timedelta in isolation can overflow in cases
> where datetime1 - timedelta does not.
Once we add the is_dst bit, this becomes a problem. You can't have this and have equality be a congruence (i.e., dt1 == dt2 implies dt1+td == dt2+td) unless you're willing to have the is_dst bit always be significant to equality, even when a time isn't ambiguous. Practically, this means that equality stops being a congruence, but failing to obey that invariant causes a lot of trouble.
I have been remiss in not pointing this out, but it's wrong to assume that scientists use exclusively UTC. I got dragged into this mess because I was writing a piece of software to analyze circadian patterns of physical activity in our research subjects, which meant that in several cases we had a continuous record of data that crossed a DST boundary and we needed absolute durations between different times while caring about the local times between which those durations arose. The program started in ruby using ActiveSupport/Time (Rails's time bits) and got ported into python because ruby didn't have good enough support for scientific applications. I was able to get the program working using pandas's Timestamp class, which I think is more or less what Lennart wants to implement (minus all the cruft where it tries to interoperate with both datetime.datetime and numpy.datetime64), and which AFAICT seems to be the de facto standard for people in the science and finance worlds who need to deal with local times, absolute durations and relative durations all at the same time.
More information about the Python-Dev