[Stuart Bishop stuart@stuartbishop.net]
... [on timeline arithmetic] ... I'm wondering if it is worth formalizing this (post-PEP-495, or maybe some choice wording changes made in the docs). Would it work if we introduced a new type, datetimetz? We would have a time, with a tzinfo because it might be useful later, a naive time, with a tzinfo because it is useful for rendering and conversions, and a datetimetz with all the complexities and slowdowns of timeline arithmetic. While not changing the behaviour of datetime at all, we could get cats and dogs living together by just clarifying what it actually is.
There was a lot of discussion of this before you arrived here, and even a PEP (500).
At least Guido, Alex and I agreed it would be better for the tzinfo object to decide which kind of arithmetic to use. For example, if you're right that billions (nay, trillions!) of programmers will eventually suffer irreparable emotional harm from learning how classic arithmetic works, they'll want to convert their code immediately, before their innocent children suffer clinical depression too. Because datetimes are typically created all over the place, but programs typically have only a few places where a tzinfo is obtained from some factory functions, it should be much easier to just change the latter call sites. So, e.g., get one tzinfo that says "timeline arithmetic!" in some way, and _all_ datetimes using it obey God's Way To Do It.
The first question then is "how does a tzinfo spell that?".
PEP 500 proposed adding optional new magic methods to tzinfos, so they could implement whatever damn fool arithmetic they liked. datetime internals would only change to see whether a tzinfo supplied such-&-such a method, and delegate arithmetic to it if so.
1. For timeline arithmetic, a tzinfo subclass could supply methods for the 3 kinds of arithmetic (datetime - datetime, and datetime +/- timedelta), with bodies akin to the simple one-liner I showed before for datetime + timedelta.
2. People who wanted leap seconds (to account for real-world durations between two civil times) could similarly supply _that_, via even slower arithmetic.
3. And, e.g., people who wanted to view timedeltas as representing durations in Mars seconds could convert to Earth seconds under the covers. That's Alex's primary use case.
So, quite general, and little impact on the core. Guido rejected it ;-)
The other idea was building timeline arithmetic into the core datetime implementation, and using it if and only if a tzinfo had a magic new attribute, or inherited from a magic new marker class. Not generalizable beyond _just_ that case, heavier impact on the core, and so far nobody has cared enough to write a PEP.
The second question is whether _anything_ should be done in this direction. I was +0.83 on PEP 500 at first, but -0.51 on anything now. Alex can move to Mars if he loves Mars time so much, while I don't really want Python to enable poor practice in the #1 and #2 cases. UTC is perfectly adequate for those who need timeline arithmetic, and that was the _intent_ from the start (although I don't recall the docs saying so) - and using UTC for this purpose is also universally recognized as best practice. If someone is determined to be foolish, fine, let 'em use an explicit function.
... If our underlying platforms that we needed to work with supported it, I'd probably be in favour of leap seconds. I doubt that would ever happen - there are more palatable workarounds.
People who need it really need it - but they should be working in TAI. In Python, if they work in UTC - or even in naive datetime - it's quite possible to write leap-second-aware functions to do what they want. Intriguingly, TAI is nearly identical to Python's "naive time". So stick that in your pipe and smoke it: the people responsible for building the most sophisticated clocks on Earth _live_ in naive time. It's the most sophisticated notion of time yet known ;-)
OTOH, for people who don't need it, accounting for leap seconds would be a mistake: best I can tell, every programming language on the planet with any kind of date-and-time support follows the POSIX-approximation-to-UTC model now. So if your arithmetic accounts for leap seconds, it won't agree with anyone else's in the computer world.
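To make the leap-second point concrete, here is a minimal sketch, assuming naive datetimes that hold UTC clock readings. The name `elapsed_si` is hypothetical, and the table is deliberately incomplete: it lists only the three most recent leap-second insertions (end of June 2012, end of June 2015, end of December 2016), so it is only valid for intervals after mid-2012.

```python
from datetime import datetime, timedelta

# Illustrative and incomplete: the first UTC instant *after* each inserted
# leap second (2012-06-30, 2015-06-30 and 2016-12-31, each at 23:59:60).
LEAP_SECONDS_AFTER = [
    datetime(2012, 7, 1),
    datetime(2015, 7, 1),
    datetime(2017, 1, 1),
]


def elapsed_si(start, end):
    """Hypothetical helper: SI-second duration between two naive UTC datetimes.

    Assumes start <= end and that both fall within the table's coverage.
    """
    posix = end - start  # what Python (and POSIX) report: leap seconds ignored
    leaps = sum(1 for t in LEAP_SECONDS_AFTER if start < t <= end)
    return posix + timedelta(seconds=leaps)


a, b = datetime(2015, 6, 30), datetime(2015, 7, 1)
print((b - a).total_seconds())            # 86400.0
print(elapsed_si(a, b).total_seconds())   # 86401.0
```

The plain subtraction is the answer everyone else in the computer world will agree with; the helper is for the people who really need the extra second.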
... I think in my view, as soon as you go to the bother of adding a tzinfo instance to the datetime you are making a statement about the expected behaviour; that the simpler classic arithmetic no longer applies and the more complex model needs to be used.
I had already guessed that ;-) It's just a dozen years too late to influence datetime's design.
... There you go: "timeline" datetime + timedelta arithmetic about as efficiently as possible in pure Python.
... What I don't like about this approach is the developers need to be aware that they need to call it,
Is that really worse than needing to call .normalize() after every arithmetic operation, with - I bet - most not being really clear on _why_ they need to?
and that dt + timedelta(hours=24) may not work.
Adding functions for timeline arithmetic can't possibly change what classic arithmetic does. For me, adding timedelta(hours=24) always does exactly what I intend it to do. But, yes, people will forget the distinction sometimes.
But easy solution: do what they _should_ have done from the start: work in UTC instead, and have no problems, surprises, missing magical invocations, or confusions of any kind ever.
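A minimal sketch of the two options, assuming Python 3.9+ with the standard `zoneinfo` module and a tz database available on the system. The helper name `timeline_add` is hypothetical, not an existing API, and is not necessarily the function referred to earlier in the thread.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with a tz database


def timeline_add(dt, delta):
    """Hypothetical helper: move an aware datetime by `delta` of elapsed time."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


eastern = ZoneInfo("America/New_York")
# Noon on the day before US clocks fell back in 2015 (the change was on Nov 1).
start = datetime(2015, 10, 31, 12, 0, tzinfo=eastern)
day = timedelta(hours=24)

classic = start + day                # classic: same wall-clock reading, next day
timeline = timeline_add(start, day)  # explicit: 24 elapsed hours later

# Working in UTC from the start needs no helper; convert only for display.
start_utc = start.astimezone(timezone.utc)
later_utc = start_utc + day
displayed = later_utc.astimezone(eastern)

print(classic)    # 2015-11-01 12:00:00-05:00
print(timeline)   # 2015-11-01 11:00:00-05:00
print(displayed)  # 2015-11-01 11:00:00-05:00
```

Classic `+` is unchanged by the existence of the helper, and the UTC route gives the same answer as the helper without anyone having to remember to call it.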
Of course, developers will not be aware or have done more than skim the docs until after their guests have all died of salmonella poisoning from the undercooked turkey.
Not a problem. My turkey party occurs at the _end_ of DST. "Same time next day" would keep the turkey in the smoker for 25 hours, not 23. No salmonella: you're obviously determined to spread groundless turkey FUD ;-)
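The 25-versus-23 claim can be checked with a short sketch, again assuming Python 3.9+ with `zoneinfo` and a tz database. The dates are the 2015 US transitions, and `elapsed_hours` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with a tz database


def elapsed_hours(start, end):
    """Hypothetical helper: real elapsed hours between two aware datetimes."""
    delta = end.astimezone(timezone.utc) - start.astimezone(timezone.utc)
    return delta / timedelta(hours=1)


eastern = ZoneInfo("America/New_York")
one_day = timedelta(days=1)

# "Same time next day" across the end of DST (Nov 1, 2015) ...
fall_in = datetime(2015, 10, 31, 18, 0, tzinfo=eastern)
fall_out = fall_in + one_day
# ... and across the start of DST (Mar 8, 2015).
spring_in = datetime(2015, 3, 7, 18, 0, tzinfo=eastern)
spring_out = spring_in + one_day

print(elapsed_hours(fall_in, fall_out))      # 25.0
print(elapsed_hours(spring_in, spring_out))  # 23.0
print(fall_out - fall_in)                    # 1 day, 0:00:00
```

The last line shows that subtraction between two datetimes sharing a tzinfo is classic too: it reports the wall-clock difference, not the elapsed time.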
... My hope was that 495 alone would at least spare pytz's users from needing to do a `.normalize()` dance after `.astimezone()` anymore. Although I'm not clear on why it's needed even now.
Instead of one tzinfo instance, there are dozens for your timezone. The datetime implementation does not give pytz the opportunity to choose which one is used when constructing the datetime, so localize is needed to sort that. Similarly, arithmetic does not always give pytz the opportunity to choose which one is used after crossing a timezone boundary, so normalize is needed to sort that out. While the results of the timeline arithmetic are unambiguous and obvious, they are arguably incorrect until normalize puts things right.
This is .astimezone(), though - no constructor and no (visible) arithmetic here. It's returning something via fromutc(), and I presume pytz has its own .fromutc() implementation.
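A small sketch of that path, using a hypothetical fixed-offset tzinfo subclass (`LoggingFixedOffset` is made up for illustration and has nothing to do with pytz's real classes) that records each call to `fromutc()`.

```python
from datetime import datetime, timedelta, timezone, tzinfo


class LoggingFixedOffset(tzinfo):
    """Hypothetical fixed-offset zone that records calls to fromutc()."""

    def __init__(self, hours, name):
        self._offset = timedelta(hours=hours)
        self._name = name
        self.calls = []

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name

    def fromutc(self, dt):
        # dt holds UTC clock values, with dt.tzinfo already set to self.
        self.calls.append(dt.replace(tzinfo=None))
        return dt + self._offset


est = LoggingFixedOffset(-5, "EST")
utc_noon = datetime(2015, 9, 1, 12, 0, tzinfo=timezone.utc)
local = utc_noon.astimezone(est)

print(local.replace(tzinfo=None))  # 2015-09-01 07:00:00
print(est.calls)                   # [datetime.datetime(2015, 9, 1, 12, 0)]
```

Since the tzinfo's own `fromutc()` produces the result of `.astimezone()`, a tzinfo implementation gets to decide exactly what comes back from a conversion.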
... I think I'm after hooks to replace localize on construction and normalize after arithmetic, so users don't have to be relied on to do this explicitly. This doesn't need to happen now, and I fully understand this could be considered fast path and the overhead unacceptable.
If you're determined to supply by-magic timeline arithmetic, then I strongly suggest looking at the ideas at the top of this message, and push for a _real_ change to Python. That is, instead of pushing for hooks wholly specific to pytz, push for a change that will allow anyone to implement timeline arithmetic in a straightforward way, using non-magical "hybrid" tzinfo classes. But that's not my itch, and - indeed - I'd prefer Python left well enough alone after 495 allows repairing the fundamental problem with conversions.
... I think all the data we have access to, including from platform C library functions, uses the is_dst flag or is simpler to map to the is_dst flag.
I need a complete use case, start to finish, to make sense of what you're talking about here. In particular, you never mention any datetime or pytz operations when talking about is_dst. So I still have no idea why it's being discussed at all.
The C library as exposed by the time.struct_time gives you is_dst.
See other msgs today. mktime() is unreliable. Even if it was reliable, what of it? Why do you _want_ is_dst? There's no use case here that consumes it.
Mapping that to first/fold means first doing two conversions and determining which one comes first.
Ditto. I have no idea what use case you have in mind that would _require_ mapping is_dst to fold. Inside pytz, you have an exhaustive list of all transitions, thanks to zoneinfo. pytz internals don't need any flaky C library functions to determine anything about transitions.
Similarly, when loading your JSON file or examining email headers you need to load in a string like '2004-04-04 02:30:00 EDT-05:00'. It's simple to use a lookup table to map the abbreviation + offset to an is_dst flag.
As above.
It's harder to map it to first/fold because they are swapped around in April and October. And there can be more than two transitions in a year, so if you need to support that you're going to need to do the lookup, construct a couple of instances, and compare to work out if EDT or EST comes first that month in that year.
Inside pytz you already know everything that can be known about transitions. You don't "poke and hope" to do that, you do a binary search, right? You find the zoneinfo record for the time of interest, and compare that to the transitions on either side to deduce whether there's a fold or gap in play. Although I bet this could be sped up by doing some precomputation when loading a tzfile to begin with.
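A rough sketch of that lookup, with precomputation, might look like the following. Everything here is an assumption for illustration: the transition data is hand-entered for US Eastern in 2015 rather than read from a tzfile, the names are hypothetical, and this is not how pytz is actually implemented.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical, hand-entered data: UTC transition instants, and the UTC offset
# in effect before the first transition and after each one.
TRANSITIONS_UTC = [datetime(2015, 3, 8, 7), datetime(2015, 11, 1, 6)]
OFFSETS = [timedelta(hours=-5), timedelta(hours=-4), timedelta(hours=-5)]

# Precompute, once at load time, the wall-clock window each transition affects.
WINDOWS = []
for i, t in enumerate(TRANSITIONS_UTC):
    before, after = OFFSETS[i], OFFSETS[i + 1]
    lo, hi = sorted((t + before, t + after))
    WINDOWS.append((lo, hi, "gap" if after > before else "fold"))
STARTS = [w[0] for w in WINDOWS]


def classify(wall):
    """Return 'gap', 'fold' or 'normal' for a naive local wall-clock time."""
    i = bisect_right(STARTS, wall) - 1  # last window starting at or before wall
    if i >= 0:
        lo, hi, kind = WINDOWS[i]
        if lo <= wall < hi:
            return kind
    return "normal"


print(classify(datetime(2015, 3, 8, 2, 30)))   # gap
print(classify(datetime(2015, 11, 1, 1, 30)))  # fold
print(classify(datetime(2015, 7, 4, 12, 0)))   # normal
```

No conversions are constructed and compared: one binary search over precomputed windows says whether a fold or a gap is in play.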
But, really, I hate all the options for the flag name. I lean towards is_dst mainly because people are used to it.
I'm burned out on name bikeshedding - but `is_dst` makes no sense unless the flag is at least pretending to say something about whether DST is in effect. That's not enough. For example, the zoneinfo source notes that there's a place in Antarctica that has two different kinds of DST each year. It's so bizarre that zic (the zoneinfo compiler) had to be changed to handle it, and they've left the rules commented out until the new zic is more widely adopted. When they uncomment the rules, is_dst will tell you nothing about _which_ kind of DST is in effect (the offset+1 flavor, or the offset+2 flavor). "fold" makes perfectly clear sense for transitions due to any cause whatsoever. The only advantage to is_dst is that it's so poorly defined for edge cases that no two mktime() implementations can be expected to agree :-(
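For reference, a brief sketch of how such a flag reads in practice, assuming a Python where PEP 495's `fold` attribute is available (3.6+) plus the standard `zoneinfo` module (3.9+) with a tz database. The flag only says "first or second occurrence of this wall-clock time" and makes no claim about DST.

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # assumption: Python 3.9+ with a tz database

eastern = ZoneInfo("America/New_York")

# 01:30 happened twice on Nov 1, 2015, when US clocks fell back.
first = datetime(2015, 11, 1, 1, 30, tzinfo=eastern)  # fold=0 by default
second = first.replace(fold=1)                        # the repeated 01:30

print(first.utcoffset(), first.tzname())    # -1 day, 20:00:00 EDT
print(second.utcoffset(), second.tzname())  # -1 day, 19:00:00 EST
```

The same two lines would work unchanged for a repeated hour caused by something other than DST, such as a change to a zone's standard offset.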
But there's every reason to be optimistic: even someone as old and in-the-way as me doesn't find any of this particularly confusing ;-)
I may be old, but at least I'm not as old as Tim ;)
Ain't that the truth :-(