Re: [Datetime-SIG] Calendar vs timespan calculations...
On 07/30/2015 10:46 AM, Chris Barker wrote:
There is "correct" and incorrect, but I"m not arguing that anything is incorrect about the current behavior -- I thought a timedelta was a duration, but I was wrong, it is a Period in units of days (I think!), and sure it apparently does that right.
No, it isn't a Period, and it doesn't "do that right". The current behavior _is_ incorrect (or at least lacking in internal coherence), and I don't think we can get clarity on what we want unless we acknowledge that.

The discussion keeps getting sidetracked on the red herring of whether there are use cases for period arithmetic with "one day" defined as "same local time next day" as a period unit. Of course there are such use cases (and timedelta(1) can satisfy those use cases), but the current timedelta (as it behaves in arithmetic with tz-aware datetimes) is a hybrid that is not coherent considered either as a Period or a Duration. "Satisfying some use cases sometimes" is not sufficient for correctness; conceptual coherency matters too.

Timedelta can only be considered a "Period object in units of days" if one accepts that the things it calls "hours", "minutes", and "seconds" are not really hours, minutes, or seconds, but rather fractional units of the "day-as-same-time-next-day" period that often (but not always) correspond to real hours, minutes, and seconds. I don't think this is a tenable explanation (and no one has attempted it), but you've just shown that it's one possible conclusion from defending the current model.

Tim's valiant efforts notwithstanding, I don't think there is any coherent conceptual model that justifies the current behavior of timedelta. The _implementation_ can easily be explained, of course (and Tim has done so very clearly, many times - I'd summarize it as "all arithmetic temporarily pretends all datetimes are naive, and then blindly reattaches the original tzinfo member"), but in terms of the underlying concepts, it makes no sense.
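The incoherence described here is easy to demonstrate. Below is a sketch using the modern stdlib zoneinfo module (Python 3.9+; at the time of this thread a pytz timezone instance played this role), crossing the US spring-forward transition of March 8, 2015:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

tz = ZoneInfo("America/New_York")
# Noon on the day before the 2015 spring-forward transition (EST, UTC-5):
before = datetime(2015, 3, 7, 12, 0, tzinfo=tz)

# Arithmetic pretends both operands are naive, then reattaches tzinfo:
after = before + timedelta(days=1)
print(after)            # 2015-03-08 12:00:00-04:00  (same wall time, now EDT)

# Same-zone subtraction is also naive, so it reports a full day:
print(after - before)   # 1 day, 0:00:00

# But converting both to UTC first shows only 23 real hours elapsed:
elapsed = after.astimezone(timezone.utc) - before.astimezone(timezone.utc)
print(elapsed)          # 23:00:00
```

So the same pair of aware datetimes is "one day apart" or "23 hours apart" depending on which operation you ask, which is exactly the hybrid behavior at issue.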
In order to defend the current model as coherent, one has to discard one of the following points, and (despite having read every message in all the related threads), I am still not clear precisely which one of these Tim et al consider untrue or expendable:

1) A datetime with a tzinfo member that carries both a timezone and a specific UTC offset within that timezone (e.g. a pytz timezone instance) corresponds precisely and unambiguously to a single instant in astronomical time (as well as carrying additional information).

2) A timedelta object is clearly a Duration, not a Period, because timedelta(days=1), timedelta(hours=24), and timedelta(seconds=86400) result in indistinguishable objects. I think this point is uncontroversial; Tim has said several times that a timedelta is just a complicated representation of an integer number of microseconds. That's a Duration.

3) If one has two datetime objects that each unambiguously correspond to an instant in real time, and one subtracts them and gets back an object which represents a Duration in real microseconds, the only reasonable content for that Duration is the elapsed microseconds in real time between the two instants.

Much virtual ink has been spilled over whether the behavior of "datetime + timedelta(days=1)" is correct, but this is an intentionally muddying case to consider, because there _are_ two perfectly reasonable interpretations of "add one day to a datetime". It's just that one of those interpretations (the Period one), which has been used to justify the current model, is inconsistent with _everything else_ about the behavior and implementation of timedelta.

To be clear, I'm not arguing that this behavior can now be changed in the existing library objects in a backwards-incompatible way.
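Point 2 can be checked interactively: the timedelta constructor normalizes every input to a single (days, seconds, microseconds) triple, i.e. one integer count of microseconds:

```python
from datetime import timedelta

# All three constructions normalize to identical objects:
d1 = timedelta(days=1)
d2 = timedelta(hours=24)
d3 = timedelta(seconds=86400)
assert d1 == d2 == d3
print(d2)  # 1 day, 0:00:00

# The normalized internal representation is the same triple...
assert (d1.days, d1.seconds, d1.microseconds) == (1, 0, 0)
# ...which is just a fancy spelling of one integer microsecond count:
assert d1 // timedelta(microseconds=1) == 86_400_000_000
```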
But accepting that it is lacking in internal coherence (rather than just being an "alternative and equally good model") would be useful in clarifying what kind of an implementation we actually want (IMO, something very much like JodaTime/NodaTime). And then we can figure out how to get there from here.

Carl
On 07/30/2015 01:28 PM, Carl Meyer wrote:
[...] The _implementation_ can easily be explained [...] I'd summarize it as "all arithmetic temporarily pretends all datetimes are naive, and then blindly reattaches the original tzinfo member") [...]
This is the heart of the matter. The problem is not the timedelta, which is simply a number of seconds, but with how datetime uses it.

And we cannot change existing behavior, but we can add to it -- so a new option for datetime that told it to take DST switches into account, so that the new datetime was in fact timedelta seconds away, should do the trick. (Don't ask me which trick, I don't remember any more ;)

--
~Ethan~
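Ethan's suggested option can be sketched as a standalone helper (a hypothetical name, not a proposed API): route the arithmetic through UTC, so that the result really is `timedelta` seconds away in elapsed time. Sketched with the modern zoneinfo module (Python 3.9+):

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; pytz served this role in 2015

def add_elapsed(dt, delta):
    """Hypothetical helper (not a proposed API): add `delta` as real
    elapsed time by converting to UTC, adding, and converting back,
    so DST switches are taken into account."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)

tz = ZoneInfo("America/New_York")
start = datetime(2015, 3, 7, 12, 0, tzinfo=tz)   # EST, UTC-5

# Current wall-clock behavior: only 23 real hours later.
print(start + timedelta(hours=24))               # 2015-03-08 12:00:00-04:00
# Elapsed-time behavior: a full 24 real hours later.
print(add_elapsed(start, timedelta(hours=24)))   # 2015-03-08 13:00:00-04:00
```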
On Thu, Jul 30, 2015 at 2:19 PM, Ethan Furman wrote:
On 07/30/2015 01:28 PM, Carl Meyer wrote:
[...] The _implementation_ can easily be explained [...]
I'd summarize it as "all arithmetic temporarily pretends all datetimes are naive, and then blindly reattaches the original tzinfo member") [...]
This is the heart of the matter. The problem is not the timedelta, which is simply a number of seconds, but with how datetime uses it.
Conceptually, yes. In the code? I don't know -- which __add__ method actually does the work?

But given backward compatibility, there is no point in arguing out whether the current implementation is coherent, or wrong, or highly useful, or whatever. It seems we have general consensus that both Period arithmetic and Duration arithmetic with timezone-aware datetime objects are useful. And that the current implementation in the datetime module does not provide a complete (or even mostly complete) implementation of either of these. And that we can't add functionality to timedelta to better support Period arithmetic without totally breaking backward compatibility. And that we can't change the way datetime+tzinfo+timedelta interact without breaking backward compatibility.

So: some combination of a new datetime and new timedeltas is required.

And we cannot change existing behavior, but we can add to it -- so a new
option for datetime that told it to take dst switches into account so that the new datetime was in fact timedelta seconds away should do the trick.
Hmm -- that might buy us Duration arithmetic, but how do we get Period arithmetic?

By the way -- which __add__ actually does the implementation? datetime's or timedelta's? (Both are slots, so not easy to see the code....)

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
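The Period half can be sketched as well (again a hypothetical helper, not a proposed API): "same wall-clock time, n calendar days later" means doing naive arithmetic on the wall clock and then letting the zone re-resolve the UTC offset. With a zone-resolving tzinfo like the modern zoneinfo this is what `+` already does; with 2015-era pytz one would need `normalize()` afterward. A sketch:

```python
from datetime import datetime, timedelta
from zoneinfo import ZoneInfo  # Python 3.9+; pytz served this role in 2015

def add_period_days(dt, days):
    """Hypothetical helper (not a proposed API): same wall-clock time,
    `days` calendar days later, with the UTC offset re-resolved by
    the zone when tzinfo is reattached."""
    wall = dt.replace(tzinfo=None) + timedelta(days=days)  # naive wall arithmetic
    return wall.replace(tzinfo=dt.tzinfo)                  # zone picks the offset

tz = ZoneInfo("America/New_York")
start = datetime(2015, 3, 7, 12, 0, tzinfo=tz)  # EST, UTC-5
# Same local time next day, even though only 23 real hours elapse:
print(add_period_days(start, 1))  # 2015-03-08 12:00:00-04:00
```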
[Carl Meyer]
... In order to defend the current model as coherent, one has to discard one of the following points, and (despite having read every message in all the related threads), I am still not clear precisely which one of these Tim et al consider untrue or expendable:
1) A datetime with a tzinfo member that carries both a timezone and a specific UTC offset within that timezone (e.g. a pytz timezone instance) corresponds precisely and unambiguously to a single instant in astronomical time (as well as carrying additional information).
datetime had no intent to support "astronomical time" in any way, shape or form. It's no coincidence that, in Guido's first message about "naive time":

https://mail.python.org/pipermail/python-dev/2002-March/020648.html

he talked about "for most *business* uses of date and time". datetime development was suggested & funded by Zope Corporation, which mostly works to meet other businesses' "content management" needs. The use cases collected were overwhelmingly from the commercial business world. Astronomical time systems weren't on the table.

In this respect, it's important to realize that while Python 3.2 finally supplied a concrete instance (of a tzinfo subclass) as "the standard" UTC timezone object (datetime.timezone.utc), that's still just an approximation: it wholly ignores that real-life UTC suffers from leap seconds added (or, perhaps some day also removed) at various times. Subtract two datetimes in `utc`, and the duration returned may be off from real life, but whether and by how much can only be determined by looking up the history of leap second adjustments (made to real-life UTC).

Those who suspect "Noda Time" is what they really want should note that it ignores leap seconds too. As they say on their site, "We want to solve the 99% case. Noda Time doesn't support leap seconds, relativity or various other subtleties around time lines." Although in the Zope community (which mostly drove Python's datetime requirements), it was more the 99.997% case ;-) If an astronomical union had funded the project instead ...
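The point about `utc` being an approximation is concrete: a leap second was inserted at 2015-06-30 23:59:60 UTC, a wall-clock reading that datetime cannot even represent, and subtraction across it comes up one SI second short:

```python
from datetime import datetime, timedelta, timezone

# The inserted leap second's wall-clock reading is not constructible:
try:
    datetime(2015, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
except ValueError:
    print("23:59:60 is not a representable datetime")

# And subtraction across the insertion ignores it entirely:
before = datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)
print(after - before)  # 0:00:01 -- though two real SI seconds elapsed
```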
2) A timedelta object is clearly a Duration, not a Period, because timedelta(days=1), timedelta(hours=24), and timedelta(seconds=86400) result in indistinguishable objects. I think this point is uncontroversial; Tim has said several times that a timedelta is just a complicated representation of an integer number of microseconds. That's a Duration.
That's my view, yes. Although these are "naive time" microseconds too, with an eternally fixed relation to all of naive time's seconds, minutes, hours, days and weeks. In real-life UTC, you can't even say how long a minute is in seconds - "it depends".
3) If one has two datetime objects that each unambiguously correspond to an instant in real time, and one subtracts them and gets back an object which represents a Duration in real microseconds, the only reasonable content for that Duration is the elapsed microseconds in real time between the two instants.
Since there's no accounting for leap seconds, this cannot always be true using tzinfo objects approximating real-life UTC, or any timezone defined as offsetting real-life UTC. Which is all of 'em ;-)

So what's the hangup with leap seconds? They're of no use to business applications, but would introduce irregularities business logic is ill-prepared to deal with. Same as DST transitions, leap-second adjustments can create missing and ambiguous times on a local clock. But unlike DST transitions, which occur in each jurisdiction at a time picked to be minimally visible in the jurisdiction (wee hour on a weekend), leap-second adjustments occur at a fixed UTC time, which is usually "in the middle of the work day" in _some_ jurisdictions. For that reason, when a leap second was inserted this year, some major financial markets across the world - normally open at the adjustment time! - shut down temporarily rather than risk a cascade of software disasters:

http://money.cnn.com/2015/06/29/technology/leap-second/

I'm glad they did. Example: The order in which trades are executed (based on timestamps with sub-second resolution) can have legal consequences. For example, a big customer calls a broker and tells them to buy a million shares of Apple stock. The broker thinks "good idea!". He tells his software to place the customer buy order, then wait a millisecond, then send an order to buy a thousand shares for his own account. That's legal. If the orders are placed in the opposite order, it's illegal and the broker could go to jail ("front running", placing his order first _knowing_ that a large order will soon follow; the large order will certainly drive the stock price up, benefiting the broker who bought before the thoroughly predictable rise).

Inserting a leap second causes the local clock to "repeat a second" in its idea of time (just as "inserting an hour" at the end of DST causes local clocks to repeat an hour) - or to blow up.
A repeated second could cause the orders in the example above to _appear_ to have arrived in "the other" order. Even if the system time services report a time like 13:59:60.000 (instead of repeating 13:59:59.000), lots of software never expected to see such a thing. Who knows what may happen?

So I doubt datetime will ever use "real UTC". It's pretty horrid! For another example, what will the UTC calendar date and time be 300 million seconds from now? That's simply impossible to compute for real UTC, not even in theory. Saying how many seconds away it will be is trivial (300 million!), but the physical processes causing leap second adjustments to UTC are chaotic - nobody can predict how many leap second adjustments will be made to UTC over the next 300 million seconds, or when, so there's no way to know what the UTC calendar date and time will be then. It _can_ affect the calendar date-and-time even for times just half a year in the future. Unless the definition of UTC is changed yet again (dead serious proposals for which are pending, supported by most participating countries):

https://en.wikipedia.org/wiki/Leap_second#Proposal_to_abolish_leap_seconds

That page is also interesting for its account of various software problems known to have been caused so far by leap-second adjustments.

Anyway, under "real UTC" today, you could get an excellent approximation of "real time durations" by subtracting, but would have to accept that there is no fixed mapping between UTC timeline points and calendar notations except for datetimes no later than about 3 months from now (best I can tell, "they" don't promise to give more than 3 months' notice before the next leap second adjustment).

Finally, I have to note the irony in asking anything about "real time" ;-) What does "real time" mean?
The most accurate clocks we have are atomic clocks, but even when two are made as identically as possible - even if we made two that kept _perfect_ time forever - they will _appear_ to run at slightly different rates when placed at different locations on Earth. That's at least due to gravitational time dilation: relativistic effects matter at currently achievable resolutions.

As a result, current TAI time (the astonishingly uniform "atomic time" measure from which today's definition of UTC is derived) can't be known _as_ it happens: it's the output of an algorithm (which consumes time!) that collects "elapsed seconds" from hundreds of ultra-stable clocks around the globe, and averages them in a way to make a highly informed, provably excellent guess at what they would have said had they all been flawless, all at mean sea level altitude, and all at 0 kelvin. This computed "TAI time" is out of date by the time it's known, and typically disagrees (slightly) with most of the clocks feeding into it.

So the best measure of "real time" we have is a product of human ingenuity. The closer to "plain old unadulterated real time as it exists in nature" you want to get, the more contrived & bogglingly complex the means needed to achieve it ;-) Everyone is settling for an approximation, because that's the best that can be done. Naive time starts and stops with what most people "already know".

When UTC started mucking with leap seconds (it didn't always), the computing world should have embraced TAI internally instead. TAI suffers no adjustments of any kind, ever - it's just the running total of SI seconds since the start of the TAI epoch, as determined by the best clocks on Earth. In fact, it's very close to Python's "naive time"! TAI uses the proleptic Gregorian calendar too (albeit starting at a different epoch than year 1), and the TAI "day" is also defined to be exactly 86400 SI seconds.
The difference is that TAI's Gregorian calendar will, over time, become unboundedly out of synch with UTC's Gregorian calendar, as leap seconds pile up in the latter. So far they're only 36 seconds out of synch.
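Since TAI is adjustment-free, converting between it and modern UTC is just a table lookup of the accumulated leap seconds. A minimal sketch (illustrative only), hard-coding the single offset valid at the time of this thread:

```python
from datetime import datetime, timedelta, timezone

# TAI - UTC is table-driven (one entry per leap second). After the
# 2015-06-30 insertion, and until the end-of-2016 leap second, the
# offset is 36 seconds. Hard-coded here for illustration only.
TAI_MINUS_UTC = timedelta(seconds=36)

def utc_to_tai(dt_utc):
    """Illustrative only: shift a mid-2015 UTC datetime onto the TAI
    scale (the printed +00:00 is UTC notation, but the reading is TAI)."""
    return dt_utc + TAI_MINUS_UTC

t = datetime(2015, 7, 30, 12, 0, tzinfo=timezone.utc)
print(utc_to_tai(t))  # 2015-07-30 12:00:36+00:00
```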
... To be clear, I'm not arguing that this behavior can now be changed in the existing library objects in a backwards-incompatible way. But accepting that it is lacking in internal coherence (rather than just being an "alternative and equally good model") would be useful in clarifying what kind of an implementation we actually want (IMO, something very much like JodaTime/NodaTime). And then can we figure out how to get there from here.
I mentioned Noda Time before. Just looked up Joda-Time, and:

http://joda-time.sourceforge.net/faq.html

"""
Joda-Time does not support leap seconds. Leap seconds can be supported by writing a new, specialized chronology, or by making a few enhancements to the existing ZonedChronology class. In either case, future versions of Joda-Time will not enable leap seconds by default. Most applications have no need for it, and it might have additional performance costs.
"""

There's a pattern here: "almost all" people want nothing to do with leap seconds, not even time library developers. That doesn't mean they're right. But it doesn't mean they're wrong either ;-) Without leap seconds, they're all approximating real-life UTC, and in the same way Python's `utc` is.