Moved from python-dev to the datetime SIG
TL;DR:
There are two types of operations on datetimes: adding and subtracting
actual time units (multiples of seconds), and calendar operations: things
like "this time the next day".
I think the datetime module should only support the former -- i.e. the
same old timedelta we have. In that context, storing datetimes in UTC,
doing all operations there, and converting back makes the most sense.
But calendar manipulations need to be done in a time-zone aware way.
If you're really interested, I got quite carried away below....
On Mon, Jul 27, 2015 at 10:03 PM, Lennart Regebro wrote:
> On Tue, Jul 28, 2015 at 12:03 AM, Tim Peters wrote:
>> timedelta objects only store days, seconds, and microseconds,
>
> Except that they don't actually store days. They store 24 hour periods,
> which, because of timezones changing, is not the same thing.

yes, it is ;-)

> And as you have repeated many times now, the datetime module's arithmetic
> is "naive" ie, it assumes that one day is always 24 hours. The problem
> with that assumption is that it isn't true.
it's not an assumption, it's a definition.

We have a serious semantic problem here -- and I _think_ it's the source of almost all the discussion in this thread -- I'm not sure what to do about it though. Tim understands this stuff, Lennart understands this stuff, I'm pretty sure I understand this stuff -- I'm not sure about everyone else on this discussion, but they probably all understand it, too -- the only thing any of us doesn't understand is what the heck anyone is talking about!

Maybe use Tim's approach of "naive" -- so a naive day is exactly 24 hours, period, end of story -- this is the definition. But what do we call what I've been trying to refer to as "calendar" operations?

A summary for (maybe) some clarity:

We have this nifty model of a continuous time axis -- moving along at a steady rate from the beginning of the Universe to the end of the Universe. Modulo relativity, it works pretty well.

Then we have units of time spans: in SI units it's seconds. Then a bunch of other units that are clearly defined in terms of seconds: minutes (60 s), hours (60 min), days (24 hrs). And of course milliseconds and microseconds.

Then we have Calendars: this is the year, month, day, etc. we are all familiar with (and the hours in that day, 4:00 pm, etc.) -- confusing here is that we use the same words, "day" and "hour", as part of a calendar description AND also as timespan units -- but they are NOT the same thing (yes, they are related).

Calendars are how we map a nice human-understandable (and historically based) name onto the theoretical time axis. Being both human-oriented and with a lot of historical baggage, calendar naming is designed to fit more or less with the relationship between the earth and the sun. So we want the Solstices to fall around the same date every year, for instance, and we want 1200 hrs to be around the middle of the day.
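The "naive" unit definitions above can be checked directly against timedelta. This is only a minimal sketch using the standard datetime module; the variable names are made up for illustration:

```python
from datetime import timedelta

# A naive "day" is defined as exactly 24 hours, i.e. 86400 seconds.
one_day = timedelta(days=1)
assert one_day == timedelta(hours=24)
assert one_day.total_seconds() == 86400.0

# Any mix of units is normalized into the three stored fields:
# days, seconds, and microseconds.
span = timedelta(hours=25, milliseconds=1500)
print(span.days, span.seconds, span.microseconds)  # 1 3601 500000
```

Nothing here knows about calendars or time zones: it is plain arithmetic on fixed-size units.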
This is why it all gets ugly, because the various celestial phenomena aren't nice integer multiples of each other (hence the need for leap years) or even constant (hence the need for leap seconds). And, of course, the earth is round, and the sun's relative position to each point on earth is different, hence the need for time zones. (Then add political differences for it to get really ugly.)

So: I think there are more or less two types of manipulations one might need:

What is currently supported by the datetime module, and what I think Tim is referring to as "naive" time operations: adding and subtracting units of time along the theoretical time axis. This is actually really simple math -- as Tim points out, all the timedelta object really is is a fancy integer.

Then there are what I am calling "calendar" operations -- these are operations that only make sense with a calendar (and, in fact, a timezone, I think). These are operations like: "the same time two days later" -- this is not the same as moving two days (48 hrs) along the time axis -- it simply is not. It is a shame that we use the name "day" to refer to both 24 hours along the time axis and the enumeration of sunrises in a month -- or, the thing we use on calendars.

There has been a lot of chatter about "tomorrow" or "adding three days", as these bring up the ugly DST issues, but once you add one calendaring operation, people will want (and they should) more: next month, next year, or even uglier, next business day, etc. These are very useful things, but I argue they belong, as a unit, in a separate package -- maybe for potential inclusion in the standard library, but I don't think that's on the table now. (And doesn't dateutil support many of these?)
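To make the difference between "48 hours later" and "the same time two days later" concrete, here is a minimal sketch. It assumes hard-coded US Eastern offsets around the DST start on 2021-03-14 (UTC-5 before, UTC-4 after); a real program would take these from a time zone database rather than spell them out, and the names EST, EDT, timeline and calendar are just for illustration:

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed offsets standing in for US Eastern time on either
# side of the 2021-03-14 DST transition.
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")

start = datetime(2021, 3, 13, 12, 0, tzinfo=EST)

# Time-axis operation: move exactly 48 hours along the time axis,
# then express the result in the offset in force on that date.
timeline = (start + timedelta(hours=48)).astimezone(EDT)

# Calendar operation: the same wall-clock time two calendar days later.
calendar = datetime(2021, 3, 15, 12, 0, tzinfo=EDT)

print(timeline.isoformat())  # 2021-03-15T13:00:00-04:00
print(calendar - start)      # 1 day, 23:00:00
```

The two answers are an hour apart: the calendar result is only 47 hours down the time axis, because one of the intervening days was 23 hours long.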
Now on to time zones: the datetime package is useful because it not only supports time-axis arithmetic (really pretty trivial -- it's just integer arithmetic), but it also supports translating from time-axis units (microseconds since some epoch) to Calendar units: year, month, day, hour, minute, seconds, microseconds. (Using the (proleptic?) Gregorian Calendar -- note that different Calendars are a whole other ball of wax!) This is where the magic is (or really where the ugly code is), and where time zones come in.

UTC time is the calendar time at one longitude on the earth. (Ignoring leap seconds for the moment) it is relatively simple: continuous, no DST, no changing definitions, etc. It is a useful reference system for this reason. Math, and all that, are easier in this zone.

By default, Python datetime objects are "naive" -- meaning they know nothing of timezones or daylight savings -- they can convert back and forth between the internal representation (time span since an epoch) and human calendar times (Gregorian, anyway). It turns out that you can use naive datetimes as UTC time -- there really is no difference, until you want to convert to a different time zone -- with UTC, you may be able to do that; with naive, you can't do that unless you specify what time zone the time is in, and then it's no longer naive ;-)

But how should Python handle time zones? Given all the ugliness of DST and changing time zones, and all that, UTC is the lingua franca of time -- time zones are defined by how they are offset from UTC, and in UTC (as in naive), math is relatively easy. So the best way to handle time zones is to store and manipulate everything in UTC, and then convert to/from the calendar representation using the time zone, when the user needs (or provides) it.
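The "store and manipulate in UTC, convert at the edges" approach can be sketched as follows. This is only an illustration: the fixed UTC-7 offset is an assumption standing in for a real time zone, and the variable names are made up:

```python
from datetime import datetime, timedelta, timezone

# Store and manipulate everything in UTC.
stored = datetime(2015, 7, 28, 5, 3, tzinfo=timezone.utc)
later = stored + timedelta(hours=36)

# Convert to a local calendar representation only for display.
# Assumption: a fixed UTC-7 offset stands in for a real time zone.
display_tz = timezone(timedelta(hours=-7))
print(later.astimezone(display_tz).isoformat())  # 2015-07-29T10:03:00-07:00

# A naive datetime carries no zone at all; once we declare it to be
# UTC it is no longer naive, and compares equal to the aware value.
naive = datetime(2015, 7, 28, 5, 3)
as_utc = naive.replace(tzinfo=timezone.utc)
print(as_utc == stored)  # True
```

All the arithmetic happens on the UTC value; the time zone only enters when a human-readable calendar time is needed.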
I've been trying to figure out what all the confusion and discussion has been about, and I think it's this: If you want to do a Calendar operation, like "this time tomorrow", then THAT is best done in a timezone aware way -- in particular, a DST-aware way. i.e. from 12:00 one day to 12:00 the next day will generally be 24 hours, but might be 23 or 25 hours if crossing a DST transition. But if we aren't supporting those operations, we don't need to worry about that now.

-Chris
--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov