Re: [Datetime-SIG] PEP-431/495

[Tim]
... Later you seem to say you'd prefer a 3-state flag instead, so not sure you really mean "boolean" here.
[Stuart Bishop <stuart@stuartbishop.net>]
I write Python and SQL for a living. Booleans are 3 state to me ;)
Got it! Python is sooooooo behind the times :-)
In this case, I'm not fussed if the datetime instance has a 2 state or 3 state flag. This is different to the various constructors which I think need a 3 state flag in their arguments. True, False, None.
As things seem to have progressed later, mapping pytz's explicit time checking into a more magical scheme sprayed all over the datetime internals is not straightforward. So, as I concluded elsewhere, that may or may not be done someday, but it's out of scope for PEP 495. I'm a fan of making progress ("now is better than never", where the latter was PEP 431's fate waiting for perfection on all counts).
... Grump. I always interpreted that documentation to mean that timezone conversions were *my* problem as the author of the tzinfo implementation.
Conversions, yes; arithmetic, no. The tzinfo methods authors _needed_ to implement were .tzname(), .dst(), and .utcoffset(). Optionally, .fromutc(). None of those are about how arithmetic works. Your particular implementation needed to conflate the two somewhat, since you avoided "hybrid" tzinfo classes in favor of always using fixed-offset classes, which in turn meant arithmetic routinely left you with "a wrong" class for the then-current datetime value. Which was in turn repaired by needing to call .normalize() all over the place, to replace the now-possibly-wrong tzinfo. As I understand it. If so, it's fair to say that was not an anticipated kind of implementation ;-)
I thought it was a documented problem to be fixed if/when Python ever provided more complex tzinfo implementations, and one of the reasons it never did provide such implementations in the first place.
The inability to do conversion correctly in all cases was documented. It annoyed me a lot, because there was no expectation that it would _ever_ "be fixed". As I've often said, I considered that to be datetime's biggest flaw. But, "now is better than never" ;-) , and we ran out of time to do more - datetime had already met all its original design goals for some time, and our mutual employer at the time was understandably annoyed at continuing to pay for more development.
Classic behaviour as you describe it is a bug.
Believe me, you won't get anywhere with that approach.

- Classic arithmetic is the only kind that makes good sense in the "naive time" model, which _is_ datetime's model.
- Timeline arithmetic is the only kind that makes good sense in the civil-time-based-on-POSIX-approximation-to-UTC model. Which is overwhelmingly the most common model among computer types (although fully understood by relatively few).
- Timeline arithmetic _including_ accounting for leap seconds too is the only kind that makes good sense for how civil time (based on real-world UTC) has actually been defined for a few decades now.
- It's all but certain that civil time will be redefined yet again someday, in which case only yet another kind of arithmetic will make good sense for that.

So "bug" or "feature" depends on which model you have in mind. Absolute statements make no sense. Each kind of arithmetic is "a feature" for the model it intends to serve, and "a bug" for some purposes with respect to all other models. You can legitimately complain that you hate the naive time model, but you can't complain that Python's datetime arithmetic doesn't match datetime's model.
It sounds ok when you state it as 'add one day to today and you get the same time tomorrow'.
That's always rigorously so in the naive time model, and regardless of whether you're talking about 1 day, 24 hours, 1440 minutes, ...
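A stdlib-only illustration of that equivalence in the naive model:

```python
from datetime import datetime, timedelta

# In naive time "1 day", "24 hours" and "1440 minutes" are literally
# the same duration...
assert timedelta(days=1) == timedelta(hours=24) == timedelta(minutes=1440)

# ...so adding any of them to a naive datetime gives "the same time
# tomorrow", with no DST or leap-second adjustments anywhere.
dt = datetime(2004, 10, 30, 15, 0)
assert dt + timedelta(days=1) == datetime(2004, 10, 31, 15, 0)
assert dt + timedelta(hours=24) == dt + timedelta(minutes=1440)
```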
It does not sound ok when you state it as 'add one second to now and you will normally get now + 1 second, but sometimes you will get an instant further in the future, and sometimes you will get an instant in the past'.
Then you have a _different_ model in mind, and you need a different arithmetic for that. Now picture, say, a scientist insisting that the arithmetic _you_ want is WRONG ;-) because it sometimes tells them that, e.g., two moments in time are 1 second apart when in _reality_ they were exactly 2 SI seconds apart (due to a leap second inserted between). The two of you simply have different models in mind. Neither is right, neither is wrong, both want the arithmetic appropriate for the model they favor, and each will be sorely disappointed if they use an arithmetic appropriate for the other's model.

That said, I would have preferred it if Python's datetime had used classic arithmetic only for naive datetimes. I feared it might be endlessly confusing if an aware datetime used classic arithmetic too. I'm not sure about "endlessly" now, but it has come up more than once ;-) Far too late to change now, though.
I dispute that this is default behaviour that can't be changed.
The default arithmetic can't be changed. That was settled long ago - there wasn't ever the slightest possibility of changing the default arithmetic. So, while anyone is free to dispute it, there's not much point to that ;-) If someone wants different _default_ arithmetic, they'll need to write a new datetime-ish module.
The different arithmetic only matters when you have a dst-aware aware datetime in play, and Python has never provided any apart perhaps from your original reference implementation (which stopped working in 2006).
You're talking about the toy classes in the datetime docs? Looks like they've been updated all along to match changes in US daylight rules, although I would have kept only _the_ most recent rules as time went by. The original docs only intended to show how a tzinfo might be implemented, picking the then-current US daylight rules. Not really "a reference", a pedagogical example, and simplified to avoid burying the essential concepts under a mountain of arbitrary details.
pytz, however, has always provided timeline arithmetic.
Yes, and it's a marvel! I never expected "the problem" could be wholly solved without changing Python internals. It's a mighty hack :-)
I believe this is the most widely deployed way of obtaining dst-aware datetime instances,
Me too.
and this is the most widely expected behaviour.
Eh - there are many ways to get timeline arithmetic that "almost always" work. The angst over the tiny number of fold/gap cases seems way overblown to me, but so it goes. After PEP 495, it will be easy to get always-correct timeline arithmetic for anyone who wants it, provided they can get 495-conforming tzinfo objects (more below).
If you use pytz tzinfo instances, adding 1 second always adds one second
But only in the _model_ you have in mind: real-life clocks showing real-life civil time suffer from leap seconds too. You can laugh that off in _your_ apps (and I can too ;-) ), but for other apps it's dead serious.
and adding 1 day always adds 24 hours.
That's also true of classic arithmetic. The meanings of "day" and "24 hours" also depend on the model in use.
While calendaring style arithmetic is useful and a valid use case,
In naive time, the distinction you want to make here doesn't really exist: timedelta supports keyword arguments for all the durations that have the same meanings as durations and as "calendar periods" _in_ naive time.
it is useless if the only relative type is the day.
All common units <= a week and >= a microsecond are supported by timedelta, and all work perfectly fine in the naive time model. The extent to which they work for other purposes varies by model and purpose.
You also need months and years and periodic things like 'first sunday every month'. This is too complex to inflict its API on people by default.
Agreed! timedelta supplies only units for which there is no possible argument about behavior _in_ naive time, and left it at that. But do note that things like "first Sunday of the month" are quite easy to implement _building_ on those: you just find the 3rd Sunday of the month then subtract timedelta(weeks=2) ;-)
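For instance, a hypothetical first_sunday helper (the name and function are mine, not the stdlib's) needs nothing beyond date and timedelta:

```python
from datetime import date, timedelta

def first_sunday(year, month):
    # Hypothetical helper (not part of the stdlib): step forward from the
    # 1st of the month to the next Sunday. date.weekday() is 6 for Sunday.
    d = date(year, month, 1)
    return d + timedelta(days=(6 - d.weekday()) % 7)

print(first_sunday(2004, 11))  # 2004-11-07
```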
But pulling in dateutils relative time helpers could be nice.
If there's a groundswell of demand for adding "calendar operations" to Python, I'd be in favor of inviting Gustavo to fold dateutil's calendar operations into the core.
Do systems that rely on classic behavior actually exist?
Of course. A more-or-less subtle example appears later. But we already mentioned dead-obvious uses: things like "same time tomorrow" and "same time two weeks from now" are common as mud, and classic arithmetic implements them fine. So do functions building on those primitives to implement more sophisticated calendar operations.

You might complain that naive-time "same time tomorrow" makes no sense if someone is starting from 24 hours before what turns out to be a gap due to DST starting, but few in the real world schedule things at such times (e.g., DST transitions never occur during normal "business work hours"). If, say, some app postpones a business meeting a week, it's not credible that they'll ever end up in a gap by adding timedelta(weeks=1) - unless they're trying to account for leap seconds too, and the Earth's rotation speeds up "a lot", and "same time next week" ends up exactly in the missing second.
It requires someone to have explicitly chosen to use daylight savings capable timezones, without using pytz, while at the same time relying on classic's surprising arithmetic. Maybe systems using dateutils without using dateutils' implementation of datetime arithmetic.
? dateutil doesn't implement arithmetic that I know of, apart from "relative deltas". It inherits Python's classic arithmetic for datetime - datetime, and datetime +/- timedelta, AFAICT.
I believe that there are many more systems out there that are broken by this behaviour than are relying on this behaviour.
I don't know. I have little code of my own that needs timezones at all. In such code as I have, classic arithmetic works fine almost all the time, because things like "same time tomorrow" are overwhelmingly the only kinds of arithmetic I want. In the very few cases I give a rip about POSIX-approximation-to-real-world durations, I'm either using naive datetimes or tzinfo=timezone.utc, or I use one-liner functions like this one ("like this" because they're so easy to write when needed I never bothered to stick 'em in a module for reuse later):

    def dt_add(dt, td):
        return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))

There you go: "timeline" datetime + timedelta arithmetic about as efficiently as possible in pure Python. Note that _if_ the default changed to timeline arithmetic, this code would no longer work. The "+" there requires classic arithmetic to get the right result. Change the default, and this code would break too. I find it hard to imagine I'm the only person in the world who has code similarly taking advantage of what Python actually does. Example:

    from datetime import datetime, timedelta
    from pytz.reference import Eastern

    turkey_in = datetime(2004, 10, 30, 15, tzinfo=Eastern)
    turkey_out = dt_add(turkey_in, timedelta(days=1))
    print(turkey_in)
    print(turkey_out)

Output:

    2004-10-30 15:00:00-04:00
    2004-10-31 14:00:00-05:00

There my end-of-DST-party giant turkey needs to stay in the smoker for exactly 24 hours. That's "1 day" to me, because I think in naive time. The function effectively converts to UTC, adds 24 hours, then converts back, but more efficiently than bothering with .astimezone() in either direction. It correctly accounts for the fact that the end of DST "added an hour", so while I put the turkey in at 3pm Saturday I need to take it out at 2pm Sunday.

Note: my dt_add one-liner may fail in cases starting or landing on a "problem time" (fold/gap). I've never cared, because DST transitions are intentionally scheduled to occur "wee hours on a weekend", i.e. when few people are both awake and sober enough _to_ care. But, after 495 tzinfos are available, the dt_add one-liner will always work correctly. That this implementation of timeline arithmetic _can_ screw up now has nothing to do with its code; it's inherited from the inability of pure conversion to always work right now.
I think this is a bug worth fixing rather than entrenching, before adding any dst aware tzinfo implementations to stdlib (including 'local').
datetime was released a dozen years ago. There's nothing it does that wasn't already thoroughly entrenched a decade ago.
... However... this also means the new flag on the datetime instances is largely irrelevant to pytz. pytz' API will need to remain the same.
My hope was that 495 alone would at least spare pytz's users from needing to do a `.normalize()` dance after `.astimezone()` anymore. Although I'm not clear on why it's needed even now.
Adding a timedelta to a datetime will give you a datetime with exactly the same utcoffset() and dst() as you started with (because pytz gives you timeline arithmetic, where adding 24 hours actually adds 24 hours), and you will need to fix it using the normalize method after the fact. The is_dst bit is effectively stored on the tzinfo instance currently in play, and having another copy on the datetime instance is unnecessary.
Yes, 495 intends to repair conversion in all cases; it has no intent to do anything about arithmetic. A different PEP may address arithmetic later (well, PEP 500 already did, but it's been rejected). I won't be pushing for it, though. As above, after 495 solid timeline arithmetic is very easy to get via 1-line Python functions. Which I personally prefer to use: because I _want_ timeline arithmetic so rarely; using a named function instead makes it very clear that I'm doing something unusual (for me). Other people have different itches to scratch. But to be kinda brutal about it, _any_ catering to timeline arithmetic is misguided: it's enabling poor practices. People who need timeline arithmetic should really be working in UTC, where classic and timeline arithmetic are the same thing, and classic arithmetic runs much faster. My only use for it in a non-UTC datetime is calculating when to take the turkey out of the smoker one day per year ;-)
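The claim that classic and timeline arithmetic coincide in UTC can be checked directly with the dt_add one-liner given above (repeated here so the snippet runs on its own):

```python
from datetime import datetime, timedelta, timezone

def dt_add(dt, td):
    # The one-liner from earlier: convert to UTC, add, convert back
    # via fromutc().
    return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))

# In UTC, utcoffset() is a constant zero, so the correction term
# cancels and classic "+" agrees exactly with timeline dt_add.
dt = datetime(2004, 10, 30, 15, 0, tzinfo=timezone.utc)
td = timedelta(days=1)
assert dt_add(dt, td) == dt + td
```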
The new argument to the datetime constructors may be useful, if it accepts tri-state. If the is_dst/first flag accepts True, False or None, then pytz may be able to deprecate the localize method. If a user calls localize(is_dst=None), AmbiguousTimeError and NonExistentTimeError exceptions may be raised, but by default exceptions are not raised. I would also need the opportunity to swap in the correct fixed offset tzinfo instance for the given datetime. (example below)
Losing the localize method will be a huge win for pytz, as it is ugly and causes great confusion and many identical bug reports. The other problem, the normalize method, is less important - if you neglect to call normalize you still get the correct instant, but it may be reported in the incorrect timezone period (EST instead of EDT or vice versa).
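The localize/normalize dance described here can be sketched with only the stdlib. This is a hedged illustration, not pytz's real implementation: the EDT/EST offsets and the single end-of-DST-2004 transition for US/Eastern are hard-coded, where pytz derives the full transition list from the Olson database.

```python
from datetime import datetime, timedelta, timezone

# Fixed-offset tzinfos standing in for pytz's internal ones.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")
# 2004 end of DST: 2:00 EDT == 06:00 UTC on 2004-10-31.
DST_END_UTC = datetime(2004, 10, 31, 6, 0, tzinfo=timezone.utc)

def normalize(dt):
    # Swap in the fixed offset that is correct for the instant dt
    # denotes - what pytz's normalize() does, minus the real tz data.
    correct = EDT if dt.astimezone(timezone.utc) < DST_END_UTC else EST
    return dt.astimezone(correct)

# Fixed-offset arithmetic gives timeline behaviour, but can leave a
# stale tzinfo attached; normalize() repairs the label afterwards.
start = datetime(2004, 10, 30, 15, 0, tzinfo=EDT)
later = start + timedelta(days=1)   # right instant, stale "EDT" label
fixed = normalize(later)
print(fixed)  # 2004-10-31 14:00:00-05:00
```

Note that neglecting normalize() here still yields the correct instant (later and fixed compare equal); only the reported offset label is wrong, matching Stuart's description.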
There's a lot more about this in the recent "PEP-495 - Strict Invalid Time Checking" thread.
.... I also need to continue to support timeline arithmetic. This requires me not having a single tzinfo instance, but swapping in the correct fixed-offset tzinfo instance at the right time. Currently, this uses the awful localize and normalize methods. Ideally, post-PEP:

    eastern = pytz.timezone('US/Eastern')
    dt = datetime(2004, 4, 3, 2, 0, 0, tzinfo=eastern)
    dt2 = dt + timedelta(days=1)
    eastern is dt.tzinfo     # False
    dt.tzinfo is dt2.tzinfo  # False
Nothing in PEP 495 changes anything about arithmetic behavior. In particular, dt's tzinfo will be copied to dt2 by "+", just as it is now. _Anything_ else would break the very strict backward compatibility constraints Guido established for this PEP.
.... If I can do this, there is no reason that pytz could not also support 'classic' style, but I certainly wouldn't want to encourage its use as my rant above might indicate ;) If I write documentation, it may require some editing, localizing from en_AU to something a little more polite.
I expect pytz users who want classic arithmetic can get it already simply by not using pytz ;-)
... For pytz users, being able to write a function to tell if the data you were given is broken is a step backwards. When constructing a datetime instance with pytz, users have the choice of raising exceptions or having pytz normalize the input. They are never given broken data (by their definition), and there is no need to weed it out.
Assuming they follow all "the rules", yes? For example, if they forget to use .localize(), etc., it seems like anything could happen. What if they use .replace()? .combine()? Unpickle a datetime representing a missing time? Etc. I don't see that pytz has anything magical to check datetimes created by those.
... I think all functions that can create datetime instances will need the new optional flag and the flag should be tri-state, defaulting to not whine.
See the "PEP-495 - Strict Invalid Time Checking" thread for more. There seems to be increasing "feature creep" here. Rewriting vast swaths of datetime internals to cater to this is at best impractical, especially compared to supplying a "check this datetime" function that users who care can call when they care. Nevertheless, it's a suitable subject for a different PEP. I don't want to bog 495 down with it. If it had _stopped_ with asking for an optional check in the datetime constructor, it might have been implemented already ;-)
... The important bit here for pytz is that tzinfo.fromutc() may return a datetime with a different tzinfo instance.
Sorry, didn't follow that. Of course you can write your .fromutc() to return anything you want.
Also, to drop pytz' localize method I need something like 'tzinfo.normalize(dt)', where I have the opportunity to replace the tzinfo the user provided with the one with the correct offset/dst info.
If you're proposing a richer tzinfo interface, that's certainly out of scope for PEP 495. But I don't expect there's any possible way that PEP 495 on its own can replace all of pytz's uses for `normalize()` regardless.
- My argument in favour of 'is_dst' over 'first' is that this is what we have in the data we are trying to load. You commonly have a timestamp with a timezone abbreviation and/or offset. This can easily be converted to an is_dst flag.
You mean by using platform C library functions (albeit perhaps wrapped by Python)?
I really missed an answer to that ;-)
To convert it to a 'first' flag, we need to first parse the datetime,
I'm unclear on this. To get a datetime _at all_ the timestamp has to be converted to calendar notation (year, month, ...). Which is what I'm guessing "parse" means here. That much has to be done in any case.
My example is weak. I'm thinking about parsing a string like:
2004-10-31 01:15 EST-05:00
Even if you know this is US/Eastern and not Estonia, you still need to know that for dates in October EDT is first and EST is not first, and for dates in April EST is first and EDT is not first,
In April all times have first=True (or fold=0 in the latest spelling). first=False (fold=1) only occurs for the later times in a fold (during the second of the repeated hours at the end of EDT).
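Under the flag's eventual PEP 495 spelling (the fold attribute, which shipped in Python 3.6, after this thread), that behaviour looks like:

```python
from datetime import datetime

# fold defaults to 0: the first (earlier) of two ambiguous wall times.
ambiguous = datetime(2004, 10, 31, 1, 15)   # repeated hour, US/Eastern
assert ambiguous.fold == 0

# fold=1 denotes the second occurrence (EST, after clocks fell back).
later = ambiguous.replace(fold=1)
assert later.fold == 1

# For naive datetimes, fold is deliberately ignored in comparisons.
assert ambiguous == later
```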
and you need to include a wide enough fuzz factor that future changes to the DST rules won't break your parser.
What does this have to do with datetime? So far you haven't mentioned any datetime - or pytz - operations.
But I guess a general-purpose parser that cares would construct instances 3 days before and 3 days after and use whichever tzinfo had the correct offset. Or just use a fixed-offset tzinfo.
Sorry, I'm still not grasping what "the problem" is here. In pytz, you would presumably create a datetime with an Olson-derived US/Eastern timezone. That would internally search for where 2004-10-31 06:15 (the UTC spelling of your example) lands in the list of transitions, and deduce more-or-less directly that the original time is the later of times in a fold. If you're _not_ using datetime or pytz at all, then you have no reason to _want_ to compute first/fold to begin with, right?
... I despair at the bug reports, questions and general confusion that will occur if dst-aware tzinfo implementations are added to the stdlib. At the moment, it is an obscure wart despite its age. It will become an in-your-face wart as soon as a tzlocal implementation lands, and a wart people will be angry about, because they won't realize it is there until their production system loses an hour's worth of orders because their Python app spat out an hour's worth of invalid timestamps right around Halloween sale time. But I'm drifting off into hyperbole.
But entertaining hyperbole, so it's appreciated :-) After 495 is implemented, huge swaths of confusing docs can be moved into an appendix, covering all the rules and reasons for why ancient tzinfo implementations didn't allow for correct conversions in all cases. And that will make room for huge swaths of new confusing docs. But there's every reason to be optimistic: even someone as old and in-the-way as me doesn't find any of this particularly confusing ;-)
participants (1)
- Tim Peters