Re: [Datetime-SIG] PEP-431/495
Tim Peters <tim.peters@gmail.com> writes:
[Tim]
... Later you seem to say you'd prefer a 3-state flag instead, so not sure you really mean "boolean" here.
[Stuart Bishop <stuart@stuartbishop.net>]
I write Python and SQL for a living. Booleans are 3 state to me ;)
Got it! Python is sooooooo behind the times :-)
In this case, I'm not fussed if the datetime instance has a 2 state or 3 state flag. This is different to the various constructors which I think need a 3 state flag in their arguments. True, False, None.
As things seem to have progressed later, mapping pytz's explicit time checking into a more magical scheme sprayed all over the datetime internals is not straightforward. So, as I concluded elsewhere, that may or may not be done someday, but it's out of scope for PEP 495. I'm a fan of making progress ("now is better than never", where the latter was PEP 431's fate waiting for perfection on all counts).
Even if datetime's or replace()'s *first* parameter would be 3-state None|True|False; the internal flag can still be 2-state True|False. first=None could cause a tzinfo callback (it implies that tzinfo must not be None in this case) that sets *first* to True|False appropriately. ...
Do systems that rely on classic behavior actually exist?
Of course. A more-or-less subtle example appears later. But we already mentioned dead-obvious uses: things like "same time tomorrow" and "same time two weeks from now" are common as mud, and classic arithmetic implements them fine. So do functions building on those primitives to implement more sophisticated calendar operations. You might complain that naive time "same time tomorrow" makes no sense if someone is starting from 24 hours before what turns out to be a gap due to DST starting, but few in the real world schedule things at such times (e.g., DST transitions never occur during normal "business work hours"; if, e.g., some app postpones a business meeting a week, it's not credible that they'll ever end up in a gap by adding timedelta(weeks=1) - unless they're trying to account for leap seconds too, and the Earth's rotation speeds up "a lot", and "same time next week" ends up exactly in the missing second).
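To make the "same time tomorrow" case concrete, here is a minimal sketch of classic arithmetic across a DST change. ToyEastern2015 is a hypothetical stand-in written only for this illustration (hard-coded 2015 US Eastern rules, in the style of the tzinfo examples in the datetime docs); it is not pytz, dateutil, or anything proposed in this thread.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)


class ToyEastern2015(tzinfo):
    """Hypothetical US/Eastern stand-in, valid for 2015 wall times only."""
    DST_START = datetime(2015, 3, 8, 2)   # 2:00 standard time
    DST_END = datetime(2015, 11, 1, 1)    # 1:00 standard time (= 2:00 daylight)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def to_utc_naive(dt):
    """Wall-clock UTC of an aware datetime, as a naive value."""
    return (dt - dt.utcoffset()).replace(tzinfo=None)


tz = ToyEastern2015()
noon = datetime(2015, 10, 31, 12, tzinfo=tz)   # day before DST ends
next_noon = noon + timedelta(days=1)           # classic arithmetic

# The local clock still reads 12:00, but 25 real hours have passed.
elapsed = to_utc_naive(next_noon) - to_utc_naive(noon)
print(next_noon, elapsed)
```

Classic arithmetic moves the wall clock by exactly the timedelta and ignores the offset change, which is why "same time tomorrow" comes out as noon again.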
There is a $5 wifi button that can be used to track baby data. Python helps at various stages: https://medium.com/@edwardbenson/how-i-hacked-amazon-s-5-wifi-button-to-trac... Babies can poop at night and during DST transitions too. Sleep-deprived parents should be able to see the tracking data in local time in addition to UTC (doing timezone conversions is the computer's job). On the internet, people may cooperate while being in different time zones i.e., even "business" software might have to work during DST transitions. MMORPGs are probably also not limited to a single time zone. Non-pytz timezones make mistakes on the order of an hour regularly. It is *three orders of magnitude larger* than a second. It is a different class of errors. The code that can't handle ~1s errors over a short period of time should use time.monotonic() anyway. ...
It requires someone to have explicitly chosen to use daylight savings capable timezones, without using pytz, while at the same time relying on classic's surprising arithmetic. Maybe systems using dateutil without using dateutil's implementation of datetime arithmetic.
? dateutil doesn't implement arithmetic that I know of, apart from "relative deltas". It inherits Python's classic arithmetic for datetime - datetime, and datetime +/- timedelta, AFAICT.
dateutil doesn't work during DST transitions, but PEP 495 might allow fixing it. As I understand it, outside of DST transitions, if dates are unique valid local times, dateutil uses "same time tomorrow":
(d_with_dateutil_tzinfo + DAY == d.tzinfo.localize(d.replace(tzinfo=None) + DAY, is_dst=None))
while pytz uses "+24 hours":
dt_add(d_with_dateutil_tzinfo, DAY) == d + DAY
where dt_add() is defined below. The equality holds, but (d + DAY) may have a wrong tzinfo object if the arithmetic crosses DST boundaries (though it has the correct timestamp/UTC time anyway). d.tzinfo.normalize(d + DAY) should be used to get the correct tzinfo, e.g. for displaying the result. Both types of operations should be supported.
...
    def dt_add(dt, td):
        return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))
... Note: my dt_add 1-liner may fail in cases starting or landing on a "problem time" (fold/gap). I've never cared, because DST transitions are intentionally scheduled to occur "wee hours on a weekend", i.e. when few people are both awake and sober enough _to_ care. But, after 495 tzinfos are available, the dt_add 1-liner will always work correctly. That this implementation of timeline arithmetic _can_ screw up now has nothing to do with its code, it's inherited from the inability of pure conversion to always work right now.
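As a rough illustration of the dt_add one-liner at work, the sketch below runs it across the end of DST, away from any fold or gap. ToyEastern2015 is a hypothetical tzinfo invented for this example (2015 US Eastern rules only), and it relies on the default tzinfo.fromutc(); the behavior of a real tzinfo may differ.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)


class ToyEastern2015(tzinfo):
    """Hypothetical US/Eastern stand-in, valid for 2015 wall times only."""
    DST_START = datetime(2015, 3, 8, 2)   # 2:00 standard time
    DST_END = datetime(2015, 11, 1, 1)    # 1:00 standard time (= 2:00 daylight)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def dt_add(dt, td):
    # Shift to UTC wall time, add the delta, then let fromutc() map back.
    return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))


tz = ToyEastern2015()
start = datetime(2015, 10, 31, 12, tzinfo=tz)    # noon EDT
classic = start + timedelta(days=1)              # noon EST, 25 hours later
timeline = dt_add(start, timedelta(days=1))      # 11:00 EST, 24 hours later
print(classic, timeline, timeline.tzname())
```

The two results differ by the hour that the clocks were set back, which is the whole distinction between classic and timeline arithmetic.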
Such choices should be made by an application developer, not a library/language developer. A library/language should avoid silent errors as much as possible.
I think this is a bug worth fixing rather than entrenching, before adding any dst aware tzinfo implementations to stdlib (including 'local').
datetime was released a dozen years ago. There's nothing it does that wasn't already thoroughly entrenched a decade ago.
pytz is widely used. datetime objects with dateutil and pytz tzinfo behave differently as shown above. There are no non-fixed tzinfos in stdlib. dst-tzinfo in stdlib could adopt either pytz or dateutil behavior. If dateutil can be fixed to work correctly using the disambiguation flag then its behavior is preferable because it eliminates localize() and normalize() calls; the exception is that localize() could be useful in __new__ if the *first* parameter is None, to raise an exception for invalid input. Otherwise it is equivalent to the default *first* value.
... However... this also means the new flag on the datetime instances is largely irrelevant to pytz. pytz' API will need to remain the same.
My hope was that 495 alone would at least spare pytz's users from needing to do a `.normalize()` dance after `.astimezone()` anymore. Although I'm not clear on why it's needed even now.
As far as I know, normalize() is not necessary after astimezone() even now https://answers.launchpad.net/pytz/+question/249229
... For pytz users, being able to write a function to tell if the data you were given is broken is a step backwards. When constructing a datetime instance with pytz, users have the choice of raising exceptions or having pytz normalize the input. They are never given broken data (by their definition), and there is no need to weed it out.
Assuming they follow all "the rules", yes? For example, if they forget to use .localize(), etc, it seems like anything could happen. What if they use .replace()? .combine()? Unpickle a datetime representing a missing time? Etc. I don't see that pytz has anything magical to check datetimes created by those.
If people forget localize() then tzinfo is not attached and an exception is raised later. It is like mixing bytes and Unicode: if you forget decode() then an exception is raised later. replace() is just a shortcut for a constructor. combine() returns naive objects. You can unpickle non-normalized datetime. replace(first=None) may force normalization. Your program may avoid producing non-normalized values. You can choose to save utc+tzid (to use whatever tzdata is available) or utc+tzid+tzdata-version (if you need the same local time) to restore it later.
... I think all functions that can create datetime instances will need the new optional flag and the flag should be tri-state, defaulting to not whine.
See the "PEP-495 - Strict Invalid Time Checking" thread for more. There seems to be increasing "feature creep" here. Rewriting vast swaths of datetime internals to cater to this is at best impractical, especially compared to supplying a "check this datetime" function users who care can call when they care. Nevertheless, it's a suitable subject for a different PEP. I don't want to bog 495 down with it. If it had _stopped_ with asking for an optional check in the datetime constructor, it may have been implemented already ;-)
It may be a subject for another PEP but here's a possible implementation:
    class datetime:
        def __new__(...):
            if first is None and hasattr(tzinfo, 'localize'):
                self = tzinfo.localize(naive, is_dst=None)  # may raise InvalidTime
note: self.first is never None i.e., utcoffset(), tzname(), dst() etc always see either first=True or first=False.
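For comparison, a standalone "check this datetime" helper of the kind mentioned above could be sketched as follows. This assumes the flag as it eventually shipped with PEP 495 in Python 3.6, where it is called fold (fold=0 is the first occurrence) rather than the *first* name used in this thread. Both check_local_time and the fold-aware ToyEastern2015 class are hypothetical names made up for this illustration.

```python
from datetime import datetime, timedelta, tzinfo


class ToyEastern2015(tzinfo):
    """Hypothetical fold-aware US/Eastern stand-in, valid for 2015 only."""
    GAP = (datetime(2015, 3, 8, 2), datetime(2015, 3, 8, 3))     # skipped
    FOLD = (datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2))  # repeated

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        if self.GAP[0] <= naive < self.GAP[1]:
            in_dst = bool(dt.fold)      # fold=0: rules before the transition
        elif self.FOLD[0] <= naive < self.FOLD[1]:
            in_dst = not dt.fold        # fold=0: first (daylight) occurrence
        else:
            in_dst = self.GAP[1] <= naive < self.FOLD[0]
        return timedelta(hours=1) if in_dst else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)


def check_local_time(dt):
    """Classify an aware datetime as 'ok', 'fold' (ambiguous) or 'gap' (missing)."""
    before = dt.replace(fold=0).utcoffset()
    after = dt.replace(fold=1).utcoffset()
    if before == after:
        return "ok"
    # The offset drops across a fold and rises across a gap.
    return "fold" if before > after else "gap"


tz = ToyEastern2015()
print(check_local_time(datetime(2015, 7, 1, 12, tzinfo=tz)))
print(check_local_time(datetime(2015, 11, 1, 1, 30, tzinfo=tz)))
print(check_local_time(datetime(2015, 3, 8, 2, 30, tzinfo=tz)))
```

A caller who wants strict behavior can raise on anything other than "ok"; a caller who does not care never pays for the check.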
[Akira Li <4kir4.1i@gmail.com>]
... Even if datetime's or replace()'s *first* parameter would be 3-state None|True|False; the internal flag can still be 2-state True|False.
first=None could cause a tzinfo callback (it implies that tzinfo must not be None in this case) that sets *first* to True|False appropriately.
Well, you have your ideas on this, and others have theirs. This isn't going to make progress until the people who want it get together and agree among themselves first on a single, unified, comprehensive proposal. So please take this to the following thread instead (but after reading all of it first ;-) ): PEP-495 - Strict Invalid Time Checking
... There is a $5 wifi button that can be used to track baby data. Python helps at various stages: https://medium.com/@edwardbenson/how-i-hacked-amazon-s-5-wifi-button-to-trac...
Babies can poop at night and during DST transitions too. Sleep-deprived parents should be able to see the tracking data in local time in addition to UTC (doing timezone conversions is the computer's job).
Nobody has said some apps don't need reliable conversions (to the contrary, that's the primary _point_ of PEP 495). Nobody has said some apps don't need timeline arithmetic - although I have said it's poor practice to even _try_ to do timeline arithmetic if an app isn't working in UTC or with naive datetimes. If an app is following best practice (UTC or naive datetimes), then timeline arithmetic is what they _always_ get (it's the same thing as classic arithmetic in those contexts).
On the internet, people may cooperate while being in different time zones i.e., even "business" software might have to work during DST transitions. MMORPGs are probably also not limited to a single time zone.
Ditto.
Non-pytz timezones make mistakes on the order of an hour regularly. It is *three orders of magnitude larger* than a second. It is a different class of errors. The code that can't handle ~1s errors over a short period of time should use time.monotonic() anyway.
Apps that care about leap seconds _should_ be using TAI. Apps that want timeline arithmetic _should_ be using UTC. Unfortunately, people shoot themselves in the feet all the time. Python can't stop that. But it doesn't have to _cater_ to poor practices either.
... dateutil doesn't work during DST transitions but PEP 495 might allow fixing it.
I don't know what "doesn't work" means, precisely. There are certain behaviors that do and don't work as you might hope. For example, even the stupidest possible tzinfo implementation that follows the docs today has no problem converting from UTC to local time across DST transitions - the default .fromutc() was designed to ensure that conversion in _that_ direction mimics the local clock in all cases (including skipping a local hour at DST start, and repeating a local hour at DST end - where "hour" really means "whole number of minutes"). What's impossible now (outside of pytz) is converting ambiguous local times _back_ to UTC in all cases. PEP 495 will repair that - that's its primary point. There's no "might" about it. But, for that to be of use to dateutil users, dateutil will need to change its tzinfo implementation to meet 495's new tzinfo requirements.
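The asymmetry described here can be seen in a small sketch. ToyEastern2015 is a hypothetical pre-495 tzinfo invented for this example (2015 US Eastern rules only, no disambiguation flag), relying on the default fromutc(). Two different UTC instants land on the same local wall time, so the trip back to UTC cannot recover both.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)


class ToyEastern2015(tzinfo):
    """Hypothetical US/Eastern stand-in, valid for 2015 wall times only."""
    DST_START = datetime(2015, 3, 8, 2)   # 2:00 standard time
    DST_END = datetime(2015, 11, 1, 1)    # 1:00 standard time (= 2:00 daylight)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)


tz = ToyEastern2015()
utc_first = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)    # 1:30 EDT
utc_second = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)   # 1:30 EST

# UTC -> local mimics the local clock: both instants read 1:30.
local_first = utc_first.astimezone(tz)
local_second = utc_second.astimezone(tz)

# local -> UTC has to guess; this tzinfo treats 1:30 as standard time,
# so the first instant does not survive the round trip.
back_first = local_first.astimezone(timezone.utc)
back_second = local_second.astimezone(timezone.utc)
print(local_first, local_second, back_first, back_second)
```

With a disambiguation flag on the datetime, local_first and local_second would carry different flag values and both round trips could succeed.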
As I understand, outside of DST transitions if dates are unique valid local times; dateutil uses "same time tomorrow":
(d_with_dateutil_tzinfo + DAY == d.tzinfo.localize(d.replace(tzinfo=None) + DAY, is_dst=None))
while pytz uses "+24 hours":
dt_add(d_with_dateutil_tzinfo, DAY) == d + DAY
where dt_add() is defined below. The equality works but (d + DAY) may have a wrong tzinfo object if the arithmetic crosses DST boundaries (but it has correct timestamp/utc time anyway). d.tzinfo.normalize(d + DAY) should be used to get the correct tzinfo e.g. for displaying the result.
Both types of operations should be supported.
If you're saying that classic and timeline arithmetic both have legitimate uses, sure. Nobody has said otherwise. If you're trying to say more than just that, sorry, I missed the point. As to "supported", there are _degrees_ of support, and Python very obviously favors classic arithmetic. That can't change. I personally have no interest in providing more support for timeline arithmetic _beyond_ getting PEP 495 implemented so that error-free timeline arithmetic _can_ be implemented easily. At that point, my interest ends. I believe I've already been very clear that it's fine by me if the only further support Python supplies is to add some one-line Python functions to the docs implementing the 3 flavors of timeline arithmetic (datetime-datetime and datetime +/- timedelta) - but near the end of a new section explaining that working with UTC datetimes instead is far better practice for timeline arithmetic use cases.
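One way the three flavors might look, written as a round trip through UTC, is sketched below. The names tl_add, tl_sub and tl_diff are hypothetical, as is the ToyEastern2015 tzinfo (2015 US Eastern rules only); nothing here is taken from the Python docs or from PEP 495, and the results are only reliable away from folds and gaps.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)


class ToyEastern2015(tzinfo):
    """Hypothetical US/Eastern stand-in, valid for 2015 wall times only."""
    DST_START = datetime(2015, 3, 8, 2)   # 2:00 standard time
    DST_END = datetime(2015, 11, 1, 1)    # 1:00 standard time (= 2:00 daylight)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)


def tl_add(dt, td):
    """datetime + timedelta, counting elapsed time."""
    return (dt.astimezone(timezone.utc) + td).astimezone(dt.tzinfo)


def tl_sub(dt, td):
    """datetime - timedelta, counting elapsed time."""
    return tl_add(dt, -td)


def tl_diff(a, b):
    """datetime - datetime, counting elapsed time."""
    return a.astimezone(timezone.utc) - b.astimezone(timezone.utc)


tz = ToyEastern2015()
before = datetime(2015, 10, 31, 12, tzinfo=tz)   # noon EDT
after = datetime(2015, 11, 1, 12, tzinfo=tz)     # noon EST

print(after - before)           # classic difference
print(tl_diff(after, before))   # timeline difference
print(tl_sub(after, timedelta(hours=25)))
```

Doing the arithmetic on UTC values and converting only for display gives the same answers with less machinery, which is the practice recommended above.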
... pytz is widely used. datetime objects with dateutil and pytz tzinfo behave differently as shown above.
There are no non-fixed tzinfos in stdlib. dst-tzinfo in stdlib could adopt either pytz or dateutil behavior.
I don't know whether Stuart mucked with arithmetic because he believed that was necessary in order to get conversions to work correctly (if so, he was mistaken), or whether the effects on arithmetic were just a _consequence_ of using fixed-offset classes all the time (that's "a natural" outcome of using only fixed-offset classes - it would take extra effort to _stop_ it - classic and timeline arithmetic are the same thing in any eternally-fixed-offset timezone). He said, in an earlier message, that conversion was his primary concern. But maybe we're all using the same words with different meanings. In any case, conversions are my - and PEP 495's - only real concern. Because timeline arithmetic is inappropriate for datetime's "naive time" model, is incompatible with what Python has been doing for a dozen years already, is far slower than classic arithmetic, and because people who need timeline arithmetic "shouldn't be" using non-UTC aware-datetimes at all for arithmetic, I don't see any chance of pytz's behaviors being adopted in all respects by Python. Nor dateutil's. That one can't always do conversions correctly today. After PEP 495 is implemented, whoever steps up to supply a wrapping of the Olson database with 495-compliant tzinfos will probably get rubber-stamp approval to fold it into the core. I'd also like to see dateutil's wrappings of timezones obtained from VTIMEZONE files, POSIX-TZ strings, and the Microsoft registry folded in. Not all apps _can_ use zoneinfo. zoneinfo is by far the most important, though. I prioritize. That's something mailing lists are incapable of, which is why no mailing list has ever released any software ;-)
If dateutil can be fixed to work correctly using the disambiguation flag then its behavior is preferable because it eliminates localize, normalize calls
Then you get classic arithmetic. Which is not only fine by me, I believe it's the only realistic outcome for the reasons explained just above.
except localize() could be useful in __new__ if first parameter is None to raise an exception for invalid input otherwise it is equivalent to the default *first* value.
That one is for the "PEP-495 - Strict Invalid Time Checking" thread.
... My hope was that 495 alone would at least spare pytz's users from needing to do a `.normalize()` dance after `.astimezone()` anymore. Although I'm not clear on why it's needed even now.
As far as I know, normalize() is not necessary after astimezone() even now https://answers.launchpad.net/pytz/+question/249229
That agrees with my best guess, but my knowledge of pytz is shallow. If it's correct that the .normalize() dance isn't needed here, it would be nice if Stuart plainly said so on that page, and - of course - changed the docs to stop saying it _is_ required. And then it's also the case that I don't see any benefit to pytz from PEP 495 alone. :-(
For pytz users, being able to write a function to tell if the data you were given is broken is a step backwards. When constructing a datetime instance with pytz, users have the choice of raising exceptions or having pytz normalize the input. They are never given broken data (by their definition), and there is no need to weed it out.
Assuming they follow all "the rules", yes? For example, if they forget to use .localize(), etc, it seems like anything could happen. What if they use .replace()? .combine()? Unpickle a datetime representing a missing time? Etc. I don't see that pytz has anything magical to check datetimes created by those.
If people forget localize() then tzinfo is not attached and an exception is raised later. It is like mixing bytes and Unicode: if you forget decode() then an exception is raised later.
AFAICT, pytz can't enforce anything. You don't _need_ to call localize() to get _a_ datetime. From scanning message boards, e.g., I see it's a common mistake for new pytz users to use datetime.datetime(..., tzinfo=...) directly, not using localize() at all, despite the very clear instructions in the docs that they must _not_ do that. That can be a real problem for modules fighting basic design warts: newcomers are lost at first, and even experts can have trouble inter-operating with code _outside_ what typically becomes an increasingly self-contained world (e.g., Isaac cheerfully complained earlier about his pains trying to get pytz and dateutil to work together).
replace() is just a shortcut for a constructor.
Yet pytz does nothing to check .replace() results, right?
combine() returns naive objects.
Not always true. Plain `time` objects can have a tzinfo of their own. Pass one of those to .combine(), and you get an aware datetime. And it's generally impossible to check a `time` on its own for fold/gap - you generally need a date too to have any chance of determining that. Anyway, that - and the rest below - belong in the "PEP-495 - Strict Invalid Time Checking" thread. I'm outta here ;-)
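The point about .combine() is easy to confirm with the standard library alone. In this small sketch the fixed-offset zone and the chosen date are arbitrary values picked for illustration.

```python
from datetime import date, datetime, time, timedelta, timezone

est = timezone(timedelta(hours=-5), "EST")   # arbitrary fixed offset

naive_time = time(2, 30)
aware_time = time(2, 30, tzinfo=est)
day = date(2015, 3, 8)

naive_dt = datetime.combine(day, naive_time)   # stays naive
aware_dt = datetime.combine(day, aware_time)   # inherits the time's tzinfo

print(naive_dt.tzinfo, aware_dt.tzinfo)
```

The aware `time` carries no date, so nothing about it alone says whether 2:30 will fall in a gap once a date and a DST-capable tzinfo are attached.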
...
participants (2): Akira Li, Tim Peters