[Datetime-SIG] PEP-431/495

Wed Aug 26 10:16:42 CEST 2015

[Tim]
>> ...
>> Later you seem to say you'd prefer a 3-state flag instead, so not sure
>> you really mean "boolean" here.

[Stuart Bishop <stuart at stuartbishop.net>]
> I write Python and SQL for a living. Booleans are 3 state to me ;)

Got it!  Python is sooooooo behind the times :-)

> In this case, I'm not fussed if the datetime instance has a 2 state or
> 3 state flag. This is different to the various constructors which I
> think need a 3 state flag in their arguments. True, False, None.

As things seem to have progressed later, mapping pytz's explicit time
checking into a more magical scheme sprayed all over the datetime
internals is not straightforward.  So, as I concluded elsewhere, that
may or may not be done someday, but it's out of scope for PEP 495.
I'm a fan of making progress ("now is better than never", where the
latter was PEP 431's fate waiting for perfection on all counts).

> ...
> Grump. I always interpreted that documentation to mean that timezone
> conversions were *my* problem as the author of the tzinfo
> implementation.

Conversions, yes; arithmetic, no.  The tzinfo methods authors _needed_
to implement were .tzname(), .dst(), and .utcoffset().  Optionally,
.fromutc().  None of those are about how arithmetic works.  Your
particular implementation needed to conflate the two somewhat, since
you avoided "hybrid" tzinfo classes in favor of always using
fixed-offset classes, which in turn meant arithmetic routinely left
you with "a wrong" class for the then-current datetime value.  Which
was in turn repaired by needing to call .normalize() all over the
place, to replace the now-possibly-wrong tzinfo.  As I understand it.
If so, it's fair to say that was not an anticipated kind of
implementation ;-)
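
For illustration, here is a minimal sketch of a "hybrid" tzinfo that
implements just those three methods and leaves .fromutc() to the
default.  The class name ToyEastern and its hard-coded 2004 US Eastern
rules are assumptions made up for this example; it is not pytz's code
nor the class from the datetime docs.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)

class ToyEastern(tzinfo):
    """Hypothetical hybrid tzinfo: US Eastern rules, valid for 2004 only."""

    # Naive local wall-clock bounds of daylight time in 2004.
    DST_START = datetime(2004, 4, 4, 2)
    DST_END = datetime(2004, 10, 31, 1)   # wall times from 1am on count as standard

    def dst(self, dt):
        # Compare the naive wall time against the hard-coded bounds.
        naive = dt.replace(tzinfo=None)
        return HOUR if self.DST_START <= naive < self.DST_END else timedelta(0)

    def utcoffset(self, dt):
        # Standard offset (UTC-5) plus whatever daylight adjustment applies.
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

eastern = ToyEastern()
summer = datetime(2004, 7, 1, 12, tzinfo=eastern)
winter = datetime(2004, 12, 1, 12, tzinfo=eastern)
print(summer, summer.tzname())
print(winter, winter.tzname())
```

One tzinfo object serves both halves of the year here, which is the
sense of "hybrid" above:  the offset is computed from the datetime it
is asked about.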

> I thought it was a documented problem to be fixed if/when Python
> ever provided more complex tzinfo implementations, and
> one of the reasons it never did provide such implementations in the
> first place.

The inability to do conversion correctly in all cases was documented.
It annoyed me a lot, because there was no expectation that it would
_ever_ "be fixed".  As I've often said, I considered that to be
datetime's biggest flaw.  But, "now is better than never" ;-) , and we
ran out of time to do more - datetime had already met all its original
design goals for some time, and our mutual employer at the time was
understandably annoyed at continuing to pay for more development.

> Classic behaviour as you describe it is a bug.

Believe me, you won't get anywhere with that approach.

- Classic arithmetic is the only kind that makes good sense in the
"naive time" model, which _is_ datetime's model.

- Timeline arithmetic is the only kind that makes good sense in the
civil-time-based-on-POSIX-approximation-to-UTC model.  Which is
overwhelmingly the most common model among computer types (although
fully understood by relatively few).

- Timeline arithmetic _including_ accounting for leap seconds too is
the only kind that makes good sense for how civil time (based on
real-world UTC) has actually been defined for a few decades now.

- It's all but certain that civil time will be redefined yet again
someday, in which case only yet another kind of arithmetic will make
good sense for that.

So "bug" or "feature" depends on which model you have in mind.
Absolute statements make no sense.  Each kind of arithmetic is "a
feature" for the model it intends to serve, and "a bug" for some
purposes with respect to all other models.

You can legitimately complain that you hate the naive time model, but
you can't complain that Python's datetime arithmetic doesn't match
datetime's model.

> It sounds ok when you state it as 'add one day to today and you get
> the same time tomorrow'.

That's always rigorously so in the naive time model, and regardless of
whether you're talking about 1 day, 24 hours, 1440 minutes, ...
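
A small sketch of that claim using a naive datetime (the particular
date is just an arbitrary choice for the example):

```python
from datetime import datetime, timedelta

start = datetime(2004, 10, 30, 15)     # naive: no tzinfo attached
tomorrow = start + timedelta(days=1)

# The three spellings are one and the same duration to timedelta.
assert timedelta(days=1) == timedelta(hours=24) == timedelta(minutes=1440)

# So the wall clock reads the same time on the next calendar day.
assert tomorrow == datetime(2004, 10, 31, 15)
print(tomorrow)
```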

> It does not sound ok when you state it as 'add one second to now and
> you will normally get now + 1 second, but sometimes you will get an
> instant further in the future, and sometimes you will get an instant
> in the past'.

Then you have a _different_ model in mind, and you need a different
arithmetic for that.  Now picture, say, a scientist insisting that the
arithmetic _you_ want is WRONG ;-) because it sometimes tells them
that, e.g., two moments in time are 1 second apart when in _reality_
they were exactly 2 SI seconds apart (due to a leap second inserted
between).

The two of you simply have different models in mind.  Neither is
right, neither is wrong, both want the arithmetic appropriate for the
model they favor, and each will be sorely disappointed if they use an
arithmetic appropriate for the other's model.

That said, I would have preferred it if Python's datetime had used
classic arithmetic only for naive datetimes.  I feared it might be
endlessly confusing if an aware datetime used classic arithmetic too.
I'm not sure about "endlessly" now, but it has come up more than once
;-)  Far too late to change now, though.

> I dispute that this is default behaviour that can't be changed.

The default arithmetic can't be changed.  That was settled long ago -
there wasn't ever the slightest possibility of changing the default
arithmetic.  So, while anyone is free to dispute it, there's not much
point to that ;-)  If someone wants different _default_ arithmetic,
they'll need to write a new datetime-ish module.

> The different arithmetic only matters when you have a dst-aware aware
> datetime in play, and Python has never provided any apart perhaps from
> your original reference implementation (which stopped working in
> 2006).

You're talking about the toy classes in the datetime docs?  Looks like
they've been updated all along to match changes in US daylight rules,
although I would have kept only _the_ most recent rules as time went
by.  The original docs only intended to show how a tzinfo might be
implemented, picking the then-current US daylight rules.  Not really
"a reference", a pedagogical example, and simplified to avoid burying
the essential concepts under a mountain of arbitrary details.

> pytz, however, has always provided timeline arithmetic.

Yes, and it's a marvel!  I never expected "the problem" could be
wholly solved without changing Python internals.  It's a mighty hack
:-)

> I believe this is the most widely deployed way of obtaining dst-aware
> datetime instances,

Me too.

> and this is the most widely expected behaviour.

Eh - there are many ways to get timeline arithmetic that "almost
always" work.  The angst over the tiny number of fold/gap cases seems
way overblown to me, but so it goes.  After PEP 495, it will be easy
to get always-correct timeline arithmetic for anyone who wants it,
provided they can get 495-conforming tzinfo objects (more below).

> If you use pytz tzinfo instances, adding 1 second always adds one second

But only in the _model_ you have in mind:  real-life clocks showing
real-life civil time suffer from leap seconds too.  You can laugh that
off in _your_ apps (and I can too ;-) ), but for other apps it's dead
serious.

> and adding 1 day always adds 24 hours.

That's also true of classic arithmetic.  The meanings of "day" and "24
hours" also depend on the model in use.

> While calendaring style arithmetic is useful and a valid use case,

In naive time, the distinction you want to make here doesn't really
exist:  timedelta supports keyword arguments for all the durations
that have the same meanings as durations and as "calendar periods"
_in_ naive time.

> it is useless if the only relative type is the day.

All common units <= a week and >= a microsecond are supported by
timedelta, and all work perfectly fine in the naive time model.  The
extent to which they work for other purposes varies by model and
purpose.

> You also need months and years and periodic things like 'first
> sunday every month'. This is too complex to inflict its API on
> people by default.

Agreed!  timedelta supplies only units for which there is no possible
argument about behavior _in_ naive time, and left it at that.  But do
note that things like "first Sunday of the month" are quite easy to
implement _building_ on those:  you just find the 3rd Sunday of the
month then subtract timedelta(weeks=2) ;-)
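
Joking aside, a less roundabout sketch of "first Sunday of the month"
built on date and timedelta alone.  The helper name first_weekday is
hypothetical, invented for this example:

```python
from datetime import date, timedelta

def first_weekday(year, month, weekday):
    """Return the first date in the month that falls on `weekday`.

    `weekday` follows date.weekday(): Monday is 0, Sunday is 6.
    """
    first = date(year, month, 1)
    # Number of days to step forward from the 1st to reach `weekday`.
    offset = (weekday - first.weekday()) % 7
    return first + timedelta(days=offset)

SUNDAY = 6
print(first_weekday(2015, 8, SUNDAY))
```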

> But pulling in dateutil's relative time helpers could be nice.

If there's a groundswell of demand for adding "calendar operations" to
Python, I'd be in favor of inviting Gustavo to fold dateutil's
calendar operations into the core.

> Do systems that rely on classic behavior actually exist?

Of course.  A more-or-less subtle example appears later.  But we
already mentioned dead-obvious uses:  things like "same time tomorrow"
and "same time two weeks from now" are common as mud, and classic
arithmetic implements them fine.  So do functions building on those
primitives to implement more sophisticated calendar operations.  You
might complain that naive time "same time tomorrow" makes no sense if
someone is starting from 24 hours before what turns out to be a gap
due to DST starting, but few in the real world schedule things at such
times (e.g., DST transitions never occur during normal "business work
hours";  if, e.g., some app postpones a business meeting a week, it's
not credible that they'll ever end up in a gap by adding
timedelta(weeks=1) - unless they're trying to account for leap seconds
too, and the Earth's rotation speeds up "a lot", and "same time next
week" ends up exactly in the missing second).

> It requires someone to have explicitly chosen to use daylight savings
> capable timezones, without using pytz, while at the same time relying on
> classic's surprising arithmetic. Maybe systems using dateutil without
> using dateutil's implementation of datetime arithmetic.

? dateutil doesn't implement arithmetic that I know of, apart from
"relative deltas".  It inherits Python's classic arithmetic for
datetime - datetime, and datetime +/- timedelta, AFAICT.

> I believe that there are many more systems out there that are broken by this
> behaviour than are relying on this behaviour.

I don't know.  I have little code of my own that needs timezones at
all.  In such code as I have, classic arithmetic works fine almost all
the time, because things like "same time tomorrow" are overwhelmingly
the only kinds of arithmetic I want.  In the very few cases I give a
rip about POSIX-approximation-to-real-world durations, I'm either
using naive datetimes or tzinfo=timezone.utc, or I use one-liner
functions like this one ("like this" because they're so easy to write
when needed I never bothered to stick 'em in a module for reuse
later):

    def dt_add(dt, td):
        # Classic "+" on the UTC-shifted value, then convert back to local.
        return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))

There you go:  "timeline" datetime + timedelta arithmetic about as
efficiently as possible in pure Python.  Note that _if_ the default
changed to timeline arithmetic, this code would no longer work.  The
"+" there requires classic arithmetic to get the right result.  Change
the default, this code would break too.  I find it hard to imagine I'm
the only person in the world who has code similarly taking advantage
of what Python actually does.

Example:

    from datetime import datetime, timedelta
    from pytz.reference import Eastern

    turkey_in = datetime(2004, 10, 30, 15, tzinfo=Eastern)
    turkey_out = dt_add(turkey_in, timedelta(days=1))
    print(turkey_in)
    print(turkey_out)

Output:

    2004-10-30 15:00:00-04:00
    2004-10-31 14:00:00-05:00

There my end-of-DST-party giant turkey needs to stay in the smoker for
exactly 24 hours.  That's "1 day" to me, because I think in naive
time.  The function effectively converts to UTC, adds 24 hours, then
converts back, but more efficiently than bothering with .astimezone()
in either direction.  It correctly accounts for the fact that the end of DST
"added an hour", so while I put the turkey in at 3pm Saturday I need
to take it out at 2pm Sunday.
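
To make the "converts to UTC, adds 24 hours, then converts back"
reading concrete, here is a sketch that sets the one-liner beside a
long-hand version.  ToyEastern is a hypothetical stand-in for a classic
hybrid tzinfo (2004 rules only), used so the example runs without pytz;
dt_add_longhand is likewise a name invented here.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyEastern(tzinfo):
    """Hypothetical classic hybrid tzinfo, US Eastern, 2004 rules only."""
    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        in_dst = datetime(2004, 4, 4, 2) <= naive < datetime(2004, 10, 31, 1)
        return timedelta(hours=1 if in_dst else 0)
    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)
    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

def dt_add(dt, td):
    # The one-liner from the message.
    return dt.tzinfo.fromutc(dt + (td - dt.utcoffset()))

def dt_add_longhand(dt, td):
    # Same idea in explicit steps: to UTC, add, back to the original zone.
    as_utc = dt.astimezone(timezone.utc)
    return (as_utc + td).astimezone(dt.tzinfo)

eastern = ToyEastern()
turkey_in = datetime(2004, 10, 30, 15, tzinfo=eastern)
day = timedelta(days=1)

classic = turkey_in + day            # same wall-clock time tomorrow
timeline = dt_add(turkey_in, day)    # 24 elapsed hours later
print(classic)
print(timeline)
```

Both helpers rely on "+" being classic:  the shifted value handed to
.fromutc() is only meaningful if "+" moves the wall-clock fields and
does nothing else.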

Note:  my dt_add 1-liner may fail in cases starting or landing on a
"problem time" (fold/gap).  I've never cared, because DST transitions
are intentionally scheduled to occur "wee hours on a weekend", i.e.
when few people are both awake and sober enough _to_ care.  But, after
495 tzinfos are available, the dt_add 1-liner will always work
correctly.  That this implementation of timeline arithmetic _can_
screw up now has nothing to do with its code, it's inherited from the
inability of pure conversion to always work right now.

> I think this is a bug worth fixing rather than entrenching, before
> adding any dst aware tzinfo implementations to stdlib (including
> 'local').

datetime was released a dozen years ago.  There's nothing it does that
wasn't already thoroughly entrenched a decade ago.

> ...
> However... this also means the new flag on the datetime instances is
> largely irrelevant to pytz.  pytz' API will need to remain the same.

My hope was that 495 alone would at least spare pytz's users from
needing to do a `.normalize()` dance after `.astimezone()` anymore.
Although I'm not clear on why it's needed even now.

> Adding a timedelta to a datetime will give you a datetime in exactly
> the same offset() and dst() as you started with (because pytz gives
> you timeline arithmetic, where adding 24 hours actually adds 24
> hours), and you will need to fix it using the normalize method after
> the fact. The is_dst bit is effectively stored on the tzinfo instance
> currently in play, and having another copy on the datetime instance
> unnecessary.

Yes, 495 intends to repair conversion in all cases; it has no intent
to do anything about arithmetic.

A different PEP may address arithmetic later (well, PEP 500 already
did, but it's been rejected).  I won't be pushing for it, though.  As
above, after 495 solid timeline arithmetic is very easy to get via
1-line Python functions.  Which I personally prefer to use:  because
I _want_ timeline arithmetic so rarely, using a named function instead
makes it very clear that I'm doing something unusual (for me).  Other
people have different itches to scratch.

But to be kinda brutal about it, _any_ catering to timeline arithmetic
is misguided:  it's enabling poor practices.  People who need timeline
arithmetic should really be working in UTC, where classic and timeline
arithmetic are the same thing, and classic arithmetic runs much
faster.  My only use for it in a non-UTC datetime is calculating when
to take the turkey out of the smoker one day per year ;-)

> The new argument to the datetime constructors may be useful, if it
> accepts tri-state. If the is_dst/first flag accepts True, False or
> None, then pytz may be able to deprecate the localize method. If a
> user calls localize(is_dst=None), AmbiguousTimeError and
> NonExistentTimeError exceptions may be raised, but by default
> exceptions are not raised. I would also need the opportunity to swap
> in the correct fixed offset tzinfo instance for the given datetime.
> (example below)
>
> Losing the localize method will be a huge win for pytz, as it is ugly
> and causes great confusion and many identical bug reports. The other
> problem, the normalize method, is less important - if you neglect to
> call normalize you still get the correct instant, but it may be
> reported in the incorrect timezone period (EST instead of EDT or vice
> versa).

There's a lot more about this in the recent "PEP-495 - Strict Invalid
Time Checking" thread.

> ....
> I also need to continue to support timeline arithmetic. This requires
> me not having a single tzinfo instance, but swapping in the correct
> fixed offset tzinfo instance at the right time. Currently, this uses
> the awful localize and normalize methods. Ideally, postPEP:
>
> >>> eastern = pytz.timezone('US/Eastern')
> >>> dt = datetime(2004, 4, 3, 2, 0, 0, tzinfo=eastern)
> >>> dt2 = dt + timedelta(days=1)
> >>> eastern is dt.tzinfo
> False
> >>> dt.tzinfo is dt2.tzinfo
> False

Nothing in PEP 495 changes anything about arithmetic behavior.  In
particular, dt's tzinfo will be copied to dt2 by "+", just as it is
now.  _Anything_ else would break the very strict backward
compatibility constraints Guido established for this PEP.
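
A quick sketch of that copying behavior.  A stdlib fixed-offset
timezone stands in for the tzinfo here, purely as an assumption to keep
the example self-contained:

```python
from datetime import datetime, timedelta, timezone

est = timezone(timedelta(hours=-5), "EST")   # fixed-offset stand-in
dt = datetime(2004, 4, 3, 2, tzinfo=est)
dt2 = dt + timedelta(days=1)

# "+" hands the very same tzinfo object to the result.
print(dt2.tzinfo is dt.tzinfo)
```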

> ....
> If I can do this, there is no reason that pytz could not also support
> 'classic' style, but I certainly wouldn't want to encourage its use as
> my rant above might indicate ;) If I write documentation, it may
> require some editing, localizing from en_AU to something a little more
> polite.

I expect pytz users who want classic arithmetic can get it already
simply by not using pytz ;-)

> ...
> For pytz users, being able to write a function to tell if the data you
> were given is broken is a step backwards. When constructing a datetime
> instance with pytz, users have the choice of raising exceptions or
> having pytz normalize the input. They are never given broken data (by
> their definition), and there is no need to weed it out.

Assuming they follow all "the rules", yes?  For example, if they
forget to use .localize(), etc, it seems like anything could happen.
What if they use .replace()?  .combine()?  Unpickle a datetime
representing a missing time?  Etc.  I don't see that pytz has anything
magical to check datetimes created by those.

> ...
> I think all functions that can create datetime instances will need the
> new optional flag and the flag should be tri-state, defaulting to not
> whine.

See the "PEP-495 - Strict Invalid Time Checking" thread for more.
There seems to be increasing "feature creep" here.  Rewriting vast
swaths of datetime internals to cater to this is at best impractical,
especially compared to supplying a "check this datetime" function
users who care can call when they care.  Nevertheless, it's a suitable
subject for a different PEP.  I don't want to bog 495 down with it.
If it had _stopped_ with asking for an optional check in the datetime
constructor, it may have been implemented already ;-)
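
As a rough idea of what such a "check this datetime" function could
look like, here is a sketch that flags times falling in a gap by
round-tripping through UTC.  The names is_missing and ToyEastern are
hypothetical, the tzinfo is a toy with 2004 rules only, and the check
only detects gaps; nothing here is a proposed API.

```python
from datetime import datetime, timedelta, timezone, tzinfo

class ToyEastern(tzinfo):
    """Hypothetical classic hybrid tzinfo, US Eastern, 2004 rules only."""
    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        in_dst = datetime(2004, 4, 4, 2) <= naive < datetime(2004, 10, 31, 1)
        return timedelta(hours=1 if in_dst else 0)
    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)
    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

def is_missing(dt):
    """Return True if dt's wall time does not survive a UTC round trip."""
    roundtrip = dt.astimezone(timezone.utc).astimezone(dt.tzinfo)
    return roundtrip.replace(tzinfo=None) != dt.replace(tzinfo=None)

eastern = ToyEastern()
in_gap = datetime(2004, 4, 4, 2, 30, tzinfo=eastern)   # clocks skipped 2:00-2:59
ordinary = datetime(2004, 7, 1, 12, 0, tzinfo=eastern)
print(is_missing(in_gap))
print(is_missing(ordinary))
```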

> ...
> The important bit here for pytz is that tzinfo.fromutc() may return a
> datetime with a different tzinfo instance.

Sorry, didn't follow that.  Of course you can write your .fromutc() to
return anything you want.

> Also, to drop pytz' localize method I need something like
> 'tzinfo.normalize(dt)', where I have the opportunity to replace
> the tzinfo the user provided with the one with the correct
> offset/dst info.

If you're proposing a richer tzinfo interface, that's certainly out of
scope for PEP 495.  But I don't expect there's any possible way that
PEP 495 on its own can replace all of pytz's uses for `normalize()`
regardless.

>>> - My argument in favour of 'is_dst' over 'first' is that this is what
>>> we have in the data we are trying to load.  You commonly have
>>> a timestamp with a timezone abbreviation and/or offset. This can
>>> easily be converted to an is_dst flag.

>> You mean by using platform C library functions (albeit perhaps wrapped
>> by Python)?

I really missed an answer to that ;-)

>>> To convert it to a 'first' flag, we need to first parse the datetime,

>> I'm unclear on this.  To get a datetime _at all_ the timestamp has to
>> be converted to calendar notation (year, month, ...).  Which is what
>> I'm guessing "parse" means here.  That much has to be done in any
>> case.

> My example is weak. I'm thinking about parsing a string like:
>
> 2004-10-31 01:15 EST-05:00
>
> Even if you know this is US/Eastern and not Estonia, you still need to
> know that for dates in October EDT is first and EST is not first, and
> for dates in April EST is first and EDT is not first

In April all times have first=True (or fold=0 in the latest spelling).
first=False (fold=1) only occurs for the later times in a fold (during
the second of the repeated hours at the end of EDT).

> and you need to include a wide enough fuzz factor that future changes
> to the DST rules won't break your parser.

What does this have to do with datetime?  So far you haven't mentioned
any datetime - or pytz - operations.

> But I guess a general purpose parser that cares would construct
> instances 3 days before and 3 days later and use whichever tzinfo
> had the correct offset. Or just use a fixed offset tzinfo.

Sorry, I'm still not grasping what "the problem" is here.  In pytz,
you would presumably create a datetime with an Olson-derived
US/Eastern timezone.  That would internally search for where
2004-10-31 06:15 (the UTC spelling of your example) lands in the list
of transitions, and deduce more-or-less directly that the original
time is the later of the times in a fold.  If you're _not_ using datetime
or pytz at all, then you have no reason to _want_ to compute
first/fold to begin with, right?
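
A sketch of that kind of lookup, assuming a tiny hard-coded transition
table for 2004.  TRANSITIONS, INITIAL_OFFSET and local_and_fold are
names invented for this example; this is not pytz's actual code or
data layout.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical 2004 table for US/Eastern:
# (UTC instant of the transition, UTC offset in effect from then on).
TRANSITIONS = [
    (datetime(2004, 4, 4, 7), timedelta(hours=-4)),    # EDT begins
    (datetime(2004, 10, 31, 6), timedelta(hours=-5)),  # EST resumes
]
INITIAL_OFFSET = timedelta(hours=-5)   # EST before the first transition

def local_and_fold(utc):
    """Map a naive UTC datetime to (naive local time, fold)."""
    keys = [when for when, _ in TRANSITIONS]
    i = bisect_right(keys, utc)
    if i == 0:
        return utc + INITIAL_OFFSET, 0
    when, new = TRANSITIONS[i - 1]
    old = TRANSITIONS[i - 2][1] if i > 1 else INITIAL_OFFSET
    # If the clocks moved back, wall times in the first (old - new)
    # after the transition repeat ones already seen: the later reading.
    fold = 1 if old > new and utc - when < old - new else 0
    return utc + new, fold

print(local_and_fold(datetime(2004, 10, 31, 6, 15)))   # the 01:15 EST example
print(local_and_fold(datetime(2004, 10, 31, 5, 15)))   # 01:15 EDT, an hour earlier
```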

> ...
> I despair at the bug reports, questions and general confusion that
> will occur if dst-aware tzinfo implementations are added to stdlib. At
> the moment, it is an obscure wart despite its age. It will become an
> in your face wart as soon as a tzlocal implementation is landed, and a
> wart people will be angry about because they won't realize it is there
> until their production system loses an hour's worth of orders because
> their Python app spat out an hour's worth of invalid timestamps right
> around Halloween sale time. But I'm drifting off into hyperbole.

But entertaining hyperbole, so it's appreciated :-)

After 495 is implemented, huge swaths of confusing docs can be moved into
an appendix, covering all the rules and reasons for why ancient tzinfo
implementations didn't allow for correct conversions in all cases.
And that will make room for huge swaths of new confusing docs.

But there's every reason to be optimistic:  even someone as old and
in-the-way as me doesn't find any of this particularly confusing ;-)