[Datetime-SIG] PEP-431/495

Fri Aug 28 08:01:06 CEST 2015

[Stuart Bishop <stuart at stuartbishop.net>]
> ... [on timeline arithmetic] ...
> I'm wondering if it is worth formalizing this (post-PEP-495, or maybe
> some choice wording changes made in the docs). Would it work if we
> introduced a new type, datetimetz? We would have a time, with a tzinfo
> because it might be useful later, a naive time, with a tzinfo because
> it is useful for rendering and conversions, and a datetimetz with all
> the complexities and slowdowns of timeline arithmetic. While not
> changing the behaviour of datetime at all, we could get cats and dogs
> living together by just clarifying what it actually is.

There was a lot of discussion of this before you arrived here, and
even a PEP (500).

At least Guido, Alex and I agreed it would be better for the tzinfo
object to decide which kind of arithmetic to use.  For example, if
you're right that billions (nay, trillions!) of programmers will
eventually suffer irreparable emotional harm from learning how classic
arithmetic works, they'll want to convert their code immediately,
before their innocent children suffer clinical depression too.
Because datetimes are typically created all over the place, but
programs typically have only a few places where a tzinfo is obtained
from some factory functions, it should be much easier to just change
the latter call sites.  So, e.g., get one tzinfo that says "timeline
arithmetic!" in some way, and _all_ datetimes using it obey God's Way
To Do It.

The first question then is "how does a tzinfo spell that?".

PEP 500 proposed adding optional new magic methods to tzinfos, so they
could implement whatever damn fool arithmetic they liked.  datetime
internals would only change to see whether a tzinfo supplied
such-&-such a method, and delegate arithmetic to it if so.

1. For timeline arithmetic, a tzinfo subclass could supply methods for
the 3 kinds of arithmetic (datetime - datetime, and datetime +/-
timedelta), with bodies akin to the simple one-liner I showed before
for datetime + timedelta.

2. People who wanted leap seconds (to account for real-world durations
between two civil times) could similarly supply _that_, via even
slower arithmetic.

3. And, e.g., people who wanted to view timedeltas as representing
durations in Mars seconds could convert to Earth seconds under the
covers.  That's Alex's primary use case.

So, quite general, and little impact on the core.  Guido rejected it ;-)
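
A minimal sketch of the delegation idea, assuming Python 3.6 or later
(for the `fold` attribute).  Everything named here is hypothetical:
`ToyEastern` hard-codes the 2015 US Eastern rules, `timeline_add` is an
invented hook name (not the spelling PEP 500 used), and the free
function `add()` stands in for the check the datetime internals would
have made.

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)


class ToyEastern(tzinfo):
    """Hypothetical zone: US Eastern rules hard-coded for 2015 only."""

    DST_START = datetime(2015, 3, 8, 2)   # naive local wall time
    FOLD_START = datetime(2015, 11, 1, 1)  # start of the repeated hour

    def dst(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < self.DST_START or wall >= self.FOLD_START + HOUR:
            return ZERO
        if wall >= self.FOLD_START and dt.fold:
            return ZERO  # second pass through 01:00-01:59
        return HOUR

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"

    # Hypothetical hook: the tzinfo asks for timeline arithmetic by
    # supplying this method.  The inherited fromutc() never sets fold,
    # so results inside the repeated hour are not reliable in this toy.
    def timeline_add(self, dt, delta):
        return (dt.astimezone(timezone.utc) + delta).astimezone(self)


def add(dt, delta):
    """Stand-in for datetime.__add__: delegate if the tzinfo has a hook."""
    hook = getattr(dt.tzinfo, "timeline_add", None)
    return dt + delta if hook is None else hook(dt, delta)


start = datetime(2015, 10, 31, 12, tzinfo=ToyEastern())
classic = start + timedelta(hours=24)       # 2015-11-01 12:00 EST
timeline = add(start, timedelta(hours=24))  # 2015-11-01 11:00 EST
```

Only the place that creates the tzinfo has to change to switch a whole
program between the two kinds of arithmetic, which is the point made
above about factory call sites.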

The other idea was building timeline arithmetic into the core datetime
implementation, and using it if and only if a tzinfo had a magic new
attribute, or inherited from a magic new marker class.  Not
generalizable beyond _just_ that case, heavier impact on the core, and
so far nobody has cared enough to write a PEP.

The second question is whether _anything_ should be done in this
direction.  I was +0.83 on PEP 500 at first, but -0.51 on anything
now.  Alex can move to Mars if he loves Mars time so much, while I
don't really want Python to enable poor practice in the #1 and #2
cases.  UTC is perfectly adequate for those who need timeline
arithmetic, and that was the _intent_ from the start (although I don't
recall the docs saying so) - and using UTC for this purpose is also
universally recognized as best practice.  If someone is determined to
be foolish, fine, let 'em use an explicit function.

> ...
> If our underlying platforms that we needed to work with supported it,
> I'd probably be in favour of leap seconds. I doubt that would ever
> happen - there are more palatable workarounds.

People who need it really need it - but they should be working in TAI.
In Python, if they work in UTC - or even in naive datetime - it's
quite possible to write leap-second-aware functions to do what they
want.  Intriguingly, TAI is nearly identical to Python's "naive time".
So stick that in your pipe and smoke it:  the people responsible for
building the most sophisticated clocks on Earth _live_ in naive time.
It's the most sophisticated notion of time yet known ;-)

OTOH, for people who don't need it, accounting for leap seconds would
be a mistake:  best I can tell, every programming language on the
planet with any kind of date-and-time support follows the
POSIX-approximation-to-UTC model now.  So if your arithmetic accounts
for leap seconds, it won't agree with anyone else's in the computer
world.
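
A small illustration of the POSIX-approximation-to-UTC model, using
only the standard library.  A leap second (23:59:60 UTC) was inserted
at the end of 2015-06-30, so two SI seconds separate the instants
below, yet the arithmetic reports one.

```python
from datetime import datetime, timedelta, timezone

before = datetime(2015, 6, 30, 23, 59, 59, tzinfo=timezone.utc)
after = datetime(2015, 7, 1, 0, 0, 0, tzinfo=timezone.utc)

# POSIX-style arithmetic: every day has exactly 86400 seconds.
elapsed = after - before

# The leap second itself cannot be represented as a datetime.
try:
    datetime(2015, 6, 30, 23, 59, 60, tzinfo=timezone.utc)
    leap_second_representable = True
except ValueError:
    leap_second_representable = False

print(elapsed, leap_second_representable)
```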

> ...
> I think in my view, as soon as you go to the bother of adding a tzinfo
> instance to the datetime you are making a statement about the expected
> behaviour; that the simpler classic arithmetic no longer applies and
> the more complex model needs to be used.

I had already guessed that ;-)  It's just a dozen years too late to
influence datetime's design.

>> ...
>> There you go:  "timeline" datetime + timedelta arithmetic about as
>> efficiently as possible in pure Python.

> ...
> What I don't like about this approach is the developers need to be
> aware that they need to call it,

Is that really worse than needing to call .normalize() after every
arithmetic operation, with - I bet - most not being really clear on
_why_ they need to?

> and that dt + timedelta(hours=24) may not work.

Adding functions for timeline arithmetic can't possibly change what
classic arithmetic does.  For me, adding timedelta(hours=24) always
does exactly what I intend it to do.  But, yes, people will forget the
distinction sometimes.

But easy solution:  do what they _should_ have done from the start:
work in UTC instead, and have no problems, surprises, missing magical
invocations, or confusions of any kind ever.

> Of course, developers will not be aware or have done more
> than skim the docs until after their guests have all died of
> salmonella poisoning from the undercooked turkey.

Not a problem.  My turkey party occurs at the _end_ of DST.  "Same
time next day" would keep the turkey in the smoker for 25 hours, not
23.  No salmonella:  you're obviously determined to spread groundless
turkey FUD ;-)
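
The 25-versus-23 arithmetic can be checked with fixed-offset zones
alone.  This sketch assumes the 2015 US transition dates (DST began
March 8 and ended November 1); `EDT`, `EST` and `smoker_hours` are
names made up for the illustration.

```python
from datetime import datetime, timedelta, timezone

EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")


def smoker_hours(start, end):
    """Real elapsed hours between two aware datetimes."""
    # Subtraction of aware datetimes with different tzinfo objects is
    # done in UTC, so the result is the true duration.
    return (end - start) / timedelta(hours=1)


# "Same time next day" across the end of DST: noon EDT to noon EST.
fall = smoker_hours(datetime(2015, 10, 31, 12, tzinfo=EDT),
                    datetime(2015, 11, 1, 12, tzinfo=EST))

# The same across the start of DST: noon EST to noon EDT.
spring = smoker_hours(datetime(2015, 3, 7, 12, tzinfo=EST),
                      datetime(2015, 3, 8, 12, tzinfo=EDT))

print(fall, spring)  # 25.0 23.0
```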

>> ...
>> My hope was that 495 alone would at least spare pytz's users from
>> needing to do a `.normalize()` dance after `.astimezone()` anymore.
>> Although I'm not clear on why it's needed even now.

> Instead of one tzinfo instance, there are dozens for your timezone.
> The datetime implementation does not give pytz the opportunity to
> choose which one is used when constructing the datetime, so localize
> is needed to sort that. Similarly, arithmetic does not always give
> pytz the opportunity to choose which one is used after crossing a
> timezone boundary, so normalize is needed to sort that out. While the
> results of the timeline arithmetic are unambiguous and obvious, they
> are arguably incorrect until normalize puts things right.

This is .astimezone(), though - no constructor and no (visible)
arithmetic here.  It's returning something via fromutc(), and I
presume pytz has its own .fromutc() implementation.
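
A short sketch showing that .astimezone() hands the work to the target
tzinfo's fromutc().  `LoggingZone` is a hypothetical fixed-offset zone
written for this illustration; it says nothing about how pytz itself
implements fromutc().

```python
from datetime import datetime, timedelta, timezone, tzinfo


class LoggingZone(tzinfo):
    """Hypothetical UTC+2 zone that records each fromutc() call."""

    def __init__(self):
        self.calls = []

    def utcoffset(self, dt):
        return timedelta(hours=2)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "TOY"

    def fromutc(self, dt):
        # dt carries UTC clock values with tzinfo already set to self.
        self.calls.append(dt.replace(tzinfo=None))
        return super().fromutc(dt)


zone = LoggingZone()
utc_dt = datetime(2015, 8, 28, 6, 1, 6, tzinfo=timezone.utc)
local_dt = utc_dt.astimezone(zone)

print(local_dt, zone.calls)
```

No constructor call and no user-visible arithmetic is involved; the
zone decides what comes back.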

> ...
> I think I'm after hooks to replace localize on construction and
> normalize after arithmetic, so users don't have to be relied on to do
> this explicitly. This doesn't need to happen now, and I fully
> understand this could be considered fast path and the overhead
> unacceptable.

If you're determined to supply by-magic timeline arithmetic, then I
strongly suggest looking at the ideas at the top of this message, and
push for a _real_ change to Python.  That is, instead of pushing for
hooks wholly specific to pytz, push for a change that will allow
anyone to implement timeline arithmetic in a straightforward way,
using non-magical "hybrid" tzinfo classes.  But that's not my itch,
and - indeed - I'd prefer Python left well enough alone after 495
allows repairing the fundamental problem with conversions.

> ...
> I think all the data we have access to, including from platform C
> library functions, uses the is_dst flag or is simpler to map to the
> is_dst flag.

I need a complete use case, start to finish, to make sense of what
you're talking about here.  In particular, you never mention any
datetime or pytz operations when talking about is_dst.  So I still
have no idea why it's being discussed at all.

> The C library as exposed by the time.struct_time gives you is_dst.

See other msgs today.  mktime() is unreliable.  Even if it was
reliable, what of it?  Why do you _want_ is_dst?  There's no use case
here that consumes it.

> Mapping that to first/fold means first doing two conversions and
> determining which one comes first.

Ditto.  I have no idea what use case you have in mind that would
_require_ mapping is_dst to fold.  Inside pytz, you have an exhaustive
list of all transitions, thanks to zoneinfo.  pytz internals don't
need any flaky C library functions to determine anything about
transitions.

> Similarly, when loading your JSON file or examining email headers you
> need to load in a string like '2004-04-04 02:30:00 EDT-05:00'. It's
> simple to use a lookup table to map the abbreviation + offset to an
> is_dst flag.

As above.

> It's harder to map it to first/fold because they are
> swapped around in April and October. And there can be more than two
> transitions in a year, so if you need to support that you're going to
> need to do the lookup, construct a couple of instances, and compare to
> work out if EDT or EST comes first that month in that year.

Inside pytz you already know everything that can be known about
transitions.  You don't "poke and hope" to do that, you do a binary
search, right?  You find the zoneinfo record for the time of interest,
and compare that to the transitions on either side to deduce whether
there's a fold or gap in play.  Although I bet this could be sped up
by doing some precomputation when loading a tzfile to begin with.
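
One way the lookup with precomputation might look, as a sketch only.
The table layout (`TRANSITIONS`, `INITIAL_OFFSET`) and the helper names
are assumptions made for this example, not pytz's or zoneinfo's actual
data structures, and the data covers just the two 2015 US Eastern
transitions.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical table: (UTC instant of transition, offset in effect after it).
INITIAL_OFFSET = timedelta(hours=-5)
TRANSITIONS = [
    (datetime(2015, 3, 8, 7), timedelta(hours=-4)),   # EST -> EDT
    (datetime(2015, 11, 1, 6), timedelta(hours=-5)),  # EDT -> EST
]


def build_windows(initial, transitions):
    """Precompute the local wall-clock window each transition disturbs."""
    windows = []
    before = initial
    for utc, after in transitions:
        lo = utc + min(before, after)
        hi = utc + max(before, after)
        # Offset grows: wall times are skipped (gap).  Shrinks: repeated (fold).
        windows.append((lo, hi, "gap" if after > before else "fold"))
        before = after
    return windows


WINDOWS = build_windows(INITIAL_OFFSET, TRANSITIONS)
STARTS = [lo for lo, _, _ in WINDOWS]


def classify(wall):
    """Return 'gap', 'fold' or 'normal' for a naive local wall time."""
    i = bisect_right(STARTS, wall) - 1  # last window starting at or before wall
    if i >= 0:
        lo, hi, kind = WINDOWS[i]
        if lo <= wall < hi:
            return kind
    return "normal"


print(classify(datetime(2015, 3, 8, 2, 30)))   # gap
print(classify(datetime(2015, 11, 1, 1, 30)))  # fold
print(classify(datetime(2015, 7, 4, 12, 0)))   # normal
```

The windows are computed once when the table is loaded, so each query
costs a single binary search with no trial conversions.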

> But, really, I hate all the options for the flag name. I lean towards
> is_dst mainly because people are used to it.

I'm burned out on name bikeshedding - but `is_dst` makes no sense
unless the flag is at least pretending to say something about whether
DST is in effect.  That's not enough.  For example, the zoneinfo
source notes that there's a place in Antarctica that has two different
kinds of DST each year.  It's so bizarre that zic (the zoneinfo
compiler) had to be changed to handle it, and they've left the rules
commented out until the new zic is more widely adopted.  When they
uncomment the rules, is_dst will tell you nothing about _which_ kind
of DST is in effect (the offset+1 flavor, or the offset+2 flavor).
"fold" makes perfectly clear sense for transitions due to any cause
whatsoever.  The only advantage to is_dst is that it's so poorly
defined for edge cases that no two mktime() implementations can be
expected to agree :-(

>> But there's every reason to be optimistic:  even someone as old and
>> in-the-way as me doesn't find any of this particularly confusing ;-)

> I may be old, but at least I'm not as old as Tim ;)

Ain't that the truth :-(