[Datetime-SIG] PEP-431/495

Mon Aug 24 15:56:11 CEST 2015

On 22 August 2015 at 03:49, Tim Peters <tim.peters at gmail.com> wrote:

>> - I want a boolean added to datetime instances, even if I don't like
>> the name, because I can then deprecate pytz and its confusing API and
>> implementation. I'm happy to work on Python implementation and
>> documentation. It will save me time and effort in the long run.
>
> Later you seem to say you'd prefer a 3-state flag instead, so not sure
> you really mean "boolean" here.

I write Python and SQL for a living. Booleans are 3 state to me ;)

In this case, I'm not fussed if the datetime instance has a 2 state or
3 state flag. This is different to the various constructors which I
think need a 3 state flag in their arguments. True, False, None.

>> - Most of my thoughts got encoded in PEP-431. This would give us a
>> datetime module that operates exactly the way it does today,
>
> No.  While 431 was highly obscure on this point, it turned out that
> Lennart was determined to change arithmetic behavior.  That can't fly,
> for backward compatibility, and because even "aware" datetimes were
> intended to use a "naive time" model internally.
>
> Specifically, if you add timedelta(days=1) to a datetime today, you
> get "same time tomorrow" (day goes up by 1, but hour, minute, second
> and microsecond remain the same) in all cases. Even if a DST
> transition (or base-offset change, or leap-second change) occurred.
> That's now called "classic" arithmetic.  The default behavior can't be
> changed.
>
> What you seem to have in mind (accounting for two of the three known
> reasons for why a local clock may jump:  DST and base-offset changes,
> but not leap second changes) is now called "timeline" (sometimes
> "strict") arithmetic.

Grump. I always interpreted that documentation to mean that timezone
conversions were *my* problem as the author of the tzinfo
implementation. I thought it was a documented problem to be fixed
if/when Python ever provided more complex tzinfo implementations, and
one of the reasons it never did provide such implementations in the
first place.

Classic behaviour as you describe it is a bug. It sounds ok when you
state it as 'add one day to today and you get the same time tomorrow'.
It does not sound ok when you state it as 'add one second to now and
you will normally get now + 1 second, but sometimes you will get an
instant further in the future, and sometimes you will get an instant
in the past'.
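
To make the two readings concrete, here is a minimal stdlib-only sketch.
ToyEastern is a hypothetical tzinfo with the 2004 US/Eastern rules
hard-coded (an assumption to keep it short; it is not something pytz or
the stdlib ships), and timeline_add is just an illustrative helper name:

```python
from datetime import datetime, timedelta, timezone, tzinfo

HOUR = timedelta(hours=1)
# Assumption: only the 2004 transitions, as naive wall-clock times.
DST_START = datetime(2004, 4, 4, 2)   # clocks jump 02:00 -> 03:00
DST_END = datetime(2004, 10, 31, 1)   # 02:00 EDT, i.e. 01:00 standard time


class ToyEastern(tzinfo):
    """Hypothetical single-instance tzinfo in the style of the docs example."""

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        return HOUR if DST_START <= naive < DST_END else timedelta(0)

    def tzname(self, dt):
        return "EDT" if self.dst(dt) else "EST"


def timeline_add(dt, delta):
    """Add delta as elapsed time: go via UTC, then back to dt's zone."""
    return (dt.astimezone(timezone.utc) + delta).astimezone(dt.tzinfo)


dt = datetime(2004, 4, 4, 1, 0, tzinfo=ToyEastern())

# Classic: only the wall clock moves, landing on the non-existent 02:00.
classic = dt + HOUR
# Timeline: one real hour elapses, landing on 03:00 EDT.
timeline = timeline_add(dt, HOUR)

print(classic, "->", classic.astimezone(timezone.utc))
print(timeline, "->", timeline.astimezone(timezone.utc))
```

The classic result maps to the same UTC instant as dt itself, which is
the 'sometimes you get a different instant' case described above.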

I dispute that this is default behaviour that can't be changed. The
different arithmetic only matters when you have a dst-aware datetime
in play, and Python has never provided any apart perhaps from
your original reference implementation (which stopped working in
2006). pytz, however, has always provided timeline arithmetic. I
believe this is the most widely deployed way of obtaining dst-aware
datetime instances, and this is the most widely expected behaviour. If
you use pytz tzinfo instances, adding 1 second always adds one second
and adding 1 day always adds 24 hours. While calendaring style
arithmetic is useful and a valid use case, it is useless if the only
relative type is the day. You also need months and years and periodic
things like 'first Sunday every month'. This is too complex to inflict
its API on people by default. But pulling in dateutil's relative time
helpers could be nice.

Do systems that rely on classic behavior actually exist? It requires
someone to have explicitly chosen to use daylight savings capable
timezones, without using pytz, while at the same time relying on
classic's surprising arithmetic. Maybe systems using dateutil without
using dateutil's implementation of datetime arithmetic. I believe that
there are many more systems out there that are broken by this
behaviour than are relying on this behaviour.

I think this is a bug worth fixing rather than entrenching, before
adding any dst aware tzinfo implementations to stdlib (including
'local').

> According to Lennart, under PEP 431 timeline arithmetic would always
> be used.  Under PEP 495, nothing about arithmetic changes.  495 is
> less ambitious, only intending to supply the bit(s) needed to _allow_
> timeline arithmetic to be implemented as an option later.  PEP 500 is
> about supplying different arithmetics, but Guido hates PEP 500.

Ok.

However... this also means the new flag on the datetime instances is
largely irrelevant to pytz. pytz' API will need to remain the same.
Adding a timedelta to a datetime will give you a datetime in exactly
the same offset() and dst() as you started with (because pytz gives
you timeline arithmetic, where adding 24 hours actually adds 24
hours), and you will need to fix it using the normalize method after
the fact. The is_dst bit is effectively stored on the tzinfo instance
currently in play, and having another copy on the datetime instance
is unnecessary.

The new argument to the datetime constructors may be useful, if it
accepts tri-state. If the is_dst/first flag accepts True, False or
None, then pytz may be able to deprecate the localize method. If a
user calls localize(is_dst=None), AmbiguousTimeError and
NonExistentTimeError exceptions may be raised, but by default
exceptions are not raised. I would also need the opportunity to swap
in the correct fixed offset tzinfo instance for the given datetime.
(example below)

Losing the localize method will be a huge win for pytz, as it is ugly
and causes great confusion and many identical bug reports. The other
problem, the normalize method, is less important - if you neglect to
call normalize you still get the correct instant, but it may be
reported in the incorrect timezone period (EST instead of EDT or vice
versa).

> In the end, I expect timezone wrappers will supply factory functions,
> either separate functions for "give me such-and-such a timezone using
> classic arithmetic" and "give me such-and-such a timezone using
> timeline arithmetic", or a single function specifying the desired
> timezone and an optional flag to specify the arithmetic desired.

pytz users need to be able to construct datetimes that get silently
normalized if they are ambiguous or non-existent. Some pytz users need
to have exceptions raised if they attempt to construct datetimes that
are ambiguous or non-existent. This is what I consider strict vs
loose. Ideally:

>>> str(datetime(2004, 4, 4, 2, 0, 0, tzinfo=eastern))
'2004-04-04 03:00:00-04:00'
>>> str(datetime(2004, 4, 4, 2, 0, 0, tzinfo=eastern, first=None))
Traceback (most recent call last):
  ...
pytz.NonExistentTimeError
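
The loose and strict behaviours above can be approximated today with a
plain function. This is only a sketch under stated assumptions: localize
here is a hypothetical free function (not pytz's method), the two
exception classes are local stand-ins named after the pytz ones, and the
2004 US/Eastern transitions are hard-coded:

```python
from datetime import datetime, timedelta, timezone

HOUR = timedelta(hours=1)
EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")
# Assumption: only the 2004 US/Eastern transitions, as naive wall-clock times.
GAP_START = datetime(2004, 4, 4, 2)     # 02:00-03:00 never appears on the clock
FOLD_START = datetime(2004, 10, 31, 1)  # 01:00-02:00 appears twice


class AmbiguousTimeError(Exception):
    """Local stand-in, named after the pytz exception."""


class NonExistentTimeError(Exception):
    """Local stand-in, named after the pytz exception."""


def localize(naive, is_dst=False):
    """Attach EST or EDT to a naive datetime; is_dst=None means strict."""
    if GAP_START <= naive < GAP_START + HOUR:
        if is_dst is None:
            raise NonExistentTimeError(naive)
        # Read the wall time with the requested offset, then re-express the
        # same instant with the offset that was really in force.
        if is_dst:
            return naive.replace(tzinfo=EDT).astimezone(EST)
        return naive.replace(tzinfo=EST).astimezone(EDT)
    if FOLD_START <= naive < FOLD_START + HOUR:
        if is_dst is None:
            raise AmbiguousTimeError(naive)
        return naive.replace(tzinfo=EDT if is_dst else EST)
    in_dst = GAP_START + HOUR <= naive < FOLD_START
    return naive.replace(tzinfo=EDT if in_dst else EST)


# Loose (default): the non-existent 02:00 is silently normalized.
loose = localize(datetime(2004, 4, 4, 2, 0))
print(loose)

# Strict: the same input raises instead.
try:
    localize(datetime(2004, 4, 4, 2, 0), is_dst=None)
except NonExistentTimeError as exc:
    print("rejected:", exc)
```

Note that the result always carries a fixed-offset tzinfo, which is the
'swap in the correct instance' idea rather than a single shared tzinfo.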

I also need to continue to support timeline arithmetic. This requires
me not having a single tzinfo instance, but swapping in the correct
fixed offset tzinfo instance at the right time. Currently, this uses
the awful localize and normalize methods. Ideally, post-PEP:

>>> eastern = pytz.timezone('US/Eastern')
>>> dt = datetime(2004, 4, 3, 2, 0, 0, tzinfo=eastern)
>>> dt2 = dt + timedelta(days=1)
>>> eastern is dt.tzinfo
False
>>> dt.tzinfo is dt2.tzinfo
False
>>> str(dt)
'2004-04-03 02:00:00-05:00'
>>> str(dt2)
'2004-04-04 03:00:00-04:00'

If I can do this, there is no reason that pytz could not also support
'classic' style, but I certainly wouldn't want to encourage its use as
my rant above might indicate ;) If I write documentation, it may
require some editing, localizing from en_AU to something a little more
polite.

> It's possible that 495 should do more in this direction.  For now, it
> specifies enough that someone who cares can easily write a function to
> distinguish among "ambiguous time (in a fold)", "invalid time" (in a
> gap), and "happy time" ;-) , and do whatever _they_ want (ignore some
> subset, raise an exception, print a warning, supply a default, prompt
> the user for more info, ...).

As long as this doesn't break pytz, as it sounds like pytz will still be needed.

For pytz users, being able to write a function to tell if the data you
were given is broken is a step backwards. When constructing a datetime
instance with pytz, users have the choice of raising exceptions or
having pytz normalize the input. They are never given broken data (by
their definition), and there is no need to weed it out.
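
For reference, the kind of checker Tim describes might look roughly like
the sketch below. It assumes the disambiguation bit is spelled fold (the
name PEP 495 eventually shipped with in Python 3.6; this thread still
calls it first or is_dst). FoldAwareEastern is a hypothetical tzinfo
with the 2004 US/Eastern rules hard-coded, and classify is an
illustrative name:

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
ZERO = timedelta(0)
# Assumption: only the 2004 transitions, as naive wall-clock times.
GAP_START = datetime(2004, 4, 4, 2)     # 02:00-03:00 is skipped
FOLD_START = datetime(2004, 10, 31, 1)  # 01:00-02:00 is repeated


class FoldAwareEastern(tzinfo):
    """Hypothetical tzinfo whose answers depend on dt.fold."""

    def utcoffset(self, dt):
        return timedelta(hours=-5) + self.dst(dt)

    def dst(self, dt):
        naive = dt.replace(tzinfo=None)
        if GAP_START <= naive < GAP_START + HOUR:
            # In a gap, fold=0 uses the rules before the transition.
            return HOUR if dt.fold else ZERO
        if FOLD_START <= naive < FOLD_START + HOUR:
            # In a fold, fold=0 is the first (DST) occurrence.
            return ZERO if dt.fold else HOUR
        if GAP_START + HOUR <= naive < FOLD_START:
            return HOUR
        return ZERO


def classify(naive, tz):
    """Return 'ok', 'ambiguous' (fold) or 'missing' (gap) for a wall time."""
    first = naive.replace(tzinfo=tz, fold=0).utcoffset()
    second = naive.replace(tzinfo=tz, fold=1).utcoffset()
    if first == second:
        return "ok"
    # The offset shrinks across a fold and grows across a gap.
    return "ambiguous" if first > second else "missing"


eastern = FoldAwareEastern()
for wall in (datetime(2004, 4, 4, 2, 30),
             datetime(2004, 10, 31, 1, 15),
             datetime(2004, 7, 1, 12, 0)):
    print(wall, classify(wall, eastern))
```

The point stands that this puts the check in the caller's hands; the
constructor itself accepts all three inputs without complaint.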

> As above, it's possible 495 should do more.  But it's hard to know
> when to stop.  For example, there are many ways of specifying a
> datetime, including, e.g., using .combine() to paste a date and time
> together.  It's generally impossible to make a fold/gap determination
> on a time alone - that's only possible in combination with a date.  So
> does .combine() also need to whine?  It's simpler overall to leave it
> to those users who care to check when they do care.

I think all functions that can create datetime instances will need the
new optional flag and the flag should be tri-state, defaulting to not
whine.

> 495 couldn't care less what causes folds and gaps - it's equally
> applicable to all causes, and whether in isolation or combination.
> What it _does_ assume is that a single bit suffices to resolve
> ambiguities:  that there is no case in which more than two UTC times
> have the same spelling on a local clock.  The goal of the PEP is to
> supply that bit.  The burden is on the tzinfo supplier to set and use
> it correctly.  The burden is also on the tzinfo supplier to supply a
> .utcoffset() "that works" to convert a local time to UTC, to supply a
> .dst() that returns whatever the tzinfo supplier thinks it should
> return, and to supply a .fromutc() that sets the bit correctly.

The important bit here for pytz is that tzinfo.fromutc() may return a
datetime with a different tzinfo instance. Also, to drop pytz'
localize method I need something like 'tzinfo.normalize(dt)', where I
have the opportunity to replace the tzinfo the user provided with the
one with the correct offset/dst info.
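
A minimal sketch of the fromutc() half of that, stdlib only.
SwappingEastern is a hypothetical class, the 2004 transition instants
are hard-coded as an assumption, and only fromutc() is implemented, so
it is usable as a conversion target and nothing more:

```python
from datetime import datetime, timedelta, timezone, tzinfo

EST = timezone(timedelta(hours=-5), "EST")
EDT = timezone(timedelta(hours=-4), "EDT")
# Assumption: 2004 US/Eastern transitions, expressed as naive UTC instants.
DST_START_UTC = datetime(2004, 4, 4, 7)    # 02:00 EST
DST_END_UTC = datetime(2004, 10, 31, 6)    # 02:00 EDT


class SwappingEastern(tzinfo):
    """Hypothetical 'zone' object that hands back fixed-offset datetimes."""

    def fromutc(self, dt):
        # dt arrives with tzinfo=self but UTC wall-clock values.
        utc_naive = dt.replace(tzinfo=None)
        fixed = EDT if DST_START_UTC <= utc_naive < DST_END_UTC else EST
        # Return a datetime attached to the fixed-offset instance, not self.
        return (utc_naive + fixed.utcoffset(None)).replace(tzinfo=fixed)


eastern = SwappingEastern()
before = datetime(2004, 4, 4, 6, 59, tzinfo=timezone.utc).astimezone(eastern)
after = datetime(2004, 4, 4, 7, 0, tzinfo=timezone.utc).astimezone(eastern)

print(before, before.tzinfo is eastern)
print(after, after.tzinfo is eastern)
```

What is still missing is the equivalent hook on the construction path,
which is what the tzinfo.normalize(dt) idea above is asking for.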

>> - My argument in favour of 'is_dst' over 'first' is that this is what
>> we have in the data we are trying to load.  You commonly have
>> a timestamp with a timezone abbreviation and/or offset. This can
>> easily be converted to an is_dst flag.
>
> You mean by using platform C library functions (albeit perhaps wrapped
> by Python)?
>
>> To convert it to a 'first' flag, we need to first parse the datetime,
>
> I'm unclear on this.  To get a datetime _at all_ the timestamp has to
> be converted to calendar notation (year, month, ...).  Which is what
> I'm guessing "parse" means here.  That much has to be done in any
> case.

My example is weak. I'm thinking about parsing a string like:

2004-10-31 01:15 EST-05:00

Even if you know this is US/Eastern and not Estonia, you still need to
know that for dates in October EDT is first and EST is not first, and
for dates in April EST is first and EDT is not first, and you need to
include a wide enough fuzz factor that future changes to the DST rules
won't break your parser.

But I guess a general purpose parser that cares would construct
instances 3 days before and 3 days later and use whichever tzinfo
had the correct offset. Or just use a fixed offset tzinfo.
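
The fixed offset route is short enough to sketch with the stdlib alone.
parse_with_offset is a hypothetical helper, and the regular expression
assumes exactly the 'YYYY-MM-DD HH:MM ABBR+HH:MM' layout of the string
above and nothing more general:

```python
import re
from datetime import datetime, timedelta, timezone

# Assumption: input looks exactly like '2004-10-31 01:15 EST-05:00'.
STAMP_RE = re.compile(
    r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}) ([A-Z]+)([+-])(\d{2}):(\d{2})"
)


def parse_with_offset(text):
    """Parse a stamp into a datetime carrying a fixed-offset tzinfo."""
    match = STAMP_RE.fullmatch(text)
    if match is None:
        raise ValueError("unrecognised timestamp: %r" % (text,))
    stamp, abbrev, sign, hours, minutes = match.groups()
    offset = timedelta(hours=int(hours), minutes=int(minutes))
    if sign == "-":
        offset = -offset
    naive = datetime.strptime(stamp, "%Y-%m-%d %H:%M")
    # The abbreviation is kept only as a label; the offset does the work.
    return naive.replace(tzinfo=timezone(offset, abbrev))


dt = parse_with_offset("2004-10-31 01:15 EST-05:00")
print(dt, dt.tzname(), dt.astimezone(timezone.utc))
```

No knowledge of which abbreviation comes first in the fold is needed,
because the offset in the string already pins down the instant.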

>> - I think datetime should consider 1 day == 24 hours and not have
>> concepts like years or months, just like it does today. As others
>> suggested, a separate module dealing with leap years and variable
>> length days may be useful to some people, as would leapsecond support
>> for astronomers and astrologers. But if the default implementation
>> gives different results to all the other tools on your system, people
>> will think the default is wrong.
>
> Not sure what you mean here without specific examples of what you have
> in mind.  But, as above, classic arithmetic will remain the default
> regardless - it's a dozen years too late to change that, even if
> everyone wanted to (and - surprise - everyone doesn't ;-) ).

I despair at the bug reports, questions and general confusion that
will occur if dst-aware tzinfo implementations are added to stdlib. At
the moment, it is an obscure wart despite its age. It will become an
in-your-face wart as soon as a tzlocal implementation lands, and a
wart people will be angry about because they won't realize it is there
until their production system loses an hour's worth of orders because
their Python app spat out an hour's worth of invalid timestamps right
around Halloween sale time. But I'm drifting off into hyperbole.

For amusement, here is how you can add an hour and end up exactly
where you started. Careful you do your conversions at the right time,
or the dst transition might eat your data (this example performed by a
professional stuntman and should not be attempted at home):

>>> from datetime import datetime, timedelta, timezone
>>> from pytz.reference import Eastern
>>> dt = datetime(2004, 4, 4, 1, 0, 0, tzinfo=Eastern)
>>> str(dt.astimezone(timezone.utc))
'2004-04-04 06:00:00+00:00'
>>> str((dt + timedelta(hours=1)).astimezone(timezone.utc))
'2004-04-04 06:00:00+00:00'

-- 
Stuart Bishop <stuart at stuartbishop.net>
http://www.stuartbishop.net/