[Datetime-SIG] Another round on error-checking

Mon Aug 31 19:58:12 CEST 2015

I've been playing with what it would take to wrap zoneinfo efficiently
in a post-495 world.  When I got to .utcoffset(), I just cringed when
trying to implement the "in the face of ambiguity and/or
impossibility, make stuff up ;-)" parts.

The pytz folks have been enthusiastic about pytz's approach.  Alas,
it's a poor fit to datetime's design, because pytz strives to make it
appear that "naive time" doesn't exist at all for datetimes with a
tzinfo.  But in the design, it does.  Regardless of whether a tzinfo
is present, a datetime is intended to be viewed as working in naive
time.  "Missing" and "ambiguous" times plain don't exist in naive
time, so it's unnatural to check for them all over the place.

It's when a timezone-*specific* operation is attempted that the user
is explicitly moving out of naive time (not merely when a tzinfo is
attached).  So, in my view, *that's* when to check.  .utcoffset() is
the primary such place (whether called directly or indirectly).  At
that point, two kinds of "meaningless" times pop into existence:

1. fold != 0 when the datetime isn't actually in a fold.
2. The datetime is in a gap.

There is no UTC time that maps back to such cases, so there is no
possible timedelta .utcoffset() can return that's wholly justifiable.

PEP 495 specifies resolving such cases by magic, in essentially
arbitrary (from the user's point of view) ways.  This isn't for
backward compatibility, because 495-compliant tzinfos don't currently
exist(*).  It's more that 495 gives users no other way to determine
whether a datetime _is_ "a problem case" other than by calling
.utcoffset() twice with different values for `fold`, and then making
.utcoffset() return carefully chosen (but arbitrary from the user's
POV) problem-case results sufficient to classify the datetime from the
two .utcoffset() results.
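
To make the "call .utcoffset() twice" dance concrete, here is a minimal
sketch.  ToyEastern is a made-up, 2015-only zone (not real zoneinfo data),
and classify_by_probing is a hypothetical helper name; both are assumptions
for illustration.  The toy utcoffset() follows the 495 convention that
fold=0 picks the offset in effect before the transition and fold=1 the one
after it.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
STD, DST = -5 * HOUR, -4 * HOUR

# Hypothetical transitions, in naive local time, for 2015 only.
GAP_START = datetime(2015, 3, 8, 2)     # 02:00-03:00 does not exist
GAP_END = GAP_START + HOUR
FOLD_START = datetime(2015, 11, 1, 1)   # 01:00-02:00 happens twice
FOLD_END = FOLD_START + HOUR


class ToyEastern(tzinfo):
    """Toy zone that resolves problem cases the PEP 495 way."""

    def utcoffset(self, dt):
        naive = dt.replace(tzinfo=None)
        if naive < GAP_START:
            return STD
        if naive < GAP_END:                 # gap: fold=0 before, fold=1 after
            return DST if dt.fold else STD
        if naive < FOLD_START:
            return DST
        if naive < FOLD_END:                # fold: fold=0 first, fold=1 second
            return STD if dt.fold else DST
        return STD


def classify_by_probing(dt):
    """Infer the kind of local time from two utcoffset() results."""
    u0 = dt.replace(fold=0).utcoffset()
    u1 = dt.replace(fold=1).utcoffset()
    if u0 == u1:
        return "normal"
    # The offset drops across a fold and rises across a gap.
    return "fold" if u0 > u1 else "gap"
```

Note that nothing here can report "fold=1 on a time that isn't in a fold":
the probe overwrites `fold`, so that case silently looks normal.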

I think I'd rather acknowledge that problem cases exist in a direct
and straightforward way, by adding a new (say) tzinfo.classify()
method.  For example, .classify() could return a

    (kind, detail)

2-tuple.

- kind==DTKIND_NORMAL.
  Not an exceptional case.
  detail is None.

- kind==DTKIND_FOLD_NORMAL.
  The datetime is in a fold, and its `fold` value is sane.
  detail is the datetime's `fold` value (0 or 1).

- kind==DTKIND_FOLD_INVALID.
  The datetime does not have `fold==0`, but the datetime is not in a fold.
  detail is the datetime's `fold` value (whatever it may be).

- kind==DTKIND_GAP.
  The datetime is in a gap.
  detail is a (d1, d2) 2-tuple, where `d1` and `d2` are
  timedeltas such that (in classic arithmetic):
      datetime - d1 is the closest earlier non-gap time
      datetime + d2 is the closest later non-gap time

Users can call that directly when they like.

.utcoffset() (and other appropriate timezone-specific methods) would
raise exceptions in the DTKIND_FOLD_INVALID and DTKIND_GAP cases, with
the same exception detail as `classify()` returns.
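
A rough sketch of how that could look, under loud assumptions: the zone is
a made-up, 2015-only toy; ProblemTimeError is a hypothetical exception
name; the DTKIND_* values are arbitrary; a gap is reported as a gap
whatever `fold` says; and "closest earlier non-gap time" is taken
literally, as one microsecond before the gap starts.

```python
from datetime import datetime, timedelta, tzinfo

DTKIND_NORMAL, DTKIND_FOLD_NORMAL, DTKIND_FOLD_INVALID, DTKIND_GAP = range(4)

HOUR = timedelta(hours=1)
STD, DST = -5 * HOUR, -4 * HOUR

# Hypothetical transitions, in naive local time, for 2015 only.
GAP_START = datetime(2015, 3, 8, 2)     # 02:00-03:00 does not exist
GAP_END = GAP_START + HOUR
FOLD_START = datetime(2015, 11, 1, 1)   # 01:00-02:00 happens twice
FOLD_END = FOLD_START + HOUR


class ProblemTimeError(ValueError):
    """Hypothetical exception carrying the same (kind, detail) pair."""

    def __init__(self, kind, detail):
        super().__init__(kind, detail)
        self.kind, self.detail = kind, detail


class StrictToyEastern(tzinfo):
    def classify(self, dt):
        naive = dt.replace(tzinfo=None)
        if GAP_START <= naive < GAP_END:
            # Classic arithmetic: dt - d1 and dt + d2 are both valid times.
            d1 = naive - GAP_START + timedelta.resolution
            d2 = GAP_END - naive
            return DTKIND_GAP, (d1, d2)
        if FOLD_START <= naive < FOLD_END:
            return DTKIND_FOLD_NORMAL, dt.fold
        if dt.fold != 0:
            return DTKIND_FOLD_INVALID, dt.fold
        return DTKIND_NORMAL, None

    def utcoffset(self, dt):
        kind, detail = self.classify(dt)
        if kind in (DTKIND_FOLD_INVALID, DTKIND_GAP):
            raise ProblemTimeError(kind, detail)   # refuse to make stuff up
        if kind == DTKIND_FOLD_NORMAL:
            return STD if detail else DST          # fold=1 is the second pass
        naive = dt.replace(tzinfo=None)
        return DST if GAP_END <= naive < FOLD_START else STD
```

Code living in naive time never trips over this: constructing, comparing
(within the same zone), and doing classic arithmetic on such datetimes
doesn't call .utcoffset() at all.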

This would, of course, require major rewriting of the PEP.  So Alex
will hate it ;-)  But, leaving aside how much design pain it would
cause, is it "the right" (or "a righter") thing to do?  That's what
I'm more concerned about.  In any case, since this is _a_ view of
error checking that hasn't been mentioned at all before, it's worth
putting it out in public.

BTW, I don't expect pytz to like it.  In Python's datetime design,
timeline arithmetic should be done in UTC (or via timestamps) rather
than in local time.
The scheme above intends to catch errors _when_ converting to UTC,
leaving naive time alone until (if ever) the user does explicitly
invoke a timezone operation.

(*) WRT backward compatibility, there are other non-obvious cases
after 495 tzinfos do exist.  Like datetime.__hash__() calling
.utcoffset().  It would be desirable that people living in naive time
(despite attaching tzinfos) not need to worry about exceptions in
cases like that when using a 495 tzinfo.  In the kind of scheme above,
one way around that is changing __hash__ (which could resolve problem
cases in any way that works best for _its_ purposes).  Another way is
adding optional `check` Boolean arguments to various methods,
defaulting to False, in which case the current 495 "make stuff up"
results would be returned.  But I'm trying to take a higher-level view
of "what's right" in _this_ msg ;-)
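
For what it's worth, the `check` alternative might look something like the
sketch below.  Everything in it is an assumption: the zone is a made-up,
2015-only toy, the keyword lives only on the tzinfo method, and a plain
ValueError stands in for whatever exception would really be raised.
Because datetime.utcoffset() and datetime.__hash__() call the tzinfo with
just the datetime, they keep getting the 495-style results.

```python
from datetime import datetime, timedelta, tzinfo

HOUR = timedelta(hours=1)
STD, DST = -5 * HOUR, -4 * HOUR

# Hypothetical transitions, in naive local time, for 2015 only.
GAP_START, GAP_END = datetime(2015, 3, 8, 2), datetime(2015, 3, 8, 3)
FOLD_START, FOLD_END = datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2)


class CheckedToyEastern(tzinfo):
    def utcoffset(self, dt, check=False):
        naive = dt.replace(tzinfo=None)
        in_gap = GAP_START <= naive < GAP_END
        in_fold = FOLD_START <= naive < FOLD_END
        if check and (in_gap or (dt.fold and not in_fold)):
            raise ValueError(f"{naive} with fold={dt.fold} has no UTC equivalent")
        if in_gap:                      # 495-style made-up answer
            return DST if dt.fold else STD
        if in_fold:
            return STD if dt.fold else DST
        return DST if GAP_END <= naive < FOLD_START else STD
```

A caller who wants the error asks for it explicitly, e.g.
`tz.utcoffset(dt, check=True)`; hashing a gap time still works.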