[Datetime-SIG] Local time disambiguation proposal

Tim Peters tim.peters at gmail.com
Wed Aug 5 05:43:20 CEST 2015


[ijs]
> Not so long ago I think I finally got a point that Tim has
> been dancing around throughout this whole discussion,
> without quite saying it outright. (Another possibility
> would be Reading Comprehension Fail on my part...)

I have to suggest another:  that datetime is so far from what you
would have designed that you just can't believe Guido didn't have your
design in mind all along and was just too harried to implement it
properly.  See?  I don't dance at all ;-)

For example, I'm pretty sure you would have designed datetime to store
a duration from an epoch.  But that was never on the table.  It was an
explicit requirement to maintain year, month, day, hour ... attributes
separately, in both the storage ("pickle") and in-memory formats.
That's because conversion between that and a duration is expensive,
and many use cases required quick access to the attributes.  For
example, business-oriented web apps typically do little (if any) time
arithmetic, but are forever reading up datetimes from a database and
needing to display them in human-readable form (show the attribute
values).  Sometimes with a time zone indicator attached.  And
sometimes converted to the viewer's timezone.

> Classic arithmetic and `replace()` are low level operations
> on datetimes,

I can speak about .replace() definitively since I "invented" it:
it's just intended to be shorthand for calling a constructor in cases
where "most of" the fields retain the same values.  It was born of
necessity, because I found an early datetime prototype unbearably
tedious to use just for writing unit tests.  Because datetime objects
are immutable, "changing a single attribute" is horridly verbose
without .replace().
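
To make the verbosity concrete, here is a minimal sketch comparing the two spellings. The particular date and the variable names are only illustrative.

```python
from datetime import datetime

dt = datetime(2015, 8, 5, 5, 43, 20)

# Without .replace(): every field must be passed to the constructor again.
verbose = datetime(dt.year, dt.month, dt.day, 12, dt.minute,
                   dt.second, dt.microsecond, dt.tzinfo)

# With .replace(): name only the field that changes.
concise = dt.replace(hour=12)

# The original object is untouched, because datetime objects are immutable.
unchanged_hour = dt.hour
```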

> whereas duration

The only intended support for durations was in "naive time".  As Guido
said earlier, he expected that people who needed more than that would
continue using timestamps.  It's not the first time his expectations
were dashed.

> and period arithmetic

In naive time, period arithmetic with timedelta works fine for units
<= weeks.  There _were_ use cases for many more kinds of period
arithmetic, although they were still all in naive time.  But
specifying and implementing all of that too is a major project of its
own, and we ran out of time.  For example, look at all the text it
takes to explain RRULE in the iCalendar spec:

    http://www.ietf.org/rfc/rfc2445.txt

Things like "the first Tuesday after a Monday in November, every 4
years" (which describes US presidential election dates) are just the
start.  Speaking of which, I think it would be insane to try to
express such complexities by overloading binary arithmetic operators.
Unless someone used that every day, they'd soon forget what all the
magic meant:  "write-only" code.

So we punted on that, hoping someone else would take up the challenge.
And, e.g., I believe "dateutil" does implement the whole RRULE spec.
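
For a single rule of this kind, plain naive-time timedelta arithmetic is enough. The sketch below computes the Tuesday after the first Monday in November; the function name `election_day` is hypothetical, and this is not how dateutil or RRULE express the rule.

```python
from datetime import date, timedelta

def election_day(year):
    """Tuesday after the first Monday in November (naive calendar arithmetic)."""
    nov1 = date(year, 11, 1)
    # date.weekday() returns 0 for Monday, so this is the gap to the first Monday.
    days_to_monday = (0 - nov1.weekday()) % 7
    first_monday = nov1 + timedelta(days=days_to_monday)
    return first_monday + timedelta(days=1)

# "Every 4 years" is just a step in the loop, not part of the arithmetic.
elections = [election_day(y) for y in range(2012, 2024, 4)]
```

Even this one rule needs a helper function and a comment, which is a hint of why the general problem was left for a separate project.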

> and time zone conversion are high level ones.

They're a world of pain unto themselves ;-)  And another case where
there wasn't enough time to do a full-blown job, so only an abstract
tzinfo base class was released at first.

> The original design of datetime was to expose all of the low level
> details

It was primarily to implement "naive time" because that alone sufficed
to meet the vast majority of the requirements.

> so that people could implement their own high level stuff, because
> we're programmers and that's what we do, ya know?

We certainly did hope people could build on it, but this wasn't so
much driven by philosophy as by the fact that our employer was getting visibly
(& understandably!) annoyed with continuing to pay for datetime
development after it met every major use case identified in the
requirements phase.  We had to cut it off.

> All of this is to say that I agree with Tim that `first=True` is not a
> good default for `replace()`. While that method verifies that all of its
> numerical arguments are within their valid ranges, it doesn't verify
> that the resulting time exists in its time zone, so for consistency I
> would expect that it simply keep the value of the `first` flag without
> validation or modification if that argument is not provided.

I believe Alexander already agreed with this.
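
The "keep the flag without validation" behavior can be sketched as follows. One assumption to note: this uses the `fold` attribute as it later shipped in Python 3.6 (PEP 495), where fold=1 marks the later of two ambiguous times. That is the opposite sense from the `first` flag discussed in this thread.

```python
from datetime import datetime

# fold=1 is accepted even though this naive time is not ambiguous at all;
# the constructor does not check the flag against any time zone.
later = datetime(2015, 8, 5, 5, 43, fold=1)

# replace() without a fold argument carries the flag over unchanged.
shifted = later.replace(hour=6)

# Passing fold explicitly overrides it, again without validation.
reset = later.replace(fold=0)

# Comparison between naive datetimes ignores the flag.
same = (later == reset)
```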

> But this raises the question of how an unambiguous datetime with
> first=False should be handled by other code. My preference is that
> all high level operations should treat such datetimes as having
> first=True.

I agree this needs to be cleared up.  The _actual_ "rules" now appear to be:

1. first==False means this is the later of two ambiguous times.
2. first==True means anything else (it's the earlier of two ambiguous
times, or it's not an ambiguous time at all).
3. And, by the way, #1 was fibbing:  first=False may also mean it's
not an ambiguous time.

However, so far #3 may only be in internal uses (like the .fromutc()
implementation fiddling the flag to trick .utcoffset() into telling it
something useful).

I don't think it needs to be decided just yet.  As things get
implemented, the delights and drawbacks will get clearer.

> Also, given this divide, it would be good to document in the datetime
> module which methods are high level and which are low.

Well, since that distinction doesn't exist in my head, you'll have to
write the doc patch ;-)

...
>> Are all known DST adjustments, and changes to standard UTC offsets --
>> in, say, zoneinfo -- exactly one hour?  For example, I read this on
>> the web, so it must be true:  "Lord Howe Island (Australia) advances
>> its clocks by half an hour in the summer" ;-)  Since I expect we
>> intend to support all the goofy timezones in zoneinfo, best to get
>> that right from the start.

> Also relevant is the fact that this rationale fails in such cases:
>
> """
> We chose the minute byte to store the "first" bit because this choice preserves the natural ordering.
> """

Sorry, I didn't grasp your meaning there.  But I didn't grasp
Alexander's intent either ;-)  From "later ambiguous time" alone we
have no idea _how_ much later, so the bit-fiddling comment there
didn't make sense to me.  Perhaps that was what you were getting at?

The C-level datetime comparison code is much lower-level than in
datetime.py, and the former actually compares raw bytestrings:

    if (GET_DT_TZINFO(self) == GET_DT_TZINFO(other)) {
        diff = memcmp(((PyDateTime_DateTime *)self)->data,
                      ((PyDateTime_DateTime *)other)->data,
                      _PyDateTime_DATETIME_DATASIZE);

Stuffing a flag into any of the existing bytes is bound to break some
case there.  I'd add another byte to the pickle, but then I'm no
longer paid to obsess over bytes ;-)
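
A rough Python sketch of why a flag stuffed into an existing byte upsets a raw byte comparison. The `pack` helper is hypothetical: it mimics a big-endian field layout, and putting the flag in the high bit of the minute byte is an assumption made only for illustration, not the actual C or pickle format.

```python
from datetime import datetime

def pack(dt, flag=False):
    """Pack datetime fields big-endian, most significant field first."""
    # Hypothetical: store the flag in the high bit of the minute byte.
    minute = dt.minute | (0x80 if flag else 0)
    head = bytes([dt.year >> 8, dt.year & 0xFF, dt.month, dt.day,
                  dt.hour, minute, dt.second])
    return head + dt.microsecond.to_bytes(3, "big")

a = datetime(2015, 4, 5, 1, 30)
b = datetime(2015, 4, 5, 1, 45)

# With no flag, comparing the raw bytes agrees with comparing the datetimes.
plain_order_ok = (pack(a) < pack(b)) == (a < b)

# With the flag set on the earlier wall time, the byte comparison flips:
# 01:30 now sorts after 01:45 within the same hour.
flagged_order_broken = pack(a, flag=True) > pack(b)
```

Whether the flipped order is "right" depends on how much later the second occurrence is, which the flag alone does not say.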

> ...
> I agree with dropping the part about comparisons, since that will
> no longer be true for datetimes with tzstrict time zones if that part
> of the discussion comes to fruition.

Yup!  It's a big step in the right direction if we can just get
ambiguous local->utc conversions working correctly first.

