[Datetime-SIG] Implementing tzinfo for all valid datetimes (was Re: PEP-431/495)

Tim Peters tim.peters at gmail.com
Mon Aug 24 04:49:14 CEST 2015


[Tim]
> Let me be clearer about this.  I appreciate that Olson-general
> timezones are a PITA to implement both compactly and efficiently.
> ...

Looks like I need to elaborate on that.  It could well be that I'm
using pytz incorrectly, but best I can tell it only handles a
relatively small range of Python datetimes.  Here using:

    >>> import pytz
    >>> pytz.__version__
    '2015.4'

under Python 3.4.3, on Windows 10 Pro.

    from pytz import timezone
    from datetime import datetime

    def tostr(dt):
        return dt.strftime("%Y-%m-%d %H:%M:%S %Z%z")

    uz = timezone("utc")
    ez = timezone("US/Eastern")
    u = uz.localize(datetime(2015, 8, 23, 20))

I don't much care what time I'm starting with - it happens to be
"today" as I type, but the only thing of interest is that it's firmly
in US/Eastern daylight time (nowhere near any transitions).

So let's check:

    for i in range(30):
        u2 = u.replace(year=u.year + i)
        e = ez.normalize(u2.astimezone(ez))
        print(tostr(u2), "is", tostr(e))

giving:

2015-08-23 20:00:00 UTC+0000 is 2015-08-23 16:00:00 EDT-0400
2016-08-23 20:00:00 UTC+0000 is 2016-08-23 16:00:00 EDT-0400
2017-08-23 20:00:00 UTC+0000 is 2017-08-23 16:00:00 EDT-0400
2018-08-23 20:00:00 UTC+0000 is 2018-08-23 16:00:00 EDT-0400
2019-08-23 20:00:00 UTC+0000 is 2019-08-23 16:00:00 EDT-0400
2020-08-23 20:00:00 UTC+0000 is 2020-08-23 16:00:00 EDT-0400
2021-08-23 20:00:00 UTC+0000 is 2021-08-23 16:00:00 EDT-0400
2022-08-23 20:00:00 UTC+0000 is 2022-08-23 16:00:00 EDT-0400
2023-08-23 20:00:00 UTC+0000 is 2023-08-23 16:00:00 EDT-0400
2024-08-23 20:00:00 UTC+0000 is 2024-08-23 16:00:00 EDT-0400
2025-08-23 20:00:00 UTC+0000 is 2025-08-23 16:00:00 EDT-0400
2026-08-23 20:00:00 UTC+0000 is 2026-08-23 16:00:00 EDT-0400
2027-08-23 20:00:00 UTC+0000 is 2027-08-23 16:00:00 EDT-0400
2028-08-23 20:00:00 UTC+0000 is 2028-08-23 16:00:00 EDT-0400
2029-08-23 20:00:00 UTC+0000 is 2029-08-23 16:00:00 EDT-0400
2030-08-23 20:00:00 UTC+0000 is 2030-08-23 16:00:00 EDT-0400
2031-08-23 20:00:00 UTC+0000 is 2031-08-23 16:00:00 EDT-0400
2032-08-23 20:00:00 UTC+0000 is 2032-08-23 16:00:00 EDT-0400
2033-08-23 20:00:00 UTC+0000 is 2033-08-23 16:00:00 EDT-0400
2034-08-23 20:00:00 UTC+0000 is 2034-08-23 16:00:00 EDT-0400
2035-08-23 20:00:00 UTC+0000 is 2035-08-23 16:00:00 EDT-0400
2036-08-23 20:00:00 UTC+0000 is 2036-08-23 16:00:00 EDT-0400
2037-08-23 20:00:00 UTC+0000 is 2037-08-23 16:00:00 EDT-0400
2038-08-23 20:00:00 UTC+0000 is 2038-08-23 15:00:00 EST-0500
2039-08-23 20:00:00 UTC+0000 is 2039-08-23 15:00:00 EST-0500
2040-08-23 20:00:00 UTC+0000 is 2040-08-23 15:00:00 EST-0500
2041-08-23 20:00:00 UTC+0000 is 2041-08-23 15:00:00 EST-0500
2042-08-23 20:00:00 UTC+0000 is 2042-08-23 15:00:00 EST-0500
2043-08-23 20:00:00 UTC+0000 is 2043-08-23 15:00:00 EST-0500
2044-08-23 20:00:00 UTC+0000 is 2044-08-23 15:00:00 EST-0500

Oops!  Somewhere around 2037-2038 it apparently lost all knowledge of
US/Eastern daylight time.  I expect this is why:

    >>> ez._utc_transition_times[-1]
    datetime.datetime(2037, 11, 1, 6, 0)

That is, the last transition it knows about is the end of daylight time in 2037.

In general, as I understand it, Olson-derived tzfiles reduce most
calculation to uniform binary search across a precomputed, exhaustive,
sorted list of transition instants, at the expense of needing to store
that exhaustive list.  It buys some speed and much client-code
simplicity at the cost of client-side data space.
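To make that concrete, here's a minimal sketch of the lookup as I understand it.  The table is hand-made for illustration (four US/Eastern transitions, not read from a real tzfile), and the names TRANSITIONS, INFO, BEFORE_FIRST, lookup and utc_to_local are made up for this example; this is not pytz's actual code.

```python
from bisect import bisect_right
from datetime import datetime, timedelta

# Hypothetical hand-made table: sorted UTC transition instants, each paired
# with the (UTC offset, abbreviation) in effect from that instant onward.
TRANSITIONS = [
    datetime(2015, 3, 8, 7, 0),    # 02:00 EST -> EDT begins
    datetime(2015, 11, 1, 6, 0),   # 02:00 EDT -> EST resumes
    datetime(2016, 3, 13, 7, 0),
    datetime(2016, 11, 6, 6, 0),
]
INFO = [
    (timedelta(hours=-4), "EDT"),
    (timedelta(hours=-5), "EST"),
    (timedelta(hours=-4), "EDT"),
    (timedelta(hours=-5), "EST"),
]
# Assumed rule for instants before the first listed transition.
BEFORE_FIRST = (timedelta(hours=-5), "EST")

def lookup(utc_naive):
    """Binary search for the last transition at or before utc_naive."""
    i = bisect_right(TRANSITIONS, utc_naive)
    return BEFORE_FIRST if i == 0 else INFO[i - 1]

def utc_to_local(utc_naive):
    """Return (naive local datetime, abbreviation) for a naive UTC datetime."""
    offset, abbr = lookup(utc_naive)
    return utc_naive + offset, abbr

print(utc_to_local(datetime(2015, 8, 23, 20)))
print(utc_to_local(datetime(2038, 8, 23, 20)))
```

Note that anything past the end of the table just inherits whatever the last entry says, forever.  In this sketch that's EST, which is the same symptom as in the output above.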

    >>> len(ez._utc_transition_times)
    237

But at least this version of tzfile doesn't store all that many.  It
would require over 15,000 entries to extend this way of doing it
through year 9999 (where Python's datetime ends).
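The arithmetic behind that estimate, as a back-of-the-envelope sketch.  It assumes the current US pattern of two transitions per year continues unchanged after 2037, which is of course only a guess; the constant names are made up here.

```python
from datetime import MAXYEAR  # 9999

LAST_LISTED_YEAR = 2037    # year of the last explicit transition shown above
TRANSITIONS_PER_YEAR = 2   # assumption: one "spring forward", one "fall back"
LISTED_NOW = 237           # len(ez._utc_transition_times) shown above

# Entries needed to cover 2038 through MAXYEAR, and the resulting table size.
extra = TRANSITIONS_PER_YEAR * (MAXYEAR - LAST_LISTED_YEAR)
total = LISTED_NOW + extra
print(extra, total)
```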

Does that really matter?  I don't know.  None of this matters much to
me ;-)  But a scheme that's striving to be anally correct about
precise seconds for years a century ago in places nobody ever heard of
;-) should really try to make a reasonable guess about years just a
few decades from now.

Digging deeper, I don't think I can pin this on tzfile.  The docs say
that, if possible, a tzfile also contains a POSIX-TZ-style rule to be
used for times beyond the last explicit transition instant.  In the
US/Eastern tzfile shipped with this version of pytz, that's:

    EST5EDT,M3.2.0,M11.1.0

So a "complete" wrapping of zoneinfo also requires implementing such
rules when present.  And then we're back where I started:  the puzzle
of how to do so both efficiently and compactly.  It won't be that long
before explicit transition lists ending in 2037 will be useless for
almost all real-life purposes :-(
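For what it's worth, here's a rough sketch of evaluating that particular rule.  It doesn't parse the TZ string: the M3.2.0 and M11.1.0 fields are hard-coded, and it assumes the POSIX default of 02:00 local time for both transitions since the string gives no explicit time.  The function names are made up for illustration, and it makes no attempt to be compact or fast.

```python
import calendar
from datetime import date, datetime, time, timedelta

STD = timedelta(hours=-5)   # EST5
DST = timedelta(hours=-4)   # EDT, defaulting to one hour ahead of standard

def posix_m_rule_date(year, month, week, weekday):
    """Date for a POSIX Mm.w.d rule: weekday 0 = Sunday, week 5 = last."""
    py_weekday = (weekday - 1) % 7           # Python: Monday = 0 ... Sunday = 6
    first = date(year, month, 1)
    days_to_first_hit = (py_weekday - first.weekday()) % 7
    day = 1 + days_to_first_hit + 7 * (week - 1)
    if day > calendar.monthrange(year, month)[1]:
        day -= 7                             # "week 5" means the last occurrence
    return date(year, month, day)

def eastern_transitions_utc(year):
    """(DST start, DST end) as naive UTC datetimes for EST5EDT,M3.2.0,M11.1.0."""
    start_local = datetime.combine(posix_m_rule_date(year, 3, 2, 0), time(2))
    end_local = datetime.combine(posix_m_rule_date(year, 11, 1, 0), time(2))
    # The start is expressed in standard time, the end in daylight time.
    return start_local - STD, end_local - DST

def eastern_offset(utc_naive):
    """UTC offset in effect at a naive UTC datetime, per the rule alone."""
    start_utc, end_utc = eastern_transitions_utc(utc_naive.year)
    return DST if start_utc <= utc_naive < end_utc else STD

print(eastern_transitions_utc(2037))
print(eastern_offset(datetime(2038, 8, 23, 20)))
```

For 2037 the computed end of daylight time is 2037-11-01 06:00 UTC, matching the last explicit transition shown earlier, and for 2038-08-23 20:00 UTC the rule gives -4 hours (EDT) rather than EST.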

There's one trick that could be used as a compromise:  things like
"second Sunday in March" give exactly the same result (like "the
second Sunday in March is March 9") for years 400 apart.  That's
because the proleptic Gregorian calendar repeats with a period of
exactly 400 years (146,097 days, which is exactly 20,871 weeks):
every 400-year span starting at date D looks exactly the same as the
400-year span starting at date D+400*i, for all integer i, where "+"
is interpreted as "add to the year component".  So an exhaustive list
covering any one 400-year span suffices to do those kinds of
calculations for all years (add or subtract multiples of 400 to/from
the year until hitting a year in the canonical 400-year span).
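A small sketch of that reduction, using Python's own date type (which is proleptic Gregorian).  The choice of 2000 through 2399 as the canonical span is arbitrary, and the names here are made up for illustration.

```python
from datetime import date

CANON_START = 2000   # hypothetical choice; any 400-year span would do

def canonical_year(year):
    """Map any year to the equivalent year in [CANON_START, CANON_START + 399]."""
    return CANON_START + (year - CANON_START) % 400

def second_sunday_in_march(year):
    """Day of the month of the second Sunday in March of the given year."""
    first = date(year, 3, 1)
    days_to_sunday = (6 - first.weekday()) % 7   # Python: Sunday = 6
    return 1 + days_to_sunday + 7

# Every year datetime supports agrees with its canonical counterpart.
for y in range(1, 10000):
    assert second_sunday_in_march(y) == second_sunday_in_march(canonical_year(y))

print(canonical_year(9999), second_sunday_in_march(2014))
```

So a table keyed by canonical year would need to cover only 400 years' worth of rule-derived transitions, however far out the query is.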

