[Python-Dev] Adding a tzidx cache to datetime

Tue May 7 15:42:56 EDT 2019

Greetings all,

I have one last feature request that I'd like added to datetime for
Python 3.8, and this one I think could use some more discussion: the
addition of a "time zone index cache" to the /datetime/ object. The
rationale is laid out in detail in bpo-35723
<https://bugs.python.org/issue35723>. The general problem is that
currently, /every/ invocation of utcoffset, tzname and dst needs to do
full, independent calculations of the time zone offsets, even for time
zones where the mapping is guaranteed to be stable because datetimes are
immutable. I have a proof of concept implementation: PR #11529
<https://github.com/python/cpython/pull/11529/>.

I'm envisioning that the `datetime` class will add a private `_tzidx`
single-byte member (it seems that this does not increase the size of the
datetime object, because it's just using an unused alignment byte).
`datetime` will also add a `tzidx()` method, which will return `_tzidx`
if it's been set; otherwise it will call `self.tzinfo.tzidx()`. If
`self.tzinfo.tzidx()` returns a number between 0 and 254 (inclusive),
`datetime` stores that value in `_tzidx`. Either way, `tzidx()` then
returns whatever `self.tzinfo.tzidx()` returned.
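
To make the lookup order concrete, here is a rough pure-Python sketch of
the behaviour described above. It is not the code in the PR: the names
`CachingDatetime`, `CountingZone` and `_UNSET` are made up for
illustration, the cache lives in an ordinary instance attribute rather
than a spare byte in the C struct, and I am assuming that
`tzinfo.tzidx()` receives the datetime as an argument the same way
`utcoffset()` does.

```python
from datetime import datetime, timedelta, tzinfo

_UNSET = 255  # hypothetical sentinel: the one byte value meaning "nothing cached"


class CachingDatetime(datetime):
    """Pure-Python stand-in for the proposed C-level change (hypothetical)."""

    def tzidx(self):
        cached = getattr(self, "_tzidx", _UNSET)
        if cached != _UNSET:
            return cached              # cache hit: the tzinfo is not consulted
        if self.tzinfo is None:
            return None                # assumption: naive datetimes have no index
        idx = self.tzinfo.tzidx(self)  # assumption: dt is passed, like utcoffset(dt)
        if isinstance(idx, int) and 0 <= idx <= 254:
            self._tzidx = idx          # only values that fit in the byte are cached
        return idx


class CountingZone(tzinfo):
    """Toy tzinfo that records how often tzidx() is actually computed."""

    def __init__(self):
        self.calls = 0

    def tzidx(self, dt):
        self.calls += 1
        return 3

    def utcoffset(self, dt):
        return timedelta(0)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "UTC"


zone = CountingZone()
dt = CachingDatetime(2019, 5, 7, 15, 42, tzinfo=zone)
first = dt.tzidx()   # computed by the tzinfo and stored on the datetime
second = dt.tzidx()  # served from the datetime itself
```

After both calls, `zone.calls` is 1: the second lookup never reaches the
tzinfo.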

The value of this is that as far as I can tell, nearly all non-trivial
tzinfo implementations construct a list of possible offsets, and
implement utcoffset(), tzname() and dst() by calculating an index into
that list and returning the entry at that index. There are almost always
fewer than 255 distinct offsets. By adding this cache /on the datetime/,
we're using a
small amount of currently-unused memory to prevent unnecessary
calculations about a given datetime. The feature is entirely opt-in, and
has no downsides if it goes unused, and it makes it possible to write
tzinfo implementations that are both lazy and as fast as the "eager
calculation" mode that pytz uses (and that causes many problems for
pytz's users).
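
As a minimal sketch of the "list of offsets plus an index" pattern
described above, here is a toy tzinfo. Everything in it is invented for
illustration (`TableTZ`, the `Offset` tuple, and the month-based rule are
not taken from any real library), and the `getattr` check is only one
way a tzinfo might opt in to a `tzidx()` method on the datetime if such
a method existed. It assumes it is only ever attached to datetime
objects, not to time objects.

```python
from collections import namedtuple
from datetime import datetime, timedelta, tzinfo

# Hypothetical record type: one row per distinct offset the zone can have.
Offset = namedtuple("Offset", ["utcoffset", "dst", "tzname"])


class TableTZ(tzinfo):
    """Toy zone: "summer" runs April through September, for illustration only."""

    _offsets = [
        Offset(timedelta(hours=-5), timedelta(0), "STD"),
        Offset(timedelta(hours=-4), timedelta(hours=1), "DST"),
    ]

    def tzidx(self, dt):
        # In a real zone this is the expensive transition lookup.
        return 1 if 4 <= dt.month <= 9 else 0

    def _find(self, dt):
        # Use the datetime's cached index when the proposed method is
        # available; otherwise compute the index directly.
        getter = getattr(dt, "tzidx", None)
        idx = getter() if getter is not None else self.tzidx(dt)
        return self._offsets[idx]

    def utcoffset(self, dt):
        return self._find(dt).utcoffset

    def dst(self, dt):
        return self._find(dt).dst

    def tzname(self, dt):
        return self._find(dt).tzname


summer = datetime(2019, 5, 7, 15, 42, tzinfo=TableTZ())
winter = datetime(2019, 1, 7, 15, 42, tzinfo=TableTZ())
```

All three methods funnel through the same index, which is why caching
that one small integer on the datetime would be enough to skip the
lookup for each of them.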

I have explored the idea of using an lru cache of some sort on the
tzinfo object itself, but there are two problems with this:

1. Calculating the hash of a datetime calls .utcoffset(), so a cache
keyed on the datetime would call back into the very method it is trying
to cache. To avoid that it is necessary to, at minimum, do a `replace`
on the datetime, and constructing a new datetime is a pretty
considerable speed hit.

2. It will be a much bigger memory cost, whereas my current proposal
uses approximately zero additional memory (not sure if the alignment
stuff is platform-dependent or something, but it doesn't use additional
memory on my Linux computer).
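
The first problem is easy to observe from Python. The snippet below is
only a demonstration, with a made-up `OffsetCountingTZ` class: it counts
calls to utcoffset() and shows that hashing an aware datetime triggers
one, while hashing a naive copy made with `replace` does not.

```python
from datetime import datetime, timedelta, tzinfo


class OffsetCountingTZ(tzinfo):
    """Toy fixed-offset zone that counts utcoffset() calls (hypothetical)."""

    def __init__(self):
        self.utcoffset_calls = 0

    def utcoffset(self, dt):
        self.utcoffset_calls += 1
        return timedelta(hours=-5)

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return "EST"


tz = OffsetCountingTZ()
aware = datetime(2019, 5, 7, 15, 42, tzinfo=tz)

hash(aware)  # hashing an aware datetime needs its UTC offset
calls_after_aware_hash = tz.utcoffset_calls

naive_key = aware.replace(tzinfo=None)  # builds a whole new datetime
hash(naive_key)                         # no tzinfo, so no utcoffset() call
calls_after_naive_hash = tz.utcoffset_calls
```

So any dict-style cache on the tzinfo has to pay for the `replace` on
every lookup just to get a key that is safe to hash.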

I realize this proposal is somewhat difficult to wrap your head around,
so if anyone would like to chat with me about it in person, I'll be at
PyCon sprints until Thursday morning.

Best,
Paul
