hashability of tzinfo objects
This is tied in with the equality semantics of these objects, has come up here, in the dateutil tracker, and in pandas:

https://mail.python.org/archives/list/datetime-sig@python.org/thread/45P3EXY3OJM56MJJH57VJ7NZEBXG7HG4/
https://github.com/dateutil/dateutil/issues/835
https://github.com/dateutil/dateutil/issues/792
https://github.com/pandas-dev/pandas/pull/24006#discussion_r238483612

My understanding is that hashing cannot be implemented until/unless equality is changed. Is that accurate? Is there a compelling reason for these _not_ to be hashable?
"tzinfo" objects can be hashable or not depending on the implementation, since "tzinfo" is an abstract base class. In fact, `datetime.timezone.utc` and `datetime.timezone` objects are *already* hashable.

There are several reasons why `dateutil` has not made its tzinfo objects hashable, and they depend on the particular time zone class:

1. `dateutil.tz.tzoffset` is the one that would be easiest to make hashable, /but/ the hashing semantics may not be what you want them to be, because `tz.tzoffset("CST", timedelta(hours=-4)) == tz.tzoffset("EDT", timedelta(hours=-4))` returns True. As such, those two objects must have the same hash. `tz.UTC` is a special case of `tz.tzoffset`.

2. The data required to determine whether two `dateutil.tz.tzlocal` objects are "the same" is not even available to the Python layer, since `tzlocal` dynamically queries the system time functions whenever you try to resolve an offset with it. This leads to a host of subtle breakages that it may not be possible to fix. It also means `tzlocal` objects are kinda-sorta mutable in some ways.

3. `dateutil.tz.tzfile` and, to a lesser extent, `dateutil.tz.tzical` can both have a fairly large amount of information backing them. A hash operation that actually uses all this information might be expensive, and one that doesn't use all the information might be wrong.

Other than `tz.tzlocal`, these are all fairly surmountable barriers, but honestly I have never seen a particularly /good/ example of a reason to hash tzinfo objects, so I have never felt that it should be a particularly high priority for dateutil.

Most of this is basically off-topic for this list, though.

On 4/15/19 7:51 PM, Brock Mendel wrote:
This is tied in with the equality semantics of these objects, has come up here, in the dateutil tracker, and in pandas:
https://mail.python.org/archives/list/datetime-sig@python.org/thread/45P3EXY3OJM56MJJH57VJ7NZEBXG7HG4/
https://github.com/dateutil/dateutil/issues/835
https://github.com/dateutil/dateutil/issues/792
https://github.com/pandas-dev/pandas/pull/24006#discussion_r238483612
My understanding is that hashing cannot be implemented until/unless equality is changed. Is that accurate? Is there a compelling reason for these _not_ to be hashable?
_______________________________________________ Datetime-SIG mailing list -- datetime-sig@python.org To unsubscribe send an email to datetime-sig-leave@python.org https://mail.python.org/mailman3/lists/datetime-sig.python.org/ The PSF Code of Conduct applies to this mailing list: https://www.python.org/psf/codeofconduct/
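To make the `tz.tzoffset` point in the reply above concrete, here is a minimal sketch using only the standard library. `OffsetOnly` is a hypothetical stand-in written for illustration (it is not dateutil's implementation); it assumes, as described for `tz.tzoffset`, that equality ignores the zone name. Once equality is defined that way, any hash must ignore the name too, because equal objects are required to hash equal.

```python
from datetime import timedelta, timezone, tzinfo


class OffsetOnly(tzinfo):
    """Hypothetical fixed-offset zone whose equality ignores the name."""

    def __init__(self, name, offset):
        self._name = name
        self._offset = offset

    def utcoffset(self, dt):
        return self._offset

    def dst(self, dt):
        return timedelta(0)

    def tzname(self, dt):
        return self._name

    def __eq__(self, other):
        if not isinstance(other, OffsetOnly):
            return NotImplemented
        return self._offset == other._offset

    # A class that defines __eq__ without __hash__ becomes unhashable.
    # Equal objects must hash equal, so only the offset may take part.
    def __hash__(self):
        return hash(self._offset)


cst = OffsetOnly("CST", timedelta(hours=-4))
edt = OffsetOnly("EDT", timedelta(hours=-4))

# The stdlib fixed-offset zones are already hashable.
stdlib_hashable = isinstance(hash(timezone.utc), int)

# Differently named zones with the same offset compare and hash equal.
same_hash = (cst == edt) and (hash(cst) == hash(edt))
print(stdlib_hashable, same_hash)
```

Whether that is the hash you actually want is exactly the open question: under these semantics the name can never distinguish two entries in a dict or set.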
In a sibling response, I explained why hashing tzinfo objects may be a bad idea in dateutil, but one thing that I think is on topic for this list was brought up in this comment on the dateutil tracker <https://github.com/dateutil/dateutil/issues/792#issuecomment-409701152>: as it stands now, `datetime.timezone` object equality and hashing both ignore the offset name, which means:
    >>> timezone(timedelta(hours=-4), "EDT") == timezone(timedelta(hours=-4), "CST")
    True
It also means any 0-offset zone is equal to `timezone.utc`. I find at least the first behavior somewhat surprising, as I consider the offset to be only one facet of the tzinfo, even for fixed-offset zones. For the (possibly dubious) use case mentioned in dateutil issue #792 (linked in the original post) of storing `tzoffset` objects in a set, I think this will be extra surprising, since it will collapse all time zones with the same offset into one entry.

I think it might be asking a lot to actually make a possibly backwards-incompatible change to fix this if it is not seen as desirable, but for those of us designing our own time zone offsets, I am curious to know what the reasoning for this was, and whether it is still seen as desirable.

Best,
Paul

On 4/15/19 7:51 PM, Brock Mendel wrote:
This is tied in with the equality semantics of these objects, has come up here, in the dateutil tracker, and in pandas:
https://mail.python.org/archives/list/datetime-sig@python.org/thread/45P3EXY3OJM56MJJH57VJ7NZEBXG7HG4/
https://github.com/dateutil/dateutil/issues/835
https://github.com/dateutil/dateutil/issues/792
https://github.com/pandas-dev/pandas/pull/24006#discussion_r238483612
My understanding is that hashing cannot be implemented until/unless equality is changed. Is that accurate? Is there a compelling reason for these _not_ to be hashable?
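The stdlib behavior described in this message can be checked directly, and the set-collapsing effect is easy to reproduce. The sketch below uses only `datetime.timezone`; the helper `name_aware_key` is a hypothetical workaround introduced here for illustration, not something proposed in the thread, and it assumes you are happy to key on the (offset, name) pair yourself.

```python
from datetime import timedelta, timezone

edt = timezone(timedelta(hours=-4), "EDT")
cst = timezone(timedelta(hours=-4), "CST")
gmt = timezone(timedelta(0), "GMT")

# Equality and hashing both ignore the name.
names_ignored = (edt == cst) and (hash(edt) == hash(cst))

# Any zero-offset zone compares equal to timezone.utc.
zero_is_utc = gmt == timezone.utc

# Storing the zones in a set collapses them into a single entry.
collapsed = {edt, cst}


def name_aware_key(tz):
    """Hypothetical key that keeps the name: (offset, name)."""
    return (tz.utcoffset(None), tz.tzname(None))


# Keying on (offset, name) keeps the two zones distinct.
distinct = {name_aware_key(tz) for tz in (edt, cst)}
print(names_ignored, zero_is_utc, len(collapsed), len(distinct))
```

Using an explicit key like this sidesteps the equality semantics without requiring any change to `timezone` itself, at the cost of every caller having to remember to do it.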
participants (2)
- Brock Mendel
- Paul Ganssle