[Datetime-SIG] Another round on error-checking

Tim Peters tim.peters at gmail.com
Wed Sep 9 08:34:48 CEST 2015


[ijs]
> I stop following for the week and the world goes mad. I've
> lost count of the number of times I've thought, "Are you
> out of your *mind*!?" while reading this thread. You actually
> considered breaking the __hash__ invariant?

It went unnoticed for some time that the original PEP 495 _did_ break
it.  Not intentionally.  "Unintended consequence."

Alex resisted accepting that it was a fatal problem at first, but was
converted to One Of Us after a single night's intense torture ;-)

...

> I'm assuming that the moment of temporary insanity has
> passed and you consider the __hash__ invariant to be sacrosanct.

Of course!


> The problem here is that someone (Alexander, I think?)
> demonstrated a method of producing a tzinfo class and b
> and c to make this true, *given arbitrary a and d*. Equality
> may not be transitive, but equality of hashes is, which
> means that __hash__ must be constant over equivalence
> classes in the transitive closure of the relation defined by
> __eq__. In this case, this boils down to "if __hash__ ignores
> fold, all datetime objects must have the same hash".

Alex also sketched an approach to constructing a far higher-quality
hash (than a constant function), but it required having, in advance
(of the first hash() call), all tzinfos that could possibly be used
across a program's run.

For example, if we knew in advance there was only one possible
non-fixed-offset zone Z, hash(x) could convert x to zone Z, then
convert the result of that (ignoring its `fold`) to a timestamp (as a
timedelta object) relative to 0001-01-01 00:00:00 in Z, then hash the
timestamp.  Then all spellings in all zones of one of the times in a Z
fold would have the same hash.
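
Here is a minimal sketch of that idea. It rests on stated assumptions. `ToyEastern` is a hypothetical single-transition zone modeled on the 2015 US/Eastern fall-back: at 06:00 UTC on 2015-11-01, clocks go from 02:00 EDT back to 01:00 EST. `z_hash` is a hypothetical name for the hash described above, and it takes Z as an explicit argument. The sketch needs a Python with PEP 495's `fold` attribute (3.6 or later).

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical toy zone: a single fall-back transition at 2015-11-01 06:00 UTC,
# when local clocks go from 02:00 (UTC-4) back to 01:00 (UTC-5).
EDT, EST = timedelta(hours=-4), timedelta(hours=-5)
SWITCH_UTC = datetime(2015, 11, 1, 6)   # naive UTC instant of the transition
FOLD_START = datetime(2015, 11, 1, 1)   # naive local wall times bounding
FOLD_END = datetime(2015, 11, 1, 2)     # the repeated (folded) hour

class ToyEastern(tzinfo):
    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < FOLD_START:
            return EDT
        if wall >= FOLD_END:
            return EST
        return EST if dt.fold else EDT  # ambiguous hour: fold picks the offset

    def fromutc(self, dt):
        utc = dt.replace(tzinfo=None)
        if utc < SWITCH_UTC:
            return (utc + EDT).replace(tzinfo=self)
        local = utc + EST
        # The first post-transition hour repeats a wall time: mark it fold=1.
        return local.replace(tzinfo=self, fold=int(local < FOLD_END))

EPOCH = datetime(1, 1, 1)  # 0001-01-01 00:00:00, naive

def z_hash(x, zone):
    """Hypothetical hash: x's wall-clock time in `zone`, fold ignored,
    as a timedelta from 0001-01-01 00:00:00 in that zone."""
    wall = x.astimezone(zone).replace(tzinfo=None)
    return hash(wall - EPOCH)  # naive subtraction ignores fold

Z = ToyEastern()
first = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)   # 01:30 EDT in Z
second = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)  # 01:30 EST in Z
spellings = [
    first,
    second,
    first.astimezone(Z),                             # fold=0
    second.astimezone(Z),                            # fold=1
    first.astimezone(timezone(timedelta(hours=9))),  # a fixed-offset zone
]
print({z_hash(x, Z) for x in spellings})  # a set holding a single hash value
```

The catch is the one described above. `zone` is fixed before the first hash. A tzinfo created later, whose fold falls at a different moment, could again give equal datetimes different hashes.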

It's clever, but I can't see a way to make it practical.  There's
nothing, e.g., to stop code from building a brand new tzinfo as a big
string containing Python code, and compiling the string at runtime.


> I imagine the performance implications of this are not acceptable.

Heh.  We could try a constant hash function and see whether anyone
noticed.  That would be fun :-)


> There is no satisfactory way of weaseling out of this;

_Something_ has to give, yes.  "Satisfactory" is Guido's call.
Weaseling is our job.  I already did a small test to convince myself
people _would_ notice if we removed dicts from the language.  They're
the real source of this problem ;-)


> datetime equality is timeline equality now and forever, unless
> you're willing to give up one of backward compatibility, the
> __hash__ invariant, or the ability to implement new tzinfo classes.
> (The tzinfo in the example was contrived but not buggy.)

No tzinfo contrivance is necessary.  The hash problem in the original
PEP could be provoked using any zone whatsoever in which there's a
fold (like, say, US/Eastern).  I think you have in mind part of Alex's
sketch of a better-than-constant hash, where zones were indeed
contrived just to illustrate how nasty it _could_ get.
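
The sketch below makes that concrete. It uses a zone shaped like US/Eastern and shows how the draft's two rules chain distinct instants together. `ToyEastern` is again a hypothetical single-transition zone. `draft_eq` is a hypothetical model of the original draft's rules as described in this thread: intrazone comparison ignores `fold`, while interzone comparison goes through UTC, where `fold` does matter. It is an illustration, not the real `datetime.__eq__`.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical toy zone (shaped like US/Eastern's 2015 fall-back): local
# wall times 01:00-02:00 on 2015-11-01 occur twice, first at UTC-4, then UTC-5.
EDT, EST = timedelta(hours=-4), timedelta(hours=-5)
FOLD_START, FOLD_END = datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2)

class ToyEastern(tzinfo):
    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < FOLD_START:
            return EDT
        if wall >= FOLD_END:
            return EST
        return EST if dt.fold else EDT  # ambiguous hour: fold picks the offset

def to_utc(x):
    return x.astimezone(timezone.utc).replace(tzinfo=None)

def draft_eq(a, b):
    """Hypothetical model of the draft rules, not datetime.__eq__."""
    if a.tzinfo is b.tzinfo:
        # Intrazone: compare wall clocks only; fold is ignored.
        return a.replace(tzinfo=None, fold=0) == b.replace(tzinfo=None, fold=0)
    # Interzone: compare the UTC instants (utcoffset respects fold).
    return to_utc(a) == to_utc(b)

Z = ToyEastern()
first = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)
second = datetime(2015, 11, 1, 6, 30, tzinfo=timezone.utc)
early = datetime(2015, 11, 1, 1, 30, tzinfo=Z)         # fold=0 -> 05:30 UTC
late = datetime(2015, 11, 1, 1, 30, tzinfo=Z, fold=1)  # fold=1 -> 06:30 UTC

# first ~ early ~ late ~ second, yet first and second are an hour apart,
# so any hash consistent with draft_eq must give them the same value.
print(draft_eq(first, early), draft_eq(early, late), draft_eq(late, second))  # True True True
print(draft_eq(first, second))  # False
```

Any pair of UTC instants can be linked through folds in this way, which is how "hash ignores fold" collapses into a constant hash.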

Guido is least fond of by-magic interzone comparison, and that's what
we've been picking on.  All worm-arounds so far would sacrifice
trichotomy in some (or all) cases of "problem times", by declaring
that some problem times wouldn't compare equal to any datetime in any
other zone.

In the latest version of that, there would be no change to comparison
results so long as pre-495 tzinfos were used.  If you started to use
post-495 tzinfos, that's your choice:  then you get by-magic `fold`
set correctly in all cases, correct zone conversions in all cases, and
correct by-magic interzone subtraction in all cases - at the cost of
living with the fact that all problem times (whether in a gap or a fold)
would compare "not equal" to all datetimes in all other zones.

My own code couldn't care less (I've never used an interzone
comparison outside of lines in datetime's test suite).  You _could_
still compare them, but you'd either have to convert to a zone in
which they were not problem times (timezone.utc would always work for
this) first, or use by-magic interzone subtraction and check the sign
of the result.
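
The rule described above is essentially the one Python 3.6 adopted when it implemented PEP 495. An interzone `==` is False whenever either operand is a problem time under a fold-aware tzinfo, and `<` and `>` still compare instants. The sketch below assumes such a Python. It shows the lost trichotomy and both workarounds. `ToyEastern`, `same_instant`, and `instant_cmp` are hypothetical names used only for illustration.

```python
from datetime import datetime, timedelta, timezone, tzinfo

# Hypothetical toy zone: a fold on 2015-11-01, where local wall times
# 01:00-02:00 occur twice (first at UTC-4, then at UTC-5).
EDT, EST = timedelta(hours=-4), timedelta(hours=-5)
FOLD_START, FOLD_END = datetime(2015, 11, 1, 1), datetime(2015, 11, 1, 2)

class ToyEastern(tzinfo):
    def utcoffset(self, dt):
        wall = dt.replace(tzinfo=None)
        if wall < FOLD_START:
            return EDT
        if wall >= FOLD_END:
            return EST
        return EST if dt.fold else EDT  # ambiguous hour: fold picks the offset

def same_instant(a, b):
    """Workaround 1: convert both to UTC, where no time is a problem time."""
    return a.astimezone(timezone.utc) == b.astimezone(timezone.utc)

def instant_cmp(a, b):
    """Workaround 2: interzone subtraction, then the sign of the result.
    Only meaningful when a.tzinfo is not b.tzinfo: same-zone subtraction
    is wall-clock arithmetic and ignores utcoffset()."""
    delta = a - b
    return (delta > timedelta(0)) - (delta < timedelta(0))

Z = ToyEastern()
problem = datetime(2015, 11, 1, 1, 30, tzinfo=Z)              # in the fold, fold=0
utc_twin = datetime(2015, 11, 1, 5, 30, tzinfo=timezone.utc)  # the same instant

# A problem time is neither ==, <, nor > its UTC spelling.
print(problem == utc_twin, problem < utc_twin, problem > utc_twin)  # False False False
print(same_instant(problem, utc_twin), instant_cmp(problem, utc_twin))  # True 0
```

Times outside the fold are unaffected. Noon in `ToyEastern` on that day still compares equal to its UTC spelling.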

So, given that a user would have to "do something" to have even the
possibility of suffering a surprise that will probably never happen in
their life, "not satisfactory" isn't a slam dunk.  Luckily, PEP 20 is
crystal clear about the right decision in this case.

