[Python-ideas] Please reconsider the Boolean evaluation of midnight

Thu Mar 6 13:08:54 CET 2014

On 6 March 2014 21:04, M.-A. Lemburg <mal at egenix.com> wrote:
> Wait. Let's be clear on this:
>
> Writing
>
>     if x: print ('x is None')
>
> or
>
>     if x == None: print ('x is None')

The case in question is essentially this one:

    if x:
        assert x is not None # Always valid!
        ....
    else:
        assert x is None # Valid for most user defined types

There is a learned intuition that people naturally acquire when learning Python:

    - numbers may be false (it means zero)
    - containers may be false (it means empty)
    - everything else is always true (as that's the default for
      user-defined classes)

What makes datetime.time() confusing is that the expected behaviour
changes based on *which* category you put the object in.

The vast majority of Python's users will place structured date and
time objects in the "arbitrary object" category, and expect them to
always be true. When using None as a sentinel for such types, writing
"if x:" when you really mean "if x is not None:" is a harmless style
error, with no practical ill-effects.

When dealing with time objects, such users are unlikely to be crafting
test cases to ensure that "midnight UTC" is handled correctly, so
their "harmless style error" is in fact a subtle data driven bug
waiting to bite them. It is *really* hard for a static analyser to
pick this up, because at point of use "if x:" gives no information
about the type of "x", and hence any such alert would have an
unacceptably high false positive rate.
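
A minimal sketch of the explicit spelling, assuming a hypothetical
describe_alarm function (not taken from any real code base) that uses
None as its "not set" sentinel:

```python
import datetime as dt


def describe_alarm(alarm_time=None):
    """Hypothetical example: None means 'no alarm has been set'."""
    # Explicit identity check: only None is treated as "not set", so
    # midnight is handled like every other time of day, regardless of
    # how bool() treats time objects.
    if alarm_time is not None:
        return "alarm at " + alarm_time.isoformat()
    return "no alarm set"
```

Written this way, the result does not depend on whether the value
happens to be a midnight that evaluates as false.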

Now, let's consider the time with the *best* possible claim to being
false: timestamp zero. How does that behave?

Python 3.3:

>>> bool(dt.datetime.fromtimestamp(0))
True
>>> bool(dt.datetime.fromtimestamp(0).date())
True
>>> bool(dt.datetime.fromtimestamp(0).time())
True

Huh, if times are supposed to be valid as truth values, that looks
rather weird. What's going on?

>>> dt.datetime.fromtimestamp(0).time()
datetime.time(10, 0)

Oh, I'm in Brisbane - *of course* the truthiness of timestamps should
depend on my timezone! Clearly, what I really meant was timestamp
-36000, or perhaps 50400:

>>> bool(dt.datetime.fromtimestamp(-36000).time())
False
>>> bool(dt.datetime.fromtimestamp(50400).time())
False

So, unless I happen to live in UTC, it's highly unlikely that I'm
going to infer from Python's *behaviour* that datetime.time() (unlike
datetime.date() and datetime.datetime()) belongs in the "number"
category, rather than the "arbitrary object" category.
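
To reproduce the timezone dependence without relying on the machine's
local settings, here is a rough sketch that passes an explicit tz
argument. The "brisbane" name is only an illustrative fixed UTC+10
offset, not a full timezone definition:

```python
import datetime as dt

utc = dt.timezone.utc
# Assumption: a fixed UTC+10 offset standing in for Brisbane.
brisbane = dt.timezone(dt.timedelta(hours=10))

# Timestamp zero reads as midnight on the wall clock only in UTC.
epoch_in_utc = dt.datetime.fromtimestamp(0, tz=utc).time()
epoch_in_brisbane = dt.datetime.fromtimestamp(0, tz=brisbane).time()

# 50400 seconds (14 hours) later, the UTC+10 wall clock reads midnight.
later_in_brisbane = dt.datetime.fromtimestamp(50400, tz=brisbane).time()
```

Note that .time() drops the tzinfo, so all three results are naive time
objects; which of them lands on 00:00 depends purely on the offset.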

Perhaps it behaves like a number in other ways:

>>> utcmidnight = dt.datetime.fromtimestamp(50400).time()
>>> utcmidnight + 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for +: 'datetime.time' and 'int'
>>> utcmidnight * 1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported operand type(s) for *: 'datetime.time' and 'int'
>>> int(utcmidnight)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: int() argument must be a string or a number, not 'datetime.time'

Hmm, nope. And that last one *explicitly* tells me it's not a number!

There's a great saying in the usability world: "You can't document
your way out of a usability problem". What it means is that if all the
affordances of your application (or programming language!) push users
towards a particular logical conclusion (in this case, "datetime.time
values are not numbers"), having a caveat in your documentation isn't
going to help, because people aren't even going to think to ask the
question. It doesn't matter if you originally had a good reason for
the behaviour, you've ended up in a place where your behaviour is
confusing and inconsistent, because there is one piece of behaviour
that is out of line with an otherwise consistent mental model.

But perhaps I've been told "midnight is false in boolean context". But
which midnight? There are three that apply to me:

>>> naivemidnight
datetime.time(0, 0)
>>> utcmidnight
datetime.time(0, 0, tzinfo=datetime.timezone.utc)
>>> localmidnight
datetime.time(0, 0, tzinfo=datetime.timezone(datetime.timedelta(0, 36000)))

Are they all False? No, no they're not (unless your local timezone is UTC):

>>> bool(utcmidnight)
False
>>> bool(naivemidnight)
False
>>> bool(localmidnight)
True

There's a phrase for APIs like this one: "expert friendly". Experts
like a particular behaviour because it lets them do advanced things
(like leave the door open for modular arithmetic on datetime.time()
values). However, that's not normally something we see as a virtue
when designing APIs for Python - instead, we generally aim for layered
complexity, where simple things are simple, and we provide power tools
for advanced users that want them.

Now, suppose this boolean behaviour went away. How would I detect if a
value was midnight or not? Well, first, I would need to clarify my
question. Do I mean midnight UTC? Or do I mean midnight local time? It
makes a difference for aware objects, after all. Or perhaps I'm not
interested in aware objects at all, and only care about naive
midnight. Using appropriately named values like those I defined above,
those questions are all very easy to express explicitly in ways that
don't rely on readers understanding that "bool(x)" on a datetime.time
object means the same thing as "x is naive midnight or UTC midnight":

    if x not in (naivemidnight, utcmidnight):
        # This is the current bool(x)
        ....
    if x != naivemidnight:
        # This has no current shorthand
        ....
    if x != utcmidnight:
        # This has no current shorthand
        ....
    if x != localmidnight:
        # This has no current shorthand
        ....

And just to demonstrate that equivalence is accurate:

>>> local_10am = localmidnight.replace(hour=10)
>>> local_10am
datetime.time(10, 0, tzinfo=datetime.timezone(datetime.timedelta(0, 36000)))
>>> bool(local_10am)
False
>>> local_10am != utcmidnight
False
>>> local_10am not in (naivemidnight, utcmidnight)
False
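
For anyone wanting to try the explicit comparisons, the named values
can be built along these lines. This is a sketch under assumptions: the
constructor calls are inferred from the reprs shown earlier, UTC+10 is
used as the local offset, and is_naive_or_utc_midnight is a
hypothetical helper name:

```python
import datetime as dt

# Assumed definitions, reconstructed from the reprs shown above.
naivemidnight = dt.time(0, 0)
utcmidnight = dt.time(0, 0, tzinfo=dt.timezone.utc)
# UTC+10 is the local offset used in this example; adjust as needed.
localmidnight = dt.time(0, 0, tzinfo=dt.timezone(dt.timedelta(hours=10)))


def is_naive_or_utc_midnight(x):
    """Hypothetical helper: explicit test for naive or UTC midnight."""
    # Aware times compare by their UTC-adjusted value, while a naive
    # and an aware time simply compare unequal.
    return x in (naivemidnight, utcmidnight)


# 10:00 at UTC+10 is the same instant of the day as 00:00 UTC.
local_10am = localmidnight.replace(hour=10)
```

The comparisons spell out which midnight is meant, so they do not
depend on how bool() treats time objects.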

While it was originally the desire to reduce the impact of a very
common bug that prompted me to reopen the issue, it is the improved
consistency in the behaviour presented to users of Python 3.6+ that
makes me believe this is actually worth fixing. We embarked on the
whole Python 3 exercise in the name of making the language easier to
learn by removing legacy features and fixing some problematic defaults
and design warts - this is a tiny tweak by comparison, and will still
go through the full normal deprecation cycle (warning in 3.5,
behavioural change in 3.6).

Regards,
Nick.