[Python-ideas] Repurpose `assert' into a general-purpose check
Steven D'Aprano
steve at pearwood.info
Tue Nov 28 11:34:02 EST 2017
On Tue, Nov 28, 2017 at 10:11:46AM +0300, Ivan Pozdeev via Python-ideas wrote:
> I invite you to show me a single use case for those "assertions" because
> after ~20 years of experience in coding (that included fairly large
> projects), I've yet to see one.
I already mentioned not one but multiple use-cases for assertions:
- checked comments (assertions are documentation, not just code)
- checks of internal algorithm logic
- design-by-contract style pre- and post-conditions
- checking program invariants
- defensive programming against conditions that "cannot happen".
John Regehr wrote an excellent post on this from the perspective of a
systems programmer:
https://blog.regehr.org/archives/1091
(he even links to a post by our own Ned Batchelder) but many of his
use-cases for assert applies just as well to Python as C. Assertions
also work great with fuzzers:
http://www.squarefree.com/2014/02/03/fuzzers-love-assertions/
I've also written on assertions before:
https://import-that.dreamwidth.org/676.html
Assertions can also be used as a kind of "Continuous Testing" that
operates in debug builds (or in Python in __debug__ mode): every time
you run the code, it tests its own internal state by running the
asserts. As Regehr puts it:
"When we write an assertion, we are teaching a program to
diagnose bugs in itself."
If the code you have worked with needed none of these things, assertions
might not be useful to you. But that doesn't mean there are no use-cases
for assertions. Not everyone is lucky to work with code that is so
self-documenting that checked comments are redundant, or code so
obviously correct that there is no point to checking its internal state.
In your opening post, you gave a use-case for assertions:
"a check to only do in debug mode, when you can't yet trust your
code to manage and pass around internal data correctly."
Now you say there are none. Would you like to revise one of those
statements?
> Any, every check that you make at debug time either
> * belongs in production as well (all the more because it's harder to
> diagnose there), or
I'm not going to say that claim is entirely wrong. For example,
Microsoft's research project Midori used two kinds of assertions:
Debug.Assert only runs under debugging;
Release.Assert always runs
and it also promised that all contracts will either be checked at
runtime, or the compiler can prove that the contract is always satisfied
and so can skip the check. Otherwise they cannot be disabled.
http://joeduffyblog.com/2016/02/07/the-error-model/
But Midori was written in a custom systems language based on C#, and the
lessons from it probably don't apply directly to a rapid application
development language / scripting language like Python. In any case,
Midori's approach was rather unusual even compared to other systems
languages.
More conventionally, Eiffel (for example) allows the developer to enable
or disable contract checking. By default, pre-conditions are checked in
release builds and post-conditions and invariants are skipped, but each
one can be enabled or disabled individually. We have the choice to
choose faster code or more extensive error checking, depending on which
we value more.
I think that's an excellent tradeoff to have. Python currently only has
a very coarse switch that can only turn assertions on or off, but even
that coarse switch is better than nothing.
> * belongs in a test -- something coded independently from the program
> (if your code as a whole cannot be trusted, how any specific part of it
> can?), or
I love tests. I write unit tests and doc tests all the time. (Well,
except when I'm being lazy, when I only write doc tests.) But tests
cannot replace assertions. There are at least two problems:
- external tests don't have access to the function locals;
- and even if you can write a test for something, its in the wrong place
to be useful as a replacement of an assertion.
John Regehr discusses this exact issue (see link above) and writes that
"there is a strong synergy between assertions and unit tests". If you do
them right, they complement each other. And I would argue that
assertions are a kind of test integrated in the code.
Neither assertions nor unit tests can find all bugs. The wise programmer
uses both.
In an earlier post, I gave an actual assertion from one of my functions.
Here it is again:
assert 0 <= r < abs(y)
Its not obvious from that line in isolation, but both r and y are local
variables of a function that calculates a mathematical result (hence the
short and undescriptive names). How can I possibly write a test to check
this? They are *local* to the function, so I have no access to them from
outside of the test!
(Actually, in this *specific case*, r is part of the return value and
could be extracted by a test function. But y isn't.)
In general, assert is great for tests that occur inside the body of a
function, checking assertions about the function internals. You cannot
replace that with an external test. Here's an assertion from another
function:
c = collections.Counter(symbols)
assert c
The Counter c is an internal implementation detail of this one function.
It would be silly to expose it (how?) so I can write a test to check it.
Even if I could write such a test, that would be the wrong place for the
check. I want the check there inside the function, not in a completely
different .py file. The assertion is as much for me, the reader of the
code, as for the interpreter. After reading that assertion, I can read
the rest of the function knowing that it is safe to assume that c will
always have at least one key. It is a checked comment: `assert c` is
better than:
# c will not be empty
because the assertion is checked at runtime unless I disable assert
checking, while comments are "lies in code" that rapidly become
obsolete or inaccurate.
> * isn't needed at all because a fault will inevitably surface somewhere
> down the line (as some exception or an incorrect result that a test will
> catch).
You don't know that an error will be raised. The program may simply do
the wrong thing and silently return garbage that you have no way of
knowing is garbage.
Nor do you know that a test will catch the problem. Most real world code
does not have even close to 100% test coverage -- which is why people
are always discovering new bugs.
But even if there is an obvious failure later on, or a failing test, it
is better to catch errors sooner rather than later. The longer it takes
to discover the error, the harder it is to debug. Assertions can reduce
the distance between where the bug occurs and where you notice it.
> Finally, I've got much experience using existing code outside its
> original use cases, where the original author's assumptions may no
> longer hold but the specific logic can be gauded to produce the desired
> result. Coding these assumptions in would undermine that goal.
Of course assertions can be misused. The same applies to code that
defeats duck-typing with excessive isinstance() checks, or people who
make everything private and all classes final (in languages that support
that). John Regehr also discusses some poor ways to misuse assertions.
This is not a good argument for changing assert.
> So, I see "debug assertions" as either intentionally compromizing
> correctness for performance (a direct opposite of Python's design
> principles),
You have that backwards: assertions compromise performance for
correctness.
I'm very aware that every time I write an assert, that's a runtime check
that compromises performance. But I do so when I believe that the gain
in correctness, or the advantage in finding bugs closer to their origin,
or even their value as documentation, outweighs the performance cost.
> or as an inferiour, faulty, half-measure rudiment from
> times when CI wasn't a thing (thus not something that should be taught
> and promoted as a best practice any longer).
If you think that Continuous Integration is a suitable replacement for
assertions, then I think you have misunderstood either CI or assertions
or both. That's like saying that now that we have CI, we don't need to
check the return code on C functions, or catch exceptions. CI and
assertions are complementary, not in opposition.
CI is a development practice to ensure that the master is always in a
working state. That's great. But how do you know the master is working?
You need *tests*, and assertions complement tests. Unit tests are not a
replacement for assertions.
You wouldn't say "unit tests are obsolete now that we have integration
tests", or "we don't need fuzzers, we have regression tests". Why would
you say that you don't need assertions just because you are using CI?
--
Steve
More information about the Python-ideas
mailing list