Re: [Python-ideas] [Python-Dev] the role of assert in the standard library ?

This is a thread from python-dev I am moving to python-ideas, because I want to debate the "assert" keyword :)

On Thu, Apr 28, 2011 at 12:27 PM, Michael Foord <fuzzyman@voidspace.org.uk> wrote:
But it does affect the behaviour in the end: when the code has a bug, the way the code works differs depending on whether -O is used. Let me take an example:

    bucket = []

    def add(stuff):
        bucket.append(stuff)

    def purge_and_do_something():
        bucket.clear()
        assert len(bucket) == 0
        # ... do something, being sure the bucket is empty ...

Here we could say the assert is legitimate, as it just checks a post-condition. But this code is not thread-safe, and if it is run from several threads, some code could add something to the bucket while purge_and_do_something() checks the assertion. So if I run this code with -O it will not behave the same: it will seem to work.

I am not arguing against the fact that this code should be changed to use a lock. My point is that I do not understand why there are assert calls to check post-conditions. Those seem to be relics left by a developer who marked something she still needed to take care of in her code, like a TODO, an XXX, or a NotImplementedError.

Moreover, why do -O and -OO remove assertions in that case? Even if the assert is "supposed not to happen", it is present in the code and can happen (or else remove it). So "optimize" potentially changes the behaviour of the code.
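For concreteness, a minimal sketch of the locked variant alluded to above; the names mirror the example, the lock is the only addition, and note that list.clear() requires Python 3.3+ (use del bucket[:] on older versions):

    import threading

    bucket = []
    bucket_lock = threading.Lock()

    def add(stuff):
        with bucket_lock:
            bucket.append(stuff)

    def purge_and_do_something():
        with bucket_lock:
            bucket.clear()
            assert len(bucket) == 0   # now safe: no other thread can add here
            # ... do something, being sure the bucket is empty ...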
Cheers Tarek -- Tarek Ziadé | http://ziade.org

Tarek Ziadé wrote:
"assert"s are meant to easily and transparently include testing code in production code, without affecting the production version's performance when run with the -O flag. In your example, you can assume that your code does indeed do what it's meant to do in production code, because you will have tested the code in your test environment. Assuming that you don't allow untested code to run on a production system, you can then safely remove the assert bytecodes from the program and avoid the added overhead using the -O flag. For longer pieces of testing code, you can use: if __debug__: print("Testing ...") This code will also get removed by the -O flag. Both methods allow testing complex code without having to duplicate much of the code in test cases, just to check certain corner cases.
-- Marc-Andre Lemburg eGenix.com Professional Python Services directly from the Source (#1, Apr 28 2011)
2011-06-20: EuroPython 2011, Florence, Italy 53 days to go ::: Try our new mxODBC.Connect Python Database Interface for free ! :::: eGenix.com Software, Skills and Services GmbH Pastor-Loeh-Str.48 D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg Registered at Amtsgericht Duesseldorf: HRB 46611 http://www.egenix.com/company/contact/

Tarek Ziadé wrote:
No. There's a difference between dead code (code that cannot be reached) and asserts. An assert is a statement of confidence: "I'm sure that this assertion is true, but I'm not 100% sure, because bugs do exist. But since I'm *nearly* 100% sure, it is safe to optimize the assertions away, *if* you are brave enough to run with the -O flag."

If you've ever written the typical defensive pattern:

    if a:
        do_this()
    elif b:
        do_that()
    elif c:
        do_something_else()
    else:
        # This can never happen.
        raise RuntimeError('unexpected error')

then you have a perfect candidate for an assertion:

    assert any([a, b, c]), 'unexpected error'

You know what they say about things that can never happen: they *do* happen, more often than you like. An assertion is a check for things that can never happen. Since it can't happen, it's safe to optimize it away and not perform the check. But since things that can't happen do happen (due to logic errors and the presence of unexpected bugs), it's better to generate a failure immediately, where you perform the check, rather than somewhere far distant from the problem, or worse, return the wrong result.

Assertions are a case of practicality beats purity: in a perfect world, we'd always be 100% confident that our code was bug-free and free of any logic errors, and that our assumptions were perfectly correct. But in the real world, we can't always be quite so sure, and asserts cover that middle ground where we're *almost* sure about a condition, but not entirely. Or perhaps we're merely paranoid. Either way, asserts should never be something you *rely* on (e.g. for checking user input).

Let me give you a real world example: I have some code to calculate Pearson's Correlation Coefficient, r. At the end of the function, just before returning, I assert -1 <= r <= 1. I've run this function thousands of times, possibly tens of thousands, over periods of months, without any problems. Last week I got an assertion error: r was something like 1.00000000000000001 (or thereabouts). So now I know there's a bug in my code. (Unfortunately, not *where* it is.) Without the assert, I wouldn't know, because that r would likely have been used in some larger calculation, without me ever noticing that it could be out of range. -- Steven
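To make that post-condition concrete, here is a sketch of such a function; the implementation details are illustrative, and only the final assert mirrors the description above:

    import math

    def pearson_r(xs, ys):
        # Pearson's correlation coefficient of two equal-length sequences.
        n = len(xs)
        mx = sum(xs) / n
        my = sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        r = cov / (sx * sy)
        # Post-condition: r is mathematically bounded to [-1, 1], but a bug
        # (or accumulated rounding error) can push it out of range.
        assert -1 <= r <= 1, r
        return r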

On 4/28/2011 8:27 AM, Tarek Ziadé wrote:
On 28/04/2011 09:34, Terry Reedy wrote:
That is just what I said: 'unless the code has a bug'.
then the way the code works is affected and differs depending on whether -O is used.
As long as Python has flags that affect code and its execution, the stdlib should be tested with them.
No, not legitimate: the .clear() post-condition check should be in .clear(), not in code that uses it.
It is, or should be, well-known that concurrency, in general, makes programs non-deterministic (which is to say, potentially buggy relative to expectations). Purge needs a write-lock (that still allows .clear() to run).
My point is that I do not understand why there are assert calls to check post-conditions.
Asserts are inline unit tests that long predate the x-unit modules for various languages. One advantage over separate test suites is that they test that the function works properly not only with the relatively few inputs in most test suites, but also with all inputs in the integration and acceptance tests and, if left in place, in actual use.
Moreover, why do -O and -OO remove assertions in that case?
A polynomial-time verification function may still take a while. For instance, is_sorted_version_of(input, output) has to verify both that output is sorted (easy) and that it is a permutation of the input (harder). So you might want it optionally gone in production where speed is critical, yet still present in the source and optionally present in production where speed is not critical.
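A sketch of what such a verification function might look like (is_sorted_version_of is the name used above; this particular implementation and my_sort are only illustrations):

    from collections import Counter

    def is_sorted_version_of(input, output):
        # output must be non-decreasing ...
        sorted_ok = all(output[i] <= output[i + 1] for i in range(len(output) - 1))
        # ... and a permutation of input (the harder, slower part).
        permutation_ok = Counter(input) == Counter(output)
        return sorted_ok and permutation_ok

    def my_sort(seq):
        result = sorted(seq)   # stand-in for whatever sort is being verified
        assert is_sorted_version_of(seq, result)
        return result

With -O the assert disappears and only the sort itself runs.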
I use asserts to check that my custom test functions pass obviously good functions and fail intentionally bad functions with the proper error message. Not dead code at all, especially when I 'refactor' (fiddle with) the test function code. Of course, I could replace them with if ...: raise or a print, but why? --- Terry Jan Reedy
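To illustrate that last point, a sketch of inline self-tests for a custom test helper; none of these names come from the thread, and the helper here checks values rather than functions for brevity:

    def check_positive(value):
        # Custom test helper: fail with a descriptive message.
        if value <= 0:
            raise AssertionError("expected a positive value, got %r" % (value,))

    # Inline self-tests of the helper itself:
    check_positive(3)                  # an obviously good input must pass
    try:
        check_positive(-1)             # an intentionally bad input must fail ...
    except AssertionError as e:
        assert "positive" in str(e)    # ... with the proper error message
    else:
        assert False, "check_positive(-1) should have raised"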

participants (4)
- M.-A. Lemburg
- Steven D'Aprano
- Tarek Ziadé
- Terry Reedy