Where to put the error handling test?

Tue Nov 24 05:58:42 EST 2009

Peng Yu wrote:
> On Mon, Nov 23, 2009 at 9:44 PM, Lie Ryan <lie.1296 at gmail.com> wrote:
>   
>> Peng Yu wrote:
>>     
>>> Suppose that I have function f() that calls g(), I can put a test on
>>> the argument 'x' in either g() or f(). I'm wondering what is the
>>> common practice.
>>>
>>> My thought is that if I put the test in g(x), the code of g(x) is
>>> safer, but the test is not necessary when g() is called by h().
>>>
>>> If I put the test in f(), then g() becomes more efficient when other
>>> code calls g() and guarantees x will pass the test even though the
>>> test code is not in g(). But there might be some caller of g() that
>>> passes an 'x' that would not pass the test, if the test were in g().
>>>       
>> Typically, you test for x as early as possible, e.g. just after user input
>> (or file or url load or whatever). After that test, you can (or should be
>> able to) assume that all functions will always be called with the
>> correct argument. This is the ideal situation; it's not always easy to
>> achieve.
>>
>> In any case though, don't optimize early.
>>     
>
> Let's suppose that g() is refactored out from f() and is called by not
> only f() but other functions, and g() is likely to be called by new
> functions.
>
> If I don't optimize early, I should put the test in g(), rather than f(), right?
>
>   
Your question is so open-ended as to be unanswerable.  All we can do 
in this case is supply some guidelines so you can judge which one might 
apply in your particular case.

You could be referring to a test that triggers alternate handling.  Or 
you could be referring to a test that notices bad input by a user, or 
bad data from an untrusted source.  Or you could be referring to a test 
that discovers bugs in your code.  And there are variations of these, 
depending on whether your user is also writing code (eval, or even 
import of user-supplied mixins), etc.

The first thing that's needed in the function g() is a docstring, 
defining what inputs it expects, and what it'll do with them.  Then if 
it gets any input that doesn't meet those requirements, it might throw 
an exception.  Or it might just get an arbitrary result.  That's all up 
to the docstring.  Without any documentation, nothing is correct.
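
As a rough sketch of what I mean (the square-root body and the choice of 
ValueError are made up for illustration, they're not from your code), a 
docstring that pins down the contract might look like this:

```python
import math

def g(x):
    """Return the square root of x.

    x must be a non-negative int or float.
    Raises ValueError if x is negative.
    """
    # The check exists because the docstring promises it, not "just in case".
    if x < 0:
        raise ValueError("g() requires x >= 0, got %r" % (x,))
    return math.sqrt(x)
```

Once that's written down, "where does the test go" has an answer: g() 
promised an exception, so g() does the check.  Had the docstring said 
the result is undefined for negative x, the check would be the caller's 
job.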

Functions that are only called by trusted code need not have explicit 
tests on their inputs, since you're writing all the callers yourself.  
Part of debugging is catching those cases where f() can pass bad data 
to g().  If f() passes bad data only because it was itself handed bad 
data, then the bug is in f()'s caller.  Eventually, you get to the 
user.  If the bad data comes from the user, it should be caught as soon 
as possible, and feedback supplied right then.

The assert statement ought to be the correct way to add tests in g() 
that check whether there's such a bug in f().  Unfortunately, CPython 
defaults to debug mode (asserts are only stripped when it's run with 
-O), so scripts execute those tests by default.  Consequently, people 
leave them out, to avoid slowing down code.
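
For instance, something along these lines (both function bodies are 
hypothetical, just to show where the assert goes):

```python
def g(x):
    """Return 1/x.  Callers must guarantee x != 0."""
    # Checks for a bug in the caller, not for bad user input.
    # Skipped entirely when Python is run with -O.
    assert x != 0, "bug in caller: g() got x == 0"
    return 1.0 / x

def f(x):
    """Hypothetical caller; it's f's job to rule out x == 0."""
    if x == 0:
        return 0.0
    return g(x)
```

The assert documents the assumption and fails loudly during development 
if f(), or some new caller, breaks it.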

It comes down to trust.  If you throw the code together without a test 
suite, you'll be a long time finding all the bugs in non-trivial code.  
So add lots of defensive tests throughout the code, and pretend that's 
equivalent to a good test system.  If you're writing a library to be 
used by others, then define your public interfaces with exceptions for 
any invalid input, and write careful documentation describing what's 
invalid.  And if you're writing an end-user application, test the user's 
input as soon as you get it, so none of the rest of the application ever 
gets "invalid" data.
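
A minimal sketch of that last case, assuming a made-up application that 
asks for an age (parse_age, years_to_100 and the 0..150 range are all 
invented for the example):

```python
def parse_age(text):
    """Convert raw user text to an int age, or raise ValueError."""
    try:
        age = int(text.strip())
    except ValueError:
        raise ValueError("age must be a whole number, got %r" % (text,))
    if not 0 <= age <= 150:
        raise ValueError("age must be between 0 and 150, got %d" % age)
    return age

def years_to_100(age):
    """Internal helper; trusts that age was already validated."""
    return 100 - age
```

All the checking happens once, in parse_age(), right where the data 
enters the program.  Everything downstream, like years_to_100(), can 
assume it's getting a sane value.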

DaveA