[Brainstorm] Testing with Documented ABCs

I've been pulling a lot of ideas from the recent discussion on design by contract (DBC), the elegance and drawbacks <https://bemusement.org/doctests-arent-code> of doctests <https://docs.python.org/3/library/doctest.html>, and the amazing talk <https://www.youtube.com/watch?v=MYucYon2-lk> given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the Next Level".

To recap a lot of previous discussions:

- Documentation should tell you: A) what a variable represents, B) what kind of thing a variable is, and C) the acceptable values a variable can take.
- Typing and tests can partially take the place of documentation by filling in B and C (respectively), and sometimes A can be inferred from decent naming and context.
- Contracts can take the place of many tests (especially when combined with a library like hypothesis).
- Contracts/assertions can provide "stable" documentation in the sense that it can't get out of sync with the code.
- Attempts to implement contracts using standard Python syntax are verbose and noisy because they rely heavily on decorators that add a lot of repetitive preamble to the methods being decorated. They may also require a metaclass, which restricts their use to code that doesn't already use a metaclass.
- There was some discussion about the importance of "what a variable represents", which pointed to this article <http://pgbovine.net/python-unreadable.htm> by Philip J. Guo (author of the magnificent pythontutor.com). I believe Guo's usage of "in-the-small" and "in-the-large" is confusing because a well-decoupled program shouldn't yield functions that know or care how they're being used in the grand machinations of your project. The examples he gives are of functions that could use a doc string and some type annotations, but don't actually say how they relate to the rest of the project.

One thing that caught me about Hillel Wayne's talk was that some of his examples were close to needing practically no code. He starts with:

    def tail(lst: List[Any]) -> List[Any]:
        assert len(lst) > 0, "precondition"
        result = lst[1:]
        assert [lst[0]] + result == lst, "postcondition"
        return result

He then re-writes the function using a contracts library:

    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst", lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

He then writes a unit test for the function:

    @given(lists(integers(), 1))
    def test_tail(lst):
        tail(lst)

What strikes me as interesting is that the test pretty much doesn't need to be written. The 'given' statement should be redundant based on the type annotation and the precondition. Anyone who knows hypothesis, just imagine the @require is a hypothesis 'assume' call. Furthermore, hypothesis should be able to build strategies for more complex objects based on class invariants and attribute types:

    @invariant("no overdrafts", lambda self: self.balance >= 0)
    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

A library like hypothesis should be able to generate valid Account objects. Hypothesis also has stateful testing <https://hypothesis.readthedocs.io/en/1.4.1/stateful.html>, but I think the implementation could use some work. As it is, you have to inherit from a class that uses a metaclass AND you have to pollute your class's name-space with helper objects and methods.
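
(To make that last point concrete: with today's hypothesis you can already build such a strategy by hand; the missing piece is deriving it automatically from the @invariant decorator and the attribute types. This is only a sketch under that assumption, with illustrative names.)

    from hypothesis import given, strategies as st

    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

    # hand-written today; a contract-aware tool would derive this from the
    # "no overdrafts" invariant (balance >= 0) and the attribute annotations
    accounts = st.builds(
        Account,
        number=st.integers(),
        balance=st.floats(min_value=0, allow_nan=False, allow_infinity=False),
    )

    @given(accounts)
    def test_account_invariant_holds(account):
        assert account.balance >= 0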
If we could figure out a cleaner syntax for defining invariants, preconditions, and postconditions we'd be half-way to automated testing UTOPIA! (ok, maybe I'm being a little over-zealous)

I think there are two missing pieces to this testing problem: side-effect verification and failure verification.

Failure verification should test that the expected exceptions get thrown when known bad data is passed in or when an object is put in a known illegal state. This should be doable by allowing hypothesis to probe the bounds of unacceptable input data or states, though it might seem a bit silly: if you've already added a precondition "x >= 0" to a function, then it obviously should raise a PreconditionViolated when passed any x < 0. It may be important, however, if for performance reasons you need to disable invariant checking but you still want certain bad input to raise exceptions, or if your system has two components that interact with slightly mis-matched invariants and you want to make sure the components handle the edge-condition correctly. You can think of types from a set-theory perspective, where the Integer type is conceptually the set of all integers, and invariants specify a smaller subset than typing alone; if the set of all valid outputs of one component is not completely contained within the set of all valid inputs to another component, then there will be edge-cases resulting from the mismatch. In that sense, some of the invariant verification could be static-ish (as much as Python allows). (Rough sketches of these ideas follow at the end of this message.)

Side-effect verification is usually done by mocking dependencies: you pass in a mock database connection and make sure my object sends and receives data as expected. As crazy as it sounds, this too can be almost completely automated away if all of the above tools are in place AND if Python gained support for exception annotations. I wrote a Java (yuck) library at work that does this. I want to port it to Python and share it, but it basically enumerates a bunch of stuff: the "sources" and "destinations" of the system, how those relate to dependencies, how they relate to each other (if dependency X is unresponsive, I can't get sources A, B, or G, and if I can't get source B, I can't write destination Y), the dependency failure modes (exceptions raised, timeouts, unrecognized keys, missing data, etc.), and all the public methods of the class under test and what sources and destinations they use. Then I enumerate 'k' from 0 to some limit for the max number of simultaneous faults to test for, and for each method that can have n >= k simultaneous faults I test all (n choose k) combinations of faults for that method against the desired behavior. I'm sure that explanation is as clear as mud. I will try to get a working Python example at some point to demonstrate.

Finally, in the PyCon video, Hillel Wayne shows an example of testing that an "add" function is commutative. It seems that once you write that invariant, it might apply to many different functions. A similar invariant may be "reversibility":

    @given(text())
    def test_reversible_codec(s):
        assert s == decode(encode(s)), "not reversible"

That might be a common property that other functions share:

    @invariant(reversible(decode))
    def encode(s: str) -> bytes:
        ...

Having said all that, I wanted to brainstorm some possible solutions for implementing some or all of the above in Python without drowning your code in decorators. NOTE: Please don't get hung up on specific syntax suggestions! Try to see the forest through the trees!
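
(A minimal sketch of the failure-verification idea using hypothesis and pytest. The function and the PreconditionViolated stand-in are made up for illustration; the point is that the strategy deliberately generates values outside the precondition and asserts that the guard fires.)

    import pytest
    from hypothesis import given, strategies as st

    class PreconditionViolated(Exception):
        """Stand-in for whatever a contracts library would raise."""

    def checked_sqrt(x: float) -> float:
        if not x >= 0:                      # precondition: x >= 0
            raise PreconditionViolated("x must be >= 0")
        return x ** 0.5

    # probe the complement of the precondition: every generated x violates it
    @given(st.floats(allow_nan=False).filter(lambda x: x < 0))
    def test_negative_input_is_rejected(x):
        with pytest.raises(PreconditionViolated):
            checked_sqrt(x)

(And the fault-combination enumeration described for side-effect verification is, at its core, just itertools.combinations over the faults a method depends on -- a sketch with made-up fault names:)

    from itertools import combinations

    faults = ["db_timeout", "cache_unavailable", "auth_service_error"]

    max_simultaneous = 2
    for k in range(max_simultaneous + 1):
        for combo in combinations(faults, k):
            # inject exactly these k faults into mocked dependencies,
            # then call the method under test and check the desired behavior
            print(f"testing with faults: {combo or 'none'}")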
An example syntax could be:

    # Instead of this
    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst", lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

    # Maybe this?
    non_empty = invariant("Must not be empty", lambda x: len(x) > 0)  # can be re-used

    def tail(lst: List[Any] d"Description of what this param represents. {non_empty}"
             ) -> List[Any] d"Description of return value {lst == [lst[0]] + __result__}":
        """ Description of function """
        return lst[1:]

Python could build the full doc string like so:

    """
    Description of function

    Args:
        lst: Description of what this param represents. Must not be empty.

    Returns:
        Description of return value.
    """

d-strings have some description followed by some terminator, after which come either invariant objects or [optionally strings] followed by an expression on the arguments and __result__ (or __return__?).

I'm sorry this is so half-baked. I don't really like the d-string concept and I'm pretty sure there are a million problems with it. I'll try to flesh out the side-effect verification concept more later along with all the other poorly explained stuff. I just wanted to get these thoughts out for discussion, but now it's super late and I have to go!

Hi Abe, I've been pulling a lot of ideas from the recent discussion on design by
Have you looked at the recent discussions regarding design-by-contract on this list (https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)?

You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested.

Re decorators: to my own surprise, using decorators in a larger code base is completely practical, including the readability and maintenance of the code. It's neither as ugly nor as problematic as it might seem at first look. We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plan to use the library.

Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and the granularity of contract switches (right now we use always/optimized, production/non-optimized and testing/slow, but it seems that a larger system requires finer categories).

Cheers, Marko

[Marko Ristin-Kaufmann]
Have you looked at the recent discussions regarding design-by-contract on this list
I tried to read through them all before posting, but I may have missed some of the forks. There was a lot of good discussion! [Marko Ristin-Kaufmann]
I'll look into that! I'm very interested! [Marko Ristin-Kaufmann]
Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation.

One thing that I can get on a soap-box about is the benefit of putting the most relevant information to the reader in the order of top to bottom and left to right whenever possible. I've written many posts about this. I think a lot of Python syntax gets this right. It would have been easy to follow the same order as for-loops when designing comprehensions, but expressions allow you some freedom to order things differently, so now comprehensions read:

    squares = ...                                        # squares is
    squares = [...                                       # squares is a list
    squares = [number*number ...                         # squares is a list of number squared
    squares = [number*number for number in numbers]      # squares is a list of number squared 'from' numbers

I think decorators sort-of break this rule because they can put a lot of less important information (like the fact that a function is logged or timed) before more important information (like the function's name, signature, doc-string, etc.). It's not a huge deal because they tend to be de-emphasized by my IDE and there typically aren't dozens of them on each function, but I definitely prefer Eiffel's syntax <https://www.eiffel.com/values/design-by-contract/introduction/> over decorators for that reason.

I understand that syntax changes have a very high bar for very good reasons. Hillel Wayne's PyCon talk got me thinking that we might be close enough to a really great solution to a wide variety of testing problems that it might justify some new syntax, or perhaps someone has an idea that wouldn't require new syntax that I didn't think of.

[Marko Ristin-Kaufmann]
Yeah... I don't know anything about testing concurrent or parallel code. On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Btw, it would be relatively easy to create a parser for Python. Python doesn't have any crazy grammar constructs like the lexer hack <https://en.wikipedia.org/wiki/The_lexer_hack> AFAIK. I'm imagining using Bison:

1. Convert Python's grammar (https://github.com/python/cpython/blob/master/Lib/lib2to3/Grammar.txt) to Bison format.
2. Write a lexer to parse tokens and convert indentation to indent/dedent tokens.
3. Extend the grammar however you want it. Call these custom AST nodes "contract nodes."
4. Create a simple AST, really an annotated parse tree. I think we can use a simple one that's a bunch of nested lists:

       ["for_stmt", "for i in range(10):", [
           ["exprlist", "i", [ ... ]],
           ["testlist", "range(10)", [ ... ]]
       ]]
       # ["node_type", "<source code>", <grammar nodes contained inside the for stmt>]

   The AST can be made more detailed on an as-needed basis.
5. Traverse the AST, and "rewrite" the AST by pasting traditional Python AST nodes where contract nodes are. This example from the Babel handbook may help if you have trouble understanding what this step means: https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/pl...
6. Turn the AST back into Python source. Since we're storing the source code from the beginning, this should be fairly easy. (Bison lets your lexer tell the parser the line and column numbers of each token.)

---

I made a joke language with Bison once; it's really flexible and well-suited for this kind of task.

Tip: I found Bison's C++ mode too complicated, so I used it in C mode with the C++ Standard Library and C++ references enabled.

---

I'm interested: what contract-related functionality do you think Python's existing syntax is inadequate for? You could look into using with statements and a python program that takes the AST and snips contract-related with statements to produce optimized code (sketched below), though I suppose that's one step below the custom-parser method.

On Wed, Nov 28, 2018 at 3:29 PM Abe Dillon <abedillon@gmail.com> wrote:
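
(The with-statement idea is sketched here with the standard ast module rather than a custom Bison parser: a NodeTransformer that simply drops any `with require(...)`-style block to produce the optimized source. The `require` context manager is hypothetical.)

    import ast
    import textwrap

    CONTRACT_NAMES = {"require", "ensure", "invariant"}

    class StripContracts(ast.NodeTransformer):
        """Remove `with require(...)`-style blocks from a module's AST."""
        def visit_With(self, node: ast.With):
            self.generic_visit(node)
            for item in node.items:
                call = item.context_expr
                if (isinstance(call, ast.Call)
                        and isinstance(call.func, ast.Name)
                        and call.func.id in CONTRACT_NAMES):
                    return None          # drop the whole contract block
            return node

    source = textwrap.dedent("""
        def tail(lst):
            with require("lst must not be empty"):
                assert len(lst) > 0
            return lst[1:]
    """)

    tree = StripContracts().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    print(ast.unparse(tree))   # the contract block is gone (ast.unparse needs Python 3.9+)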

Marko, I have a few thoughts that might improve icontract.

First, multiple clauses per decorator:

    @pre(lambda x: x >= 0,
         lambda y: y >= 0,
         lambda width: width >= 0,
         lambda height: height >= 0,
         lambda x, width, img: x + width <= width_of(img),
         lambda y, height, img: y + height <= height_of(img))
    @post(lambda self: (self.x, self.y) in self,
          lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
          lambda self: (self.x+self.width, self.y+self.height) not in self)
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __contains__(self, pt: Tuple[int, int]) -> bool:
        x, y = pt
        return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + self.height)

You might be able to get away with some magic by decorating a method just to flag it as using contracts:

    @contract  # <- does byte-code and/or AST voodoo
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        pre(x >= 0,
            y >= 0,
            width >= 0,
            height >= 0,
            x + width <= width_of(img),
            y + height <= height_of(img))

        # this would probably be declared at the class level
        inv(lambda self: (self.x, self.y) in self,
            lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
            lambda self: (self.x+self.width, self.y+self.height) not in self)

        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

That might be super tricky to implement, but it saves you some lambda noise. Also, I saw a forked thread in which you were considering some sort of transpiler with similar syntax to the above example. That also works.

Another thing to consider is that the role of descriptors <https://www.smallsurething.com/python-descriptors-made-simple/> overlaps some with the role of invariants. I don't know what to do with that knowledge, but it seems like it might be useful.

Anyway, I hope those half-baked thoughts have *some* value...

On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann <marko.ristin@gmail.com> wrote:
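
(For what it's worth, the multi-clause @pre above doesn't strictly need byte-code voodoo; a rough sketch of such a decorator, matching each lambda to the arguments it names via inspect. This is not icontract's API, just an illustration.)

    import functools
    import inspect

    def pre(*conditions):
        """Sketch: each predicate is called with only the arguments it names."""
        def decorator(func):
            sig = inspect.signature(func)

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                bound = sig.bind(*args, **kwargs)
                bound.apply_defaults()
                for cond in conditions:
                    wanted = inspect.signature(cond).parameters
                    subset = {k: v for k, v in bound.arguments.items() if k in wanted}
                    if not cond(**subset):
                        raise AssertionError(f"precondition failed for arguments {subset!r}")
                return func(*args, **kwargs)

            return wrapper
        return decorator

    @pre(lambda x: x >= 0,
         lambda x, width: x + width <= 100)
    def crop(x: int, width: int) -> int:
        return x + width

    crop(10, 20)     # fine
    # crop(-1, 20)   # would raise AssertionError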

Hi Abe, Thanks for your suggestions! We actually already considered the two alternatives you propose.

*Multiple predicates per decorator.* The problem is that you cannot deal with toggling/describing individual contracts easily. While you can hack your way through it (considering the arguments in the sequence, for example), we found it clearer to have separate decorators. Moreover, tracebacks are much easier to read, which is important when you debug a program.

*AST magic.* The problem with any approach based on parsing (be it parsing the code or the description) is that parsing is slow, so you end up spending a lot of cycles on contracts which might not be enabled (many contracts are applied only in the testing environment, not in production). Hence you must have an approach that offers practically zero overhead cost to importing a module when its contracts are turned off. Decoding byte-code does not work, as current decoding libraries cannot keep up with the changes in the language and the compiler, hence they are always lagging behind.

*Practicality of decorators.* We have retrospective meetings at the company and I frequently survey the opinions related to the contracts (explicitly asking about readability and maintainability) -- so far nobody had any difficulties and nobody was bothered by the noisy syntax. The decorator syntax is simply not beautiful, no discussion about that. But when it comes to maintenance, there's a linter included (https://github.com/Parquery/pyicontract-lint), and if you want contracts rendered in an appealing way, there's a documentation tool for sphinx (https://github.com/Parquery/sphinx-icontract). The linter facilitates maintainability a lot and the sphinx tool gives you nice documentation for a library, so that you don't even have to look into the source code that often if you don't want to. We need to be careful not to mistake issues of aesthetics for practical issues. Something might not be beautiful, but it can still be useful unless it's unreadable.

*Conclusion.* What we do need at this moment, IMO, is broad practical experience of using contracts in Python. Once you make a change to the language, it's impossible to undo. In contrast to what has been suggested in the previous discussions (including my own voiced opinions), I actually now don't think that introducing a language change would be beneficial *at this precise moment*. We don't know what the use cases are, and there is no practical experience to base the language change on. I'd prefer to hear from people who actually use contracts in their professional Python programming -- apart from the noisy syntax, how was the experience? Did it help you catch bugs (and how many)? Were there big problems with maintainability? Could you easily refactor? What were the limits of the contracts you encountered? What kind of snapshot mechanism do we need? How did you deal with multi-threading? And so on.

The icontract library is already practically usable and, if you don't use inheritance, dpcontracts is usable as well. I would encourage everybody to try out programming with contracts using an existing library and just hold their nose when writing the noisy syntax. Once we have unearthed deeper problems related to contracts, I think it will be much easier and much more convincing to write a proposal for introducing contracts in the core language. If I had to write a proposal right now, it would be based only on the experience of writing a humble 100K code base by a team of 5-10 people. Not very convincing.
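
(To illustrate the "zero overhead when contracts are turned off" requirement: a minimal sketch, not icontract's actual implementation. When contracts are off, the decorator hands the function back untouched, so there is no wrapper and no per-call cost.)

    import functools
    import os

    CONTRACTS_ENABLED = os.environ.get("CONTRACTS", "on") != "off"

    def require(description, condition):
        def decorator(func):
            if not CONTRACTS_ENABLED:
                return func                    # no wrapper at all when disabled

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                assert condition(*args, **kwargs), description
                return func(*args, **kwargs)

            return wrapper
        return decorator

    @require("lst must not be empty", lambda lst: len(lst) > 0)
    def tail(lst):
        return lst[1:]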
Cheers, Marko On Thu, 29 Nov 2018 at 02:26, Abe Dillon <abedillon@gmail.com> wrote:

[Marko Ristin-Kaufmann]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action. I also wanted to bring three thoughts to the table:

1. Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way.
2. Tying tests/contracts to the bits of documentation that they validate is a great way to keep documentation in sync with code, but doctest does it a bit "backwards". Like in sphinx-icontract (or even this), it's better to construct documentation (partially) from test code than to write test code within documentation. In general, I find the relationship between documentation, testing, and type-checking interesting. The problems they each address seem to overlap quite a bit.
3. There seems to be a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that.

[Marko Ristin-Kaufmann]
That's a good point. I would argue that the concept of contracts isn't new, so there should be at least a few cases that we can draw on where others have tread before us (which you've obviously done to a large degree). That's not to belittle the work you've done on icontracts. It's a great tool for the reasons you describe. [Marko Ristin-Kaufmann]
I suppose it may be difficult to implement a clean, *backwards-compatible* solution, but yes; going through the arguments in a sequence would be my naive solution. Each entry has an optional description, a callable, and an optional tag or level to enable toggling (I would follow a simple model such as logging levels), *in that order*. It makes sense that the text description comes first because that's the most relevant to a reader (like a doc-string), then the corresponding code, then the toggling flag, which will often be an optimization detail and generally falls behind code correctness in priority. It may be less straight-forward to parse (a naive sketch below), but I wouldn't call it a "hack". I guess I'm not sure what to say about tracebacks being hard to read.
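
(A naive sketch of that parsing: descriptions are strings, predicates are callables, and an optional integer level follows the predicate it toggles. All names are hypothetical.)

    def group_contracts(*entries):
        """Group a flat sequence into {description, predicate, level} records."""
        groups, pending_description = [], None
        for entry in entries:
            if isinstance(entry, str):
                pending_description = entry
            elif callable(entry):
                groups.append({"description": pending_description,
                               "predicate": entry,
                               "level": None})
                pending_description = None
            elif isinstance(entry, int):
                groups[-1]["level"] = entry    # applies to the most recent predicate
            else:
                raise TypeError(f"unexpected contract entry: {entry!r}")
        return groups

    groups = group_contracts(
        "x must be non-negative", lambda x: x >= 0, 10,
        lambda x, width: x + width <= 100,     # no description, default level
    )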
[Marko Ristin-Kaufmann]
That's fair enough. I think the implementation you've come up with is pretty close to optimally concise given the tools at your disposal. I think something like Eiffel is a good goal for Python to eventually shoot for, but without new syntax; each step between icontract and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment. On Thu, Nov 29, 2018 at 1:05 AM Marko Ristin-Kaufmann <marko.ristin@gmail.com> wrote:

Hi Abe,

[Abe Dillon]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action.
You are right. Please accept my apologies -- I was so primed by the discussions we had in October 2018 that I didn't pay enough attention to "Brainstorm" in the subject.

[Abe Dillon]
Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way.
You might want to look at the literature on automatic test generation. A possible entry point could be: https://www.research-collection.ethz.ch/handle/20.500.11850/69581

If I had time available, I would start with a tool that analyses a given module and automatically generates code for the Hypothesis test cases. The tool needs to select functions which accept primitive data types and, for each one of them, translate their contracts into Hypothesis code. If contracts are not trivially translatable to Hypothesis, the function is ignored. For readability and speed of development (of the code under test, not of the tool), I would prefer this tool *not* to be dynamic, so that the developer herself needs to re-run it if the function signatures change. The ingredients for such a tool are all there with icontract (similar to sphinx-icontract, you import the module and analyze its functions; you can copy/paste parts of the sphinx-icontract implementation for parsing and listing the AST of the contracts). (If you'd like to continue discussing this topic, let's create an issue on the icontract github page or switch to private correspondence in order not to spam this mail list.)
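
(A toy version of such a generator, to make the idea concrete. Here the preconditions are passed in explicitly as source strings; a real tool would pull them out of icontract's decorators instead, and would cover more than int parameters.)

    import inspect

    def hypothesis_test_source(func, preconditions):
        """Emit Hypothesis test source for a function with int-typed parameters."""
        params = [name for name, p in inspect.signature(func).parameters.items()
                  if p.annotation is int]
        strategies = ", ".join(f"{name}=st.integers()" for name in params)
        assumption = " and ".join(f"({src})" for src in preconditions) or "True"
        arglist = ", ".join(params)
        return (f"@given({strategies})\n"
                f"def test_{func.__name__}({arglist}):\n"
                f"    assume({assumption})\n"
                f"    {func.__name__}({arglist})\n")

    def clamp(x: int, lo: int, hi: int) -> int:
        return max(lo, min(hi, x))

    print(hypothesis_test_source(clamp, ["lo <= hi"]))
    # @given(x=st.integers(), lo=st.integers(), hi=st.integers())
    # def test_clamp(x, lo, hi):
    #     assume((lo <= hi))
    #     clamp(x, lo, hi)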
[Abe Dillon]
There seems to be a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that.

This was the case for the requests library. @James Lu <jamtlu@gmail.com> was looking into it -- a lot of functions had very similar contracts. However, in our code base at the company (including the open-sourced libraries), there was not a single case where we thought that contract re-use would be beneficial. Either it would have hurt readability and introduced unnecessary coupling (when the contracts were trivial), or it made sense to encapsulate more complex contracts in a separate function.
I found that to be too error-prone in a larger code base, but that is my very subjective opinion. Maybe you could give an example?

[Abe Dillon]
but without new syntax; each step between icontract and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment.
I agree. There are also issues with the core Python interpreter which I expect to remain open for a long time (see the issues related to retrieving the code text of lambda functions and decorators, and to tweaking dynamically the behavior of help(.) for functions). Cheers, Marko

On Tue, Nov 27, 2018 at 10:47:06PM -0600, Abe Dillon wrote:
You should look at the state of the art in Design By Contract. In Eiffel, DBC is integrated in the language:

https://www.eiffel.com/values/design-by-contract/introduction/
https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Asserti...

Eiffel uses a rather Pythonic block structure to define invariants. The syntax is not identical to Python's (Eiffel eschews the colons) but it also comes close to executable pseudo-code. I trust this syntax requires little explanation:

    require
        ... preconditions, tested on function entry
    do
        ... body of the function
    ensure
        ... postconditions, tested on function exit
    end

There is a similar invariant block for classes.

Cobra is a language which intentionally modeled its syntax on Python. It too has contracts integrated with the language:

http://cobra-language.com/how-to/DeclareContracts/
http://cobra-language.com/trac/cobra/wiki/Contracts

-- Steve

[Steven D'Aprano]
Thank you! I forgot to mention this (or look into how other languages solve this problem). I saw your example syntax in the recent DBC main thread and liked it a lot.

One thought I keep coming back to is this comparison between doc string formats <https://bwanamarko.alwaysdata.net/napoleon/format_exception.html>. It seems obvious that the "Sphinxy" style is the noisiest, most verbose, and ugliest format. Instead of putting ":arg ...:" and ":type ...:" for each parameter and the return value, it makes much more sense to open up an Args: section and use a concise notation for type. The decorator-based pre- and post-conditions seem to suffer from the same redundant, noisy, verbose problem as the Sphinxy docstring format, but make it worse by putting all that noise before the function declaration itself. It makes sense to me that a docstring might have a markdown-style syntax like:

    def format_exception(etype, value):
        """
        Format the exception with a traceback.

        Args:
            etype (str): what etype represents
                [some constraint on etype](precondition)
                [another constraint on etype](in_line_precondition?)
            value (int): what value represents
                [some constraint on value](precondition)
            [some constraints across multiple params](precondition)

        Returns:
            What the return value represents  # usually very similar to the description at the top
            [some constraint on return](postcondition)
        """
        ...

That ties most bits of the documentation to some code that enforces the correctness of the documentation. And if it's a little noisy, we could take another page from markdown's book and offer alternate ways to reference precondition and postcondition logic. I'm worried that such a style would carry a lot of the same drawbacks as doctest <https://bemusement.org/doctests-arent-code>.

Also, my sense of coding style has been heavily influenced by [this talk](https://vimeo.com/74316116), particularly the part where he shoves a mangled Hamlet soliloquy into the margins, so now many of my functions adopt the following style:

    def someDescriptiveName(
            arg1: SomeType,
            arg2: AnotherType[Thing],
            ...
            argN: SomeOtherType = default_value) -> ReturnType:
        """
        what the function does

        Args:
            arg1: what arg1 represents
            arg2: what arg2 represents
            ...
        """
        ...

This highlights a rather obvious duplication of code. We declare an arguments section in code and list all the arguments, then we do so again in the doc string. If you want your doc string to stay in sync with the code, this duplication is a problem. It makes more sense to tie the documentation for an argument to said argument:

    def someDescriptiveName(                     # what the function does
            arg1: SomeType,                      # what arg1 represents
            arg2: AnotherType[Thing],            # what arg2 represents
            ...
            argN: SomeOtherType = default_value  # what argN represents
            ) -> ReturnType:                     # what the return value represents
        ...

I think it especially makes sense if you consider the preconditions, postconditions, and invariants as a sort-of extension of typing, in the sense that typing narrows the set of acceptable values to a set of types and contracts restrict that set further.

I hope that clarifies my thought process. I don't like the d-strings that I proposed. I'd prefer syntax closer to Eiffel, but the above is the line of thought I was following to arrive at d-strings.

On Tue, 27 Nov 2018 22:47:06 -0600 Abe Dillon <abedillon@gmail.com> wrote:
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values. Also, the idea that fuzz testing will automatically find edge cases in your code is idealistic. It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO. In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program. Regards Antoine.

Indeed. But the great thing about the "hypothesis" tool is that it allows me to somewhat automate the generation of sets of input values based on my specific requirements, derived from my knowledge of my program. It allows me to think about what the reasonable distribution of values for each argument in a function is, by either using existing strategies, using their arguments, combining and extending them, and then letting the tool do the grunt work of running the test for lots of different equivalence classes of argument values.

I think that as long as the tool user keeps what you said in mind and uses the tool accordingly, it can be a great helper, and probably even force the average programmer to think more rigorously about the input values to be tested, not to mention the whole class of trivial mistakes and forgetfulness we are all bound to be subject to when writing test cases.

Best,

On Wed, Nov 28, 2018 at 12:18 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
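
(A couple of lines of that kind of strategy composition, for illustration: existing strategies constrained, mapped, and combined. The "no overdraft" property is just a made-up example.)

    from hypothesis import given, strategies as st

    # constrain and extend an existing strategy ...
    amounts = st.floats(min_value=0, max_value=1e9, allow_nan=False).map(lambda x: round(x, 2))
    # ... and combine strategies into a composite input
    transfers = st.tuples(amounts, amounts).filter(lambda t: t[0] >= t[1])

    @given(transfers)
    def test_transfer_never_overdraws(transfer):
        balance, amount = transfer
        assert balance - amount >= 0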
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

[Antoine Pitrou]
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.
First, they aren't mutually exclusive. It's trivial to add manually selected cases to a hypothesis test. Second, from my experience, people rarely choose between carefully selected optimal values and fuzz testing; they usually choose between manually selected trivial values or no test at all. Thirdly, computers are very good at exhaustively searching multidimensional spaces. If your tool sucks so bad at that that a human can do it better, then your tool needs work. Improving the tool saves way more time than reverting to manual testing.

There was a post long ago (I think I read it on Digg.com, which should give some indication of how long ago) about how to run a cloud-based system correctly. One of the controversial practices the article advocated was disabling ssh on the machine instances. The rationale is that you never want to waste your time fiddling with an instance that's not behaving properly. In cloud systems, instances should not be special. If they fail, blow them away and bring up another. If the failure persists, it's a problem with the *system*, not the instance. If you care about individual instances, YOU'RE DOING IT WRONG. You need to re-design the system.

On Wed, Nov 28, 2018 at 8:19 AM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon <abedillon@gmail.com> wrote:
Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function? Do you think it can finish before the Earth gets engulfed by the Sun? Regards Antoine.

[Antoine Pitrou]
Yes, ok. I used the word "exhaustively" wrong. Sorry about that. I don't think humans are made of a magical substance that can exhaustively search the space of possible pairs of integers before the heat-death of the universe. I think humans use strategies based, hopefully, in logic to come up with test examples, and that it's often more valuable to capture said strategies in code than to make a human run the algorithms. In cases where domain knowledge helps inform the search strategy, there should be easy-to-use tools to build a domain-specific search strategy. On Wed, Nov 28, 2018 at 4:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:

That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us. On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou <solipsis@pitrou.net wrote:

On Thu, Nov 29, 2018 at 10:25 AM David Mertz <mertz@gnosis.cx> wrote:
That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.
Python ints are not 32-bit ints. Have fun. :) ChrisA

But Python integers are variable-sized, and their size is basically limited by available memory or address space. Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.). Also let's take a look at the Python integer representation:

    >>> sys.int_info
    sys.int_info(bits_per_digit=30, sizeof_digit=4)

This means that every 4 bytes of integer object store 30 bits of actual integer data. So, how many bits has the largest allocatable integer on that system, assuming 90% of 4 GB are available for allocation?

Now how many possible integers are there in that number of bits?

(Yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.) Let's try to have a readable approximation of that number. Convert it to a float, perhaps?

Well, of course. So let's just extract a power of 10:

(Yes, math.log10() works on non-float-convertible integers. I'm impressed!) So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time.

(And of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.)

Regards Antoine. On 29/11/2018 at 00:24, David Mertz wrote:
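
(The arithmetic behind that estimate, for anyone who wants to reproduce the order of magnitude; 4 GiB is used here as an approximation of the 4 GB figure.)

    import math

    usable_bytes = int(0.9 * 4 * 1024**3)  # ~90% of 4 GiB for the single int object
    digits = usable_bytes // 4              # sizeof_digit == 4
    bits = digits * 30                      # bits_per_digit == 30
    print(bits)                             # ~2.9e10 bits of integer payload
    print(bits * math.log10(2))             # ~8.7e9 -> about 10**8.7e9 representable ints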

But nobody is talking about exhausting the combinatoric space of all possible values. Property-based testing looks like fuzz testing but it is not quite the same thing. Property-based testing is not about just generating random values till the heat death of the universe, but about generating sensible values in a configurable way to cover all the equivalence classes we can think of.

If my function takes two floating point numbers as arguments, hypothesis "strategies" won't try all possible combinations of all possible floating point values, but instead all possible combinations of interesting values (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal fractions, etc.), something that an experienced programmer probably would end up doing by himself with a lot of test cases, but that can be done better and with less effort by the automation provided by the hypothesis package. It could well be that just by using such a tool, a naive programmer could end up being convinced that maybe he would be better served by sticking to decimal arithmetic :-)

On Wed, Nov 28, 2018 at 9:43 PM Antoine Pitrou <antoine@python.org> wrote:
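
(That is roughly what hypothesis' default float strategy already does: with no arguments it generates NaN, the infinities, signed zeros, huge and tiny values. A tiny illustration:)

    import math
    from hypothesis import given, strategies as st

    @given(st.floats(), st.floats())   # includes nan, +/-inf, +/-0.0, subnormals, ...
    def test_addition_is_commutative(x, y):
        assert x + y == y + x or math.isnan(x + y)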
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

Hi, Property based testing is not about just generating random values till the
Exactly. A tool can go a step further and, based on the assertions and contracts, generate the tests automatically or prove that certain properties of the program always hold. I would encourage people interested in automatic testing to have a look at the scientific literature on the topic (formal static analysis). Abstract interpretation has already been mentioned: https://en.wikipedia.org/wiki/Abstract_interpretation. For some bleeding edge, have a look at what they do at this lab with machine learning: https://eth-sri.github.io/publications/

On Wed, 28 Nov 2018 23:22:20 -0200 Marcos Eliziario <marcos.eliziario@gmail.com> wrote:
Well, the OP did talk about "exhaustively searching the multidimensional space". But I agree mere sampling is useful. I might give hypothesis a try someday. Usually I prefer hand-rolling my own stress testing routines. Regards Antoine.

I was assuming it was a Numba-ized function since it's purely numeric. ;-) FWIW, the theoretical limit of Python ints is limited by the fact that 'int.bit_length()' is a platform-native int. So my system cannot store ints larger than (2**(2**63-1)). It'll take a lot more memory than my measly 4GiB to store that number though. So yes, that's way longer than heat-death-of-universe even before 128-bit machines are widespread. On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou <antoine@python.org> wrote:

OK. I know I made a mistake by saying, "computers are very good at *exhaustively* searching multidimensional spaces." I should have said, "computers are very good at enumerating examples from multi-dimensional spaces" or something to that effect. Now that we've had our fun, can you guys please continue in a forked conversation so it doesn't derail the conversation? On Wed, Nov 28, 2018 at 7:47 PM David Mertz <mertz@gnosis.cx> wrote:

One thought I had pertains to a very narrow sub-set of cases, but may provide a starting point. For the cases where a precondition, invariant, or postcondition only involves a single parameter, attribute, or the return value (respectively) and it's reasonably simple, one could write it as an expression acting directly on the type annotation:

    def encabulate(
            reactive_inductance: 1 >= float > 0,   # description
            capacitive_diractance: int > 1,        # description
            delta_winding: bool                    # description
            ) -> len(Set[DingleArm]) > 0:          # ???
        do_stuff with_things
        ...

I don't know how you would handle more complex objects...

Anyway. Just more food for thought...

On Tue, Nov 27, 2018 at 10:47 PM Abe Dillon <abedillon@gmail.com> wrote:
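
(The closest thing to this that works today without new syntax is probably typing.Annotated: the extra metadata can carry a predicate, and a decorator can check it at call time. A rough, made-up sketch of that direction, not a standard mechanism; the alias names are invented.)

    import functools
    import inspect
    from typing import Annotated, get_type_hints

    Probability = Annotated[float, lambda v: 0 < v <= 1]   # stands in for "1 >= float > 0"
    BigInt = Annotated[int, lambda v: v > 1]                # stands in for "int > 1"

    def check_annotated(func):
        """Check any callable metadata attached to a parameter's Annotated type."""
        hints = get_type_hints(func, include_extras=True)
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, value in bound.arguments.items():
                for meta in getattr(hints.get(name), "__metadata__", ()):
                    if callable(meta):
                        assert meta(value), f"{name}={value!r} violates its annotation"
            return func(*args, **kwargs)

        return wrapper

    @check_annotated
    def encabulate(reactive_inductance: Probability, capacitive_diractance: BigInt) -> None:
        ...

    encabulate(0.5, 2)     # fine
    # encabulate(1.5, 2)   # would raise AssertionError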

I wrote a lib specifically for the case of validators that also update the documentation. By default, if the name of the function plus its args speaks for itself, then only that is added to the docstring; e.g. @require_odd_numbers() would add "require_odd_numbers" at the end of __doc__. There is also the possibility to add a template for doc strings. https://github.com/jul/check_arg

Hi Abe, I've been pulling a lot of ideas from the recent discussion on design by
Have you looked at the recent discussions regarding design-by-contract on this list ( https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)? You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested. Re decorators: to my own surprise, using decorators in a larger code base is completely practical including the readability and maintenance of the code. It's neither that ugly nor problematic as it might seem at first look. We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plant to use the library. Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and granularity of contract switches (right now we use always/optimized, production/non-optimized and teating/slow, but it seems that a larger system requires finer categories). Cheers Marko

[Marko Ristin-Kaufmann]
Have you looked at the recent discussions regarding design-by-contract on this list
I tried to read through them all before posting, but I may have missed some of the forks. There was a lot of good discussion! [Marko Ristin-Kaufmann]
I'll look into that! I'm very interested! [Marko Ristin-Kaufmann]
Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation. One thing that I can get on a soap-box about is the benefit putting the most relevant information to the reader in the order of top to bottom and left to right whenever possible. I've written many posts about this. I think a lot of Python syntax gets this right. It would have been easy to follow the same order as for-loops when designing comprehensions, but expressions allow you some freedom to order things differently, so now comprehensions read: squares = ... # squares is squares = [... # squares is a list squares = [number*number... # squares is a list of num squared squares = [number*number for num in numbers] # squares is a list of num squared 'from' numbers I think decorators sort-of break this rule because they can put a lot of less important information (like, that a function is logged or timed) before more important information (like the function's name, signature, doc-string, etc...). It's not a huge deal because they tend to be de-emphasized by my IDE and there typically aren't dozens of them on each function, but I definitely prefer Eiffel's syntax <https://www.eiffel.com/values/design-by-contract/introduction/> over decorators for that reason. I understand that syntax changes have an very high bar for very good reasons. Hillel Wayne's PyCon talk got me thinking that we might be close enough to a really great solution to a wide variety of testing problems that it might justify some new syntax or perhaps someone has an idea that wouldn't require new syntax that I didn't think of. [Marko Ristin-Kaufmann]
Yeah... I don't know anything about testing concurrent or parallel code. On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Btw, it would be relatively easy to create a parser for Python. Python doesn't have any crazy grammar constructs like the lexer hack <https://en.wikipedia.org/wiki/The_lexer_hack> AFAIK. I'm imagining using Bison: 1. convert python's grammar ( https://github.com/python/cpython/blob/master/Lib/lib2to3/Grammar.txt) to Bison format. 2. write a lexer to parse tokens and convert indentation to indent/dedent tokens. 3. extend the grammar however you want it. Call these custom AST nodes "contract nodes." 4. create a simple AST, really an annotated parse tree. I think we can use a simple one that's a bunch of nested lists: ["for_stmt", "for i in range(10):", [ ["exprlist", "i", [ ... ]], ["testlist", "range(10)", [ ... ]] ]] # ["node_type", "<source code>", <grammar nodes contained inside the for stmt>] The AST can be made more detailed on an as-needed basis. 5. traverse the AST, and "rewrite" the the AST by pasting traditional python AST nodes where contract nodes are. This example from the Babel handbook may help if you have trouble understanding what this step means. https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/pl... 6. turn the AST back into python source. Since we're storing the source code from the beginning, this should be fairly easy. (Bison lets your lexer tell the parser the line and column numbers of each token.) --- I made a joke language with Bison once, it's really flexible and well-suited for this kind of task. This 6-step p Tip: I found Bison's C++ mode too complicated, so I used it in C mode with the C++ Standard Library and C++ references enabled. --- I'm interested, what contract-related functionality do you think Python's existing syntax is inadequate for? You could look into using with statements and a python program that takes the AST and snips contract-related with statements to produce optimized code, though I suppose that's one step below the custom-parser method. On Wed, Nov 28, 2018 at 3:29 PM Abe Dillon <abedillon@gmail.com> wrote:

Marko, I have a few thoughts that might improve icontract. First, multiple clauses per decorator: @pre( *lambda* x: x >= 0, *lambda* y: y >= 0, *lambda* width: width >= 0, *lambda* height: height >= 0, *lambda* x, width, img: x + width <= width_of(img), *lambda* y, height, img: y + height <= height_of(img)) @post( *lambda* self: (self.x, self.y) in self, *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, *lambda* self: (self.x+self.width, self.y+self.height) not in self) *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None: self.img = img[y : y+height, x : x+width].copy() self.x = x self.y = y self.width = width self.height = height *def* __contains__(self, pt: Tuple[int, int]) -> bool: x, y = pt return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + self.height) You might be able to get away with some magic by decorating a method just to flag it as using contracts: @contract # <- does byte-code and/or AST voodoo *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None: pre(x >= 0, y >= 0, width >= 0, height >= 0, x + width <= width_of(img), y + height <= height_of(img)) # this would probably be declared at the class level inv(*lambda* self: (self.x, self.y) in self, *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, *lambda* self: (self.x+self.width, self.y+self.height) not in self) self.img = img[y : y+height, x : x+width].copy() self.x = x self.y = y self.width = width self.height = height That might be super tricky to implement, but it saves you some lambda noise. Also, I saw a forked thread in which you were considering some sort of transpiler with similar syntax to the above example. That also works. Another thing to consider is that the role of descriptors <https://www.smallsurething.com/python-descriptors-made-simple/> overlaps some with the role of invariants. I don't know what to do with that knowledge, but it seems like it might be useful. Anyway, I hope those half-baked thoughts have *some* value... On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Hi Abe, Thanks for your suggestions! We actually already considered the two alternatives you propose. *Multiple predicates per decorator. *The problem is that you can not deal with toggling/describing individual contracts easily. While you can hack your way through it (considering the arguments in the sequence, for example), we found it clearer to have separate decorators. Moreover, tracebacks are much easier to read, which is important when you debug a program. *AST magic. *The problem with any approach based on parsing (be it parsing the code or the description) is that parsing is slow so you end up spending a lot of cycles on contracts which might not be enabled (many contracts are applied only in the testing environment, not int he production). Hence you must have an approach that offers practically zero overhead cost to importing a module when its contracts are turned off. Decoding byte-code does not work as current decoding libraries can not keep up with the changes in the language and the compiler hence they are always lagging behind. *Practicality of decorators. *We have retrospective meetings at the company and I frequently survey the opinions related to the contracts (explicitly asking about the readability and maintainability) -- so far nobody had any difficulties and nobody was bothered by the noisy syntax. The decorator syntax is simply not beautiful, no discussion about that. But when it comes to maintenance, there's a linter included ( https://github.com/Parquery/pyicontract-lint), and if you want contracts rendered in an appealing way, there's a documentation tool for sphinx ( https://github.com/Parquery/sphinx-icontract). The linter facilitates the maintainability a lot and sphinx tool gives you nice documentation for a library so that you don't even have to look into the source code that often if you don't want to. We need to be careful not to mistake issues of aesthetics for practical issues. Something might not be beautiful, but can be useful unless it's unreadable. *Conclusion. *What we do need at this moment, IMO, is a broad practical experience of using contracts in Python. Once you make a change to the language, it's impossible to undo. In contrast to what has been suggested in the previous discussions (including my own voiced opinions), I actually now don't think that introducing a language change would be beneficial *at this precise moment*. We don't know what the use cases are, and there is no practical experience to base the language change on. I'd prefer to hear from people who actually use contracts in their professional Python programming -- apart from the noisy syntax, how was the experience? Did it help you catch bugs (and how many)? Were there big problems with maintainability? Could you easily refactor? What were the limits of the contracts you encountered? What kind of snapshot mechanism do we need? How did you deal with multi-threading? And so on. icontract library is already practically usable and, if you don't use inheritance, dpcontracts is usable as well. I would encourage everybody to try out programming with contracts using an existing library and just hold their nose when writing the noisy syntax. Once we unearthed deeper problems related to contracts, I think it will be much easier and much more convincing to write a proposal for introducing contracts in the core language. If I had to write a proposal right now, it would be only based on the experience of writing a humble 100K code base by a team of 5-10 people. Not very convincing. 
Cheers, Marko On Thu, 29 Nov 2018 at 02:26, Abe Dillon <abedillon@gmail.com> wrote:

[Marko Ristin-Kaufmann]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action. I also wanted to bring three thoughts to the table: 1. Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way. 2. Tying tests/contracts to the bits of documentation that they validate is a great way to keep documentation in sync with code, but doctest does it a bit "backwards". Like in icontract-sphinx (or even this) it's better to construct documentation (partially) from test code than to write test code within documentation. In general, I find the relationship between documentation, testing, and type-checking interesting. The problems they each address seem to overlap quite a bit. 3. There seems like a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that. [Marko Ristin-Kaufmann]
That's a good point. I would argue that the concept of contracts isn't new, so there should be at least a few cases that we can draw on where others have tread before us (which you've obviously done to a large degree). That's not to belittle the work you've done on icontracts. It's a great tool for the reasons you describe. [Marko Ristin-Kaufmann]
I suppose it may be difficult to implement a clean, *backwards-compatible* solution, but yes; going through the arguments in a sequence would be my naive solution. Each entry has an optional description, a callable, and an optional tag or level to enable toggling (I would follow a simple model such as logging levels) *in that order*. It makes sense that the text description come first because that's the most relevant to a reader (like a doc-string), then the corresponding code, then the toggling flag which will often be an optimization detail which generally fall behind code correctness in priority. It may be less straight-forward to parse, but I wouldn't call it a "hack". I guess I'm not sure what to say about tracebacks being hard to read. [Marko Ristin-Kaufmann]
That's fair enough. I think the implementation you've come up with is pretty close to optimally concise given the tools at your disposal. I think something like Eiffel is a good goal for Python to eventually shoot for, but without new syntax; each step between icontracts and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment. On Thu, Nov 29, 2018 at 1:05 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Hi Abe, I agree. That's why I prefaced this topic with [Brainstorm]. I want to
explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action.
You are right. Please apologize, I was so primed by the discussions we had in October 2019 that I didn't pay enough attention to "branstorm" in the subject. Fuzz testing and stateful testing like that provided by hypothesis might
work together with contracts in an interesting way.
You might want to look at the literature on automatic test generation. A possible entry point could be: https://www.research-collection.ethz.ch/handle/20.500.11850/69581 If I had time available, I would start with a tool that analyses a given module and automatically generates code for the Hypothesis test cases. The tool needs to select functions which accept primitive data types and for each one of them translates their contracts into Hypothesis code. If contracts are not trivially translatable to Hypothesis, the function is ignored. For readability and speed of development (of the code under test, not of the tool), I would prefer this tool *not *to be dynamic so that the developer herself needs to re-run it if the function signatures changed. The ingredients for such a tool are all there with icontract (similar to sphinx-icontract, you import the module and analyze its functions; you can copy/past parts of sphinx-icontract implementation for parsing and listing the AST of the contracts). (If you'd like to continue discussing this topic, let's create an issue on icontract github page or switch to private correspondence in order not to spam this mail list). There seems like a lot of opportunity for the re-use of contracts, so maybe
we should consider a mechanism to facilitate that.
This was the case for the requests library. @James Lu <jamtlu@gmail.com> was looking into it -- a lot of functions had very similar contracts. However, in our code base at the company (including the open-sourced libraries), there was not a single case where we thought that contracts re-use would be beneficial. Either it would have hurt the readability and introduce unnecessary couplings (when the contracts were trivial) or it made sense to encapsulate more complex contracts in a separate function.
I found that to be too error-prone in a larger code base, but that is my very subjective opinion. Maybe you could make an example? but without new syntax; each step between icontracts and an Eiffel-esque
platonic ideal would require significant hackery with diminishing returns on investment.
I agree. There are also issues with the core Python interpreter which I expect to remain open for a long time (see the issues related to retrieving the source code of lambda functions and decorators, and to dynamically tweaking the behavior of help(.) for functions). Cheers, Marko

On Tue, Nov 27, 2018 at 10:47:06PM -0600, Abe Dillon wrote:
You should look at the state of the art in Design By Contract. In Eiffel, DBC is integrated into the language:

https://www.eiffel.com/values/design-by-contract/introduction/
https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Asserti...

Eiffel uses a rather Pythonic block structure to define invariants. The syntax is not identical to Python's (Eiffel eschews the colons) but it also comes close to executable pseudo-code. I trust this syntax requires little explanation:

require
    ... preconditions, tested on function entry
do
    ... body of the function
ensure
    ... postconditions, tested on function exit
end

There is a similar invariant block for classes.

Cobra is a language which intentionally modeled its syntax on Python. It too has contracts integrated with the language:

http://cobra-language.com/how-to/DeclareContracts/
http://cobra-language.com/trac/cobra/wiki/Contracts

-- Steve
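For comparison, roughly the same shape written with the icontract decorators discussed earlier in this thread (a sketch; the function and conditions are only illustrative):

import math
import icontract

@icontract.require(lambda x: x >= 0)                       # precondition, checked on entry
@icontract.ensure(lambda result, x: result * result <= x)  # postcondition, checked on exit
def integer_sqrt(x: int) -> int:
    return math.isqrt(x)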

[Steven D'Aprano]
Thank you! I forgot to mention this (or look into how other languages solve this problem). I saw your example syntax in the recent DBC main thread and liked it a lot.

One thought I keep coming back to is this comparison between doc string formats <https://bwanamarko.alwaysdata.net/napoleon/format_exception.html>. It seems obvious that the "Sphinxy" style is the noisiest, most verbose, and ugliest format. Instead of putting ":arg ...:" and ":type ...:" for each parameter and the return value, it makes much more sense to open up an Args: section and use a concise notation for type. The decorator-based pre- and postconditions seem to suffer from the same redundant, noisy, verbose problem as the Sphinxy docstring format, but make it worse by putting all that noise before the function declaration itself. It makes sense to me that a docstring might have a markdown-style syntax like:

def format_exception(etype, value):
    """
    Format the exception with a traceback.

    Args:
        etype (str): what etype represents
            [some constraint on etype](precondition)
            [another constraint on etype](in_line_precondition?)
        value (int): what value represents
            [some constraint on value](precondition)
        [some constraints across multiple params](precondition)

    Returns:
        What the return value represents  # usually very similar to the description at the top
        [some constraint on return](postcondition)
    """
    ...

That ties most bits of the documentation to some code that enforces the correctness of the documentation. And if it's a little noisy, we could take another page from markdown's book and offer alternate ways to reference precondition and postcondition logic. I'm worried that such a style would carry a lot of the same drawbacks as doctest <https://bemusement.org/doctests-arent-code>.

Also, my sense of coding style has been heavily influenced by this talk <https://vimeo.com/74316116>, particularly the part where he shoves a mangled Hamlet soliloquy into the margins, so now many of my functions adopt the following style:

def someDescriptiveName(
        arg1: SomeType,
        arg2: AnotherType[Thing],
        ...
        argN: SomeOtherType = default_value) -> ReturnType:
    """
    what the function does

    Args:
        arg1: what arg1 represents
        arg2: what arg2 represents
        ...
    """
    ...

This highlights a rather obvious duplication of code. We declare an arguments section in code and list all the arguments, then we do so again in the doc string. If you want your doc string to stay in sync with the code, this duplication is a problem. It makes more sense to tie the documentation for an argument to said argument:

def someDescriptiveName(            # what the function does
        arg1: SomeType,             # what arg1 represents
        arg2: AnotherType[Thing],   # what arg2 represents
        ...
        argN: SomeOtherType = default_value  # what argN represents
) -> ReturnType:                    # what the return value represents
    ...

I think it especially makes sense if you consider the preconditions, postconditions, and invariants as a sort of extension of typing, in the sense that typing narrows the set of acceptable values to a set of types and contracts restrict that set further. I hope that clarifies my thought process. I don't like the d-strings that I proposed. I'd prefer syntax closer to Eiffel, but the above is the line of thought I was following to arrive at d-strings.

On Tue, 27 Nov 2018 22:47:06 -0600 Abe Dillon <abedillon@gmail.com> wrote:
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values. Also, the idea that fuzz testing will automatically find edge cases in your code is idealistic. It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO. In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program. Regards Antoine.

Indeed. But the great thing about the "hypothesis" tool is that it allows me to somewhat automate the generation of sets of input values based on my specific requirements derived from my knowledge of my program. It allows me to think about what a reasonable distribution of values for each argument in a function is, by either using existing strategies, using their arguments, combining and extending them, and then letting the tool do the grunt work of running the test for lots of different equivalence classes of argument values. I think that as long as the tool user keeps what you said in mind and uses the tool accordingly, it can be a great helper, and probably even force the average programmer to think more rigorously about the input values to be tested, not to mention the whole class of trivial mistakes and forgetfulness we are all bound to be subject to when writing test cases. Best, On Wed, Nov 28, 2018 at 12:18, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/
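A minimal sketch of the kind of strategy composition described above (the quantity and price bounds are made up for illustration):

from hypothesis import given, strategies as st

# reuse and constrain existing strategies to describe the inputs we actually expect
quantities = st.integers(min_value=1, max_value=10_000)
unit_prices = st.floats(min_value=0.01, max_value=1e6)

@given(quantity=quantities, unit_price=unit_prices)
def test_order_total_is_positive(quantity, unit_price):
    assert quantity * unit_price > 0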

[Antoine Pitrou]
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.
First, they aren't mutually exclusive. It's trivial to add manually selected cases to a hypothesis test. Second, from my experience, people rarely choose between carefully selected optimal values and fuzz testing; they usually choose between manually selected trivial values or no test at all. Thirdly, computers are very good at exhaustively searching multidimensional spaces. If your tool sucks so badly at that that a human can do it better, then your tool needs work. Improving the tool saves way more time than reverting to manual testing. There was a post long ago (I think I read it on Digg.com, which gives some indication of how long ago) about how to run a cloud-based system correctly. One of the controversial practices the article advocated was disabling ssh on the machine instances. The rationale is that you never want to waste your time fiddling with an instance that's not behaving properly. In cloud systems, instances should not be special. If they fail, blow them away and bring up another. If the failure persists, it's a problem with the *system*, not the instance. If you care about individual instances, YOU'RE DOING IT WRONG. You need to re-design the system. On Wed, Nov 28, 2018 at 8:19 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
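As a small sketch of the first point above, hand-picked cases can sit alongside the generated ones (the function under test is deliberately trivial):

from hypothesis import example, given, strategies as st

@given(st.integers())
@example(0)          # manually selected edge cases are always run...
@example(-2**31)     # ...in addition to whatever Hypothesis generates
def test_abs_is_non_negative(x):
    assert abs(x) >= 0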

On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon <abedillon@gmail.com> wrote:
Thirdly, computers are very good at exhaustively searching multidimensional spaces.
How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function? Do you think it can finish before the Earth gets engulfed by the Sun? Regards Antoine.

[Antoine Pitrou]
Yes, ok. I used the word "exhaustively" wrong. Sorry about that. I don't think humans are made of a magical substance that can exhaustively search the space of possible pairs of integers before the heat death of the universe. I think humans use strategies based, hopefully, in logic to come up with test examples, and that it's often more valuable to capture said strategies in code than to make a human run the algorithms. In cases where domain knowledge helps inform the search strategy, there should be easy-to-use tools to build a domain-specific search strategy. On Wed, Nov 28, 2018 at 4:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
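One sketch of capturing such domain knowledge in a reusable strategy (the "sorted, non-empty list" constraint is an arbitrary stand-in for real domain rules):

from hypothesis import given, strategies as st

@st.composite
def sorted_int_lists(draw):
    # encode the domain knowledge that our inputs are always sorted and non-empty
    xs = draw(st.lists(st.integers(), min_size=1))
    return sorted(xs)

@given(sorted_int_lists())
def test_first_element_is_minimum(xs):
    assert xs[0] == min(xs)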

That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only about 150 years. We have at least several hundred million years before the sun engulfs us. On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Thu, Nov 29, 2018 at 10:25 AM David Mertz <mertz@gnosis.cx> wrote:
That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only about 150 years. We have at least several hundred million years before the sun engulfs us.
Python ints are not 32-bit ints. Have fun. :) ChrisA

But Python integers are variable-sized, and their size is basically limited by available memory or address space. Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.). Also let's take a look at the Python integer representation:
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
This means that every 4 bytes of an integer object store 30 bits of actual integer data. So, how many bits does the largest allocatable integer on that system have, assuming 90% of the 4 GB are available for allocation?
Now how many possible integers are there in that number of bits?
(yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.) Let's try to have a readable approximation of that number. Convert it to a float perhaps?
Well, of course, that overflows. So let's just extract a power of 10 instead:
(yes, math.log10() works on non-float-convertible integers. I'm impressed!) So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time. (And of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.) Regards Antoine. On 29/11/2018 at 00:24, David Mertz wrote:
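A rough reconstruction of the arithmetic above (a sketch; the exact session output was not preserved, and the 4 GB / 90% figures are taken from the message):

import math

available_bytes = int(0.9 * 4 * 2**30)            # ~90% of 4 GiB usable for the int object
bits_per_digit, sizeof_digit = 30, 4              # from sys.int_info on a 64-bit CPython
nbits = available_bytes // sizeof_digit * bits_per_digit
print(nbits)                                      # ~2.9e10 bits of integer payload
print(nbits * math.log10(2))                      # exponent ~8.7e9, consistent with 6.6e8727169408 above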

But nobody is talking about exhausting the combinatoric space of all possible values. Property-based testing looks like fuzz testing but it is not quite the same thing. Property-based testing is not about just generating random values till the heat death of the universe, but about generating sensible values in a configurable way to cover all the equivalence classes we can think of. If my function takes two floating point numbers as arguments, hypothesis "strategies" won't try all possible combinations of all possible floating point values, but instead all possible combinations of interesting values (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal fractions, etc.), something that an experienced programmer would probably end up doing by himself with a lot of test cases, but that can be better done with less effort through the automation provided by the hypothesis package. It could well be that just by using such a tool, a naive programmer would end up convinced that he would be better served by sticking to decimal arithmetic :-) On Wed, Nov 28, 2018 at 21:43, Antoine Pitrou <antoine@python.org> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/
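As a short sketch of the "interesting values" point above: st.floats() already covers NaN, infinities, signed zeros, and subnormals by default, so even a simple round-trip property gets exercised against them (the struct round-trip is just an illustrative property):

import math
import struct
from hypothesis import given, strategies as st

@given(st.floats())  # includes NaN, +/-inf, signed zeros, subnormals by default
def test_double_roundtrip(x):
    (y,) = struct.unpack("<d", struct.pack("<d", x))
    assert y == x or (math.isnan(x) and math.isnan(y))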

Hi, Property based testing is not about just generating random values till the
Exactly. A tool can go a step further and, based on the assertions and contracts, generate the tests automatically or prove that certain properties of the program always hold. I would encourage people interested in automatic testing to have a look at the scientific literature on the topic (formal static analysis). Abstract interpretation has already been mentioned: https://en.wikipedia.org/wiki/Abstract_interpretation. For some bleeding edge, have a look at what they do with machine learning at this lab: https://eth-sri.github.io/publications/

On Wed, 28 Nov 2018 23:22:20 -0200 Marcos Eliziario <marcos.eliziario@gmail.com> wrote:
Well, the OP did talk about "exhaustively searching the multidimensional space". But I agree mere sampling is useful. I might give hypothesis a try someday. Usually I prefer hand-rolling my own stress testing routines. Regards Antoine.

I was assuming it was a Numba-ized function since it's purely numeric. ;-) FWIW, the theoretical size of Python ints is limited by the fact that 'int.bit_length()' is a platform native int. So my system cannot store ints larger than (2**(2**63-1)). It'll take a lot more memory than my measly 4 GiB to store that number, though. So yes, that's way longer than the heat death of the universe, even before 128-bit machines are widespread. On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou <antoine@python.org> wrote:

OK. I know I made a mistake by saying, "computers are very good at *exhaustively* searching multidimensional spaces." I should have said, "computers are very good at enumerating examples from multi-dimensional spaces" or something to that effect. Now that we've had our fun, can you guys please continue in a forked thread so it doesn't derail this conversation? On Wed, Nov 28, 2018 at 7:47 PM David Mertz <mertz@gnosis.cx> wrote:

One thought I had pertains to a very narrow sub-set of cases, but may provide a starting point. For the cases where a precondition, invariant, or postcondition only involves a single parameter, attribute, or the return value (respectively) and it's reasonably simple, one could write it as an expression acting directly on the type annotation:

def encabulate(
        reactive_inductance: 1 >= float > 0,   # description
        capacitive_diractance: int > 1,        # description
        delta_winding: bool                    # description
) -> len(Set[DingleArm]) > 0:                  # ??? I don't know how you would handle more complex objects...
    do_stuff with_things ...

Anyway. Just more food for thought... On Tue, Nov 27, 2018 at 10:47 PM Abe Dillon <abedillon@gmail.com> wrote:

I wrote a lib specifically for the case of validators that also update the documentation. The default is: if the name of the validator function plus its args speaks for itself, then only that is added to the docstring. Ex: @require_odd_numbers() => it would add "require_odd_numbers" at the end of __doc__. There is also the possibility to add doc-string templates. https://github.com/jul/check_arg
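A generic sketch of the mechanism described (not check_arg's actual API, just the idea of a validator that appends its own name to the docstring):

import functools

def require_odd_numbers():
    # hypothetical validator: checks its condition and advertises itself in __doc__
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            assert all(n % 2 == 1 for n in args), "require_odd_numbers"
            return func(*args, **kwargs)
        wrapper.__doc__ = (func.__doc__ or "") + "\nrequire_odd_numbers"
        return wrapper
    return decorator

@require_odd_numbers()
def add_odds(a, b):
    """Add two odd numbers."""
    return a + b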
participants (10)
- Abe Dillon
- Antoine Pitrou
- Antoine Pitrou
- Chris Angelico
- David Mertz
- James Lu
- julien tayon
- Marcos Eliziario
- Marko Ristin-Kaufmann
- Steven D'Aprano