Python-ideas


python-ideas@python.org

December 2018


[Brainstorm] Testing with Documented ABCs
by Abe Dillon Dec. 8, 2018


I've been pulling a lot of ideas from the recent discussion on design by contract (DBC), the elegance and drawbacks <https://bemusement.org/doctests-arent-code> of doctests <https://docs.python.org/3/library/doctest.html>, and the amazing talk <https://www.youtube.com/watch?v=MYucYon2-lk> given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the Next Level".

To recap a lot of previous discussions:

- Documentation should tell you:
  A) What a variable represents
  B) What kind of thing a variable is
  C) The acceptable values a variable can take
- Typing and tests can partially take the place of documentation by filling in B and C (respectively), and sometimes A can be inferred from decent naming and context.
- Contracts can take the place of many tests (especially when combined with a library like hypothesis).
- Contracts/assertions can provide "stable" documentation in the sense that they can't get out of sync with the code.
- Attempts to implement contracts using standard Python syntax are verbose and noisy because they rely heavily on decorators that add a lot of repetitive preamble to the methods being decorated. They may also require a metaclass, which restricts their use to code that doesn't already use a metaclass.
- There was some discussion about the importance of "what a variable represents", which pointed to this article <http://pgbovine.net/python-unreadable.htm> by Philip J. Guo (author of the magnificent pythontutor.com). I believe Guo's usage of "in-the-small" and "in-the-large" is confusing because a well-decoupled program shouldn't yield functions that know or care how they're being used in the grand machinations of your project. The examples he gives are of functions that could use a docstring and some type annotations, but don't actually say how they relate to the rest of the project.

One thing that caught me about Hillel Wayne's talk was that some of his examples were close to needing practically no code. He starts with:

    from typing import Any, List

    def tail(lst: List[Any]) -> List[Any]:
        assert len(lst) > 0, "precondition"
        result = lst[1:]
        assert [lst[0]] + result == lst, "postcondition"
        return result

He then rewrites the function using a contracts library:

    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst",
            lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

He then writes a unit test for the function:

    @given(lists(integers(), min_size=1))
    def test_tail(lst):
        tail(lst)

What strikes me as interesting is that the test pretty much doesn't need to be written. The 'given' statement should be redundant based on the type annotation and the precondition. Anyone who knows hypothesis, just imagine the @require is a hypothesis 'assume' call. Furthermore, hypothesis should be able to build strategies for more complex objects based on class invariants and attribute types:

    @invariant("no overdrafts", lambda self: self.balance >= 0)
    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

A library like hypothesis should be able to generate valid account objects. Hypothesis also has stateful testing <https://hypothesis.readthedocs.io/en/1.4.1/stateful.html> but I think the implementation could use some work. As it is, you have to inherit from a class that uses a metaclass AND you have to pollute your class's namespace with helper objects and methods.

If we could figure out a cleaner syntax for defining invariants, preconditions, and postconditions we'd be halfway to automated testing UTOPIA! (OK, maybe I'm being a little overzealous.)

I think there are two missing pieces to this testing problem: side-effect verification and failure verification.
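
As an aside on the tail() example above, here is a rough, stdlib-only sketch of how a test could be derived from the type annotation plus the precondition instead of being written by hand. Everything in it is an assumption made for illustration: `require`, `ensure`, `PreconditionViolated`, `GENERATORS`, and `auto_test` are hypothetical stand-ins, not the contracts library from the talk and not hypothesis. The predicates take the arguments directly rather than an `args` object, and the annotation is narrowed to List[int] so that a generator can be looked up for it.

```python
import functools
import inspect
import random
from typing import List, get_type_hints


class PreconditionViolated(Exception):
    """Raised when a caller breaks a @require contract (hypothetical name)."""


def require(description, predicate):
    """Check `predicate(*args)` before the call and remember it for test generation."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            if not predicate(*args):
                raise PreconditionViolated(description)
            return func(*args)
        wrapper.preconditions = getattr(func, "preconditions", []) + [predicate]
        return wrapper
    return decorate


def ensure(description, predicate):
    """Check `predicate(*args, result)` after the call."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args):
            result = func(*args)
            assert predicate(*args, result), description
            return result
        return wrapper
    return decorate


# Hypothetical registry: how to build a random value for an annotated type.
GENERATORS = {
    List[int]: lambda rng: [rng.randint(-100, 100) for _ in range(rng.randint(0, 5))],
}


def auto_test(func, runs=200, seed=0):
    """Call `func` with generated arguments, skipping inputs the preconditions reject."""
    rng = random.Random(seed)
    hints = get_type_hints(func)
    params = list(inspect.signature(func).parameters)
    checked = 0
    for _ in range(runs):
        args = [GENERATORS[hints[name]](rng) for name in params]
        if not all(pre(*args) for pre in getattr(func, "preconditions", [])):
            continue  # plays the role of hypothesis' assume()
        func(*args)  # any postcondition failure surfaces as an AssertionError
        checked += 1
    return checked


@require("lst must not be empty", lambda lst: len(lst) > 0)
@ensure("result is tail of lst", lambda lst, result: [lst[0]] + result == lst)
def tail(lst: List[int]) -> List[int]:
    return lst[1:]


checked_inputs = auto_test(tail)  # number of generated inputs that were exercised
```

The only thing written per function is the contract; `auto_test(tail)` does what the hand-written `test_tail` did. A real tool would of course need far richer generators and shrinking, which is exactly what hypothesis already provides.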

Failure verification should test that the expected exceptions get thrown when known bad data is passed in or when an object is put in a known illegal state. This should be doable by allowing Hypothesis to probe the bounds of unacceptable input data or states, though it might seem a bit silly: if you've already added a precondition "x >= 0" to a function, then it obviously should raise a PreconditionViolated when passed any x < 0. It may still be important, however, if for performance reasons you need to disable invariant checking but you still want certain bad input to raise exceptions, or if your system has two components that interact with slightly mismatched invariants and you want to make sure the components handle the edge condition correctly.

You can think of types from a set-theory perspective, where the Integer type is conceptually the set of all integers and invariants specify a smaller subset than typing alone. However, if the set of all valid outputs of one component is not completely contained within the set of all valid inputs to another component, then there will be edge cases resulting from the mismatch. In that sense, some of the invariant verification could be static-ish (as much as Python allows).

Side-effect verification is usually done by mocking dependencies. You pass in a mock database connection and make sure your object sends and receives data as expected. As crazy as it sounds, this too can be almost completely automated away if all of the above tools are in place AND if Python gained support for exception annotations. I wrote a Java (yuck) library at work that does this.
I want to port it to Python and share it, but it basically enumerates a bunch of stuff:

- the "sources" and "destinations" of the system,
- how those relate to dependencies,
- how they relate to each other (if dependency X is unresponsive, I can't get sources A, B, or G, and if I can't get source B, I can't write destination Y),
- the dependency failure modes (exceptions raised, timeouts, unrecognized key, missing data, etc.),
- all the public methods of the class under test and what sources and destinations they use.

Then I enumerate 'k' from 0 to some limit for the max number of simultaneous faults to test for. For each method that can have n >= k simultaneous faults, I test all (n choose k) combinations of faults for that method against the desired behavior. I'm sure that explanation is as clear as mud. I will try to get a working Python example at some point to demonstrate.

Finally, in the PyCon video, Hillel Wayne shows an example of testing that an "add" function is commutative. It seems that once you write that invariant, it might apply to many different functions. A similar invariant may be "reversibility", like:

    @given(text())
    def test_reversable_codex(s):
        assert s == decode(encode(s)), "not reversible"

That might be a common property that other functions share:

    @invariant(reversible(decode))
    def encode(s: str) -> bytes: ...

Having said all that, I wanted to brainstorm some possible solutions for implementing some or all of the above in Python without drowning your code in decorators.

NOTE: Please don't get hung up on specific syntax suggestions! Try to see the forest for the trees!

An example syntax could be:

    # Instead of this
    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst",
            lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

    # Maybe this?
    non_empty = invariant("Must not be empty", lambda x: len(x) > 0)  # can be re-used

    def tail(
        lst: List[Any] d"Description of what this param represents. {non_empty}"
    ) -> List[Any] d"Description of return value {lst == [lst[0]] + __result__}":
        """
        Description of function
        """
        return lst[1:]

Python could build the full docstring like so:

    """
    Description of function

    Args:
        lst: Description of what this param represents. Must not be empty.

    Returns:
        Description of return value.
    """

d-strings have some description followed by some terminator, after which come either invariant objects or [optionally strings] followed by an expression on the arguments and __result__?

I'm sorry this is so half-baked. I don't really like the d-string concept and I'm pretty sure there are a million problems with it. I'll try to flesh out the side-effect verification concept more later, along with all the other poorly explained stuff. I just wanted to get these thoughts out for discussion, but now it's super late and I have to go!
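
Coming back to the fault-combination idea described earlier (enumerate k from 0 up to a limit, then try every n-choose-k set of simultaneous faults for each method), a minimal sketch of just the enumeration step might look like the following. This is not the Java library mentioned above; `FAULTS_BY_METHOD`, `fault_combinations`, and the method and fault names are all made up for illustration.

```python
from itertools import combinations
from typing import Dict, Iterator, List, Tuple

# Hypothetical input: the failure modes that can affect each public method
# of the class under test.
FAULTS_BY_METHOD: Dict[str, List[str]] = {
    "load": ["db_timeout", "missing_key"],
    "save": ["db_timeout", "db_error", "disk_full"],
}


def fault_combinations(
    faults_by_method: Dict[str, List[str]], max_faults: int
) -> Iterator[Tuple[str, Tuple[str, ...]]]:
    """Yield (method, faults) for every set of k simultaneous faults, k = 0..max_faults."""
    for k in range(max_faults + 1):
        for method, faults in faults_by_method.items():
            if len(faults) < k:
                continue  # this method cannot have k simultaneous faults
            for combo in combinations(faults, k):
                yield method, combo


cases = list(fault_combinations(FAULTS_BY_METHOD, max_faults=2))
```

Each (method, faults) pair would then be run against mocks configured to fail in exactly those ways and compared with the desired behavior; that part depends on the system under test and is left out here.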
