[Brainstorm] Testing with Documented ABCs

I've been pulling a lot of ideas from the recent discussion on design by contract (DBC), the elegance and drawbacks <https://bemusement.org/doctests-arent-code> of doctests <https://docs.python.org/3/library/doctest.html>, and the amazing talk <https://www.youtube.com/watch?v=MYucYon2-lk> given by Hillel Wayne at this year's PyCon entitled "Beyond Unit Tests: Taking your Tests to the Next Level".

To recap a lot of previous discussions:

- Documentation should tell you: A) what a variable represents, B) what kind of thing a variable is, and C) the acceptable values a variable can take.
- Typing and tests can partially take the place of documentation by filling in B and C (respectively), and sometimes A can be inferred from decent naming and context.
- Contracts can take the place of many tests (especially when combined with a library like hypothesis).
- Contracts/assertions can provide "stable" documentation in the sense that it can't get out of sync with the code.
- Attempts to implement contracts using standard Python syntax are verbose and noisy because they rely heavily on decorators that add a lot of repetitive preamble to the methods being decorated. They may also require a metaclass, which restricts their use to code that doesn't already use a metaclass.
- There was some discussion about the importance of "what a variable represents", which pointed to this article <http://pgbovine.net/python-unreadable.htm> by Philip J. Guo (author of the magnificent pythontutor.com). I believe Guo's usage of "in-the-small" and "in-the-large" is confusing because a well-decoupled program shouldn't yield functions that know or care how they're being used in the grand machinations of your project. The examples he gives are of functions that could use a doc string and some type annotations, but don't actually say how they relate to the rest of the project.

One thing that caught me about Hillel Wayne's talk was that some of his examples were close to needing practically no code. He starts with:

    def tail(lst: List[Any]) -> List[Any]:
        assert len(lst) > 0, "precondition"
        result = lst[1:]
        assert [lst[0]] + result == lst, "postcondition"
        return result

He then re-writes the function using a contracts library:

    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst", lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

He then writes a unit test for the function:

    @given(lists(integers(), 1))
    def test_tail(lst):
        tail(lst)

What strikes me as interesting is that the test pretty much doesn't need to be written. The 'given' statement should be redundant based on the type annotation and the precondition. Anyone who knows hypothesis, just imagine the @require is a hypothesis 'assume' call. Furthermore, hypothesis should be able to build strategies for more complex objects based on class invariants and attribute types:

    @invariant("no overdrafts", lambda self: self.balance >= 0)
    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

A library like hypothesis should be able to generate valid Account objects. Hypothesis also has stateful testing <https://hypothesis.readthedocs.io/en/1.4.1/stateful.html>, but I think the implementation could use some work. As it is, you have to inherit from a class that uses a metaclass AND you have to pollute your class's name-space with helper objects and methods.
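
(To make that last point concrete: with today's hypothesis you can already build such a strategy by hand; the missing piece is deriving it automatically from the @invariant decorator and the attribute types. This is only a sketch under that assumption, with illustrative names.)

    from hypothesis import given, strategies as st

    class Account:
        def __init__(self, number: int, balance: float = 0):
            super().__init__()
            self.number: int = number
            self.balance: float = balance

    # hand-written today; a contract-aware tool would derive this from the
    # "no overdrafts" invariant (balance >= 0) and the attribute annotations
    accounts = st.builds(
        Account,
        number=st.integers(),
        balance=st.floats(min_value=0, allow_nan=False, allow_infinity=False),
    )

    @given(accounts)
    def test_account_invariant_holds(account):
        assert account.balance >= 0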
If we could figure out a cleaner syntax for defining invariants, preconditions, and postconditions we'd be half-way to automated testing UTOPIA! (ok, maybe I'm being a little over-zealous)

I think there are two missing pieces to this testing problem: side-effect verification and failure verification.

Failure verification should test that the expected exceptions get thrown when known bad data is passed in or when an object is put in a known illegal state. This should be doable by allowing hypothesis to probe the bounds of unacceptable input data or states, though it might seem a bit silly: if you've already added a precondition "x >= 0" to a function, then it obviously should raise a PreconditionViolated when passed any x < 0. It may be important, however, if for performance reasons you need to disable invariant checking but you still want certain bad input to raise exceptions, or if your system has two components that interact with slightly mis-matched invariants and you want to make sure the components handle the edge-condition correctly. You can think of types from a set-theory perspective, where the Integer type is conceptually the set of all integers, and invariants specify a smaller subset than typing alone; if the set of all valid outputs of one component is not completely contained within the set of all valid inputs to another component, then there will be edge-cases resulting from the mismatch. In that sense, some of the invariant verification could be static-ish (as much as Python allows). (Rough sketches of these ideas follow at the end of this message.)

Side-effect verification is usually done by mocking dependencies: you pass in a mock database connection and make sure my object sends and receives data as expected. As crazy as it sounds, this too can be almost completely automated away if all of the above tools are in place AND if Python gained support for exception annotations. I wrote a Java (yuck) library at work that does this. I want to port it to Python and share it, but it basically enumerates a bunch of stuff: the "sources" and "destinations" of the system, how those relate to dependencies, how they relate to each other (if dependency X is unresponsive, I can't get sources A, B, or G, and if I can't get source B, I can't write destination Y), the dependency failure modes (exceptions raised, timeouts, unrecognized keys, missing data, etc.), and all the public methods of the class under test and what sources and destinations they use. Then I enumerate 'k' from 0 to some limit for the max number of simultaneous faults to test for, and for each method that can have n >= k simultaneous faults I test all (n choose k) combinations of faults for that method against the desired behavior. I'm sure that explanation is as clear as mud. I will try to get a working Python example at some point to demonstrate.

Finally, in the PyCon video, Hillel Wayne shows an example of testing that an "add" function is commutative. It seems that once you write that invariant, it might apply to many different functions. A similar invariant may be "reversibility":

    @given(text())
    def test_reversible_codec(s):
        assert s == decode(encode(s)), "not reversible"

That might be a common property that other functions share:

    @invariant(reversible(decode))
    def encode(s: str) -> bytes:
        ...

Having said all that, I wanted to brainstorm some possible solutions for implementing some or all of the above in Python without drowning your code in decorators. NOTE: Please don't get hung up on specific syntax suggestions! Try to see the forest through the trees!
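
(A minimal sketch of the failure-verification idea using hypothesis and pytest. The function and the PreconditionViolated stand-in are made up for illustration; the point is that the strategy deliberately generates values outside the precondition and asserts that the guard fires.)

    import pytest
    from hypothesis import given, strategies as st

    class PreconditionViolated(Exception):
        """Stand-in for whatever a contracts library would raise."""

    def checked_sqrt(x: float) -> float:
        if not x >= 0:                      # precondition: x >= 0
            raise PreconditionViolated("x must be >= 0")
        return x ** 0.5

    # probe the complement of the precondition: every generated x violates it
    @given(st.floats(allow_nan=False).filter(lambda x: x < 0))
    def test_negative_input_is_rejected(x):
        with pytest.raises(PreconditionViolated):
            checked_sqrt(x)

(And the fault-combination enumeration described for side-effect verification is, at its core, just itertools.combinations over the faults a method depends on -- a sketch with made-up fault names:)

    from itertools import combinations

    faults = ["db_timeout", "cache_unavailable", "auth_service_error"]

    max_simultaneous = 2
    for k in range(max_simultaneous + 1):
        for combo in combinations(faults, k):
            # inject exactly these k faults into mocked dependencies,
            # then call the method under test and check the desired behavior
            print(f"testing with faults: {combo or 'none'}")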
An example syntax could be:

    # Instead of this
    @require("lst must not be empty", lambda args: len(args.lst) > 0)
    @ensure("result is tail of lst", lambda args, result: [args.lst[0]] + result == args.lst)
    def tail(lst: List[Any]) -> List[Any]:
        return lst[1:]

    # Maybe this?
    non_empty = invariant("Must not be empty", lambda x: len(x) > 0)  # can be re-used

    def tail(lst: List[Any] d"Description of what this param represents. {non_empty}"
             ) -> List[Any] d"Description of return value {lst == [lst[0]] + __result__}":
        """ Description of function """
        return lst[1:]

Python could build the full doc string like so:

    """
    Description of function

    Args:
        lst: Description of what this param represents. Must not be empty.

    Returns:
        Description of return value.
    """

d-strings have some description followed by some terminator, after which come either invariant objects or [optionally strings] followed by an expression on the arguments and __result__ (or __return__?).

I'm sorry this is so half-baked. I don't really like the d-string concept and I'm pretty sure there are a million problems with it. I'll try to flesh out the side-effect verification concept more later along with all the other poorly explained stuff. I just wanted to get these thoughts out for discussion, but now it's super late and I have to go!

Hi Abe, I've been pulling a lot of ideas from the recent discussion on design by
Have you looked at the recent discussions regarding design-by-contract on this list (https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)?

You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested.

Re decorators: to my own surprise, using decorators in a larger code base is completely practical, including the readability and maintenance of the code. It's neither as ugly nor as problematic as it might seem at first look. We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plan to use the library.

Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and the granularity of contract switches (right now we use always/optimized, production/non-optimized and testing/slow, but it seems that a larger system requires finer categories).

Cheers, Marko

[Marko Ristin-Kaufmann]
Have you looked at the recent discussions regarding design-by-contract on this list
I tried to read through them all before posting, but I may have missed some of the forks. There was a lot of good discussion! [Marko Ristin-Kaufmann]
I'll look into that! I'm very interested! [Marko Ristin-Kaufmann]
Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation.

One thing that I can get on a soap-box about is the benefit of putting the most relevant information to the reader in the order of top to bottom and left to right whenever possible. I've written many posts about this. I think a lot of Python syntax gets this right. It would have been easy to follow the same order as for-loops when designing comprehensions, but expressions allow you some freedom to order things differently, so now comprehensions read:

    squares = ...                                        # squares is
    squares = [...                                       # squares is a list
    squares = [number*number ...                         # squares is a list of number squared
    squares = [number*number for number in numbers]      # squares is a list of number squared 'from' numbers

I think decorators sort-of break this rule because they can put a lot of less important information (like the fact that a function is logged or timed) before more important information (like the function's name, signature, doc-string, etc.). It's not a huge deal because they tend to be de-emphasized by my IDE and there typically aren't dozens of them on each function, but I definitely prefer Eiffel's syntax <https://www.eiffel.com/values/design-by-contract/introduction/> over decorators for that reason.

I understand that syntax changes have a very high bar for very good reasons. Hillel Wayne's PyCon talk got me thinking that we might be close enough to a really great solution to a wide variety of testing problems that it might justify some new syntax, or perhaps someone has an idea that wouldn't require new syntax that I didn't think of.

[Marko Ristin-Kaufmann]
Yeah... I don't know anything about testing concurrent or parallel code. On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Btw, it would be relatively easy to create a parser for Python. Python doesn't have any crazy grammar constructs like the lexer hack <https://en.wikipedia.org/wiki/The_lexer_hack> AFAIK. I'm imagining using Bison:

1. Convert Python's grammar (https://github.com/python/cpython/blob/master/Lib/lib2to3/Grammar.txt) to Bison format.
2. Write a lexer to parse tokens and convert indentation to indent/dedent tokens.
3. Extend the grammar however you want it. Call these custom AST nodes "contract nodes."
4. Create a simple AST, really an annotated parse tree. I think we can use a simple one that's a bunch of nested lists:

       ["for_stmt", "for i in range(10):", [
           ["exprlist", "i", [ ... ]],
           ["testlist", "range(10)", [ ... ]]
       ]]
       # ["node_type", "<source code>", <grammar nodes contained inside the for stmt>]

   The AST can be made more detailed on an as-needed basis.
5. Traverse the AST, and "rewrite" the AST by pasting traditional Python AST nodes where contract nodes are. This example from the Babel handbook may help if you have trouble understanding what this step means: https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/pl...
6. Turn the AST back into Python source. Since we're storing the source code from the beginning, this should be fairly easy. (Bison lets your lexer tell the parser the line and column numbers of each token.)

---

I made a joke language with Bison once; it's really flexible and well-suited for this kind of task.

Tip: I found Bison's C++ mode too complicated, so I used it in C mode with the C++ Standard Library and C++ references enabled.

---

I'm interested: what contract-related functionality do you think Python's existing syntax is inadequate for? You could look into using with statements and a python program that takes the AST and snips contract-related with statements to produce optimized code (sketched below), though I suppose that's one step below the custom-parser method.

On Wed, Nov 28, 2018 at 3:29 PM Abe Dillon <abedillon@gmail.com> wrote:
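
(The with-statement idea is sketched here with the standard ast module rather than a custom Bison parser: a NodeTransformer that simply drops any `with require(...)`-style block to produce the optimized source. The `require` context manager is hypothetical.)

    import ast
    import textwrap

    CONTRACT_NAMES = {"require", "ensure", "invariant"}

    class StripContracts(ast.NodeTransformer):
        """Remove `with require(...)`-style blocks from a module's AST."""
        def visit_With(self, node: ast.With):
            self.generic_visit(node)
            for item in node.items:
                call = item.context_expr
                if (isinstance(call, ast.Call)
                        and isinstance(call.func, ast.Name)
                        and call.func.id in CONTRACT_NAMES):
                    return None          # drop the whole contract block
            return node

    source = textwrap.dedent("""
        def tail(lst):
            with require("lst must not be empty"):
                assert len(lst) > 0
            return lst[1:]
    """)

    tree = StripContracts().visit(ast.parse(source))
    ast.fix_missing_locations(tree)
    print(ast.unparse(tree))   # the contract block is gone (ast.unparse needs Python 3.9+)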

Marko, I have a few thoughts that might improve icontract.

First, multiple clauses per decorator:

    @pre(lambda x: x >= 0,
         lambda y: y >= 0,
         lambda width: width >= 0,
         lambda height: height >= 0,
         lambda x, width, img: x + width <= width_of(img),
         lambda y, height, img: y + height <= height_of(img))
    @post(lambda self: (self.x, self.y) in self,
          lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
          lambda self: (self.x+self.width, self.y+self.height) not in self)
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

    def __contains__(self, pt: Tuple[int, int]) -> bool:
        x, y = pt
        return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + self.height)

You might be able to get away with some magic by decorating a method just to flag it as using contracts:

    @contract  # <- does byte-code and/or AST voodoo
    def __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None:
        pre(x >= 0,
            y >= 0,
            width >= 0,
            height >= 0,
            x + width <= width_of(img),
            y + height <= height_of(img))

        # this would probably be declared at the class level
        inv(lambda self: (self.x, self.y) in self,
            lambda self: (self.x+self.width-1, self.y+self.height-1) in self,
            lambda self: (self.x+self.width, self.y+self.height) not in self)

        self.img = img[y : y+height, x : x+width].copy()
        self.x = x
        self.y = y
        self.width = width
        self.height = height

That might be super tricky to implement, but it saves you some lambda noise. Also, I saw a forked thread in which you were considering some sort of transpiler with similar syntax to the above example. That also works.

Another thing to consider is that the role of descriptors <https://www.smallsurething.com/python-descriptors-made-simple/> overlaps some with the role of invariants. I don't know what to do with that knowledge, but it seems like it might be useful.

Anyway, I hope those half-baked thoughts have *some* value...

On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann <marko.ristin@gmail.com> wrote:
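
(For what it's worth, the multi-clause @pre above doesn't strictly need byte-code voodoo; a rough sketch of such a decorator, matching each lambda to the arguments it names via inspect. This is not icontract's API, just an illustration.)

    import functools
    import inspect

    def pre(*conditions):
        """Sketch: each predicate is called with only the arguments it names."""
        def decorator(func):
            sig = inspect.signature(func)

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                bound = sig.bind(*args, **kwargs)
                bound.apply_defaults()
                for cond in conditions:
                    wanted = inspect.signature(cond).parameters
                    subset = {k: v for k, v in bound.arguments.items() if k in wanted}
                    if not cond(**subset):
                        raise AssertionError(f"precondition failed for arguments {subset!r}")
                return func(*args, **kwargs)

            return wrapper
        return decorator

    @pre(lambda x: x >= 0,
         lambda x, width: x + width <= 100)
    def crop(x: int, width: int) -> int:
        return x + width

    crop(10, 20)     # fine
    # crop(-1, 20)   # would raise AssertionError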

Hi Abe, Thanks for your suggestions! We actually already considered the two alternatives you propose.

*Multiple predicates per decorator.* The problem is that you cannot deal with toggling/describing individual contracts easily. While you can hack your way through it (considering the arguments in the sequence, for example), we found it clearer to have separate decorators. Moreover, tracebacks are much easier to read, which is important when you debug a program.

*AST magic.* The problem with any approach based on parsing (be it parsing the code or the description) is that parsing is slow, so you end up spending a lot of cycles on contracts which might not be enabled (many contracts are applied only in the testing environment, not in production). Hence you must have an approach that offers practically zero overhead cost to importing a module when its contracts are turned off. Decoding byte-code does not work, as current decoding libraries cannot keep up with the changes in the language and the compiler, hence they are always lagging behind.

*Practicality of decorators.* We have retrospective meetings at the company and I frequently survey the opinions related to the contracts (explicitly asking about readability and maintainability) -- so far nobody had any difficulties and nobody was bothered by the noisy syntax. The decorator syntax is simply not beautiful, no discussion about that. But when it comes to maintenance, there's a linter included (https://github.com/Parquery/pyicontract-lint), and if you want contracts rendered in an appealing way, there's a documentation tool for sphinx (https://github.com/Parquery/sphinx-icontract). The linter facilitates maintainability a lot and the sphinx tool gives you nice documentation for a library, so that you don't even have to look into the source code that often if you don't want to. We need to be careful not to mistake issues of aesthetics for practical issues. Something might not be beautiful, but it can still be useful unless it's unreadable.

*Conclusion.* What we do need at this moment, IMO, is broad practical experience of using contracts in Python. Once you make a change to the language, it's impossible to undo. In contrast to what has been suggested in the previous discussions (including my own voiced opinions), I actually now don't think that introducing a language change would be beneficial *at this precise moment*. We don't know what the use cases are, and there is no practical experience to base the language change on. I'd prefer to hear from people who actually use contracts in their professional Python programming -- apart from the noisy syntax, how was the experience? Did it help you catch bugs (and how many)? Were there big problems with maintainability? Could you easily refactor? What were the limits of the contracts you encountered? What kind of snapshot mechanism do we need? How did you deal with multi-threading? And so on.

The icontract library is already practically usable and, if you don't use inheritance, dpcontracts is usable as well. I would encourage everybody to try out programming with contracts using an existing library and just hold their nose when writing the noisy syntax. Once we have unearthed deeper problems related to contracts, I think it will be much easier and much more convincing to write a proposal for introducing contracts in the core language. If I had to write a proposal right now, it would be based only on the experience of writing a humble 100K code base by a team of 5-10 people. Not very convincing.
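
(To illustrate the "zero overhead when contracts are turned off" requirement: a minimal sketch, not icontract's actual implementation. When contracts are off, the decorator hands the function back untouched, so there is no wrapper and no per-call cost.)

    import functools
    import os

    CONTRACTS_ENABLED = os.environ.get("CONTRACTS", "on") != "off"

    def require(description, condition):
        def decorator(func):
            if not CONTRACTS_ENABLED:
                return func                    # no wrapper at all when disabled

            @functools.wraps(func)
            def wrapper(*args, **kwargs):
                assert condition(*args, **kwargs), description
                return func(*args, **kwargs)

            return wrapper
        return decorator

    @require("lst must not be empty", lambda lst: len(lst) > 0)
    def tail(lst):
        return lst[1:]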
Cheers, Marko On Thu, 29 Nov 2018 at 02:26, Abe Dillon <abedillon@gmail.com> wrote:

[Marko Ristin-Kaufmann]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action. I also wanted to bring three thoughts to the table:

1. Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way.
2. Tying tests/contracts to the bits of documentation that they validate is a great way to keep documentation in sync with code, but doctest does it a bit "backwards". Like in sphinx-icontract (or even this), it's better to construct documentation (partially) from test code than to write test code within documentation. In general, I find the relationship between documentation, testing, and type-checking interesting. The problems they each address seem to overlap quite a bit.
3. There seems to be a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that.

[Marko Ristin-Kaufmann]
That's a good point. I would argue that the concept of contracts isn't new, so there should be at least a few cases that we can draw on where others have tread before us (which you've obviously done to a large degree). That's not to belittle the work you've done on icontracts. It's a great tool for the reasons you describe. [Marko Ristin-Kaufmann]
I suppose it may be difficult to implement a clean, *backwards-compatible* solution, but yes; going through the arguments in a sequence would be my naive solution. Each entry has an optional description, a callable, and an optional tag or level to enable toggling (I would follow a simple model such as logging levels), *in that order*. It makes sense that the text description comes first because that's the most relevant to a reader (like a doc-string), then the corresponding code, then the toggling flag, which will often be an optimization detail and generally falls behind code correctness in priority. It may be less straight-forward to parse (a naive sketch below), but I wouldn't call it a "hack". I guess I'm not sure what to say about tracebacks being hard to read.
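
(A naive sketch of that parsing: descriptions are strings, predicates are callables, and an optional integer level follows the predicate it toggles. All names are hypothetical.)

    def group_contracts(*entries):
        """Group a flat sequence into {description, predicate, level} records."""
        groups, pending_description = [], None
        for entry in entries:
            if isinstance(entry, str):
                pending_description = entry
            elif callable(entry):
                groups.append({"description": pending_description,
                               "predicate": entry,
                               "level": None})
                pending_description = None
            elif isinstance(entry, int):
                groups[-1]["level"] = entry    # applies to the most recent predicate
            else:
                raise TypeError(f"unexpected contract entry: {entry!r}")
        return groups

    groups = group_contracts(
        "x must be non-negative", lambda x: x >= 0, 10,
        lambda x, width: x + width <= 100,     # no description, default level
    )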
[Marko Ristin-Kaufmann]
That's fair enough. I think the implementation you've come up with is pretty close to optimally concise given the tools at your disposal. I think something like Eiffel is a good goal for Python to eventually shoot for, but without new syntax; each step between icontract and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment. On Thu, Nov 29, 2018 at 1:05 AM Marko Ristin-Kaufmann <marko.ristin@gmail.com> wrote:

Hi Abe,

[Abe Dillon]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action.
You are right. Please accept my apologies -- I was so primed by the discussions we had in October 2018 that I didn't pay enough attention to "Brainstorm" in the subject.

[Abe Dillon]
Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way.
You might want to look at the literature on automatic test generation. A possible entry point could be: https://www.research-collection.ethz.ch/handle/20.500.11850/69581

If I had time available, I would start with a tool that analyses a given module and automatically generates code for the Hypothesis test cases. The tool needs to select functions which accept primitive data types and, for each one of them, translate their contracts into Hypothesis code. If contracts are not trivially translatable to Hypothesis, the function is ignored. For readability and speed of development (of the code under test, not of the tool), I would prefer this tool *not* to be dynamic, so that the developer herself needs to re-run it if the function signatures change. The ingredients for such a tool are all there with icontract (similar to sphinx-icontract, you import the module and analyze its functions; you can copy/paste parts of the sphinx-icontract implementation for parsing and listing the AST of the contracts). (If you'd like to continue discussing this topic, let's create an issue on the icontract github page or switch to private correspondence in order not to spam this mail list.)
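
(A toy version of such a generator, to make the idea concrete. Here the preconditions are passed in explicitly as source strings; a real tool would pull them out of icontract's decorators instead, and would cover more than int parameters.)

    import inspect

    def hypothesis_test_source(func, preconditions):
        """Emit Hypothesis test source for a function with int-typed parameters."""
        params = [name for name, p in inspect.signature(func).parameters.items()
                  if p.annotation is int]
        strategies = ", ".join(f"{name}=st.integers()" for name in params)
        assumption = " and ".join(f"({src})" for src in preconditions) or "True"
        arglist = ", ".join(params)
        return (f"@given({strategies})\n"
                f"def test_{func.__name__}({arglist}):\n"
                f"    assume({assumption})\n"
                f"    {func.__name__}({arglist})\n")

    def clamp(x: int, lo: int, hi: int) -> int:
        return max(lo, min(hi, x))

    print(hypothesis_test_source(clamp, ["lo <= hi"]))
    # @given(x=st.integers(), lo=st.integers(), hi=st.integers())
    # def test_clamp(x, lo, hi):
    #     assume((lo <= hi))
    #     clamp(x, lo, hi)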
[Abe Dillon]
There seems to be a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that.

This was the case for the requests library. @James Lu <jamtlu@gmail.com> was looking into it -- a lot of functions had very similar contracts. However, in our code base at the company (including the open-sourced libraries), there was not a single case where we thought that contract re-use would be beneficial. Either it would have hurt readability and introduced unnecessary coupling (when the contracts were trivial), or it made sense to encapsulate more complex contracts in a separate function.
I found that to be too error-prone in a larger code base, but that is my very subjective opinion. Maybe you could give an example?

[Abe Dillon]
but without new syntax; each step between icontract and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment.
I agree. There are also issues with the core Python interpreter which I expect to remain open for a long time (see the issues related to retrieving the code text of lambda functions and decorators, and to tweaking dynamically the behavior of help(.) for functions). Cheers, Marko

On Tue, Nov 27, 2018 at 10:47:06PM -0600, Abe Dillon wrote:
You should look at the state of the art in Design By Contract. In Eiffel, DBC is integrated in the language:

https://www.eiffel.com/values/design-by-contract/introduction/
https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Asserti...

Eiffel uses a rather Pythonic block structure to define invariants. The syntax is not identical to Python's (Eiffel eschews the colons) but it also comes close to executable pseudo-code. I trust this syntax requires little explanation:

    require
        ... preconditions, tested on function entry
    do
        ... body of the function
    ensure
        ... postconditions, tested on function exit
    end

There is a similar invariant block for classes.

Cobra is a language which intentionally modeled its syntax on Python. It too has contracts integrated with the language:

http://cobra-language.com/how-to/DeclareContracts/
http://cobra-language.com/trac/cobra/wiki/Contracts

-- Steve

[Steven D'Aprano]
Thank you! I forgot to mention this (or look into how other languages solve this problem). I saw your example syntax in the recent DBC main thread and liked it a lot.

One thought I keep coming back to is this comparison between doc string formats <https://bwanamarko.alwaysdata.net/napoleon/format_exception.html>. It seems obvious that the "Sphinxy" style is the noisiest, most verbose, and ugliest format. Instead of putting ":arg ...:" and ":type ...:" for each parameter and the return value, it makes much more sense to open up an Args: section and use a concise notation for type. The decorator-based pre- and post-conditions seem to suffer from the same redundant, noisy, verbose problem as the Sphinxy docstring format, but make it worse by putting all that noise before the function declaration itself. It makes sense to me that a docstring might have a markdown-style syntax like:

    def format_exception(etype, value):
        """
        Format the exception with a traceback.

        Args:
            etype (str): what etype represents
                [some constraint on etype](precondition)
                [another constraint on etype](in_line_precondition?)
            value (int): what value represents
                [some constraint on value](precondition)
            [some constraints across multiple params](precondition)

        Returns:
            What the return value represents  # usually very similar to the description at the top
            [some constraint on return](postcondition)
        """
        ...

That ties most bits of the documentation to some code that enforces the correctness of the documentation. And if it's a little noisy, we could take another page from markdown's book and offer alternate ways to reference precondition and postcondition logic. I'm worried that such a style would carry a lot of the same drawbacks as doctest <https://bemusement.org/doctests-arent-code>.

Also, my sense of coding style has been heavily influenced by [this talk](https://vimeo.com/74316116), particularly the part where he shoves a mangled Hamlet soliloquy into the margins, so now many of my functions adopt the following style:

    def someDescriptiveName(
            arg1: SomeType,
            arg2: AnotherType[Thing],
            ...
            argN: SomeOtherType = default_value) -> ReturnType:
        """
        what the function does

        Args:
            arg1: what arg1 represents
            arg2: what arg2 represents
            ...
        """
        ...

This highlights a rather obvious duplication of code. We declare an arguments section in code and list all the arguments, then we do so again in the doc string. If you want your doc string to stay in sync with the code, this duplication is a problem. It makes more sense to tie the documentation for an argument to said argument:

    def someDescriptiveName(                     # what the function does
            arg1: SomeType,                      # what arg1 represents
            arg2: AnotherType[Thing],            # what arg2 represents
            ...
            argN: SomeOtherType = default_value  # what argN represents
            ) -> ReturnType:                     # what the return value represents
        ...

I think it especially makes sense if you consider the preconditions, postconditions, and invariants as a sort-of extension of typing, in the sense that typing narrows the set of acceptable values to a set of types and contracts restrict that set further.

I hope that clarifies my thought process. I don't like the d-strings that I proposed. I'd prefer syntax closer to Eiffel, but the above is the line of thought I was following to arrive at d-strings.

On Tue, 27 Nov 2018 22:47:06 -0600 Abe Dillon <abedillon@gmail.com> wrote:
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values. Also, the idea that fuzz testing will automatically find edge cases in your code is idealistic. It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO. In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program. Regards Antoine.

Indeed. But the great thing about the "hypothesis" tool is that it allows me to somewhat automate the generation of sets of input values based on my specific requirements, derived from my knowledge of my program. It allows me to think about what the reasonable distribution of values for each argument in a function is, by either using existing strategies, using their arguments, combining and extending them, and then letting the tool do the grunt work of running the test for lots of different equivalence classes of argument values.

I think that as long as the tool user keeps what you said in mind and uses the tool accordingly, it can be a great helper, and probably even force the average programmer to think more rigorously about the input values to be tested, not to mention the whole class of trivial mistakes and forgetfulness we are all bound to be subject to when writing test cases.

Best,

On Wed, Nov 28, 2018 at 12:18 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
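
(A couple of lines of that kind of strategy composition, for illustration: existing strategies constrained, mapped, and combined. The "no overdraft" property is just a made-up example.)

    from hypothesis import given, strategies as st

    # constrain and extend an existing strategy ...
    amounts = st.floats(min_value=0, max_value=1e9, allow_nan=False).map(lambda x: round(x, 2))
    # ... and combine strategies into a composite input
    transfers = st.tuples(amounts, amounts).filter(lambda t: t[0] >= t[1])

    @given(transfers)
    def test_transfer_never_overdraws(transfer):
        balance, amount = transfer
        assert balance - amount >= 0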
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

[Antoine Pitrou]
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.
First, they aren't mutually exclusive. It's trivial to add manually selected cases to a hypothesis test. Second, from my experience, people rarely choose between carefully selected optimal values and fuzz testing; they usually choose between manually selected trivial values or no test at all. Thirdly, computers are very good at exhaustively searching multidimensional spaces. If your tool sucks so bad at that that a human can do it better, then your tool needs work. Improving the tool saves way more time than reverting to manual testing.

There was a post long ago (I think I read it on Digg.com, which should give some indication of how long ago) about how to run a cloud-based system correctly. One of the controversial practices the article advocated was disabling ssh on the machine instances. The rationale is that you never want to waste your time fiddling with an instance that's not behaving properly. In cloud systems, instances should not be special. If they fail, blow them away and bring up another. If the failure persists, it's a problem with the *system*, not the instance. If you care about individual instances, YOU'RE DOING IT WRONG. You need to re-design the system.

On Wed, Nov 28, 2018 at 8:19 AM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon <abedillon@gmail.com> wrote:
Thirdly, Computers are very good at exhaustively searching multidimensional spaces.
How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function? Do you think it can finish before the Earth gets engulfed by the Sun? Regards Antoine.

[Antoine Pitrou]
Yes, ok. I used the word "exhaustively" wrong. Sorry about that. I don't think humans are made of a magical substance that can exhaustively search the space of possible pairs of integers before the heat-death of the universe. I think humans use strategies based, hopefully, in logic to come up with test examples, and that it's often more valuable to capture said strategies in code than to make a human run the algorithms. In cases where domain knowledge helps inform the search strategy, there should be easy-to-use tools to build a domain-specific search strategy. On Wed, Nov 28, 2018 at 4:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:

That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us. On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou <solipsis@pitrou.net wrote:

On Thu, Nov 29, 2018 at 10:25 AM David Mertz <mertz@gnosis.cx> wrote:
That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only 130,000 years. We have at least several hundred million years before the sun engulfs us.
Python ints are not 32-bit ints. Have fun. :) ChrisA

But Python integers are variable-sized, and their size is basically limited by available memory or address space. Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.). Also let's take a look at the Python integer representation:

    >>> sys.int_info
    sys.int_info(bits_per_digit=30, sizeof_digit=4)

This means that every 4 bytes of integer object store 30 bits of actual integer data. So, how many bits has the largest allocatable integer on that system, assuming 90% of 4 GB are available for allocation?

Now how many possible integers are there in that number of bits?

(Yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.) Let's try to have a readable approximation of that number. Convert it to a float, perhaps?

Well, of course. So let's just extract a power of 10:

(Yes, math.log10() works on non-float-convertible integers. I'm impressed!) So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time.

(And of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.)

Regards Antoine. On 29/11/2018 at 00:24, David Mertz wrote:
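
(The arithmetic behind that estimate, for anyone who wants to reproduce the order of magnitude; 4 GiB is used here as an approximation of the 4 GB figure.)

    import math

    usable_bytes = int(0.9 * 4 * 1024**3)  # ~90% of 4 GiB for the single int object
    digits = usable_bytes // 4              # sizeof_digit == 4
    bits = digits * 30                      # bits_per_digit == 30
    print(bits)                             # ~2.9e10 bits of integer payload
    print(bits * math.log10(2))             # ~8.7e9 -> about 10**8.7e9 representable ints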

But nobody is talking about exhausting the combinatoric space of all possible values. Property-based testing looks like fuzz testing but it is not quite the same thing. Property-based testing is not about just generating random values till the heat death of the universe, but about generating sensible values in a configurable way to cover all the equivalence classes we can think of.

If my function takes two floating point numbers as arguments, hypothesis "strategies" won't try all possible combinations of all possible floating point values, but instead all possible combinations of interesting values (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal fractions, etc.), something that an experienced programmer probably would end up doing by himself with a lot of test cases, but that can be done better and with less effort by the automation provided by the hypothesis package. It could well be that just by using such a tool, a naive programmer could end up being convinced that maybe he would be better served by sticking to decimal arithmetic :-)

On Wed, Nov 28, 2018 at 9:43 PM Antoine Pitrou <antoine@python.org> wrote:
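
(That is roughly what hypothesis' default float strategy already does: with no arguments it generates NaN, the infinities, signed zeros, huge and tiny values. A tiny illustration:)

    import math
    from hypothesis import given, strategies as st

    @given(st.floats(), st.floats())   # includes nan, +/-inf, +/-0.0, subnormals, ...
    def test_addition_is_commutative(x, y):
        assert x + y == y + x or math.isnan(x + y)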
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/

Hi, Property based testing is not about just generating random values till the
Exactly. A tool can go a step further and, based on the assertions and contracts, generate the tests automatically or prove that certain properties of the program always hold. I would encourage people interested in automatic testing to have a look at the scientific literature on the topic (formal static analysis). Abstract interpretation has already been mentioned: https://en.wikipedia.org/wiki/Abstract_interpretation. For some bleeding edge, have a look at what they do at this lab with machine learning: https://eth-sri.github.io/publications/

On Wed, 28 Nov 2018 23:22:20 -0200 Marcos Eliziario <marcos.eliziario@gmail.com> wrote:
Well, the OP did talk about "exhaustively searching the multidimensional space". But I agree mere sampling is useful. I might give hypothesis a try someday. Usually I prefer hand-rolling my own stress testing routines. Regards Antoine.

I was assuming it was a Numba-ized function since it's purely numeric. ;-) FWIW, the theoretical limit of Python ints is limited by the fact that 'int.bit_length()' is a platform-native int. So my system cannot store ints larger than (2**(2**63-1)). It'll take a lot more memory than my measly 4GiB to store that number though. So yes, that's way longer than heat-death-of-universe even before 128-bit machines are widespread. On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou <antoine@python.org> wrote:

OK. I know I made a mistake by saying, "computers are very good at *exhaustively* searching multidimensional spaces." I should have said, "computers are very good at enumerating examples from multi-dimensional spaces" or something to that effect. Now that we've had our fun, can you guys please continue in a forked conversation so it doesn't derail the conversation? On Wed, Nov 28, 2018 at 7:47 PM David Mertz <mertz@gnosis.cx> wrote:

One thought I had pertains to a very narrow sub-set of cases, but may provide a starting point. For the cases where a precondition, invariant, or postcondition only involves a single parameter, attribute, or the return value (respectively) and it's reasonably simple, one could write it as an expression acting directly on the type annotation:

    def encabulate(
            reactive_inductance: 1 >= float > 0,   # description
            capacitive_diractance: int > 1,        # description
            delta_winding: bool                    # description
            ) -> len(Set[DingleArm]) > 0:          # ???
        do_stuff with_things
        ...

I don't know how you would handle more complex objects...

Anyway. Just more food for thought...

On Tue, Nov 27, 2018 at 10:47 PM Abe Dillon <abedillon@gmail.com> wrote:
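
(The closest thing to this that works today without new syntax is probably typing.Annotated: the extra metadata can carry a predicate, and a decorator can check it at call time. A rough, made-up sketch of that direction, not a standard mechanism; the alias names are invented.)

    import functools
    import inspect
    from typing import Annotated, get_type_hints

    Probability = Annotated[float, lambda v: 0 < v <= 1]   # stands in for "1 >= float > 0"
    BigInt = Annotated[int, lambda v: v > 1]                # stands in for "int > 1"

    def check_annotated(func):
        """Check any callable metadata attached to a parameter's Annotated type."""
        hints = get_type_hints(func, include_extras=True)
        sig = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = sig.bind(*args, **kwargs)
            bound.apply_defaults()
            for name, value in bound.arguments.items():
                for meta in getattr(hints.get(name), "__metadata__", ()):
                    if callable(meta):
                        assert meta(value), f"{name}={value!r} violates its annotation"
            return func(*args, **kwargs)

        return wrapper

    @check_annotated
    def encabulate(reactive_inductance: Probability, capacitive_diractance: BigInt) -> None:
        ...

    encabulate(0.5, 2)     # fine
    # encabulate(1.5, 2)   # would raise AssertionError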

I wrote a lib specifically for the case of validators that also update the documentation. By default, if the name of the function plus its args speaks for itself, then only that is added to the docstring; e.g. @require_odd_numbers() would add "require_odd_numbers" at the end of __doc__. There is also the possibility to add a template for doc strings. https://github.com/jul/check_arg

Hi Abe, I've been pulling a lot of ideas from the recent discussion on design by
Have you looked at the recent discussions regarding design-by-contract on this list ( https://groups.google.com/forum/m/#!topic/python-ideas/JtMgpSyODTU and the following forked threads)? You might want to have a look at static checking techniques such as abstract interpretation. I hope to be able to work on such a tool for Python in some two years from now. We can stay in touch if you are interested. Re decorators: to my own surprise, using decorators in a larger code base is completely practical including the readability and maintenance of the code. It's neither that ugly nor problematic as it might seem at first look. We use our https://github.com/Parquery/icontract at the company. Most of the design choices come from practical issues we faced -- so you might want to read the doc even if you don't plant to use the library. Some of the aspects we still haven't figured out are: how to approach multi-threading (locking around the whole function with an additional decorator?) and granularity of contract switches (right now we use always/optimized, production/non-optimized and teating/slow, but it seems that a larger system requires finer categories). Cheers Marko

[Marko Ristin-Kaufmann]
Have you looked at the recent discussions regarding design-by-contract on this list
I tried to read through them all before posting, but I may have missed some of the forks. There was a lot of good discussion! [Marko Ristin-Kaufmann]
I'll look into that! I'm very interested! [Marko Ristin-Kaufmann]
Interesting. In the thread you linked on DBC, it seemed like Steve D'Aprano and David Mertz (and possibly others) were put off by the verbosity and noisiness of the decorator-based solution you provided with icontract (though I think there are ways to streamline that solution). It seems like syntactic support could offer a more concise and less noisy implementation. One thing that I can get on a soap-box about is the benefit putting the most relevant information to the reader in the order of top to bottom and left to right whenever possible. I've written many posts about this. I think a lot of Python syntax gets this right. It would have been easy to follow the same order as for-loops when designing comprehensions, but expressions allow you some freedom to order things differently, so now comprehensions read: squares = ... # squares is squares = [... # squares is a list squares = [number*number... # squares is a list of num squared squares = [number*number for num in numbers] # squares is a list of num squared 'from' numbers I think decorators sort-of break this rule because they can put a lot of less important information (like, that a function is logged or timed) before more important information (like the function's name, signature, doc-string, etc...). It's not a huge deal because they tend to be de-emphasized by my IDE and there typically aren't dozens of them on each function, but I definitely prefer Eiffel's syntax <https://www.eiffel.com/values/design-by-contract/introduction/> over decorators for that reason. I understand that syntax changes have an very high bar for very good reasons. Hillel Wayne's PyCon talk got me thinking that we might be close enough to a really great solution to a wide variety of testing problems that it might justify some new syntax or perhaps someone has an idea that wouldn't require new syntax that I didn't think of. [Marko Ristin-Kaufmann]
Yeah... I don't know anything about testing concurrent or parallel code. On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Btw, it would be relatively easy to create a parser for Python. Python doesn't have any crazy grammar constructs like the lexer hack <https://en.wikipedia.org/wiki/The_lexer_hack> AFAIK. I'm imagining using Bison: 1. convert python's grammar ( https://github.com/python/cpython/blob/master/Lib/lib2to3/Grammar.txt) to Bison format. 2. write a lexer to parse tokens and convert indentation to indent/dedent tokens. 3. extend the grammar however you want it. Call these custom AST nodes "contract nodes." 4. create a simple AST, really an annotated parse tree. I think we can use a simple one that's a bunch of nested lists: ["for_stmt", "for i in range(10):", [ ["exprlist", "i", [ ... ]], ["testlist", "range(10)", [ ... ]] ]] # ["node_type", "<source code>", <grammar nodes contained inside the for stmt>] The AST can be made more detailed on an as-needed basis. 5. traverse the AST, and "rewrite" the the AST by pasting traditional python AST nodes where contract nodes are. This example from the Babel handbook may help if you have trouble understanding what this step means. https://github.com/jamiebuilds/babel-handbook/blob/master/translations/en/pl... 6. turn the AST back into python source. Since we're storing the source code from the beginning, this should be fairly easy. (Bison lets your lexer tell the parser the line and column numbers of each token.) --- I made a joke language with Bison once, it's really flexible and well-suited for this kind of task. This 6-step p Tip: I found Bison's C++ mode too complicated, so I used it in C mode with the C++ Standard Library and C++ references enabled. --- I'm interested, what contract-related functionality do you think Python's existing syntax is inadequate for? You could look into using with statements and a python program that takes the AST and snips contract-related with statements to produce optimized code, though I suppose that's one step below the custom-parser method. On Wed, Nov 28, 2018 at 3:29 PM Abe Dillon <abedillon@gmail.com> wrote:

Marko, I have a few thoughts that might improve icontract. First, multiple clauses per decorator: @pre( *lambda* x: x >= 0, *lambda* y: y >= 0, *lambda* width: width >= 0, *lambda* height: height >= 0, *lambda* x, width, img: x + width <= width_of(img), *lambda* y, height, img: y + height <= height_of(img)) @post( *lambda* self: (self.x, self.y) in self, *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, *lambda* self: (self.x+self.width, self.y+self.height) not in self) *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None: self.img = img[y : y+height, x : x+width].copy() self.x = x self.y = y self.width = width self.height = height *def* __contains__(self, pt: Tuple[int, int]) -> bool: x, y = pt return (self.x <= x < self.x + self.width) and (self.y <= y < self.y + self.height) You might be able to get away with some magic by decorating a method just to flag it as using contracts: @contract # <- does byte-code and/or AST voodoo *def* __init__(self, img: np.ndarray, x: int, y: int, width: int, height: int) -> None: pre(x >= 0, y >= 0, width >= 0, height >= 0, x + width <= width_of(img), y + height <= height_of(img)) # this would probably be declared at the class level inv(*lambda* self: (self.x, self.y) in self, *lambda* self: (self.x+self.width-1, self.y+self.height-1) in self, *lambda* self: (self.x+self.width, self.y+self.height) not in self) self.img = img[y : y+height, x : x+width].copy() self.x = x self.y = y self.width = width self.height = height That might be super tricky to implement, but it saves you some lambda noise. Also, I saw a forked thread in which you were considering some sort of transpiler with similar syntax to the above example. That also works. Another thing to consider is that the role of descriptors <https://www.smallsurething.com/python-descriptors-made-simple/> overlaps some with the role of invariants. I don't know what to do with that knowledge, but it seems like it might be useful. Anyway, I hope those half-baked thoughts have *some* value... On Wed, Nov 28, 2018 at 1:12 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Hi Abe, Thanks for your suggestions! We actually already considered the two alternatives you propose. *Multiple predicates per decorator. *The problem is that you can not deal with toggling/describing individual contracts easily. While you can hack your way through it (considering the arguments in the sequence, for example), we found it clearer to have separate decorators. Moreover, tracebacks are much easier to read, which is important when you debug a program. *AST magic. *The problem with any approach based on parsing (be it parsing the code or the description) is that parsing is slow so you end up spending a lot of cycles on contracts which might not be enabled (many contracts are applied only in the testing environment, not int he production). Hence you must have an approach that offers practically zero overhead cost to importing a module when its contracts are turned off. Decoding byte-code does not work as current decoding libraries can not keep up with the changes in the language and the compiler hence they are always lagging behind. *Practicality of decorators. *We have retrospective meetings at the company and I frequently survey the opinions related to the contracts (explicitly asking about the readability and maintainability) -- so far nobody had any difficulties and nobody was bothered by the noisy syntax. The decorator syntax is simply not beautiful, no discussion about that. But when it comes to maintenance, there's a linter included ( https://github.com/Parquery/pyicontract-lint), and if you want contracts rendered in an appealing way, there's a documentation tool for sphinx ( https://github.com/Parquery/sphinx-icontract). The linter facilitates the maintainability a lot and sphinx tool gives you nice documentation for a library so that you don't even have to look into the source code that often if you don't want to. We need to be careful not to mistake issues of aesthetics for practical issues. Something might not be beautiful, but can be useful unless it's unreadable. *Conclusion. *What we do need at this moment, IMO, is a broad practical experience of using contracts in Python. Once you make a change to the language, it's impossible to undo. In contrast to what has been suggested in the previous discussions (including my own voiced opinions), I actually now don't think that introducing a language change would be beneficial *at this precise moment*. We don't know what the use cases are, and there is no practical experience to base the language change on. I'd prefer to hear from people who actually use contracts in their professional Python programming -- apart from the noisy syntax, how was the experience? Did it help you catch bugs (and how many)? Were there big problems with maintainability? Could you easily refactor? What were the limits of the contracts you encountered? What kind of snapshot mechanism do we need? How did you deal with multi-threading? And so on. icontract library is already practically usable and, if you don't use inheritance, dpcontracts is usable as well. I would encourage everybody to try out programming with contracts using an existing library and just hold their nose when writing the noisy syntax. Once we unearthed deeper problems related to contracts, I think it will be much easier and much more convincing to write a proposal for introducing contracts in the core language. If I had to write a proposal right now, it would be only based on the experience of writing a humble 100K code base by a team of 5-10 people. Not very convincing. 
Cheers, Marko On Thu, 29 Nov 2018 at 02:26, Abe Dillon <abedillon@gmail.com> wrote:

[Marko Ristin-Kaufmann]
I agree. That's why I prefaced this topic with [Brainstorm]. I want to explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action. I also wanted to bring three thoughts to the table: 1. Fuzz testing and stateful testing like that provided by hypothesis might work together with contracts in an interesting way. 2. Tying tests/contracts to the bits of documentation that they validate is a great way to keep documentation in sync with code, but doctest does it a bit "backwards". Like in icontract-sphinx (or even this) it's better to construct documentation (partially) from test code than to write test code within documentation. In general, I find the relationship between documentation, testing, and type-checking interesting. The problems they each address seem to overlap quite a bit. 3. There seems like a lot of opportunity for the re-use of contracts, so maybe we should consider a mechanism to facilitate that. [Marko Ristin-Kaufmann]
That's a good point. I would argue that the concept of contracts isn't new, so there should be at least a few cases that we can draw on where others have tread before us (which you've obviously done to a large degree). That's not to belittle the work you've done on icontracts. It's a great tool for the reasons you describe. [Marko Ristin-Kaufmann]
I suppose it may be difficult to implement a clean, *backwards-compatible* solution, but yes; going through the arguments in a sequence would be my naive solution. Each entry has an optional description, a callable, and an optional tag or level to enable toggling (I would follow a simple model such as logging levels) *in that order*. It makes sense that the text description come first because that's the most relevant to a reader (like a doc-string), then the corresponding code, then the toggling flag which will often be an optimization detail which generally fall behind code correctness in priority. It may be less straight-forward to parse, but I wouldn't call it a "hack". I guess I'm not sure what to say about tracebacks being hard to read. [Marko Ristin-Kaufmann]
That's fair enough. I think the implementation you've come up with is pretty close to optimally concise given the tools at your disposal. I think something like Eiffel is a good goal for Python to eventually shoot for, but without new syntax; each step between icontracts and an Eiffel-esque platonic ideal would require significant hackery with diminishing returns on investment. On Thu, Nov 29, 2018 at 1:05 AM Marko Ristin-Kaufmann < marko.ristin@gmail.com> wrote:

Hi Abe, I agree. That's why I prefaced this topic with [Brainstorm]. I want to
explore the solution space to this problem and discuss some of the pros and cons of different ideas, *not* proceed straight to action.
You are right. Please apologize, I was so primed by the discussions we had in October 2019 that I didn't pay enough attention to "branstorm" in the subject. Fuzz testing and stateful testing like that provided by hypothesis might
work together with contracts in an interesting way.
You might want to look at the literature on automatic test generation. A possible entry point could be: https://www.research-collection.ethz.ch/handle/20.500.11850/69581 If I had time available, I would start with a tool that analyses a given module and automatically generates code for the Hypothesis test cases. The tool needs to select functions which accept primitive data types and for each one of them translates their contracts into Hypothesis code. If contracts are not trivially translatable to Hypothesis, the function is ignored. For readability and speed of development (of the code under test, not of the tool), I would prefer this tool *not *to be dynamic so that the developer herself needs to re-run it if the function signatures changed. The ingredients for such a tool are all there with icontract (similar to sphinx-icontract, you import the module and analyze its functions; you can copy/past parts of sphinx-icontract implementation for parsing and listing the AST of the contracts). (If you'd like to continue discussing this topic, let's create an issue on icontract github page or switch to private correspondence in order not to spam this mail list). There seems like a lot of opportunity for the re-use of contracts, so maybe
we should consider a mechanism to facilitate that.
This was the case for the requests library. @James Lu <jamtlu@gmail.com> was looking into it -- a lot of functions had very similar contracts. However, in our code base at the company (including the open-sourced libraries), there was not a single case where we thought that contracts re-use would be beneficial. Either it would have hurt the readability and introduce unnecessary couplings (when the contracts were trivial) or it made sense to encapsulate more complex contracts in a separate function.
I found that to be too error-prone in a larger code base, but that is my very subjective opinion. Maybe you could make an example? but without new syntax; each step between icontracts and an Eiffel-esque
platonic ideal would require significant hackery with diminishing returns on investment.
I agree. There are also issues with the core Python interpreter which I expect to remain open for a long time (see the issues related to retrieving the source code of lambda functions and decorators, and to dynamically tweaking the behavior of help(.) for functions). Cheers, Marko

On Tue, Nov 27, 2018 at 10:47:06PM -0600, Abe Dillon wrote:
You should look at the state of the art in Design By Contract. In Eiffel, DBC is integrated into the language:

https://www.eiffel.com/values/design-by-contract/introduction/
https://www.eiffel.org/doc/eiffel/ET-_Design_by_Contract_%28tm%29%2C_Asserti...

Eiffel uses a rather Pythonic block structure to define invariants. The syntax is not identical to Python's (Eiffel eschews the colons) but it also comes close to executable pseudo-code. I trust this syntax requires little explanation:

require
    ... preconditions, tested on function entry
do
    ... body of the function
ensure
    ... postconditions, tested on function exit
end

There is a similar invariant block for classes.

Cobra is a language which intentionally modeled its syntax on Python. It too has contracts integrated with the language:

http://cobra-language.com/how-to/DeclareContracts/
http://cobra-language.com/trac/cobra/wiki/Contracts

-- Steve
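For comparison, roughly the same shape written with the icontract decorators discussed earlier in this thread (a sketch; the function and conditions are only illustrative):

import math
import icontract

@icontract.require(lambda x: x >= 0)                       # precondition, checked on entry
@icontract.ensure(lambda result, x: result * result <= x)  # postcondition, checked on exit
def integer_sqrt(x: int) -> int:
    return math.isqrt(x)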

[Steven D'Aprano]
Thank you! I forgot to mention this (or look into how other languages solve this problem). I saw your example syntax in the recent DBC main thread and liked it a lot.

One thought I keep coming back to is this comparison between doc string formats <https://bwanamarko.alwaysdata.net/napoleon/format_exception.html>. It seems obvious that the "Sphinxy" style is the noisiest, most verbose, and ugliest format. Instead of putting ":arg ...:" and ":type ...:" for each parameter and the return value, it makes much more sense to open up an Args: section and use a concise notation for type. The decorator-based pre- and postconditions seem to suffer from the same redundant, noisy, verbose problem as the Sphinxy docstring format, but make it worse by putting all that noise before the function declaration itself. It makes sense to me that a docstring might have a markdown-style syntax like:

def format_exception(etype, value):
    """
    Format the exception with a traceback.

    Args:
        etype (str): what etype represents
            [some constraint on etype](precondition)
            [another constraint on etype](in_line_precondition?)
        value (int): what value represents
            [some constraint on value](precondition)
        [some constraints across multiple params](precondition)

    Returns:
        What the return value represents  # usually very similar to the description at the top
        [some constraint on return](postcondition)
    """
    ...

That ties most bits of the documentation to some code that enforces the correctness of the documentation. And if it's a little noisy, we could take another page from markdown's book and offer alternate ways to reference precondition and postcondition logic. I'm worried that such a style would carry a lot of the same drawbacks as doctest <https://bemusement.org/doctests-arent-code>.

Also, my sense of coding style has been heavily influenced by this talk <https://vimeo.com/74316116>, particularly the part where he shoves a mangled Hamlet soliloquy into the margins, so now many of my functions adopt the following style:

def someDescriptiveName(
        arg1: SomeType,
        arg2: AnotherType[Thing],
        ...
        argN: SomeOtherType = default_value) -> ReturnType:
    """
    what the function does

    Args:
        arg1: what arg1 represents
        arg2: what arg2 represents
        ...
    """
    ...

This highlights a rather obvious duplication of code. We declare an arguments section in code and list all the arguments, then we do so again in the doc string. If you want your doc string to stay in sync with the code, this duplication is a problem. It makes more sense to tie the documentation for an argument to said argument:

def someDescriptiveName(            # what the function does
        arg1: SomeType,             # what arg1 represents
        arg2: AnotherType[Thing],   # what arg2 represents
        ...
        argN: SomeOtherType = default_value  # what argN represents
) -> ReturnType:                    # what the return value represents
    ...

I think it especially makes sense if you consider the preconditions, postconditions, and invariants as a sort of extension of typing, in the sense that typing narrows the set of acceptable values to a set of types and contracts restrict that set further. I hope that clarifies my thought process. I don't like the d-strings that I proposed. I'd prefer syntax closer to Eiffel, but the above is the line of thought I was following to arrive at d-strings.

On Tue, 27 Nov 2018 22:47:06 -0600 Abe Dillon <abedillon@gmail.com> wrote:
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values. Also, the idea that fuzz testing will automatically find edge cases in your code is idealistic. It depends on the algorithm you've implemented and the distribution of values chosen by the tester. Showcasing trivially wrong examples (such as an addition function that always returns 0, or a tail function that doesn't return the tail) isn't very helpful for a real-world analysis, IMHO. In the end, you have to be rigorous when writing tests, and for most non-trivial functions it requires that you devise the distribution of input values depending on the implemented algorithm, not leave that distribution to a third-party library that knows nothing about your program. Regards Antoine.

Indeed. But the great thing about the "hypothesis" tool is that it allows me to somewhat automate the generation of sets of input values based on my specific requirements derived from my knowledge of my program. It allows me to think about what a reasonable distribution of values for each argument in a function is, by either using existing strategies, using their arguments, combining and extending them, and then letting the tool do the grunt work of running the test for lots of different equivalence classes of argument values. I think that as long as the tool user keeps what you said in mind and uses the tool accordingly, it can be a great helper, and probably even force the average programmer to think more rigorously about the input values to be tested, not to mention the whole class of trivial mistakes and forgetfulness we are all bound to be subject to when writing test cases. Best, On Wed, Nov 28, 2018 at 12:18, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/
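A minimal sketch of the kind of strategy composition described above (the quantity and price bounds are made up for illustration):

from hypothesis import given, strategies as st

# reuse and constrain existing strategies to describe the inputs we actually expect
quantities = st.integers(min_value=1, max_value=10_000)
unit_prices = st.floats(min_value=0.01, max_value=1e6)

@given(quantity=quantities, unit_price=unit_prices)
def test_order_total_is_positive(quantity, unit_price):
    assert quantity * unit_price > 0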

[Antoine Pitrou]
I think utopia is the word here. Fuzz testing can be useful, but it's not a replacement for manual testing of carefully selected values.
First, they aren't mutually exclusive. It's trivial to add manually selected cases to a hypothesis test. Second, from my experience, people rarely choose between carefully selected optimal values and fuzz testing; they usually choose between manually selected trivial values or no test at all. Thirdly, computers are very good at exhaustively searching multidimensional spaces. If your tool sucks so badly at that that a human can do it better, then your tool needs work. Improving the tool saves way more time than reverting to manual testing. There was a post long ago (I think I read it on Digg.com, which gives some indication of how long ago) about how to run a cloud-based system correctly. One of the controversial practices the article advocated was disabling ssh on the machine instances. The rationale is that you never want to waste your time fiddling with an instance that's not behaving properly. In cloud systems, instances should not be special. If they fail, blow them away and bring up another. If the failure persists, it's a problem with the *system*, not the instance. If you care about individual instances, YOU'RE DOING IT WRONG. You need to re-design the system. On Wed, Nov 28, 2018 at 8:19 AM Antoine Pitrou <solipsis@pitrou.net> wrote:
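As a small sketch of the first point above, hand-picked cases can sit alongside the generated ones (the function under test is deliberately trivial):

from hypothesis import example, given, strategies as st

@given(st.integers())
@example(0)          # manually selected edge cases are always run...
@example(-2**31)     # ...in addition to whatever Hypothesis generates
def test_abs_is_non_negative(x):
    assert abs(x) >= 0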

On Wed, 28 Nov 2018 15:58:24 -0600 Abe Dillon <abedillon@gmail.com> wrote:
Thirdly, computers are very good at exhaustively searching multidimensional spaces.
How long do you think it will take your computer to exhaustively search the space of possible input values to a 2-integer addition function? Do you think it can finish before the Earth gets engulfed by the Sun? Regards Antoine.

[Antoine Pitrou]
Yes, ok. I used the word "exhaustively" wrong. Sorry about that. I don't think humans are made of a magical substance that can exhaustively search the space of possible pairs of integers before the heat death of the universe. I think humans use strategies based, hopefully, in logic to come up with test examples, and that it's often more valuable to capture said strategies in code than to make a human run the algorithms. In cases where domain knowledge helps inform the search strategy, there should be easy-to-use tools to build a domain-specific search strategy. On Wed, Nov 28, 2018 at 4:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:
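One sketch of capturing such domain knowledge in a reusable strategy (the "sorted, non-empty list" constraint is an arbitrary stand-in for real domain rules):

from hypothesis import given, strategies as st

@st.composite
def sorted_int_lists(draw):
    # encode the domain knowledge that our inputs are always sorted and non-empty
    xs = draw(st.lists(st.integers(), min_size=1))
    return sorted(xs)

@given(sorted_int_lists())
def test_first_element_is_minimum(xs):
    assert xs[0] == min(xs)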

That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only about 150 years. We have at least several hundred million years before the sun engulfs us. On Wed, Nov 28, 2018, 5:09 PM Antoine Pitrou <solipsis@pitrou.net> wrote:

On Thu, Nov 29, 2018 at 10:25 AM David Mertz <mertz@gnosis.cx> wrote:
That's easy, Antoine. On a reasonable modern multi-core workstation, I can do 4 billion additions per second. A year is just over 30 million seconds. For 32-bit ints, I can whiz through the task in only about 150 years. We have at least several hundred million years before the sun engulfs us.
Python ints are not 32-bit ints. Have fun. :) ChrisA

But Python integers are variable-sized, and their size is basically limited by available memory or address space. Let's take a typical 64-bit Python build, assuming 4 GB RAM available. Let's also assume that 90% of those 4 GB can be readily allocated for Python objects (there's overhead, etc.). Also let's take a look at the Python integer representation:
>>> sys.int_info
sys.int_info(bits_per_digit=30, sizeof_digit=4)
This means that every 4 bytes of an integer object store 30 bits of actual integer data. So, how many bits does the largest allocatable integer on that system have, assuming 90% of the 4 GB are available for allocation?
Now how many possible integers are there in that number of bits?
(yes, that number was successfully allocated in full. And the Python process occupies 3.7 GB RAM at that point, which validates the estimate.) Let's try to have a readable approximation of that number. Convert it to a float perhaps?
Well, of course, that overflows. So let's just extract a power of 10 instead:
(yes, math.log10() works on non-float-convertible integers. I'm impressed!) So the number of representable integers on that system is approximately 6.6e8727169408. Let's hope the Sun takes its time. (And of course, what is true for ints is true for any variable-sized input, such as strings, lists, dicts, sets, etc.) Regards Antoine. On 29/11/2018 at 00:24, David Mertz wrote:
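A rough reconstruction of the arithmetic above (a sketch; the exact session output was not preserved, and the 4 GB / 90% figures are taken from the message):

import math

available_bytes = int(0.9 * 4 * 2**30)            # ~90% of 4 GiB usable for the int object
bits_per_digit, sizeof_digit = 30, 4              # from sys.int_info on a 64-bit CPython
nbits = available_bytes // sizeof_digit * bits_per_digit
print(nbits)                                      # ~2.9e10 bits of integer payload
print(nbits * math.log10(2))                      # exponent ~8.7e9, consistent with 6.6e8727169408 above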

But nobody is talking about exhausting the combinatoric space of all possible values. Property-based testing looks like fuzz testing but it is not quite the same thing. Property-based testing is not about just generating random values till the heat death of the universe, but about generating sensible values in a configurable way to cover all the equivalence classes we can think of. If my function takes two floating point numbers as arguments, hypothesis "strategies" won't try all possible combinations of all possible floating point values, but instead all possible combinations of interesting values (NaN, Infinity, too big, too small, positive, negative, zero, None, decimal fractions, etc.), something that an experienced programmer would probably end up doing by himself with a lot of test cases, but that can be better done with less effort through the automation provided by the hypothesis package. It could well be that just by using such a tool, a naive programmer would end up convinced that he would be better served by sticking to decimal arithmetic :-) On Wed, Nov 28, 2018 at 21:43, Antoine Pitrou <antoine@python.org> wrote:
-- Marcos Eliziário Santos mobile/whatsapp/telegram: +55(21) 9-8027-0156 skype: marcos.eliziario@gmail.com linked-in : https://www.linkedin.com/in/eliziario/
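As a short sketch of the "interesting values" point above: st.floats() already covers NaN, infinities, signed zeros, and subnormals by default, so even a simple round-trip property gets exercised against them (the struct round-trip is just an illustrative property):

import math
import struct
from hypothesis import given, strategies as st

@given(st.floats())  # includes NaN, +/-inf, signed zeros, subnormals by default
def test_double_roundtrip(x):
    (y,) = struct.unpack("<d", struct.pack("<d", x))
    assert y == x or (math.isnan(x) and math.isnan(y))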

Hi, Property based testing is not about just generating random values till the
Exactly. A tool can go a step further and, based on the assertions and contracts, generate the tests automatically or prove that certain properties of the program always hold. I would encourage people interested in automatic testing to have a look at the scientific literature on the topic (formal static analysis). Abstract interpretation has already been mentioned: https://en.wikipedia.org/wiki/Abstract_interpretation. For some bleeding edge, have a look at what they do with machine learning at this lab: https://eth-sri.github.io/publications/

On Wed, 28 Nov 2018 23:22:20 -0200 Marcos Eliziario <marcos.eliziario@gmail.com> wrote:
Well, the OP did talk about "exhaustively searching the multidimensional space". But I agree mere sampling is useful. I might give hypothesis a try someday. Usually I prefer hand-rolling my own stress testing routines. Regards Antoine.

I was assuming it was a Numba-ized function since it's purely numeric. ;-) FWIW, the theoretical size of Python ints is limited by the fact that 'int.bit_length()' is a platform native int. So my system cannot store ints larger than (2**(2**63-1)). It'll take a lot more memory than my measly 4 GiB to store that number, though. So yes, that's way longer than the heat death of the universe, even before 128-bit machines are widespread. On Wed, Nov 28, 2018, 6:43 PM Antoine Pitrou <antoine@python.org> wrote:

OK. I know I made a mistake by saying, "computers are very good at *exhaustively* searching multidimensional spaces." I should have said, "computers are very good at enumerating examples from multi-dimensional spaces" or something to that effect. Now that we've had our fun, can you guys please continue in a forked thread so it doesn't derail this conversation? On Wed, Nov 28, 2018 at 7:47 PM David Mertz <mertz@gnosis.cx> wrote:

One thought I had pertains to a very narrow sub-set of cases, but may provide a starting point. For the cases where a precondition, invariant, or postcondition only involves a single parameter, attribute, or the return value (respectively) and it's reasonably simple, one could write it as an expression acting directly on the type annotation:

def encabulate(
        reactive_inductance: 1 >= float > 0,   # description
        capacitive_diractance: int > 1,        # description
        delta_winding: bool                    # description
) -> len(Set[DingleArm]) > 0:                  # ??? I don't know how you would handle more complex objects...
    do_stuff with_things ...

Anyway. Just more food for thought... On Tue, Nov 27, 2018 at 10:47 PM Abe Dillon <abedillon@gmail.com> wrote:

I wrote a lib specifically for the case of validators that also update the documentation. The default is: if the name of the validator function plus its args speaks for itself, then only that is added to the docstring. Ex: @require_odd_numbers() => it would add "require_odd_numbers" at the end of __doc__. There is also the possibility to add doc-string templates. https://github.com/jul/check_arg
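A generic sketch of the mechanism described (not check_arg's actual API, just the idea of a validator that appends its own name to the docstring):

import functools

def require_odd_numbers():
    # hypothetical validator: checks its condition and advertises itself in __doc__
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            assert all(n % 2 == 1 for n in args), "require_odd_numbers"
            return func(*args, **kwargs)
        wrapper.__doc__ = (func.__doc__ or "") + "\nrequire_odd_numbers"
        return wrapper
    return decorator

@require_odd_numbers()
def add_odds(a, b):
    """Add two odd numbers."""
    return a + b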
participants (10)
- Abe Dillon
- Antoine Pitrou
- Antoine Pitrou
- Chris Angelico
- David Mertz
- James Lu
- julien tayon
- Marcos Eliziario
- Marko Ristin-Kaufmann
- Steven D'Aprano