
On Aug 26, 2022, at 7:52 AM, Jean-Paul Calderone <exarkun@twistedmatrix.com> wrote:
I think property testing is a valuable tool in the testing toolbox and Twisted should consider adopting Hypothesis to employ it for some tests.
Thoughts?
I am somewhere in the neighborhood of +0 on this.

In principle, I love Hypothesis, and I feel like it could reveal a lot of correctness issues, particularly in things like protocol parsing (hey, that's a thing we do!).

In practice, my one experience using it in depth is with Hyperlink and Klein, where the main effect of its deployment seems to have been uninteresting test-strategy bugs provoking spurious, flaky test failures downstream. Here's the Klein issue: https://github.com/twisted/klein/issues/561

One interesting detail of this particular bug is that Hypothesis spotted a real issue, but not in the right way: its public "generate some valid URL representations for testing" strategy instead accidentally produced invalid garbage that Hyperlink should arguably deal with somehow, but which the strategy should never emit as output, because it's garbage. It also flagged the issue in the wrong project, because we saw the failure in Klein and not in Hyperlink itself.

I didn't do this integration, so I may have missed its value. Perhaps on the way to this point it revealed and fixed several important parsing bugs in Hyperlink or data-handling issues in Klein, but I'm not aware of any.

So the devil's in the details here. I don't know what configuration is required to achieve this, but Hypothesis should be able to generate new bug reports without generating novel blocking failures on other (presumably unrelated) PRs. It seems like everyone agrees this is how it's supposed to be used, but in practice nobody bothers to figure out all the relevant configuration details, and it becomes a source of never-ending suite flakiness that interrupts other work.

I haven't fully read through all of these, but a quick search on GitHub reveals thousands and thousands of issues like these, where other projects' test suites take a reliability hit due to Hypothesis's stochastic nature:

https://github.com/anachronauts/jeff65/issues/41
https://github.com/Scille/parsec-cloud/issues/2864
https://github.com/Scille/parsec-cloud/issues/2867
https://github.com/astropy/astropy/issues/10724
https://github.com/pytorch/pytorch/issues/9832

And I don't even know how to properly set up Hypothesis so this doesn't happen (though see the P.S. below for my best guess).

So, in summary: I'd love to see Hypothesis help us discover and fix bugs. But if it's a choice between occasionally finding those bugs while constantly blocking important work with difficult-to-debug failures in unrelated parsing code that is now covered by Hypothesis, or "nothing", I'd go with "nothing".

-g
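
P.S.: Since I said "I don't know what configuration is required", here's my best guess at the shape of it, based on Hypothesis's documented settings profiles. This is an untested sketch, not something I've deployed; the profile names and the HYPOTHESIS_PROFILE environment variable are conventions I'm making up here, though register_profile, load_profile, derandomize, print_blob, and max_examples are all real Hypothesis APIs and settings:

    # conftest.py -- hypothetical sketch, untested
    import os
    from hypothesis import settings

    # Deterministic profile for PR CI: the same examples run every
    # time, so a green build stays green and failures reproduce.
    settings.register_profile(
        "ci",
        derandomize=True,
        print_blob=True,  # print a @reproduce_failure blob on failure
    )

    # Exploratory profile for a scheduled, non-blocking job that is
    # allowed to go hunting for new bugs and file them as reports.
    settings.register_profile("nightly", max_examples=1000)

    settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "ci"))

The idea being that blocking PR runs use the deterministic "ci" profile, and only the scheduled "nightly" run, whose failures become tickets rather than red PRs, does the stochastic exploring. Whether that's actually sufficient to avoid the flakiness above, I can't say.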