On Mon, Jun 27, 2022 at 9:16 PM DavidKorczynski <david@adalogics.com> wrote:
> Thanks for the detailed insights, Zac!

Thanks indeed, this was very helpful.

> Numpy maintainers, are you interested in trying out OSS-Fuzz? The only
> thing needed is some maintainer email for receiving issues and then I
> can get things moving.

Based on your descriptions and Zac's recommendation, I think it is worth trying this. If you need a personal email, please use mine (ralf.gommers@gmail.com). Or, if you can accept a private group list that several maintainers look at, please use numpy-team@googlegroups.com.

Thanks for doing this, David.

Cheers,
Ralf

On 10/06/2022 04:45, Zac Hatfield-Dodds wrote:
> As a maintainer of Hypothesis and sometime fuzzing researcher, hopefully sharing my perspective might help.
>
> Firstly, fuzzing and property-based testing are clearly related fields!  Personally I tend to divide them more by the UX than by the underlying tooling: PBT tends to be quick (seconds), is done by developers, looks like unit tests, and checks semantics.  Fuzzing tends to run much longer (hours to weeks), is done by security specialists, looks like custom binaries/scripts, and checks for crashes and memory errors.  https://hypothesis.works/articles/what-is-property-based-testing/ digs into this in some more detail, though I don't personally find the definitions very useful - mostly because everyone has their own, so they're not much use for communication!
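>
> For concreteness, a property-based test really does look like an ordinary unit test - here's a minimal, purely illustrative sketch (the round-trip property is just a stand-in, not something from NumPy's actual suite):
>
>     import numpy as np
>     from hypothesis import given, strategies as st
>
>     # Semantic property: converting a float64 array to a Python list and back
>     # should be lossless for any finite input Hypothesis generates.
>     @given(st.lists(st.floats(allow_nan=False, allow_infinity=False), min_size=1))
>     def test_tolist_roundtrip(xs):
>         arr = np.array(xs, dtype=np.float64)
>         assert np.array_equal(np.array(arr.tolist(), dtype=np.float64), arr)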
>
> I also really like these three essays from my now-colleague Nelson: https://blog.nelhage.com/post/property-testing-is-fuzzing/ https://blog.nelhage.com/post/property-testing-like-afl/ and https://blog.nelhage.com/post/two-kinds-of-testing/
>
> I think Matti's underlying question is really "what would Numpy get out of OSS-Fuzz, and is it worth it?".
>
> - OSS-Fuzz is designed around AFL-style coverage-guided fuzzing of compiled languages, with additional use of sanitizers to detect memory errors and undefined behaviour.  This makes it highly effective at catching certain C programming bugs, including security classics like buffer overflows, but a relatively poor choice for high-level semantic tests (where Hypothesis shines).
>
> - The most effective harnesses tend to have a minimum of logic between the bytes produced by the fuzzer and the code under test - for example, David's initial proposal just calls `np.loadtxt()` on a fuzzer-generated string (see the first sketch after this list).  While Atheris has a pretty nice Python interface, it's still designed around very simple types for simplicity and speed.  The coverage feedback for an evolutionary search also gives asymptotically better performance, which is often a really big deal in practice (in my experiments, usually overtaking heuristic-random after a few hundred or thousand seconds).
>
> - There's a pretty serious impedance mismatch between Atheris and the more complicated parsers inside Hypothesis strategies.  They're much slower than Atheris' native code, but also much more expressive and better at finding weird values like subnormals, signalling NaNs, etc.; equally important IMO is that they make it easy to express _all_ possible values instead of just the simple ones.  However, that comes at the cost of fewer cases-per-second and more rejection sampling; conversely, Hypothesis gives you free replay and shrinking of any failure discovered via Atheris simply by running the test normally (see the second sketch after this list).
>
> - I designed https://hypofuzz.com/ with an eye to this, aiming to make the UX as simple as possible; if you're interested, I can't provide server(s) to run it on, but of course it's free for community OSS projects.  There's also https://github.com/HypothesisWorks/hypothesis/issues/3086 to provide lower-overhead hooks for symbolic execution and Atheris, though it's slow going as I don't have enough free time to push that forward at the moment.
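>
> To make the `np.loadtxt()` point above concrete, here's a rough sketch of the kind of harness I mean - illustrative only, not David's actual submission, and assuming Atheris is installed:
>
>     import io
>     import sys
>
>     import atheris
>
>     # Instrument numpy at import time so Atheris gets coverage feedback from it.
>     with atheris.instrument_imports():
>         import numpy as np
>
>     def TestOneInput(data):
>         # Minimal glue: turn the raw fuzzer bytes into text and hand it straight to the parser.
>         fdp = atheris.FuzzedDataProvider(data)
>         text = fdp.ConsumeUnicodeNoSurrogates(len(data))
>         try:
>             np.loadtxt(io.StringIO(text))
>         except ValueError:
>             pass  # malformed input is expected; we only care about crashes and sanitizer reports
>
>     atheris.Setup(sys.argv, TestOneInput)
>     atheris.Fuzz()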
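>
> And for the Hypothesis/Atheris combination, a sketch of the glue via Hypothesis's external-fuzzer hook, `fuzz_one_input` - the sorting property here is just a placeholder:
>
>     import sys
>
>     import atheris
>     import numpy as np
>     from hypothesis import given, strategies as st
>
>     # An ordinary Hypothesis property; subnormals, infinities, and NaNs all come
>     # out of the strategy rather than from hand-written corner cases.
>     @given(st.lists(st.floats(width=64), min_size=1))
>     def check_sort_is_ordered(xs):
>         out = np.sort(np.array(xs, dtype=np.float64))
>         assert all(out[i] <= out[i + 1] or np.isnan(out[i + 1]) for i in range(len(out) - 1))
>
>     # Drive the same property from Atheris; any failure lands in the Hypothesis
>     # database, so running the test normally replays and shrinks it for free.
>     atheris.Setup(sys.argv, check_sort_is_ordered.hypothesis.fuzz_one_input)
>     atheris.Fuzz()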
>
> I haven't gotten OSS-Fuzz emails myself, but I know they've put a lot of work into making the reporting reasonably compact and actionable.
>
> So... if you want to find low-level problems with the C parts of Numpy, I'd suggest trying out OSS-Fuzz.  If you want to test the high-level semantics, I'd stick with Hypothesis; and if you want to fuzz property-based tests, I'd recommend HypoFuzz over Atheris unless the latter is much easier to set up (plausible, if OSS-Fuzz handles all the infra for you!).
>
> If Numpy maintainers - or anyone else - would like to discuss this in more detail, I'll also be at SciPy US in a few weeks and happy to talk it over or spend some sprint time then.