Fuzzing integration of Numpy into OSS-Fuzz

Hi Numpy maintainers,

Would you be interested in integrating continuous fuzzing by way of OSS-Fuzz? Fuzzing is a way to automate test-case generation and has been heavily used for memory-unsafe languages. Recently, effort has been put into fuzzing memory-safe languages, and Python is one of the languages where it would be great to use fuzzing. In this PR I did an initial integration into OSS-Fuzz: https://github.com/google/oss-fuzz/pull/7681

Essentially, OSS-Fuzz is a free service run by Google that performs continuous fuzzing of important open source projects. If you would like to integrate, the only thing I need is a list of email(s) that will get access to the data produced by OSS-Fuzz, such as bug reports, coverage reports and more stats. Note that the emails affiliated with the project will be public in the OSS-Fuzz repo, as they will be part of a configuration file.

There are already some important Python projects on OSS-Fuzz, such as tensorflow-python (https://github.com/google/oss-fuzz/tree/master/projects/tensorflow-py), and it would be great to add Numpy to the list.

Let me know your thoughts on this and if you have any questions, as I'm happy to clarify or go into more detail about fuzzing.

Kind regards,
David

ADA Logics Ltd is registered in England. No: 11624074. Registered office: 266 Banbury Road, Post Box 292, OX2 7DL, Oxford, Oxfordshire, United Kingdom

On 7/6/22 14:02, david korczynski wrote:
Could you compare and contrast this to hypothesis [0], which we are already using in our testing? I don't understand what you mean by "Python is one of the languages where it would be great to use fuzzing". Why?

Matti

[0] https://hypothesis.readthedocs.io/en/latest/index.html

I'm not 100% sure about the important differences, so this is a bit of an intuitive analysis from my side (I know little about Hypothesis and more about fuzzing).

Hypothesis has support for traditional fuzzing: https://hypothesis.readthedocs.io/en/latest/details.html?highlight=fuzz#use-... and OSS-Fuzz supports Python fuzzing by way of Hypothesis: https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-l... although it will be seeded with the Atheris fuzzer, and based on this issue https://github.com/google/atheris/issues/20 it seems Atheris + Hypothesis might not be working particularly well together.

Based on the above and skimming through the Hypothesis docs, I think there are many similarities between Hypothesis and fuzzing (Atheris specifically), but the underlying engine that explores the input space is different. Fuzzing is coverage-guided (which I don't think Hypothesis is, but I could be wrong), meaning the target program is instrumented to identify whether a newly generated input explores new code. In essence, this makes fuzzing a mutational genetic algorithm. Another benefit is that OSS-Fuzz will build the target code with various sanitizers (ASan, UBSan, MSan), which will help highlight issues in the native code.

Regarding why it would be great to fuzz more Python code: this was more of a general statement, in that a lot of effort is being put into this from the OSS-Fuzz side because Python is a widely used language. For example, one effort in this domain is investigation into new bug oracles for Python (like sanitizers, but targeting memory-safe languages).

On 07/06/2022 15:10, Matti Picus wrote:
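David's description of coverage-guided fuzzing as "a mutational genetic algorithm" can be sketched in a few lines of plain Python. This is a toy, not Atheris or OSS-Fuzz: the `target` function stands in for instrumented code by returning the set of "edges" it executed, and all names are illustrative.

```python
# Toy coverage-guided mutational fuzzer: mutate saved inputs, and keep
# any mutant that reaches code ("edges") not seen before. Real engines
# (libFuzzer, AFL, Atheris) get edge coverage from instrumentation; here
# the target reports it directly so the sketch is self-contained.
import random

def target(data: bytes) -> set[str]:
    """Toy program under test; returns the set of edges it executed."""
    edges = {"entry"}
    if data and data[0] == ord("{"):
        edges.add("open-brace")
        if b"}" in data:
            edges.add("close-brace")
            if b":" in data:
                edges.add("colon")  # deepest path: looks JSON-ish
    return edges

def mutate(data: bytes, rng: random.Random) -> bytes:
    """Flip, insert, or delete one random byte."""
    buf = bytearray(data or b"\x00")
    choice = rng.randrange(3)
    pos = rng.randrange(len(buf))
    if choice == 0:
        buf[pos] = rng.randrange(256)
    elif choice == 1:
        buf.insert(pos, rng.randrange(256))
    elif len(buf) > 1:
        del buf[pos]
    return bytes(buf)

def fuzz(iterations: int = 100_000, seed: int = 0) -> set[str]:
    rng = random.Random(seed)
    corpus = [b""]            # saved interesting inputs
    covered: set[str] = set()
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        new_edges = target(candidate)
        if not new_edges <= covered:  # reached something new?
            covered |= new_edges
            corpus.append(candidate)  # keep it for further mutation
    return covered

print(sorted(fuzz()))
```

The key feedback loop is the `new_edges <= covered` check: inputs are only kept when they expand coverage, which is what lets the search incrementally reach the deeply nested branches that uniform random input generation would essentially never hit.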

I know the Hypothesis developers consider Hypothesis to be different from fuzzing. But I've never been exactly clear just what is meant by "fuzzing" in the context you are suggesting. When you say you want to "fuzz NumPy", what sorts of things would the fuzzer be doing? Would you need to tell it what various NumPy functions and operations are and how to generate inputs for them? Or does it do that automatically somehow? And how would you tell it what sorts of things to check for a given set of inputs?

For a Hypothesis test, you tell it explicitly what the input is, like "a is an array with some given properties (e.g., >1 dim, has a numerical dtype, has positive values, etc.)". Then you explicitly write a bunch of assertions that such arrays should satisfy (like some f(a).all()). It then generates examples from the given set of inputs in an attempt to falsify the given assertions. The whole process requires a considerable amount of human work, because you have to figure out a bunch of properties that various operations should satisfy on certain sets of inputs and write tests for them. I'm still unclear on just what "fuzzing" is, but my impression has always been that it's not this.

One difference I do know between Hypothesis and a fuzzer is that Hypothesis is more geared toward finding test failures and getting you to fix them. So, for example, Hypothesis only runs 100 examples by default each run; you have to manually increase that number to run more. Another difference is that if Hypothesis finds a failure, it will fixate on that failure and always return it, even to the detriment of finding other possible failures, until you either fix it or modify the strategies to ignore it. My understanding is that a fuzzer is more geared toward exploring a wide search space and finding as many issues as possible, even if there isn't the immediate possibility of them being fixed.
I've used Hypothesis on several projects that depend on NumPy and have incidentally found several bugs in NumPy with it (for example, https://github.com/numpy/numpy/issues/15753).

Aaron Meurer

On Wed, Jun 8, 2022 at 8:44 AM david korczynski <david@adalogics.com> wrote:
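The workflow Aaron describes (declare the shape of valid inputs, then assert properties over generated examples) looks roughly like the following sketch. The property tested here (sorting is idempotent) is purely illustrative and is not one of NumPy's actual test properties; it assumes `hypothesis` and its numpy extra are installed.

```python
# A property-based test in the style Aaron describes: the strategy
# defines the input space, the assertions define the property, and
# Hypothesis searches for a falsifying example.
import numpy as np
from hypothesis import given, settings, strategies as st
from hypothesis.extra.numpy import arrays

@given(arrays(dtype=np.int64, shape=st.integers(min_value=1, max_value=20)))
@settings(max_examples=100)   # Hypothesis's default budget per run
def test_sort_idempotent(a):
    once = np.sort(a)
    assert np.array_equal(np.sort(once), once)

test_sort_idempotent()  # calling the decorated test runs all examples
```

If the property fails, Hypothesis shrinks the counterexample to a minimal failing array and replays it first on subsequent runs, which is the "fixate on a failure" behaviour Aaron mentions.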

Coverage-guided fuzzing is fundamentally a technique that iteratively generates inputs that explore more code relative to the possible execution space of the targeted code. What the fuzzer gives you to play with is a byte array that you can massage in any way possible and pass into the code under analysis. The fuzz engine then observes whether the code under analysis executed in a way that was not seen before and, if so, saves the given byte array. Using this you can test for many things.

You describe using Hypothesis in terms of testing a given input and checking whether some postcondition is satisfied: you can do this with fuzzing by converting the byte array from the fuzzer into higher-level data structures, passing these data structures into the target code, and then using the same asserts to see whether all postconditions are satisfied.

In the context of Numpy, what we can test for is:

1) Memory-corruption issues in the native code (OSS-Fuzz will compile it with sanitizers).
2) Unexpected exceptions, i.e. call functions in Numpy with data that is seeded with fuzz input and ensure no exceptions are raised besides those documented.
3) Behavioural testing, similar to how you describe using Hypothesis.

In the OSS-Fuzz PR I added a fuzzer that tests option (2) above: https://github.com/google/oss-fuzz/pull/7681

You're right that the fuzzer will continue to explore the search space whenever it runs into an issue. OSS-Fuzz, however, comes with a large backend that manages the running of the fuzzers and does de-duplication, so a bug is only reported once even if the fuzzer hits it N times.

Kind regards,
David

On 08/06/2022 21:46, Aaron Meurer wrote:
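A harness in the spirit of option (2) might look like the sketch below. This is not the harness from the OSS-Fuzz PR; the `TestOneInput` name follows libFuzzer/Atheris convention, the Atheris wiring (`atheris.Setup` / `atheris.Fuzz`) is omitted so the body can be exercised directly, and the allowlist of expected exceptions is illustrative rather than an audit of what `np.loadtxt` actually documents.

```python
# Exception-oracle fuzz harness sketch: feed fuzzer-generated bytes to
# np.loadtxt and treat any exception outside an expected set as a bug.
import io
import numpy as np

# Illustrative allowlist of acceptable failure modes on malformed input.
EXPECTED = (ValueError, IndexError, OverflowError)

def TestOneInput(data: bytes) -> None:
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError:
        return  # not interesting: the fuzzer produced invalid UTF-8
    try:
        np.loadtxt(io.StringIO(text))
    except EXPECTED:
        pass  # documented/acceptable failure; anything else propagates

# A harness can be smoke-tested directly with hand-picked inputs:
TestOneInput(b"1 2 3\n4 5 6\n")   # valid table, parses cleanly
TestOneInput(b"not, numbers")     # malformed -> expected ValueError
```

Under OSS-Fuzz, the engine would call `TestOneInput` millions of times with mutated byte buffers; any uncaught exception (or sanitizer report from the C code underneath) becomes a filed, de-duplicated issue.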

As a maintainer of Hypothesis and sometime fuzzing researcher, hopefully sharing my perspective might help.

Firstly, fuzzing and property-based testing are clearly related fields! Personally I tend to divide them more by the UX than the underlying tool: PBT tends to be quick (seconds), done by developers, look like unit tests, and check semantics. Fuzzing tends to run for much longer (hours to weeks), be done by security specialists, look like custom binaries/scripts, and check for crashes and memory errors. https://hypothesis.works/articles/what-is-property-based-testing/ digs into this in some more detail, though I don't personally find the definitions very useful - mostly because everyone has their own, so they're not much use for communication!

I also really like these three essays from my now-colleague Nelson: https://blog.nelhage.com/post/property-testing-is-fuzzing/ https://blog.nelhage.com/post/property-testing-like-afl/ and https://blog.nelhage.com/post/two-kinds-of-testing/

I think Matti's underlying question is really "what would Numpy get out of OSS-Fuzz, and is it worth it?".

- OSS-Fuzz is designed around AFL-style coverage-guided fuzzing of compiled languages, with additional use of sanitizers to detect memory errors and undefined behaviour. This makes it highly effective at catching certain C programming bugs, including security classics like buffer overflows, but a relatively poor choice for high-level semantic tests (where Hypothesis shines).

- The most effective harnesses tend to have a minimum of logic between the bytes produced by the fuzzer and the internal logic - for example, David's initial proposal just calls `np.loadtxt()` on a fuzzer-generated string. While Atheris has a pretty nice Python interface, it's still designed around very simple types for simplicity and speed. The coverage feedback for an evolutionary search also gives asymptotically better performance, which is often a really big deal in practice (in my experiments, usually overtaking heuristic-random after a few hundred or thousand seconds).

- There's a pretty serious impedance mismatch between Atheris and the more complicated parsers inside Hypothesis strategies. They're much slower than Atheris' native code, but also much more expressive and better at finding weird edge cases like subnormals and signalling NaNs; equally important IMO is that they make it easy to express _all_ possible values instead of just the simple ones. However, that comes at the cost of fewer cases per second and more rejection sampling; conversely, Hypothesis gives you free replay and shrinking of any failure discovered via Atheris simply by running the test normally.

- I designed https://hypofuzz.com/ with an eye to this and to making the UX as simple as possible; if you're interested, I can't provide server(s) to run it on, but of course it's free for community OSS projects. There's also https://github.com/HypothesisWorks/hypothesis/issues/3086 to provide lower-overhead hooks for symbolic execution and Atheris, though it's slow going as I don't have enough free time to push that forward at the moment.

I haven't gotten OSS-Fuzz emails myself, but I know they've put a lot of work into making the reporting reasonably compact and actionable.

So... if you want to find low-level problems with the C parts of Numpy, I'd suggest trying out OSS-Fuzz. If you want to test the high-level semantics, I'd stick with Hypothesis; and if you want to fuzz property-based tests, I'd recommend HypoFuzz over Atheris unless the latter is much easier to set up (plausible, if OSS-Fuzz handles all the infra for you!).

If Numpy maintainers - or anyone else - would like to discuss this in more detail, I'll also be at SciPy US in a few weeks and am happy to talk it over or spend some sprint time then.
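The "external fuzzer driving a Hypothesis test" bridge Zac alludes to is exposed via Hypothesis's `fuzz_one_input` hook, which turns any `@given` test into a bytes-in harness. The sketch below is illustrative (the test and buffers are made up; the Atheris/libFuzzer wiring is omitted), but `fuzz_one_input` itself is the real Hypothesis interface for this.

```python
# Driving a Hypothesis test with fuzzer-style byte buffers via
# test.hypothesis.fuzz_one_input. In a real setup, the buffers would
# come from Atheris/libFuzzer instead of this hard-coded tuple.
from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sorting_is_idempotent(xs):
    assert sorted(sorted(xs)) == sorted(xs)

# Each call consumes one byte buffer; Hypothesis decodes it through the
# strategy into a list of ints and runs the test body on it.
for buffer in (b"", b"\x00" * 64, bytes(range(256))):
    test_sorting_is_idempotent.hypothesis.fuzz_one_input(buffer)
```

If a buffer provokes a failure, Hypothesis records it in its example database, so simply running `test_sorting_is_idempotent()` afterwards replays and shrinks it - the "free replay and shrinking" mentioned above.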

Thanks for the detailed insights Zac!

Numpy maintainers, are you interested in trying out OSS-Fuzz? The only thing needed is some maintainer email for receiving issues, and then I can get things moving.

On 10/06/2022 04:45, Zac Hatfield-Dodds wrote:

On Mon, Jun 27, 2022 at 9:16 PM DavidKorczynski <david@adalogics.com> wrote:
Thanks for the detailed insights Zac!
Thanks indeed, this was very helpful.
Numpy maintainers, are you interested in trying out OSS-Fuzz? The only thing needed is some maintainer email for receiving issues and then I can get things moving.
Based on your descriptions and Zac's recommendation, I think it is worth trying this. If you need a personal email, please use mine (ralf.gommers@gmail.com). Or, if you can accept a private group list that several maintainers look at, please use numpy-team@googlegroups.com.

Thanks for doing this David.

Cheers,
Ralf
participants (6)
- Aaron Meurer
- david korczynski
- DavidKorczynski
- Matti Picus
- Ralf Gommers
- Zac Hatfield-Dodds