I know the Hypothesis developers consider Hypothesis to be different from fuzzing. But I've never been entirely clear on what is meant by "fuzzing" in the context you're suggesting. When you say you want to "fuzz NumPy", what sorts of things would the fuzzer be doing? Would you need to tell it what the various NumPy functions and operations are and how to generate inputs for them, or does it do that automatically somehow? And how would you tell it what sorts of things to check for a given set of inputs?
For a Hypothesis test, you tell it explicitly what the input is, like "a is an array with some given properties (e.g., more than one dimension, a numerical dtype, all positive values, etc.)". Then you explicitly write a bunch of assertions that such arrays should satisfy (like asserting f(a).all() for some property f). Hypothesis then generates examples from that input space in an attempt to falsify those assertions. The whole process requires a considerable amount of human work, because you have to figure out a bunch of properties that various operations should satisfy on certain sets of inputs and write tests for them. I'm still unclear on just what "fuzzing" is, but my impression has always been that it's not this.
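To make that concrete, here's roughly what such a test looks like. The particular strategy and property (a sqrt round-trip) are just my own illustration, not anything from NumPy's actual test suite:

```python
import numpy as np
from hypothesis import given, strategies as st
from hypothesis.extra.numpy import arrays, array_shapes

# "a is an array with some given properties": >1 dim, a numerical
# dtype, and all positive values.
positive_arrays = arrays(
    dtype=np.float64,
    shape=array_shapes(min_dims=2),
    elements=st.floats(min_value=0.1, max_value=1e6),
)

@given(a=positive_arrays)
def test_sqrt_roundtrip(a):
    # An explicit assertion such arrays should satisfy: squaring the
    # square root should (approximately) recover the original array.
    assert np.allclose(np.sqrt(a) ** 2, a)
```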
One difference I do know between Hypothesis and a fuzzer is that Hypothesis is more geared toward finding test failures and getting you to fix them. For example, Hypothesis only runs 100 examples per run by default; you have to manually increase that number to run more. Another difference is that if Hypothesis finds a failure, it will fixate on that failure and always reproduce it, even to the detriment of finding other possible failures, until you either fix it or modify the strategies to avoid it. My understanding is that a fuzzer is more geared toward exploring a wide search space and surfacing as many issues as possible, even if there's no immediate prospect of them being fixed.
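If I remember right, both of those behaviors are controlled through Hypothesis's settings; something like this (the test body is just a placeholder):

```python
from hypothesis import given, settings, strategies as st

# max_examples raises the per-run example count from the default of 100.
# database=None disables the local example database, so Hypothesis won't
# replay a previously found failure before exploring new inputs.
@settings(max_examples=10_000, database=None)
@given(st.integers())
def test_something(n):
    assert isinstance(n, int)
```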
Aaron Meurer