- parallel code in general is not very composable. If someone calls a numpy operation from a single thread, then numpy transparently using multiple threads internally is a win. But if they're exploiting higher-level structure in their problem to break it into pieces and process each piece in parallel, with numpy running on each piece, then numpy spawning its own threads internally will probably destroy performance through oversubscription. And numpy is too low-level to know which case it's in. This problem already exists to some extent with multi-threaded BLAS, so people use various BLAS-specific knobs to manage it in ad hoc ways, but that approach doesn't scale.
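  The ad-hoc knobs in question are typically per-library environment variables. A minimal sketch of the second case (outer parallelism, inner library forced serial); the workload function here is a hypothetical stand-in for per-piece numpy work, and the variable names shown are the common ones for OpenMP-backed BLAS, OpenBLAS, and MKL:

  ```python
  import os

  # Cap the inner library's thread pools to 1 thread each. These must be
  # set *before* the library (e.g. numpy) is imported, which is exactly
  # why this mechanism is fragile and ad hoc.
  os.environ.setdefault("OMP_NUM_THREADS", "1")
  os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")
  os.environ.setdefault("MKL_NUM_THREADS", "1")

  from concurrent.futures import ThreadPoolExecutor

  def process_piece(piece):
      # Stand-in for the real per-piece numerical work.
      return sum(x * x for x in piece)

  # Outer, problem-level parallelism: split the work into pieces and
  # process them concurrently; each worker should stay single-threaded.
  pieces = [range(i * 1000, (i + 1) * 1000) for i in range(8)]
  with ThreadPoolExecutor(max_workers=4) as pool:
      results = list(pool.map(process_piece, pieces))
  ```

  Note the composability failure this illustrates: the outer code has to know which BLAS implementation the inner library happens to link against, and the cap is process-global rather than scoped to the parallel region.
  
  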