[Numpy-discussion] fast_any_all , a trivial but fast/useful helper function for numpy

Graeme B. Bell grb at skogoglandskap.no
Thu Sep 5 04:47:28 EDT 2013


Hi Robert, 

Thanks for proposing an alternative implementation approach. 
However, did you test your proposal before you made the assertion about its behaviour?


>reduce(np.logical_or, inputs, False)
>reduce(np.logical_and, inputs, True)

This code consistently benchmarks 20% slower than the method I use (tested on two different machines several times).
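A minimal sketch of how such a comparison could be timed on your own machine. The in-place variant (any_via_inplace) is only an assumed stand-in for "some alternative implementation"; it is not taken from the fast_any_all repository, and the array sizes and repeat counts are arbitrary.

```python
from functools import reduce
import timeit

import numpy as np


def any_via_reduce(inputs):
    # The quoted idiom: pairwise logical_or, starting from False.
    return reduce(np.logical_or, inputs, False)


def any_via_inplace(inputs):
    # Hypothetical alternative: copy the first array once, then
    # accumulate into that single buffer using the ufunc's out= argument.
    out = np.array(inputs[0], dtype=bool)
    for arr in inputs[1:]:
        np.logical_or(out, arr, out=out)
    return out


rng = np.random.default_rng(0)
inputs = [rng.random(100_000) > 0.5 for _ in range(8)]

# Check that both variants agree before comparing their speed.
assert np.array_equal(any_via_reduce(inputs), any_via_inplace(inputs))

for fn in (any_via_reduce, any_via_inplace):
    seconds = timeit.timeit(lambda: fn(inputs), number=20)
    print(f"{fn.__name__}: {seconds:.4f} s for 20 runs")
```

The timings will differ between machines and numpy versions, so numbers from a sketch like this are only meaningful relative to each other on the same setup.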


>Your fast_logic() is basically reduce().

No, it isn't.


Updated benchmarks for your proposal and also for another alternative implementation using boolean indexing at: 
https://github.com/gbb/numpy-fast-any-all/blob/master/BENCHMARK.md 


Three general points arising from this:

1 - idioms don't have test coverage

Generally, by using idioms rather than functions, you risk mistyping or misusing the form of the idiom and thus introducing a bug. You also lose out on explicit testing and implicit 'real world testing' that tends to build up around library functions.
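To illustrate the point, here is a small sketch of the idiom wrapped in named functions with a test beside them. The names any_of and all_of are hypothetical helpers made up for this example, not numpy functions and not the fast_any_all API.

```python
from functools import reduce

import numpy as np


def any_of(arrays):
    """Elementwise OR across a non-empty sequence of arrays (hypothetical helper)."""
    arrays = list(arrays)
    if not arrays:
        raise ValueError("any_of() needs at least one array")
    return reduce(np.logical_or, arrays)


def all_of(arrays):
    """Elementwise AND across a non-empty sequence of arrays (hypothetical helper)."""
    arrays = list(arrays)
    if not arrays:
        raise ValueError("all_of() needs at least one array")
    return reduce(np.logical_and, arrays)


def test_any_of_all_of():
    a = np.array([True, False, False])
    b = np.array([False, False, True])
    assert any_of([a, b]).tolist() == [True, False, True]
    assert all_of([a, b]).tolist() == [False, False, False]
    assert all_of([a, a]).tolist() == [True, False, False]

    # A typical slip when typing the idiom by hand: using False as the
    # initial value for logical_and, which silently yields all-False.
    mistyped = reduce(np.logical_and, [a, a], False)
    assert not np.any(mistyped)


test_any_of_all_of()
```

The last assertion shows the kind of mistake that a hand-typed idiom can carry unnoticed, while a shared function only has to get it right once.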


2 - idioms aren't maintained or updated (and they have an unknown shelf life)

An idiom might be fast today (or not), it may be correct today, but tomorrow is unknown. 

A key problem is that the relative performance of the parts of a library like numpy will keep changing - sometimes substantially - and idiomatic approaches to overcome performance difficulties in the short term tend to become outdated and even harmful very quickly. As in this example, they can even be harmful from the moment they're written. Browsing a site like Stack Overflow will show you both new and experienced users taking inefficient approaches because of outdated idiomatic advice. 


3 - idioms are OK, but functions are better, because implementation hiding and abstraction are good things. 

If you use a benchmarked/tested function which acknowledges a range of alternative implementations, you have a reasonable degree of confidence that you're getting the best performance and correct behaviour, because you can actually see the effects of the alternative implementations in benchmarks/test output. 
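As a rough sketch of what "a function which acknowledges a range of alternative implementations" might look like: one public entry point, several interchangeable back ends, and a check that they agree. The names elementwise_any, "ufunc" and "masked" are invented for this example, and the boolean-indexing variant is an assumption about what such an implementation could look like, not the code from the repository. It also assumes all inputs have the same shape.

```python
import numpy as np


def _any_ufunc(arrays):
    # Accumulate with the logical_or ufunc into one output buffer.
    out = np.array(arrays[0], dtype=bool)
    for arr in arrays[1:]:
        np.logical_or(out, arr, out=out)
    return out


def _any_masked(arrays):
    # Assumed boolean-indexing variant: set True wherever an input is True.
    out = np.array(arrays[0], dtype=bool)
    for arr in arrays[1:]:
        out[np.asarray(arr, dtype=bool)] = True
    return out


_ANY_IMPLEMENTATIONS = {"ufunc": _any_ufunc, "masked": _any_masked}


def elementwise_any(arrays, method="ufunc"):
    """Elementwise OR of same-shaped arrays; `method` picks the back end."""
    arrays = list(arrays)
    if not arrays:
        raise ValueError("elementwise_any() needs at least one array")
    if method not in _ANY_IMPLEMENTATIONS:
        raise ValueError(f"unknown method: {method!r}")
    return _ANY_IMPLEMENTATIONS[method](arrays)


a = np.array([True, False, False])
b = np.array([False, False, True])
c = np.array([False, False, False])

# Every back end must give the same answer; only the speed may differ.
results = [elementwise_any([a, b, c], method=m) for m in _ANY_IMPLEMENTATIONS]
assert all(np.array_equal(results[0], r) for r in results[1:])
```

Callers only ever write elementwise_any(...), so the default back end can be changed after a new benchmark without touching any calling code.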

It's a lot more sensible to use a function from a publicly available library - any library - than to manually maintain a set of idioms and have to continually search your software for the idioms, benchmark them to see if they're still beneficial, and modify them when they're not. 

Graeme




