On Mon, Sep 26, 2011 at 8:19 AM, Keith Hughitt <keith.hughitt@gmail.com> wrote:
Hi all,

Several colleagues and I have recently started work on a Python library for solar physics, in order to provide an alternative to the current mainstay for solar physics, which is written in IDL.

One of the first steps we have taken is to create a Python port of a popular IDL benchmark (time_test3), which measures performance for a variety of (primarily matrix) operations. In our initial attempt, however, Python performs significantly worse than IDL on several of the tests. I have attached a graph which shows the results for one machine: the x-axis is the number of the test being compared, and the y-axis is the time it took to complete the test, in milliseconds. While it is possible that this is simply due to limitations in Python/NumPy, I suspect that it is due at least in part to our lack of familiarity with NumPy and SciPy.

So my question is, does anyone see any places where we are doing things very inefficiently in Python?

To ensure a fair comparison between IDL and Python, there are some things (e.g. the style of timing and output) which we have deliberately chosen to do a certain way. In other cases, however, it is likely that we just didn't know a better method.

Any feedback or suggestions people have would be greatly appreciated. Unfortunately, due to the proprietary nature of IDL, we cannot share the original version of time_test3, but hopefully the comments in time_test3.py will be clear enough.


The first three tests are Python loops over Python lists, so I'm not much surprised at the results. Number 6 uses numpy.roll, which is not implemented in a particularly efficient way, so it could use some improvement. I haven't looked at the rest of the results, but I suspect they are similar. So in some cases I think the benchmark isn't particularly useful, but in a few others numpy could be improved.
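As a rough illustration of the kind of improvement possible for a circular shift, here is a minimal sketch that rolls a 1-D array with two slice copies into a preallocated output. The helper name roll_1d_by_slices is made up for this example, it only handles the 1-D case, and it is not a description of how numpy implements roll.

```python
import numpy as np

def roll_1d_by_slices(a, shift):
    """Circularly shift a 1-D array using two slice copies.

    Hypothetical helper for illustration only; assumes a 1-D input.
    """
    a = np.asarray(a)
    n = a.shape[0]
    if n == 0:
        return a.copy()
    k = shift % n              # normalise negative and oversized shifts
    out = np.empty_like(a)
    out[:k] = a[n - k:]        # tail of the input wraps to the front
    out[k:] = a[:n - k]        # the rest moves right by k positions
    return out

if __name__ == "__main__":
    x = np.arange(5)
    # Compare against numpy's own result rather than assuming it.
    print(np.array_equal(roll_1d_by_slices(x, 2), np.roll(x, 2)))
```

Whether this is actually faster than numpy's roll on a given machine and numpy version would need to be timed.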

It would be interesting to see which features are actually widely used in IDL code and weight them accordingly. In general, for loops are to be avoided, but if some numpy routine is a bottleneck we should fix it.
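To make the point about for loops concrete, here is a small sketch that computes the same quantity twice: once with a Python loop and once with a single numpy call. The function names are invented for this example and are not taken from time_test3.py.

```python
import numpy as np

def sum_of_squares_loop(values):
    """Pure-Python loop: one interpreter round trip per element."""
    total = 0.0
    for v in values:
        total += v * v
    return total

def sum_of_squares_numpy(values):
    """Same computation done in a single vectorized numpy call."""
    arr = np.asarray(values, dtype=float)
    return float(np.dot(arr, arr))

if __name__ == "__main__":
    data = list(range(1000))
    # Both versions should agree; only the running time differs.
    print(sum_of_squares_loop(data) == sum_of_squares_numpy(data))
```

Timing the two with the timeit module on a large input is an easy way to see how much of a benchmark's cost is loop overhead rather than numerical work.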

Chuck