numexpr efficiency depends on the size of the computing kernel
Hi,

Now that I have my old AMD Duron machine at hand again, I ran some benchmarks intending to prove that numexpr's performance is not influenced by the size of the CPU cache, but I failed miserably (Tim was right: numexpr efficiency does depend on the CPU cache size).

Given that the PyTables instance of the numexpr computing kernel is quite a bit larger than the original (it supports more datatypes), comparing the performance of the two versions is a good way to check the influence of the CPU cache on computing efficiency. The attached benchmark is a small modification of the timing.py that comes with the numexpr package (the modification was needed to allow the PyTables version of numexpr to run all the cases). Basically, the expressions operate on arrays of one million elements, with a mix of contiguous and strided arrays (no unaligned arrays are involved here). See the benchmark code for the details.

The speed-ups of numexpr over plain numpy on an AMD Duron machine (64 + 64 KB L1 cache, 64 KB L2 cache) are:

For the original numexpr package: 2.14, 2.21, 2.21 (averages over 3 complete runs)
For the modified PyTables version (enlarged computing kernel): 1.32, 1.34, 1.37

So, on a CPU with a very small cache, the original numexpr kernel is about 1.6x faster than the PyTables one. However, on an AMD Opteron, which has a much bigger L2 cache (64 + 64 KB L1 cache, 1 MB L2 cache), the speed-ups are quite similar:

For the original numexpr package: 3.10, 3.35, 3.35
For the modified PyTables version (enlarged computing kernel): 3.37, 3.50, 3.45

So, there is indeed a dependency on the CPU cache size. It would be nice to run the benchmark on other CPUs with L2 caches in the range between 64 KB and 1 MB, so as to find the point where the performance becomes similar (that would be a good estimate of the size of the computing kernel). Meanwhile, the lesson learned is that Tim's worries were justified: one should be very careful about adding more opcodes (at least as long as CPUs with very small L2 caches are in use). Given this, we will perhaps have to reduce the opcodes in the numexpr version for PyTables to a bare minimum :-/

Cheers,

--
Francesc Altet    |  Be careful about using the following code --
Carabos Coop. V.  |  I've only proven that it works,
www.carabos.com   |  I haven't tested it.  -- Donald Knuth
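(For readers without the attachment, here is a minimal sketch of the kind of comparison described above. The expression, array layouts and repetition count are assumptions for illustration only, not the actual timing.py code.)

```python
# Rough sketch of a numexpr-vs-numpy timing run (illustrative only; the real
# benchmark is the modified timing.py attached to this message).
import timeit
import numpy as np
import numexpr as ne

N = 1000 * 1000
a = np.random.rand(2 * N)[::2]   # strided view: every other element
b = np.random.rand(N)            # contiguous array
c = np.random.rand(N)            # contiguous array

expr = "2*a + 3*b - c"           # hypothetical expression, just for illustration

t_np = timeit.timeit(lambda: 2*a + 3*b - c, number=10)
t_ne = timeit.timeit(
    lambda: ne.evaluate(expr, local_dict={"a": a, "b": b, "c": c}),
    number=10,
)
print("numpy  : %.3f s" % t_np)
print("numexpr: %.3f s (speed-up: %.2fx)" % (t_ne, t_np / t_ne))
```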
Just a quick follow-up on this issue: after a bit of investigation, I found that the difference in performance between the original numexpr and its PyTables counterpart (see the message below) was due *only* to the different compilation flags used (and not to overloading the CPU instruction cache). It turns out that the original numexpr always adds the '-O2 -funroll-all-loops' flags (GCC compiler), while I had compiled the PyTables instance with the Python default (-O3). After recompiling the latter with the same flags as the original numexpr, I get exactly the same results from either version of numexpr, even on a processor with a secondary cache as small as 64 KB (AMD Duron). In other words, the '-funroll-all-loops' flag seems to be *very* effective at optimizing the numexpr computing kernel, at least on CPUs with small caches. So, at least, this leads to the conclusion that numexpr's virtual machine is still far from getting overloaded, especially on today's processors with 512 KB of secondary cache or more.

Cheers,

On Wednesday 14 March 2007 at 22:05, Francesc Altet wrote:
--
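(As a minimal sketch of how such flags might be pinned when building an extension, assuming a setuptools-style setup.py; the module and source names below are placeholders, not the actual PyTables or numexpr build files.)

```python
# Hypothetical setup.py fragment: build the computing kernel with the same
# flags stock numexpr uses ('-O2 -funroll-all-loops') instead of relying on
# Python's default CFLAGS. Module and source names are placeholders.
from setuptools import setup, Extension

interpreter = Extension(
    "mypkg.numexpr.interpreter",                       # placeholder module path
    sources=["mypkg/numexpr/interpreter.c"],           # placeholder source file
    extra_compile_args=["-O2", "-funroll-all-loops"],  # GCC-specific flags
)

setup(name="mypkg-numexpr-kernel", ext_modules=[interpreter])
```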