[Numpy-discussion] Openmp support (was numpy's future (1.1 and beyond): which direction(s) ?)

Charles R Harris charlesr.harris at gmail.com
Sun Mar 23 02:18:34 EDT 2008


On Sat, Mar 22, 2008 at 10:59 PM, David Cournapeau <
david at ar.media.kyoto-u.ac.jp> wrote:

> Charles R Harris wrote:
> >
> > It looks like memory access is the bottleneck, otherwise running 4
> > floats through in parallel should go a lot faster. I need to modify
> > the program a bit and see how it works for doubles.
>
> I am not sure the benchmark is really meaningful: it does not use
> aligned buffers (16-byte alignment), and because of that, it does not
> give a good idea of what can be expected from SSE. It does show why it
> is not so easy to get good performance, and why just throwing in a few
> optimized loops won't work, though. Using SSE/SSE2 on unaligned
> buffers is a waste of time. Without this alignment, you need to handle
> the misalignment yourself (using _mm_loadu_ps instead of _mm_load_ps),
> and that's extremely slow, basically killing most of the speed increase
> you can expect from using SSE.
>

Yep, but I expect the compilers to take care of alignment, say by inserting
a few single ops when needed. So I would just as soon leave vectorization to
the compilers and wait until they get that good. The only thing needed then
is contiguous data.
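
To make "inserting a few single ops" concrete, here is a rough sketch of the
loop peeling I have in mind, written by hand in plain C. The names peel_count
and scale are made up for illustration, and the code assumes the pointer is at
least float-aligned; it is not what any particular compiler actually emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: how many leading floats must be handled one at a
 * time before p reaches a 16-byte boundary (capped at n).
 * Assumes p is at least aligned to sizeof(float). */
static size_t peel_count(const float *p, size_t n)
{
    size_t misalign = (size_t)((uintptr_t)p % 16);
    size_t peel = misalign ? (16 - misalign) / sizeof(float) : 0;
    return peel < n ? peel : n;
}

/* Hypothetical example kernel: scale a contiguous float buffer in place. */
static void scale(float *x, size_t n, float a)
{
    size_t i = 0;
    size_t peel = peel_count(x, n);

    /* Prologue: single (scalar) ops until x + i is 16-byte aligned. */
    for (; i < peel; i++)
        x[i] *= a;

    /* Main body: blocks of 4 floats, each block starting on a 16-byte
     * boundary. This is the part a vectorizer could turn into aligned
     * SSE loads and stores. */
    for (; i + 4 <= n; i += 4) {
        x[i]     *= a;
        x[i + 1] *= a;
        x[i + 2] *= a;
        x[i + 3] *= a;
    }

    /* Epilogue: leftover elements, again one at a time. */
    for (; i < n; i++)
        x[i] *= a;
}
```

At most three scalar operations run before the main loop, so the cost of the
prologue is negligible for long arrays. This only works out so simply with one
contiguous buffer; with several operands at different misalignments, peeling
can align only one of them.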

Chuck