Re: [Numpy-discussion] low level optimization in NumPy and minivect

25 Jun 2013

Hi,

I wasn't able to attend this year's SciPy Conference. My tutorial proposal
was rejected and another deadline interfered with the conference dates.

Will the presentation be recorded? If not, can you make the slides available?

What is your opinion on this question:

- Should other libraries like NumPy/Theano/Cython/Numba base their
elementwise implementation (or part of it) on dynd or minivect? I know
Cython and Numba use minivect, but that was before dynd and I don't know
where dynd fits in the big picture. Does dynd reuse minivect itself?

thanks

Frédéric

On Mon, Jun 24, 2013 at 11:46 AM, Mark Wiebe  wrote:
...
On Wed, Jun 19, 2013 at 7:48 AM, Charles R Harris <
charlesr.harris@gmail.com> wrote:
...
On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett wrote:
...
Hi,
...
Hi,
On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor wrote:
...
On 17.06.2013 17:11, Frédéric Bastien wrote:
...
Hi,
I saw that Julian Taylor has recently been doing many low-level
optimizations, like using SSE instructions. I think it is great.
Last year, Mark Florisson released the minivect[1] project that he
worked on during his master's thesis. minivect is a compiler for
element-wise expressions that does some of the same low-level
optimizations that Julian is doing in NumPy right now.
Mark designed minivect in a way that allows it to be reused by other
projects. It is used now by Cython and Numba, I think. I had planned to
reuse it in Theano, but I haven't had the time to integrate it so far.
What about reusing it in NumPy? I think that some of Julian's
optimizations aren't in minivect (I didn't check to confirm). But from
what I heard, minivect doesn't implement reductions, and there is a pull
request to optimize this in NumPy.
Hi,
What I vectorized is just the really easy cases of unit-stride
contiguous operations, so the min/max reduction which is now in numpy
is in essence pretty trivial.
minivect goes much further in optimizing general strided access and
broadcasting via loop optimizations (it seems to have a lot of overlap
with the graphite loop optimizer available in GCC [0]) so my code is
probably not of very much use to minivect.
The most interesting part in minivect for numpy is probably the
optimization of broadcasting loops which seem to be pretty inefficient
in numpy [0].
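
To make the broadcasting case concrete, here is a minimal pure-Python
sketch. It is not NumPy's or minivect's actual code; the function name
broadcast_add and the stride tuples are hypothetical, chosen only for
illustration. It adds a row vector to every row of a flat row-major
matrix by giving the vector a stride of 0 along the row axis, and it
computes the per-row offsets once outside the inner loop, which is the
kind of loop-level work a broadcasting optimizer would do.

```python
def broadcast_add(matrix, rows, cols, row_vector):
    """Add a length-`cols` vector to every row of a flat row-major matrix."""
    # Strides are in elements. The vector's stride along the row axis is 0,
    # so the same `cols` values are re-read for each row (broadcasting).
    m_strides = (cols, 1)
    v_strides = (0, 1)
    out = [0] * (rows * cols)
    for i in range(rows):
        # Offsets that depend only on i are computed once per row,
        # leaving a simple unit-stride inner loop.
        m_off = i * m_strides[0]
        v_off = i * v_strides[0]
        for j in range(cols):
            out[m_off + j] = (matrix[m_off + j * m_strides[1]]
                              + row_vector[v_off + j * v_strides[1]])
    return out


result = broadcast_add([1, 2, 3, 4, 5, 6], 2, 3, [10, 20, 30])
```
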
Concerning the rest, I'm not sure how much of a bottleneck general
strided operations really are in common numpy-using code.
I guess a similar discussion about adding an expression compiler to
numpy already happened when numexpr was released?
If so, what was the outcome?
I don't recall a discussion when numexpr was released, as that was before
I read this list. numexpr does optimizations that can't be done by NumPy:
fusing element-wise operations into one call. So I don't see how it could
be reused in NumPy.
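
As a rough illustration of what fusing element-wise operations means,
here is a small pure-Python sketch. It does not use numexpr's API; the
functions unfused and fused are hypothetical names. The first version
mimics a chain of separate element-wise calls, each with its own loop
and a full temporary; the second evaluates the whole expression
a * b + c in a single pass.

```python
def unfused(a, b, c):
    # Each operation runs its own loop and allocates a full temporary,
    # which is how a chain of separate element-wise calls behaves.
    tmp = [x * y for x, y in zip(a, b)]       # temporary holding a * b
    return [t + z for t, z in zip(tmp, c)]    # second pass for tmp + c


def fused(a, b, c):
    # One loop evaluates the whole expression per element; no temporary.
    return [x * y + z for x, y, z in zip(a, b, c)]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
c = [0.5, 0.5, 0.5]
result_unfused = unfused(a, b, c)
result_fused = fused(a, b, c)
```

Both versions give the same values; the difference is the number of
passes over memory and the temporary that the fused form avoids.
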
On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien wrote:
You call your optimizations trivial, but I don't. In the git log of
NumPy, the first commit is from 2001. It is the first time someone has
done this in 12 years! Also, this gives a 1.5-8x speedup (from memory,
from your PR description). This is not negligible. But how much time did
you spend on them? Also, some of them are processor dependent. How many
people on this list have already done this? I suppose not many.
Yes, your optimizations don't cover all the cases that minivect does. I
see 2 levels of optimization: 1) the inner loop/contiguous cases, 2) the
strided, broadcasted level. We don't need all the optimizations to be
done for them to be useful. Any of them is useful.
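
The two levels can be sketched in pure Python as a dispatch between a
contiguous fast path and a generic strided path. This is only an
illustration under my own assumptions (the function add_strided and its
signature are hypothetical, not NumPy or minivect code), but it shows
why either level can be optimized independently of the other.

```python
def add_strided(a, a_stride, b, b_stride, n):
    """Add n elements of flat buffers a and b, read with the given strides."""
    if a_stride == 1 and b_stride == 1:
        # Level 1: contiguous inner loop, the case that is easy to vectorize.
        return [x + y for x, y in zip(a[:n], b[:n])]
    # Level 2: generic strided access, indexing each operand by its stride.
    return [a[i * a_stride] + b[i * b_stride] for i in range(n)]


contiguous = add_strided([1, 2, 3], 1, [10, 20, 30], 1, 3)
every_other = add_strided([1, 0, 2, 0, 3, 0], 2, [10, 20, 30], 1, 3)
```
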
So what I think is that we could reuse/share that work. NumPy has a C
code generator. It could call the minivect code generator for some of
that code when compiling NumPy. This would make optimizations done to
those code generators reused by more people. For example, when new
processors are launched, we would need to change only 1 place for many
projects. Or, for example, if the call to the MKL vector library is done
there, more people will benefit from it. Right now, only numexpr does it.
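
To show the general idea of a shared loop generator, here is a small
sketch that emits C source for contiguous binary loops from a single
template. It is neither NumPy's actual code generator nor minivect's
API; LOOP_TEMPLATE, OPS, TYPES and generate_loops are all hypothetical
names. The point is only that an optimization added to the one template
would reach every generated loop, and every project using the generator.

```python
from string import Template

# Hypothetical template for a contiguous binary element-wise loop in C.
LOOP_TEMPLATE = Template("""\
static void ${name}_${suffix}(const ${ctype} *a, const ${ctype} *b,
                              ${ctype} *out, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] ${op} b[i];
    }
}
""")

# Illustrative operation and type tables; a real generator covers many more.
OPS = {"add": "+", "multiply": "*"}
TYPES = {"float": "f32", "double": "f64"}


def generate_loops():
    """Return C source with one loop per (operation, type) pair."""
    chunks = []
    for name, op in OPS.items():
        for ctype, suffix in TYPES.items():
            chunks.append(LOOP_TEMPLATE.substitute(
                name=name, suffix=suffix, ctype=ctype, op=op))
    return "\n".join(chunks)


source = generate_loops()
```
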
About the level 2 optimizations (strides, broadcast), I have never read
the NumPy code that deals with that. Does someone who knows it have an
idea whether it would be possible to reuse minivect for this?
Would someone be able to guide some of the numpy C experts into a room
to do some thinking / writing on this at the scipy conference?
I completely agree that these kinds of optimizations and code sharing
seem likely to be very important for the future.
I'm not at the conference, but if there's anything I can do to help,
please someone let me know.
Concerning the future development of numpy, I'd also suggest that we look
at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like
it is reaching a level of maturity where it is worth trying to plan out a
long term path to merger.
I'm in Austin for SciPy, and will be giving a talk on the dynd library on
Thursday. Please drop by if you can make it. I'm very interested in
cross-pollination of ideas between numpy, libdynd, blaze, and other array
programming projects. The Python exposure of dynd as it is now can
transform data to/from numpy via views very easily, where the data is
compatible, and I expect libdynd and numpy to live alongside each other for
quite some time. One possible way things could work is to think of libdynd
as a more rapidly changing "playground" for functionality that would be
nice to have in numpy, without the guarantees of C-level ABI or API
backwards compatibility that numpy has, at least before libdynd 1.0.
Cheers,
Mark
...
Chuck
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion