Hi,
I saw that Julian Taylor has recently been doing many low-level optimizations, such as using SSE instructions. I think that is great.
Last year, Mark Florisson released the minivect[1] project, which he worked on during his master's thesis. minivect is a compiler for element-wise expressions that does some of the same low-level optimizations that Julian is doing in NumPy right now.
Mark wrote minivect in a way that allows it to be reused by other projects. I think it is now used by Cython and Numba. I had planned to reuse it in Theano, but I haven't had the time to integrate it so far.
What about reusing it in NumPy? I think that some of Julian's optimizations aren't in minivect (I didn't check to confirm). But from what I heard, minivect doesn't implement reductions, and there is a pull request to optimize these in NumPy.
The advantage of concentrating some functionality in one common package is that more projects benefit from the optimizations done to it (after the initial work of adopting it!).
How could this be done in NumPy? NumPy has its own code generator for many dtypes. We could call minivect's code generator to replace some of it.
What do you think of this idea?
Sadly, I won't be able to spend time on the code for this, but I wanted to raise the idea while people are working on that, in case it is helpful.
Frédéric
On 17.06.2013 17:11, Frédéric Bastien wrote:
Hi, what I vectorized is just the really easy cases of unit-stride contiguous operations, so the min/max reductions which are now in numpy are in essence pretty trivial. minivect goes much further in optimizing general strided access and broadcasting via loop optimizations (it seems to have a lot of overlap with the graphite loop optimizer available in GCC [0]), so my code is probably not of very much use to minivect.
The most interesting part of minivect for numpy is probably the optimization of broadcasting loops, which seem to be pretty inefficient in numpy [1].
Concerning the rest, I'm not sure how much of a bottleneck general strided operations really are in typical code that uses numpy.
I guess a similar discussion about adding an expression compiler to numpy already happened when numexpr was released? If so, what was the outcome?
[0] http://gcc.gnu.org/wiki/Graphite
[1] ones((5000,100)) - ones((100,)) spends about 40% of its time copying stuff around in buffers
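A minimal sketch of how the broadcasting case in footnote [1] could be timed. The helper names `broadcast_subtract` and `time_call` are made up for this illustration, and the comparison against a same-shape operand is an assumption about what a useful baseline would be. It does not reproduce the 40% figure, which came from profiling and will vary with the NumPy version and the machine.

```python
import timeit

import numpy as np


def broadcast_subtract(rows=5000, cols=100):
    """Subtract a 1-D row from every row of a 2-D array via broadcasting."""
    a = np.ones((rows, cols))
    b = np.ones((cols,))
    return a - b  # b is broadcast along the first axis


def time_call(func, repeat=5, number=20):
    """Best-of-`repeat` wall time per call, in seconds."""
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == "__main__":
    a = np.ones((5000, 100))
    b_row = np.ones((100,))        # needs broadcasting
    b_full = np.ones((5000, 100))  # same shape, no broadcasting

    t_broadcast = time_call(lambda: a - b_row)
    t_same_shape = time_call(lambda: a - b_full)
    print("broadcast : %.3e s per call" % t_broadcast)
    print("same shape: %.3e s per call" % t_same_shape)
```

Comparing the two timings only hints at the overhead of the broadcasting path; a profiler is needed to see where the time actually goes.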
On 06/17/2013 11:03 PM, Julian Taylor wrote:
There are also related things like
arr + arr.T
which has much less than optimal performance in NumPy (unless there were recent changes). This example was one of the motivating examples for minivect.
Dag Sverre
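A small sketch of why `arr + arr.T` is an awkward case, assuming a C-ordered float64 array. The helper `describe` is hypothetical. The transpose is a view with swapped strides, so the two operands cannot both be walked in memory order by the same loop.

```python
import numpy as np


def describe(a):
    """Collect the layout facts that matter for the inner loop."""
    return {
        "strides": a.strides,
        "c_contiguous": a.flags["C_CONTIGUOUS"],
        "f_contiguous": a.flags["F_CONTIGUOUS"],
    }


arr = np.arange(16, dtype=np.float64).reshape(4, 4)

info_arr = describe(arr)   # row-major: last axis has the smallest stride
info_t = describe(arr.T)   # same buffer, strides swapped

# Both expressions give the same values; the second one pays for an
# explicit copy so that both operands are C-contiguous.
total = arr + arr.T
copied = arr + np.ascontiguousarray(arr.T)

print(info_arr)
print(info_t)
print(np.shares_memory(arr, arr.T))
```

Whether the explicit copy is faster depends on the array size and the NumPy version, so it should be measured rather than assumed.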
Hi,
On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
I don't recall a discussion when numexpr was released, as that was before I started reading this list. numexpr does an optimization that can't be done by NumPy: fusing element-wise operations into one call. So I don't see how it could be reused in NumPy.
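A rough sketch of what fusing buys, written in plain NumPy rather than with numexpr itself. The functions `unfused` and `blocked` and the block size are assumptions for illustration, and the inputs are assumed to be 1-D arrays of the same dtype and length. The blocked version only imitates fusion by keeping the intermediate result in a small scratch buffer instead of a full-size temporary.

```python
import numpy as np


def unfused(a, b, c):
    """Two separate ufunc calls; a * b materializes a full-size temporary."""
    return a * b + c


def blocked(a, b, c, block=4096):
    """Evaluate a * b + c block by block, reusing one small scratch buffer."""
    out = np.empty_like(a)
    scratch = np.empty(block, dtype=a.dtype)
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        tmp = scratch[: stop - start]
        # The intermediate product never exceeds `block` elements.
        np.multiply(a[start:stop], b[start:stop], out=tmp)
        np.add(tmp, c[start:stop], out=out[start:stop])
    return out


a = np.linspace(0.0, 1.0, 10001)
b = np.linspace(1.0, 2.0, 10001)
c = np.linspace(2.0, 3.0, 10001)
print(np.allclose(unfused(a, b, c), blocked(a, b, c)))
```

A real expression compiler would generate one loop for the whole expression; the Python-level loop here is only meant to show the memory-traffic idea.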
You call your optimizations trivial, but I don't. In NumPy's git log, the first commit is from 2001. This is the first time someone has done this in 12 years! Also, it gives a 1.5-8x speedup (from memory, from your PR description). This is not negligible. But how much time did you spend on them? Also, some of them are processor dependent; how many people on this list have already done this? I suppose not many.
Yes, your optimizations don't cover all the cases that minivect does. I see 2 levels of optimization: 1) the inner loop/contiguous cases, 2) the strided, broadcasted level. We don't need all the optimizations to be done for them to be useful. Any of them is useful.
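One way to picture the two levels is to classify the operands of an element-wise operation. This is only a sketch: `loop_level` is a hypothetical helper, and it is not how NumPy actually selects its inner loops.

```python
import numpy as np


def loop_level(*operands):
    """Return 1 if all operands share a shape and are C-contiguous, else 2."""
    arrays = [np.asarray(op) for op in operands]
    same_shape = all(a.shape == arrays[0].shape for a in arrays)
    contiguous = all(a.flags["C_CONTIGUOUS"] for a in arrays)
    return 1 if same_shape and contiguous else 2


x = np.ones((5000, 100))

level_plain = loop_level(x, x)                     # contiguous, same shape
level_broadcast = loop_level(x, np.ones((100,)))   # shapes differ, needs broadcasting
level_strided = loop_level(x[::2], x[::2])         # every other row, not contiguous

print(level_plain, level_broadcast, level_strided)
```

In this picture, level 1 is where SIMD work on the inner loop applies directly, while level 2 is where loop reordering of the kind minivect does would matter.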
So what I think is that we could reuse/share that work. NumPy has C code generators. They could call minivect's code generator for some of them when compiling NumPy. This would make optimizations done to those code generators reused by more people. For example, when new processors are launched, we would need to change only 1 place for many projects. Or, for example, if the call to the MKL vector library is done there, more people will benefit from it. Right now, only numexpr does it.
About the level 2 optimizations (strides, broadcast), I have never read the NumPy code that deals with that. Does someone who knows it have an idea whether it would be possible to reuse minivect for this?
Frédéric
Hi,
On Wed, Jun 19, 2013 at 1:43 AM, Frédéric Bastien nouiz@nouiz.org wrote:
Would someone be able to guide some of the numpy C experts into a room to do some thinking / writing on this at the scipy conference?
I completely agree that these kinds of optimizations and code sharing seem likely to be very important for the future.
I'm not at the conference, but if there's anything I can do to help, please someone let me know.
Cheers,
Matthew
On Wed, Jun 19, 2013 at 5:45 AM, Matthew Brett matthew.brett@gmail.com wrote:
Concerning the future development of numpy, I'd also suggest that we look at libdynd https://github.com/ContinuumIO/libdynd. It looks to me like it is reaching a level of maturity where it is worth trying to plan out a long term path to merger.
Chuck
I didn't know about this project. It is interesting.
If some of you discuss this at the SciPy conference, it would be appreciated if you wrote a summary of it here. I won't be there this year.
Thanks
Frédéric
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Wed, Jun 19, 2013 at 7:48 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
I'm in Austin for SciPy, and will be giving a talk on the dynd library on Thursday. Please drop by if you can make it; I'm very interested in cross-pollination of ideas between numpy, libdynd, blaze, and other array programming projects. The Python exposure of dynd as it is now can transform data to/from numpy via views very easily, where the data is compatible, and I expect libdynd and numpy to live alongside each other for quite some time. One possible way things could work is to think of libdynd as a more rapidly changing "playground" for functionality that would be nice to have in numpy, without the guarantees of C-level ABI or API backwards compatibility that numpy has, at least before libdynd 1.0.
Cheers, Mark
Hi,
I wasn't able to attend this year's SciPy conference. My tutorial proposal was rejected, and other deadlines interfered with the conference dates.
Will the presentation be recorded? If not, can you make the slides available?
What is your opinion on this question:
- Should other libraries like NumPy/Theano/Cython/Numba base their elementwise implementation (or part of it) on dynd or minivect? I know Cython and Numba do it, but that was before dynd, and I don't know where dynd fits in the big picture. Does dynd reuse minivect itself?
Thanks
Frédéric
On 06/25/2013 04:21 PM, Frédéric Bastien wrote:
Actually, I think the Cython branch with minivect support was in the end not merged, due to lack of interest/manpower to maintain support for vectorization in the long term (so it was better to not add the feature than have a badly supported feature).
My understanding is that Numba is based on minivect and not on dynd, so it's more of a competitor.
Perhaps Mark Florisson will be able to comment.
Dag Sverre
Hey Dag,
Indeed, numba uses it for its array expression support, but it will likely remove the minivect dependency and generate a simple loop nest for now. I'm working on pykit now (https://github.com/ContinuumIO/pykit), which, similarly to minivect, defines its own intermediate representation, with array expressions in the form of map/reduce/scan/etc. functions. The project has a broader scope than minivect, to be used by projects like numba, but with a "minivect baked in".
As such, minivect isn't really maintained any longer, and I wouldn't recommend anyone using the code at this point (just maybe some of the ideas :)).
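For readers unfamiliar with the map/reduce/scan vocabulary used above, here is a minimal sketch in plain Python. It is not pykit's IR or API; the names `xs`, `squares`, `total` and `running` are made up for illustration, and only the standard library is used.

```python
from functools import reduce
from itertools import accumulate
import operator

# Illustrative input; any sequence of numbers would do.
xs = [1.0, 2.0, 3.0]

# map: apply an element-wise function to every item.
squares = list(map(lambda v: v * v, xs))

# reduce: fold all items into a single value.
total = reduce(operator.add, squares, 0.0)

# scan: like reduce, but keep every intermediate result.
running = list(accumulate(squares, operator.add))

print(squares, total, running)
```

An expression such as a sum of squares is then a reduce applied to the result of a map, which is the kind of structure a compiler can fuse into a single loop.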
Hi,
thanks for the information. I had a quick look at the repo and didn't find information on how to use it the way I would like to. I would like to be able to take a small Theano graph (for example, just elemwise operations) and build a graph in pykit to have it generate the C code. Do you have some tests/docs that demonstrate something in that direction?
Ideally I would like to be able to implement something like this simple example:
(x ** 2).sum(1) or (x ** 2).sum()
Is pykit or Numba IR ready for that?
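To make the example concrete, here is a rough sketch of what such a compiler would be expected to produce for `(x ** 2).sum(1)`, written as plain Python loops over a NumPy array. The function names are hypothetical, and this is neither Theano, pykit nor Numba output; it only contrasts the two-pass NumPy expression with a single fused loop nest.

```python
import numpy as np

def sum_of_squares_unfused(x):
    # Two passes: x ** 2 materializes a temporary as large as x,
    # then sum(1) reads that temporary back.
    return (x ** 2).sum(1)

def sum_of_squares_fused(x):
    # One pass: square and accumulate in the same loop nest,
    # so no full-size temporary is needed.
    rows, cols = x.shape
    out = np.zeros(rows, dtype=x.dtype)
    for i in range(rows):
        acc = 0.0
        for j in range(cols):
            v = x[i, j]
            acc += v * v
        out[i] = acc
    return out

x = np.arange(6, dtype=np.float64).reshape(2, 3)
print(sum_of_squares_unfused(x))
print(sum_of_squares_fused(x))
```

The Python loop is of course slow; the point is only the shape of the loop nest that generated C code would have.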
thanks
Frédéric
Hey Fred,
It's in no way ready for public use; it doesn't really do anything yet :) Numba doesn't really optimize reductions yet, so I don't think it addresses any of your needs - but the input would be a Python function (compiled from generated source code, or an AST).
I don't know how much further pykit would go beyond simple fusion and perhaps tiling - I imagine it will defer to libraries like dynd to perform actual work. This is off-topic for numpy itself, but it may be useful to Theano in the future. I'll be sure to keep you in the loop and bounce ideas off you for feedback and collaboration.
Cheers,
Mark
On 19 June 2013 01:43, Frédéric Bastien nouiz@nouiz.org wrote:
Hi,
On Mon, Jun 17, 2013 at 5:03 PM, Julian Taylor jtaylor.debian@googlemail.com wrote:
Hi, what I vectorized is just the really easy cases of unit-stride contiguous operations, so the min/max reductions which are now in numpy are in essence pretty trivial. minivect goes much further in optimizing general strided access and broadcasting via loop optimizations (it seems to have a lot of overlap with the graphite loop optimizer available in GCC [0]), so my code is probably not of very much use to minivect.
The most interesting part of minivect for numpy is probably the optimization of broadcasting loops, which seem to be pretty inefficient in numpy [0].
Concerning the rest, I'm not sure how much of a bottleneck general strided operations really are in typical numpy-using code.
I guess a similar discussion about adding an expression compiler to numpy has already happened when numexpr was released? If yes what was the outcome of that?
I don't recall a discussion when numexpr was released, as that was before I started reading this list. numexpr does an optimization that can't be done by NumPy: fusing element-wise operations into one call. So I don't see how it could be reused in NumPy.
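As an illustration of what fusing element-wise operations buys, here is a small NumPy-only sketch. It is not numexpr's implementation; the function names and the block size are assumptions, and it only handles 1-D arrays of equal length and dtype. The plain expression allocates a full-size temporary for `a * b`, while the blocked version reuses one small buffer.

```python
import numpy as np

def unfused(a, b, c):
    # a * b allocates a temporary as large as the inputs,
    # then a second pass adds c.
    return a * b + c

def blocked(a, b, c, block=4096):
    # Evaluate the whole expression block by block, reusing one
    # small scratch buffer instead of a full-size temporary.
    out = np.empty_like(a)
    tmp = np.empty(block, dtype=a.dtype)
    for start in range(0, a.size, block):
        stop = min(start + block, a.size)
        t = tmp[:stop - start]
        np.multiply(a[start:stop], b[start:stop], out=t)
        np.add(t, c[start:stop], out=out[start:stop])
    return out

rng = np.random.default_rng(0)
a, b, c = (rng.random(10000) for _ in range(3))
print(np.allclose(unfused(a, b, c), blocked(a, b, c)))
```

This needs the full expression to be known at once, which is why it does not fit NumPy's one-operation-at-a-time evaluation.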
You call your optimization trivial, but I don't. In the git log of NumPy, the first commit is from 2001. This is the first time someone has done this in 12 years! Also, it gives a 1.5-8x speed-up (from memory, from your PR description). This is not negligible. But how much time did you spend on it? Also, some of the optimizations are processor dependent; how many people on this list have already done this? I suppose not many.
Yes, your optimizations don't cover all the cases that minivect does. I see 2 levels of optimization: 1) the inner loop/contiguous cases, 2) the strided, broadcasted level. We don't need all of the optimizations to be done for them to be useful. Any of them is useful.
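To show which arrays fall into each of the two levels, here is a short NumPy sketch that inspects strides. The variable names are only for illustration, and the stride values assume 8-byte float64 elements.

```python
import numpy as np

a = np.arange(12, dtype=np.float64).reshape(3, 4)

# Level 1: contiguous data, unit stride in the inner loop.
contiguous = a

# Level 2: strided access (every other column) ...
strided = a[:, ::2]

# ... and broadcasting, where the repeated axis has stride 0.
row = np.arange(4, dtype=np.float64)
broadcasted = np.broadcast_to(row, (3, 4))

print(contiguous.flags['C_CONTIGUOUS'], contiguous.strides)
print(strided.flags['C_CONTIGUOUS'], strided.strides)
print(broadcasted.flags['C_CONTIGUOUS'], broadcasted.strides)
```

The SSE work discussed in this thread targets the first case; the second and third are where a loop optimizer would have to reorder or specialize the loops.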
So what I think is that we could reuse/share that work. NumPy has a C code generator. It could call the minivect code generator for some of the code when compiling NumPy. This would make optimizations done to that code generator reused by more people. For example, when new processors are launched, we would need to change only 1 place for many projects. Or, for example, if the call to the MKL vector library is done there, more people will benefit from it. Right now, only numexpr does it.
About the level 2 optimization (strides, broadcast), I have never read the NumPy code that deals with that. Does someone who knows it have an idea whether it would be possible to reuse minivect for this?
I wouldn't attempt to; it's not really maintained any longer, though pykit will likely address what minivect did in the future (more in a following email). Many of minivect's optimizations will really only shine in a runtime context where it can perform fusion, and where you can hoist repeated computation out of inner loops. I like the code reuse, especially between dynd/blaze/theano.
Frédéric