From jzwinck at gmail.com Wed Oct 1 10:08:54 2014 From: jzwinck at gmail.com (John Zwinck) Date: Wed, 1 Oct 2014 22:08:54 +0800 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: References: Message-ID: On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: > > On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: >> >> On more careful reading of your words, I think we agree; indeed, if keys() is present is should return an iterable; but I don't think it should be present for non-structured arrays. > > Indeed, I think we do agree. The attribute can simply be missing (e.g., accessing it raises AttributeError) for non-structured arrays. I'm generally fine with this, though I would like to know if there is precedent for methods being present on structured arrays only. Even if there is no precedent I am still OK with the idea, I just think we should understand if how novel this will be. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Wed Oct 1 10:13:17 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Wed, 1 Oct 2014 16:13:17 +0200 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: References: Message-ID: Well, the method will have to be present on all ndarrays, since structured arrays do not have a different type from regular arrays, only a different dtype. Thus the attribute has to be present regardless, but some Exception will have to be raised depending on the dtype, to make it quack like the kind of duck it is, so to speak. Indeed it seems like an atypical design pattern; but I don't see a problem with it. On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck wrote: > On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: > > > > On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> > >> On more careful reading of your words, I think we agree; indeed, if > keys() is present is should return an iterable; but I don't think it should > be present for non-structured arrays. > > > > Indeed, I think we do agree. The attribute can simply be missing (e.g., > accessing it raises AttributeError) for non-structured arrays. > > I'm generally fine with this, though I would like to know if there is > precedent for methods being present on structured arrays only. Even if > there is no precedent I am still OK with the idea, I just think we should > understand if how novel this will be. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Oct 1 10:41:25 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 1 Oct 2014 10:41:25 -0400 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: References: Message-ID: Actually, if I remember correctly, special methods show up in the ndarray object when the dtype is datetime64, right? On Wed, Oct 1, 2014 at 10:13 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Well, the method will have to be present on all ndarrays, since structured > arrays do not have a different type from regular arrays, only a different > dtype. 
Thus the attribute has to be present regardless, but some Exception > will have to be raised depending on the dtype, to make it quack like the > kind of duck it is, so to speak. Indeed it seems like an atypical design > pattern; but I don't see a problem with it. > > On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck wrote: > >> On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: >> > >> > On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn < >> hoogendoorn.eelco at gmail.com> wrote: >> >> >> >> On more careful reading of your words, I think we agree; indeed, if >> keys() is present is should return an iterable; but I don't think it should >> be present for non-structured arrays. >> > >> > Indeed, I think we do agree. The attribute can simply be missing (e.g., >> accessing it raises AttributeError) for non-structured arrays. >> >> I'm generally fine with this, though I would like to know if there is >> precedent for methods being present on structured arrays only. Even if >> there is no precedent I am still OK with the idea, I just think we should >> understand if how novel this will be. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Wed Oct 1 10:47:30 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Wed, 1 Oct 2014 16:47:30 +0200 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: References: Message-ID: Ah yes; you can use.. from types import MethodType ...to dynamically add methods to specific instances of a type. This may be cleaner or more pythonic than performing a check within the method, I dunno. On Wed, Oct 1, 2014 at 4:41 PM, Benjamin Root wrote: > Actually, if I remember correctly, special methods show up in the ndarray > object when the dtype is datetime64, right? > > On Wed, Oct 1, 2014 at 10:13 AM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Well, the method will have to be present on all ndarrays, since >> structured arrays do not have a different type from regular arrays, only a >> different dtype. Thus the attribute has to be present regardless, but some >> Exception will have to be raised depending on the dtype, to make it quack >> like the kind of duck it is, so to speak. Indeed it seems like an atypical >> design pattern; but I don't see a problem with it. >> >> On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck wrote: >> >>> On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: >>> > >>> > On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn < >>> hoogendoorn.eelco at gmail.com> wrote: >>> >> >>> >> On more careful reading of your words, I think we agree; indeed, if >>> keys() is present is should return an iterable; but I don't think it should >>> be present for non-structured arrays. >>> > >>> > Indeed, I think we do agree. The attribute can simply be missing >>> (e.g., accessing it raises AttributeError) for non-structured arrays. >>> >>> I'm generally fine with this, though I would like to know if there is >>> precedent for methods being present on structured arrays only. 
Even if >>> there is no precedent I am still OK with the idea, I just think we should >>> understand if how novel this will be. >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Oct 1 20:12:56 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Oct 2014 18:12:56 -0600 Subject: [Numpy-discussion] Interpolation using `np.interp()` with periodic x-coordinates In-Reply-To: References: Message-ID: On Wed, Sep 24, 2014 at 3:57 PM, Saullo Castro wrote: > From the closed pull request PR #5109: > > https://github.com/numpy/numpy/pull/5109 > > it came out that the a good implementation would be adding a parameter > `period`. I would like to know about the community's interest for this > implementation. > > The modification are shown here: > > https://github.com/saullocastro/numpy/compare/interp_with_period?expand=1 > > Please, let me know about your feedback. > I don't have any problem with allowing periodic interpolation. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebix at sebix.at Thu Oct 2 06:27:26 2014 From: sebix at sebix.at (Sebastian Wagner) Date: Thu, 02 Oct 2014 12:27:26 +0200 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: References: Message-ID: <0294ae9ec5a919857265bed55ebffda3@fizeau.net> So, for non-structured arrays, the consens is an Exception. The question is, which one. AttributeError would be fully backwards compatible. Existing code checks for the method and if it exists, the object has fields. ValueError would make more sense, as the value - the array - is in wrong format/structure/type. regards, On 2014-10-01 16:47, Eelco Hoogendoorn wrote: > Ah yes; you can use.. > > from types import MethodType > > ...to dynamically add methods to specific instances of a type. This > may be cleaner or more pythonic than performing a check within the > method, I dunno. > > On Wed, Oct 1, 2014 at 4:41 PM, Benjamin Root wrote: > >> Actually, if I remember correctly, special methods show up in the >> ndarray object when the dtype is datetime64, right? >> >> On Wed, Oct 1, 2014 at 10:13 AM, Eelco Hoogendoorn >> wrote: >> >> Well, the method will have to be present on all ndarrays, since >> structured arrays do not have a different type from regular arrays, >> only a different dtype. Thus the attribute has to be present >> regardless, but some Exception will have to be raised depending on >> the dtype, to make it quack like the kind of duck it is, so to >> speak. Indeed it seems like an atypical design pattern; but I don't >> see a problem with it. 
>> >> On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck >> wrote: >> >> On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: >>> >>> On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn >> wrote: >>>> >>>> On more careful reading of your words, I think we agree; indeed, >> if keys() is present is should return an iterable; but I don't think >> it should be present for non-structured arrays. >>> >>> Indeed, I think we do agree. The attribute can simply be missing >> (e.g., accessing it raises AttributeError) for non-structured >> arrays. >> >> I'm generally fine with this, though I would like to know if there >> is precedent for methods being present on structured arrays only. >> Even if there is no precedent I am still OK with the idea, I just >> think we should understand if how novel this will be. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] > > > > Links: > ------ > [1] http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jzwinck at gmail.com Thu Oct 2 07:09:47 2014 From: jzwinck at gmail.com (John Zwinck) Date: Thu, 2 Oct 2014 19:09:47 +0800 Subject: [Numpy-discussion] Proposal: add ndarray.keys() to return dtype.names In-Reply-To: <0294ae9ec5a919857265bed55ebffda3@fizeau.net> References: <0294ae9ec5a919857265bed55ebffda3@fizeau.net> Message-ID: On Thu, Oct 2, 2014 at 6:27 PM, Sebastian Wagner wrote: > So, for non-structured arrays, the consens is an Exception. The question > is, which one. > AttributeError would be fully backwards compatible. Existing code checks > for the method and if it exists, the object has fields. > ValueError would make more sense, as the value - the array - is in wrong > format/structure/type. If a non-structured array has its keys() raise AttributeError, I think that hasattr(arr, "keys") should return False, which implies that getattr(arr, "keys") should throw AttributeError. This would require that ndarray.__getattribute__ raise AttributeError, meaning "the attribute isn't here," not merely "the attribute doesn't have a value now." Otherwise people may rightly complain when they interrogate an array to see if it has keys(), find out that it does, but then get an error upon calling it which could have been known ahead of time. Now, to actually implement it this way would seem to require setting the "tp_getattro" function pointer (https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_getattro). Currently PyArray_Type has this field as null. If I understand everything correctly, adding a non-null function pointer here would mean some small runtime overhead to resolve every attribute on ndarray. I could easily be missing some detail which would allow us to do what I described above without a performance hit, but someone better versed would need to explain how/why. 
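To make the behaviour I have in mind concrete, here is a pure-Python sketch of the semantics, using a toy ndarray subclass and __getattr__. This is only meant to illustrate what hasattr() should report, not how it would actually be done at the C level, and the class name is made up:

import numpy as np

class KeyedArray(np.ndarray):
    # Toy subclass: expose keys() only when the dtype is structured.
    def __getattr__(self, name):
        # __getattr__ is consulted only when normal lookup fails, so
        # ordinary ndarray attributes are unaffected.
        if name == 'keys' and self.dtype.names is not None:
            return lambda: self.dtype.names
        # Everything else (including keys() on a plain array) behaves
        # as if the attribute does not exist at all.
        raise AttributeError(name)

plain = np.zeros(3).view(KeyedArray)
structured = np.zeros(3, dtype=[('x', float), ('y', int)]).view(KeyedArray)

print(hasattr(plain, 'keys'))       # False
print(hasattr(structured, 'keys'))  # True
print(structured.keys())            # ('x', 'y')

With semantics like these, code can simply test hasattr(arr, 'keys') and never be handed a method that is doomed to fail.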
If I'm right about all that, and if the consensus is that keys() should raise an exception when dtype.names is None, then perhaps raising ValueError is the only viable option. I'd appreciate opinions from those experienced in the details of the C API. From bburan at alum.mit.edu Thu Oct 2 11:42:46 2014 From: bburan at alum.mit.edu (Brad Buran) Date: Thu, 2 Oct 2014 08:42:46 -0700 Subject: [Numpy-discussion] skip samples in random number generator Message-ID: Given the following: from numpy import random rs = random.RandomState(seed=1) # skip the first X billion samples x = rs.uniform(0, 10) How do I accomplish "skip the first X billion samples" (e.g. 7.2 billion)? I see that there's a numpy.random.RandomState.set_state which accepts (among other parameters) a value called "pos". This sounds promising, but the other parameters I'm not sure how to compute (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able to skip ahead in the sequence to reproduce some signals that were generated for experiments. I could certainly consume and discard the first X billion samples; however, that seems to be computationally inefficient. Brad From robert.kern at gmail.com Thu Oct 2 11:52:11 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 2 Oct 2014 16:52:11 +0100 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran wrote: > Given the following: > > from numpy import random > rs = random.RandomState(seed=1) > # skip the first X billion samples > x = rs.uniform(0, 10) > > How do I accomplish "skip the first X billion samples" (e.g. 7.2 > billion)? I see that there's a numpy.random.RandomState.set_state > which accepts (among other parameters) a value called "pos". This > sounds promising, but the other parameters I'm not sure how to compute > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able > to skip ahead in the sequence to reproduce some signals that were > generated for experiments. I could certainly consume and discard the > first X billion samples; however, that seems to be computationally > inefficient. Unfortunately, it requires some significant number-theoretical precomputation for any given N number of steps that you want to skip. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html It's also unreliable for your purpose. You don't know how many random integers were actually pulled from the core PRNG if you ever used any of the nonuniform distributions (they usually will consume a variable number of uniform pseudorandom numbers to give you a single nonuniform number). Instead, you should just pickle the RandomState object just before you start using it for anything that you want to reproduce. The unpickled RandomState will reproduce the same numbers the first one did. -- Robert Kern From njs at pobox.com Thu Oct 2 12:28:47 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 2 Oct 2014 17:28:47 +0100 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: References: Message-ID: On 2 Oct 2014 16:52, "Robert Kern" wrote: > > On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran wrote: > > Given the following: > > > > from numpy import random > > rs = random.RandomState(seed=1) > > # skip the first X billion samples > > x = rs.uniform(0, 10) > > > > How do I accomplish "skip the first X billion samples" (e.g. 7.2 > > billion)? 
I see that there's a numpy.random.RandomState.set_state > > which accepts (among other parameters) a value called "pos". This > > sounds promising, but the other parameters I'm not sure how to compute > > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able > > to skip ahead in the sequence to reproduce some signals that were > > generated for experiments. I could certainly consume and discard the > > first X billion samples; however, that seems to be computationally > > inefficient. > > Unfortunately, it requires some significant number-theoretical > precomputation for any given N number of steps that you want to skip. > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html If someone really wanted this functionality then I suppose it would be possible to precompute the special jump coefficients for lengths 2, 4, 8, 16, 32, ..., and then perform arbitrary jumps using a sequence of smaller jumps. (The coefficient table could be shipped with the source code.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From saullogiovani at gmail.com Thu Oct 2 13:07:41 2014 From: saullogiovani at gmail.com (Saullo Castro) Date: Thu, 2 Oct 2014 19:07:41 +0200 Subject: [Numpy-discussion] Interpolation using `np.interp()` with periodic x-coordinates (Saullo Castro) Message-ID: Jaime has helped me and PR#5117 is getting ready to be merged ... 2014-10-02 19:00 GMT+02:00 : > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Interpolation using `np.interp()` with periodic > x-coordinates (Charles R Harris) > 2. Re: Proposal: add ndarray.keys() to return dtype.names > (Sebastian Wagner) > 3. Re: Proposal: add ndarray.keys() to return dtype.names > (John Zwinck) > 4. skip samples in random number generator (Brad Buran) > 5. Re: skip samples in random number generator (Robert Kern) > 6. Re: skip samples in random number generator (Nathaniel Smith) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Wed, 1 Oct 2014 18:12:56 -0600 > From: Charles R Harris > Subject: Re: [Numpy-discussion] Interpolation using `np.interp()` with > periodic x-coordinates > To: Discussion of Numerical Python > Message-ID: > o-DgkxWQfvWOB6A at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On Wed, Sep 24, 2014 at 3:57 PM, Saullo Castro > wrote: > > > From the closed pull request PR #5109: > > > > https://github.com/numpy/numpy/pull/5109 > > > > it came out that the a good implementation would be adding a parameter > > `period`. I would like to know about the community's interest for this > > implementation. > > > > The modification are shown here: > > > > > https://github.com/saullocastro/numpy/compare/interp_with_period?expand=1 > > > > Please, let me know about your feedback. > > > > I don't have any problem with allowing periodic interpolation. > > Chuck > -------------- next part -------------- > An HTML attachment was scrubbed... 
> URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20141001/519ac77f/attachment-0001.html > > ------------------------------ > > Message: 2 > Date: Thu, 02 Oct 2014 12:27:26 +0200 > From: Sebastian Wagner > Subject: Re: [Numpy-discussion] Proposal: add ndarray.keys() to return > dtype.names > To: Discussion of Numerical Python > Message-ID: <0294ae9ec5a919857265bed55ebffda3 at fizeau.net> > Content-Type: text/plain; charset=US-ASCII; format=flowed > > So, for non-structured arrays, the consens is an Exception. The question > is, which one. > AttributeError would be fully backwards compatible. Existing code checks > for the method and if it exists, the object has fields. > ValueError would make more sense, as the value - the array - is in wrong > format/structure/type. > > regards, > > On 2014-10-01 16:47, Eelco Hoogendoorn wrote: > > Ah yes; you can use.. > > > > from types import MethodType > > > > ...to dynamically add methods to specific instances of a type. This > > may be cleaner or more pythonic than performing a check within the > > method, I dunno. > > > > On Wed, Oct 1, 2014 at 4:41 PM, Benjamin Root wrote: > > > >> Actually, if I remember correctly, special methods show up in the > >> ndarray object when the dtype is datetime64, right? > >> > >> On Wed, Oct 1, 2014 at 10:13 AM, Eelco Hoogendoorn > >> wrote: > >> > >> Well, the method will have to be present on all ndarrays, since > >> structured arrays do not have a different type from regular arrays, > >> only a different dtype. Thus the attribute has to be present > >> regardless, but some Exception will have to be raised depending on > >> the dtype, to make it quack like the kind of duck it is, so to > >> speak. Indeed it seems like an atypical design pattern; but I don't > >> see a problem with it. > >> > >> On Wed, Oct 1, 2014 at 4:08 PM, John Zwinck > >> wrote: > >> > >> On 1 Oct 2014 04:30, "Stephan Hoyer" wrote: > >>> > >>> On Tue, Sep 30, 2014 at 1:22 PM, Eelco Hoogendoorn > >> wrote: > >>>> > >>>> On more careful reading of your words, I think we agree; indeed, > >> if keys() is present is should return an iterable; but I don't think > >> it should be present for non-structured arrays. > >>> > >>> Indeed, I think we do agree. The attribute can simply be missing > >> (e.g., accessing it raises AttributeError) for non-structured > >> arrays. > >> > >> I'm generally fine with this, though I would like to know if there > >> is precedent for methods being present on structured arrays only. > >> Even if there is no precedent I am still OK with the idea, I just > >> think we should understand if how novel this will be. 
> >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion [1] > > > > > > > > Links: > > ------ > > [1] http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------ > > Message: 3 > Date: Thu, 2 Oct 2014 19:09:47 +0800 > From: John Zwinck > Subject: Re: [Numpy-discussion] Proposal: add ndarray.keys() to return > dtype.names > To: Discussion of Numerical Python > Message-ID: > Sog at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Thu, Oct 2, 2014 at 6:27 PM, Sebastian Wagner wrote: > > So, for non-structured arrays, the consens is an Exception. The question > > is, which one. > > AttributeError would be fully backwards compatible. Existing code checks > > for the method and if it exists, the object has fields. > > ValueError would make more sense, as the value - the array - is in wrong > > format/structure/type. > > If a non-structured array has its keys() raise AttributeError, I think > that hasattr(arr, "keys") should return False, which implies that > getattr(arr, "keys") should throw AttributeError. This would require > that ndarray.__getattribute__ raise AttributeError, meaning "the > attribute isn't here," not merely "the attribute doesn't have a value > now." Otherwise people may rightly complain when they interrogate an > array to see if it has keys(), find out that it does, but then get an > error upon calling it which could have been known ahead of time. > > Now, to actually implement it this way would seem to require setting > the "tp_getattro" function pointer > (https://docs.python.org/3/c-api/typeobj.html#c.PyTypeObject.tp_getattro). > Currently PyArray_Type has this field as null. If I understand > everything correctly, adding a non-null function pointer here would > mean some small runtime overhead to resolve every attribute on > ndarray. I could easily be missing some detail which would allow us > to do what I described above without a performance hit, but someone > better versed would need to explain how/why. > > If I'm right about all that, and if the consensus is that keys() > should raise an exception when dtype.names is None, then perhaps > raising ValueError is the only viable option. > > I'd appreciate opinions from those experienced in the details of the C API. > > > ------------------------------ > > Message: 4 > Date: Thu, 2 Oct 2014 08:42:46 -0700 > From: Brad Buran > Subject: [Numpy-discussion] skip samples in random number generator > To: Discussion of Numerical Python > Message-ID: > < > CAHb_y2LaLg3A4Jch9ErKMcDMHMzhKcVS0W20CeNKxBxBHL1o8A at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > Given the following: > > from numpy import random > rs = random.RandomState(seed=1) > # skip the first X billion samples > x = rs.uniform(0, 10) > > How do I accomplish "skip the first X billion samples" (e.g. 7.2 > billion)? 
I see that there's a numpy.random.RandomState.set_state > which accepts (among other parameters) a value called "pos". This > sounds promising, but the other parameters I'm not sure how to compute > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able > to skip ahead in the sequence to reproduce some signals that were > generated for experiments. I could certainly consume and discard the > first X billion samples; however, that seems to be computationally > inefficient. > > Brad > > > ------------------------------ > > Message: 5 > Date: Thu, 2 Oct 2014 16:52:11 +0100 > From: Robert Kern > Subject: Re: [Numpy-discussion] skip samples in random number > generator > To: Discussion of Numerical Python > Message-ID: > < > CAF6FJivcx3q+TWYoLXbdzBCgc2uk8isV83ZU-+28LDyndh5SSg at mail.gmail.com> > Content-Type: text/plain; charset=UTF-8 > > On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran wrote: > > Given the following: > > > > from numpy import random > > rs = random.RandomState(seed=1) > > # skip the first X billion samples > > x = rs.uniform(0, 10) > > > > How do I accomplish "skip the first X billion samples" (e.g. 7.2 > > billion)? I see that there's a numpy.random.RandomState.set_state > > which accepts (among other parameters) a value called "pos". This > > sounds promising, but the other parameters I'm not sure how to compute > > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able > > to skip ahead in the sequence to reproduce some signals that were > > generated for experiments. I could certainly consume and discard the > > first X billion samples; however, that seems to be computationally > > inefficient. > > Unfortunately, it requires some significant number-theoretical > precomputation for any given N number of steps that you want to skip. > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html > > It's also unreliable for your purpose. You don't know how many random > integers were actually pulled from the core PRNG if you ever used any > of the nonuniform distributions (they usually will consume a variable > number of uniform pseudorandom numbers to give you a single nonuniform > number). > > Instead, you should just pickle the RandomState object just before you > start using it for anything that you want to reproduce. The unpickled > RandomState will reproduce the same numbers the first one did. > > -- > Robert Kern > > > ------------------------------ > > Message: 6 > Date: Thu, 2 Oct 2014 17:28:47 +0100 > From: Nathaniel Smith > Subject: Re: [Numpy-discussion] skip samples in random number > generator > To: Discussion of Numerical Python > Message-ID: > < > CAPJVwBmZBQU-nYO1SE6SCvN+6eOTW9mrzruwDKkUF188bGfT_A at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > On 2 Oct 2014 16:52, "Robert Kern" wrote: > > > > On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran wrote: > > > Given the following: > > > > > > from numpy import random > > > rs = random.RandomState(seed=1) > > > # skip the first X billion samples > > > x = rs.uniform(0, 10) > > > > > > How do I accomplish "skip the first X billion samples" (e.g. 7.2 > > > billion)? I see that there's a numpy.random.RandomState.set_state > > > which accepts (among other parameters) a value called "pos". This > > > sounds promising, but the other parameters I'm not sure how to compute > > > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able > > > to skip ahead in the sequence to reproduce some signals that were > > > generated for experiments. 
I could certainly consume and discard the > > > first X billion samples; however, that seems to be computationally > > > inefficient. > > > > Unfortunately, it requires some significant number-theoretical > > precomputation for any given N number of steps that you want to skip. > > > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html > > If someone really wanted this functionality then I suppose it would be > possible to precompute the special jump coefficients for lengths 2, 4, 8, > 16, 32, ..., and then perform arbitrary jumps using a sequence of smaller > jumps. (The coefficient table could be shipped with the source code.) > > -n > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: > http://mail.scipy.org/pipermail/numpy-discussion/attachments/20141002/d58de1c1/attachment-0001.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 97, Issue 3 > *********************************************** > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu Oct 2 13:14:47 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 2 Oct 2014 18:14:47 +0100 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 5:28 PM, Nathaniel Smith wrote: > On 2 Oct 2014 16:52, "Robert Kern" wrote: >> >> On Thu, Oct 2, 2014 at 4:42 PM, Brad Buran wrote: >> > Given the following: >> > >> > from numpy import random >> > rs = random.RandomState(seed=1) >> > # skip the first X billion samples >> > x = rs.uniform(0, 10) >> > >> > How do I accomplish "skip the first X billion samples" (e.g. 7.2 >> > billion)? I see that there's a numpy.random.RandomState.set_state >> > which accepts (among other parameters) a value called "pos". This >> > sounds promising, but the other parameters I'm not sure how to compute >> > (e.g. the 1D array of 624 unsigned integers, etc.). I need to be able >> > to skip ahead in the sequence to reproduce some signals that were >> > generated for experiments. I could certainly consume and discard the >> > first X billion samples; however, that seems to be computationally >> > inefficient. >> >> Unfortunately, it requires some significant number-theoretical >> precomputation for any given N number of steps that you want to skip. >> >> http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/JUMP/index.html > > If someone really wanted this functionality then I suppose it would be > possible to precompute the special jump coefficients for lengths 2, 4, 8, > 16, 32, ..., and then perform arbitrary jumps using a sequence of smaller > jumps. (The coefficient table could be shipped with the source code.) No one needs small jumps of arbitrary size. The real use case for jumping is to make N parallel streams that won't overlap. You pick a number, let's call it `jump_steps`, much larger than any single run of your system could possibly consume (i.e. the number of core PRNG variates pulled is << `jump_steps`). Then you can initializing N parallel streams by initializing RandomState once with a seed, storing that RandomState, then jumping ahead by `jump_steps`, storing *that* RandomState, by `2*jump_steps`, etc. to get N RandomState streams that will not overlap. Give those to your separate processes and let them run. 
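In code the bookkeeping is just something like this. It is only a sketch: the jump itself is faked by drawing and discarding uniforms, which is the slow O(n) way; the whole point of the precomputed coefficients is to replace that one step with something effectively instantaneous.

import numpy as np

def naive_jump_ahead(rs, n, chunk=10**6):
    # Stand-in for a real jump-ahead: advance rs by drawing and
    # discarding n uniform variates.  Correct but O(n).
    while n > 0:
        m = min(n, chunk)
        rs.random_sample(m)
        n -= m

def make_streams(seed, n_streams, jump_steps):
    # N streams whose starting points are jump_steps draws apart on
    # the same underlying sequence, so they cannot overlap as long as
    # each process consumes fewer than jump_steps draws.
    rs = np.random.RandomState(seed)
    streams = []
    for _ in range(n_streams):
        s = np.random.RandomState()
        s.set_state(rs.get_state())       # snapshot the current position
        streams.append(s)
        naive_jump_ahead(rs, jump_steps)  # move to the start of the next block
    return streams

streams = make_streams(12345, 3, 10**6)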
So the alternative may actually be to just generate and distribute *one* set of these jump coefficients for a really big jump size but still leaves you enough space for a really large number of streams (fortunately, 2**19937-1 is a really big number). -- Robert Kern From sturla.molden at gmail.com Thu Oct 2 16:52:59 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 2 Oct 2014 20:52:59 +0000 (UTC) Subject: [Numpy-discussion] skip samples in random number generator References: Message-ID: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > No one needs small jumps of arbitrary size. The real use case for > jumping is to make N parallel streams that won't overlap. You pick a > number, let's call it `jump_steps`, much larger than any single run of > your system could possibly consume (i.e. the number of core PRNG > variates pulled is << `jump_steps`). Then you can initializing N > parallel streams by initializing RandomState once with a seed, storing > that RandomState, then jumping ahead by `jump_steps`, storing *that* > RandomState, by `2*jump_steps`, etc. to get N RandomState streams that > will not overlap. Give those to your separate processes and let them > run. > > So the alternative may actually be to just generate and distribute > *one* set of these jump coefficients for a really big jump size but > still leaves you enough space for a really large number of streams > (fortunately, 2**19937-1 is a really big number). DCMT might be preferred in this case. It works the same, except you have N "random state" streams with characteristic polynomials that are distinct and relatively prime to each other. Thus each of the N processes will get an independent stream of random numbers, without any chance of overlap. http://www.math.sci.hiroshima-u.ac.jp/?m-mat/MT/DC/dc.html Jump-ahead is easier to accomplish with MRG-32k3a than MT19937. Another generator with an efficient jump-ahead is XORWOW. Sturla From robert.kern at gmail.com Thu Oct 2 17:24:10 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 2 Oct 2014 22:24:10 +0100 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> References: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Oct 2, 2014 at 9:52 PM, Sturla Molden wrote: > Robert Kern wrote: > >> No one needs small jumps of arbitrary size. The real use case for >> jumping is to make N parallel streams that won't overlap. You pick a >> number, let's call it `jump_steps`, much larger than any single run of >> your system could possibly consume (i.e. the number of core PRNG >> variates pulled is << `jump_steps`). Then you can initializing N >> parallel streams by initializing RandomState once with a seed, storing >> that RandomState, then jumping ahead by `jump_steps`, storing *that* >> RandomState, by `2*jump_steps`, etc. to get N RandomState streams that >> will not overlap. Give those to your separate processes and let them >> run. >> >> So the alternative may actually be to just generate and distribute >> *one* set of these jump coefficients for a really big jump size but >> still leaves you enough space for a really large number of streams >> (fortunately, 2**19937-1 is a really big number). > > DCMT might be preferred in this case. It works the same, except you have N > "random state" streams with characteristic polynomials that are distinct > and relatively prime to each other. 
Thus each of the N processes will get > an independent stream of random numbers, without any chance of overlap. > > http://www.math.sci.hiroshima-u.ac.jp/?m-mat/MT/DC/dc.html Yes, but that would require rewriting much of numpy.random to allow replacing the core generator. This would work out-of-box because it's just manipulating the state of the current core generator. -- Robert Kern From bburan at alum.mit.edu Thu Oct 2 18:13:58 2014 From: bburan at alum.mit.edu (Brad Buran) Date: Thu, 2 Oct 2014 15:13:58 -0700 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: References: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> Message-ID: Thanks for the great input. The idea of implementing jump-ahead in numpy.random would be a very nice feature, but I don't currently have the time to work on implementing such a feature. For now, it seems the simplest approach is to cache the RandomState and reuse that later. Brad On Thu, Oct 2, 2014 at 2:24 PM, Robert Kern wrote: > On Thu, Oct 2, 2014 at 9:52 PM, Sturla Molden wrote: >> Robert Kern wrote: >> >>> No one needs small jumps of arbitrary size. The real use case for >>> jumping is to make N parallel streams that won't overlap. You pick a >>> number, let's call it `jump_steps`, much larger than any single run of >>> your system could possibly consume (i.e. the number of core PRNG >>> variates pulled is << `jump_steps`). Then you can initializing N >>> parallel streams by initializing RandomState once with a seed, storing >>> that RandomState, then jumping ahead by `jump_steps`, storing *that* >>> RandomState, by `2*jump_steps`, etc. to get N RandomState streams that >>> will not overlap. Give those to your separate processes and let them >>> run. >>> >>> So the alternative may actually be to just generate and distribute >>> *one* set of these jump coefficients for a really big jump size but >>> still leaves you enough space for a really large number of streams >>> (fortunately, 2**19937-1 is a really big number). >> >> DCMT might be preferred in this case. It works the same, except you have N >> "random state" streams with characteristic polynomials that are distinct >> and relatively prime to each other. Thus each of the N processes will get >> an independent stream of random numbers, without any chance of overlap. >> >> http://www.math.sci.hiroshima-u.ac.jp/?m-mat/MT/DC/dc.html > > Yes, but that would require rewriting much of numpy.random to allow > replacing the core generator. This would work out-of-box because it's > just manipulating the state of the current core generator. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Thu Oct 2 18:56:33 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 2 Oct 2014 22:56:33 +0000 (UTC) Subject: [Numpy-discussion] skip samples in random number generator References: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> Message-ID: <1016090134433983128.659345sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > Yes, but that would require rewriting much of numpy.random to allow > replacing the core generator. This would work out-of-box because it's > just manipulating the state of the current core generator. 
Yes, then we just need to sacrifice a year's worth of CPU time, and a PR will be ready by next fall ;-) Sturla From tjhnson at gmail.com Thu Oct 2 19:02:32 2014 From: tjhnson at gmail.com (T J) Date: Thu, 2 Oct 2014 18:02:32 -0500 Subject: [Numpy-discussion] 0/0 == 0? Message-ID: Hi, I'm using NumPy 1.8.2: In [1]: np.array(0) / np.array(0) Out[1]: 0 In [2]: np.array(0) / np.array(0.0) Out[2]: nan In [3]: np.array(0.0) / np.array(0) Out[3]: nan In [4]: np.array(0.0) / np.array(0.0) Out[4]: nan In [5]: 0/0 --------------------------------------------------------------------------- ZeroDivisionError Traceback (most recent call last) in () ----> 1 0/0 ZeroDivisionError: integer division or modulo by zero Out[1] seems odd. I get the right value in 1.8.1. Was this fixed for 1.9.0? -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Thu Oct 2 19:08:56 2014 From: tjhnson at gmail.com (T J) Date: Thu, 2 Oct 2014 18:08:56 -0500 Subject: [Numpy-discussion] Round away from zero (towards +/- infinity) In-Reply-To: References: Message-ID: Any bites on this? On Wed, Sep 24, 2014 at 12:23 PM, T J wrote: > Is there a ufunc for rounding away from zero? Or do I need to do > > x2 = sign(x) * ceil(abs(x)) > > whenever I want to round away from zero? Maybe the following is better? > > x_ceil = ceil(x) > x_floor = floor(x) > x2 = where(x >= 0, x_ceil, x_floor) > > Python's round function goes away from zero, so I am looking for the NumPy > equivalent (and using vectorize() seems undesirable). In this sense, it > seems that having a ufunc for this type of rounding could be helpful. > > Aside: Is there interest in a more general around() that allows users to > specify alternative tie-breaking rules, with the default staying 'round > half to nearest even'? [1] > > Also, what is the difference between NumPy's fix() and trunc() functions? > It seems like they achieve the same goal. trunc() was added in 1.3.0. So is > fix() just legacy? > > --- > [1] > http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 2 19:29:42 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Oct 2014 17:29:42 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 5:02 PM, T J wrote: > Hi, I'm using NumPy 1.8.2: > > In [1]: np.array(0) / np.array(0) > Out[1]: 0 > > In [2]: np.array(0) / np.array(0.0) > Out[2]: nan > > In [3]: np.array(0.0) / np.array(0) > Out[3]: nan > > In [4]: np.array(0.0) / np.array(0.0) > Out[4]: nan > > > In [5]: 0/0 > --------------------------------------------------------------------------- > ZeroDivisionError Traceback (most recent call last) > in () > ----> 1 0/0 > > ZeroDivisionError: integer division or modulo by zero > > > > Out[1] seems odd. I get the right value in 1.8.1. Was this fixed for > 1.9.0? > > In master on Fedora I get In [1]: np.array(0) / np.array(0) /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero encountered in divide #!/usr/bin/python Out[1]: 0 Note that the warning is only given once, it's a warnings module thing. In [2]: np.array(0) / np.array(0) Out[2]: 0 And it might even be system dependent. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Thu Oct 2 19:48:47 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 2 Oct 2014 16:48:47 -0700 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 4:29 PM, Charles R Harris wrote: > > > On Thu, Oct 2, 2014 at 5:02 PM, T J wrote: > >> Hi, I'm using NumPy 1.8.2: >> >> In [1]: np.array(0) / np.array(0) >> Out[1]: 0 >> >> In [2]: np.array(0) / np.array(0.0) >> Out[2]: nan >> >> In [3]: np.array(0.0) / np.array(0) >> Out[3]: nan >> >> In [4]: np.array(0.0) / np.array(0.0) >> Out[4]: nan >> >> >> In [5]: 0/0 >> >> --------------------------------------------------------------------------- >> ZeroDivisionError Traceback (most recent call >> last) >> in () >> ----> 1 0/0 >> >> ZeroDivisionError: integer division or modulo by zero >> >> >> >> Out[1] seems odd. I get the right value in 1.8.1. Was this fixed for >> 1.9.0? >> >> > In master on Fedora I get > > In [1]: np.array(0) / np.array(0) > /home/charris/.local/bin/ipython:1: RuntimeWarning: divide by zero > encountered in divide > #!/usr/bin/python > Out[1]: 0 > Exact same thing on Windows with master: -------------- next part -------------- An HTML attachment was scrubbed... URL: From jareynolds at ea.com Thu Oct 2 20:18:53 2014 From: jareynolds at ea.com (Reynolds, Jay) Date: Fri, 3 Oct 2014 00:18:53 +0000 Subject: [Numpy-discussion] Problem building 64 bit numpy using MKL on Windows Message-ID: Hi, I've built numpy 64 bit using vc11, the Intel Fortran compiler and the MKL 'mkl_rt' library. *why? (see end of message for the reason, if interested) Any advice or assistance would be greatly appreciated. If I can offer additional information, I will happily do so. The build appears to go just fine (no errors noted), and numpy loads into python just fine as well. (I note a warning: ### Warning: python_xerbla.c is disabled ### -- however, it doesn't appear to be problematic?) I have also confirmed that numpy sees the mkl blas and lapack libs. >>> numpy.__config__.show() lapack_opt_info: libraries = ['mkl_lapack', 'mkl_rt'] library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include'] blas_opt_info: libraries = ['mkl_rt'] library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include'] openblas_lapack_info: NOT AVAILABLE lapack_mkl_info: libraries = ['mkl_lapack', 'mkl_rt'] library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include'] blas_mkl_info: libraries = ['mkl_rt'] library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include'] mkl_info: libraries = ['mkl_rt'] library_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\lib\\intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['C:\\Program Files (x86)\\Intel\\Composer XE 2015\\mkl\\include'] Everything *looks* to be in order upon casual inspection (*I think*, please correct me if I'm wrong!) 
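For reference, the sort of test I've been timing looks roughly like the following; I've also included a ctypes query asking MKL how many threads it is prepared to use. The 2000x2000 size is arbitrary, and the ctypes line assumes mkl_rt.dll can be found by the loader (e.g. its directory is on PATH):

import time
import ctypes
import numpy as np

# Ask MKL how many threads it may use (assumes mkl_rt.dll is loadable).
mkl_rt = ctypes.CDLL('mkl_rt.dll')
print("MKL max threads: %d" % mkl_rt.mkl_get_max_threads())

# A sizeable SVD should keep several cores busy if the MKL-backed
# LAPACK is actually being used.
a = np.random.rand(2000, 2000)
t0 = time.time()
np.linalg.svd(a)
print("SVD of a 2000x2000 matrix took %.2f s" % (time.time() - t0))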
However, there is no performance boost when running a few different tests in numpy (singular value decomposition, for example), and only a single thread appears to be in play. Running numpy.test('full') reveals 21 errors. For instance, LINK : fatal error LNK1104: cannot open file 'ifconsol.lib' And, the other being a recurring error with f2py, ERROR: test_size.TestSizeSumExample.test_transpose ---------------------------------------------------------------------- Traceback (most recent call last): File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\nose\case.py", line 371, in setUp try_run(self.inst, ('setup', 'setUp')) File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\nose\util.py", line 478, in try_run return func() File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\numpy\f2py\tests\util.py", line 353, in setUp module_name=self.module_name) File "C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\lib\site-packages\numpy\f2py\tests\util.py", line 80, in wrapper raise ret RuntimeError: Running f2py failed: ['-m', '_test_ext_module_5403', 'c:\\users\\jareyn~1\\appdata\\local\\temp\\tmpvykewl \\foo.f90'] Reading .f2py_f2cmap ... Mapping "real(kind=rk)" to "double" Succesfully applied user defined changes from .f2py_f2cmap Everything that requires configuration appears to be in agreement with this Intel Application Note, minus use of the Intel C++ compiler: https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkl I have also referenced the Windows build docs on scipy.org: http://www.scipy.org/scipylib/building/windows.html#building-scipy Some info about my configuration: site.cfg: include_dirs = C:\Program Files (x86)\Intel\Composer XE 2015\mkl\include library_dirs = C:\Program Files (x86)\Intel\Composer XE 2015\mkl\lib\intel64 mkl_libs = mkl_rt PATH = (paths separated by line for easy reading) C:\Program Files\Side Effects Software\Houdini 13.0.509\python27; C:\Program Files\Side Effects Software\Houdini 13.0.509\python27\Scripts; C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64; C:\Program Files (x86)\Intel\Composer XE 2015\bin\intel64 LD_LIBRARY_PATH = C:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\bin\x86_amd64; C:\Program Files (x86)\Intel\Composer XE 2015\bin\intel64; C:\Program Files (x86)\Intel\Composer XE 2015\mkl\lib\intel64; C:\Program Files (x86)\Intel\Composer XE 2015\compiler\lib\intel64 Thank you in advance for your time, -Jay ===== *why am I doing this? The reason I'm doing this is because I need numpy with MKL to run with the version of python that comes packaged with Houdini (Python 2.7.5 (default, Oct 24 2013, 17:49:49) [MSC v.1700 64 bit (AMD64)] on win32). So, downloading a prebuilt 64 bit numpy isn't an option due to the unavailability of a compatible compiler version. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Oct 2 21:06:07 2014 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 2 Oct 2014 21:06:07 -0400 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: Out[1] has an integer divided by an integer, and you can't represent nan as an integer. Perhaps something weird was happening with type promotion between versions? 
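There is no integer nan to return in the first place; plain Python makes the same point:

>>> int(float('nan'))
Traceback (most recent call last):
  ...
ValueError: cannot convert float NaN to integer

So for an int/int division the result can only be some arbitrary integer or an error.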
Ben Root On Oct 2, 2014 7:02 PM, "T J" wrote: > Hi, I'm using NumPy 1.8.2: > > In [1]: np.array(0) / np.array(0) > Out[1]: 0 > > In [2]: np.array(0) / np.array(0.0) > Out[2]: nan > > In [3]: np.array(0.0) / np.array(0) > Out[3]: nan > > In [4]: np.array(0.0) / np.array(0.0) > Out[4]: nan > > > In [5]: 0/0 > --------------------------------------------------------------------------- > ZeroDivisionError Traceback (most recent call last) > in () > ----> 1 0/0 > > ZeroDivisionError: integer division or modulo by zero > > > > Out[1] seems odd. I get the right value in 1.8.1. Was this fixed for > 1.9.0? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Oct 2 22:20:41 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Oct 2014 20:20:41 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: > Out[1] has an integer divided by an integer, and you can't represent nan > as an integer. Perhaps something weird was happening with type promotion > between versions? > Also note that in python3 the '/' operator does float rather than integer division. >>> np.array(0) / np.array(0) __main__:1: RuntimeWarning: invalid value encountered in true_divide nan Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jzwinck at gmail.com Thu Oct 2 23:00:15 2014 From: jzwinck at gmail.com (John Zwinck) Date: Fri, 3 Oct 2014 11:00:15 +0800 Subject: [Numpy-discussion] Round away from zero (towards +/- infinity) In-Reply-To: References: Message-ID: On 3 Oct 2014 07:09, "T J" wrote: > > Any bites on this? > > On Wed, Sep 24, 2014 at 12:23 PM, T J wrote: >> Python's round function goes away from zero, so I am looking for the NumPy equivalent (and using vectorize() seems undesirable). In this sense, it seems that having a ufunc for this type of rounding could be helpful. >> >> Aside: Is there interest in a more general around() that allows users to specify alternative tie-breaking rules, with the default staying 'round half to nearest even'? [1] >> --- >> [1] http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy I like the solution given in that Stack Overflow post, namely using ctypes to call fesetround(). Does that work for you? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Oct 2 23:29:14 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 3 Oct 2014 04:29:14 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris wrote: > > On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >> >> Out[1] has an integer divided by an integer, and you can't represent nan >> as an integer. Perhaps something weird was happening with type promotion >> between versions? > > > Also note that in python3 the '/' operator does float rather than integer > division. 
> >>>> np.array(0) / np.array(0) > __main__:1: RuntimeWarning: invalid value encountered in true_divide > nan Floor division still acts the same though: >>> np.array(0) // np.array(0) __main__:1: RuntimeWarning: divide by zero encountered in floor_divide 0 The seterr warning system makes a lot of sense for IEEE754 floats, which are specifically designed so that 0/0 has a unique well-defined answer. For ints though this seems really broken to me. 0 / 0 = 0 is just the wrong answer. It would be nice if we had something reasonable to return, but we don't, and I'd rather raise an error than return the wrong answer. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Oct 3 00:12:22 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Oct 2014 22:12:22 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 9:29 PM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris > wrote: > > > > On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: > >> > >> Out[1] has an integer divided by an integer, and you can't represent nan > >> as an integer. Perhaps something weird was happening with type promotion > >> between versions? > > > > > > Also note that in python3 the '/' operator does float rather than integer > > division. > > > >>>> np.array(0) / np.array(0) > > __main__:1: RuntimeWarning: invalid value encountered in true_divide > > nan > > Floor division still acts the same though: > > >>> np.array(0) // np.array(0) > __main__:1: RuntimeWarning: divide by zero encountered in floor_divide > 0 > > The seterr warning system makes a lot of sense for IEEE754 floats, > which are specifically designed so that 0/0 has a unique well-defined > answer. For ints though this seems really broken to me. 0 / 0 = 0 is > just the wrong answer. It would be nice if we had something reasonable > to return, but we don't, and I'd rather raise an error than return the > wrong answer. > That's an option, although arguable for arrays of numbers. However, the fact that we don't know *which* numbers caused the problem strengthens the argument for an error. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 3 00:13:50 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Oct 2014 22:13:50 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris wrote: > > > On Thu, Oct 2, 2014 at 9:29 PM, Nathaniel Smith wrote: > >> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >> wrote: >> > >> > On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >> >> >> >> Out[1] has an integer divided by an integer, and you can't represent >> nan >> >> as an integer. Perhaps something weird was happening with type >> promotion >> >> between versions? >> > >> > >> > Also note that in python3 the '/' operator does float rather than >> integer >> > division. >> > >> >>>> np.array(0) / np.array(0) >> > __main__:1: RuntimeWarning: invalid value encountered in true_divide >> > nan >> >> Floor division still acts the same though: >> >> >>> np.array(0) // np.array(0) >> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >> 0 >> >> The seterr warning system makes a lot of sense for IEEE754 floats, >> which are specifically designed so that 0/0 has a unique well-defined >> answer. 
For ints though this seems really broken to me. 0 / 0 = 0 is >> just the wrong answer. It would be nice if we had something reasonable >> to return, but we don't, and I'd rather raise an error than return the >> wrong answer. >> > > That's an option, although arguable for arrays of numbers. However, the > fact that we don't know *which* numbers caused the problem strengthens the > argument for an error. > > Plus the g*dawful warning default to only warn once. That has always bothered me, it just seems useless. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Oct 3 02:50:46 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 3 Oct 2014 07:50:46 +0100 Subject: [Numpy-discussion] skip samples in random number generator In-Reply-To: <1016090134433983128.659345sturla.molden-gmail.com@news.gmane.org> References: <756050196433974588.278974sturla.molden-gmail.com@news.gmane.org> <1016090134433983128.659345sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Oct 2, 2014 at 11:56 PM, Sturla Molden wrote: > Robert Kern wrote: > >> Yes, but that would require rewriting much of numpy.random to allow >> replacing the core generator. This would work out-of-box because it's >> just manipulating the state of the current core generator. > > Yes, then we just need to sacrifice a year's worth of CPU time, and a PR > will be ready by next fall ;-) Eh? It runs practically instantaneously. Here is the coefficient list for a jump step of 1<<100. -- Robert Kern -------------- next part -------------- 00101101000011101001010011111111011001000001110111001101010011010011100000110001011000101110100111000101010010000111001011001110011100111111001110101001011101001010100000011110111100100100001000100011101010100000110101110101101100010110111101100000010011001000011000001011001100100101110110100001000001111100101100000001001001101111111100111111111000100111110010100111101110001110000110111111011010010111101101011001101110001001110010100011000101111100110111111001001001101011101111001100110001010111000110111100111111111111000000100101111100101001100011110110100001110100110101110101101010010100101110101011001110000001000111011010001001101000001001011010000011101101100001000001001000010100010000001101010010000111001000100111011010000111000011000100111111010011010111101111010110100111010110100111101001101011001110100111011100000010111001100010111101010001001111010010001100101110111111011011110111000110010001000110111001011101000100001001001100101110011001000100000011001011100100011000000001100000011001001111000011101011010010000110101011010001110011101001001010111001111001010111111100010110101111011000010111100100110000000100001110000100000111010110110000101010010110110010111111101001110011110100000100000001010101000100100001000000101111101111001000011111011011001110011110111100001010000100010101000011000111010101110110100110100011101110010110101101010101100010111011010101100100010000110011110111000100011111000001110011010111110011001011010000100110000101011000101001101111001100011110101100101101000011111000011010000000101000000001010001001100101011111110110000001001000111111111010010001010010110110001001001011100011101100001100010110001011001000011111011000101100101100000100010000110000011001001011111101010010000011111111101111111011111010000001001110001100100010001111011010110101011101000010000110001000110100010011100111011001101010100010011100000101001101111010111001001110010001010010100101111000001011010100110001110001101111010010011110
010011010101000101001000100001010001100001000001111001000100010110000101011001110111110001110100111100001011101111110110111111101001100101011111001001001100100111100100001011010001110001100001010 From hoogendoorn.eelco at gmail.com Fri Oct 3 02:51:50 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Fri, 3 Oct 2014 08:51:50 +0200 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: slightly OT; but fwiw, its all ill-thought out nonsense from the start anyway. ALL numbers satisfy the predicate 0*x=0. what the IEEE calls 'not a number' would be more accurately called 'not a specific number', or 'a number'. whats a logical negation among computer scientists? On Fri, Oct 3, 2014 at 6:13 AM, Charles R Harris wrote: > > > On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Thu, Oct 2, 2014 at 9:29 PM, Nathaniel Smith wrote: >> >>> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >>> wrote: >>> > >>> > On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>> >> >>> >> Out[1] has an integer divided by an integer, and you can't represent >>> nan >>> >> as an integer. Perhaps something weird was happening with type >>> promotion >>> >> between versions? >>> > >>> > >>> > Also note that in python3 the '/' operator does float rather than >>> integer >>> > division. >>> > >>> >>>> np.array(0) / np.array(0) >>> > __main__:1: RuntimeWarning: invalid value encountered in true_divide >>> > nan >>> >>> Floor division still acts the same though: >>> >>> >>> np.array(0) // np.array(0) >>> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >>> 0 >>> >>> The seterr warning system makes a lot of sense for IEEE754 floats, >>> which are specifically designed so that 0/0 has a unique well-defined >>> answer. For ints though this seems really broken to me. 0 / 0 = 0 is >>> just the wrong answer. It would be nice if we had something reasonable >>> to return, but we don't, and I'd rather raise an error than return the >>> wrong answer. >>> >> >> That's an option, although arguable for arrays of numbers. However, the >> fact that we don't know *which* numbers caused the problem strengthens the >> argument for an error. >> >> > Plus the g*dawful warning default to only warn once. That has always > bothered me, it just seems useless. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Oct 3 03:12:20 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 3 Oct 2014 08:12:20 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris > wrote: >> >> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>> >>> Out[1] has an integer divided by an integer, and you can't represent nan >>> as an integer. Perhaps something weird was happening with type promotion >>> between versions? >> >> >> Also note that in python3 the '/' operator does float rather than integer >> division. 
>> >>>>> np.array(0) / np.array(0) >> __main__:1: RuntimeWarning: invalid value encountered in true_divide >> nan > > Floor division still acts the same though: > >>>> np.array(0) // np.array(0) > __main__:1: RuntimeWarning: divide by zero encountered in floor_divide > 0 > > The seterr warning system makes a lot of sense for IEEE754 floats, > which are specifically designed so that 0/0 has a unique well-defined > answer. For ints though this seems really broken to me. 0 / 0 = 0 is > just the wrong answer. It would be nice if we had something reasonable > to return, but we don't, and I'd rather raise an error than return the > wrong answer. Well, actually, that's the really nice thing about seterr for ints! CPUs have hardware floating point exception flags to work with. We had to build one for ints. If you want an error, you can get an error. *I* don't want an error, and I don't have to have one! -- Robert Kern From shoyer at gmail.com Fri Oct 3 03:33:50 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 3 Oct 2014 03:33:50 -0400 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Thu, Oct 2, 2014 at 11:29 PM, Nathaniel Smith wrote: > The seterr warning system makes a lot of sense for IEEE754 floats, > which are specifically designed so that 0/0 has a unique well-defined > answer. For ints though this seems really broken to me. 0 / 0 = 0 is > just the wrong answer. It would be nice if we had something reasonable > to return, but we don't, and I'd rather raise an error than return the > wrong answer. +1 An even worse offender (in my opinion) is in-place addition of an integer array with np.nan: >>> x = np.zeros(1, dtype=int) >>> x += np.nan RuntimeWarning: invalid value encountered in add >>> x array([-9223372036854775808]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Oct 3 10:30:12 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Oct 2014 08:30:12 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 1:33 AM, Stephan Hoyer wrote: > On Thu, Oct 2, 2014 at 11:29 PM, Nathaniel Smith wrote: > >> The seterr warning system makes a lot of sense for IEEE754 floats, >> which are specifically designed so that 0/0 has a unique well-defined >> answer. For ints though this seems really broken to me. 0 / 0 = 0 is >> just the wrong answer. It would be nice if we had something reasonable >> to return, but we don't, and I'd rather raise an error than return the >> wrong answer. > > > +1 > > An even worse offender (in my opinion) is in-place addition of an integer > array with np.nan: > > >>> x = np.zeros(1, dtype=int) > >>> x += np.nan > RuntimeWarning: invalid value encountered in add > >>> x > array([-9223372036854775808]) > This will be an error in 1.10 Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From tjhnson at gmail.com Fri Oct 3 13:28:37 2014 From: tjhnson at gmail.com (T J) Date: Fri, 3 Oct 2014 12:28:37 -0500 Subject: [Numpy-discussion] Round away from zero (towards +/- infinity) In-Reply-To: References: Message-ID: It does, but it is not portable. That's why I was hoping NumPy might think about supporting more rounding algorithms. On Thu, Oct 2, 2014 at 10:00 PM, John Zwinck wrote: > On 3 Oct 2014 07:09, "T J" wrote: > > > > Any bites on this? 
> > > > On Wed, Sep 24, 2014 at 12:23 PM, T J wrote: > >> Python's round function goes away from zero, so I am looking for the > NumPy equivalent (and using vectorize() seems undesirable). In this sense, > it seems that having a ufunc for this type of rounding could be helpful. > >> > >> Aside: Is there interest in a more general around() that allows users > to specify alternative tie-breaking rules, with the default staying 'round > half to nearest even'? [1] > >> --- > >> [1] > http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy > > I like the solution given in that Stack Overflow post, namely using ctypes > to call fesetround(). Does that work for you? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pearu.peterson at gmail.com Fri Oct 3 15:38:03 2014 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Fri, 3 Oct 2014 22:38:03 +0300 Subject: [Numpy-discussion] f2py and debug mode In-Reply-To: <542AACDE.7010408@strains.fr> References: <542AACDE.7010408@strains.fr> Message-ID: Hi, When you run f2py without -c option, the wrapper source files are generated without compiling them. With these source files and fortranobject.c, you can build the extension module with your specific compiler options using the compiler framework of your choice. I am not familiar with Visual Studio specifics to suggest a more detailed solution but after generating wrapper source files there is no f2py specific way to build the extension module, in fact, `f2py -c` relies on distutils compilation/linkage methods. HTH, Pearu On Tue, Sep 30, 2014 at 4:15 PM, Bayard wrote: > Hello to all. > I'm aiming to wrap a Fortran program into Python. I started to work with > f2py, and am trying to setup a debug mode where I could reach > breakpoints in Fortran module launched by Python. I've been looking in > the existing post, but not seeing things like that. > > I'm used to work with visual studio 2012 and Intel Fortran compiler, I > have tried to get that point doing : > 1) Run f2py -m to get *.c wrapper > 2) Embed it in a C Project in Visual Studio, containing also with > fortranobject.c and fortranobject.h, > 3) Create a solution which also contains my fortran files compiled as a lib > 4) Generate in debug mode a "dll" with extension pyd (to get to that > point name of the "main" function in Fortran by "_main"). > > I compiled without any error, and reach break point in C Wrapper, but > not in Fortran, and the fortran code seems not to be executed (whereas > it is when compiling with f2py -c). Trying to understand f2py code, I > noticed that f2py is not only writing c-wrapper, but compiling it in a > specific way. Is there a way to get a debug mode in Visual Studio with > f2py (some members of the team are used to it) ? Any alternative tool we > should use for debugging ? > > Thanks for answering > Ferdinand > > > > > --- > Ce courrier ?lectronique ne contient aucun virus ou logiciel malveillant > parce que la protection avast! Antivirus est active. > http://www.avast.com > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From g.brandl at gmx.net Fri Oct 3 16:47:58 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Fri, 03 Oct 2014 22:47:58 +0200 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On 10/03/2014 06:13 AM, Charles R Harris wrote: > Plus the g*dawful warning default to only warn once. That has always bothered > me, it just seems useless. If you use a custom warning, there's no reason why you couldn't set a filter that shows it every time by default. Georg From njs at pobox.com Fri Oct 3 17:00:05 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 3 Oct 2014 22:00:05 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 5:13 AM, Charles R Harris wrote: > > On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris > wrote: > Plus the g*dawful warning default to only warn once. That has always > bothered me, it just seems useless. I believe the idea is to only warn once per source location, i.e. if you write something like: z = np.zeros(()) for foo in range(1000): z / z z / z then you're supposed to get two warnings, for the two lines that contain divide-by-zero. And then there is an unfortunate interaction where all lines entered interactively are considered to be "the same line" for these purposes. I don't know why this happens. It's either a bug in ipython or in warnings. The weird thing is that ipython does give each line its own unique name (you can see 'filenames' like "" in error tracebacks), but somehow the warning machinery doesn't pick up on this. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Oct 3 17:24:45 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Oct 2014 15:24:45 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 3:00 PM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 5:13 AM, Charles R Harris > wrote: > > > > On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris > > wrote: > > Plus the g*dawful warning default to only warn once. That has always > > bothered me, it just seems useless. > > I believe the idea is to only warn once per source location, i.e. if > you write something like: > > z = np.zeros(()) > for foo in range(1000): > z / z > z / z > > then you're supposed to get two warnings, for the two lines that > contain divide-by-zero. > What I want is that the following script would warn three times import numpy as np z = np.zeros(1, dtype=np.int) def f(x): return x/x f(z) f(z) f(z) But it only warns once. That is not helpful when f gets called with an erroneous argument from different places. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Oct 3 17:40:30 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 3 Oct 2014 22:40:30 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 10:00 PM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 5:13 AM, Charles R Harris > wrote: >> >> On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris >> wrote: >> Plus the g*dawful warning default to only warn once. That has always >> bothered me, it just seems useless. > > I believe the idea is to only warn once per source location, i.e. if > you write something like: > > z = np.zeros(()) > for foo in range(1000): > z / z > z / z > > then you're supposed to get two warnings, for the two lines that > contain divide-by-zero. 
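For concreteness, one way to surface every offending call site, rather than relying on the warn-once machinery, is to promote the warning to an exception around the suspect code. A minimal sketch reusing the f/z script earlier in this thread (the errstate() placement is just illustrative):

import numpy as np

z = np.zeros(1, dtype=np.int)

def f(x):
    return x / x

# Promote the divide/invalid warnings to exceptions, so each bad call
# raises a FloatingPointError with a traceback pointing at its own call site.
with np.errstate(divide='raise', invalid='raise'):
    f(z)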
> > And then there is an unfortunate interaction where all lines entered > interactively are considered to be "the same line" for these purposes. > > I don't know why this happens. It's either a bug in ipython or in > warnings. The weird thing is that ipython does give each line its own > unique name (you can see 'filenames' like > "" in error tracebacks), but somehow the > warning machinery doesn't pick up on this. Here's the solution: https://github.com/ipython/ipython/issues/6611 -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Fri Oct 3 18:55:12 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Oct 2014 16:55:12 -0600 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 3:40 PM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 10:00 PM, Nathaniel Smith wrote: > > On Fri, Oct 3, 2014 at 5:13 AM, Charles R Harris > > wrote: > >> > >> On Thu, Oct 2, 2014 at 10:12 PM, Charles R Harris > >> wrote: > >> Plus the g*dawful warning default to only warn once. That has always > >> bothered me, it just seems useless. > > > > I believe the idea is to only warn once per source location, i.e. if > > you write something like: > > > > z = np.zeros(()) > > for foo in range(1000): > > z / z > > z / z > > > > then you're supposed to get two warnings, for the two lines that > > contain divide-by-zero. > > > > And then there is an unfortunate interaction where all lines entered > > interactively are considered to be "the same line" for these purposes. > > > > I don't know why this happens. It's either a bug in ipython or in > > warnings. The weird thing is that ipython does give each line its own > > unique name (you can see 'filenames' like > > "" in error tracebacks), but somehow the > > warning machinery doesn't pick up on this. > > Here's the solution: > https://github.com/ipython/ipython/issues/6611 > > IPython isn't involved, it's a script. The problem is that the error occurs in the function, and when/if the same error occurs in the function, regardless of where the function is called from, it won't be reported. It wouldn't be so bad if the error was traced to the first rather than last entry on the call stack. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Oct 3 19:21:00 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Oct 2014 00:21:00 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 8:12 AM, Robert Kern wrote: > On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith wrote: >> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >> wrote: >>> >>> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>>> >>>> Out[1] has an integer divided by an integer, and you can't represent nan >>>> as an integer. Perhaps something weird was happening with type promotion >>>> between versions? >>> >>> >>> Also note that in python3 the '/' operator does float rather than integer >>> division. >>> >>>>>> np.array(0) / np.array(0) >>> __main__:1: RuntimeWarning: invalid value encountered in true_divide >>> nan >> >> Floor division still acts the same though: >> >>>>> np.array(0) // np.array(0) >> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >> 0 >> >> The seterr warning system makes a lot of sense for IEEE754 floats, >> which are specifically designed so that 0/0 has a unique well-defined >> answer. 
For ints though this seems really broken to me. 0 / 0 = 0 is >> just the wrong answer. It would be nice if we had something reasonable >> to return, but we don't, and I'd rather raise an error than return the >> wrong answer. > > Well, actually, that's the really nice thing about seterr for ints! > CPUs have hardware floating point exception flags to work with. We had > to build one for ints. If you want an error, you can get an error. *I* > don't want an error, and I don't have to have one! Sure, that's fine for integer computations corner cases that have well-defined outputs, like wraparound. But it doesn't make sense for divide-by-zero. The key thing about the IEEE754 exception design is that it gives you the option of either raising an error immediately or else letting it propagate through the computation as a nan until you reach an appropriate place to handle it. With ints we don't have nan, so we don't have the second option. Our options are either raise an error immediately, or else return some nonsense value that will just cause you to get some meaningless result, with no way to detect or recover from this situation. (Why don't we define 0 / 0 == -72? It would make just as much sense.) The second option is terrible enough that I kinda don't believe you when you say you want it. Maybe I'm missing something but... Even more egregiously, numpy currently treats the integer divide-by-zero case identically with the floating-point one -- so if you want 0 / 0 to be an error (as you have to if you care about getting correct results), then you have to make 0.0 / 0.0 an error as well. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From robert.kern at gmail.com Fri Oct 3 19:40:59 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 4 Oct 2014 00:40:59 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Sat, Oct 4, 2014 at 12:21 AM, Nathaniel Smith wrote: > On Fri, Oct 3, 2014 at 8:12 AM, Robert Kern wrote: >> On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith wrote: >>> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >>> wrote: >>>> >>>> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>>>> >>>>> Out[1] has an integer divided by an integer, and you can't represent nan >>>>> as an integer. Perhaps something weird was happening with type promotion >>>>> between versions? >>>> >>>> >>>> Also note that in python3 the '/' operator does float rather than integer >>>> division. >>>> >>>>>>> np.array(0) / np.array(0) >>>> __main__:1: RuntimeWarning: invalid value encountered in true_divide >>>> nan >>> >>> Floor division still acts the same though: >>> >>>>>> np.array(0) // np.array(0) >>> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >>> 0 >>> >>> The seterr warning system makes a lot of sense for IEEE754 floats, >>> which are specifically designed so that 0/0 has a unique well-defined >>> answer. For ints though this seems really broken to me. 0 / 0 = 0 is >>> just the wrong answer. It would be nice if we had something reasonable >>> to return, but we don't, and I'd rather raise an error than return the >>> wrong answer. >> >> Well, actually, that's the really nice thing about seterr for ints! >> CPUs have hardware floating point exception flags to work with. We had >> to build one for ints. If you want an error, you can get an error. *I* >> don't want an error, and I don't have to have one! 
> > Sure, that's fine for integer computations corner cases that have > well-defined outputs, like wraparound. But it doesn't make sense for > divide-by-zero. > > The key thing about the IEEE754 exception design is that it gives you > the option of either raising an error immediately or else letting it > propagate through the computation as a nan until you reach an > appropriate place to handle it. > > With ints we don't have nan, so we don't have the second option. Our > options are either raise an error immediately, or else return some > nonsense value that will just cause you to get some meaningless > result, with no way to detect or recover from this situation. (Why > don't we define 0 / 0 == -72? It would make just as much sense.) > > The second option is terrible enough that I kinda don't believe you > when you say you want it. Maybe I'm missing something but... I fix the values after-the-fact because one *can* detect and recover from this situation with just a smidgen of forethought. mask = (denominator == 0) x = numerator // denominator # We don't care about the masked cases. Fill them with a value that # will be harmless/ignored downstream. Here, it's 0. It might be something # else in other contexts. x[mask] = 0 > Even more egregiously, numpy currently treats the integer > divide-by-zero case identically with the floating-point one -- so if > you want 0 / 0 to be an error (as you have to if you care about > getting correct results), then you have to make 0.0 / 0.0 an error as > well. If you would like to introduce a separate `integer_divide` setting for errstate() and make it raise by default, I'd be marginally okay with that. In the above pattern, I'd be wrapping it with an errstate() context manager anyways to silence the warning, so silencing the default exception would be just as easy. However, nothing else in errstate() raises by default, so this would be the odd special case. -- Robert Kern From njs at pobox.com Fri Oct 3 21:17:33 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Oct 2014 02:17:33 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Sat, Oct 4, 2014 at 12:40 AM, Robert Kern wrote: > On Sat, Oct 4, 2014 at 12:21 AM, Nathaniel Smith wrote: >> On Fri, Oct 3, 2014 at 8:12 AM, Robert Kern wrote: >>> On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith wrote: >>>> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >>>> wrote: >>>>> >>>>> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>>>>> >>>>>> Out[1] has an integer divided by an integer, and you can't represent nan >>>>>> as an integer. Perhaps something weird was happening with type promotion >>>>>> between versions? >>>>> >>>>> >>>>> Also note that in python3 the '/' operator does float rather than integer >>>>> division. >>>>> >>>>>>>> np.array(0) / np.array(0) >>>>> __main__:1: RuntimeWarning: invalid value encountered in true_divide >>>>> nan >>>> >>>> Floor division still acts the same though: >>>> >>>>>>> np.array(0) // np.array(0) >>>> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >>>> 0 >>>> >>>> The seterr warning system makes a lot of sense for IEEE754 floats, >>>> which are specifically designed so that 0/0 has a unique well-defined >>>> answer. For ints though this seems really broken to me. 0 / 0 = 0 is >>>> just the wrong answer. It would be nice if we had something reasonable >>>> to return, but we don't, and I'd rather raise an error than return the >>>> wrong answer. 
>>> >>> Well, actually, that's the really nice thing about seterr for ints! >>> CPUs have hardware floating point exception flags to work with. We had >>> to build one for ints. If you want an error, you can get an error. *I* >>> don't want an error, and I don't have to have one! >> >> Sure, that's fine for integer computations corner cases that have >> well-defined outputs, like wraparound. But it doesn't make sense for >> divide-by-zero. >> >> The key thing about the IEEE754 exception design is that it gives you >> the option of either raising an error immediately or else letting it >> propagate through the computation as a nan until you reach an >> appropriate place to handle it. >> >> With ints we don't have nan, so we don't have the second option. Our >> options are either raise an error immediately, or else return some >> nonsense value that will just cause you to get some meaningless >> result, with no way to detect or recover from this situation. (Why >> don't we define 0 / 0 == -72? It would make just as much sense.) >> >> The second option is terrible enough that I kinda don't believe you >> when you say you want it. Maybe I'm missing something but... > > I fix the values after-the-fact because one *can* detect and recover > from this situation with just a smidgen of forethought. > > > > mask = (denominator == 0) > x = numerator // denominator > # We don't care about the masked cases. Fill them with a value that > # will be harmless/ignored downstream. Here, it's 0. It might be something > # else in other contexts. > x[mask] = 0 > > I don't find this argument very convincing, except as an argument for having a deprecation period. In the unusual case where this is what you want, it's trivial and more explicit to write it directly -- bringing errstate into it is just rube goldbergian. E.g.: mask = (denominator == 0) x = np.floor_divide(numerator, denominator, where=~mask) x[mask] = 0 >> Even more egregiously, numpy currently treats the integer >> divide-by-zero case identically with the floating-point one -- so if >> you want 0 / 0 to be an error (as you have to if you care about >> getting correct results), then you have to make 0.0 / 0.0 an error as >> well. > > If you would like to introduce a separate `integer_divide` setting for > errstate() and make it raise by default, I'd be marginally okay with > that. In the above pattern, I'd be wrapping it with an errstate() > context manager anyways to silence the warning, so silencing the > default exception would be just as easy. However, nothing else in > errstate() raises by default, so this would be the odd special case. In a perfect world (which may or not match the actual world) I actually would prefer integer wraparound to be treated as its own category (instead of being lumped with float overflow-to-inf), and to raise by default (with the option to enable it explicitly). Unexpected inf's give you correct results in some cases and obviously-broken results in others; unexpected wraparound tends to produce silent bugs in basically all integer using code [1]; lumping them together is pretty suboptimal. But that's a whole 'nother discussion... -n [1] http://googleresearch.blogspot.co.uk/2006/06/extra-extra-read-all-about-it-nearly.html -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From g.brandl at gmx.net Sat Oct 4 03:50:37 2014 From: g.brandl at gmx.net (Georg Brandl) Date: Sat, 04 Oct 2014 09:50:37 +0200 Subject: [Numpy-discussion] 0/0 == 0? 
In-Reply-To: References: Message-ID: On 10/03/2014 11:24 PM, Charles R Harris wrote: > What I want is that the following script would warn three times > > import numpy as np > > z = np.zeros(1, dtype=np.int ) > > def f(x): > return x/x > > f(z) > f(z) > f(z) > > But it only warns once. That is not helpful when f gets called with an erroneous > argument from different places. Add import warnings warnings.simplefilter('always', RuntimeWarning) to get that effect. And as said before, if Numpy uses its own warning class it can put this filter in place automatically on import. You might also argue on python-dev to make RuntimeWarning default to "always" behavior in CPython, since dubious runtime features should always warn. Georg From robert.kern at gmail.com Sat Oct 4 04:41:43 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 4 Oct 2014 09:41:43 +0100 Subject: [Numpy-discussion] 0/0 == 0? In-Reply-To: References: Message-ID: On Sat, Oct 4, 2014 at 2:17 AM, Nathaniel Smith wrote: > On Sat, Oct 4, 2014 at 12:40 AM, Robert Kern wrote: >> On Sat, Oct 4, 2014 at 12:21 AM, Nathaniel Smith wrote: >>> On Fri, Oct 3, 2014 at 8:12 AM, Robert Kern wrote: >>>> On Fri, Oct 3, 2014 at 4:29 AM, Nathaniel Smith wrote: >>>>> On Fri, Oct 3, 2014 at 3:20 AM, Charles R Harris >>>>> wrote: >>>>>> >>>>>> On Thu, Oct 2, 2014 at 7:06 PM, Benjamin Root wrote: >>>>>>> >>>>>>> Out[1] has an integer divided by an integer, and you can't represent nan >>>>>>> as an integer. Perhaps something weird was happening with type promotion >>>>>>> between versions? >>>>>> >>>>>> >>>>>> Also note that in python3 the '/' operator does float rather than integer >>>>>> division. >>>>>> >>>>>>>>> np.array(0) / np.array(0) >>>>>> __main__:1: RuntimeWarning: invalid value encountered in true_divide >>>>>> nan >>>>> >>>>> Floor division still acts the same though: >>>>> >>>>>>>> np.array(0) // np.array(0) >>>>> __main__:1: RuntimeWarning: divide by zero encountered in floor_divide >>>>> 0 >>>>> >>>>> The seterr warning system makes a lot of sense for IEEE754 floats, >>>>> which are specifically designed so that 0/0 has a unique well-defined >>>>> answer. For ints though this seems really broken to me. 0 / 0 = 0 is >>>>> just the wrong answer. It would be nice if we had something reasonable >>>>> to return, but we don't, and I'd rather raise an error than return the >>>>> wrong answer. >>>> >>>> Well, actually, that's the really nice thing about seterr for ints! >>>> CPUs have hardware floating point exception flags to work with. We had >>>> to build one for ints. If you want an error, you can get an error. *I* >>>> don't want an error, and I don't have to have one! >>> >>> Sure, that's fine for integer computations corner cases that have >>> well-defined outputs, like wraparound. But it doesn't make sense for >>> divide-by-zero. >>> >>> The key thing about the IEEE754 exception design is that it gives you >>> the option of either raising an error immediately or else letting it >>> propagate through the computation as a nan until you reach an >>> appropriate place to handle it. >>> >>> With ints we don't have nan, so we don't have the second option. Our >>> options are either raise an error immediately, or else return some >>> nonsense value that will just cause you to get some meaningless >>> result, with no way to detect or recover from this situation. (Why >>> don't we define 0 / 0 == -72? It would make just as much sense.) 
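As a point of reference, a runnable version of the after-the-fact masking pattern under discussion, with the division wrapped in errstate() to silence the warning (the array values are only illustrative):

import numpy as np

numerator = np.array([0, 1, 6])
denominator = np.array([0, 0, 3])

mask = (denominator == 0)
with np.errstate(divide='ignore', invalid='ignore'):
    x = numerator // denominator
# We don't care about the masked cases; fill them with a harmless value.
x[mask] = 0
# x is now array([0, 0, 2])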
>>> >>> The second option is terrible enough that I kinda don't believe you >>> when you say you want it. Maybe I'm missing something but... >> >> I fix the values after-the-fact because one *can* detect and recover >> from this situation with just a smidgen of forethought. >> >> >> >> mask = (denominator == 0) >> x = numerator // denominator >> # We don't care about the masked cases. Fill them with a value that >> # will be harmless/ignored downstream. Here, it's 0. It might be something >> # else in other contexts. >> x[mask] = 0 >> >> > > I don't find this argument very convincing, except as an argument for > having a deprecation period. In the unusual case where this is what > you want, it's trivial and more explicit to write it directly -- > bringing errstate into it is just rube goldbergian. E.g.: > > mask = (denominator == 0) > x = np.floor_divide(numerator, denominator, where=~mask) > x[mask] = 0 In any case, controlling the errstate() is important because that operation is often buried inside a function written by someone who made assumptions about the inputs. Even if I had remembered that we had added the `where=` keyword recently, I would still run into places where I needed to silence the error. This is an old, long-established pattern, not a rube goldbergian contraption. It's how all of numpy.ma works for domained operations like division. >>> Even more egregiously, numpy currently treats the integer >>> divide-by-zero case identically with the floating-point one -- so if >>> you want 0 / 0 to be an error (as you have to if you care about >>> getting correct results), then you have to make 0.0 / 0.0 an error as >>> well. >> >> If you would like to introduce a separate `integer_divide` setting for >> errstate() and make it raise by default, I'd be marginally okay with >> that. In the above pattern, I'd be wrapping it with an errstate() >> context manager anyways to silence the warning, so silencing the >> default exception would be just as easy. However, nothing else in >> errstate() raises by default, so this would be the odd special case. > > In a perfect world (which may or not match the actual world) I > actually would prefer integer wraparound to be treated as its own > category (instead of being lumped with float overflow-to-inf), and to > raise by default (with the option to enable it explicitly). Unexpected > inf's give you correct results in some cases and obviously-broken > results in others; unexpected wraparound tends to produce silent bugs > in basically all integer using code [1]; lumping them together is > pretty suboptimal. But that's a whole 'nother discussion... Well, it's up to you to make a concrete proposal. You have the opportunity to make the world you want. -- Robert Kern From arokem at gmail.com Sat Oct 4 14:37:05 2014 From: arokem at gmail.com (Ariel Rokem) Date: Sat, 4 Oct 2014 11:37:05 -0700 Subject: [Numpy-discussion] Changed behavior of np.gradient Message-ID: Hi everyone, >>> import numpy as np >>> np.__version__ '1.9.0' >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) [array([[ 2., 2., -1.], [ 2., 2., -1.]]), array([[-0.5, 2.5, 5.5], [ 1. , 1. , 1. ]])] On the other hand: >>> import numpy as np >>> np.__version__ '1.8.2' >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) [array([[ 2., 2., -1.], [ 2., 2., -1.]]), array([[ 1. , 2.5, 4. ], [ 1. , 1. , 1. 
]])] For what it's worth, the 1.8 version of this function seems to be in agreement with the Matlab equivalent function ('gradient'): >> gradient([[1, 2, 6]; [3, 4, 5]]) ans = 1.0000 2.5000 4.0000 1.0000 1.0000 1.0000 This seems like a regression to me, but maybe it's an improvement? Cheers, -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Sat Oct 4 15:29:46 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sat, 4 Oct 2014 21:29:46 +0200 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: Message-ID: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> On 4 Oct 2014, at 08:37 pm, Ariel Rokem wrote: > >>> import numpy as np > >>> np.__version__ > '1.9.0' > >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) > [array([[ 2., 2., -1.], > [ 2., 2., -1.]]), array([[-0.5, 2.5, 5.5], > [ 1. , 1. , 1. ]])] > > On the other hand: > >>> import numpy as np > >>> np.__version__ > '1.8.2' > >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) > [array([[ 2., 2., -1.], > [ 2., 2., -1.]]), array([[ 1. , 2.5, 4. ], > [ 1. , 1. , 1. ]])] > > For what it's worth, the 1.8 version of this function seems to be in agreement with the Matlab equivalent function ('gradient'): > >> gradient([[1, 2, 6]; [3, 4, 5]]) > ans = > 1.0000 2.5000 4.0000 > 1.0000 1.0000 1.0000 > > This seems like a regression to me, but maybe it's an improvement? > Technically yes, the function has been changed to use 2nd-order differences where possible, as is described in the docstring. Someone missed to update the example though, which still quotes the 1.8 results. And if the loss of Matlab-compliance is seen as a disadvantage, maybe there is a case for re-enabling the old behaviour via keyword argument? Cheers, Derek From arokem at gmail.com Sat Oct 4 15:53:26 2014 From: arokem at gmail.com (Ariel Rokem) Date: Sat, 4 Oct 2014 12:53:26 -0700 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> Message-ID: On Sat, Oct 4, 2014 at 12:29 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 4 Oct 2014, at 08:37 pm, Ariel Rokem wrote: > > > >>> import numpy as np > > >>> np.__version__ > > '1.9.0' > > >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) > > [array([[ 2., 2., -1.], > > [ 2., 2., -1.]]), array([[-0.5, 2.5, 5.5], > > [ 1. , 1. , 1. ]])] > > > > On the other hand: > > >>> import numpy as np > > >>> np.__version__ > > '1.8.2' > > >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) > > [array([[ 2., 2., -1.], > > [ 2., 2., -1.]]), array([[ 1. , 2.5, 4. ], > > [ 1. , 1. , 1. ]])] > > > > For what it's worth, the 1.8 version of this function seems to be in > agreement with the Matlab equivalent function ('gradient'): > > >> gradient([[1, 2, 6]; [3, 4, 5]]) > > ans = > > 1.0000 2.5000 4.0000 > > 1.0000 1.0000 1.0000 > > > > This seems like a regression to me, but maybe it's an improvement? > > > Technically yes, the function has been changed to use 2nd-order > differences where possible, > as is described in the docstring. Someone missed to update the example > though, which still > quotes the 1.8 results. 
> And if the loss of Matlab-compliance is seen as a disadvantage, maybe > there is a case for > re-enabling the old behaviour via keyword argument? > > Thanks for clarifying - I see that now in the docstring as well. It went from: "The gradient is computed using central differences in the interior and first differences at the boundaries." to "The gradient is computed using second order accurate central differences in the interior and second order accurate one-sides (forward or backwards) differences at the boundaries.". I think that the docstring in 1.9 is fine (has the 1.9 result). The docs online (for all of numpy) are still on version 1.8, though. I think that enabling the old behavior might be useful, if only so that I can write code that behaves consistently across these two versions of numpy. For now, I might just copy over the 1.8 code into my project. Cheers, Ariel > Cheers, > Derek > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Sat Oct 4 16:14:18 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sat, 4 Oct 2014 22:14:18 +0200 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> Message-ID: <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Hi Ariel, > I think that the docstring in 1.9 is fine (has the 1.9 result). The docs online (for all of numpy) are still on version 1.8, though. > > I think that enabling the old behavior might be useful, if only so that I can write code that behaves consistently across these two versions of numpy. For now, I might just copy over the 1.8 code into my project. > Hmm, I got this with 1.9.0: Examples -------- >>> x = np.array([1, 2, 4, 7, 11, 16], dtype=np.float) >>> np.gradient(x) array([ 1. , 1.5, 2.5, 3.5, 4.5, 5. ]) >>> np.gradient(x, 2) array([ 0.5 , 0.75, 1.25, 1.75, 2.25, 2.5 ]) >>> np.gradient(np.array([[1, 2, 6], [3, 4, 5]], dtype=np.float)) [array([[ 2., 2., -1.], [ 2., 2., -1.]]), array([[ 1. , 2.5, 4. ], [ 1. , 1. , 1. ]])] In [5]: x =np.array([1, 2, 4, 7, 11, 16], dtype=np.float) In [6]: print(np.gradient(x)) [ 0.5 1.5 2.5 3.5 4.5 5.5] In [7]: print(np.gradient(x, 2)) [ 0.25 0.75 1.25 1.75 2.25 2.75] ? I think there is a point for supporting the old behaviour besides backwards-compatibility or any sort of Matlab-compliance as I?d probably like to be able to restrict a function to linear/1st order differences in cases where I know the input to be not well-behaved. +1 for an order=2 or maxorder=2 flag Cheers, Derek From stefan at sun.ac.za Sat Oct 4 17:16:56 2014 From: stefan at sun.ac.za (=?UTF-8?Q?St=C3=A9fan_van_der_Walt?=) Date: Sat, 4 Oct 2014 23:16:56 +0200 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Oct 4, 2014 10:14 PM, "Derek Homeier" < derek at astro.physik.uni-goettingen.de> wrote: > > +1 for an order=2 or maxorder=2 flag If you parameterize that flag, users will want to change its value (above two). 
Perhaps rather use a boolean flag such as "second_order" or "high_order", unless it seems feasible to include additional orders in the future. St?fan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 4 19:52:50 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Oct 2014 17:52:50 -0600 Subject: [Numpy-discussion] Round away from zero (towards +/- infinity) In-Reply-To: References: Message-ID: On Fri, Oct 3, 2014 at 11:28 AM, T J wrote: > It does, but it is not portable. That's why I was hoping NumPy might think > about supporting more rounding algorithms. > > On Thu, Oct 2, 2014 at 10:00 PM, John Zwinck wrote: > >> On 3 Oct 2014 07:09, "T J" wrote: >> > >> > Any bites on this? >> > >> > On Wed, Sep 24, 2014 at 12:23 PM, T J wrote: >> >> Python's round function goes away from zero, so I am looking for the >> NumPy equivalent (and using vectorize() seems undesirable). In this sense, >> it seems that having a ufunc for this type of rounding could be helpful. >> >> >> >> Aside: Is there interest in a more general around() that allows users >> to specify alternative tie-breaking rules, with the default staying 'round >> half to nearest even'? [1] >> >> --- >> >> [1] >> http://stackoverflow.com/questions/16000574/tie-breaking-of-round-with-numpy >> >> I like the solution given in that Stack Overflow post, namely using >> ctypes to call fesetround(). Does that work for you? >> >> >> In [4]: def roundout(x): ...: return trunc(x + copysign(.5, x)) ...: Will do what you want, if not quite as nicely as a ufunc. Won't work as is for complex, but that could be handled with a view. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ferdinand.bayard at strains.fr Mon Oct 6 11:03:48 2014 From: ferdinand.bayard at strains.fr (ferdinand.bayard) Date: Mon, 6 Oct 2014 08:03:48 -0700 (MST) Subject: [Numpy-discussion] f2py and debug mode In-Reply-To: References: <542AACDE.7010408@strains.fr> Message-ID: <1412607828476-38792.post@n7.nabble.com> Hello ! And thx for answering ! Meanwhile, I founded a solution for my problem : I hacked some lines setting the compiler options in the f2py Python files, which enables the generation of pdb files for debugging. In msvc9compiler.py, line 432 self.ldflags_shared = ['/DLL', '/nologo', '/INCREMENTAL:no', *'/DEBUG', '/pdb:link.pdb'*] In IntelCompiler, line 155 def get_flags(self): opt = ['/nologo', '/MD', '/nbs', '/Qlowercase', '/us',*'/debug'*,'/Qmkl:parallel','/free'] I got another problem, I could not succeed in linking with external routines. By example, for : call system('....') error LNK2019: unresolved external symbol _system. (I get the same result with other libraries, even if I set the flags in setup.py : ext1 = Extension(name = moduleName, sources = ..., libraries = ..., library_dirs = ..., f2py_options = f2py_options) Thx for your help. Pearu Peterson-2 wrote > Hi, > > When you run f2py without -c option, the wrapper source files are > generated > without compiling them. > With these source files and fortranobject.c, you can build the extension > module with your specific compiler options using the compiler framework of > your choice. > I am not familiar with Visual Studio specifics to suggest a more detailed > solution but after generating wrapper source files there is no f2py > specific way to build the extension module, in fact, `f2py -c` relies on > distutils compilation/linkage methods. 
> > HTH, > Pearu > > On Tue, Sep 30, 2014 at 4:15 PM, Bayard < > ferdinand.bayard@ > > wrote: > >> Hello to all. >> I'm aiming to wrap a Fortran program into Python. I started to work with >> f2py, and am trying to setup a debug mode where I could reach >> breakpoints in Fortran module launched by Python. I've been looking in >> the existing post, but not seeing things like that. >> >> I'm used to work with visual studio 2012 and Intel Fortran compiler, I >> have tried to get that point doing : >> 1) Run f2py -m to get *.c wrapper >> 2) Embed it in a C Project in Visual Studio, containing also with >> fortranobject.c and fortranobject.h, >> 3) Create a solution which also contains my fortran files compiled as a >> lib >> 4) Generate in debug mode a "dll" with extension pyd (to get to that >> point name of the "main" function in Fortran by "_main"). >> >> I compiled without any error, and reach break point in C Wrapper, but >> not in Fortran, and the fortran code seems not to be executed (whereas >> it is when compiling with f2py -c). Trying to understand f2py code, I >> noticed that f2py is not only writing c-wrapper, but compiling it in a >> specific way. Is there a way to get a debug mode in Visual Studio with >> f2py (some members of the team are used to it) ? Any alternative tool we >> should use for debugging ? >> >> Thanks for answering >> Ferdinand >> >> >> >> >> --- >> Ce courrier ?lectronique ne contient aucun virus ou logiciel malveillant >> parce que la protection avast! Antivirus est active. >> http://www.avast.com >> >> _______________________________________________ >> NumPy-Discussion mailing list >> > NumPy-Discussion@ >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@ > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/f2py-and-debug-mode-tp38736p38792.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From andrew.collette at gmail.com Mon Oct 6 12:54:24 2014 From: andrew.collette at gmail.com (Andrew Collette) Date: Mon, 6 Oct 2014 10:54:24 -0600 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X Message-ID: Hi all, I am working with the HDF Group on a new open-source viewer program for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, since people don't typically have Python installed, we are looking to distribute the application using PyInstaller, which embeds dependencies like NumPy. Likewise for OS X (using Py2App). We would like to make sure we don't accidentally include non-open-source components... I recall there was some discussion here about using the Intel math libraries for binary releases on various platforms. Do the releases on SourceForge or PyPI use any proprietary code? We'd like to avoid building NumPy ourselves if we can avoid it. Apologies if this is explained somewhere, but I couldn't find it. Thanks! Andrew Collette From jeffreback at gmail.com Tue Oct 7 08:30:52 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Tue, 7 Oct 2014 08:30:52 -0400 Subject: [Numpy-discussion] ANN: Pandas 0.15.0 Release Candiate 1 Message-ID: Hi, I'm pleased to announce the availability of the first release candidate of Pandas 0.15.0. Please try this RC and report any issues here: Pandas Issues We will be releasing officially in 1-2 weeks or so. 
This is a major release from 0.14.1 and includes a number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. Highlights include: - Drop support for numpy < 1.7.0 - The Categorical type was integrated as a first-class pandas type - New scalar type Timedelta, and a new index type TimedeltaIndex - New DataFrame default display for df.info() to include memory usage - New datetimelike properties accessor .dt for Series - Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing - Split out string methods documentation into Working with Text Data - read_csv will now by default ignore blank lines when parsing - API change in using Indexes in set operations - Internal refactoring of the Index class to no longer sub-class ndarray - dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 Here are the full whatsnew and documentation links: v0.15.0 Whatsnew v0.15.0 Documentation Page Source tarballs, and windows builds are available here: Pandas v0.15.0rc1 Release A big thank you to everyone who contributed to this release! Jeff -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Oct 7 17:13:42 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 07 Oct 2014 23:13:42 +0200 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: References: Message-ID: <54345786.1090506@googlemail.com> On 06.10.2014 18:54, Andrew Collette wrote: > Hi all, > > I am working with the HDF Group on a new open-source viewer program > for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, > since people don't typically have Python installed, we are looking to > distribute the application using PyInstaller, which embeds > dependencies like NumPy. Likewise for OS X (using Py2App). > > We would like to make sure we don't accidentally include > non-open-source components... I recall there was some discussion here > about using the Intel math libraries for binary releases on various > platforms. Do the releases on SourceForge or PyPI use any proprietary > code? We'd like to avoid building NumPy ourselves if we can avoid it. > > Apologies if this is explained somewhere, but I couldn't find it. > > Thanks! > Andrew Collette Hi, the numpy win32 binaries on sourceforge do not contain any proprietary code. They are build with mingw 3.4.5 and are using a f2c'd version of netlib blas and lapack which so far I know is public domain. I think the macos wheels on pypi are built using ATLAS but they do also contain libquadmath which is LGPL licensed. Its probably pulled in by fortran (could maybe be removed by a rebuild as neither blas nor numpy use it) There are also unofficial win64 binaries floating around, I don't know what they are using, but its possible they contain MKL, you need to check with who is building these (Christoph Gohlke I think). Cheers, Julian -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From travis at continuum.io Tue Oct 7 19:32:44 2014 From: travis at continuum.io (Travis Oliphant) Date: Tue, 7 Oct 2014 18:32:44 -0500 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: References: Message-ID: Hey Andrew, You can use any of the binaries from Anaconda and redistribute them as long as you "cite" Anaconda --- i.e. tell your users that they are using Anaconda-derived binaries. The Anaconda binaries link against ATLAS. The binaries are all at http://repo.continuum.io/pkgs/ In case you weren't aware: Another way you can build and distribute an "application" is to build a 'conda' meta-package which lists all the dependencies. If you add to this meta-package 1) an icon and 2) an entry-point, then your application will automatically show up in the "Anaconda Launcher" (see this blog-post: http://www.continuum.io/blog/new-launcher ) and anyone with the Anaconda Launcher app can install/update your package by clicking on the icon next to it. Users can also install your package with conda install or using the conda-gui. Best, -Travis On Mon, Oct 6, 2014 at 11:54 AM, Andrew Collette wrote: > Hi all, > > I am working with the HDF Group on a new open-source viewer program > for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, > since people don't typically have Python installed, we are looking to > distribute the application using PyInstaller, which embeds > dependencies like NumPy. Likewise for OS X (using Py2App). > > We would like to make sure we don't accidentally include > non-open-source components... I recall there was some discussion here > about using the Intel math libraries for binary releases on various > platforms. Do the releases on SourceForge or PyPI use any proprietary > code? We'd like to avoid building NumPy ourselves if we can avoid it. > > Apologies if this is explained somewhere, but I couldn't find it. > > Thanks! > Andrew Collette > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Wed Oct 8 14:12:32 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Wed, 8 Oct 2014 20:12:32 +0200 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: References: Message-ID: Hi Travis, the Anaconda binaries (free packages as well as the non-free addons) link against Intel MKL - not against ATLAS. Are this binaries really free redistributable as stated? The lack of numpy/scipy 64bit windows binaries with opensource blas/lapack with was one of the main reasons to start with the development of a dedicated mingw-w64 based compiler toolchain to support OpenBLAS / ATLAS based binaries on windows. Cheers, carlkl 2014-10-08 1:32 GMT+02:00 Travis Oliphant : > Hey Andrew, > > You can use any of the binaries from Anaconda and redistribute them as > long as you "cite" Anaconda --- i.e. tell your users that they are using > Anaconda-derived binaries. The Anaconda binaries link against ATLAS. 
> > The binaries are all at http://repo.continuum.io/pkgs/ > > In case you weren't aware: > > Another way you can build and distribute an "application" is to build a > 'conda' meta-package which lists all the dependencies. If you add to this > meta-package 1) an icon and 2) an entry-point, then your application will > automatically show up in the "Anaconda Launcher" (see this blog-post: > http://www.continuum.io/blog/new-launcher ) and anyone with the Anaconda > Launcher app can install/update your package by clicking on the icon next > to it. > > Users can also install your package with conda install or using the > conda-gui. > > Best, > > -Travis > > > On Mon, Oct 6, 2014 at 11:54 AM, Andrew Collette < > andrew.collette at gmail.com> wrote: > >> Hi all, >> >> I am working with the HDF Group on a new open-source viewer program >> for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, >> since people don't typically have Python installed, we are looking to >> distribute the application using PyInstaller, which embeds >> dependencies like NumPy. Likewise for OS X (using Py2App). >> >> We would like to make sure we don't accidentally include >> non-open-source components... I recall there was some discussion here >> about using the Intel math libraries for binary releases on various >> platforms. Do the releases on SourceForge or PyPI use any proprietary >> code? We'd like to avoid building NumPy ourselves if we can avoid it. >> >> Apologies if this is explained somewhere, but I couldn't find it. >> >> Thanks! >> Andrew Collette >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > > Travis Oliphant > CEO > Continuum Analytics, Inc. > http://www.continuum.io > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From travis at continuum.io Wed Oct 8 15:59:24 2014 From: travis at continuum.io (Travis Oliphant) Date: Wed, 8 Oct 2014 14:59:24 -0500 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: References: Message-ID: Only on Windows does free Anaconda link against the MKL. But, you are correct, that the MKL-linked binaries can only be re-distributed if the person or entity doing the re-distribution has a valid MKL license from Intel. Microsoft has actually released their Visual Studio 2008 compiler stack so that OpenBLAS and ATLAS could be compiled on Windows for these platforms as well. I would be very interested to see conda packages for these libraries which should be pretty straightforward to build. -Travis On Wed, Oct 8, 2014 at 1:12 PM, Carl Kleffner wrote: > Hi Travis, > > the Anaconda binaries (free packages as well as the non-free addons) link > against Intel MKL - not against ATLAS. Are this binaries really free > redistributable as stated? > > The lack of numpy/scipy 64bit windows binaries with opensource blas/lapack > with was one of the main reasons to start with the development of a > dedicated mingw-w64 based compiler toolchain to support OpenBLAS / ATLAS > based binaries on windows. > > Cheers, > > carlkl > > > > 2014-10-08 1:32 GMT+02:00 Travis Oliphant : > >> Hey Andrew, >> >> You can use any of the binaries from Anaconda and redistribute them as >> long as you "cite" Anaconda --- i.e. 
tell your users that they are using >> Anaconda-derived binaries. The Anaconda binaries link against ATLAS. >> >> The binaries are all at http://repo.continuum.io/pkgs/ >> >> In case you weren't aware: >> >> Another way you can build and distribute an "application" is to build a >> 'conda' meta-package which lists all the dependencies. If you add to this >> meta-package 1) an icon and 2) an entry-point, then your application will >> automatically show up in the "Anaconda Launcher" (see this blog-post: >> http://www.continuum.io/blog/new-launcher ) and anyone with the Anaconda >> Launcher app can install/update your package by clicking on the icon next >> to it. >> >> Users can also install your package with conda install or using the >> conda-gui. >> >> Best, >> >> -Travis >> >> >> On Mon, Oct 6, 2014 at 11:54 AM, Andrew Collette < >> andrew.collette at gmail.com> wrote: >> >>> Hi all, >>> >>> I am working with the HDF Group on a new open-source viewer program >>> for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, >>> since people don't typically have Python installed, we are looking to >>> distribute the application using PyInstaller, which embeds >>> dependencies like NumPy. Likewise for OS X (using Py2App). >>> >>> We would like to make sure we don't accidentally include >>> non-open-source components... I recall there was some discussion here >>> about using the Intel math libraries for binary releases on various >>> platforms. Do the releases on SourceForge or PyPI use any proprietary >>> code? We'd like to avoid building NumPy ourselves if we can avoid it. >>> >>> Apologies if this is explained somewhere, but I couldn't find it. >>> >>> Thanks! >>> Andrew Collette >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> >> -- >> >> Travis Oliphant >> CEO >> Continuum Analytics, Inc. >> http://www.continuum.io >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Oct 8 17:46:58 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 8 Oct 2014 21:46:58 +0000 (UTC) Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X References: Message-ID: <2050029887434497244.115215sturla.molden-gmail.com@news.gmane.org> Travis Oliphant wrote: > Microsoft has actually released their Visual Studio 2008 compiler stack so > that OpenBLAS and ATLAS could be compiled on Windows for these platforms as > well. I would be very interested to see conda packages for these > libraries which should be pretty straightforward to build. OpenBLAS does not compile with Microsoft compilers because of AT&T assembly syntax. You need to use a GNU compiler and you also need to have a GNU environment. OpenBLAS is easy to build on Windows with MinGW (with gfortran) and MSYS. Carl's toolchain ensures that the binaries are compatible with the Python binaries from Python.org. 
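As an aside that may help with the original question: the BLAS/LAPACK setup a given NumPy binary was built against is recorded at build time and can be printed at runtime, so it is easy to double-check what an install actually uses before redistributing it. For example (the output will of course differ between the MKL, ATLAS and OpenBLAS builds discussed in this thread):

>>> import numpy as np
>>> np.__config__.show()   # prints the blas_opt_info / lapack_opt_info recorded at build time
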
Sturla From travis at continuum.io Wed Oct 8 21:24:34 2014 From: travis at continuum.io (Travis Oliphant) Date: Wed, 8 Oct 2014 20:24:34 -0500 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: <2050029887434497244.115215sturla.molden-gmail.com@news.gmane.org> References: <2050029887434497244.115215sturla.molden-gmail.com@news.gmane.org> Message-ID: Ah, yes, I hadn't realized that OpenBLAS could not be compiled with Visual Studio. Thanks for that explanation. Also, I had heard that 32bit mingw on Windows could still produce 64-bit binaries. It looks like there are OpenBLAS binaries available for Windows 32 and Windows 64 (two flavors). It should be straightforward to take those binaries and make conda (or wheel) packages out of them. A good mingw64 stack for Windows would be great and benefits many communities. On Wed, Oct 8, 2014 at 4:46 PM, Sturla Molden wrote: > Travis Oliphant wrote: > > > Microsoft has actually released their Visual Studio 2008 compiler stack > so > > that OpenBLAS and ATLAS could be compiled on Windows for these platforms > as > > well. I would be very interested to see conda packages for these > > libraries which should be pretty straightforward to build. > > OpenBLAS does not compile with Microsoft compilers because of AT&T assembly > syntax. You need to use a GNU compiler and you also need to have a GNU > environment. OpenBLAS is easy to build on Windows with MinGW (with > gfortran) and MSYS. Carl's toolchain ensures that the binaries are > compatible with the Python binaries from Python.org. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Travis Oliphant CEO Continuum Analytics, Inc. http://www.continuum.io -------------- next part -------------- An HTML attachment was scrubbed... URL: From suchithjn22 at gmail.com Thu Oct 9 01:42:31 2014 From: suchithjn22 at gmail.com (suchith) Date: Thu, 09 Oct 2014 11:12:31 +0530 Subject: [Numpy-discussion] Extracting individual columns in Numpy Message-ID: <54362047.10102@gmail.com> How to extract individual columns from a numpy array? For example, consider this script import numpy as np a = np.array([[1,2,3],[4,5,6],[7,8,9]]) a[0][:] a[:][0] Now both a[:][0] and a[0][:] are outputting the same result, i.e np.array([1,2,3]). If I want to extract the array [[1],[4],[7]] then what should I do? Is it possible to add this feature? If so, which file(s) should I edit? I know C,C++ and Python programming and I am new to open-source software development. Please help me. I have attached the screenshot with this mail. Thanks Suchith.J.N -------------- next part -------------- A non-text attachment was scrubbed... Name: numpy-problem.png Type: image/png Size: 100618 bytes Desc: not available URL: From njs at pobox.com Thu Oct 9 01:52:45 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Oct 2014 06:52:45 +0100 Subject: [Numpy-discussion] Extracting individual columns in Numpy In-Reply-To: <54362047.10102@gmail.com> References: <54362047.10102@gmail.com> Message-ID: On Thu, Oct 9, 2014 at 6:42 AM, suchith wrote: > How to extract individual columns from a numpy array? > For example, consider this script > > import numpy as np > a = np.array([[1,2,3],[4,5,6],[7,8,9]]) > a[0][:] > a[:][0] > > Now both a[:][0] and a[0][:] are outputting the same result, i.e > np.array([1,2,3]). If I want to extract the array [[1],[4],[7]] then what > should I do? 
You want a[:, 0]. I'd recommend never writing expressions like a[0], where you give just 1 index into a 2d array -- numpy interprets such a thing as equivalent to a[0, :], so you should just write a[0, :] in the first place, it'll be more explicit and less confusing. (This also explains the problem you're having: a[:] is the same as a[:, :], i.e., it just returns all of 'a'. So a[:][0] is the same as a[0]. Similarly, a[0][:] returns all of a[0].) (The one time you might want to write something that looks like a[foo], with no commas inside the [], is where 'foo' is a 2d boolean mask.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From andyfaff at gmail.com Thu Oct 9 01:55:38 2014 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 9 Oct 2014 16:55:38 +1100 Subject: [Numpy-discussion] Extracting individual columns in Numpy Message-ID: >import numpy as np a = np.array([[1,2,3],[4,5,6],[7,8,9]]) a[0][:] a[:][0] Now both a[:][0] and a[0][:] are outputting the same result, i.e np.array([1,2,3]). If I want to extract the array [[1],[4],[7]] then what should I do? Is it possible to add this feature? The feature is already there: a[:, 0] -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From suchithjn22 at gmail.com Thu Oct 9 01:59:59 2014 From: suchithjn22 at gmail.com (suchith) Date: Thu, 09 Oct 2014 11:29:59 +0530 Subject: [Numpy-discussion] Extracting individual columns in Numpy In-Reply-To: <54362047.10102@gmail.com> References: <54362047.10102@gmail.com> Message-ID: <5436245F.60809@gmail.com> Ok...I got it. Sorry for stupid question. On Thursday 09 October 2014 11:12 AM, suchith wrote: > How to extract individual columns from a numpy array? > For example, consider this script > > import numpy as np > a = np.array([[1,2,3],[4,5,6],[7,8,9]]) > a[0][:] > a[:][0] > > Now both a[:][0] and a[0][:] are outputting the same result, i.e > np.array([1,2,3]). If I want to extract the array [[1],[4],[7]] then > what should I do? Is it possible to add this feature? If so, which > file(s) should I edit? > > I know C,C++ and Python programming and I am new to open-source > software development. Please help me. > > I have attached the screenshot with this mail. > > Thanks > Suchith.J.N > From sturla.molden at gmail.com Thu Oct 9 04:46:31 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 9 Oct 2014 08:46:31 +0000 (UTC) Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X References: <2050029887434497244.115215sturla.molden-gmail.com@news.gmane.org> Message-ID: <1377850661434536865.878951sturla.molden-gmail.com@news.gmane.org> Travis Oliphant wrote: > A good mingw64 stack for Windows would be great and benefits many > communities. Carl Kleffner has made 32- and 64-bit mingw stacks compatible with Python. E.g. the stack alignment in the 32-bit version is different from the vanilla mingw distribution. It also, for the first time, allow us to build SciPy with gfortran instead of g77 on Windows, which means we don't have to limit the Fortran code in SciPy to Fortran 77 legacy code. 
Sturla From sturla.molden at gmail.com Thu Oct 9 04:50:36 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 9 Oct 2014 08:50:36 +0000 (UTC) Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X References: <2050029887434497244.115215sturla.molden-gmail.com@news.gmane.org> Message-ID: <1913118610434537300.134393sturla.molden-gmail.com@news.gmane.org> Travis Oliphant wrote: > A good mingw64 stack for Windows would be great and benefits many > communities. BTW: Carl Kleffners mingw toolchains are here: Documentation: https://github.com/numpy/numpy/wiki/Mingw-static-toolchain Downloads: https://bitbucket.org/carlkl/mingw-w64-for-python/downloads From fiolj at yahoo.com Thu Oct 9 08:50:33 2014 From: fiolj at yahoo.com (Juan) Date: Thu, 09 Oct 2014 09:50:33 -0300 Subject: [Numpy-discussion] Extracting individual columns in Numpy (suchith) In-Reply-To: References: Message-ID: <54368499.2000603@yahoo.com> > Date: Thu, 09 Oct 2014 11:12:31 +0530 > From: suchith > Subject: [Numpy-discussion] Extracting individual columns in Numpy > To: numpy-discussion at scipy.org > Message-ID: <54362047.10102 at gmail.com> > Content-Type: text/plain; charset="utf-8" > > How to extract individual columns from a numpy array? > For example, consider this script > > import numpy as np > a = np.array([[1,2,3],[4,5,6],[7,8,9]]) > a[0][:] > a[:][0] > > Now both a[:][0] and a[0][:] are outputting the same result, i.e > np.array([1,2,3]). If I want to extract the array [[1],[4],[7]] then > what should I do? Is it possible to add this feature? If so, which > file(s) should I edit? > > I know C,C++ and Python programming and I am new to open-source software > development. Please help me. > Try: In [2]: import numpy as np In [3]: a = np.array([[1,2,3],[4,5,6],[7,8,9]]) In [4]: a[0][:] Out[4]: array([1, 2, 3]) In [5]: a[:][0] Out[5]: array([1, 2, 3]) In [6]: a[0,:] Out[6]: array([1, 2, 3]) In [7]: a[:,0] Out[7]: array([1, 4, 7]) Syntax is different for numpy arrays. Regards, Juan From Per.Brodtkorb at ffi.no Fri Oct 10 05:34:17 2014 From: Per.Brodtkorb at ffi.no (Per.Brodtkorb at ffi.no) Date: Fri, 10 Oct 2014 09:34:17 +0000 Subject: [Numpy-discussion] Any interest in a generalized piecewise function? Message-ID: <8114F0AADAECD745AF1FC2047A5DC7ED1E54E27E@HBU-POST2.ffi.no> I have worked on a generalized piecewise function (genpiecewise) that are simpler and more general than the current numpy.piecewise implementation. The new generalized piecewise function allows functions of the type f(x0, x1,.. , xn) i.e. to have arbitrary number of input arguments that are evaluated conditionally. The generalized piecewise function passes all the tests for numpy.piecewise function except the undocumented features of numpy.piecewise which allows condlist to be a single bool list/array or a single int array. A new keyword "fillvalue" opens up the possibility to specify "out of bounds values" to other values than 0 eg. Np.nan. 
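One remark before the examples below: for the case where funclist contains only scalars the behaviour overlaps with what numpy.select already provides, for instance

>>> x = numpy.linspace(-2, 2, 5)
>>> numpy.select([x < 0, x >= 0], [-1, 1])
array([-1, -1,  1,  1,  1])

so the genuinely new parts are the support for several input arrays, callables of those arrays, and the fillvalue keyword.
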
Examples: Example 1) >>> x = numpy.linspace(-2,2,5) >>> numpy.piecewise(x, x<1, [1]) # = numpy.where(x<0, 1, 0) array([ 1., 1., 1., 0., 0.]) # can be written as >>> genpiecewise([x<1],[1]) array([1, 1, 1, 0, 0]) # or >>> genpiecewise([x<1],[1], x) array([ 1., 1., 1., 0., 0.]) # or >>> genpiecewise([x<1],[1.0]) array([ 1., 1., 1., 0., 0.]) Example 2) >>> numpy.piecewise(x, [x < 0, x >= 0], [lambda x: -x, lambda x: x]) array([ 2., 1., 0., 1., 2.]) # can be written as >>> genpiecewise([x < 0, x >= 0], [lambda x: -x, lambda x: x], (x,)) array([ 2., 1., 0., 1., 2.]) # or genpiecewise([x < 0,], [lambda x: -x, lambda x: x], x) array([ 2., 1., 0., 1., 2.]) Example 3) # New functionality >>> X,Y = numpy.meshgrid(x,x) >>> genpiecewise([X*Y<0,], [lambda x,y: -x*y, lambda x,y: x*y], xi=(X,Y)) array([[ 4., 2., -0., 2., 4.], [ 2., 1., -0., 1., 2.], [-0., -0., 0., 0., 0.], [ 2., 1., 0., 1., 2.], [ 4., 2., 0., 2., 4.]]) >>> genpiecewise([X*Y<-0.5, X*Y>0.5], [lambda x,y: -x*y, lambda x,y: x*y], xi=(X,Y), fillvalue=numpy.nan) array([[ 4., 2., nan, 2., 4.], [ 2., 1., nan, 1., 2.], [ nan, nan, nan, nan, nan], [ 2., 1., nan, 1., 2.], [ 4., 2., nan, 2., 4.]]) >>> genpiecewise([X*Y<-0.5, X*Y>0.5], [lambda x,y: -x*y, lambda x,y: x*y, numpy.nan], (X,Y)) array([[ 4., 2., nan, 2., 4.], [ 2., 1., nan, 1., 2.], [ nan, nan, nan, nan, nan], [ 2., 1., nan, 1., 2.], [ 4., 2., nan, 2., 4.]]) My question is: are there any interest in the community for such a function? Could some or all of this functionality replace the current numpy.piecewise? (This function could replace the function _lazywhere (in scipy.stats._distn_infrastructure) which is heavily used in scipy.stats) Per A. Brodtkorb The code looks like this: def valarray(shape, value=np.nan, typecode=None): """Return an array of all value. """ out = ones(shape, dtype=bool) * value if typecode is not None: out = out.astype(typecode) if not isinstance(out, np.ndarray): out = asarray(out) return out def genpiecewise(condlist, funclist, xi=None, fillvalue=0, args=(), **kw): """ Evaluate a piecewise-defined function. Given a set of conditions and corresponding functions, evaluate each function on the input data wherever its condition is true. Parameters ---------- condlist : list of bool arrays Each boolean array corresponds to a function in `funclist`. Wherever `condlist[i]` is True, `funclist[i](x0,x1,...,xn)` is used as the output value. Each boolean array in `condlist` selects a piece of `xi`, and should therefore be of the same shape as `xi`. The length of `condlist` must correspond to that of `funclist`. If one extra function is given, i.e. if ``len(funclist) - len(condlist) == 1``, then that extra function is the default value, used wherever all conditions are false. funclist : list of callables, f(*(xi + args), **kw), or scalars Each function is evaluated over `x` wherever its corresponding condition is True. It should take an array as input and give an array or a scalar value as output. If, instead of a callable, a scalar is provided then a constant function (``lambda x: scalar``) is assumed. xi : tuple input arguments to the functions in funclist, i.e., (x0, x1,...., xn) fillvalue : scalar fillvalue for out of range values. Default 0. args : tuple, optional Any further arguments given here passed to the functions upon execution, i.e., if called ``piecewise(..., ..., args=(1, 'a'))``, then each function is called as ``f(x0, x1,..., xn, 1, 'a')``. 
kw : dict, optional Keyword arguments used in calling `piecewise` are passed to the functions upon execution, i.e., if called ``piecewise(..., ..., lambda=1)``, then each function is called as ``f(x0, x1,..., xn, lambda=1)``. Returns ------- out : ndarray The output is the same shape and type as x and is found by calling the functions in `funclist` on the appropriate portions of `x`, as defined by the boolean arrays in `condlist`. Portions not covered by any condition have undefined values. See Also -------- choose, select, where Notes ----- This is similar to choose or select, except that functions are evaluated on elements of `xi` that satisfy the corresponding condition from `condlist`. The result is:: |-- |funclist[0](x0[condlist[0]],x1[condlist[0]],...,xn[condlist[0]]) out = |funclist[1](x0[condlist[1]],x1[condlist[1]],...,xn[condlist[1]]) |... |funclist[n2](x0[condlist[n2]],x1[condlist[n2]],...,xn[condlist[n2]]) |-- Examples -------- Define the sigma function, which is -1 for ``x < 0`` and +1 for ``x >= 0``. >>> x = np.linspace(-2.5, 2.5, 6) >>> genpiecewise([x < 0, x >= 0], [-1, 1]) array([-1., -1., -1., 1., 1., 1.]) Define the absolute value, which is ``-x`` for ``x <0`` and ``x`` for ``x >= 0``. >>> genpiecewise([x < 0, x >= 0], [lambda x: -x, lambda x: x], (x,)) array([ 2.5, 1.5, 0.5, 0.5, 1.5, 2.5]) """ def otherwise_condition(condlist): return ~np.logical_or.reduce(condlist, axis=0) def check_shapes(condlist, funclist): nc, nf = len(condlist), len(funclist) if nc not in [nf-1, nf]: raise ValueError("function list and condition list" + " must be the same length") check_shapes(condlist, funclist) if xi is not None and not isinstance(xi, tuple): xi = (xi,) condlist = np.broadcast_arrays(*condlist) if len(condlist) == len(funclist)-1: condlist.append(otherwise_condition(condlist)) arrays = dtype = None if xi is not None: arrays = np.broadcast_arrays(*xi) dtype = np.result_type(*arrays) else: # funclist is a list of scalars only dtype = np.result_type(*funclist) out = valarray(condlist[0].shape, fillvalue, dtype) for cond, func in zip(condlist, funclist): if isinstance(func, collections.Callable): temp = tuple(np.extract(cond, arr) for arr in arrays) + args np.place(out, cond, func(*temp, **kw)) else: # func is a scalar value np.place(out, cond, func) return out -------------- next part -------------- An HTML attachment was scrubbed... URL: From lahiruts at gmail.com Fri Oct 10 12:51:19 2014 From: lahiruts at gmail.com (Lahiru Samarakoon) Date: Sat, 11 Oct 2014 00:51:19 +0800 Subject: [Numpy-discussion] Instaling numpy without root access Message-ID: Dear all, I am trying to install numpy without root access. So I am building from the source. I have installed atlas which also has lapack with it. I changed the site.cfg file as given below [DEFAULT] library_dirs = /home/svu/a0095654/ATLAS/build/lib include_dirs = /home/svu/a0095654/ATLAS/build/include However, I am getting a segmentation fault when importing numpy. Please advise. I also put the build log file at the end of the email if necessary. Thank you, Best Regards, Lahiru Log starts below. * python2.7 setup.py build --fcompiler=gnu95* Running from numpy source directory. 
/home/svu/a0095654/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'test_suite' warnings.warn(msg) non-existing path in 'numpy/f2py': 'docs' non-existing path in 'numpy/f2py': 'f2py.1' F2PY Version 2 blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in ['/home/svu/a0095654/ATLAS/build/lib'] NOT AVAILABLE openblas_info: libraries openblas not found in ['/home/svu/a0095654/ATLAS/build/lib'] NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS Setting PTATLAS=ATLAS customize Gnu95FCompiler Found executable /usr/bin/gfortran customize Gnu95FCompiler customize Gnu95FCompiler using config compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -pthread _configtest.o -L/home/svu/a0095654/ATLAS/build/install/lib -lptf77blas -lptcblas -latlas -o _configtest _configtest.o: In function `main': /hpctmp/a0095654/Software/numpy-1.9.0/_configtest.c:5: undefined reference to `ATL_buildinfo' collect2: ld returned 1 exit status _configtest.o: In function `main': /hpctmp/a0095654/Software/numpy-1.9.0/_configtest.c:5: undefined reference to `ATL_buildinfo' collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o Status: 255 Output: compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -pthread _configtest.o -L/home/svu/a0095654/ATLAS/build/install/lib -lptf77blas -lptcblas -latlas -o _configtest _configtest.o: In function `main': /hpctmp/a0095654/Software/numpy-1.9.0/_configtest.c:5: undefined reference to `ATL_buildinfo' collect2: ld returned 1 exit status Setting PTATLAS=ATLAS FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/home/svu/a0095654/ATLAS/build/install/lib'] language = c define_macros = [('ATLAS_INFO', '"\\"None\\""')] FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/home/svu/a0095654/ATLAS/build/install/lib'] language = c define_macros = [('ATLAS_INFO', '"\\"None\\""')] non-existing path in 'numpy/lib': 'benchmarks' lapack_opt_info: openblas_lapack_info: libraries openblas not found in ['/home/svu/a0095654/ATLAS/build/lib'] NOT AVAILABLE lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in ['/home/svu/a0095654/ATLAS/build/lib'] NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries lapack_atlas not found in /home/svu/a0095654/ATLAS/build/install/lib numpy.distutils.system_info.atlas_threads_info Setting PTATLAS=ATLAS /hpctmp/a0095654/Software/numpy-1.9.0/numpy/distutils/system_info.py:1095: UserWarning: ********************************************************************* Lapack library (from ATLAS) is probably incomplete: size of /home/svu/a0095654/ATLAS/build/install/lib/liblapack.so is 5.0673828125k (expected >4000k) Follow the instructions in the KNOWN PROBLEMS section of the file numpy/INSTALL.txt. 
********************************************************************* warnings.warn(message) Setting PTATLAS=ATLAS FOUND: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/home/svu/a0095654/ATLAS/build/install/lib'] language = c define_macros = [('ATLAS_INFO', '"\\"None\\""')] FOUND: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/home/svu/a0095654/ATLAS/build/install/lib'] language = c define_macros = [('ATLAS_INFO', '"\\"None\\""')] /home/svu/a0095654/lib/python2.7/distutils/dist.py:267: UserWarning: Unknown distribution option: 'define_macros' warnings.warn(msg) running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands --compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands --fcompiler options running build_src build_src building py_modules sources building library "npymath" sources customize Gnu95FCompiler customize Gnu95FCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest success! removing: _configtest.c _configtest.o _configtest C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function ?exp? gcc -pthread _configtest.o -o _configtest _configtest.o: In function `main': /hpctmp/a0095654/Software/numpy-1.9.0/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status _configtest.o: In function `main': /hpctmp/a0095654/Software/numpy-1.9.0/_configtest.c:6: undefined reference to `exp' collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function ?exp? gcc -pthread _configtest.o -lm -o _configtest success! removing: _configtest.c _configtest.o _configtest building library "npysort" sources adding 'build/src.linux-x86_64-2.7/numpy/core/src/private' to include_dirs. None - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/src/private/npy_partition.h', 'build/src.linux-x86_64-2.7/numpy/core/src/private/npy_binsearch.h'] building extension "numpy.core._dummy" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. 
numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.multiarray" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h'] building extension "numpy.core.umath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/src/umath' to include_dirs. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/src/umath/funcs.inc', 'build/src.linux-x86_64-2.7/numpy/core/src/umath/simd.inc', 'build/src.linux-x86_64-2.7/numpy/core/src/umath/loops.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core.scalarmath" sources adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h' to sources. executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h' to sources. executing numpy/core/code_generators/generate_ufunc_api.py adding 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h' to sources. adding 'build/src.linux-x86_64-2.7/numpy/core/src/private' to include_dirs. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.7/numpy/core/src/private/scalarmathmodule.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/_numpyconfig.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__multiarray_api.h', 'build/src.linux-x86_64-2.7/numpy/core/include/numpy/__ufunc_api.h'] building extension "numpy.core._dotblas" sources adding 'numpy/core/blasdot/_dotblas.c' to sources. building extension "numpy.core.umath_tests" sources building extension "numpy.core.test_rational" sources building extension "numpy.core.struct_ufunc_test" sources building extension "numpy.core.multiarray_tests" sources building extension "numpy.core.operand_flag_tests" sources building extension "numpy.lib._compiled_base" sources building extension "numpy.fft.fftpack_lite" sources building extension "numpy.linalg.lapack_lite" sources adding 'numpy/linalg/lapack_litemodule.c' to sources. adding 'numpy/linalg/lapack_lite/python_xerbla.c' to sources. building extension "numpy.linalg._umath_linalg" sources adding 'numpy/linalg/umath_linalg.c.src' to sources. adding 'numpy/linalg/lapack_lite/python_xerbla.c' to sources. 
building extension "numpy.random.mtrand" sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -c' gcc: _configtest.c gcc -pthread _configtest.o -o _configtest _configtest failure. removing: _configtest.c _configtest.o _configtest building data_files sources build_src: building npy-pkg config files running build_py copying numpy/version.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/__config__.py -> build/lib.linux-x86_64-2.7/numpy copying build/src.linux-x86_64-2.7/numpy/distutils/__config__.py -> build/lib.linux-x86_64-2.7/numpy/distutils running build_clib customize UnixCCompiler customize UnixCCompiler using build_clib building 'npymath' library compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/build creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7 creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/npymath creating build/temp.linux-x86_64-2.7/numpy creating build/temp.linux-x86_64-2.7/numpy/core creating build/temp.linux-x86_64-2.7/numpy/core/src creating build/temp.linux-x86_64-2.7/numpy/core/src/npymath compile options: '-Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -c' gcc: numpy/core/src/npymath/halffloat.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npymath/ieee754.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npymath/npy_math.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npymath/npy_math_complex.c ar: adding 4 object files to build/temp.linux-x86_64-2.7/libnpymath.a building 'npysort' library compiling C sources C compiler: gcc -pthread -fno-strict-aliasing -g -O2 -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC creating build/temp.linux-x86_64-2.7/build/src.linux-x86_64-2.7/numpy/core/src/npysort compile options: '-Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/home/svu/a0095654/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -c' gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/selection.c gcc: 
build/src.linux-x86_64-2.7/numpy/core/src/npysort/quicksort.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/mergesort.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/binsearch.c gcc: build/src.linux-x86_64-2.7/numpy/core/src/npysort/heapsort.c ar: adding 5 object files to build/temp.linux-x86_64-2.7/libnpysort.a running build_ext customize UnixCCompiler customize UnixCCompiler using build_ext running build_scripts adding 'build/scripts.linux-x86_64-2.7/f2py2.7' to scripts -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Oct 10 12:59:12 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 18:59:12 +0200 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: References: Message-ID: <54381060.1040109@googlemail.com> On 10.10.2014 18:51, Lahiru Samarakoon wrote: > Dear all, > > I am trying to install numpy without root access. So I am building from > the source. I have installed atlas which also has lapack with it. I > changed the site.cfg file as given below > > [DEFAULT] > library_dirs = /home/svu/a0095654/ATLAS/build/lib > include_dirs = /home/svu/a0095654/ATLAS/build/include > > > However, I am getting a segmentation fault when importing numpy. > > Please advise. I also put the build log file at the end of the email if > necessary. Which platform are you working on? Which compiler version? We just solved a segfault on import on red hat 5 gcc 4.1.2. Very likely caused by a compiler bug. See https://github.com/numpy/numpy/issues/5163 The build log is complaining about your atlas being to small, possibly the installation is broken? From lahiruts at gmail.com Fri Oct 10 13:26:55 2014 From: lahiruts at gmail.com (Lahiru Samarakoon) Date: Sat, 11 Oct 2014 01:26:55 +0800 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: <54381060.1040109@googlemail.com> References: <54381060.1040109@googlemail.com> Message-ID: Red Hat Enterprise Linux release 5.8 gcc (GCC) 4.1.2 I am also trying to install numpy 1.9. On Sat, Oct 11, 2014 at 12:59 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 10.10.2014 18:51, Lahiru Samarakoon wrote: > > Dear all, > > > > I am trying to install numpy without root access. So I am building from > > the source. I have installed atlas which also has lapack with it. I > > changed the site.cfg file as given below > > > > [DEFAULT] > > library_dirs = /home/svu/a0095654/ATLAS/build/lib > > include_dirs = /home/svu/a0095654/ATLAS/build/include > > > > > > However, I am getting a segmentation fault when importing numpy. > > > > Please advise. I also put the build log file at the end of the email if > > necessary. > > > Which platform are you working on? Which compiler version? > We just solved a segfault on import on red hat 5 gcc 4.1.2. Very likely > caused by a compiler bug. See https://github.com/numpy/numpy/issues/5163 > > The build log is complaining about your atlas being to small, possibly > the installation is broken? > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jtaylor.debian at googlemail.com Fri Oct 10 13:30:13 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 19:30:13 +0200 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: References: <54381060.1040109@googlemail.com> Message-ID: <543817A5.4010900@googlemail.com> On 10.10.2014 19:26, Lahiru Samarakoon wrote: > Red Hat Enterprise Linux release 5.8 > gcc (GCC) 4.1.2 > > I am also trying to install numpy 1.9. that is the broken platform, please try the master branch or the maintenance/1.9.x branch, those should work now. Are there volunteers to report this to redhat? > > On Sat, Oct 11, 2014 at 12:59 AM, Julian Taylor > > > wrote: > > On 10.10.2014 18:51, Lahiru Samarakoon wrote: > > Dear all, > > > > I am trying to install numpy without root access. So I am building from > > the source. I have installed atlas which also has lapack with it. I > > changed the site.cfg file as given below > > > > [DEFAULT] > > library_dirs = /home/svu/a0095654/ATLAS/build/lib > > include_dirs = /home/svu/a0095654/ATLAS/build/include > > > > > > However, I am getting a segmentation fault when importing numpy. > > > > Please advise. I also put the build log file at the end of the email if > > necessary. > > > Which platform are you working on? Which compiler version? > We just solved a segfault on import on red hat 5 gcc 4.1.2. Very likely > caused by a compiler bug. See https://github.com/numpy/numpy/issues/5163 > > The build log is complaining about your atlas being to small, possibly > the installation is broken? > > From jtaylor.debian at googlemail.com Fri Oct 10 14:11:22 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 20:11:22 +0200 Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 Message-ID: <5438214A.1000909@googlemail.com> hi, To speed up compilation of extensions I have made a PR to compile extension files in parallel: https://github.com/numpy/numpy/pull/5161 It adds the --jobs/-j flags to the build command of setup.py which defines the number of parallel compile processes. E.g. python setup.py build --jobs 4 install --prefix /tmp/local Additionally it adds the environment variable NPY_NUM_BUILD_JOBS which is used if no commandline is set. This helps e.g. with pip installations, travis builds (which give you 1.5 cpus) or to put in your .bashrc. The parallelization is only with the files of an extension so it is not super efficient but an uncached numpy build goes down from 1m40s to 1m00s with 3 cores on my machine which is quite decent. Building scipy from scratch decreased from 10minutes to 6m30s on my machine. Unfortunately projects using cython will not profit as cython tends to build an extension from one single file. (You may want to look into gccs internal parallelization for that, -flto=jobs)- Does some see issues with the interface I have currently set? Please speak up soon. There is still one problem in regards to parallelizing fortran 90. The ccompiler.py contains following comment: # build any sources in same order as they were originally specified # especially important for fortran .f90 files using modules This indicates the f90 builds cannot be trivially parallelized. I do not know much fortran, can someone explain to me when ordering of single file compiles is an issue in f90? 
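If it turns out that .f90 files really can depend on each other through their modules, one fallback would be to work out a safe order up front from the module and use statements and only parallelize what that order allows. A very rough sketch of such a scan (the helper below is hypothetical, not part of the PR, and it ignores the fancier forms of the use statement):

import re
from collections import defaultdict

def f90_compile_order(sources):
    # Hypothetical helper: record which modules each .f90 file defines and
    # which modules it uses, then order the files so providers compile first.
    module_re = re.compile(r'^\s*module\s+(?!procedure\b)(\w+)', re.I | re.M)
    use_re = re.compile(r'^\s*use\s+(\w+)', re.I | re.M)
    defines, uses = {}, defaultdict(set)
    for src in sources:
        with open(src) as fh:
            text = fh.read()
        for name in module_re.findall(text):
            defines[name.lower()] = src
        uses[src].update(name.lower() for name in use_re.findall(text))
    ordered, seen = [], set()
    def visit(src):
        # depth-first: compile whatever provides the used modules first
        if src in seen:
            return
        seen.add(src)
        for mod in uses[src]:
            provider = defines.get(mod)
            if provider is not None and provider != src:
                visit(provider)
        ordered.append(src)
    for src in sources:
        visit(src)
    return ordered

Files that end up with no dependencies between them could then still go through the parallel path.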
Cheers, Julian Taylor From ben.root at ou.edu Fri Oct 10 14:23:48 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 10 Oct 2014 14:23:48 -0400 Subject: [Numpy-discussion] use ufunc for arbitrary positional arguments? Message-ID: I have a need to "and" together an arbitrary number of boolean arrays. np.logical_and() expects only two positional arguments. There has got to be some sort of easy way to just and these together using the ufunc mechanism, right? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Oct 10 14:27:52 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 10 Oct 2014 11:27:52 -0700 Subject: [Numpy-discussion] use ufunc for arbitrary positional arguments? In-Reply-To: References: Message-ID: On Fri, Oct 10, 2014 at 11:23 AM, Benjamin Root wrote: > I have a need to "and" together an arbitrary number of boolean arrays. > np.logical_and() expects only two positional arguments. There has got to be > some sort of easy way to just and these together using the ufunc mechanism, > right? > Do you really need a ufunc? The obvious way to do this (at least to me) would be use reduce (if you're especially concerned about memory) or just np.all. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Fri Oct 10 14:30:48 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 10 Oct 2014 14:30:48 -0400 Subject: [Numpy-discussion] use ufunc for arbitrary positional arguments? In-Reply-To: References: Message-ID: Oy! I got to be having a brain fart today. np.all on the list of boolean arrays applied on the first(?) axis is much clearer than any ufunc or reduce call. And to answer the next question... use np.any for logical_or()... Thanks! Ben Root On Fri, Oct 10, 2014 at 2:27 PM, Stephan Hoyer wrote: > On Fri, Oct 10, 2014 at 11:23 AM, Benjamin Root wrote: > >> I have a need to "and" together an arbitrary number of boolean arrays. >> np.logical_and() expects only two positional arguments. There has got to be >> some sort of easy way to just and these together using the ufunc mechanism, >> right? >> > > Do you really need a ufunc? The obvious way to do this (at least to me) > would be use reduce (if you're especially concerned about memory) or just > np.all. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Oct 10 15:09:47 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 10 Oct 2014 19:09:47 +0000 (UTC) Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 References: <5438214A.1000909@googlemail.com> Message-ID: <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > There is still one problem in regards to parallelizing fortran 90. The > ccompiler.py contains following comment: > # build any sources in same order as they were originally specified > # especially important for fortran .f90 files using modules > > This indicates the f90 builds cannot be trivially parallelized. I do not > know much fortran, can someone explain to me when ordering of single > file compiles is an issue in f90? Sure :) When a Fortran module is compiled, the compiler emits an object file (.o) and a module file (.mod). 
The module file plays the role of a header file in C. So when another Fortran file imports the module with a use statement, the compiler looks for the module file. Because the .mod file is generated by the compiler, unlike the .h file in C, the ordering of compilation is more critical in Fortran 90 than in C. If B.f90 has a "use A" statement, then A.f90 must be compiled before B.f90. CMake has an intelligent system for working out the correct order of compilation of Fortran 90 files. Sturla From sturla.molden at gmail.com Fri Oct 10 15:24:12 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 10 Oct 2014 19:24:12 +0000 (UTC) Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 References: <5438214A.1000909@googlemail.com> <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> Message-ID: <1808308021434661589.788892sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > When a Fortran module is compiled, the compiler emits an object file (.o) > and a module file (.mod). The module file plays the role of a header file > in C. So when another Fortran file imports the module with a use statement, > the compiler looks for the module file. Because the .mod file is generated > by the compiler, unlike the .h file in C, the ordering of compilation is > more critical in Fortran 90 than in C. If B.f90 has a "use A" statement, > then A.f90 must be compiled before B.f90. CMake has an intelligent system > for working out the correct order of compilation of Fortran 90 files. So the Fortran 90 files creates a directed asyclic graph. To compute in parallel one might use a set of coroutines, one for each f90 file, and then yield from the downstream files in the graph. But on the other hand, it might not be worth the effort. ;-) Sturla From sturla.molden at gmail.com Fri Oct 10 15:29:12 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 10 Oct 2014 19:29:12 +0000 (UTC) Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 References: <5438214A.1000909@googlemail.com> <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> <1808308021434661589.788892sturla.molden-gmail.com@news.gmane.org> Message-ID: <1308348177434662110.607333sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > So the Fortran 90 files creates a directed asyclic graph. To compute in > parallel Eh, *compile* in parallel. > Sturla From sebastian at sipsolutions.net Fri Oct 10 15:36:50 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 10 Oct 2014 21:36:50 +0200 Subject: [Numpy-discussion] use ufunc for arbitrary positional arguments? In-Reply-To: References: Message-ID: <1412969810.27542.1.camel@sebastian-t440> On Fr, 2014-10-10 at 14:30 -0400, Benjamin Root wrote: > Oy! I got to be having a brain fart today. np.all on the list of > boolean arrays applied on the first(?) axis is much clearer than any > ufunc or reduce call. And to answer the next question... use np.any > for logical_or()... > Of course np.all is pretty much identical to np.logical_and.reduce(), and that is defined for all ufuncs. Of course your list of arrays will be converted to one large array first, so the python reduce may actually be faster in many cases. - Sebastian > > Thanks! > > Ben Root > > > On Fri, Oct 10, 2014 at 2:27 PM, Stephan Hoyer > wrote: > On Fri, Oct 10, 2014 at 11:23 AM, Benjamin Root > wrote: > I have a need to "and" together an arbitrary number of > boolean arrays. 
np.logical_and() expects only two > positional arguments. There has got to be some sort of > easy way to just and these together using the ufunc > mechanism, right? > > > > Do you really need a ufunc? The obvious way to do this (at > least to me) would be use reduce (if you're especially > concerned about memory) or just np.all. > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jtaylor.debian at googlemail.com Fri Oct 10 15:55:29 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 21:55:29 +0200 Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 In-Reply-To: <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> References: <5438214A.1000909@googlemail.com> <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> Message-ID: <543839B1.4070702@googlemail.com> On 10.10.2014 21:09, Sturla Molden wrote: > Julian Taylor wrote: > >> There is still one problem in regards to parallelizing fortran 90. The >> ccompiler.py contains following comment: >> # build any sources in same order as they were originally specified >> # especially important for fortran .f90 files using modules >> >> This indicates the f90 builds cannot be trivially parallelized. I do not >> know much fortran, can someone explain to me when ordering of single >> file compiles is an issue in f90? > > > Sure :) > > When a Fortran module is compiled, the compiler emits an object file (.o) > and a module file (.mod). The module file plays the role of a header file > in C. So when another Fortran file imports the module with a use statement, > the compiler looks for the module file. Because the .mod file is generated > by the compiler, unlike the .h file in C, the ordering of compilation is > more critical in Fortran 90 than in C. If B.f90 has a "use A" statement, > then A.f90 must be compiled before B.f90. CMake has an intelligent system > for working out the correct order of compilation of Fortran 90 files. > > thanks for the explanation. Modules are only available with f90 right? f77 files do not have these generated interdependencies? being able to handle f77 would already be quite good, as it should at least cover current scipy. One can look at a smarter scheme for f90 later. From alan.isaac at gmail.com Fri Oct 10 16:24:14 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 10 Oct 2014 16:24:14 -0400 Subject: [Numpy-discussion] fsin on Intel Message-ID: <5438406E.9070904@gmail.com> This is not NumPy specific but may still interest list members: http://randomascii.wordpress.com/2014/10/09/intel-underestimates-error-bounds-by-1-3-quintillion/ Alan Isaac From jtaylor.debian at googlemail.com Fri Oct 10 16:31:04 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 22:31:04 +0200 Subject: [Numpy-discussion] Any interest in a generalized piecewise function? 
In-Reply-To: <8114F0AADAECD745AF1FC2047A5DC7ED1E54E27E@HBU-POST2.ffi.no> References: <8114F0AADAECD745AF1FC2047A5DC7ED1E54E27E@HBU-POST2.ffi.no> Message-ID: <54384208.1050005@googlemail.com> On 10.10.2014 11:34, Per.Brodtkorb at ffi.no wrote: > I have worked on a generalized piecewise function (genpiecewise) that > are simpler and more general than the current numpy.piecewise > implementation. The new generalized piecewise function allows functions > of the type f(x0, x1,.. , xn) i.e. to have arbitrary number of input > arguments that are evaluated conditionally. > > The generalized piecewise function passes all the tests for > numpy.piecewise function except the undocumented features of > numpy.piecewise which allows condlist to be a single bool list/array or > a single int array. > Hi, One would think you could already pass two arguments to a function by using a 2d array but I couldn't get that to work with some short testing. So this looks like a useful improvement over the current piecewise to me. Do you want open a pull request on github to discuss the details? It would be good if it can replace the current piecewise as having two functions which do very similar things is not so nice. Cheers, Julian From matthew.brett at gmail.com Fri Oct 10 17:07:02 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 10 Oct 2014 17:07:02 -0400 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: <54345786.1090506@googlemail.com> References: <54345786.1090506@googlemail.com> Message-ID: Hi, On Tue, Oct 7, 2014 at 5:13 PM, Julian Taylor wrote: > On 06.10.2014 18:54, Andrew Collette wrote: >> Hi all, >> >> I am working with the HDF Group on a new open-source viewer program >> for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, >> since people don't typically have Python installed, we are looking to >> distribute the application using PyInstaller, which embeds >> dependencies like NumPy. Likewise for OS X (using Py2App). >> >> We would like to make sure we don't accidentally include >> non-open-source components... I recall there was some discussion here >> about using the Intel math libraries for binary releases on various >> platforms. Do the releases on SourceForge or PyPI use any proprietary >> code? We'd like to avoid building NumPy ourselves if we can avoid it. >> >> Apologies if this is explained somewhere, but I couldn't find it. >> >> Thanks! >> Andrew Collette > > > Hi, > the numpy win32 binaries on sourceforge do not contain any proprietary > code. They are build with mingw 3.4.5 and are using a f2c'd version of > netlib blas and lapack which so far I know is public domain. > I think the macos wheels on pypi are built using ATLAS but they do also > contain libquadmath which is LGPL licensed. Its probably pulled in by > fortran (could maybe be removed by a rebuild as neither blas nor numpy > use it) Yes, the OSX builds use gcc 4.8.2 and ATLAS [1], and bundle these libraries: tar tvf numpy-1.9.0-cp27-none-macosx_10_6_intel.macosx_10_9_intel.macosx_10_9_x86_64.whl | grep .dylib numpy/.dylibs/libatlas.dylib numpy/.dylibs/libcblas.dylib numpy/.dylibs/libf77blas.dylib numpy/.dylibs/libgcc_s.1.dylib numpy/.dylibs/libgfortran.3.dylib numpy/.dylibs/liblapack.dylib numpy/.dylibs/libptcblas.dylib numpy/.dylibs/libptf77blas.dylib numpy/.dylibs/libquadmath.0.dylib The libquadmath library is indeed LGPL. 
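For anyone who wants to check an installed copy rather than the wheel file itself, a quick sketch (the ".dylibs" directory is where these OS X wheels carry the relocated libraries; on other installs it simply won't exist):

import os
import numpy

bundled = os.path.join(os.path.dirname(numpy.__file__), '.dylibs')
if os.path.isdir(bundled):
    # list whatever shared libraries the binary wheel shipped next to numpy
    for name in sorted(os.listdir(bundled)):
        print(name)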
libgcc_s.1 and libgfortran These libraries are covered by GPLv3 with the "runtime library exception [2, 3] for code built with gcc and linked against the relevant runtime libraries. Here's the relevant text from the exception [2] """ You have permission to propagate a work of Target Code formed by combining the Runtime Library with Independent Modules, even if such propagation would otherwise violate the terms of GPLv3, provided that all Target Code was generated by Eligible Compilation Processes. You may then convey such a combination under terms of your choice, consistent with the licensing of the Independent Modules. """ My understanding of this license is that we are free to distribute our own linked code under any "terms of our choice" - in our case the BSD license. However, we are also distrubuting the libraries libgcc, libgomp, libgfortran, libquadmath, albeit buried in a hidden directory within the wheel. >From the FAQ entry "I use a proprietary compiler toolchain without any parts of GCC to compile my program, and link it with libstdc++." [3] I infer that we need to provide a link to the source code for these standalone runtime libraries. I hope a link on the pypi page and a README in the relevant directory will be enough but we should probably get some advice. Otherwise I believe we avoid that requirement by doing static linking. My reading of that FAQ entry is that anyone can link against the included gcc runtime libraries under the same runtime library exception, and so will not need to apply any GPL terms to their linked code. Julian - it would be good to remove libquadmath with a rebuild - any pointers on how to do this? Cheers, Matthew [1] https://github.com/MacPython/numpy-atlas-binaries/travis_install.sh [2[ http://www.gnu.org/licenses/gcc-exception-3.1.html [3] http://www.gnu.org/licenses/gcc-exception-faq.html From jtaylor.debian at googlemail.com Fri Oct 10 17:28:32 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 10 Oct 2014 23:28:32 +0200 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: References: <54345786.1090506@googlemail.com> Message-ID: <54384F80.5050205@googlemail.com> On 10.10.2014 23:07, Matthew Brett wrote: > Hi, > > On Tue, Oct 7, 2014 at 5:13 PM, Julian Taylor > wrote: >> On 06.10.2014 18:54, Andrew Collette wrote: >>> Hi all, >>> >>> I am working with the HDF Group on a new open-source viewer program >>> for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, >>> since people don't typically have Python installed, we are looking to >>> distribute the application using PyInstaller, which embeds >>> dependencies like NumPy. Likewise for OS X (using Py2App). >>> >>> We would like to make sure we don't accidentally include >>> non-open-source components... I recall there was some discussion here >>> about using the Intel math libraries for binary releases on various >>> platforms. Do the releases on SourceForge or PyPI use any proprietary >>> code? We'd like to avoid building NumPy ourselves if we can avoid it. >>> >>> Apologies if this is explained somewhere, but I couldn't find it. >>> >>> Thanks! >>> Andrew Collette >> >> >> Hi, >> the numpy win32 binaries on sourceforge do not contain any proprietary >> code. They are build with mingw 3.4.5 and are using a f2c'd version of >> netlib blas and lapack which so far I know is public domain. >> I think the macos wheels on pypi are built using ATLAS but they do also >> contain libquadmath which is LGPL licensed. 
Its probably pulled in by >> fortran (could maybe be removed by a rebuild as neither blas nor numpy >> use it) > > > Julian - it would be good to remove libquadmath with a rebuild - any > pointers on how to do this? > You'd probably have to rebuild gfortran with quadmath disabled. It is a configuration flag of the gcc build. Then rebuild the binaries with that toolchain. But I'm not convinced it needs to be removed. Is LGPL really a problem? As long as you don't static link it (and not also ship the objects) it should be fine also for commercial programs. Note that there is no -static-quadmath flag, most likely because it would allow accidental license violation. From cmkleffner at gmail.com Fri Oct 10 17:42:21 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Fri, 10 Oct 2014 23:42:21 +0200 Subject: [Numpy-discussion] Copyright status of NumPy binaries on Windows/OS X In-Reply-To: <54384F80.5050205@googlemail.com> References: <54345786.1090506@googlemail.com> <54384F80.5050205@googlemail.com> Message-ID: this applies to the mingw-w64 builds as well see also: https://gcc.gnu.org/ml/fortran/2014-10/msg00038.html From: "Joseph S. Myers" To: FX Cc: GCC Patches , fortran List Date: Mon, 6 Oct 2014 20:38:14 Subject: Re: [patch] Add -static-libquadmath option Since -static-libquadmath introduces LGPL requirements on redistributing the resulting binaries (that you provide source or relinkable object files to allow relinking with modified versions of libquadmath) that don't otherwise generally apply simply through using GCC to build a program even if you link in GCC's other libraries statically, it would seem a good idea for the documentation of this option to make that explicit. Joseph S. Myers ____ -carlkl 2014-10-10 23:28 GMT+02:00 Julian Taylor : > On 10.10.2014 23:07, Matthew Brett wrote: > > Hi, > > > > On Tue, Oct 7, 2014 at 5:13 PM, Julian Taylor > > wrote: > >> On 06.10.2014 18:54, Andrew Collette wrote: > >>> Hi all, > >>> > >>> I am working with the HDF Group on a new open-source viewer program > >>> for HDF5 files, powered by NumPy, h5py, and wxPython. On Windows, > >>> since people don't typically have Python installed, we are looking to > >>> distribute the application using PyInstaller, which embeds > >>> dependencies like NumPy. Likewise for OS X (using Py2App). > >>> > >>> We would like to make sure we don't accidentally include > >>> non-open-source components... I recall there was some discussion here > >>> about using the Intel math libraries for binary releases on various > >>> platforms. Do the releases on SourceForge or PyPI use any proprietary > >>> code? We'd like to avoid building NumPy ourselves if we can avoid it. > >>> > >>> Apologies if this is explained somewhere, but I couldn't find it. > >>> > >>> Thanks! > >>> Andrew Collette > >> > >> > >> Hi, > >> the numpy win32 binaries on sourceforge do not contain any proprietary > >> code. They are build with mingw 3.4.5 and are using a f2c'd version of > >> netlib blas and lapack which so far I know is public domain. > >> I think the macos wheels on pypi are built using ATLAS but they do also > >> contain libquadmath which is LGPL licensed. Its probably pulled in by > >> fortran (could maybe be removed by a rebuild as neither blas nor numpy > >> use it) > > > > > > Julian - it would be good to remove libquadmath with a rebuild - any > > pointers on how to do this? > > > > You'd probably have to rebuild gfortran with quadmath disabled. It is a > configuration flag of the gcc build. 
Then rebuild the binaries with that > toolchain. > > But I'm not convinced it needs to be removed. Is LGPL really a problem? > As long as you don't static link it (and not also ship the objects) it > should be fine also for commercial programs. > Note that there is no -static-quadmath flag, most likely because it > would allow accidental license violation. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Fri Oct 10 18:23:43 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 10 Oct 2014 22:23:43 +0000 (UTC) Subject: [Numpy-discussion] parallel compilation with numpy.distutils in numpy 1.10 References: <5438214A.1000909@googlemail.com> <1427334356434660430.019528sturla.molden-gmail.com@news.gmane.org> <543839B1.4070702@googlemail.com> Message-ID: <1073517200434672407.412067sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > thanks for the explanation. > Modules are only available with f90 right? f77 files do not have these > generated interdependencies? > being able to handle f77 would already be quite good, as it should at > least cover current scipy. > One can look at a smarter scheme for f90 later. Yes, modules are only for f90 and later. f77 do not have modules. Sturla From lahiruts at gmail.com Fri Oct 10 21:43:48 2014 From: lahiruts at gmail.com (Lahiru Samarakoon) Date: Sat, 11 Oct 2014 09:43:48 +0800 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: <543817A5.4010900@googlemail.com> References: <54381060.1040109@googlemail.com> <543817A5.4010900@googlemail.com> Message-ID: I switched to numpy-1.8.2. . Now getting following error. I am using LAPACK that comes with atlast installation. Can this be a problem? Traceback (most recent call last): File "", line 1, in File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/__init__.py", line 170, in from . import add_newdocs File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in from numpy.lib import add_newdoc File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", line 18, in from .polynomial import * File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 19, in from numpy.linalg import eigvals, lstsq, inv File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 51, in from .linalg import * File "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 29, in from numpy.linalg import lapack_lite, _umath_linalg ImportError: /home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so: undefined symbol: zgesdd_ On Sat, Oct 11, 2014 at 1:30 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 10.10.2014 19:26, Lahiru Samarakoon wrote: > > Red Hat Enterprise Linux release 5.8 > > gcc (GCC) 4.1.2 > > > > I am also trying to install numpy 1.9. > > that is the broken platform, please try the master branch or the > maintenance/1.9.x branch, those should work now. > > Are there volunteers to report this to redhat? > > > > > On Sat, Oct 11, 2014 at 12:59 AM, Julian Taylor > > > > > wrote: > > > > On 10.10.2014 18:51, Lahiru Samarakoon wrote: > > > Dear all, > > > > > > I am trying to install numpy without root access. So I am building > from > > > the source. 
I have installed atlas which also has lapack with > it. I > > > changed the site.cfg file as given below > > > > > > [DEFAULT] > > > library_dirs = /home/svu/a0095654/ATLAS/build/lib > > > include_dirs = /home/svu/a0095654/ATLAS/build/include > > > > > > > > > However, I am getting a segmentation fault when importing numpy. > > > > > > Please advise. I also put the build log file at the end of the > email if > > > necessary. > > > > > > Which platform are you working on? Which compiler version? > > We just solved a segfault on import on red hat 5 gcc 4.1.2. Very > likely > > caused by a compiler bug. See > https://github.com/numpy/numpy/issues/5163 > > > > The build log is complaining about your atlas being to small, > possibly > > the installation is broken? > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Sat Oct 11 18:51:56 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sat, 11 Oct 2014 18:51:56 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle Message-ID: I created an issue on github for an enhancement to numpy.random.shuffle: https://github.com/numpy/numpy/issues/5173 I'd like to get some feedback on the idea. Currently, `shuffle` shuffles the first dimension of an array in-place. For example, shuffling a 2D array shuffles the rows: In [227]: a Out[227]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11]]) In [228]: np.random.shuffle(a) In [229]: a Out[229]: array([[ 0, 1, 2], [ 9, 10, 11], [ 3, 4, 5], [ 6, 7, 8]]) To add an axis keyword, we could (in effect) apply `shuffle` to `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffles the columns: In [232]: a = np.arange(15).reshape(3,5) In [233]: a Out[233]: array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) In [234]: axis = 1 In [235]: np.random.shuffle(a.swapaxes(axis, 0)) In [236]: a Out[236]: array([[ 3, 2, 4, 0, 1], [ 8, 7, 9, 5, 6], [13, 12, 14, 10, 11]]) So that's the first part--adding an `axis` keyword. The other part of the enhancement request is to add a shuffle behavior that shuffles the 1-d slices *independently*. That is, for a 2-d array, shuffling with `axis=0` would apply a different shuffle to each column. In the github issue, I defined a function called `disarrange` that implements this behavior: In [240]: a Out[240]: array([[ 0, 1, 2], [ 3, 4, 5], [ 6, 7, 8], [ 9, 10, 11], [12, 13, 14]]) In [241]: disarrange(a, axis=0) In [242]: a Out[242]: array([[ 6, 1, 2], [ 3, 13, 14], [ 9, 10, 5], [12, 7, 8], [ 0, 4, 11]]) Note that each column has been shuffled independently. This behavior is analogous to how `sort` handles the `axis` keyword. `sort` sorts the 1-d slices along the given axis independently. In the github issue, I suggested the following signature for `shuffle` (but I'm not too fond of the name `independent`): def shuffle(a, independent=False, axis=0) If `independent` is False, the current behavior of `shuffle` is used. If `independent` is True, each 1-d slice is shuffled independently (in the same way that `sort` sorts each 1-d slice). Like most functions that take an `axis` argument, `axis=None` means to shuffle the flattened array. With `independent=True`, it would act like `np.random.shuffle(a.flat)`, e.g. 
In [247]: a Out[247]: array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14]]) In [248]: np.random.shuffle(a.flat) In [249]: a Out[249]: array([[ 0, 14, 9, 1, 13], [ 2, 8, 5, 3, 4], [ 6, 10, 7, 12, 11]]) A small wart in this API is the meaning of shuffle(a, independent=False, axis=None) It could be argued that the correct behavior is to leave the array unchanged. (The current behavior can be interpreted as shuffling a 1-d sequence of monolithic blobs; the axis argument specifies which axis of the array corresponds to the sequence index. Then `axis=None` means the argument is a single monolithic blob, so there is nothing to shuffle.) Or an error could be raised. What do you think? Warren -------------- next part -------------- An HTML attachment was scrubbed... URL: From jzwinck at gmail.com Sat Oct 11 21:31:41 2014 From: jzwinck at gmail.com (John Zwinck) Date: Sun, 12 Oct 2014 09:31:41 +0800 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser wrote: > I created an issue on github for an enhancement > to numpy.random.shuffle: > https://github.com/numpy/numpy/issues/5173 I like this idea. I was a bit surprised there wasn't something like this already. > A small wart in this API is the meaning of > > shuffle(a, independent=False, axis=None) > > It could be argued that the correct behavior is to leave the > array unchanged. (The current behavior can be interpreted as > shuffling a 1-d sequence of monolithic blobs; the axis argument > specifies which axis of the array corresponds to the > sequence index. Then `axis=None` means the argument is > a single monolithic blob, so there is nothing to shuffle.) > Or an error could be raised. Let's think about it from the other direction: if a user wants to shuffle all the elements as if it were 1-d, as you point out they could do this: shuffle(a, axis=None, independent=True) But that's a lot of typing. Maybe we should just let this do the same thing: shuffle(a, axis=None) That seems to be in keeping with the other APIs taking axis as you mentioned. To me, "independent" has no relevance when the array is 1-d, it can simply be ignored. John Zwinck From hoogendoorn.eelco at gmail.com Sun Oct 12 03:51:51 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 12 Oct 2014 09:51:51 +0200 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: Thanks Warren, I think these are sensible additions. I would argue to treat the None-False condition as an error. Indeed I agree one might argue the correcr behavior is to 'shuffle' the singleton block of data, which does nothing; but its more likely to come up as an unintended error than as a natural outcome of parametrized behavior. On Sun, Oct 12, 2014 at 3:31 AM, John Zwinck wrote: > On Sun, Oct 12, 2014 at 6:51 AM, Warren Weckesser > wrote: > > I created an issue on github for an enhancement > > to numpy.random.shuffle: > > https://github.com/numpy/numpy/issues/5173 > > I like this idea. I was a bit surprised there wasn't something like > this already. > > > A small wart in this API is the meaning of > > > > shuffle(a, independent=False, axis=None) > > > > It could be argued that the correct behavior is to leave the > > array unchanged. (The current behavior can be interpreted as > > shuffling a 1-d sequence of monolithic blobs; the axis argument > > specifies which axis of the array corresponds to the > > sequence index. 
Then `axis=None` means the argument is > > a single monolithic blob, so there is nothing to shuffle.) > > Or an error could be raised. > > Let's think about it from the other direction: if a user wants to > shuffle all the elements as if it were 1-d, as you point out they > could do this: > > shuffle(a, axis=None, independent=True) > > But that's a lot of typing. Maybe we should just let this do the same > thing: > > shuffle(a, axis=None) > > That seems to be in keeping with the other APIs taking axis as you > mentioned. To me, "independent" has no relevance when the array is > 1-d, it can simply be ignored. > > John Zwinck > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jzwinck at gmail.com Sun Oct 12 04:43:00 2014 From: jzwinck at gmail.com (John Zwinck) Date: Sun, 12 Oct 2014 16:43:00 +0800 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 3:51 PM, Eelco Hoogendoorn wrote: > I would argue to treat the None-False condition as an error. Indeed I agree > one might argue the correcr behavior is to 'shuffle' the singleton block of > data, which does nothing; but its more likely to come up as an unintended > error than as a natural outcome of parametrized behavior. I'm interested to know why you think axis=None should raise an error if independent=False when independent=False is the default. What I mean is, if someone uses this function and wants axis=None (which seems not totally unusual), why force them to always type in the boilerplate independent=True to make it work? John Zwinck From stefan at sun.ac.za Sun Oct 12 04:56:43 2014 From: stefan at sun.ac.za (Stefan van der Walt) Date: Sun, 12 Oct 2014 10:56:43 +0200 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: <87mw91k6no.fsf@sun.ac.za> Hi Warren On 2014-10-12 00:51:56, Warren Weckesser wrote: > A small wart in this API is the meaning of > > shuffle(a, independent=False, axis=None) > > It could be argued that the correct behavior is to leave the > array unchanged. I like the suggested changes. Since "independent" loses its meaning when axis is None, I would expect this to have the same effect as `shuffle(a, independent=True, axis=None)`. I think a shuffle function that doesn't shuffle will confuse a lot of people! St?fan From hoogendoorn.eelco at gmail.com Sun Oct 12 06:00:29 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 12 Oct 2014 12:00:29 +0200 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: <87mw91k6no.fsf@sun.ac.za> References: <87mw91k6no.fsf@sun.ac.za> Message-ID: yeah, a shuffle function that does not shuffle indeed seems like a major source of bugs to me. Indeed one could argue that setting axis=None should suffice to give a clear enough declaration of intent; though I wouldn't mind typing the extra bit to ensure consistent semantics. On Sun, Oct 12, 2014 at 10:56 AM, Stefan van der Walt wrote: > Hi Warren > > On 2014-10-12 00:51:56, Warren Weckesser > wrote: > > A small wart in this API is the meaning of > > > > shuffle(a, independent=False, axis=None) > > > > It could be argued that the correct behavior is to leave the > > array unchanged. > > I like the suggested changes. 
Since "independent" loses its meaning > when axis is None, I would expect this to have the same effect as > `shuffle(a, independent=True, axis=None)`. I think a shuffle function > that doesn't shuffle will confuse a lot of people! > > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Oct 12 07:57:06 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 12 Oct 2014 12:57:06 +0100 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser wrote: > A small wart in this API is the meaning of > > shuffle(a, independent=False, axis=None) > > It could be argued that the correct behavior is to leave the > array unchanged. (The current behavior can be interpreted as > shuffling a 1-d sequence of monolithic blobs; the axis argument > specifies which axis of the array corresponds to the > sequence index. Then `axis=None` means the argument is > a single monolithic blob, so there is nothing to shuffle.) > Or an error could be raised. > > What do you think? It seems to me a perfectly good reason to have two methods instead of one. I can't imagine when I wouldn't be using a literal True or False for this, so it really should be two different methods. That said, I would just make the axis=None behavior the same for both methods. axis=None does *not* mean "treat this like a single monolithic blob" in any of the axis=-having methods; it means "flatten the array and do the operation on the single flattened axis". I think the latter behavior is a reasonable interpretation of axis=None for both methods. -- Robert Kern From warren.weckesser at gmail.com Sun Oct 12 10:54:03 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 12 Oct 2014 10:54:03 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern wrote: > On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser > wrote: > > > A small wart in this API is the meaning of > > > > shuffle(a, independent=False, axis=None) > > > > It could be argued that the correct behavior is to leave the > > array unchanged. (The current behavior can be interpreted as > > shuffling a 1-d sequence of monolithic blobs; the axis argument > > specifies which axis of the array corresponds to the > > sequence index. Then `axis=None` means the argument is > > a single monolithic blob, so there is nothing to shuffle.) > > Or an error could be raised. > > > > What do you think? > > It seems to me a perfectly good reason to have two methods instead of > one. I can't imagine when I wouldn't be using a literal True or False > for this, so it really should be two different methods. > > I agree, and my first inclination was to propose a different method (and I had the bikeshedding conversation with myself about the name: "disarrange", "scramble", "disorder", "randomize", "ashuffle", some other variation of the word "shuffle", ...), but I figured the first thing folks would say is "Why not just add options to shuffle?" So, choose your battles and all that. What do other folks think of making a separate method? > That said, I would just make the axis=None behavior the same for both > methods. 
axis=None does *not* mean "treat this like a single > monolithic blob" in any of the axis=-having methods; it means "flatten > the array and do the operation on the single flattened axis". I think > the latter behavior is a reasonable interpretation of axis=None for > both methods. > Sounds good to me. Warren > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 12 11:20:14 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Oct 2014 11:20:14 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser wrote: > > > On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern wrote: >> >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser >> wrote: >> >> > A small wart in this API is the meaning of >> > >> > shuffle(a, independent=False, axis=None) >> > >> > It could be argued that the correct behavior is to leave the >> > array unchanged. (The current behavior can be interpreted as >> > shuffling a 1-d sequence of monolithic blobs; the axis argument >> > specifies which axis of the array corresponds to the >> > sequence index. Then `axis=None` means the argument is >> > a single monolithic blob, so there is nothing to shuffle.) >> > Or an error could be raised. >> > >> > What do you think? >> >> It seems to me a perfectly good reason to have two methods instead of >> one. I can't imagine when I wouldn't be using a literal True or False >> for this, so it really should be two different methods. >> > > > I agree, and my first inclination was to propose a different method (and I > had the bikeshedding conversation with myself about the name: "disarrange", > "scramble", "disorder", "randomize", "ashuffle", some other variation of the > word "shuffle", ...), but I figured the first thing folks would say is "Why > not just add options to shuffle?" So, choose your battles and all that. > > What do other folks think of making a separate method? I'm not a fan of many similar functions. What's the difference between permute, shuffle and scramble? And how do I find or remember which is which? > > >> >> That said, I would just make the axis=None behavior the same for both >> methods. axis=None does *not* mean "treat this like a single >> monolithic blob" in any of the axis=-having methods; it means "flatten >> the array and do the operation on the single flattened axis". I think >> the latter behavior is a reasonable interpretation of axis=None for >> both methods. > > > > Sounds good to me. +1 (since all the arguments have been already given Josef - Why does sort treat columns independently instead of sorting rows? - because there is lexsort - Oh, lexsort, I haven thought about it in 5 years. 
It's not even next to sort in the pop up code completion > > Warren > > >> >> >> -- >> Robert Kern >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From warren.weckesser at gmail.com Sun Oct 12 11:33:20 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 12 Oct 2014 11:33:20 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 11:20 AM, wrote: > On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser > wrote: > > > > > > On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern > wrote: > >> > >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser > >> wrote: > >> > >> > A small wart in this API is the meaning of > >> > > >> > shuffle(a, independent=False, axis=None) > >> > > >> > It could be argued that the correct behavior is to leave the > >> > array unchanged. (The current behavior can be interpreted as > >> > shuffling a 1-d sequence of monolithic blobs; the axis argument > >> > specifies which axis of the array corresponds to the > >> > sequence index. Then `axis=None` means the argument is > >> > a single monolithic blob, so there is nothing to shuffle.) > >> > Or an error could be raised. > >> > > >> > What do you think? > >> > >> It seems to me a perfectly good reason to have two methods instead of > >> one. I can't imagine when I wouldn't be using a literal True or False > >> for this, so it really should be two different methods. > >> > > > > > > I agree, and my first inclination was to propose a different method (and > I > > had the bikeshedding conversation with myself about the name: > "disarrange", > > "scramble", "disorder", "randomize", "ashuffle", some other variation of > the > > word "shuffle", ...), but I figured the first thing folks would say is > "Why > > not just add options to shuffle?" So, choose your battles and all that. > > > > What do other folks think of making a separate method? > > I'm not a fan of many similar functions. > > What's the difference between permute, shuffle and scramble? > The difference between `shuffle` and the new method being proposed is explained in the first email in this thread. `np.random.permutation` with an array argument returns a shuffled copy of the array; it does not modify its argument. (It should also get an `axis` argument when `shuffle` gets an `axis` argument.) And how do I find or remember which is which? > You could start with `doc(np.random)` (or `np.random?` in ipython). Warren > > > > > > > >> > >> That said, I would just make the axis=None behavior the same for both > >> methods. axis=None does *not* mean "treat this like a single > >> monolithic blob" in any of the axis=-having methods; it means "flatten > >> the array and do the operation on the single flattened axis". I think > >> the latter behavior is a reasonable interpretation of axis=None for > >> both methods. > > > > > > > > Sounds good to me. > > +1 (since all the arguments have been already given > > > Josef > - Why does sort treat columns independently instead of sorting rows? > - because there is lexsort > - Oh, lexsort, I haven thought about it in 5 years. 
It's not even next > to sort in the pop up code completion > > > > > > Warren > > > > > >> > >> > >> -- > >> Robert Kern > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Oct 12 11:59:44 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Oct 2014 11:59:44 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 11:33 AM, Warren Weckesser wrote: > > > On Sun, Oct 12, 2014 at 11:20 AM, wrote: >> >> On Sun, Oct 12, 2014 at 10:54 AM, Warren Weckesser >> wrote: >> > >> > >> > On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern >> > wrote: >> >> >> >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser >> >> wrote: >> >> >> >> > A small wart in this API is the meaning of >> >> > >> >> > shuffle(a, independent=False, axis=None) >> >> > >> >> > It could be argued that the correct behavior is to leave the >> >> > array unchanged. (The current behavior can be interpreted as >> >> > shuffling a 1-d sequence of monolithic blobs; the axis argument >> >> > specifies which axis of the array corresponds to the >> >> > sequence index. Then `axis=None` means the argument is >> >> > a single monolithic blob, so there is nothing to shuffle.) >> >> > Or an error could be raised. >> >> > >> >> > What do you think? >> >> >> >> It seems to me a perfectly good reason to have two methods instead of >> >> one. I can't imagine when I wouldn't be using a literal True or False >> >> for this, so it really should be two different methods. >> >> >> > >> > >> > I agree, and my first inclination was to propose a different method (and >> > I >> > had the bikeshedding conversation with myself about the name: >> > "disarrange", >> > "scramble", "disorder", "randomize", "ashuffle", some other variation of >> > the >> > word "shuffle", ...), but I figured the first thing folks would say is >> > "Why >> > not just add options to shuffle?" So, choose your battles and all that. >> > >> > What do other folks think of making a separate method? >> >> I'm not a fan of many similar functions. >> >> What's the difference between permute, shuffle and scramble? > > > > The difference between `shuffle` and the new method being proposed is > explained in the first email in this thread. > `np.random.permutation` with an array argument returns a shuffled copy of > the array; it does not modify its argument. (It should also get an `axis` > argument when `shuffle` gets an `axis` argument.) > > >> And how do I find or remember which is which? > > > > You could start with `doc(np.random)` (or `np.random?` in ipython). If you have to check the docstring each time, then there is something wrong. In my opinion all docstrings should be read only once. It's like a Windows program where the GUI menus are not **self-explanatory**. What did Save-As do ? 
Josef > > Warren > > >> >> >> >> > >> > >> >> >> >> That said, I would just make the axis=None behavior the same for both >> >> methods. axis=None does *not* mean "treat this like a single >> >> monolithic blob" in any of the axis=-having methods; it means "flatten >> >> the array and do the operation on the single flattened axis". I think >> >> the latter behavior is a reasonable interpretation of axis=None for >> >> both methods. >> > >> > >> > >> > Sounds good to me. >> >> +1 (since all the arguments have been already given >> >> >> Josef >> - Why does sort treat columns independently instead of sorting rows? >> - because there is lexsort >> - Oh, lexsort, I haven thought about it in 5 years. It's not even next >> to sort in the pop up code completion >> >> >> > >> > Warren >> > >> > >> >> >> >> >> >> -- >> >> Robert Kern >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From warren.weckesser at gmail.com Sun Oct 12 12:14:15 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 12 Oct 2014 12:14:15 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > I created an issue on github for an enhancement > to numpy.random.shuffle: > https://github.com/numpy/numpy/issues/5173 > I'd like to get some feedback on the idea. > > Currently, `shuffle` shuffles the first dimension of an array > in-place. For example, shuffling a 2D array shuffles the rows: > > In [227]: a > Out[227]: > array([[ 0, 1, 2], > [ 3, 4, 5], > [ 6, 7, 8], > [ 9, 10, 11]]) > > In [228]: np.random.shuffle(a) > > In [229]: a > Out[229]: > array([[ 0, 1, 2], > [ 9, 10, 11], > [ 3, 4, 5], > [ 6, 7, 8]]) > > > To add an axis keyword, we could (in effect) apply `shuffle` to > `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffles > the columns: > > In [232]: a = np.arange(15).reshape(3,5) > > In [233]: a > Out[233]: > array([[ 0, 1, 2, 3, 4], > [ 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14]]) > > In [234]: axis = 1 > > In [235]: np.random.shuffle(a.swapaxes(axis, 0)) > > In [236]: a > Out[236]: > array([[ 3, 2, 4, 0, 1], > [ 8, 7, 9, 5, 6], > [13, 12, 14, 10, 11]]) > > So that's the first part--adding an `axis` keyword. > > The other part of the enhancement request is to add a shuffle > behavior that shuffles the 1-d slices *independently*. That is, > for a 2-d array, shuffling with `axis=0` would apply a different > shuffle to each column. 
In the github issue, I defined a > function called `disarrange` that implements this behavior: > > In [240]: a > Out[240]: > array([[ 0, 1, 2], > [ 3, 4, 5], > [ 6, 7, 8], > [ 9, 10, 11], > [12, 13, 14]]) > > In [241]: disarrange(a, axis=0) > > In [242]: a > Out[242]: > array([[ 6, 1, 2], > [ 3, 13, 14], > [ 9, 10, 5], > [12, 7, 8], > [ 0, 4, 11]]) > > Note that each column has been shuffled independently. > > This behavior is analogous to how `sort` handles the `axis` > keyword. `sort` sorts the 1-d slices along the given axis > independently. > > In the github issue, I suggested the following signature > for `shuffle` (but I'm not too fond of the name `independent`): > > def shuffle(a, independent=False, axis=0) > > If `independent` is False, the current behavior of `shuffle` > is used. If `independent` is True, each 1-d slice is shuffled > independently (in the same way that `sort` sorts each 1-d > slice). > > Like most functions that take an `axis` argument, `axis=None` > means to shuffle the flattened array. With `independent=True`, > it would act like `np.random.shuffle(a.flat)`, e.g. > > In [247]: a > Out[247]: > array([[ 0, 1, 2, 3, 4], > [ 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14]]) > > In [248]: np.random.shuffle(a.flat) > > In [249]: a > Out[249]: > array([[ 0, 14, 9, 1, 13], > [ 2, 8, 5, 3, 4], > [ 6, 10, 7, 12, 11]]) > > > A small wart in this API is the meaning of > > shuffle(a, independent=False, axis=None) > > It could be argued that the correct behavior is to leave the > array unchanged. (The current behavior can be interpreted as > shuffling a 1-d sequence of monolithic blobs; the axis argument > specifies which axis of the array corresponds to the > sequence index. Then `axis=None` means the argument is > a single monolithic blob, so there is nothing to shuffle.) > Or an error could be raised. > > What do you think? > > Warren > > It is clear from the comments so far that, when `axis` is None, the result should be a shuffle of all the elements in the array, for both methods of shuffling (whether implemented as a new method or with a boolean argument to `shuffle`). Forget I ever suggested doing nothing or raising an error. :) Josef's comment reminded me that `numpy.random.permutation` returns a shuffled copy of the array (when its argument is an array). This function should also get an `axis` argument. `permutation` shuffles the same way `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. If a new method is added for the new shuffling style, then it would be consistent to also add a new method that uses the new shuffling style and returns a copy of the shuffled array. Then we would then have four methods: In-place Copy Current shuffle style shuffle permutation New shuffle style (name TBD) (name TBD) (All of them will have an `axis` argument.) I suspect this will make some folks prefer the approach of adding a boolean argument to `shuffle` and `permutation`. Warren -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebix at sebix.at Sun Oct 12 12:14:33 2014 From: sebix at sebix.at (Sebastian) Date: Sun, 12 Oct 2014 18:14:33 +0200 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: <543AA8E9.9020001@sebix.at> On 2014-10-12 16:54, Warren Weckesser wrote: > > > On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern > wrote: > > On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser > > > wrote: > > > A small wart in this API is the meaning of > > > > shuffle(a, independent=False, axis=None) > > > > It could be argued that the correct behavior is to leave the > > array unchanged. (The current behavior can be interpreted as > > shuffling a 1-d sequence of monolithic blobs; the axis argument > > specifies which axis of the array corresponds to the > > sequence index. Then `axis=None` means the argument is > > a single monolithic blob, so there is nothing to shuffle.) > > Or an error could be raised. > > > > What do you think? > > It seems to me a perfectly good reason to have two methods instead of > one. I can't imagine when I wouldn't be using a literal True or False > for this, so it really should be two different methods. > > > > I agree, and my first inclination was to propose a different method > (and I had the bikeshedding conversation with myself about the name: > "disarrange", "scramble", "disorder", "randomize", "ashuffle", some > other variation of the word "shuffle", ...), but I figured the first > thing folks would say is "Why not just add options to shuffle?" So, > choose your battles and all that. > > What do other folks think of making a separate method I'm not a fan of more methods with similar functionality in Numpy. It's already hard to overlook the existing functions and all their possible applications and variants. The axis=None proposal for shuffling all items is very intuitive. I think we don't want to take the path of matlab: a huge amount of powerful functions, but few people know of their powerful possibilities. regards, Sebastian From josef.pktd at gmail.com Sun Oct 12 12:29:13 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 12 Oct 2014 12:29:13 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser wrote: > > > On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser > wrote: >> >> I created an issue on github for an enhancement >> to numpy.random.shuffle: >> https://github.com/numpy/numpy/issues/5173 >> I'd like to get some feedback on the idea. >> >> Currently, `shuffle` shuffles the first dimension of an array >> in-place. For example, shuffling a 2D array shuffles the rows: >> >> In [227]: a >> Out[227]: >> array([[ 0, 1, 2], >> [ 3, 4, 5], >> [ 6, 7, 8], >> [ 9, 10, 11]]) >> >> In [228]: np.random.shuffle(a) >> >> In [229]: a >> Out[229]: >> array([[ 0, 1, 2], >> [ 9, 10, 11], >> [ 3, 4, 5], >> [ 6, 7, 8]]) >> >> >> To add an axis keyword, we could (in effect) apply `shuffle` to >> `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffles >> the columns: >> >> In [232]: a = np.arange(15).reshape(3,5) >> >> In [233]: a >> Out[233]: >> array([[ 0, 1, 2, 3, 4], >> [ 5, 6, 7, 8, 9], >> [10, 11, 12, 13, 14]]) >> >> In [234]: axis = 1 >> >> In [235]: np.random.shuffle(a.swapaxes(axis, 0)) >> >> In [236]: a >> Out[236]: >> array([[ 3, 2, 4, 0, 1], >> [ 8, 7, 9, 5, 6], >> [13, 12, 14, 10, 11]]) >> >> So that's the first part--adding an `axis` keyword. 
>> >> The other part of the enhancement request is to add a shuffle >> behavior that shuffles the 1-d slices *independently*. That is, >> for a 2-d array, shuffling with `axis=0` would apply a different >> shuffle to each column. In the github issue, I defined a >> function called `disarrange` that implements this behavior: >> >> In [240]: a >> Out[240]: >> array([[ 0, 1, 2], >> [ 3, 4, 5], >> [ 6, 7, 8], >> [ 9, 10, 11], >> [12, 13, 14]]) >> >> In [241]: disarrange(a, axis=0) >> >> In [242]: a >> Out[242]: >> array([[ 6, 1, 2], >> [ 3, 13, 14], >> [ 9, 10, 5], >> [12, 7, 8], >> [ 0, 4, 11]]) >> >> Note that each column has been shuffled independently. >> >> This behavior is analogous to how `sort` handles the `axis` >> keyword. `sort` sorts the 1-d slices along the given axis >> independently. >> >> In the github issue, I suggested the following signature >> for `shuffle` (but I'm not too fond of the name `independent`): >> >> def shuffle(a, independent=False, axis=0) >> >> If `independent` is False, the current behavior of `shuffle` >> is used. If `independent` is True, each 1-d slice is shuffled >> independently (in the same way that `sort` sorts each 1-d >> slice). >> >> Like most functions that take an `axis` argument, `axis=None` >> means to shuffle the flattened array. With `independent=True`, >> it would act like `np.random.shuffle(a.flat)`, e.g. >> >> In [247]: a >> Out[247]: >> array([[ 0, 1, 2, 3, 4], >> [ 5, 6, 7, 8, 9], >> [10, 11, 12, 13, 14]]) >> >> In [248]: np.random.shuffle(a.flat) >> >> In [249]: a >> Out[249]: >> array([[ 0, 14, 9, 1, 13], >> [ 2, 8, 5, 3, 4], >> [ 6, 10, 7, 12, 11]]) >> >> >> A small wart in this API is the meaning of >> >> shuffle(a, independent=False, axis=None) >> >> It could be argued that the correct behavior is to leave the >> array unchanged. (The current behavior can be interpreted as >> shuffling a 1-d sequence of monolithic blobs; the axis argument >> specifies which axis of the array corresponds to the >> sequence index. Then `axis=None` means the argument is >> a single monolithic blob, so there is nothing to shuffle.) >> Or an error could be raised. >> >> What do you think? >> >> Warren >> > > > > It is clear from the comments so far that, when `axis` is None, the result > should be a shuffle of all the elements in the array, for both methods of > shuffling (whether implemented as a new method or with a boolean argument to > `shuffle`). Forget I ever suggested doing nothing or raising an error. :) > > Josef's comment reminded me that `numpy.random.permutation` which kind of proofs my point I sometimes have problems finding `shuffle` because I want a function that does permutation. Josef returns a > shuffled copy of the array (when its argument is an array). This function > should also get an `axis` argument. `permutation` shuffles the same way > `shuffle` does--it simply makes a copy and then calls `shuffle` on the copy. > If a new method is added for the new shuffling style, then it would be > consistent to also add a new method that uses the new shuffling style and > returns a copy of the shuffled array. Then we would then have four > methods: > > In-place Copy > Current shuffle style shuffle permutation > New shuffle style (name TBD) (name TBD) > > (All of them will have an `axis` argument.) > > I suspect this will make some folks prefer the approach of adding a boolean > argument to `shuffle` and `permutation`. 
> > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From warren.weckesser at gmail.com Sun Oct 12 12:29:27 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Sun, 12 Oct 2014 12:29:27 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> I created an issue on github for an enhancement >> to numpy.random.shuffle: >> https://github.com/numpy/numpy/issues/5173 >> I'd like to get some feedback on the idea. >> >> Currently, `shuffle` shuffles the first dimension of an array >> in-place. For example, shuffling a 2D array shuffles the rows: >> >> In [227]: a >> Out[227]: >> array([[ 0, 1, 2], >> [ 3, 4, 5], >> [ 6, 7, 8], >> [ 9, 10, 11]]) >> >> In [228]: np.random.shuffle(a) >> >> In [229]: a >> Out[229]: >> array([[ 0, 1, 2], >> [ 9, 10, 11], >> [ 3, 4, 5], >> [ 6, 7, 8]]) >> >> >> To add an axis keyword, we could (in effect) apply `shuffle` to >> `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffles >> the columns: >> >> In [232]: a = np.arange(15).reshape(3,5) >> >> In [233]: a >> Out[233]: >> array([[ 0, 1, 2, 3, 4], >> [ 5, 6, 7, 8, 9], >> [10, 11, 12, 13, 14]]) >> >> In [234]: axis = 1 >> >> In [235]: np.random.shuffle(a.swapaxes(axis, 0)) >> >> In [236]: a >> Out[236]: >> array([[ 3, 2, 4, 0, 1], >> [ 8, 7, 9, 5, 6], >> [13, 12, 14, 10, 11]]) >> >> So that's the first part--adding an `axis` keyword. >> >> The other part of the enhancement request is to add a shuffle >> behavior that shuffles the 1-d slices *independently*. That is, >> for a 2-d array, shuffling with `axis=0` would apply a different >> shuffle to each column. In the github issue, I defined a >> function called `disarrange` that implements this behavior: >> >> In [240]: a >> Out[240]: >> array([[ 0, 1, 2], >> [ 3, 4, 5], >> [ 6, 7, 8], >> [ 9, 10, 11], >> [12, 13, 14]]) >> >> In [241]: disarrange(a, axis=0) >> >> In [242]: a >> Out[242]: >> array([[ 6, 1, 2], >> [ 3, 13, 14], >> [ 9, 10, 5], >> [12, 7, 8], >> [ 0, 4, 11]]) >> >> Note that each column has been shuffled independently. >> >> This behavior is analogous to how `sort` handles the `axis` >> keyword. `sort` sorts the 1-d slices along the given axis >> independently. >> >> In the github issue, I suggested the following signature >> for `shuffle` (but I'm not too fond of the name `independent`): >> >> def shuffle(a, independent=False, axis=0) >> >> If `independent` is False, the current behavior of `shuffle` >> is used. If `independent` is True, each 1-d slice is shuffled >> independently (in the same way that `sort` sorts each 1-d >> slice). >> >> Like most functions that take an `axis` argument, `axis=None` >> means to shuffle the flattened array. With `independent=True`, >> it would act like `np.random.shuffle(a.flat)`, e.g. 
>> >> In [247]: a >> Out[247]: >> array([[ 0, 1, 2, 3, 4], >> [ 5, 6, 7, 8, 9], >> [10, 11, 12, 13, 14]]) >> >> In [248]: np.random.shuffle(a.flat) >> >> In [249]: a >> Out[249]: >> array([[ 0, 14, 9, 1, 13], >> [ 2, 8, 5, 3, 4], >> [ 6, 10, 7, 12, 11]]) >> >> >> A small wart in this API is the meaning of >> >> shuffle(a, independent=False, axis=None) >> >> It could be argued that the correct behavior is to leave the >> array unchanged. (The current behavior can be interpreted as >> shuffling a 1-d sequence of monolithic blobs; the axis argument >> specifies which axis of the array corresponds to the >> sequence index. Then `axis=None` means the argument is >> a single monolithic blob, so there is nothing to shuffle.) >> Or an error could be raised. >> >> What do you think? >> >> Warren >> >> > > > It is clear from the comments so far that, when `axis` is None, the result > should be a shuffle of all the elements in the array, for both methods of > shuffling (whether implemented as a new method or with a boolean argument > to `shuffle`). Forget I ever suggested doing nothing or raising an error. > :) > > Josef's comment reminded me that `numpy.random.permutation` returns a > shuffled copy of the array (when its argument is an array). This function > should also get an `axis` argument. `permutation` shuffles the same way > `shuffle` does--it simply makes a copy and then calls `shuffle` on the > copy. If a new method is added for the new shuffling style, then it would > be consistent to also add a new method that uses the new shuffling style > and returns a copy of the shuffled array. Then we would then have four > methods: > > In-place Copy > Current shuffle style shuffle permutation > New shuffle style (name TBD) (name TBD) > > (All of them will have an `axis` argument.) > > That table makes me think that, *if* we go with new methods, the names should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix that is to be determined. That will ensure that the names appear together in alphabetical lists, and should show up together as options in tab-completion or code-completion. Warren > I suspect this will make some folks prefer the approach of adding a > boolean argument to `shuffle` and `permutation`. > > Warren > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From valentin at haenel.co Sun Oct 12 12:40:33 2014 From: valentin at haenel.co (Valentin Haenel) Date: Sun, 12 Oct 2014 18:40:33 +0200 Subject: [Numpy-discussion] [ANN] bcolz 0.7.2 Message-ID: <20141012164033.GC10668@kudu.in-berlin.de> ====================== Announcing bcolz 0.7.2 ====================== What's new ========== This is a maintenance release that fixes various bits and pieces. Importantly, compatibility with Numpy 1.9 and Cython 0.21 has been fixed and the test suit no longer segfaults on 32 bit UNIX. Feature-wise a new ``carray.view()`` method has been introduced which allows carrays to share the same raw data. ``bcolz`` is a renaming of the ``carray`` project. The new goals for the project are to create simple, yet flexible compressed containers, that can live either on-disk or in-memory, and with some high-performance iterators (like `iter()`, `where()`) for querying them. 
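A minimal usage sketch of the points above (illustrative only; the exact compression figures reported depend on the data and on the Blosc settings):

import numpy as np
import bcolz

a = bcolz.carray(np.arange(1e6))   # compressed, in-memory container
b = a.view()                       # new in 0.7.2: b shares a's raw data
print(repr(a))                     # repr reports compressed vs. uncompressed size
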
Together, bcolz and the Blosc compressor, are finally fulfilling the promise of accelerating memory I/O, at least for some real scenarios: http://nbviewer.ipython.org/github/Blosc/movielens-bench/blob/master/querying-ep14.ipynb#Plots For more detailed info, see the release notes in: https://github.com/Blosc/bcolz/wiki/Release-Notes What it is ========== bcolz provides columnar and compressed data containers. Column storage allows for efficiently querying tables with a large number of columns. It also allows for cheap addition and removal of column. In addition, bcolz objects are compressed by default for reducing memory/disk I/O needs. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. bcolz can use numexpr internally so as to accelerate many vector and query operations (although it can use pure NumPy for doing so too). numexpr optimizes the memory usage and use several cores for doing the computations, so it is blazing fast. Moreover, the carray/ctable containers can be disk-based, and it is possible to use them for seamlessly performing out-of-memory computations. bcolz has minimal dependencies (NumPy), comes with an exhaustive test suite and fully supports both 32-bit and 64-bit platforms. Also, it is typically tested on both UNIX and Windows operating systems. Installing ========== bcolz is in the PyPI repository, so installing it is easy:: $ pip install -U bcolz Resources ========= Visit the main bcolz site repository at: http://github.com/Blosc/bcolz Manual: http://bcolz.blosc.org Home of Blosc compressor: http://blosc.org User's mail list: bcolz at googlegroups.com http://groups.google.com/group/bcolz License is the new BSD: https://github.com/Blosc/bcolz/blob/master/LICENSES/BCOLZ.txt ---- **Enjoy data!** From mads.ipsen at gmail.com Sun Oct 12 13:19:13 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Sun, 12 Oct 2014 19:19:13 +0200 Subject: [Numpy-discussion] Detect if array has been transposed Message-ID: <543AB811.6060104@gmail.com> Hi, In part of my C++ code, I often do void foo(PyObject * matrix) { do stuff } where matrix is a numpy mxn matrix created on the Python side, where foo() eg. is invoked as a = numpy.array([[1,2],[3,5]]) foo(a) However, if you call transpose() on a, some care should be taken, since numpy's internal matrix data first gets transposed on demand. In that case I must do a = numpy.array([[1,2],[3,5]]) a.transpose() foo(a.copy()) to make sure the correct data of the array gets transferred to the C++ side. Is there any way for me to detect (on the Python side) that transpose() has been invoked on the matrix, and thereby only do the copy operation when it really is needed? For example if a_has_transposed_data: foo(a.copy()) else: foo(a) Best regards, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. 
tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From jaime.frio at gmail.com Sun Oct 12 13:56:13 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sun, 12 Oct 2014 10:56:13 -0700 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 9:29 AM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Sun, Oct 12, 2014 at 12:14 PM, Warren Weckesser < > warren.weckesser at gmail.com> wrote: > >> >> >> On Sat, Oct 11, 2014 at 6:51 PM, Warren Weckesser < >> warren.weckesser at gmail.com> wrote: >> >>> I created an issue on github for an enhancement >>> to numpy.random.shuffle: >>> https://github.com/numpy/numpy/issues/5173 >>> I'd like to get some feedback on the idea. >>> >>> Currently, `shuffle` shuffles the first dimension of an array >>> in-place. For example, shuffling a 2D array shuffles the rows: >>> >>> In [227]: a >>> Out[227]: >>> array([[ 0, 1, 2], >>> [ 3, 4, 5], >>> [ 6, 7, 8], >>> [ 9, 10, 11]]) >>> >>> In [228]: np.random.shuffle(a) >>> >>> In [229]: a >>> Out[229]: >>> array([[ 0, 1, 2], >>> [ 9, 10, 11], >>> [ 3, 4, 5], >>> [ 6, 7, 8]]) >>> >>> >>> To add an axis keyword, we could (in effect) apply `shuffle` to >>> `a.swapaxes(axis, 0)`. For a 2-D array, `axis=1` would shuffles >>> the columns: >>> >>> In [232]: a = np.arange(15).reshape(3,5) >>> >>> In [233]: a >>> Out[233]: >>> array([[ 0, 1, 2, 3, 4], >>> [ 5, 6, 7, 8, 9], >>> [10, 11, 12, 13, 14]]) >>> >>> In [234]: axis = 1 >>> >>> In [235]: np.random.shuffle(a.swapaxes(axis, 0)) >>> >>> In [236]: a >>> Out[236]: >>> array([[ 3, 2, 4, 0, 1], >>> [ 8, 7, 9, 5, 6], >>> [13, 12, 14, 10, 11]]) >>> >>> So that's the first part--adding an `axis` keyword. >>> >>> The other part of the enhancement request is to add a shuffle >>> behavior that shuffles the 1-d slices *independently*. That is, >>> for a 2-d array, shuffling with `axis=0` would apply a different >>> shuffle to each column. In the github issue, I defined a >>> function called `disarrange` that implements this behavior: >>> >>> In [240]: a >>> Out[240]: >>> array([[ 0, 1, 2], >>> [ 3, 4, 5], >>> [ 6, 7, 8], >>> [ 9, 10, 11], >>> [12, 13, 14]]) >>> >>> In [241]: disarrange(a, axis=0) >>> >>> In [242]: a >>> Out[242]: >>> array([[ 6, 1, 2], >>> [ 3, 13, 14], >>> [ 9, 10, 5], >>> [12, 7, 8], >>> [ 0, 4, 11]]) >>> >>> Note that each column has been shuffled independently. >>> >>> This behavior is analogous to how `sort` handles the `axis` >>> keyword. `sort` sorts the 1-d slices along the given axis >>> independently. >>> >>> In the github issue, I suggested the following signature >>> for `shuffle` (but I'm not too fond of the name `independent`): >>> >>> def shuffle(a, independent=False, axis=0) >>> >>> If `independent` is False, the current behavior of `shuffle` >>> is used. If `independent` is True, each 1-d slice is shuffled >>> independently (in the same way that `sort` sorts each 1-d >>> slice). >>> >>> Like most functions that take an `axis` argument, `axis=None` >>> means to shuffle the flattened array. With `independent=True`, >>> it would act like `np.random.shuffle(a.flat)`, e.g. 
>>> >>> In [247]: a >>> Out[247]: >>> array([[ 0, 1, 2, 3, 4], >>> [ 5, 6, 7, 8, 9], >>> [10, 11, 12, 13, 14]]) >>> >>> In [248]: np.random.shuffle(a.flat) >>> >>> In [249]: a >>> Out[249]: >>> array([[ 0, 14, 9, 1, 13], >>> [ 2, 8, 5, 3, 4], >>> [ 6, 10, 7, 12, 11]]) >>> >>> >>> A small wart in this API is the meaning of >>> >>> shuffle(a, independent=False, axis=None) >>> >>> It could be argued that the correct behavior is to leave the >>> array unchanged. (The current behavior can be interpreted as >>> shuffling a 1-d sequence of monolithic blobs; the axis argument >>> specifies which axis of the array corresponds to the >>> sequence index. Then `axis=None` means the argument is >>> a single monolithic blob, so there is nothing to shuffle.) >>> Or an error could be raised. >>> >>> What do you think? >>> >>> Warren >>> >>> >> >> >> It is clear from the comments so far that, when `axis` is None, the >> result should be a shuffle of all the elements in the array, for both >> methods of shuffling (whether implemented as a new method or with a boolean >> argument to `shuffle`). Forget I ever suggested doing nothing or raising >> an error. :) >> >> Josef's comment reminded me that `numpy.random.permutation` returns a >> shuffled copy of the array (when its argument is an array). This function >> should also get an `axis` argument. `permutation` shuffles the same way >> `shuffle` does--it simply makes a copy and then calls `shuffle` on the >> copy. If a new method is added for the new shuffling style, then it would >> be consistent to also add a new method that uses the new shuffling style >> and returns a copy of the shuffled array. Then we would then have four >> methods: >> >> In-place Copy >> Current shuffle style shuffle permutation >> New shuffle style (name TBD) (name TBD) >> >> (All of them will have an `axis` argument.) >> >> > > That table makes me think that, *if* we go with new methods, the names > should be `shuffleXXX` and `permutationXXX`, where `XXX` is a common suffix > that is to be determined. That will ensure that the names appear together > in alphabetical lists, and should show up together as options in > tab-completion or code-completion. > Just to add some noise to a productive conversation: if you add a 'copy' flag to shuffle, then all the functionality is in one place, and 'permutation' can either be deprecated, or trivially implemented in terms of the new 'shuffle'. Jaime > > > Warren > > >> I suspect this will make some folks prefer the approach of adding a >> boolean argument to `shuffle` and `permutation`. >> >> Warren >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Oct 12 14:29:24 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 12 Oct 2014 21:29:24 +0300 Subject: [Numpy-discussion] Detect if array has been transposed In-Reply-To: <543AB811.6060104@gmail.com> References: <543AB811.6060104@gmail.com> Message-ID: 12.10.2014, 20:19, Mads Ipsen kirjoitti: > Is there any way for me to detect (on the Python side) that transpose() > has been invoked on the matrix, and thereby only do the copy operation > when it really is needed? 
The correct way to do this is to, either: In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if it is not. In addition, on the Python side, check for `a.flags.c_contiguous` and make a copy if it is not. OR In your C code, get an handle to the array using PyArray_FromANY (or PyArray_FromOTF) with NPY_ARRAY_C_CONTIGUOUS requirement set so that it makes a copy when necessary. From efiring at hawaii.edu Sun Oct 12 15:16:45 2014 From: efiring at hawaii.edu (Eric Firing) Date: Sun, 12 Oct 2014 09:16:45 -1000 Subject: [Numpy-discussion] Detect if array has been transposed In-Reply-To: References: <543AB811.6060104@gmail.com> Message-ID: <543AD39D.6070207@hawaii.edu> On 2014/10/12, 8:29 AM, Pauli Virtanen wrote: > 12.10.2014, 20:19, Mads Ipsen kirjoitti: >> Is there any way for me to detect (on the Python side) that transpose() >> has been invoked on the matrix, and thereby only do the copy operation >> when it really is needed? > > The correct way to do this is to, either: > > In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if > it is not. In addition, on the Python side, check for > `a.flags.c_contiguous` and make a copy if it is not. > > OR > > In your C code, get an handle to the array using PyArray_FromANY (or > PyArray_FromOTF) with NPY_ARRAY_C_CONTIGUOUS requirement set so that it > makes a copy when necessary. or let numpy handle it on the python side: foo(numpy.ascontiguousarray(a)) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From shoyer at gmail.com Sun Oct 12 15:49:51 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 12 Oct 2014 12:49:51 -0700 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: Message-ID: On Sun, Oct 12, 2014 at 10:56 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > > Just to add some noise to a productive conversation: if you add a 'copy' > flag to shuffle, then all the functionality is in one place, and > 'permutation' can either be deprecated, or trivially implemented in terms > of the new 'shuffle'. > +1 Unfortunately, shuffle has the better name, but permutation has the better default behavior. (also, I think "inplace" might be a less ambiguous name for the argument than "copy") -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Oct 12 19:07:40 2014 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 13 Oct 2014 02:07:40 +0300 Subject: [Numpy-discussion] Detect if array has been transposed In-Reply-To: <543AD39D.6070207@hawaii.edu> References: <543AB811.6060104@gmail.com> <543AD39D.6070207@hawaii.edu> Message-ID: 12.10.2014, 22:16, Eric Firing kirjoitti: > On 2014/10/12, 8:29 AM, Pauli Virtanen wrote: >> 12.10.2014, 20:19, Mads Ipsen kirjoitti: >>> Is there any way for me to detect (on the Python side) that transpose() >>> has been invoked on the matrix, and thereby only do the copy operation >>> when it really is needed? >> >> The correct way to do this is to, either: >> >> In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if >> it is not. In addition, on the Python side, check for >> `a.flags.c_contiguous` and make a copy if it is not. >> >> OR >> >> In your C code, get an handle to the array using PyArray_FromANY (or >> PyArray_FromOTF) with NPY_ARRAY_C_CONTIGUOUS requirement set so that it >> makes a copy when necessary. 
> > or let numpy handle it on the python side: > > foo(numpy.ascontiguousarray(a)) Yes, but the C code really should check that the input array is C-contiguous, if it only works for C-contiguous inputs. -- Pauli Virtanen From njs at pobox.com Sun Oct 12 19:18:37 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Oct 2014 00:18:37 +0100 Subject: [Numpy-discussion] Detect if array has been transposed In-Reply-To: References: <543AB811.6060104@gmail.com> <543AD39D.6070207@hawaii.edu> Message-ID: On Mon, Oct 13, 2014 at 12:07 AM, Pauli Virtanen wrote: > 12.10.2014, 22:16, Eric Firing kirjoitti: >> On 2014/10/12, 8:29 AM, Pauli Virtanen wrote: >>> 12.10.2014, 20:19, Mads Ipsen kirjoitti: >>>> Is there any way for me to detect (on the Python side) that transpose() >>>> has been invoked on the matrix, and thereby only do the copy operation >>>> when it really is needed? >>> >>> The correct way to do this is to, either: >>> >>> In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if >>> it is not. In addition, on the Python side, check for >>> `a.flags.c_contiguous` and make a copy if it is not. >>> >>> OR >>> >>> In your C code, get an handle to the array using PyArray_FromANY (or >>> PyArray_FromOTF) with NPY_ARRAY_C_CONTIGUOUS requirement set so that it >>> makes a copy when necessary. >> >> or let numpy handle it on the python side: >> >> foo(numpy.ascontiguousarray(a)) > > Yes, but the C code really should check that the input array is > C-contiguous, if it only works for C-contiguous inputs. I.e. your original instructions were correct, but instead of checking a.flags.c_contiguous by hand etc. the OP should just call ascontiguousarray which takes care of that part. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From lahiruts at gmail.com Sun Oct 12 20:52:06 2014 From: lahiruts at gmail.com (Lahiru Samarakoon) Date: Mon, 13 Oct 2014 08:52:06 +0800 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: References: <54381060.1040109@googlemail.com> <543817A5.4010900@googlemail.com> Message-ID: Guys, any advice is highly appreciated. I am a little new to building in Linux. Thanks, Lahiru On Sat, Oct 11, 2014 at 9:43 AM, Lahiru Samarakoon wrote: > I switched to numpy-1.8.2. . Now getting following error. I am using > LAPACK that comes with atlast installation. Can this be a problem? > > Traceback (most recent call last): > File "", line 1, in > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/__init__.py", > line 170, in > from . 
import add_newdocs > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", > line 13, in > from numpy.lib import add_newdoc > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", > line 18, in > from .polynomial import * > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/polynomial.py", > line 19, in > from numpy.linalg import eigvals, lstsq, inv > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/__init__.py", > line 51, in > from .linalg import * > File > "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", > line 29, in > from numpy.linalg import lapack_lite, _umath_linalg > ImportError: > /home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so: > undefined symbol: zgesdd_ > > On Sat, Oct 11, 2014 at 1:30 AM, Julian Taylor < > jtaylor.debian at googlemail.com> wrote: > >> On 10.10.2014 19:26, Lahiru Samarakoon wrote: >> > Red Hat Enterprise Linux release 5.8 >> > gcc (GCC) 4.1.2 >> > >> > I am also trying to install numpy 1.9. >> >> that is the broken platform, please try the master branch or the >> maintenance/1.9.x branch, those should work now. >> >> Are there volunteers to report this to redhat? >> >> > >> > On Sat, Oct 11, 2014 at 12:59 AM, Julian Taylor >> > > >> > wrote: >> > >> > On 10.10.2014 18:51, Lahiru Samarakoon wrote: >> > > Dear all, >> > > >> > > I am trying to install numpy without root access. So I am >> building from >> > > the source. I have installed atlas which also has lapack with >> it. I >> > > changed the site.cfg file as given below >> > > >> > > [DEFAULT] >> > > library_dirs = /home/svu/a0095654/ATLAS/build/lib >> > > include_dirs = /home/svu/a0095654/ATLAS/build/include >> > > >> > > >> > > However, I am getting a segmentation fault when importing numpy. >> > > >> > > Please advise. I also put the build log file at the end of the >> email if >> > > necessary. >> > >> > >> > Which platform are you working on? Which compiler version? >> > We just solved a segfault on import on red hat 5 gcc 4.1.2. Very >> likely >> > caused by a compiler bug. See >> https://github.com/numpy/numpy/issues/5163 >> > >> > The build log is complaining about your atlas being to small, >> possibly >> > the installation is broken? >> > >> > >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Oct 12 21:13:56 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 13 Oct 2014 02:13:56 +0100 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: <543AA8E9.9020001@sebix.at> References: <543AA8E9.9020001@sebix.at> Message-ID: On Sun, Oct 12, 2014 at 5:14 PM, Sebastian wrote: > > On 2014-10-12 16:54, Warren Weckesser wrote: >> >> >> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern > > wrote: >> >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser >> > >> wrote: >> >> > A small wart in this API is the meaning of >> > >> > shuffle(a, independent=False, axis=None) >> > >> > It could be argued that the correct behavior is to leave the >> > array unchanged. (The current behavior can be interpreted as >> > shuffling a 1-d sequence of monolithic blobs; the axis argument >> > specifies which axis of the array corresponds to the >> > sequence index. Then `axis=None` means the argument is >> > a single monolithic blob, so there is nothing to shuffle.) >> > Or an error could be raised. >> > >> > What do you think? 
>> >> It seems to me a perfectly good reason to have two methods instead of >> one. I can't imagine when I wouldn't be using a literal True or False >> for this, so it really should be two different methods. >> >> >> >> I agree, and my first inclination was to propose a different method >> (and I had the bikeshedding conversation with myself about the name: >> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some >> other variation of the word "shuffle", ...), but I figured the first >> thing folks would say is "Why not just add options to shuffle?" So, >> choose your battles and all that. >> >> What do other folks think of making a separate method > I'm not a fan of more methods with similar functionality in Numpy. It's > already hard to overlook the existing functions and all their possible > applications and variants. The axis=None proposal for shuffling all > items is very intuitive. > > I think we don't want to take the path of matlab: a huge amount of > powerful functions, but few people know of their powerful possibilities. I totally agree with this principle, but I think this is an exception to the rule, b/c unfortunately in this case the function that we *do* have is weird and inconsistent with how most other functions in numpy work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc (k,)->(k,) would work. Also, it's easy to implement the current 'shuffle' in terms of any 1d shuffle function, with no explicit loops, Warren's disarrange requires an explicit loop. So, we really implemented the wrong one, oops. What this means going forward, though, is that our only options are either to implement both behaviours with two functions, or else to give up on have the more natural behaviour altogether. I think the former is the lesser of two evils. Regarding names: shuffle/permutation is a terrible naming convention IMHO and shouldn't be propagated further. We already have a good naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. reversed, etc. So, how about: scramble + scrambled shuffle individual entries within each row/column/..., as in Warren's suggestion. shuffle + shuffled to do what shuffle, permutation do now (mnemonic: these break a 2d array into a bunch of 1d "cards", and then shuffle those cards). permuted remains indefinitely, with the docstring: "Deprecated alias for 'shuffled'." -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From Per.Brodtkorb at ffi.no Mon Oct 13 03:56:43 2014 From: Per.Brodtkorb at ffi.no (Per.Brodtkorb at ffi.no) Date: Mon, 13 Oct 2014 07:56:43 +0000 Subject: [Numpy-discussion] Any interest in a generalized piecewise function? In-Reply-To: <54384208.1050005@googlemail.com> References: <8114F0AADAECD745AF1FC2047A5DC7ED1E54E27E@HBU-POST2.ffi.no> <54384208.1050005@googlemail.com> Message-ID: <8114F0AADAECD745AF1FC2047A5DC7ED1E54F4BB@HBU-POST2.ffi.no> Ok, I will open a pull request. But before I do so, I would like to know what kind of pull request to make. Ideally I think the call signature for piecewise should be like this: def piecewise(condlist, funclist, xi=None, fillvalue=numpy.nan, args=(), **kw): or this: def piecewise(condlist, funclist, xi=None, args=(), **kw): The reason why I think so is that if funclist is a list of scalars then xi is not needed as input and logically should be placed as an optional third argument to the function and not as the first as numpy.piecewise currently does. 
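A small sketch of the difference (the first call is the existing numpy API; the commented lines only illustrate the proposed ordering, and nothing about the names or keywords is final):

import numpy as np

x = np.linspace(-2.5, 2.5, 6)
# current numpy.piecewise: the evaluation points come first
y = np.piecewise(x, [x < 0, x >= 0], [lambda v: -v, lambda v: v])

# proposed ordering (illustrative only): conditions and functions first,
# the inputs as an optional keyword, e.g.
#   piecewise([x < 0, x >= 0], [lambda v: -v, lambda v: v], xi=(x,))
# and with a scalar funclist no xi would be needed at all:
#   piecewise([x < 0, x >= 0], [0.0, 1.0])
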
Any of those two call signatures will break with the current one in numpy.piecewise. So is this new call signature desirable enough that we want to break backwards compatibility? Or should I just keep the current callsignature: def piecewise(xi, condlist, funclist, *args, **kw): in the pull request? Per A. -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Julian Taylor Sent: 10. oktober 2014 22:31 To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Any interest in a generalized piecewise function? On 10.10.2014 11:34, Per.Brodtkorb at ffi.no wrote: > I have worked on a generalized piecewise function (genpiecewise) that > are simpler and more general than the current numpy.piecewise > implementation. The new generalized piecewise function allows > functions of the type f(x0, x1,.. , xn) i.e. to have arbitrary number > of input arguments that are evaluated conditionally. > > The generalized piecewise function passes all the tests for > numpy.piecewise function except the undocumented features of > numpy.piecewise which allows condlist to be a single bool list/array > or a single int array. > Hi, One would think you could already pass two arguments to a function by using a 2d array but I couldn't get that to work with some short testing. So this looks like a useful improvement over the current piecewise to me. Do you want open a pull request on github to discuss the details? It would be good if it can replace the current piecewise as having two functions which do very similar things is not so nice. Cheers, Julian _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From daniele at grinta.net Mon Oct 13 07:35:18 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Mon, 13 Oct 2014 13:35:18 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy Message-ID: <543BB8F6.1070103@grinta.net> Hello, I have a C++ application that collects float, int or complex data in a possibly quite large std::vector. The application has some SWIG generated python wrappers that expose this vector to python. However, the standard way in which SWIG exposes the data is to create a touple and pass this to python, where it is very often converted to a numpy array for processing. Of course this is not efficient. I would like therefore to generate a smarter python interface. In python 3 I would without doubt go for implementing the buffer protocol, which enables seamless interfacing with numpy. However, I need to support also python 2 and there the buffer protocol is not as nice. What is the best thing to do to expose the data buffer to python 2, without introducing a dependency on numpy in the application? Bonus points if the proposed solution comes with a pointer on how to implement in in SWIG (which I don't know much and feel a bit too magic for my taste). Thank you! 
Cheers, Daniele From mads.ipsen at gmail.com Mon Oct 13 08:57:28 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Mon, 13 Oct 2014 14:57:28 +0200 Subject: [Numpy-discussion] Detect if array has been transposed In-Reply-To: References: <543AB811.6060104@gmail.com> <543AD39D.6070207@hawaii.edu> Message-ID: <543BCC38.6070605@gmail.com> On 13/10/14 01:18, Nathaniel Smith wrote: > On Mon, Oct 13, 2014 at 12:07 AM, Pauli Virtanen wrote: >> 12.10.2014, 22:16, Eric Firing kirjoitti: >>> On 2014/10/12, 8:29 AM, Pauli Virtanen wrote: >>>> 12.10.2014, 20:19, Mads Ipsen kirjoitti: >>>>> Is there any way for me to detect (on the Python side) that transpose() >>>>> has been invoked on the matrix, and thereby only do the copy operation >>>>> when it really is needed? >>>> >>>> The correct way to do this is to, either: >>>> >>>> In your C code check PyArray_IS_C_CONTIGUOUS(obj) and raise an error if >>>> it is not. In addition, on the Python side, check for >>>> `a.flags.c_contiguous` and make a copy if it is not. >>>> >>>> OR >>>> >>>> In your C code, get an handle to the array using PyArray_FromANY (or >>>> PyArray_FromOTF) with NPY_ARRAY_C_CONTIGUOUS requirement set so that it >>>> makes a copy when necessary. >>> >>> or let numpy handle it on the python side: >>> >>> foo(numpy.ascontiguousarray(a)) >> >> Yes, but the C code really should check that the input array is >> C-contiguous, if it only works for C-contiguous inputs. > > I.e. your original instructions were correct, but instead of checking > a.flags.c_contiguous by hand etc. the OP should just call > ascontiguousarray which takes care of that part. > Hi, To everybody that answered - your help is (as always) much appreciated. Best regards, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From sebastian at sipsolutions.net Mon Oct 13 14:54:28 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 13 Oct 2014 20:54:28 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <543BB8F6.1070103@grinta.net> References: <543BB8F6.1070103@grinta.net> Message-ID: <1413226468.7048.2.camel@sebastian-t440> On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: > Hello, > > I have a C++ application that collects float, int or complex data in a > possibly quite large std::vector. The application has some SWIG > generated python wrappers that expose this vector to python. However, > the standard way in which SWIG exposes the data is to create a touple > and pass this to python, where it is very often converted to a numpy > array for processing. Of course this is not efficient. > > I would like therefore to generate a smarter python interface. In python > 3 I would without doubt go for implementing the buffer protocol, which > enables seamless interfacing with numpy. However, I need to support also > python 2 and there the buffer protocol is not as nice. Isn't the new buffer protocol in python 2.6 or 2.7 already? There is at least a memoryview object in python 2, which maybe could be used to the same effect? - Sebastian > > What is the best thing to do to expose the data buffer to python 2, > without introducing a dependency on numpy in the application? 
> > Bonus points if the proposed solution comes with a pointer on how to > implement in in SWIG (which I don't know much and feel a bit too magic > for my taste). > > Thank you! > > Cheers, > Daniele > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jtaylor.debian at googlemail.com Mon Oct 13 16:10:50 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Mon, 13 Oct 2014 22:10:50 +0200 Subject: [Numpy-discussion] numpy 1.9.1 rc1 probably next weekend Message-ID: <543C31CA.2080008@googlemail.com> hi, we have collected a couple bugfixes in the 1.9 branch that should get released soon. The most important bugs fixes will be: - fix matplotlibs build (#5067) - avoid triggering a compiler bug on redhat 5, gcc 4.1.2 (#5163) - fix compatibility of npy files between py2 and py3 on win32/64 (#5170) There is still a bug in nanmedian if you have infinities in the array (#5138) which I hope will be fixed soon. After that I'd like to cut a 1.9.1rc1 with final release the week after. If there is something missing in the 1.9.x branch please report or fix it soon. Cheers, Julian From rays at blue-cove.com Mon Oct 13 18:08:01 2014 From: rays at blue-cove.com (RayS) Date: Mon, 13 Oct 2014 15:08:01 -0700 Subject: [Numpy-discussion] non-linear rebin function recipe? Message-ID: <201410132208.s9DM85S7002598@blue-cove.com> Most of my work has used the Fourrier based method for "linear" rebin of evenly sampled time data of length m (say 1500) to a new number of samples n (say 2048); the delta time change per sample is a constant over the array. I'd like to test the effect a non-constant delta t, ie, stretching some regions of data while leaving others static, and transitioning in a smooth way. - Ray Schumacher From jtaylor.debian at googlemail.com Mon Oct 13 18:32:49 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 14 Oct 2014 00:32:49 +0200 Subject: [Numpy-discussion] Any interest in a generalized piecewise function? In-Reply-To: <8114F0AADAECD745AF1FC2047A5DC7ED1E54F4BB@HBU-POST2.ffi.no> References: <8114F0AADAECD745AF1FC2047A5DC7ED1E54E27E@HBU-POST2.ffi.no> <54384208.1050005@googlemail.com> <8114F0AADAECD745AF1FC2047A5DC7ED1E54F4BB@HBU-POST2.ffi.no> Message-ID: <543C5311.2030304@googlemail.com> it probably makes a bit more sense to have the input an optional argument but I don't think its worth it to add a new function for more or less cosmetic reasons. You can still support scalars by ignoring the first argument. Fill value should be fill_value and the default 0 as thats what the current piecewise function returns. On 13.10.2014 09:56, Per.Brodtkorb at ffi.no wrote: > Ok, I will open a pull request. But before I do so, I would like to know what kind of pull request to make. > Ideally I think the call signature for piecewise should be like this: > > def piecewise(condlist, funclist, xi=None, fillvalue=numpy.nan, args=(), **kw): > > or this: > > def piecewise(condlist, funclist, xi=None, args=(), **kw): > > The reason why I think so is that if funclist is a list of scalars then xi is not needed as input and logically should be placed as an optional third argument to the function and not as the first as numpy.piecewise currently does. 
> > Any of those two call signatures will break with the current one in numpy.piecewise. > So is this new call signature desirable enough that we want to break backwards compatibility? > > Or should I just keep the current callsignature: > > def piecewise(xi, condlist, funclist, *args, **kw): > > in the pull request? > > Per A. > > -----Original Message----- > From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Julian Taylor > Sent: 10. oktober 2014 22:31 > To: Discussion of Numerical Python > Subject: Re: [Numpy-discussion] Any interest in a generalized piecewise function? > > On 10.10.2014 11:34, Per.Brodtkorb at ffi.no wrote: >> I have worked on a generalized piecewise function (genpiecewise) that >> are simpler and more general than the current numpy.piecewise >> implementation. The new generalized piecewise function allows >> functions of the type f(x0, x1,.. , xn) i.e. to have arbitrary number >> of input arguments that are evaluated conditionally. >> >> The generalized piecewise function passes all the tests for >> numpy.piecewise function except the undocumented features of >> numpy.piecewise which allows condlist to be a single bool list/array >> or a single int array. >> > > Hi, > One would think you could already pass two arguments to a function by using a 2d array but I couldn't get that to work with some short testing. > So this looks like a useful improvement over the current piecewise to me. > > Do you want open a pull request on github to discuss the details? > > It would be good if it can replace the current piecewise as having two functions which do very similar things is not so nice. > > Cheers, > Julian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Mon Oct 13 22:39:50 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 13 Oct 2014 20:39:50 -0600 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <1413226468.7048.2.camel@sebastian-t440> References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> Message-ID: On Mon, Oct 13, 2014 at 12:54 PM, Sebastian Berg wrote: > On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: > > Hello, > > > > I have a C++ application that collects float, int or complex data in a > > possibly quite large std::vector. The application has some SWIG > > generated python wrappers that expose this vector to python. However, > > the standard way in which SWIG exposes the data is to create a touple > > and pass this to python, where it is very often converted to a numpy > > array for processing. Of course this is not efficient. > > > > I would like therefore to generate a smarter python interface. In python > > 3 I would without doubt go for implementing the buffer protocol, which > > enables seamless interfacing with numpy. However, I need to support also > > python 2 and there the buffer protocol is not as nice. > > Isn't the new buffer protocol in python 2.6 or 2.7 already? There is at > least a memoryview object in python 2, which maybe could be used to the > same effect? > No memoryview in python2.6, but the older buffer protocol it there. Is Boost Python an option? 
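(Rough sketch of what the consumer side looks like under the old protocol; the bytearray is only a stand-in for whatever buffer-supporting object the wrapper would expose, and dtype/shape have to be supplied by hand because the old interface does not carry them:)

import numpy as np

buf = bytearray(6 * 8)                             # placeholder writable buffer
a = np.frombuffer(buf, dtype=np.float64).reshape(-1, 3)
a[:] = 1.0                                         # writes go straight into buf
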
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From insertinterestingnamehere at gmail.com Tue Oct 14 02:19:44 2014 From: insertinterestingnamehere at gmail.com (Ian Henriksen) Date: Tue, 14 Oct 2014 00:19:44 -0600 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> Message-ID: On Mon, Oct 13, 2014 at 8:39 PM, Charles R Harris wrote: > > > On Mon, Oct 13, 2014 at 12:54 PM, Sebastian Berg < > sebastian at sipsolutions.net> wrote: > >> On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: >> > Hello, >> > >> > I have a C++ application that collects float, int or complex data in a >> > possibly quite large std::vector. The application has some SWIG >> > generated python wrappers that expose this vector to python. However, >> > the standard way in which SWIG exposes the data is to create a touple >> > and pass this to python, where it is very often converted to a numpy >> > array for processing. Of course this is not efficient. >> > >> > I would like therefore to generate a smarter python interface. In python >> > 3 I would without doubt go for implementing the buffer protocol, which >> > enables seamless interfacing with numpy. However, I need to support also >> > python 2 and there the buffer protocol is not as nice. >> >> Isn't the new buffer protocol in python 2.6 or 2.7 already? There is at >> least a memoryview object in python 2, which maybe could be used to the >> same effect? >> > > No memoryview in python2.6, but the older buffer protocol it there. Is > Boost Python an option? > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > Here's an idea on how to avoid the buffer protocol entirely. It won't be perfectly optimal, but it is a simple solution that may be good enough. It could be that the shape and type inference and the Python loops that are run when converting the large tuple to a NumPy array are the main sticking points. In that case, the primary bottleneck could be eliminated by looping and copying at the C level. You could use Cython and just copy the needed information manually into the array. Here's a rough outline of what the Cython file might look like: # distutils: language = c++ # distutils: libraries = (whatever shared library you need) # distutils: library_dirs = (folders where the libraries are) # distutils: include_dirs = (wherever the headers are) # cython: boundscheck = False # cython: wraparound = False import numpy as np from libcpp.vector cimport vector # Here do the call to your C++ library # This can be done by declaring the # functions or whatever else you need in # a cdef extern from "header file" block # and then calling the functions to get # the vector you need. cdef int[::1] arr_from_vector(vector[int] v): cdef: int i int[::1] a = np.empty(v.size(), int) for i in xrange(v.size()): a[i] = v[i] return np.asarray(a) def get_data(): # Run your C++ computation here. return arr_from_vector(v) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan at seefeld.name Tue Oct 14 03:02:14 2014 From: stefan at seefeld.name (Stefan Seefeld) Date: Tue, 14 Oct 2014 03:02:14 -0400 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> Message-ID: On Oct 14, 2014 4:40 AM, "Charles R Harris" wrote: > > > > On Mon, Oct 13, 2014 at 12:54 PM, Sebastian Berg < sebastian at sipsolutions.net> wrote: >> >> On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: >> > Hello, >> > >> > I have a C++ application that collects float, int or complex data in a >> > possibly quite large std::vector. The application has some SWIG >> > generated python wrappers that expose this vector to python. However, >> > the standard way in which SWIG exposes the data is to create a touple >> > and pass this to python, where it is very often converted to a numpy >> > array for processing. Of course this is not efficient. >> > >> > I would like therefore to generate a smarter python interface. In python >> > 3 I would without doubt go for implementing the buffer protocol, which >> > enables seamless interfacing with numpy. However, I need to support also >> > python 2 and there the buffer protocol is not as nice. >> >> Isn't the new buffer protocol in python 2.6 or 2.7 already? There is at >> least a memoryview object in python 2, which maybe could be used to the >> same effect? > > > No memoryview in python2.6, but the older buffer protocol it there. Is Boost Python an option? > Boost.Numpy may be a useful tool to use here. Stefan > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Tue Oct 14 06:19:03 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 14 Oct 2014 12:19:03 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> Message-ID: <543CF897.4040209@grinta.net> On 14/10/14 04:39, Charles R Harris wrote: > On Mon, Oct 13, 2014 at 12:54 PM, Sebastian Berg > > wrote: > > On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: > > Hello, > > > > I have a C++ application that collects float, int or complex data in a > > possibly quite large std::vector. The application has some SWIG > > generated python wrappers that expose this vector to python. However, > > the standard way in which SWIG exposes the data is to create a touple > > and pass this to python, where it is very often converted to a numpy > > array for processing. Of course this is not efficient. > > > > I would like therefore to generate a smarter python interface. In python > > 3 I would without doubt go for implementing the buffer protocol, which > > enables seamless interfacing with numpy. However, I need to support also > > python 2 and there the buffer protocol is not as nice. > > Isn't the new buffer protocol in python 2.6 or 2.7 already? There is at > least a memoryview object in python 2, which maybe could be used to the > same effect? > > No memoryview in python2.6, but the older buffer protocol it there. Is > Boost Python an option? 
The old buffer protocol is an option, but it is much less nice than the new one, as it requires to use numpy.frombuffer() with an exlicit dtype instead of the siumpler numpy.asarray() sufficient in python 3. Boost Python may be an option as the codebase already depends on Boost, but probably not yet on Boost Python. Can you point me to the relevant documentation, and maybe to an example? One of the problems I have is that the current wrapping is done auto-magically with SWIG and I would like to deviate the less possible from that patter. Thank you! Cheers, Daniele From fadzilmnor84 at gmail.com Tue Oct 14 06:50:34 2014 From: fadzilmnor84 at gmail.com (Fadzil Mnor) Date: Tue, 14 Oct 2014 11:50:34 +0100 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file Message-ID: Hi all, I wrote a script and plot monthly mean zonal wind (from a netcdf file names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. What I have: ******************************************************************************** *#this part calculates mean values for january only, from 1980-2010; thus the index looks like this 396:757:12* def ujan(): f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') u10_1 = f.variables['uwnd'] u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) return u10_2 uJan = ujan()* #calling function* *#this part is only to define lon, lat and level * q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') lon=q.variables['lon'] lat=q.variables['lat'] lev=q.variables['level'] *#for some reason I need to define this unless it gave error "length of x must be number of column in z"* lon=lon[39:43] *#begin plotting* clevs=np.arange(-10.,10.,0.5) fig = plt.figure(figsize=(11, 8)) fig.clf() ax = fig.add_subplot(111) ax.axis([97.5, 105., 1000., 10.]) ax.tick_params(direction='out', which='both') ax.set_xlabel('Lon (degrees)') ax.set_ylabel('Pressure (mb)') ax.set_xticks(np.arange(97.5, 105., .5)) ax.set_yticks([1000, 700, 500, 300, 100, 10]) cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') plt.title('Zonal winds average (Jan, 1981-2010)') cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) aa=fig.colorbar(cs,cax=cax,orientation='vertical') aa.set_label('m/s') plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') ******************************************************************************* the result is attached. I have no idea how to confirm the result (at least until this email is written) , but I believe the lower altitude values should be mostly negative, because over this region, the zonal wind are usually easterly (thus,negative values), but I got positive values. Put the information above aside, *I just want to know if my slicing in the ujan() function is correct*. If it is, then, there must be nothing wrong(except my above mentioned assumption). The data file dimension is: *[time,level,latitude,longitude]* This part: *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* The line above will calculate the mean of zonal wind (uwnd) in a range of time index 396 to 757 for each year (january only), for all vertical level, at latitude index 38 (5 N) and in between longitude index 39 to 43 (97.5E-105E). I assume it will calculate a 30-year average of zonal wind for january only. Is this correct? Thank you. Fadzil -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: uwind-crossection-test.png Type: image/png Size: 42464 bytes Desc: not available URL: From njs at pobox.com Tue Oct 14 07:39:22 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Oct 2014 12:39:22 +0100 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <543CF897.4040209@grinta.net> References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> Message-ID: If the goal is to have something that works kind of like the new buffer protocol but with a wider variety of python versions, then you might find the old array interface useful: http://docs.scipy.org/doc/numpy/reference/arrays.interface.html I always get confused by the history here but I believe that that's the numpy-only interface that later got cleaned up and generalized to become the new buffer interface. Numpy itself still supports it. -n On 14 Oct 2014 11:19, "Daniele Nicolodi" wrote: > On 14/10/14 04:39, Charles R Harris wrote: > > On Mon, Oct 13, 2014 at 12:54 PM, Sebastian Berg > > > wrote: > > > > On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: > > > Hello, > > > > > > I have a C++ application that collects float, int or complex data > in a > > > possibly quite large std::vector. The application has some SWIG > > > generated python wrappers that expose this vector to python. > However, > > > the standard way in which SWIG exposes the data is to create a > touple > > > and pass this to python, where it is very often converted to a > numpy > > > array for processing. Of course this is not efficient. > > > > > > I would like therefore to generate a smarter python interface. In > python > > > 3 I would without doubt go for implementing the buffer protocol, > which > > > enables seamless interfacing with numpy. However, I need to > support also > > > python 2 and there the buffer protocol is not as nice. > > > > Isn't the new buffer protocol in python 2.6 or 2.7 already? There is > at > > least a memoryview object in python 2, which maybe could be used to > the > > same effect? > > > > No memoryview in python2.6, but the older buffer protocol it there. Is > > Boost Python an option? > > The old buffer protocol is an option, but it is much less nice than the > new one, as it requires to use numpy.frombuffer() with an exlicit dtype > instead of the siumpler numpy.asarray() sufficient in python 3. > > Boost Python may be an option as the codebase already depends on Boost, > but probably not yet on Boost Python. Can you point me to the relevant > documentation, and maybe to an example? One of the problems I have is > that the current wrapping is done auto-magically with SWIG and I would > like to deviate the less possible from that patter. > > Thank you! > > Cheers, > Daniele > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniele at grinta.net Tue Oct 14 08:11:25 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 14 Oct 2014 14:11:25 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> Message-ID: <543D12ED.20904@grinta.net> On 14/10/14 13:39, Nathaniel Smith wrote: > If the goal is to have something that works kind of like the new buffer > protocol but with a wider variety of python versions, then you might > find the old array interface useful: > http://docs.scipy.org/doc/numpy/reference/arrays.interface.html > > I always get confused by the history here but I believe that that's the > numpy-only interface that later got cleaned up and generalized to become > the new buffer interface. Numpy itself still supports it. Hello Nathaniel, thanks for the pointer, that's what I need. However, reading the documentation you linked, it looks like the new buffer protocol is also available in Python 2.6, and I don't have the need to support older versions, and this is confirmed by the python documentation: https://docs.python.org/2.7/c-api/buffer.html So the issue is just on how to tell SWIG to implement this interface in the wrappers it generates. Does anyone have a pointer to some relevant documentation or examples? Thanks! Cheers, Daniele From jzwinck at gmail.com Tue Oct 14 08:25:06 2014 From: jzwinck at gmail.com (John Zwinck) Date: Tue, 14 Oct 2014 20:25:06 +0800 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <543CF897.4040209@grinta.net> References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> Message-ID: On Tue, Oct 14, 2014 at 6:19 PM, Daniele Nicolodi wrote: >> On Mo, 2014-10-13 at 13:35 +0200, Daniele Nicolodi wrote: >> > I have a C++ application that collects float, int or complex data in a >> > possibly quite large std::vector. The application has some SWIG >> > generated python wrappers that expose this vector to python. However, >> > the standard way in which SWIG exposes the data is to create a touple >> > and pass this to python, where it is very often converted to a numpy >> > array for processing. Of course this is not efficient. > > Boost Python may be an option as the codebase already depends on Boost, > but probably not yet on Boost Python. Can you point me to the relevant > documentation, and maybe to an example? One of the problems I have is > that the current wrapping is done auto-magically with SWIG and I would > like to deviate the less possible from that patter. Some time ago I needed to do something similar. I fused the NumPy C API and Boost.Python with a small bit of code which I then open-sourced as part of a slightly larger library. The most relevant part for you is here: https://github.com/jzwinck/pccl/blob/master/NumPyArray.hpp In particular, it offers this function: // use already-allocated storage for the array // (it will not be initialized, and its format must match the given dtype) boost::python::object makeNumPyArrayWithData( boost::python::list const& dtype, unsigned count, void* data); What is dtype? For that you can use another small widget in my library: https://github.com/jzwinck/pccl/blob/master/NumPyDataType.hpp So the outline for your use case: you have a C array or C++ vector of contiguous plain old data. 
You create a NumPyDataType(), call append() on it with your data type, then pass all of the above to makeNumPyArrayWithData(). The result is a boost::python::object which you can return from C++ to Python with zero copies made of your data. Even if you don't decide to use Boost.Python, maybe you will find the implementation instructive. The functions I described take only a few lines. John Zwinck From daniele at grinta.net Tue Oct 14 10:17:49 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 14 Oct 2014 16:17:49 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <543D12ED.20904@grinta.net> References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> <543D12ED.20904@grinta.net> Message-ID: <543D308D.8030601@grinta.net> On 14/10/14 14:11, Daniele Nicolodi wrote: > On 14/10/14 13:39, Nathaniel Smith wrote: >> If the goal is to have something that works kind of like the new buffer >> protocol but with a wider variety of python versions, then you might >> find the old array interface useful: >> http://docs.scipy.org/doc/numpy/reference/arrays.interface.html >> >> I always get confused by the history here but I believe that that's the >> numpy-only interface that later got cleaned up and generalized to become >> the new buffer interface. Numpy itself still supports it. > > Hello Nathaniel, > > thanks for the pointer, that's what I need. > > However, reading the documentation you linked, it looks like the new > buffer protocol is also available in Python 2.6, and I don't have the > need to support older versions, and this is confirmed by the python > documentation: > > https://docs.python.org/2.7/c-api/buffer.html I found one more problem: one of the data types I would like to expose is a complex float array equivalent to numpy.complex64 dtype. However, there is not such type format string defined in the python struct module, Experimenting with numpy it seems like it extends the struct format string definition with a new modifier 'Z' for complex: >>> a = np.ones(10, dtype=np.complex64) >>> m = memoryview(a) >>> m.format 'Zf' How "standard" is that? Should I do the same in my extension? Thank you. Cheers, Daniele From mail at tsmithe.net Tue Oct 14 11:59:45 2014 From: mail at tsmithe.net (Toby St Clere Smithe) Date: Tue, 14 Oct 2014 17:59:45 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> Message-ID: <87iojmfxqm.fsf@tsmithe.net> John Zwinck writes: > Some time ago I needed to do something similar. I fused the NumPy C > API and Boost.Python with a small bit of code which I then > open-sourced as part of a slightly larger library. 
The most relevant > part for you is here: > https://github.com/jzwinck/pccl/blob/master/NumPyArray.hpp There is also a 'Boost.NumPy', which is quite nice, though perhaps a bit heavy-duty: https://github.com/ndarray/Boost.NumPy/ Cheers, Toby From charlesr.harris at gmail.com Tue Oct 14 12:46:52 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Oct 2014 10:46:52 -0600 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Sat, Oct 4, 2014 at 3:16 PM, St?fan van der Walt wrote: > On Oct 4, 2014 10:14 PM, "Derek Homeier" < > derek at astro.physik.uni-goettingen.de> wrote: > > > > +1 for an order=2 or maxorder=2 flag > > If you parameterize that flag, users will want to change its value (above > two). Perhaps rather use a boolean flag such as "second_order" or > "high_order", unless it seems feasible to include additional orders in the > future. > How about 'matlab={True, False}'. There is an open issue for this and it would be good to decide before 1.9.1 comes out. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 14 12:57:01 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Oct 2014 17:57:01 +0100 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: > > On Oct 4, 2014 10:14 PM, "Derek Homeier" < derek at astro.physik.uni-goettingen.de> wrote: > > > > +1 for an order=2 or maxorder=2 flag > > If you parameterize that flag, users will want to change its value (above two). Perhaps rather use a boolean flag such as "second_order" or "high_order", unless it seems feasible to include additional orders in the future. Predicting the future is hard :-). And in particular high_order= would create all kinds of confusion if in the future we added 3rd order approximations but high_order=True continued to mean 2nd order because of compatibility. I like maxorder (or max_order would be more pep8ish I guess) because it leaves our options open. (Similar to how it's often better to have a kwarg that can take two possible string values than to have a boolean kwarg. It makes current code more explicit and makes future enhancements easier.) -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Oct 14 13:28:49 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Oct 2014 11:28:49 -0600 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith wrote: > On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: > > > > On Oct 4, 2014 10:14 PM, "Derek Homeier" < > derek at astro.physik.uni-goettingen.de> wrote: > > > > > > +1 for an order=2 or maxorder=2 flag > > > > If you parameterize that flag, users will want to change its value > (above two). Perhaps rather use a boolean flag such as "second_order" or > "high_order", unless it seems feasible to include additional orders in the > future. 
> > Predicting the future is hard :-). And in particular high_order= would > create all kinds of confusion if in the future we added 3rd order > approximations but high_order=True continued to mean 2nd order because of > compatibility. I like maxorder (or max_order would be more pep8ish I guess) > because it leaves our options open. (Similar to how it's often better to > have a kwarg that can take two possible string values than to have a > boolean kwarg. It makes current code more explicit and makes future > enhancements easier.) > I think maxorder is a bit misleading. The both versions are second order in the interior while at the ends the old is first order and the new is second order. Maybe edge_order? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 14 13:50:24 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Oct 2014 18:50:24 +0100 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On 14 Oct 2014 18:29, "Charles R Harris" wrote: > > > > On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith wrote: >> >> On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: >> > >> > On Oct 4, 2014 10:14 PM, "Derek Homeier" < derek at astro.physik.uni-goettingen.de> wrote: >> > > >> > > +1 for an order=2 or maxorder=2 flag >> > >> > If you parameterize that flag, users will want to change its value (above two). Perhaps rather use a boolean flag such as "second_order" or "high_order", unless it seems feasible to include additional orders in the future. >> >> Predicting the future is hard :-). And in particular high_order= would create all kinds of confusion if in the future we added 3rd order approximations but high_order=True continued to mean 2nd order because of compatibility. I like maxorder (or max_order would be more pep8ish I guess) because it leaves our options open. (Similar to how it's often better to have a kwarg that can take two possible string values than to have a boolean kwarg. It makes current code more explicit and makes future enhancements easier.) > > > I think maxorder is a bit misleading. The both versions are second order in the interior while at the ends the old is first order and the new is second order. Maybe edge_order? Ah, that makes sense. edge_order makes more sense to me too then - and we can always add interior_order to complement it later, if appropriate. The other thing to decide on is the default. Is the 2nd order version generally preferred (modulo compatibility)? If so then it might make sense to keep it the default, given that there are already numpy's in the wild with that version, so we can't fully guarantee compatibility even if we wanted to. But what do others think? -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From shoyer at gmail.com Tue Oct 14 14:09:47 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 14 Oct 2014 11:09:47 -0700 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file In-Reply-To: References: Message-ID: Hi Fadzil, My strong recommendation is that you don't just use numpy/netCDF4 to process your data, but rather use one of a multitude of packages that have been developed specifically to facilitate working with labeled data from netCDF files: - Iris: http://scitools.org.uk/iris/ - CDAT: http://uvcdat.llnl.gov/ - xray (my project): http://xray.readthedocs.org I can't answer your specific question without taking a careful look at your data, but in very general terms, your code will have fewer bugs if you can use meaningful labels to refer to your data rather than numeric ranges like 396:757:12. Best, Stephan On Tue, Oct 14, 2014 at 3:50 AM, Fadzil Mnor wrote: > Hi all, > I wrote a script and plot monthly mean zonal wind (from a netcdf file > names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. What > I have: > > > ******************************************************************************** > *#this part calculates mean values for january only, from 1980-2010; thus > the index looks like this 396:757:12* > > def ujan(): > f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') > u10_1 = f.variables['uwnd'] > u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) > return u10_2 > > uJan = ujan()* #calling function* > > *#this part is only to define lon, lat and level * > q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') > lon=q.variables['lon'] > lat=q.variables['lat'] > lev=q.variables['level'] > > *#for some reason I need to define this unless it gave error "length of x > must be number of column in z"* > > lon=lon[39:43] > > *#begin plotting* > > clevs=np.arange(-10.,10.,0.5) > fig = plt.figure(figsize=(11, 8)) > fig.clf() > ax = fig.add_subplot(111) > ax.axis([97.5, 105., 1000., 10.]) > ax.tick_params(direction='out', which='both') > ax.set_xlabel('Lon (degrees)') > ax.set_ylabel('Pressure (mb)') > ax.set_xticks(np.arange(97.5, 105., .5)) > ax.set_yticks([1000, 700, 500, 300, 100, 10]) > cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') > plt.title('Zonal winds average (Jan, 1981-2010)') > cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) > aa=fig.colorbar(cs,cax=cax,orientation='vertical') > aa.set_label('m/s') > plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') > > ******************************************************************************* > > the result is attached. > I have no idea how to confirm the result (at least until this email is > written) , but I believe the lower altitude values should be mostly > negative, because over this region, the zonal wind are usually easterly > (thus,negative values), but I got positive values. > > Put the information above aside, *I just want to know if my slicing in > the ujan() function is correct*. If it is, then, there must be nothing > wrong(except my above mentioned assumption). > The data file dimension is: > *[time,level,latitude,longitude]* > > This part: > *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* > The line above will calculate the mean of zonal wind (uwnd) in a range of > time index 396 to 757 for each year (january only), for all vertical level, > at latitude index 38 (5 N) and in between longitude index 39 to 43 > (97.5E-105E). > I assume it will calculate a 30-year average of zonal wind for january > only. > Is this correct? > > Thank you. 
> > Fadzil > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fadzilmnor84 at gmail.com Tue Oct 14 14:30:14 2014 From: fadzilmnor84 at gmail.com (Fadzil Mnor) Date: Tue, 14 Oct 2014 19:30:14 +0100 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file In-Reply-To: References: Message-ID: Thank you Stephan, I've been trying to install IRIS on my laptop (OS X) for months. Errors everywhere. I'll look at that IRIS again, and other links. Cheers, Fadzil On Tue, Oct 14, 2014 at 7:09 PM, Stephan Hoyer wrote: > Hi Fadzil, > > My strong recommendation is that you don't just use numpy/netCDF4 to > process your data, but rather use one of a multitude of packages that have > been developed specifically to facilitate working with labeled data from > netCDF files: > - Iris: http://scitools.org.uk/iris/ > - CDAT: http://uvcdat.llnl.gov/ > - xray (my project): http://xray.readthedocs.org > > I can't answer your specific question without taking a careful look at > your data, but in very general terms, your code will have fewer bugs if you > can use meaningful labels to refer to your data rather than numeric ranges > like 396:757:12. > > Best, > Stephan > > > On Tue, Oct 14, 2014 at 3:50 AM, Fadzil Mnor > wrote: > >> Hi all, >> I wrote a script and plot monthly mean zonal wind (from a netcdf file >> names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. What >> I have: >> >> >> ******************************************************************************** >> *#this part calculates mean values for january only, from 1980-2010; thus >> the index looks like this 396:757:12* >> >> def ujan(): >> f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >> u10_1 = f.variables['uwnd'] >> u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) >> return u10_2 >> >> uJan = ujan()* #calling function* >> >> *#this part is only to define lon, lat and level * >> q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >> lon=q.variables['lon'] >> lat=q.variables['lat'] >> lev=q.variables['level'] >> >> *#for some reason I need to define this unless it gave error "length of x >> must be number of column in z"* >> >> lon=lon[39:43] >> >> *#begin plotting* >> >> clevs=np.arange(-10.,10.,0.5) >> fig = plt.figure(figsize=(11, 8)) >> fig.clf() >> ax = fig.add_subplot(111) >> ax.axis([97.5, 105., 1000., 10.]) >> ax.tick_params(direction='out', which='both') >> ax.set_xlabel('Lon (degrees)') >> ax.set_ylabel('Pressure (mb)') >> ax.set_xticks(np.arange(97.5, 105., .5)) >> ax.set_yticks([1000, 700, 500, 300, 100, 10]) >> cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') >> plt.title('Zonal winds average (Jan, 1981-2010)') >> cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) >> aa=fig.colorbar(cs,cax=cax,orientation='vertical') >> aa.set_label('m/s') >> plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') >> >> ******************************************************************************* >> >> the result is attached. >> I have no idea how to confirm the result (at least until this email is >> written) , but I believe the lower altitude values should be mostly >> negative, because over this region, the zonal wind are usually easterly >> (thus,negative values), but I got positive values. >> >> Put the information above aside, *I just want to know if my slicing in >> the ujan() function is correct*. 
If it is, then, there must be nothing >> wrong(except my above mentioned assumption). >> The data file dimension is: >> *[time,level,latitude,longitude]* >> >> This part: >> *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* >> The line above will calculate the mean of zonal wind (uwnd) in a range of >> time index 396 to 757 for each year (january only), for all vertical level, >> at latitude index 38 (5 N) and in between longitude index 39 to 43 >> (97.5E-105E). >> I assume it will calculate a 30-year average of zonal wind for january >> only. >> Is this correct? >> >> Thank you. >> >> Fadzil >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Oct 14 15:36:48 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 14 Oct 2014 15:36:48 -0400 Subject: [Numpy-discussion] array indexing question Message-ID: I'm using np.nonzero to construct the tuple: (array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]), array([1, 3, 5, 7, 2, 3, 6, 7, 4, 5, 6, 7])) Now what I want is the 2-D index array: [1,3,5,7, 2,3,6,7, 4,5,6,7] Any ideas? -- -- Those who don't understand recursion are doomed to repeat it From njs at pobox.com Tue Oct 14 15:44:00 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Oct 2014 20:44:00 +0100 Subject: [Numpy-discussion] array indexing question In-Reply-To: References: Message-ID: For this to work at all you have to know a priori that there are the same number of non-zero entries in each row of your mask. Given that you know that, isn't it just a matter of calling reshape on the second array? On 14 Oct 2014 20:37, "Neal Becker" wrote: > I'm using np.nonzero to construct the tuple: > (array([0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]), array([1, 3, 5, 7, 2, 3, 6, > 7, 4, > 5, 6, 7])) > > Now what I want is the 2-D index array: > > [1,3,5,7, > 2,3,6,7, > 4,5,6,7] > > Any ideas? > > -- > -- Those who don't understand recursion are doomed to repeat it > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Oct 14 17:33:14 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 14 Oct 2014 15:33:14 -0600 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith wrote: > On 14 Oct 2014 18:29, "Charles R Harris" > wrote: > > > > > > > > On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith wrote: > >> > >> On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: > >> > > >> > On Oct 4, 2014 10:14 PM, "Derek Homeier" < > derek at astro.physik.uni-goettingen.de> wrote: > >> > > > >> > > +1 for an order=2 or maxorder=2 flag > >> > > >> > If you parameterize that flag, users will want to change its value > (above two). 
Perhaps rather use a boolean flag such as "second_order" or > "high_order", unless it seems feasible to include additional orders in the > future. > >> > >> Predicting the future is hard :-). And in particular high_order= would > create all kinds of confusion if in the future we added 3rd order > approximations but high_order=True continued to mean 2nd order because of > compatibility. I like maxorder (or max_order would be more pep8ish I guess) > because it leaves our options open. (Similar to how it's often better to > have a kwarg that can take two possible string values than to have a > boolean kwarg. It makes current code more explicit and makes future > enhancements easier.) > > > > > > I think maxorder is a bit misleading. The both versions are second order > in the interior while at the ends the old is first order and the new is > second order. Maybe edge_order? > > Ah, that makes sense. edge_order makes more sense to me too then - and we > can always add interior_order to complement it later, if appropriate. > > The other thing to decide on is the default. Is the 2nd order version > generally preferred (modulo compatibility)? If so then it might make sense > to keep it the default, given that there are already numpy's in the wild > with that version, so we can't fully guarantee compatibility even if we > wanted to. But what do others think? > I'd be inclined to keep the older as the default and regard adding the keyword as a bugfix. I should have caught the incompatibility in review. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nvitacolonna at gmail.com Wed Oct 15 10:21:54 2014 From: nvitacolonna at gmail.com (Nicola) Date: Wed, 15 Oct 2014 16:21:54 +0200 Subject: [Numpy-discussion] Endianness not detected in OS X Lion Message-ID: Hi, I am trying to install numpy 1.9.0 on OS X Lion 10.7.5 and I get the same error as reported in this old thread: http://thread.gmane.org/gmane.comp.python.numeric.general/47711/ that is, numpy/core/src/npymath/npy_math_private.h:78:3: error: conflicting types for 'ieee_double_shape_type' } ieee_double_shape_type; ^ numpy/core/src/npymath/npy_math_private.h:64:3: note: previous declaration of 'ieee_double_shape_type' was here } ieee_double_shape_type; ^ In file included from numpy/core/src/npymath/npy_math.c.src:56:0: numpy/core/src/npymath/npy_math_private.h:78:3: error: conflicting types for 'ieee_double_shape_type' } ieee_double_shape_type; ^ numpy/core/src/npymath/npy_math_private.h:64:3: note: previous declaration of 'ieee_double_shape_type' was here } ieee_double_shape_type; ^ no matter whether I use the system's clang or gcc 4.9 (installed with Homebrew). Is there a workaround for this? Nicola From jtaylor.debian at googlemail.com Wed Oct 15 13:18:46 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 15 Oct 2014 19:18:46 +0200 Subject: [Numpy-discussion] Endianness not detected in OS X Lion In-Reply-To: References: Message-ID: <543EAC76.7070104@googlemail.com> On 15.10.2014 16:21, Nicola wrote: > Hi, > I am trying to install numpy 1.9.0 on OS X Lion 10.7.5 and I get the > same error as reported in this old thread: > > http://thread.gmane.org/gmane.comp.python.numeric.general/47711/ > ... > ^ > > no matter whether I use the system's clang or gcc 4.9 (installed with > Homebrew). Is there a workaround for this? > hi, you probably have an importable endian.h in your path (e.g. 
/usr/include) normally thats not the case on mac The issue should be fixed in current git master, can you give it a try? might be worthwhile to backport the fix to 1.9.1 Cheers, Julian From nvitacolonna at gmail.com Wed Oct 15 14:36:17 2014 From: nvitacolonna at gmail.com (Nicola) Date: Wed, 15 Oct 2014 20:36:17 +0200 Subject: [Numpy-discussion] Endianness not detected in OS X Lion References: <543EAC76.7070104@googlemail.com> Message-ID: In article <543EAC76.7070104 at googlemail.com>, Julian Taylor wrote: > On 15.10.2014 16:21, Nicola wrote: > > Hi, > > I am trying to install numpy 1.9.0 on OS X Lion 10.7.5 and I get the > > same error as reported in this old thread: > > > > http://thread.gmane.org/gmane.comp.python.numeric.general/47711/ > > > ... > > ^ > > > > no matter whether I use the system's clang or gcc 4.9 (installed with > > Homebrew). Is there a workaround for this? > > > > hi, > you probably have an importable endian.h in your path (e.g. > /usr/include) That is the case. And endian.h does not seem to define __BYTE_ORDER, __LITTLE_ENDIAN and __BIG_ENDIAN, but BYTE_ORDER (__DARWIN_BYTE_ORDER), LITTLE_ENDIAN (__DARWIN_LITTLE_ENDIAN) and BIG_ENDIAN (__DARWIN_BIG_ENDIAN). I was able to get a successful build by applying this patch: diff --git a/numpy/core/include/numpy/npy_endian.h b/numpy/core/include/numpy/npy_endian.h index 3ba03d0..6684a89 100644 --- a/numpy/core/include/numpy/npy_endian.h +++ b/numpy/core/include/numpy/npy_endian.h @@ -10,9 +10,9 @@ /* Use endian.h if available */ #include - #define NPY_BYTE_ORDER __BYTE_ORDER - #define NPY_LITTLE_ENDIAN __LITTLE_ENDIAN - #define NPY_BIG_ENDIAN __BIG_ENDIAN + #define NPY_BYTE_ORDER __DARWIN_BYTE_ORDER + #define NPY_LITTLE_ENDIAN __DARWIN_LITTLE_ENDIAN + #define NPY_BIG_ENDIAN __DARWIN_BIG_ENDIAN #else /* Set endianness info using target CPU */ #include "npy_cpu.h" > The issue should be fixed in current git master, can you give it a try? > might be worthwhile to backport the fix to 1.9.1 I confirm that the current master builds just fine. Thanks! Nicola From chris.barker at noaa.gov Wed Oct 15 15:44:28 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 15 Oct 2014 12:44:28 -0700 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file In-Reply-To: References: Message-ID: On Tue, Oct 14, 2014 at 11:30 AM, Fadzil Mnor wrote: > I've been trying to install IRIS on my laptop (OS X) for months. Errors > everywhere. > I'll look at that IRIS again, and other links. > IRIS has been an install challeng,e but gotten better. And you ay even find a conda package for it if you use Anaconda -- I put one up during Scipy -- they may be a newer build, though. -Chris > Cheers, > > > Fadzil > > On Tue, Oct 14, 2014 at 7:09 PM, Stephan Hoyer wrote: > >> Hi Fadzil, >> >> My strong recommendation is that you don't just use numpy/netCDF4 to >> process your data, but rather use one of a multitude of packages that have >> been developed specifically to facilitate working with labeled data from >> netCDF files: >> - Iris: http://scitools.org.uk/iris/ >> - CDAT: http://uvcdat.llnl.gov/ >> - xray (my project): http://xray.readthedocs.org >> >> I can't answer your specific question without taking a careful look at >> your data, but in very general terms, your code will have fewer bugs if you >> can use meaningful labels to refer to your data rather than numeric ranges >> like 396:757:12. 
>> >> Best, >> Stephan >> >> >> On Tue, Oct 14, 2014 at 3:50 AM, Fadzil Mnor >> wrote: >> >>> Hi all, >>> I wrote a script and plot monthly mean zonal wind (from a netcdf file >>> names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. >>> What I have: >>> >>> >>> ******************************************************************************** >>> *#this part calculates mean values for january only, from 1980-2010; >>> thus the index looks like this 396:757:12* >>> >>> def ujan(): >>> f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>> u10_1 = f.variables['uwnd'] >>> u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) >>> return u10_2 >>> >>> uJan = ujan()* #calling function* >>> >>> *#this part is only to define lon, lat and level * >>> q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>> lon=q.variables['lon'] >>> lat=q.variables['lat'] >>> lev=q.variables['level'] >>> >>> *#for some reason I need to define this unless it gave error "length of >>> x must be number of column in z"* >>> >>> lon=lon[39:43] >>> >>> *#begin plotting* >>> >>> clevs=np.arange(-10.,10.,0.5) >>> fig = plt.figure(figsize=(11, 8)) >>> fig.clf() >>> ax = fig.add_subplot(111) >>> ax.axis([97.5, 105., 1000., 10.]) >>> ax.tick_params(direction='out', which='both') >>> ax.set_xlabel('Lon (degrees)') >>> ax.set_ylabel('Pressure (mb)') >>> ax.set_xticks(np.arange(97.5, 105., .5)) >>> ax.set_yticks([1000, 700, 500, 300, 100, 10]) >>> cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') >>> plt.title('Zonal winds average (Jan, 1981-2010)') >>> cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) >>> aa=fig.colorbar(cs,cax=cax,orientation='vertical') >>> aa.set_label('m/s') >>> plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') >>> >>> ******************************************************************************* >>> >>> the result is attached. >>> I have no idea how to confirm the result (at least until this email is >>> written) , but I believe the lower altitude values should be mostly >>> negative, because over this region, the zonal wind are usually easterly >>> (thus,negative values), but I got positive values. >>> >>> Put the information above aside, *I just want to know if my slicing in >>> the ujan() function is correct*. If it is, then, there must be nothing >>> wrong(except my above mentioned assumption). >>> The data file dimension is: >>> *[time,level,latitude,longitude]* >>> >>> This part: >>> *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* >>> The line above will calculate the mean of zonal wind (uwnd) in a range >>> of time index 396 to 757 for each year (january only), for all vertical >>> level, at latitude index 38 (5 N) and in between longitude index 39 to 43 >>> (97.5E-105E). >>> I assume it will calculate a 30-year average of zonal wind for january >>> only. >>> Is this correct? >>> >>> Thank you. >>> >>> Fadzil >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Wed Oct 15 15:48:48 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 15 Oct 2014 12:48:48 -0700 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: <87iojmfxqm.fsf@tsmithe.net> References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> <87iojmfxqm.fsf@tsmithe.net> Message-ID: Sorry about SWIG -- maybe a chance to move on ;-) I'd go with Cython -- this is pretty straightforward, and it handles the buffer protocol for you under the hood. And with XDress, you can get numpy wrapped std::vector out of the box, I think: https://s3.amazonaws.com/xdress/index.html if you REALLY want to stick with SWIG, take a look a the SWIG numpy interface files -- they are designed for old-fashioned C, but you could probably adapt them. -Chris On Tue, Oct 14, 2014 at 8:59 AM, Toby St Clere Smithe wrote: > John Zwinck writes: > > Some time ago I needed to do something similar. I fused the NumPy C > > API and Boost.Python with a small bit of code which I then > > open-sourced as part of a slightly larger library. The most relevant > > part for you is here: > > https://github.com/jzwinck/pccl/blob/master/NumPyArray.hpp > > There is also a 'Boost.NumPy', which is quite nice, though perhaps a bit > heavy-duty: > > https://github.com/ndarray/Boost.NumPy/ > > Cheers, > > > Toby > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From sole at esrf.fr Wed Oct 15 16:09:45 2014 From: sole at esrf.fr (V. Armando Sole) Date: Wed, 15 Oct 2014 22:09:45 +0200 Subject: [Numpy-discussion] Best way to expose std::vector to be used with numpy In-Reply-To: References: <543BB8F6.1070103@grinta.net> <1413226468.7048.2.camel@sebastian-t440> <543CF897.4040209@grinta.net> <87iojmfxqm.fsf@tsmithe.net> Message-ID: <6c9b09436300a1a1dee9ac16d6a3a377@esrf.fr> On 15.10.2014 21:48, Chris Barker wrote: > Sorry about SWIG -- maybe a chance to move on ;-) > > I'd go with Cython -- this is pretty straightforward, and it handles > the buffer protocol for you under the hood. > +1 All the standard containers are automatically wrapped and C++ exceptions can be caught and translated to Python ones. Armando From ben.root at ou.edu Wed Oct 15 17:04:34 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Oct 2014 17:04:34 -0400 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file In-Reply-To: References: Message-ID: Stephen is being a bit modest by putting xray last in the list. I recommend it, and it is very painless to install. I could only get iris installed via a SciTools repo on binstar and even then, I had to tinker with a few things to get it working (and it was only the linux binaries, too). 
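Whatever package ends up being easiest to install, the "use labels, not magic indices" advice can already be followed with plain netCDF4 + numpy. A rough, untested sketch (the file path and variable names are taken from the original script; everything else is an assumption about this particular NCEP file, so treat it as an illustration only):

import numpy as np
import netCDF4 as nc

f = nc.Dataset('uwnd.mon.mean.nc')   # path assumed, as in the original script

# Convert the raw time coordinate to datetimes and pick the Januaries of
# 1981-2010 explicitly, instead of trusting the hard-coded stride 396:757:12.
time = f.variables['time']
dates = nc.num2date(time[:], time.units)
jan = np.array([d.month == 1 and 1981 <= d.year <= 2010 for d in dates])

# Look the latitude index up from the coordinate values instead of assuming 38;
# NCEP stores latitude from 90N down to 90S, so index 38 may land at 5S rather
# than 5N -- worth checking lat[38] directly.
lat = f.variables['lat'][:]
ilat = int(np.argmin(np.abs(lat - 5.0)))   # nearest grid point to 5N

uwnd = f.variables['uwnd']
u_jan_mean = uwnd[jan, :, ilat, 39:43].mean(axis=0)
print(u_jan_mean.shape)   # expected (levels, 4) for this longitude slice

If the boolean mask and the 396:757:12 stride pick out the same 30 time steps, the original slicing was right; if not, the dates show exactly where it went wrong.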
Ben Root On Wed, Oct 15, 2014 at 3:44 PM, Chris Barker wrote: > On Tue, Oct 14, 2014 at 11:30 AM, Fadzil Mnor > wrote: > >> I've been trying to install IRIS on my laptop (OS X) for months. Errors >> everywhere. >> I'll look at that IRIS again, and other links. >> > > IRIS has been an install challeng,e but gotten better. > > And you ay even find a conda package for it if you use Anaconda -- I put > one up during Scipy -- they may be a newer build, though. > > -Chris > > > > >> Cheers, >> >> >> Fadzil >> >> On Tue, Oct 14, 2014 at 7:09 PM, Stephan Hoyer wrote: >> >>> Hi Fadzil, >>> >>> My strong recommendation is that you don't just use numpy/netCDF4 to >>> process your data, but rather use one of a multitude of packages that have >>> been developed specifically to facilitate working with labeled data from >>> netCDF files: >>> - Iris: http://scitools.org.uk/iris/ >>> - CDAT: http://uvcdat.llnl.gov/ >>> - xray (my project): http://xray.readthedocs.org >>> >>> I can't answer your specific question without taking a careful look at >>> your data, but in very general terms, your code will have fewer bugs if you >>> can use meaningful labels to refer to your data rather than numeric ranges >>> like 396:757:12. >>> >>> Best, >>> Stephan >>> >>> >>> On Tue, Oct 14, 2014 at 3:50 AM, Fadzil Mnor >>> wrote: >>> >>>> Hi all, >>>> I wrote a script and plot monthly mean zonal wind (from a netcdf file >>>> names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. >>>> What I have: >>>> >>>> >>>> ******************************************************************************** >>>> *#this part calculates mean values for january only, from 1980-2010; >>>> thus the index looks like this 396:757:12* >>>> >>>> def ujan(): >>>> f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>>> u10_1 = f.variables['uwnd'] >>>> u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) >>>> return u10_2 >>>> >>>> uJan = ujan()* #calling function* >>>> >>>> *#this part is only to define lon, lat and level * >>>> q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>>> lon=q.variables['lon'] >>>> lat=q.variables['lat'] >>>> lev=q.variables['level'] >>>> >>>> *#for some reason I need to define this unless it gave error "length of >>>> x must be number of column in z"* >>>> >>>> lon=lon[39:43] >>>> >>>> *#begin plotting* >>>> >>>> clevs=np.arange(-10.,10.,0.5) >>>> fig = plt.figure(figsize=(11, 8)) >>>> fig.clf() >>>> ax = fig.add_subplot(111) >>>> ax.axis([97.5, 105., 1000., 10.]) >>>> ax.tick_params(direction='out', which='both') >>>> ax.set_xlabel('Lon (degrees)') >>>> ax.set_ylabel('Pressure (mb)') >>>> ax.set_xticks(np.arange(97.5, 105., .5)) >>>> ax.set_yticks([1000, 700, 500, 300, 100, 10]) >>>> cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') >>>> plt.title('Zonal winds average (Jan, 1981-2010)') >>>> cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) >>>> aa=fig.colorbar(cs,cax=cax,orientation='vertical') >>>> aa.set_label('m/s') >>>> plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') >>>> >>>> ******************************************************************************* >>>> >>>> the result is attached. >>>> I have no idea how to confirm the result (at least until this email is >>>> written) , but I believe the lower altitude values should be mostly >>>> negative, because over this region, the zonal wind are usually easterly >>>> (thus,negative values), but I got positive values. >>>> >>>> Put the information above aside, *I just want to know if my slicing in >>>> the ujan() function is correct*. 
If it is, then, there must be nothing >>>> wrong(except my above mentioned assumption). >>>> The data file dimension is: >>>> *[time,level,latitude,longitude]* >>>> >>>> This part: >>>> *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* >>>> The line above will calculate the mean of zonal wind (uwnd) in a range >>>> of time index 396 to 757 for each year (january only), for all vertical >>>> level, at latitude index 38 (5 N) and in between longitude index 39 to 43 >>>> (97.5E-105E). >>>> I assume it will calculate a 30-year average of zonal wind for january >>>> only. >>>> Is this correct? >>>> >>>> Thank you. >>>> >>>> Fadzil >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From fadzilmnor84 at gmail.com Wed Oct 15 17:20:58 2014 From: fadzilmnor84 at gmail.com (Fadzil Mnor) Date: Wed, 15 Oct 2014 22:20:58 +0100 Subject: [Numpy-discussion] numpy.mean slicing in a netCDF file In-Reply-To: References: Message-ID: Thanks for confirming that I'm not the only one having trouble with IRIS installation. Such a pain! Back to the first question, I figured that the NCEP Reanalysis data has the y axis "from 90N to 90S", means the indexing started from north (90), not south (-90), which means that my calculation was on 5 S instead of 5 N. Fadzil Postgraduate Student Room 1U09 - Dept of Meteorology University of Reading, Earley Gate Reading RG6 6BB, UK On Wed, Oct 15, 2014 at 10:04 PM, Benjamin Root wrote: > Stephen is being a bit modest by putting xray last in the list. I > recommend it, and it is very painless to install. I could only get iris > installed via a SciTools repo on binstar and even then, I had to tinker > with a few things to get it working (and it was only the linux binaries, > too). > > Ben Root > > On Wed, Oct 15, 2014 at 3:44 PM, Chris Barker > wrote: > >> On Tue, Oct 14, 2014 at 11:30 AM, Fadzil Mnor >> wrote: >> >>> I've been trying to install IRIS on my laptop (OS X) for months. Errors >>> everywhere. >>> I'll look at that IRIS again, and other links. >>> >> >> IRIS has been an install challeng,e but gotten better. >> >> And you ay even find a conda package for it if you use Anaconda -- I put >> one up during Scipy -- they may be a newer build, though. 
>> >> -Chris >> >> >> >> >>> Cheers, >>> >>> >>> Fadzil >>> >>> On Tue, Oct 14, 2014 at 7:09 PM, Stephan Hoyer wrote: >>> >>>> Hi Fadzil, >>>> >>>> My strong recommendation is that you don't just use numpy/netCDF4 to >>>> process your data, but rather use one of a multitude of packages that have >>>> been developed specifically to facilitate working with labeled data from >>>> netCDF files: >>>> - Iris: http://scitools.org.uk/iris/ >>>> - CDAT: http://uvcdat.llnl.gov/ >>>> - xray (my project): http://xray.readthedocs.org >>>> >>>> I can't answer your specific question without taking a careful look at >>>> your data, but in very general terms, your code will have fewer bugs if you >>>> can use meaningful labels to refer to your data rather than numeric ranges >>>> like 396:757:12. >>>> >>>> Best, >>>> Stephan >>>> >>>> >>>> On Tue, Oct 14, 2014 at 3:50 AM, Fadzil Mnor >>>> wrote: >>>> >>>>> Hi all, >>>>> I wrote a script and plot monthly mean zonal wind (from a netcdf file >>>>> names uwnd.mon.mean.nc) and I'm not sure if I'm doing it correctly. >>>>> What I have: >>>>> >>>>> >>>>> ******************************************************************************** >>>>> *#this part calculates mean values for january only, from 1980-2010; >>>>> thus the index looks like this 396:757:12* >>>>> >>>>> def ujan(): >>>>> f = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>>>> u10_1 = f.variables['uwnd'] >>>>> u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0) >>>>> return u10_2 >>>>> >>>>> uJan = ujan()* #calling function* >>>>> >>>>> *#this part is only to define lon, lat and level * >>>>> q = nc.Dataset('~/data/ncep/uwnd.mon.mean.nc') >>>>> lon=q.variables['lon'] >>>>> lat=q.variables['lat'] >>>>> lev=q.variables['level'] >>>>> >>>>> *#for some reason I need to define this unless it gave error "length >>>>> of x must be number of column in z"* >>>>> >>>>> lon=lon[39:43] >>>>> >>>>> *#begin plotting* >>>>> >>>>> clevs=np.arange(-10.,10.,0.5) >>>>> fig = plt.figure(figsize=(11, 8)) >>>>> fig.clf() >>>>> ax = fig.add_subplot(111) >>>>> ax.axis([97.5, 105., 1000., 10.]) >>>>> ax.tick_params(direction='out', which='both') >>>>> ax.set_xlabel('Lon (degrees)') >>>>> ax.set_ylabel('Pressure (mb)') >>>>> ax.set_xticks(np.arange(97.5, 105., .5)) >>>>> ax.set_yticks([1000, 700, 500, 300, 100, 10]) >>>>> cs=ax.contourf(lon, lev, uJan, clevs, extend='both',cmap='seismic') >>>>> plt.title('Zonal winds average (Jan, 1981-2010)') >>>>> cax = fig.add_axes([0.99, 0.1, 0.03, 0.8]) >>>>> aa=fig.colorbar(cs,cax=cax,orientation='vertical') >>>>> aa.set_label('m/s') >>>>> plt.savefig('~/uwind-crossection-test.png', bbox_inches='tight') >>>>> >>>>> ******************************************************************************* >>>>> >>>>> the result is attached. >>>>> I have no idea how to confirm the result (at least until this email is >>>>> written) , but I believe the lower altitude values should be mostly >>>>> negative, because over this region, the zonal wind are usually easterly >>>>> (thus,negative values), but I got positive values. >>>>> >>>>> Put the information above aside, *I just want to know if my slicing >>>>> in the ujan() function is correct*. If it is, then, there must be >>>>> nothing wrong(except my above mentioned assumption). 
>>>>> The data file dimension is: >>>>> *[time,level,latitude,longitude]* >>>>> >>>>> This part: >>>>> *u10_2 = np.mean(u10_1[396:757:12,:,38,39:43],axis=0)* >>>>> The line above will calculate the mean of zonal wind (uwnd) in a range >>>>> of time index 396 to 757 for each year (january only), for all vertical >>>>> level, at latitude index 38 (5 N) and in between longitude index 39 to 43 >>>>> (97.5E-105E). >>>>> I assume it will calculate a 30-year average of zonal wind for january >>>>> only. >>>>> Is this correct? >>>>> >>>>> Thank you. >>>>> >>>>> Fadzil >>>>> >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> >> Christopher Barker, Ph.D. >> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Thu Oct 16 11:39:12 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 16 Oct 2014 11:39:12 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: > On Sun, Oct 12, 2014 at 5:14 PM, Sebastian wrote: > > > > On 2014-10-12 16:54, Warren Weckesser wrote: > >> > >> > >> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern >> > wrote: > >> > >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser > >> > > >> wrote: > >> > >> > A small wart in this API is the meaning of > >> > > >> > shuffle(a, independent=False, axis=None) > >> > > >> > It could be argued that the correct behavior is to leave the > >> > array unchanged. (The current behavior can be interpreted as > >> > shuffling a 1-d sequence of monolithic blobs; the axis argument > >> > specifies which axis of the array corresponds to the > >> > sequence index. Then `axis=None` means the argument is > >> > a single monolithic blob, so there is nothing to shuffle.) > >> > Or an error could be raised. > >> > > >> > What do you think? > >> > >> It seems to me a perfectly good reason to have two methods instead > of > >> one. I can't imagine when I wouldn't be using a literal True or > False > >> for this, so it really should be two different methods. 
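(For concreteness, the two behaviours being discussed are easy to sketch in plain numpy. This is only an illustration of the semantics -- not the proposed API and not how an in-place mtrand version would be implemented; the argsort-over-random-keys trick is just one way to emulate a per-row shuffle today.)

import numpy as np

a = np.arange(12).reshape(3, 4)

# What np.random.shuffle does today: permute whole rows (the 1-d "cards").
rows = a.copy()
np.random.shuffle(rows)

# The "independent" behaviour: shuffle entries within each row separately,
# emulated by reordering every row with its own array of random keys.
keys = np.random.random(a.shape)
order = np.argsort(keys, axis=1)
scrambled = a[np.arange(a.shape[0])[:, None], order]

print(rows)        # rows kept intact, their order randomized
print(scrambled)   # entries within each row independently permuted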
> >> > >> > >> > >> I agree, and my first inclination was to propose a different method > >> (and I had the bikeshedding conversation with myself about the name: > >> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some > >> other variation of the word "shuffle", ...), but I figured the first > >> thing folks would say is "Why not just add options to shuffle?" So, > >> choose your battles and all that. > >> > >> What do other folks think of making a separate method > > I'm not a fan of more methods with similar functionality in Numpy. It's > > already hard to overlook the existing functions and all their possible > > applications and variants. The axis=None proposal for shuffling all > > items is very intuitive. > > > > I think we don't want to take the path of matlab: a huge amount of > > powerful functions, but few people know of their powerful possibilities. > > I totally agree with this principle, but I think this is an exception > to the rule, b/c unfortunately in this case the function that we *do* > have is weird and inconsistent with how most other functions in numpy > work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc > (k,)->(k,) would work. Also, it's easy to implement the current > 'shuffle' in terms of any 1d shuffle function, with no explicit loops, > Warren's disarrange requires an explicit loop. So, we really > implemented the wrong one, oops. What this means going forward, > though, is that our only options are either to implement both > behaviours with two functions, or else to give up on have the more > natural behaviour altogether. I think the former is the lesser of two > evils. > > Regarding names: shuffle/permutation is a terrible naming convention > IMHO and shouldn't be propagated further. We already have a good > naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. > reversed, etc. > > So, how about: > > scramble + scrambled shuffle individual entries within each > row/column/..., as in Warren's suggestion. > > shuffle + shuffled to do what shuffle, permutation do now (mnemonic: > these break a 2d array into a bunch of 1d "cards", and then shuffle > those cards). > > permuted remains indefinitely, with the docstring: "Deprecated alias > for 'shuffled'." > > That sounds good to me. (I might go with 'randomize' instead of 'scramble', but that's a second-order decision for the API.) Warren -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Thu Oct 16 11:58:53 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 16 Oct 2014 08:58:53 -0700 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 8:39 AM, Warren Weckesser < warren.weckesser at gmail.com> wrote: > > > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: > >> On Sun, Oct 12, 2014 at 5:14 PM, Sebastian wrote: >> > >> > On 2014-10-12 16:54, Warren Weckesser wrote: >> >> >> >> >> >> On Sun, Oct 12, 2014 at 7:57 AM, Robert Kern > >> > wrote: >> >> >> >> On Sat, Oct 11, 2014 at 11:51 PM, Warren Weckesser >> >> > >> >> wrote: >> >> >> >> > A small wart in this API is the meaning of >> >> > >> >> > shuffle(a, independent=False, axis=None) >> >> > >> >> > It could be argued that the correct behavior is to leave the >> >> > array unchanged. (The current behavior can be interpreted as >> >> > shuffling a 1-d sequence of monolithic blobs; the axis argument >> >> > specifies which axis of the array corresponds to the >> >> > sequence index. Then `axis=None` means the argument is >> >> > a single monolithic blob, so there is nothing to shuffle.) >> >> > Or an error could be raised. >> >> > >> >> > What do you think? >> >> >> >> It seems to me a perfectly good reason to have two methods instead >> of >> >> one. I can't imagine when I wouldn't be using a literal True or >> False >> >> for this, so it really should be two different methods. >> >> >> >> >> >> >> >> I agree, and my first inclination was to propose a different method >> >> (and I had the bikeshedding conversation with myself about the name: >> >> "disarrange", "scramble", "disorder", "randomize", "ashuffle", some >> >> other variation of the word "shuffle", ...), but I figured the first >> >> thing folks would say is "Why not just add options to shuffle?" So, >> >> choose your battles and all that. >> >> >> >> What do other folks think of making a separate method >> > I'm not a fan of more methods with similar functionality in Numpy. It's >> > already hard to overlook the existing functions and all their possible >> > applications and variants. The axis=None proposal for shuffling all >> > items is very intuitive. >> > >> > I think we don't want to take the path of matlab: a huge amount of >> > powerful functions, but few people know of their powerful possibilities. >> >> I totally agree with this principle, but I think this is an exception >> to the rule, b/c unfortunately in this case the function that we *do* >> have is weird and inconsistent with how most other functions in numpy >> work. It doesn't vectorize! Cf. 'sort' or how a 'shuffle' gufunc >> (k,)->(k,) would work. Also, it's easy to implement the current >> 'shuffle' in terms of any 1d shuffle function, with no explicit loops, >> Warren's disarrange requires an explicit loop. So, we really >> implemented the wrong one, oops. What this means going forward, >> though, is that our only options are either to implement both >> behaviours with two functions, or else to give up on have the more >> natural behaviour altogether. I think the former is the lesser of two >> evils. >> >> Regarding names: shuffle/permutation is a terrible naming convention >> IMHO and shouldn't be propagated further. We already have a good >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >> reversed, etc. 
>> >> So, how about: >> >> scramble + scrambled shuffle individual entries within each >> row/column/..., as in Warren's suggestion. >> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >> these break a 2d array into a bunch of 1d "cards", and then shuffle >> those cards). >> >> permuted remains indefinitely, with the docstring: "Deprecated alias >> for 'shuffled'." >> >> > > That sounds good to me. (I might go with 'randomize' instead of > 'scramble', but that's a second-order decision for the API.) > So the only little detail left is someone actually rolling up his/her sleeves and creating a PR... ;-) The current shuffle and permutation are implemented here: https://github.com/numpy/numpy/blob/master/numpy/random/mtrand/mtrand.pyx#L4551 It's in Cython, so it is a good candidate for anyone wanting to contribute to numpy, but wary of C code. Jaime > > > Warren > > > -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Oct 16 12:40:27 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Oct 2014 17:40:27 +0100 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser wrote: > > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: >> >> Regarding names: shuffle/permutation is a terrible naming convention >> IMHO and shouldn't be propagated further. We already have a good >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >> reversed, etc. >> >> So, how about: >> >> scramble + scrambled shuffle individual entries within each >> row/column/..., as in Warren's suggestion. >> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >> these break a 2d array into a bunch of 1d "cards", and then shuffle >> those cards). >> >> permuted remains indefinitely, with the docstring: "Deprecated alias >> for 'shuffled'." > > That sounds good to me. (I might go with 'randomize' instead of 'scramble', > but that's a second-order decision for the API.) I hesitate to use names like "randomize" because they're less informative than they feel seem -- if asked what this operation does to an array, then it would be natural to say "it randomizes the array". But if told that the random module has a function called randomize, then that's not very informative -- everything in random randomizes something somehow. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Thu Oct 16 13:22:26 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Oct 2014 18:22:26 +0100 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Tue, Oct 14, 2014 at 10:33 PM, Charles R Harris wrote: > > On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith wrote: >> >> On 14 Oct 2014 18:29, "Charles R Harris" >> wrote: >> > >> > >> > >> > On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith wrote: >> >> >> >> On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: >> >> > >> >> > On Oct 4, 2014 10:14 PM, "Derek Homeier" >> >> > wrote: >> >> > > >> >> > > +1 for an order=2 or maxorder=2 flag >> >> > >> >> > If you parameterize that flag, users will want to change its value >> >> > (above two). Perhaps rather use a boolean flag such as "second_order" or >> >> > "high_order", unless it seems feasible to include additional orders in the >> >> > future. >> >> >> >> Predicting the future is hard :-). And in particular high_order= would >> >> create all kinds of confusion if in the future we added 3rd order >> >> approximations but high_order=True continued to mean 2nd order because of >> >> compatibility. I like maxorder (or max_order would be more pep8ish I guess) >> >> because it leaves our options open. (Similar to how it's often better to >> >> have a kwarg that can take two possible string values than to have a boolean >> >> kwarg. It makes current code more explicit and makes future enhancements >> >> easier.) >> > >> > >> > I think maxorder is a bit misleading. The both versions are second order >> > in the interior while at the ends the old is first order and the new is >> > second order. Maybe edge_order? >> >> Ah, that makes sense. edge_order makes more sense to me too then - and we >> can always add interior_order to complement it later, if appropriate. >> >> The other thing to decide on is the default. Is the 2nd order version >> generally preferred (modulo compatibility)? If so then it might make sense >> to keep it the default, given that there are already numpy's in the wild >> with that version, so we can't fully guarantee compatibility even if we >> wanted to. But what do others think? > > I'd be inclined to keep the older as the default and regard adding the > keyword as a bugfix. I should have caught the incompatibility in review. I don't have any code that uses gradient, so I don't have a great sense of the trade-offs here. - Usually if we have a change that produces increased accuracy, we make the increased accuracy the default. Otherwise no-one ever uses it, and everyone gets less accurate results than they would otherwise. (I don't have a great sense of how much this change affects accuracy though.) - If the change in output per se is a serious problem for people, then it's not one we can fix at this point -- 1.9.0 is out there and people are using it anyway, so those who have the problem already need to take some affirmative action to fix it. (Also, it's kinda weird to change a function's behaviour and add a new argument in a point release!) So I'd like to hear from people affected by this -- would you prefer to have the 2nd order boundary calculations by default, you just need some way to workaround the immediate problems in existing code? 
Or do you prefer the old default remain the default, with 2nd order boundary calculations something that must be requested by hand every time? -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From warren.weckesser at gmail.com Thu Oct 16 13:30:59 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Thu, 16 Oct 2014 13:30:59 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith wrote: > On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser > wrote: > > > > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: > >> > >> Regarding names: shuffle/permutation is a terrible naming convention > >> IMHO and shouldn't be propagated further. We already have a good > >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. > >> reversed, etc. > >> > >> So, how about: > >> > >> scramble + scrambled shuffle individual entries within each > >> row/column/..., as in Warren's suggestion. > >> > >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: > >> these break a 2d array into a bunch of 1d "cards", and then shuffle > >> those cards). > >> > >> permuted remains indefinitely, with the docstring: "Deprecated alias > >> for 'shuffled'." > > > > That sounds good to me. (I might go with 'randomize' instead of > 'scramble', > > but that's a second-order decision for the API.) > > I hesitate to use names like "randomize" because they're less > informative than they feel seem -- if asked what this operation does > to an array, then it would be natural to say "it randomizes the > array". But if told that the random module has a function called > randomize, then that's not very informative -- everything in random > randomizes something somehow. > > I had some similar concerns (hence my original "disarrange"), but "randomize" seemed more likely to be found when searching or browsing the docs, and while it might be a bit too generic-sounding, it does feel like a natural verb for the process. On the other hand, "permute" and "permuted" are even more natural and unambiguous. Any objections to those? (The existing function is "permutation".) Whatever the names, the docstrings for the four functions should be cross-referenced in their "See Also" sections to help users find the appropriate function. By the way, "permutation" has a feature not yet mentioned here: if the argument is an integer 'n', it generates a permutation of arange(n). In this case, it acts like matlab's "randperm" function. Unless we replicate that in the new function, we shouldn't deprecate "permutation". Warren > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
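A quick illustration of that integer-argument behaviour next to the current array behaviour, for reference:

import numpy as np

rng = np.random.RandomState(0)

rng.permutation(5)           # a random permutation of arange(5), like randperm
rng.permutation(np.eye(3))   # a copy of the array with its rows shuffled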
URL: From njs at pobox.com Thu Oct 16 15:39:38 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Oct 2014 20:39:38 +0100 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser wrote: > > > On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith wrote: >> >> On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser >> wrote: >> > >> > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: >> >> >> >> Regarding names: shuffle/permutation is a terrible naming convention >> >> IMHO and shouldn't be propagated further. We already have a good >> >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >> >> reversed, etc. >> >> >> >> So, how about: >> >> >> >> scramble + scrambled shuffle individual entries within each >> >> row/column/..., as in Warren's suggestion. >> >> >> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >> >> these break a 2d array into a bunch of 1d "cards", and then shuffle >> >> those cards). >> >> >> >> permuted remains indefinitely, with the docstring: "Deprecated alias >> >> for 'shuffled'." >> > >> > That sounds good to me. (I might go with 'randomize' instead of >> > 'scramble', >> > but that's a second-order decision for the API.) >> >> I hesitate to use names like "randomize" because they're less >> informative than they feel seem -- if asked what this operation does >> to an array, then it would be natural to say "it randomizes the >> array". But if told that the random module has a function called >> randomize, then that's not very informative -- everything in random >> randomizes something somehow. > > I had some similar concerns (hence my original "disarrange"), but > "randomize" seemed more likely to be found when searching or browsing the > docs, and while it might be a bit too generic-sounding, it does feel like a > natural verb for the process. On the other hand, "permute" and "permuted" > are even more natural and unambiguous. Any objections to those? (The > existing function is "permutation".) [...] > By the way, "permutation" has a feature not yet mentioned here: if the > argument is an integer 'n', it generates a permutation of arange(n). In > this case, it acts like matlab's "randperm" function. Unless we replicate > that in the new function, we shouldn't deprecate "permutation". I guess we could do something like: permutation(n): Return a random permutation on n items. Equivalent to permuted(arange(n)). Note: for backwards compatibility, a call like permutation(an_array) currently returns the same as shuffled(an_array). (This is *not* equivalent to permuted(an_array).) This functionality is deprecated. OTOH "np.random.permute" as a name does have a downside: someday we'll probably add a function called "np.permute" (for applying a given permutation in place -- the O(n) algorithm for this is useful and tricky), and having two functions with the same name and very different semantics would be pretty confusing. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From arokem at gmail.com Thu Oct 16 18:10:24 2014 From: arokem at gmail.com (Ariel Rokem) Date: Thu, 16 Oct 2014 15:10:24 -0700 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Thu, Oct 16, 2014 at 10:22 AM, Nathaniel Smith wrote: > On Tue, Oct 14, 2014 at 10:33 PM, Charles R Harris > wrote: > > > > On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith wrote: > >> > >> On 14 Oct 2014 18:29, "Charles R Harris" > >> wrote: > >> > > >> > > >> > > >> > On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith > wrote: > >> >> > >> >> On 4 Oct 2014 22:17, "St?fan van der Walt" wrote: > >> >> > > >> >> > On Oct 4, 2014 10:14 PM, "Derek Homeier" > >> >> > wrote: > >> >> > > > >> >> > > +1 for an order=2 or maxorder=2 flag > >> >> > > >> >> > If you parameterize that flag, users will want to change its value > >> >> > (above two). Perhaps rather use a boolean flag such as > "second_order" or > >> >> > "high_order", unless it seems feasible to include additional > orders in the > >> >> > future. > >> >> > >> >> Predicting the future is hard :-). And in particular high_order= > would > >> >> create all kinds of confusion if in the future we added 3rd order > >> >> approximations but high_order=True continued to mean 2nd order > because of > >> >> compatibility. I like maxorder (or max_order would be more pep8ish I > guess) > >> >> because it leaves our options open. (Similar to how it's often > better to > >> >> have a kwarg that can take two possible string values than to have a > boolean > >> >> kwarg. It makes current code more explicit and makes future > enhancements > >> >> easier.) > >> > > >> > > >> > I think maxorder is a bit misleading. The both versions are second > order > >> > in the interior while at the ends the old is first order and the new > is > >> > second order. Maybe edge_order? > >> > >> Ah, that makes sense. edge_order makes more sense to me too then - and > we > >> can always add interior_order to complement it later, if appropriate. > >> > >> The other thing to decide on is the default. Is the 2nd order version > >> generally preferred (modulo compatibility)? If so then it might make > sense > >> to keep it the default, given that there are already numpy's in the wild > >> with that version, so we can't fully guarantee compatibility even if we > >> wanted to. But what do others think? > > > > I'd be inclined to keep the older as the default and regard adding the > > keyword as a bugfix. I should have caught the incompatibility in review. > > I don't have any code that uses gradient, so I don't have a great > sense of the trade-offs here. > > - Usually if we have a change that produces increased accuracy, we > make the increased accuracy the default. Otherwise no-one ever uses > it, and everyone gets less accurate results than they would otherwise. > (I don't have a great sense of how much this change affects accuracy > though.) > > - If the change in output per se is a serious problem for people, then > it's not one we can fix at this point -- 1.9.0 is out there and people > are using it anyway, so those who have the problem already need to > take some affirmative action to fix it. (Also, it's kinda weird to > change a function's behaviour and add a new argument in a point > release!) 
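For concreteness, the difference at the boundary is that between a first-order and a second-order one-sided difference. This is not taken from the numpy source, just the textbook form of the two stencils:

import numpy as np

x = np.linspace(0.0, 1.0, 6)
y = x**2                      # true derivative at x = 0 is 0
h = x[1] - x[0]

left_first_order = (y[1] - y[0]) / h                    # 1.8-style edge
left_second_order = (-3*y[0] + 4*y[1] - y[2]) / (2*h)   # 1.9.0-style edge

For this quadratic the second-order stencil is exact at the edge, while the first-order one is off by h.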
> > So I'd like to hear from people affected by this -- would you prefer > to have the 2nd order boundary calculations by default, you just need > some way to workaround the immediate problems in existing code? Or do > you prefer the old default remain the default, with 2nd order boundary > calculations something that must be requested by hand every time? > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Since I started this discussion, I'll chime in. I don't have a strong preference for either mode that stems from a computational/scientific principle. As Nathaniel suggested - I have resorted to simply copying the 1.8 version of the function into my algorithm implementation, with the hope of removing that down the line. In that respect, I have a very weak preference for preserving the (1.8) status quo per default. Thanks! -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Oct 16 21:23:25 2014 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 16 Oct 2014 21:23:25 -0400 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: It isn't really a question of accuracy. It breaks unit tests and reproducibility elsewhere. My vote is to revert to the old behavior in 1.9.1. Ben Root On Thu, Oct 16, 2014 at 6:10 PM, Ariel Rokem wrote: > > On Thu, Oct 16, 2014 at 10:22 AM, Nathaniel Smith wrote: > >> On Tue, Oct 14, 2014 at 10:33 PM, Charles R Harris >> wrote: >> > >> > On Tue, Oct 14, 2014 at 11:50 AM, Nathaniel Smith >> wrote: >> >> >> >> On 14 Oct 2014 18:29, "Charles R Harris" >> >> wrote: >> >> > >> >> > >> >> > >> >> > On Tue, Oct 14, 2014 at 10:57 AM, Nathaniel Smith >> wrote: >> >> >> >> >> >> On 4 Oct 2014 22:17, "St?fan van der Walt" >> wrote: >> >> >> > >> >> >> > On Oct 4, 2014 10:14 PM, "Derek Homeier" >> >> >> > wrote: >> >> >> > > >> >> >> > > +1 for an order=2 or maxorder=2 flag >> >> >> > >> >> >> > If you parameterize that flag, users will want to change its value >> >> >> > (above two). Perhaps rather use a boolean flag such as >> "second_order" or >> >> >> > "high_order", unless it seems feasible to include additional >> orders in the >> >> >> > future. >> >> >> >> >> >> Predicting the future is hard :-). And in particular high_order= >> would >> >> >> create all kinds of confusion if in the future we added 3rd order >> >> >> approximations but high_order=True continued to mean 2nd order >> because of >> >> >> compatibility. I like maxorder (or max_order would be more pep8ish >> I guess) >> >> >> because it leaves our options open. (Similar to how it's often >> better to >> >> >> have a kwarg that can take two possible string values than to have >> a boolean >> >> >> kwarg. It makes current code more explicit and makes future >> enhancements >> >> >> easier.) >> >> > >> >> > >> >> > I think maxorder is a bit misleading. The both versions are second >> order >> >> > in the interior while at the ends the old is first order and the new >> is >> >> > second order. Maybe edge_order? >> >> >> >> Ah, that makes sense. 
edge_order makes more sense to me too then - and >> we >> >> can always add interior_order to complement it later, if appropriate. >> >> >> >> The other thing to decide on is the default. Is the 2nd order version >> >> generally preferred (modulo compatibility)? If so then it might make >> sense >> >> to keep it the default, given that there are already numpy's in the >> wild >> >> with that version, so we can't fully guarantee compatibility even if we >> >> wanted to. But what do others think? >> > >> > I'd be inclined to keep the older as the default and regard adding the >> > keyword as a bugfix. I should have caught the incompatibility in review. >> >> I don't have any code that uses gradient, so I don't have a great >> sense of the trade-offs here. >> >> - Usually if we have a change that produces increased accuracy, we >> make the increased accuracy the default. Otherwise no-one ever uses >> it, and everyone gets less accurate results than they would otherwise. >> (I don't have a great sense of how much this change affects accuracy >> though.) >> >> - If the change in output per se is a serious problem for people, then >> it's not one we can fix at this point -- 1.9.0 is out there and people >> are using it anyway, so those who have the problem already need to >> take some affirmative action to fix it. (Also, it's kinda weird to >> change a function's behaviour and add a new argument in a point >> release!) >> >> So I'd like to hear from people affected by this -- would you prefer >> to have the 2nd order boundary calculations by default, you just need >> some way to workaround the immediate problems in existing code? Or do >> you prefer the old default remain the default, with 2nd order boundary >> calculations something that must be requested by hand every time? >> >> -n >> >> -- >> Nathaniel J. Smith >> Postdoctoral researcher - Informatics - University of Edinburgh >> http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > Since I started this discussion, I'll chime in. I don't have a strong > preference for either mode that stems from a computational/scientific > principle. As Nathaniel suggested - I have resorted to simply copying the > 1.8 version of the function into my algorithm implementation, with the hope > of removing that down the line. In that respect, I have a very weak > preference for preserving the (1.8) status quo per default. > > Thanks! > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Oct 16 21:31:09 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 17 Oct 2014 02:31:09 +0100 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root wrote: > It isn't really a question of accuracy. It breaks unit tests and > reproducibility elsewhere. My vote is to revert to the old behavior in > 1.9.1. Why would one want the 2nd order differences at all, if they're not more accurate? Should we just revert the patch entirely? I assumed the change had some benefit... 
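For downstream code in that situation, one low-tech alternative to carrying a copy of the old function is to guard on the new keyword once it exists. A rough sketch, assuming the edge_order spelling proposed above is what actually lands:

import numpy as np

def gradient_with_old_edges(y, *varargs):
    # Ask for first-order (1.8-style) edges where the keyword is available;
    # otherwise fall back to whatever the installed numpy does by default.
    try:
        return np.gradient(y, *varargs, edge_order=1)
    except TypeError:
        return np.gradient(y, *varargs)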
-- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From josef.pktd at gmail.com Thu Oct 16 21:35:35 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 16 Oct 2014 21:35:35 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith wrote: > On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser > wrote: >> >> >> On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith wrote: >>> >>> On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser >>> wrote: >>> > >>> > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: >>> >> >>> >> Regarding names: shuffle/permutation is a terrible naming convention >>> >> IMHO and shouldn't be propagated further. We already have a good >>> >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >>> >> reversed, etc. >>> >> >>> >> So, how about: >>> >> >>> >> scramble + scrambled shuffle individual entries within each >>> >> row/column/..., as in Warren's suggestion. >>> >> >>> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >>> >> these break a 2d array into a bunch of 1d "cards", and then shuffle >>> >> those cards). >>> >> >>> >> permuted remains indefinitely, with the docstring: "Deprecated alias >>> >> for 'shuffled'." >>> > >>> > That sounds good to me. (I might go with 'randomize' instead of >>> > 'scramble', >>> > but that's a second-order decision for the API.) >>> >>> I hesitate to use names like "randomize" because they're less >>> informative than they feel seem -- if asked what this operation does >>> to an array, then it would be natural to say "it randomizes the >>> array". But if told that the random module has a function called >>> randomize, then that's not very informative -- everything in random >>> randomizes something somehow. >> >> I had some similar concerns (hence my original "disarrange"), but >> "randomize" seemed more likely to be found when searching or browsing the >> docs, and while it might be a bit too generic-sounding, it does feel like a >> natural verb for the process. On the other hand, "permute" and "permuted" >> are even more natural and unambiguous. Any objections to those? (The >> existing function is "permutation".) > [...] >> By the way, "permutation" has a feature not yet mentioned here: if the >> argument is an integer 'n', it generates a permutation of arange(n). In >> this case, it acts like matlab's "randperm" function. Unless we replicate >> that in the new function, we shouldn't deprecate "permutation". > > I guess we could do something like: > > permutation(n): > > Return a random permutation on n items. Equivalent to permuted(arange(n)). > > Note: for backwards compatibility, a call like permutation(an_array) > currently returns the same as shuffled(an_array). (This is *not* > equivalent to permuted(an_array).) This functionality is deprecated. > > OTOH "np.random.permute" as a name does have a downside: someday we'll > probably add a function called "np.permute" (for applying a given > permutation in place -- the O(n) algorithm for this is useful and > tricky), and having two functions with the same name and very > different semantics would be pretty confusing. I like `permute`. That's the one term I'm looking for first. If np.permute does some kind of deterministic permutation or pivoting, then I wouldn't find it confusing if np.random.permute does "random" permutation. 
(I definitely don't like scrambled, sounds like eggs or cable TV that needs to be unscrambled.) Josef > > -n > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Thu Oct 16 21:38:25 2014 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 16 Oct 2014 21:38:25 -0400 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: That isn't what I meant. Higher order doesn't "necessarily" mean more accurate. The results simply have different properties. The user needs to choose the differentiation order that they need. One interesting effect in data assimilation/modeling is that even-order differentiation can often have detrimental effects while higher odd order differentiation are better, but it is highly dependent upon the model. This change in gradient broke a unit test in matplotlib (for a new feature, so it isn't *that* critical). We didn't notice it at first because we weren't testing numpy 1.9 at the time. I want the feature (I have need for it elsewhere), but I don't want the change in default behavior. Cheers! Ben Root On Thu, Oct 16, 2014 at 9:31 PM, Nathaniel Smith wrote: > On Fri, Oct 17, 2014 at 2:23 AM, Benjamin Root wrote: > > It isn't really a question of accuracy. It breaks unit tests and > > reproducibility elsewhere. My vote is to revert to the old behavior in > > 1.9.1. > > Why would one want the 2nd order differences at all, if they're not > more accurate? Should we just revert the patch entirely? I assumed the > change had some benefit... > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Oct 16 22:25:41 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Oct 2014 19:25:41 -0700 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: Hi, On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root wrote: > That isn't what I meant. Higher order doesn't "necessarily" mean more > accurate. The results simply have different properties. The user needs to > choose the differentiation order that they need. One interesting effect in > data assimilation/modeling is that even-order differentiation can often have > detrimental effects while higher odd order differentiation are better, but > it is highly dependent upon the model. > > This change in gradient broke a unit test in matplotlib (for a new feature, > so it isn't *that* critical). We didn't notice it at first because we > weren't testing numpy 1.9 at the time. I want the feature (I have need for > it elsewhere), but I don't want the change in default behavior. I think it would be a bad idea to revert now. 
I suspect, if you revert, then a lot of other code will assume the < 1.9.0, >= 1.9.1 behavior. In that case, the code will work as expected most of the time, except when combined with 1.9.0, which could be seriously surprising, and often missed. If you keep the new behavior, then it will be clearer that other code will have to adapt to this change >= 1.9.0 - surprise, but predictable surprise, if you see what I mean... Matthew From charlesr.harris at gmail.com Thu Oct 16 22:31:47 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 16 Oct 2014 20:31:47 -0600 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On Thu, Oct 16, 2014 at 8:25 PM, Matthew Brett wrote: > Hi, > > On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root wrote: > > That isn't what I meant. Higher order doesn't "necessarily" mean more > > accurate. The results simply have different properties. The user needs to > > choose the differentiation order that they need. One interesting effect > in > > data assimilation/modeling is that even-order differentiation can often > have > > detrimental effects while higher odd order differentiation are better, > but > > it is highly dependent upon the model. > > > > This change in gradient broke a unit test in matplotlib (for a new > feature, > > so it isn't *that* critical). We didn't notice it at first because we > > weren't testing numpy 1.9 at the time. I want the feature (I have need > for > > it elsewhere), but I don't want the change in default behavior. > > I think it would be a bad idea to revert now. > > I suspect, if you revert, then a lot of other code will assume the < > 1.9.0, >= 1.9.1 behavior. In that case, the code will work as > expected most of the time, except when combined with 1.9.0, which > could be seriously surprising, and often missed. If you keep the new > behavior, then it will be clearer that other code will have to adapt > to this change >= 1.9.0 - surprise, but predictable surprise, if you > see what I mean... > 1.9.1 will be out in a week or so. To be honest, these days I regard the 1.x.0 releases as sort of an advanced release candidate. I think there are just a lot more changes going in between releases and the release gets a lot more testing than the official release candidates. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Oct 16 22:50:48 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 17 Oct 2014 03:50:48 +0100 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Fri, Oct 17, 2014 at 2:35 AM, wrote: > On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith wrote: >> On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser >> wrote: >>> >>> >>> On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith wrote: >>>> >>>> On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser >>>> wrote: >>>> > >>>> > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: >>>> >> >>>> >> Regarding names: shuffle/permutation is a terrible naming convention >>>> >> IMHO and shouldn't be propagated further. We already have a good >>>> >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >>>> >> reversed, etc. 
>>>> >> >>>> >> So, how about: >>>> >> >>>> >> scramble + scrambled shuffle individual entries within each >>>> >> row/column/..., as in Warren's suggestion. >>>> >> >>>> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >>>> >> these break a 2d array into a bunch of 1d "cards", and then shuffle >>>> >> those cards). >>>> >> >>>> >> permuted remains indefinitely, with the docstring: "Deprecated alias >>>> >> for 'shuffled'." >>>> > >>>> > That sounds good to me. (I might go with 'randomize' instead of >>>> > 'scramble', >>>> > but that's a second-order decision for the API.) >>>> >>>> I hesitate to use names like "randomize" because they're less >>>> informative than they feel seem -- if asked what this operation does >>>> to an array, then it would be natural to say "it randomizes the >>>> array". But if told that the random module has a function called >>>> randomize, then that's not very informative -- everything in random >>>> randomizes something somehow. >>> >>> I had some similar concerns (hence my original "disarrange"), but >>> "randomize" seemed more likely to be found when searching or browsing the >>> docs, and while it might be a bit too generic-sounding, it does feel like a >>> natural verb for the process. On the other hand, "permute" and "permuted" >>> are even more natural and unambiguous. Any objections to those? (The >>> existing function is "permutation".) >> [...] >>> By the way, "permutation" has a feature not yet mentioned here: if the >>> argument is an integer 'n', it generates a permutation of arange(n). In >>> this case, it acts like matlab's "randperm" function. Unless we replicate >>> that in the new function, we shouldn't deprecate "permutation". >> >> I guess we could do something like: >> >> permutation(n): >> >> Return a random permutation on n items. Equivalent to permuted(arange(n)). >> >> Note: for backwards compatibility, a call like permutation(an_array) >> currently returns the same as shuffled(an_array). (This is *not* >> equivalent to permuted(an_array).) This functionality is deprecated. >> >> OTOH "np.random.permute" as a name does have a downside: someday we'll >> probably add a function called "np.permute" (for applying a given >> permutation in place -- the O(n) algorithm for this is useful and >> tricky), and having two functions with the same name and very >> different semantics would be pretty confusing. > > I like `permute`. That's the one term I'm looking for first. > > If np.permute does some kind of deterministic permutation or pivoting, > then I wouldn't find it confusing if np.random.permute does "random" > permutation. Yeah, but: from ... import permute # 500 lines later def foo(...): permute(...) # what the heck is this It definitely *can* be confusing; basically everything else in np.random has a name that suggests randomness even without seeing the full path. It's not a huge deal, though. > (I definitely don't like scrambled, sounds like eggs or cable TV that > needs to be unscrambled.) I vote that in this kind of bikeshed we try to restrict ourselves to arguments that we can at least pretend are motivated by some technical/UX concern ;-). (I guess unscrambling eggs would be technically impressive tho ;-)) -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From jaime.frio at gmail.com Thu Oct 16 23:43:12 2014 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 16 Oct 2014 20:43:12 -0700 Subject: [Numpy-discussion] Passing multiple output arguments to ufunc Message-ID: There is an oldish feature request in github (https://github.com/numpy/numpy/issues/4752), complaining about it not being possible to pass multiple output arguments to a ufunc using keyword arguments. You can pass them all as positional arguments: >>> out1 = np.empty(1) >>> out2 = np.empty(1) >>> np.modf([1.333], out1, out2) (array([ 0.333]), array([ 1.])) You can also pass the first as a kwarg if you leave the others unspecified: >>> np.modf([1.333], out=out1) (array([ 0.333]), array([ 1.])) You can also use None in a positional argument to leave some of the output arguments unspecified: >>> np.modf([1.3333], None, out2) (array([ 0.3333]), array([ 1.])) But you cannot do something like >>> np.modf([1.333], out=(out1, out2)) Traceback (most recent call last): File "", line 1, in TypeError: return arrays must be of ArrayType Would this behavior make sense? The idea would be to allow a tuple as a valid input for the 'out=' kwarg. It would have to have a length exactly matching the number of output arguments, and its items would have to be either arrays or None. For backwards compatibility we probably should still allow a single array to mean the first output argument, even if the ufunc has multiple outputs. Any other thoughts? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. From ignathe at gmail.com Fri Oct 17 04:22:35 2014 From: ignathe at gmail.com (Ignat Harczuk) Date: Fri, 17 Oct 2014 10:22:35 +0200 Subject: [Numpy-discussion] Instaling numpy without root access In-Reply-To: References: <54381060.1040109@googlemail.com> <543817A5.4010900@googlemail.com> Message-ID: Have you considered virtual environments? http://docs.python-guide.org/en/latest/dev/virtualenvs/ Inside of each environment you can build a local python version and packages with different versions through pip. Maybe not exactly what you need help with but it is a good tool to have so that you have less dependency issues. On Mon, Oct 13, 2014 at 2:52 AM, Lahiru Samarakoon wrote: > Guys, any advice is highly appreciated. I am a little new to building in > Linux. > Thanks, > Lahiru > > On Sat, Oct 11, 2014 at 9:43 AM, Lahiru Samarakoon > wrote: > >> I switched to numpy-1.8.2. . Now getting following error. I am using >> LAPACK that comes with atlast installation. Can this be a problem? >> >> Traceback (most recent call last): >> File "", line 1, in >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/__init__.py", >> line 170, in >> from . 
import add_newdocs >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/add_newdocs.py", >> line 13, in >> from numpy.lib import add_newdoc >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/__init__.py", >> line 18, in >> from .polynomial import * >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/lib/polynomial.py", >> line 19, in >> from numpy.linalg import eigvals, lstsq, inv >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/__init__.py", >> line 51, in >> from .linalg import * >> File >> "/home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/linalg.py", >> line 29, in >> from numpy.linalg import lapack_lite, _umath_linalg >> ImportError: >> /home/svu/a0095654/.local/lib/python2.7/site-packages/numpy/linalg/lapack_lite.so: >> undefined symbol: zgesdd_ >> >> On Sat, Oct 11, 2014 at 1:30 AM, Julian Taylor < >> jtaylor.debian at googlemail.com> wrote: >> >>> On 10.10.2014 19:26, Lahiru Samarakoon wrote: >>> > Red Hat Enterprise Linux release 5.8 >>> > gcc (GCC) 4.1.2 >>> > >>> > I am also trying to install numpy 1.9. >>> >>> that is the broken platform, please try the master branch or the >>> maintenance/1.9.x branch, those should work now. >>> >>> Are there volunteers to report this to redhat? >>> >>> > >>> > On Sat, Oct 11, 2014 at 12:59 AM, Julian Taylor >>> > > >>> > wrote: >>> > >>> > On 10.10.2014 18:51, Lahiru Samarakoon wrote: >>> > > Dear all, >>> > > >>> > > I am trying to install numpy without root access. So I am >>> building from >>> > > the source. I have installed atlas which also has lapack with >>> it. I >>> > > changed the site.cfg file as given below >>> > > >>> > > [DEFAULT] >>> > > library_dirs = /home/svu/a0095654/ATLAS/build/lib >>> > > include_dirs = /home/svu/a0095654/ATLAS/build/include >>> > > >>> > > >>> > > However, I am getting a segmentation fault when importing numpy. >>> > > >>> > > Please advise. I also put the build log file at the end of the >>> email if >>> > > necessary. >>> > >>> > >>> > Which platform are you working on? Which compiler version? >>> > We just solved a segfault on import on red hat 5 gcc 4.1.2. Very >>> likely >>> > caused by a compiler bug. See >>> https://github.com/numpy/numpy/issues/5163 >>> > >>> > The build log is complaining about your atlas being to small, >>> possibly >>> > the installation is broken? >>> > >>> > >>> >>> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Oct 17 09:04:51 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Oct 2014 09:04:51 -0400 Subject: [Numpy-discussion] Request for enhancement to numpy.random.shuffle In-Reply-To: References: <543AA8E9.9020001@sebix.at> Message-ID: On Thu, Oct 16, 2014 at 10:50 PM, Nathaniel Smith wrote: > On Fri, Oct 17, 2014 at 2:35 AM, wrote: >> On Thu, Oct 16, 2014 at 3:39 PM, Nathaniel Smith wrote: >>> On Thu, Oct 16, 2014 at 6:30 PM, Warren Weckesser >>> wrote: >>>> >>>> >>>> On Thu, Oct 16, 2014 at 12:40 PM, Nathaniel Smith wrote: >>>>> >>>>> On Thu, Oct 16, 2014 at 4:39 PM, Warren Weckesser >>>>> wrote: >>>>> > >>>>> > On Sun, Oct 12, 2014 at 9:13 PM, Nathaniel Smith wrote: >>>>> >> >>>>> >> Regarding names: shuffle/permutation is a terrible naming convention >>>>> >> IMHO and shouldn't be propagated further. We already have a good >>>>> >> naming convention for inplace-vs-sorted: sort vs. sorted, reverse vs. >>>>> >> reversed, etc. >>>>> >> >>>>> >> So, how about: >>>>> >> >>>>> >> scramble + scrambled shuffle individual entries within each >>>>> >> row/column/..., as in Warren's suggestion. >>>>> >> >>>>> >> shuffle + shuffled to do what shuffle, permutation do now (mnemonic: >>>>> >> these break a 2d array into a bunch of 1d "cards", and then shuffle >>>>> >> those cards). >>>>> >> >>>>> >> permuted remains indefinitely, with the docstring: "Deprecated alias >>>>> >> for 'shuffled'." >>>>> > >>>>> > That sounds good to me. (I might go with 'randomize' instead of >>>>> > 'scramble', >>>>> > but that's a second-order decision for the API.) >>>>> >>>>> I hesitate to use names like "randomize" because they're less >>>>> informative than they feel seem -- if asked what this operation does >>>>> to an array, then it would be natural to say "it randomizes the >>>>> array". But if told that the random module has a function called >>>>> randomize, then that's not very informative -- everything in random >>>>> randomizes something somehow. >>>> >>>> I had some similar concerns (hence my original "disarrange"), but >>>> "randomize" seemed more likely to be found when searching or browsing the >>>> docs, and while it might be a bit too generic-sounding, it does feel like a >>>> natural verb for the process. On the other hand, "permute" and "permuted" >>>> are even more natural and unambiguous. Any objections to those? (The >>>> existing function is "permutation".) >>> [...] >>>> By the way, "permutation" has a feature not yet mentioned here: if the >>>> argument is an integer 'n', it generates a permutation of arange(n). In >>>> this case, it acts like matlab's "randperm" function. Unless we replicate >>>> that in the new function, we shouldn't deprecate "permutation". >>> >>> I guess we could do something like: >>> >>> permutation(n): >>> >>> Return a random permutation on n items. Equivalent to permuted(arange(n)). >>> >>> Note: for backwards compatibility, a call like permutation(an_array) >>> currently returns the same as shuffled(an_array). (This is *not* >>> equivalent to permuted(an_array).) This functionality is deprecated. >>> >>> OTOH "np.random.permute" as a name does have a downside: someday we'll >>> probably add a function called "np.permute" (for applying a given >>> permutation in place -- the O(n) algorithm for this is useful and >>> tricky), and having two functions with the same name and very >>> different semantics would be pretty confusing. >> >> I like `permute`. That's the one term I'm looking for first. 
>> >> If np.permute does some kind of deterministic permutation or pivoting, >> then I wouldn't find it confusing if np.random.permute does "random" >> permutation. > > Yeah, but: > > from ... import permute > # 500 lines later > def foo(...): > permute(...) # what the heck is this > > It definitely *can* be confusing; basically everything else in > np.random has a name that suggests randomness even without seeing the > full path. I usually/always avoid importing names from random into the module namespace np.random.xxx from numpy.random import power power(...) >>> power(5, 3) array([ 0.93771162, 0.96180884, 0.80191961]) ??? and f and beta and gamma, ... >>> bytes(10) '\xa3\xf0%\x88\x11\xda\x0e\x81\x0c\x8e' >>> bytes(5) '\xb0B\x8e\xa1\x80' > > It's not a huge deal, though. > >> (I definitely don't like scrambled, sounds like eggs or cable TV that >> needs to be unscrambled.) > > I vote that in this kind of bikeshed we try to restrict ourselves to > arguments that we can at least pretend are motivated by some > technical/UX concern ;-). (I guess unscrambling eggs would be > technically impressive tho ;-)) Ignoring the eggs, it still sounds like a cheap encryption and is a word I would never look for when looking for something to implement a permutation test. Josef > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ben.root at ou.edu Fri Oct 17 09:47:58 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 17 Oct 2014 09:47:58 -0400 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: I see this as a regression. We don't keep regressions around for backwards compatibility, we fix them. Ben On Thu, Oct 16, 2014 at 10:25 PM, Matthew Brett wrote: > Hi, > > On Thu, Oct 16, 2014 at 6:38 PM, Benjamin Root wrote: > > That isn't what I meant. Higher order doesn't "necessarily" mean more > > accurate. The results simply have different properties. The user needs to > > choose the differentiation order that they need. One interesting effect > in > > data assimilation/modeling is that even-order differentiation can often > have > > detrimental effects while higher odd order differentiation are better, > but > > it is highly dependent upon the model. > > > > This change in gradient broke a unit test in matplotlib (for a new > feature, > > so it isn't *that* critical). We didn't notice it at first because we > > weren't testing numpy 1.9 at the time. I want the feature (I have need > for > > it elsewhere), but I don't want the change in default behavior. > > I think it would be a bad idea to revert now. > > I suspect, if you revert, then a lot of other code will assume the < > 1.9.0, >= 1.9.1 behavior. In that case, the code will work as > expected most of the time, except when combined with 1.9.0, which > could be seriously surprising, and often missed. If you keep the new > behavior, then it will be clearer that other code will have to adapt > to this change >= 1.9.0 - surprise, but predictable surprise, if you > see what I mean... 
> > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Oct 17 10:11:58 2014 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 17 Oct 2014 15:11:58 +0100 Subject: [Numpy-discussion] Changed behavior of np.gradient In-Reply-To: References: <19A7BCC1-8608-47A8-9FB1-0FFFF567D573@astro.physik.uni-goettingen.de> <6711F56D-CFBE-4C55-AC55-F92E7EC6CB12@astro.physik.uni-goettingen.de> Message-ID: On 17 Oct 2014 02:38, "Benjamin Root" wrote: > > That isn't what I meant. Higher order doesn't "necessarily" mean more accurate. The results simply have different properties. The user needs to choose the differentiation order that they need. One interesting effect in data assimilation/modeling is that even-order differentiation can often have detrimental effects while higher odd order differentiation are better, but it is highly dependent upon the model. To be clear, we aren't talking about different degrees of differentiation, we're talking about different approximations to the first derivative. I just looked up the original pull request and it contains a pretty convincing graph in which the old code has large systematic errors and the new code doesn't: https://github.com/numpy/numpy/issues/3603 I think the claim is that the old code had approximation error that grows like O(1/n), and the new code has errors like O(1/n**2). (Don't ask me what n is though.) > This change in gradient broke a unit test in matplotlib (for a new feature, so it isn't *that* critical). We didn't notice it at first because we weren't testing numpy 1.9 at the time. I want the feature (I have need for it elsewhere), but I don't want the change in default behavior. You say it's bad, the original poster says it's good, how are we poor maintainers to know what to do? :-) Can you say any more about why you you prefer so-called lower accuracy approximations here by default? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sat Oct 18 00:56:01 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 17 Oct 2014 21:56:01 -0700 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? Message-ID: Yesterday I created a GitHub issue proposing adding an axis argument to numpy's gufuncs: https://github.com/numpy/numpy/issues/5197 I was told I should repost this on the mailing list, so here's the recap: I would like to write generalized ufuncs (probably using numba), to create fast functions such as nanmean (signature '(n)->()') or rolling_mean (signature '(n),()->(n)') that take the axis along which to aggregate as a keyword argument, e.g., nanmean(x, axis=0) or rolling_mean(x, window=5, axis=0). Of course, I could write my own wrapper for this that reorders dimensions using swapaxes or transpose. But I also think that an "axis" argument to allow for specifying the core dimensions of gufuncs would be more generally useful, and we should consider adding it to numpy. Nathaniel and Jaime added some good points, noting that such an axis argument should cleanly handle multiple input and output arguments and have a plan for handling optional dimensions (e.g., (m?,n),(n,p?)->(m?,p?) for the new dot). 
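For the simple single-core-dimension case such a wrapper is nothing fancy. A rough sketch, with made-up names, assuming a gufunc with signature '(n)->()':

import numpy as np

def call_along_axis(gufunc, x, axis=-1):
    # Move the requested axis to the end, where a '(n)->()' gufunc expects
    # its core dimension; the other axes keep their relative order.
    x = np.asarray(x)
    return gufunc(np.rollaxis(x, axis, x.ndim))

# e.g. call_along_axis(nanmean_gufunc, data, axis=0), where nanmean_gufunc
# would be a '(n)->()' gufunc built elsewhere (with numba, say).

The point of an axis argument on gufuncs is to make this kind of boilerplate unnecessary.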
Here are my initial thoughts on the syntax: (1) Generally speaking, I think the "nested tuple" syntax (e.g., axis=[(0, 1), (2, 3)]) would be most congruous with the axis arguments numpy already supports. (2) For gufuncs with simpler signatures, we should support supplying an integer or an unnested tuple, e.g., - axis=0 for (n)->() - axis=(0, 1) for (n)(m)->() or (n,m)->() - axis=[(0, 1), 2] for (n,m),(o)->(). (3) If we require a full axis specification for core dimensions, we could use the axis argument for unambiguous control of optional core dimensions: e.g., axis=(0, 1) would indicate that you want the "vectorized inner product" version of the new dot operator, rather than matrix multiplication, and axis=[(-2, -1), -1] would mean that you want the "vectorized matrix-vector product". This seems relatively tidy, although I admit I am not convinced that optional core dimensions are necessary. (4) We can either include the output axis as part of the signature, or add another argument "axis_out" or "out_axis". I think prefer the separate argument, particularly if we require "axis" to specify all core dimensions, which may be a good idea even if we don't use "axis" for controlling optional core dimensions. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Oct 18 01:39:51 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 17 Oct 2014 23:39:51 -0600 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? In-Reply-To: References: Message-ID: On Fri, Oct 17, 2014 at 10:56 PM, Stephan Hoyer wrote: > Yesterday I created a GitHub issue proposing adding an axis argument to > numpy's gufuncs: > https://github.com/numpy/numpy/issues/5197 > > I was told I should repost this on the mailing list, so here's the recap: > > I would like to write generalized ufuncs (probably using numba), to create > fast functions such as nanmean (signature '(n)->()') or rolling_mean > (signature '(n),()->(n)') that take the axis along which to aggregate as a > keyword argument, e.g., nanmean(x, axis=0) or rolling_mean(x, window=5, > axis=0). > > Of course, I could write my own wrapper for this that reorders dimensions > using swapaxes or transpose. But I also think that an "axis" argument to > allow for specifying the core dimensions of gufuncs would be more generally > useful, and we should consider adding it to numpy. > > Nathaniel and Jaime added some good points, noting that such an axis > argument should cleanly handle multiple input and output arguments and have > a plan for handling optional dimensions (e.g., (m?,n),(n,p?)->(m?,p?) for > the new dot). > > Here are my initial thoughts on the syntax: > > (1) Generally speaking, I think the "nested tuple" syntax (e.g., axis=[(0, > 1), (2, 3)]) would be most congruous with the axis arguments numpy already > supports. > > (2) For gufuncs with simpler signatures, we should support supplying an > integer or an unnested tuple, e.g., > - axis=0 for (n)->() > - axis=(0, 1) for (n)(m)->() or (n,m)->() > - axis=[(0, 1), 2] for (n,m),(o)->(). > > (3) If we require a full axis specification for core dimensions, we could > use the axis argument for unambiguous control of optional core dimensions: > e.g., axis=(0, 1) would indicate that you want the "vectorized inner > product" version of the new dot operator, rather than matrix > multiplication, and axis=[(-2, -1), -1] would mean that you want the > "vectorized matrix-vector product". 
This seems relatively tidy, although I > admit I am not convinced that optional core dimensions are necessary. > > (4) We can either include the output axis as part of the signature, or add > another argument "axis_out" or "out_axis". I think prefer the separate > argument, particularly if we require "axis" to specify all core dimensions, > which may be a good idea even if we don't use "axis" for controlling > optional core dimensions. > > Might want to contact continuum analytics also. They recently created a gufunc repository. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbubbly21 at gmail.com Sat Oct 18 01:58:02 2014 From: vbubbly21 at gmail.com (Artur Bercik) Date: Sat, 18 Oct 2014 14:58:02 +0900 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information Message-ID: Dear Python and Numpy Users: My data are in the form of '32-bit unsigned integer' as follows: myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) I want to get the index of my data where the following occurs: Bit No. 0?1 Bit Combination: 00 How can I do it? I heard this type of problem first time, please help me. Artur -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Oct 18 07:28:47 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 18 Oct 2014 13:28:47 +0200 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information In-Reply-To: References: Message-ID: <54424EEF.80103@googlemail.com> On 18.10.2014 07:58, Artur Bercik wrote: > Dear Python and Numpy Users: > > My data are in the form of '32-bit unsigned integer' as follows: > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) > > I want to get the index of my data where the following occurs: > > Bit No. 0?1 > Bit Combination: 00 > > How can I do it? I heard this type of problem first time, please help me. > > Artur > not sure I understand the problem, maybe this? np.where((myData & 0x3) == 0) From vbubbly21 at gmail.com Sat Oct 18 08:00:55 2014 From: vbubbly21 at gmail.com (Artur Bercik) Date: Sat, 18 Oct 2014 21:00:55 +0900 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information In-Reply-To: <54424EEF.80103@googlemail.com> References: <54424EEF.80103@googlemail.com> Message-ID: On Sat, Oct 18, 2014 at 8:28 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 18.10.2014 07:58, Artur Bercik wrote: > > Dear Python and Numpy Users: > > > > My data are in the form of '32-bit unsigned integer' as follows: > > > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) > > > > I want to get the index of my data where the following occurs: > > > > Bit No. 0?1 > > Bit Combination: 00 > > > > How can I do it? I heard this type of problem first time, please help me. > > > > Artur > > > > not sure I understand the problem, maybe this? > > np.where((myData & 0x3) == 0) > yes, it works greatly for the following case: myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, 1073742723, 1073755137, 1073755189,1073755969],dtype=np.uint32) Bit No. 0?1 Bit Combination: 00 Can you make such automation for the following case as well? Bit No. 2?5 Bit Combination: 1101 Thanks in the advance. 
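More generally, the mask and the target value can be computed from the bit range instead of being hard-coded. A hypothetical helper, assuming bit 0 is the least significant bit and the pattern is written most-significant bit first:

import numpy as np

def match_bits(data, first_bit, n_bits, pattern):
    # Indices where bits first_bit .. first_bit + n_bits - 1 equal pattern.
    mask = ((1 << n_bits) - 1) << first_bit
    return np.where((data & mask) == (pattern << first_bit))

# bits 2-5 equal to 1101  ->  match_bits(myData, 2, 4, 0b1101)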
> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From vbubbly21 at gmail.com Sat Oct 18 08:14:14 2014 From: vbubbly21 at gmail.com (Artur Bercik) Date: Sat, 18 Oct 2014 21:14:14 +0900 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information In-Reply-To: References: <54424EEF.80103@googlemail.com> Message-ID: On Sat, Oct 18, 2014 at 9:00 PM, Artur Bercik wrote: > > > On Sat, Oct 18, 2014 at 8:28 PM, Julian Taylor < > jtaylor.debian at googlemail.com> wrote: > >> On 18.10.2014 07:58, Artur Bercik wrote: >> > Dear Python and Numpy Users: >> > >> > My data are in the form of '32-bit unsigned integer' as follows: >> > >> > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, >> > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) >> > >> > I want to get the index of my data where the following occurs: >> > >> > Bit No. 0?1 >> > Bit Combination: 00 >> > >> > How can I do it? I heard this type of problem first time, please help >> me. >> > >> > Artur >> > >> >> not sure I understand the problem, maybe this? >> >> np.where((myData & 0x3) == 0) >> > > yes, it works greatly for the following case: > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.uint32) > Bit No. 0?1 > Bit Combination: 00 > > Can you make such automation for the following case as well? > > Bit No. 2?5 > Bit Combination: 1101 > > Thanks in the advance. > Also wondering why np.where((myData & 0x3) == 0) instead of just np.where((myData & 3) == 0) > > > > > >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Sat Oct 18 08:28:30 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 18 Oct 2014 14:28:30 +0200 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information In-Reply-To: References: <54424EEF.80103@googlemail.com> Message-ID: <54425CEE.5040402@googlemail.com> On 18.10.2014 14:14, Artur Bercik wrote: > > > On Sat, Oct 18, 2014 at 9:00 PM, Artur Bercik > wrote: > > > > On Sat, Oct 18, 2014 at 8:28 PM, Julian Taylor > > wrote: > > On 18.10.2014 07:58, Artur Bercik wrote: > > Dear Python and Numpy Users: > > > > My data are in the form of '32-bit unsigned integer' as follows: > > > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) > > > > I want to get the index of my data where the following occurs: > > > > Bit No. 0?1 > > Bit Combination: 00 > > > > How can I do it? I heard this type of problem first time, please help me. > > > > Artur > > > > not sure I understand the problem, maybe this? > > np.where((myData & 0x3) == 0) > > > yes, it works greatly for the following case: > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.uint32) > Bit No. 0?1 > Bit Combination: 00 > > Can you make such automation for the following case as well? > > Bit No. 
2?5 > Bit Combination: 1101 > sure, you can do any of these with the right masks: np.where((myData & 0x3c) == 0x34) you can use bin(number) to check if your numbers are correct. > > Also wondering why np.where((myData & 0x3) == 0) instead of > just np.where((myData & 3) == 0) > its the same, 0x means the number is in hexadecimal representation, for 3 they happen to be equal (as 3 < 10) It is often easier to work in the hexadecimal representation when dealing with binary data as its base is a power of two. So two digits in hexadecimal represent one byte. In the case above: 0x3c c is 12 -> 1100 3 is 3 -> 11 together you get 111100, mask for bits 2-5 From vbubbly21 at gmail.com Sat Oct 18 08:40:36 2014 From: vbubbly21 at gmail.com (Artur Bercik) Date: Sat, 18 Oct 2014 21:40:36 +0900 Subject: [Numpy-discussion] Extract Indices of Numpy Array Based on Given Bit Information In-Reply-To: <54425CEE.5040402@googlemail.com> References: <54424EEF.80103@googlemail.com> <54425CEE.5040402@googlemail.com> Message-ID: Dear Julian Taylor Thank you very much, I really appreciated your codes. On Sat, Oct 18, 2014 at 9:28 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 18.10.2014 14:14, Artur Bercik wrote: > > > > > > On Sat, Oct 18, 2014 at 9:00 PM, Artur Bercik > > wrote: > > > > > > > > On Sat, Oct 18, 2014 at 8:28 PM, Julian Taylor > > > > wrote: > > > > On 18.10.2014 07:58, Artur Bercik wrote: > > > Dear Python and Numpy Users: > > > > > > My data are in the form of '32-bit unsigned integer' as > follows: > > > > > > myData = np.array([1073741824, 1073741877, 1073742657, > 1073742709, > > > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.int32) > > > > > > I want to get the index of my data where the following occurs: > > > > > > Bit No. 0?1 > > > Bit Combination: 00 > > > > > > How can I do it? I heard this type of problem first time, > please help me. > > > > > > Artur > > > > > > > not sure I understand the problem, maybe this? > > > > np.where((myData & 0x3) == 0) > > > > > > yes, it works greatly for the following case: > > > > myData = np.array([1073741824, 1073741877, 1073742657, 1073742709, > > 1073742723, 1073755137, 1073755189,1073755969],dtype=np.uint32) > > Bit No. 0?1 > > Bit Combination: 00 > > > > Can you make such automation for the following case as well? > > > > Bit No. 2?5 > > Bit Combination: 1101 > > > > sure, you can do any of these with the right masks: > np.where((myData & 0x3c) == 0x34) > > you can use bin(number) to check if your numbers are correct. > > > > > > Also wondering why np.where((myData & 0x3) == 0) instead of > > just np.where((myData & 3) == 0) > > > > its the same, 0x means the number is in hexadecimal representation, for > 3 they happen to be equal (as 3 < 10) > It is often easier to work in the hexadecimal representation when > dealing with binary data as its base is a power of two. So two digits in > hexadecimal represent one byte. > In the case above: 0x3c > c is 12 -> 1100 > 3 is 3 -> 11 > together you get 111100, mask for bits 2-5 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Oct 18 21:17:07 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Oct 2014 02:17:07 +0100 Subject: [Numpy-discussion] np.gradient Message-ID: Okay! I think I now actually understand what was going on with np.gradient! 
The discussion has been pretty confused, and I'm worried that what's in master right now is not the right solution, which is a problem because we're supposed to cut 1.9.1 tomorrow. Background: np.gradient computes gradients using a finite difference method. Wikipedia has a good explanation of how this works: https://en.wikipedia.org/wiki/Numerical_differentiation The key point is that there are multiple formulas one can use that all converge to the derivative, e.g. (f(x + h) - f(x)) / h or (f(x + h) - f(x - h)) / 2h The first is the textbook definition of the derivative. As h -> 0, the error in that approximation shrinks like O(h). The second formula also converges to the derivative as h -> 0, but it converges like O(h^2), i.e., much faster. And there's are many many formulas like this with different trade-offs: https://en.wikipedia.org/wiki/Finite_difference_coefficient In practice, given a grid of values of f(x) and a grid stepsize of h, all of these formulas come down to doing a simple convolution of the data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h] for the two formulas above). Traditionally np.gradient has used the second formula above, with its quadratically diminishing error term, for interior points in the grid. For points on the boundary of the grid, though, this formula has a problem, b/c it requires you look at points to both the left and the right of the current point -- but for boundary points one of these doesn't exist. In such situations np.gradient has traditionally used the first formula instead (the "forward (or backward) finite difference approximation with linear error", where "forward"/"backward" means that it works on the boundary). As the web page linked above shows, though, there's an easy alternative formula that works on The change in numpy 1.9.0 that has caused all the fuss, is that we switched to using the quadratically accurate approximation instead of the linearly accurate approximation for boundary points. This had two effects: 1) it produced (hopefully small) changes in the calculation for the boundaries. In general these should have made the calculations more accurate, but apparently this did cause problems for some people who wanted perfectly reproducible output. 2) it tickled a nasty bug in how gradient and masked arrays work together. It turns out gradient applied to masked arrays has always been seriously buggy; there's some horribleness in the masked array implementation that means the output array that np.gradient is writing into ends up sharing a mask array with the input. So as we go along calculating the output, we also mutate the input. This can definitely corrupt the input, for example here: https://gist.github.com/njsmith/551738469b74d175e039 ...and I strongly suspect it's possible to find cases where it means you get the wrong answer. For mysterious reasons involving the order in which values were calculated or something, the version of np.gradient in 1.9.0 tickled this bug in a much worse manner, with early mutations to the input array ended up affecting later calculations using the same input array, causing cascading errors. This is the cause of the massive problems that matplotlib encountered in their test suite: https://github.com/matplotlib/matplotlib/issues/3598#issuecomment-58859663 There's a proposed fix for the masked array issues at: https://github.com/numpy/numpy/pull/5203 For the reasons described above, I would be very wary about using *any* currently released version of np.gradient on masked arrays. 
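(A quick way to check whether a given installation is affected -- this snippet is an added illustration, not part of the original message, and whether it prints True depends on the numpy version in use.)

import numpy as np
import numpy.ma as ma

x = ma.masked_invalid(np.array([1.0, np.nan, 3.0, 4.0]))
g = np.gradient(x)
# on an affected version the result's mask aliases the input's mask,
# so later writes to one silently mutate the other
print(np.may_share_memory(np.ma.getmaskarray(g), np.ma.getmaskarray(x)))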
A workaround would be to make a local copy of the definition of np.gradient, and immediately after the 'out' array is allocated do 'out.mask = out.mask.copy()'. Once this bug is fixed, the "new" gradient code produces results which are identical to the "old" gradient code on the interior; only the boundary is different. ------ So here are my concerns: - We decided to revert the changes to np.gradient in 1.9.1 (at least by default). I'm not sure how much of that decision was based on the issues with masked arrays, though, which turns out to be a different issue entirely. The previous discussion conflated the two issues above. - We decided to gate the more-accurate boundary calculation behind a kwarg called "edge_order=2". AFAICT now that I actually understand what this code is doing, this is a terrible name -- we're using a 3-coefficient kernel to compute a quadratically accurate approximation to a first-order derivative. There probably exist other kernels that are also quadratically accurate. "Order 2" simply doesn't refer to this in any unique or meaningful way. And it will be even more confusing if we ever add the option to select which kernel to use on the interior, where "quadratically accurate" is definitely not enough information to uniquely define a kernel. Maybe we want something like edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not sure. - If edge_order=2 escapes into a release tomorrow then we'll be stuck with it. Some possible options for 1.9.1: - Just revert the entire np.gradient code to match 1.8, and put off a real fix until 1.9. This at least avoids getting stuck with a poor API. - Put the masked array bug fix into 1.9.1, and leave the gradient code the same as 1.9.0; put off designing a proper API for selecting the old behavior for 1.9.2. (Or 1.10 I guess.) This doesn't solve the pure reproduceability problems, but it might be enough to make matplotlib work again at least? - Delay 1.9.1 to rehash the issues and make a decision on what we want to support long-term. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From matthew.brett at gmail.com Sat Oct 18 21:23:46 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 18 Oct 2014 18:23:46 -0700 Subject: [Numpy-discussion] np.gradient In-Reply-To: References: Message-ID: Hi, On Sat, Oct 18, 2014 at 6:17 PM, Nathaniel Smith wrote: > Okay! I think I now actually understand what was going on with > np.gradient! The discussion has been pretty confused, and I'm worried > that what's in master right now is not the right solution, which is a > problem because we're supposed to cut 1.9.1 tomorrow. > > Background: > > np.gradient computes gradients using a finite difference method. > Wikipedia has a good explanation of how this works: > https://en.wikipedia.org/wiki/Numerical_differentiation > The key point is that there are multiple formulas one can use that all > converge to the derivative, e.g. > > (f(x + h) - f(x)) / h > > or > > (f(x + h) - f(x - h)) / 2h > > The first is the textbook definition of the derivative. As h -> 0, the > error in that approximation shrinks like O(h). The second formula also > converges to the derivative as h -> 0, but it converges like O(h^2), > i.e., much faster. 
And there's are many many formulas like this with > different trade-offs: > https://en.wikipedia.org/wiki/Finite_difference_coefficient > In practice, given a grid of values of f(x) and a grid stepsize of h, > all of these formulas come down to doing a simple convolution of the > data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h] > for the two formulas above). > > Traditionally np.gradient has used the second formula above, with its > quadratically diminishing error term, for interior points in the grid. > For points on the boundary of the grid, though, this formula has a > problem, b/c it requires you look at points to both the left and the > right of the current point -- but for boundary points one of these > doesn't exist. In such situations np.gradient has traditionally used > the first formula instead (the "forward (or backward) finite > difference approximation with linear error", where > "forward"/"backward" means that it works on the boundary). As the web > page linked above shows, though, there's an easy alternative formula > that works on Did you lose some text here? > The change in numpy 1.9.0 that has caused all the fuss, is that we > switched to using the quadratically accurate approximation instead of > the linearly accurate approximation for boundary points. > > This had two effects: > > 1) it produced (hopefully small) changes in the calculation for the > boundaries. In general these should have made the calculations more > accurate, but apparently this did cause problems for some people who > wanted perfectly reproducible output. > > 2) it tickled a nasty bug in how gradient and masked arrays work > together. It turns out gradient applied to masked arrays has always > been seriously buggy; there's some horribleness in the masked array > implementation that means the output array that np.gradient is writing > into ends up sharing a mask array with the input. So as we go along > calculating the output, we also mutate the input. This can definitely > corrupt the input, for example here: > https://gist.github.com/njsmith/551738469b74d175e039 > ...and I strongly suspect it's possible to find cases where it means > you get the wrong answer. > > For mysterious reasons involving the order in which values were > calculated or something, the version of np.gradient in 1.9.0 tickled > this bug in a much worse manner, with early mutations to the input > array ended up affecting later calculations using the same input > array, causing cascading errors. This is the cause of the massive > problems that matplotlib encountered in their test suite: > https://github.com/matplotlib/matplotlib/issues/3598#issuecomment-58859663 > > There's a proposed fix for the masked array issues at: > https://github.com/numpy/numpy/pull/5203 > For the reasons described above, I would be very wary about using > *any* currently released version of np.gradient on masked arrays. A > workaround would be to make a local copy of the definition of > np.gradient, and immediately after the 'out' array is allocated do > 'out.mask = out.mask.copy()'. > > Once this bug is fixed, the "new" gradient code produces results which > are identical to the "old" gradient code on the interior; only the > boundary is different. > > ------ > > So here are my concerns: > > - We decided to revert the changes to np.gradient in 1.9.1 (at least > by default). I'm not sure how much of that decision was based on the > issues with masked arrays, though, which turns out to be a different > issue entirely. 
The previous discussion conflated the two issues > above. > > - We decided to gate the more-accurate boundary calculation behind a > kwarg called "edge_order=2". AFAICT now that I actually understand > what this code is doing, this is a terrible name -- we're using a > 3-coefficient kernel to compute a quadratically accurate approximation > to a first-order derivative. There probably exist other kernels that > are also quadratically accurate. "Order 2" simply doesn't refer to > this in any unique or meaningful way. And it will be even more > confusing if we ever add the option to select which kernel to use on > the interior, where "quadratically accurate" is definitely not enough > information to uniquely define a kernel. Maybe we want something like > edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not > sure. > > - If edge_order=2 escapes into a release tomorrow then we'll be stuck with it. > > Some possible options for 1.9.1: > - Just revert the entire np.gradient code to match 1.8, and put off a > real fix until 1.9. This at least avoids getting stuck with a poor > API. > > - Put the masked array bug fix into 1.9.1, and leave the gradient code > the same as 1.9.0; put off designing a proper API for selecting the > old behavior for 1.9.2. (Or 1.10 I guess.) This doesn't solve the pure > reproduceability problems, but it might be enough to make matplotlib > work again at least? > > - Delay 1.9.1 to rehash the issues and make a decision on what we want > to support long-term. Excellent summary, thanks for doing all the work to get to the bottom of the problem. Is there any significant disadvantage to delaying the 1.9.1 release for a few days? I would appreciate a short delay to see if it's possible to agree a workaround for the OSX Accelerate bug : https://github.com/numpy/numpy/pull/5205 - but in any case, unless there is a compelling reason to release tomorrow, it seems reasonable to take the time to agree on a good solution to the np.gradient API. Cheers, Matthew From njs at pobox.com Sat Oct 18 21:35:55 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Oct 2014 02:35:55 +0100 Subject: [Numpy-discussion] np.gradient In-Reply-To: References: Message-ID: On Sun, Oct 19, 2014 at 2:23 AM, Matthew Brett wrote: > Hi, > > On Sat, Oct 18, 2014 at 6:17 PM, Nathaniel Smith wrote: >> Okay! I think I now actually understand what was going on with >> np.gradient! The discussion has been pretty confused, and I'm worried >> that what's in master right now is not the right solution, which is a >> problem because we're supposed to cut 1.9.1 tomorrow. >> >> Background: >> >> np.gradient computes gradients using a finite difference method. >> Wikipedia has a good explanation of how this works: >> https://en.wikipedia.org/wiki/Numerical_differentiation >> The key point is that there are multiple formulas one can use that all >> converge to the derivative, e.g. >> >> (f(x + h) - f(x)) / h >> >> or >> >> (f(x + h) - f(x - h)) / 2h >> >> The first is the textbook definition of the derivative. As h -> 0, the >> error in that approximation shrinks like O(h). The second formula also >> converges to the derivative as h -> 0, but it converges like O(h^2), >> i.e., much faster. 
And there's are many many formulas like this with >> different trade-offs: >> https://en.wikipedia.org/wiki/Finite_difference_coefficient >> In practice, given a grid of values of f(x) and a grid stepsize of h, >> all of these formulas come down to doing a simple convolution of the >> data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h] >> for the two formulas above). >> >> Traditionally np.gradient has used the second formula above, with its >> quadratically diminishing error term, for interior points in the grid. >> For points on the boundary of the grid, though, this formula has a >> problem, b/c it requires you look at points to both the left and the >> right of the current point -- but for boundary points one of these >> doesn't exist. In such situations np.gradient has traditionally used >> the first formula instead (the "forward (or backward) finite >> difference approximation with linear error", where >> "forward"/"backward" means that it works on the boundary). As the web >> page linked above shows, though, there's an easy alternative formula >> that works on > > Did you lose some text here? "There's an easy alternative formula that works on edge points and provides quadratic accuracy." Not too critical, you probably figured out the gist of it :-) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sat Oct 18 21:46:41 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Oct 2014 02:46:41 +0100 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? In-Reply-To: References: Message-ID: On Sat, Oct 18, 2014 at 5:56 AM, Stephan Hoyer wrote: > Here are my initial thoughts on the syntax: > > (1) Generally speaking, I think the "nested tuple" syntax (e.g., axis=[(0, > 1), (2, 3)]) would be most congruous with the axis arguments numpy already > supports. > > (2) For gufuncs with simpler signatures, we should support supplying an > integer or an unnested tuple, e.g., > - axis=0 for (n)->() > - axis=(0, 1) for (n)(m)->() or (n,m)->() > - axis=[(0, 1), 2] for (n,m),(o)->(). One thing we'll have to watch out for is that for reduction operations (which are basically gufuncs with (n)->() signatures), we already allow axis=(0,1) to mean "reshape axes 0 and 1 together into one big axis, and then use that as the gufunc core axis". I don't know if we'll ever want to support this functionality for gufuncs in general, but we shouldn't rule it out with the syntax. One option would be to add a new argument axes=... for gufunc core specification, and say that axis=foo is an alias for axes=[[foo]]. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Sat Oct 18 22:37:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 18 Oct 2014 20:37:55 -0600 Subject: [Numpy-discussion] np.gradient In-Reply-To: References: Message-ID: On Sat, Oct 18, 2014 at 7:17 PM, Nathaniel Smith wrote: > Okay! I think I now actually understand what was going on with > np.gradient! The discussion has been pretty confused, and I'm worried > that what's in master right now is not the right solution, which is a > problem because we're supposed to cut 1.9.1 tomorrow. > > Background: > > np.gradient computes gradients using a finite difference method. 
> Wikipedia has a good explanation of how this works: > https://en.wikipedia.org/wiki/Numerical_differentiation > The key point is that there are multiple formulas one can use that all > converge to the derivative, e.g. > > (f(x + h) - f(x)) / h > > or > > (f(x + h) - f(x - h)) / 2h > > The first is the textbook definition of the derivative. As h -> 0, the > error in that approximation shrinks like O(h). The second formula also > converges to the derivative as h -> 0, but it converges like O(h^2), > i.e., much faster. And there's are many many formulas like this with > different trade-offs: > https://en.wikipedia.org/wiki/Finite_difference_coefficient > In practice, given a grid of values of f(x) and a grid stepsize of h, > all of these formulas come down to doing a simple convolution of the > data with certain coefficients (e.g. [-1/h, 1/h] or [-1/2h, 0, 1/2h] > for the two formulas above). > > Traditionally np.gradient has used the second formula above, with its > quadratically diminishing error term, for interior points in the grid. > For points on the boundary of the grid, though, this formula has a > problem, b/c it requires you look at points to both the left and the > right of the current point -- but for boundary points one of these > doesn't exist. In such situations np.gradient has traditionally used > the first formula instead (the "forward (or backward) finite > difference approximation with linear error", where > "forward"/"backward" means that it works on the boundary). As the web > page linked above shows, though, there's an easy alternative formula > that works on > > The change in numpy 1.9.0 that has caused all the fuss, is that we > switched to using the quadratically accurate approximation instead of > the linearly accurate approximation for boundary points. > > This had two effects: > > 1) it produced (hopefully small) changes in the calculation for the > boundaries. In general these should have made the calculations more > accurate, but apparently this did cause problems for some people who > wanted perfectly reproducible output. > > 2) it tickled a nasty bug in how gradient and masked arrays work > together. It turns out gradient applied to masked arrays has always > been seriously buggy; there's some horribleness in the masked array > implementation that means the output array that np.gradient is writing > into ends up sharing a mask array with the input. So as we go along > calculating the output, we also mutate the input. This can definitely > corrupt the input, for example here: > https://gist.github.com/njsmith/551738469b74d175e039 > ...and I strongly suspect it's possible to find cases where it means > you get the wrong answer. > > For mysterious reasons involving the order in which values were > calculated or something, the version of np.gradient in 1.9.0 tickled > this bug in a much worse manner, with early mutations to the input > array ended up affecting later calculations using the same input > array, causing cascading errors. This is the cause of the massive > problems that matplotlib encountered in their test suite: > > https://github.com/matplotlib/matplotlib/issues/3598#issuecomment-58859663 > > There's a proposed fix for the masked array issues at: > https://github.com/numpy/numpy/pull/5203 > For the reasons described above, I would be very wary about using > *any* currently released version of np.gradient on masked arrays. 
A > workaround would be to make a local copy of the definition of > np.gradient, and immediately after the 'out' array is allocated do > 'out.mask = out.mask.copy()'. > > Once this bug is fixed, the "new" gradient code produces results which > are identical to the "old" gradient code on the interior; only the > boundary is different. > > ------ > > So here are my concerns: > > - We decided to revert the changes to np.gradient in 1.9.1 (at least > by default). I'm not sure how much of that decision was based on the > issues with masked arrays, though, which turns out to be a different > issue entirely. The previous discussion conflated the two issues > above. > My concern was reproducibility. The old behavior wasn't a bug, so we should be careful that old results are reproducible. > > - We decided to gate the more-accurate boundary calculation behind a > kwarg called "edge_order=2". AFAICT now that I actually understand > what this code is doing, this is a terrible name -- we're using a > 3-coefficient kernel to compute a quadratically accurate approximation > to a first-order derivative. Accuracy is a different problem and depends on the function being interpolated. As you point out above, the order refers to the *rate* of convergence. Normally that is illustrated with a loglog plot of the absolute value of the error against h, resulting in a straight line once the function is sufficiently smooth over the range of h. The end points are a bit special because of the lack of bracketing. Relative to the interior, one is effectively extrapolating rather than interpolating and things can become a bit unstable. Hence it is useful to have a safe option, here linear extrapolation, as an alternative to a higher order method. > There probably exist other kernels that > are also quadratically accurate. "Order 2" simply doesn't refer to > this in any unique or meaningful way. And it will be even more > confusing if we ever add the option to select which kernel to use on > the interior, where "quadratically accurate" is definitely not enough > information to uniquely define a kernel. Maybe we want something like > edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not > sure. > > - If edge_order=2 escapes into a release tomorrow then we'll be stuck with > it. > Order has two common meanings in the numerical context, either degree, or, for polynomials, the number of coefficients (degree + 1). I've noticed that the meaning has evolved over the years, and these days an equivalence with degree seems to be pretty common. In the present context, it refers to the power of the h term in the error term of the Taylor's series approximation of the derivative. The precise meaning needs to be elucidated in the notes, as the order doesn't map one-to-one into methods. > Some possible options for 1.9.1: > - Just revert the entire np.gradient code to match 1.8, and put off a > real fix until 1.9. This at least avoids getting stuck with a poor > API. > > - Put the masked array bug fix into 1.9.1, and leave the gradient code > the same as 1.9.0; put off designing a proper API for selecting the > old behavior for 1.9.2. (Or 1.10 I guess.) This doesn't solve the pure > reproduceability problems, but it might be enough to make matplotlib > work again at least? > The only reason I see to keep the current default is to maintain behavior from 1.9.0 to 1.9.1. I don't think 1.9.0 will last long in the wild. 
Anyone working on the cutting edge will likely update almost immediately, and distros releases in the near future will probably have 1.8.2. > - Delay 1.9.1 to rehash the issues and make a decision on what we want > to support long-term. > Might want to delay a few days in any case to settle the Mac/Windows issues. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Oct 19 03:25:43 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 19 Oct 2014 00:25:43 -0700 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? In-Reply-To: References: Message-ID: On Sat, Oct 18, 2014 at 6:46 PM, Nathaniel Smith wrote: > One thing we'll have to watch out for is that for reduction operations > (which are basically gufuncs with (n)->() signatures), we already > allow axis=(0,1) to mean "reshape axes 0 and 1 together into one big > axis, and then use that as the gufunc core axis". I don't know if > we'll ever want to support this functionality for gufuncs in general, > but we shouldn't rule it out with the syntax. > This is a great point. In fact, I think supporting this sort of functionality for gufuncs would be quite valuable, since there are a plenty of reduction operations that can't fit into the model provided by ufunc.reduce. An excellent example is np.median, which currently can only act on either one axis or an entire flattened array. If the syntax (m?,n),(n,p?)->(m?,p?) is accepted, then I think the natural extension to reduction operators that can act on one or more axes would be (n+)->() (this is regex syntax). Actually, adding using an axis keyword seems like the only elegant way to handle disambiguating cases like this. > One option would be to add a new argument axes=... for gufunc core > specification, and say that axis=foo is an alias for axes=[[foo]]. > Indeed, this is exactly what I was thinking. The "canonical form" for the axis argument would be doubly nested tuples, but if an integer or unnested tuple is encountered, additional nesting should be added until reaching canoncial form, e.g., axis=0 -> axis=(0,) -> axis=((0,),). The only particularly tricky case will be scenarios like my second one, axis=(0, 1) for (n)(m)->() or (n,m)->(). To deal with cases like this, the parsing will need to take the gufunc signature into consideration, and start by asking whether or not tuple is of the right size to match each function argument separately. To make it clear that this proposal covers all the bases, I would be happy to write some prototype code (and test cases) to demonstrate such a transformation to canonical form, including all these edge cases. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Oct 19 09:43:02 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Oct 2014 14:43:02 +0100 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? In-Reply-To: References: Message-ID: On Sun, Oct 19, 2014 at 8:25 AM, Stephan Hoyer wrote: > On Sat, Oct 18, 2014 at 6:46 PM, Nathaniel Smith wrote: >> >> One thing we'll have to watch out for is that for reduction operations >> (which are basically gufuncs with (n)->() signatures), we already >> allow axis=(0,1) to mean "reshape axes 0 and 1 together into one big >> axis, and then use that as the gufunc core axis". I don't know if >> we'll ever want to support this functionality for gufuncs in general, >> but we shouldn't rule it out with the syntax. 
> > > This is a great point. > > In fact, I think supporting this sort of functionality for gufuncs would be > quite valuable, since there are a plenty of reduction operations that can't > fit into the model provided by ufunc.reduce. An excellent example is > np.median, which currently can only act on either one axis or an entire > flattened array. > > If the syntax (m?,n),(n,p?)->(m?,p?) is accepted, then I think the natural > extension to reduction operators that can act on one or more axes would be > (n+)->() (this is regex syntax). It's not clear we even need to alter the signature here -- the reduction operations don't bother distinguishing between reductions that make sense in this case (the commutative ones) and the ones that don't (everything else), they just trust that no-one will try doing something like np.subtract.reduce(arr, axis=(0, 1)) because it's meaningless. Providing some basic checks here might be useful though given that gufunc signatures can be much more complicated than just (n)->(). > Actually, adding using an axis keyword seems like the only elegant way to > handle disambiguating cases like this. > >> >> One option would be to add a new argument axes=... for gufunc core >> specification, and say that axis=foo is an alias for axes=[[foo]]. > > > Indeed, this is exactly what I was thinking. The "canonical form" for the > axis argument would be doubly nested tuples, but if an integer or unnested > tuple is encountered, additional nesting should be added until reaching > canoncial form, e.g., axis=0 -> axis=(0,) -> axis=((0,),). > > The only particularly tricky case will be scenarios like my second one, > axis=(0, 1) for (n)(m)->() or (n,m)->(). To deal with cases like this, the > parsing will need to take the gufunc signature into consideration, and start > by asking whether or not tuple is of the right size to match each function > argument separately. Right, the problem with (0, 1) in this system is that you can either read it as being a single reshaping axis description and expand it to ((0, 1),), or you can read it as being two non-reshaping axis descriptions and expand it to ((0,), (1,)). I feel strongly that we should come up with a syntax that is unambiguous even *without* looking at the gufunc signature. It's easy for the computer to disambiguate stuff like this, but it'd be cruel to ask people trying to skim through code to work out the signature and then simulate the disambiguation algorithm in their head. Notice in my suggestion above there are two different kwargs, "axis" and "axes". -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sun Oct 19 10:13:42 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 19 Oct 2014 15:13:42 +0100 Subject: [Numpy-discussion] np.gradient In-Reply-To: References: Message-ID: On Sun, Oct 19, 2014 at 3:37 AM, Charles R Harris wrote: > > On Sat, Oct 18, 2014 at 7:17 PM, Nathaniel Smith wrote: >> >> So here are my concerns: >> >> - We decided to revert the changes to np.gradient in 1.9.1 (at least >> by default). I'm not sure how much of that decision was based on the >> issues with masked arrays, though, which turns out to be a different >> issue entirely. The previous discussion conflated the two issues >> above. > > > My concern was reproducibility. The old behavior wasn't a bug, so we should > be careful that old results are reproducible. Yep. 
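(For concreteness -- a small sketch added here, not part of the original message -- the two difference formulas from the background really do converge at these different rates; np.exp is used as an arbitrary smooth test function.)

import numpy as np

f, fprime, x0 = np.exp, np.exp, 1.0   # test function with a known derivative

for h in [1e-1, 1e-2, 1e-3]:
    forward = (f(x0 + h) - f(x0)) / h            # one-sided stencil, O(h) error
    central = (f(x0 + h) - f(x0 - h)) / (2 * h)  # centered stencil, O(h^2) error
    print("h=%g  forward err=%.2e  central err=%.2e"
          % (h, abs(forward - fprime(x0)), abs(central - fprime(x0))))
# each 10x reduction in h cuts the one-sided error ~10x and the centered error ~100x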
>> - We decided to gate the more-accurate boundary calculation behind a >> kwarg called "edge_order=2". AFAICT now that I actually understand >> what this code is doing, this is a terrible name -- we're using a >> 3-coefficient kernel to compute a quadratically accurate approximation >> to a first-order derivative. > > > Accuracy is a different problem and depends on the function being > interpolated. As you point out above, the order refers to the *rate* of > convergence. Normally that is illustrated with a loglog plot of the absolute > value of the error against h, resulting in a straight line once the function > is sufficiently smooth over the range of h. The end points are a bit special > because of the lack of bracketing. Relative to the interior, one is > effectively extrapolating rather than interpolating and things can become a > bit unstable. Hence it is useful to have a safe option, here linear > extrapolation, as an alternative to a higher order method. This sounds plausible! But given that we have to (a) pick defaults, (b) pick which ones are included at all, and (c) document the difference in such a way that users can make an informed choice, then I'd kinda like... more precision :-). I'm sure there are situations where one or the other is better, but which situations are those? Do you know some way to tell which is which? Does one situation arise substantially more often than the other? >> There probably exist other kernels that >> are also quadratically accurate. "Order 2" simply doesn't refer to >> this in any unique or meaningful way. And it will be even more >> confusing if we ever add the option to select which kernel to use on >> the interior, where "quadratically accurate" is definitely not enough >> information to uniquely define a kernel. Maybe we want something like >> edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not >> sure. >> >> - If edge_order=2 escapes into a release tomorrow then we'll be stuck with >> it. > > Order has two common meanings in the numerical context, either degree, or, > for polynomials, the number of coefficients (degree + 1). I've noticed that > the meaning has evolved over the years, and these days an equivalence with > degree seems to be pretty common. In the present context, it refers to the > power of the h term in the error term of the Taylor's series approximation > of the derivative. The precise meaning needs to be elucidated in the notes, > as the order doesn't map one-to-one into methods. Surely this is still an argument for using a word that requires less elucidation, like degree or accuracy? (I'm particularly concerned because "2nd order derivative" has a *very* well known meaning that's very importantly different.) -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Sun Oct 19 11:46:29 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 19 Oct 2014 09:46:29 -0600 Subject: [Numpy-discussion] np.gradient In-Reply-To: References: Message-ID: On Sun, Oct 19, 2014 at 8:13 AM, Nathaniel Smith wrote: > On Sun, Oct 19, 2014 at 3:37 AM, Charles R Harris > wrote: > > > > On Sat, Oct 18, 2014 at 7:17 PM, Nathaniel Smith wrote: > >> > >> So here are my concerns: > >> > >> - We decided to revert the changes to np.gradient in 1.9.1 (at least > >> by default). I'm not sure how much of that decision was based on the > >> issues with masked arrays, though, which turns out to be a different > >> issue entirely. 
The previous discussion conflated the two issues > >> above. > > > > > > My concern was reproducibility. The old behavior wasn't a bug, so we > should > > be careful that old results are reproducible. > > Yep. > > >> - We decided to gate the more-accurate boundary calculation behind a > >> kwarg called "edge_order=2". AFAICT now that I actually understand > >> what this code is doing, this is a terrible name -- we're using a > >> 3-coefficient kernel to compute a quadratically accurate approximation > >> to a first-order derivative. > > > > > > Accuracy is a different problem and depends on the function being > > interpolated. As you point out above, the order refers to the *rate* of > > convergence. Normally that is illustrated with a loglog plot of the > absolute > > value of the error against h, resulting in a straight line once the > function > > is sufficiently smooth over the range of h. The end points are a bit > special > > because of the lack of bracketing. Relative to the interior, one is > > effectively extrapolating rather than interpolating and things can > become a > > bit unstable. Hence it is useful to have a safe option, here linear > > extrapolation, as an alternative to a higher order method. > > This sounds plausible! But given that we have to (a) pick defaults, > (b) pick which ones are included at all, and (c) document the > difference in such a way that users can make an informed choice, then > I'd kinda like... more precision :-). I'm sure there are situations > where one or the other is better, but which situations are those? Do > you know some way to tell which is which? Does one situation arise > substantially more often than the other? > A challenge ;) In [34]: x = arange(11)/10. In [35]: y = exp(-1./x + -1./(1 - x)) In [36]: y[-1] = y[0] = 0 In [37]: plot(x, gradient(y, .01, edge_order=1)) Out[37]: [] In [38]: plot(x, gradient(y, .01, edge_order=2)) Out[38]: [] A bit artificial I'll admit. Most functions with reasonable sample points do better with the second order version. The main argument for keeping the linear default is backward compatibility. Absent that, second order would be preferable. >> There probably exist other kernels that > >> are also quadratically accurate. "Order 2" simply doesn't refer to > >> this in any unique or meaningful way. And it will be even more > >> confusing if we ever add the option to select which kernel to use on > >> the interior, where "quadratically accurate" is definitely not enough > >> information to uniquely define a kernel. Maybe we want something like > >> edge_accuracy=2 or edge_accuracy="quadratic" or something? I'm not > >> sure. > >> > >> - If edge_order=2 escapes into a release tomorrow then we'll be stuck > with > >> it. > > > > Order has two common meanings in the numerical context, either degree, > or, > > for polynomials, the number of coefficients (degree + 1). I've noticed > that > > the meaning has evolved over the years, and these days an equivalence > with > > degree seems to be pretty common. In the present context, it refers to > the > > power of the h term in the error term of the Taylor's series > approximation > > of the derivative. The precise meaning needs to be elucidated in the > notes, > > as the order doesn't map one-to-one into methods. > > Surely this is still an argument for using a word that requires less > elucidation, like degree or accuracy? (I'm particularly concerned > because "2nd order derivative" has a *very* well known meaning that's > very importantly different.) 
> > Order is well understood in context, but you have a point for naive users. Maybe a string would be better, "edge_method='linear', " etc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Sun Oct 19 15:52:03 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Sun, 19 Oct 2014 12:52:03 -0700 Subject: [Numpy-discussion] Add an axis argument to generalized ufuncs? In-Reply-To: References: Message-ID: On Sun, Oct 19, 2014 at 6:43 AM, Nathaniel Smith wrote: > I feel strongly that we should come up with a syntax that is > unambiguous even *without* looking at the gufunc signature. It's easy > for the computer to disambiguate stuff like this, but it'd be cruel to > ask people trying to skim through code to work out the signature and > then simulate the disambiguation algorithm in their head. > Since code speaks stronger than mere words, here is a notebook showing my disambiguation algorithm: http://nbviewer.ipython.org/gist/shoyer/7740d32850084261d870 I don't think this is so cruel, but I agree that the logic is more complex than ideal: "If the axis argument is a sequence with length equal to the number of variables with axis specifications in the gufunc signature, then each element is taken to specify the axis for each corresponding variable. Otherwise, if the gufunc has only one variable with a core dimension, the entire axis argument is taken to refer to only that variable." > Notice in my suggestion above there are two different kwargs, "axis" and > "axes". Ah, I missed that. That's actually pretty elegant, so +1 from me. My only ask then would be that we allow for "axis" to also be a sequence of integers, in which case they are also used to specify the axis for the single variable, e.g., axis=(1, 2) translates to axes=[(1, 2)]. This would allow for using the axis argument in the same way as it works on ufunc.reduce already. I don't think distinguishing cases for "integer" vs "tuple of integers" is too complex. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Sun Oct 19 17:34:43 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 19 Oct 2014 17:34:43 -0400 Subject: [Numpy-discussion] ANN: Pandas 0.15.0 released Message-ID: Hello, We are proud to announce v0.15.0 of pandas, a major release from 0.14.1. This release includes a small number of API changes, several new features, enhancements, and performance improvements along with a large number of bug fixes. This was 4 months of work with 420 commits by 79 authors encompassing 236 issues. We recommend that all users upgrade to this version. 
*Highlights:* - Drop support for numpy < 1.7.0 - The Categorical type was integrated as a first-class pandas type - New scalar type Timedelta, and a new index type TimedeltaIndex - New DataFrame default display for df.info() to include memory usage - New datetimelike properties accessor .dt for Series - Split indexing documentation into Indexing and Selecting Data and MultiIndex / Advanced Indexing - Split out string methods documentation into Working with Text Data - read_csv will now by default ignore blank lines when parsing - API change in using Indexes in set operations - Internal refactoring of the Index class to no longer sub-class ndarray - dropping support for PyTables less than version 3.0.0, and numexpr less than version 2.1 See a full description of Whatsnew for v0.15.0 here: http://pandas.pydata.org/pandas-docs/stable/whatsnew.html *What is it:* *pandas* is a Python package providing fast, flexible, and expressive data structures designed to make working with ?relational? or ?labeled? data both easy and intuitive. It aims to be the fundamental high-level building block for doing practical, real world data analysis in Python. Additionally, it has the broader goal of becoming the most powerful and flexible open source data analysis / manipulation tool available in any language. Documentation: http://pandas.pydata.org/pandas-docs/stable/ Source tarballs, windows binaries are available on PyPI: https://pypi.python.org/pypi/pandas windows binaries are courtesy of Christoph Gohlke and are built on Numpy 1.8 macosx wheels are courtesy of Matthew Brett and are built on Numpy 1.7.1 Please report any issues here: https://github.com/pydata/pandas/issues Thanks The Pandas Development Team Contributors to the 0.15.0 release -------------- next part -------------- An HTML attachment was scrubbed... URL: From bwana.marko at yahoo.com Tue Oct 21 12:43:03 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Tue, 21 Oct 2014 09:43:03 -0700 Subject: [Numpy-discussion] MKL not available as separate download since 10/16/2014 Message-ID: <1413909783.57818.YahooMailNeo@web121402.mail.ne1.yahoo.com> the old MKL links point to 404.html the new MKL link has this link in the right sidebar: Changes to Intel? Software Development Products | Intel? Developer Zone Changes to Intel? Software Development Products | Intel? Developer Zone Intel? Parallel Studio XE simplification Intel? Parallel Studio XE has been simplified to three editions, Composer, Professional and Cluster. Here?s a table showing the old and new simplified names. More information about Intel Parallel Studio editions is here. View on software.intel.com Preview by Yahoo which has the following statement: Ingredient product availability changes New licenses of the following ingredient products will be available only in suites going forward. Current licensees of these ingredient products can continue to renew support maintenance for their existing license or upgrade to a suite. For more info about the ingredients below, click the product name below: Intel? Integrated Performance Primitives Intel? Math Kernel Library Intel? Threading Building Blocks Intel? Inspector XE ?As I breath in, I calm my body, as I breath out, I smile? - Thich_Nhat_Hanh -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From michael.klemm at intel.com Tue Oct 21 15:10:27 2014 From: michael.klemm at intel.com (Klemm, Michael) Date: Tue, 21 Oct 2014 19:10:27 +0000 Subject: [Numpy-discussion] MKL not available as separate download since 10/16/2014 In-Reply-To: <1413909783.57818.YahooMailNeo@web121402.mail.ne1.yahoo.com> References: <1413909783.57818.YahooMailNeo@web121402.mail.ne1.yahoo.com> Message-ID: <0DAB4B4FC42EAA41802458ADA9C2F8242FE77890@IRSMSX104.ger.corp.intel.com> Dear Mark, Can you tell me (maybe through private email), what package you had before? We had some repackaging in the past, which unfortunately also means that some of the previously packages have now been merged into the below mentioned suites. If you would like, I can follow-up with our internal product teams to see if we are able to help here. That?s no guarantee, but I can try ?. Cheers, -michael Dr.-Ing. Michael Klemm Senior Application Engineer Software and Services Group Developer Relations Division Phone +49 89 9914 2340 Cell +49 174 2417583 From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Mark Mikofski Sent: Tuesday, October 21, 2014 6:43 PM To: numpy-discussion at scipy.org Subject: [Numpy-discussion] MKL not available as separate download since 10/16/2014 the old MKL links point to 404.html the new MKL link has this link in the right sidebar: Changes to Intel? Software Development Products | Intel? Developer Zone Changes to Intel? Software Development Products | Intel? Developer Zone Intel? Parallel Studio XE simplification Intel? Parallel Studio XE has been simplified to three editions, Composer, Professional and Cluster. Here?s a table showing the old and new simplified names. More information about Intel Parallel Studio editions is here. View on software.intel.com Preview by Yahoo which has the following statement: Ingredient product availability changes New licenses of the following ingredient products will be available only in suites going forward. Current licensees of these ingredient products can continue to renew support maintenance for their existing license or upgrade to a suite. For more info about the ingredients below, click the product name below: Intel? Integrated Performance Primitives Intel? Math Kernel Library Intel? Threading Building Blocks Intel? Inspector XE ?As I breath in, I calm my body, as I breath out, I smile? - Thich_Nhat_Hanh Intel GmbH Dornacher Strasse 1 85622 Feldkirchen/Muenchen, Deutschland Sitz der Gesellschaft: Feldkirchen bei Muenchen Geschaeftsfuehrer: Christian Lamprechter, Hannes Schwaderer, Douglas Lusk Registergericht: Muenchen HRB 47456 Ust.-IdNr./VAT Registration No.: DE129385895 Citibank Frankfurt a.M. (BLZ 502 109 00) 600119052 -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.gregory at oregonstate.edu Tue Oct 21 18:18:41 2014 From: matt.gregory at oregonstate.edu (Matt Gregory) Date: Tue, 21 Oct 2014 15:18:41 -0700 Subject: [Numpy-discussion] return unique combinations of stacked arrays - slow Message-ID: I'm trying to create an output array of integers where each value represents a unique combination of values from (1..n) input arrays. As a simple example, given these three arrays: a = np.array([0, 1, 2, 3, 0, 1, 2, 3]) b = np.array([0, 1, 0, 1, 0, 1, 0, 1]) c = np.array([0, 1, 1, 0, 0, 1, 0, 1]) I want an output array that holds 'codes' for the unique combinations and a dictionary that holds the unique combinations as keys and codes as values. 
out = np.array([0, 1, 2, 3, 0, 1, 4, 5]) out_dict = { (0, 0, 0): 0, (1, 1, 1): 1, (2, 0, 1): 2, (3, 1, 0): 3, (2, 0, 0): 4, (3, 1, 1): 5, } An additional constraint is that I'm bringing in the (a, b, c) arrays a chunk at a time due to memory limits (ie. very large rasters) and so I need to retain the mapping between chunks. My current (very naive and pretty slow) implementation in loop form is: out_dict = {} out = np.zeros_like(a) count = 0 stack = np.vstack((a, b, c)).T for (i, arr) in enumerate(stack): t = tuple(arr) if t not in out_dict: out_dict[t] = count count += 1 out[i] = out_dict[t] Thanks for help, matt From charlesr.harris at gmail.com Tue Oct 21 18:40:24 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 21 Oct 2014 16:40:24 -0600 Subject: [Numpy-discussion] return unique combinations of stacked arrays - slow In-Reply-To: References: Message-ID: On Tue, Oct 21, 2014 at 4:18 PM, Matt Gregory wrote: > I'm trying to create an output array of integers where each value > represents a unique combination of values from (1..n) input arrays. As > a simple example, given these three arrays: > > a = np.array([0, 1, 2, 3, 0, 1, 2, 3]) > b = np.array([0, 1, 0, 1, 0, 1, 0, 1]) > c = np.array([0, 1, 1, 0, 0, 1, 0, 1]) > > I want an output array that holds 'codes' for the unique combinations > and a dictionary that holds the unique combinations as keys and codes as > values. > > out = np.array([0, 1, 2, 3, 0, 1, 4, 5]) > out_dict = { > (0, 0, 0): 0, > (1, 1, 1): 1, > (2, 0, 1): 2, > (3, 1, 0): 3, > (2, 0, 0): 4, > (3, 1, 1): 5, > } > > An additional constraint is that I'm bringing in the (a, b, c) arrays a > chunk at a time due to memory limits (ie. very large rasters) and so I > need to retain the mapping between chunks. > > My current (very naive and pretty slow) implementation in loop form is: > > out_dict = {} > out = np.zeros_like(a) > count = 0 > stack = np.vstack((a, b, c)).T > for (i, arr) in enumerate(stack): > t = tuple(arr) > if t not in out_dict: > out_dict[t] = count > count += 1 > out[i] = out_dict[t] > > Thanks for help, > matt > > See http://stackoverflow.com/questions/23268605/grouping-indices-of-unique-elements-in-numpy for some ideas. the main difference is that you can't fit everything in memory, but if there are lots of duplicates you should be able to do it in batches, then combine the batches and repeat. Another possibility if the elements are bounded is to treat them as digits in some number system and evaluate that number, i.e., dot with something like array([1, 10, 100, ...]). Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dhyams at gmail.com Tue Oct 21 23:52:33 2014 From: dhyams at gmail.com (Daniel Hyams) Date: Wed, 22 Oct 2014 03:52:33 +0000 (UTC) Subject: [Numpy-discussion] numpy log and exceptions Message-ID: I would have thought that this snippet would raise an exception: import numpy numpy.seterr(all='raise') a = numpy.array([1.0,0.0,-1.0]) b = numpy.log(a) I get as a result (in b): [0, -Inf, NaN] It's basically the same issue as: http://numpy-discussion.10968.n7.nabble.com/numpy-log-does-not-raise- exceptions-td5854.html Except that I have explicitly set the error flags to raise exceptions. It works fine for sqrt(), but not for log(). I've checked numpy 1.4.0 and 1.7.1 and both have the same behavior. Is there a way to force the log (and log10) function to raise an exception on invalid input? 
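(An added workaround sketch, not part of the original message: if the goal is simply to get an exception for non-positive input regardless of platform, the check can be made explicit instead of relying on the floating-point flags that seterr watches; the helper name strict_log is made up.)

import numpy as np

def strict_log(a):
    # validate the input ourselves, independent of whether the platform's
    # math library sets the floating-point error flags numpy.seterr relies on
    a = np.asarray(a, dtype=float)
    if np.any(a < 0):
        raise FloatingPointError("log of negative value")
    if np.any(a == 0):
        raise FloatingPointError("log of zero")
    return np.log(a)

strict_log([1.0, 2.0, 3.0])      # works
# strict_log([1.0, 0.0, -1.0])   # raises FloatingPointError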
From jtaylor.debian at googlemail.com Wed Oct 22 02:44:16 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 22 Oct 2014 08:44:16 +0200 Subject: [Numpy-discussion] numpy log and exceptions In-Reply-To: References: Message-ID: <54475240.3040007@googlemail.com> On 22.10.2014 05:52, Daniel Hyams wrote: > > I would have thought that this snippet would raise an exception: > > import numpy > numpy.seterr(all='raise') > a = numpy.array([1.0,0.0,-1.0]) > b = numpy.log(a) > > I get as a result (in b): [0, -Inf, NaN] > > It's basically the same issue as: > > http://numpy-discussion.10968.n7.nabble.com/numpy-log-does-not-raise- > exceptions-td5854.html > > Except that I have explicitly set the error flags to raise exceptions. It > works fine for sqrt(), but not for log(). I've checked numpy 1.4.0 and > 1.7.1 and both have the same behavior. > > Is there a way to force the log (and log10) function to raise an exception > on invalid input? > What platform are you using? whether you get exceptions or not depends on your math library. From dhyams at gmail.com Wed Oct 22 08:00:37 2014 From: dhyams at gmail.com (Daniel Hyams) Date: Wed, 22 Oct 2014 12:00:37 +0000 (UTC) Subject: [Numpy-discussion] numpy log and exceptions References: <54475240.3040007@googlemail.com> Message-ID: Julian Taylor googlemail.com> writes: > What platform are you using? > whether you get exceptions or not depends on your math library. > Windows 7. From njs at pobox.com Wed Oct 22 09:43:05 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 22 Oct 2014 14:43:05 +0100 Subject: [Numpy-discussion] numpy log and exceptions In-Reply-To: <54475240.3040007@googlemail.com> References: <54475240.3040007@googlemail.com> Message-ID: I guess we could make this more consistent by hand if we wanted - isnan is pretty cheap? On 22 Oct 2014 07:44, "Julian Taylor" wrote: > On 22.10.2014 05:52, Daniel Hyams wrote: > > > > I would have thought that this snippet would raise an exception: > > > > import numpy > > numpy.seterr(all='raise') > > a = numpy.array([1.0,0.0,-1.0]) > > b = numpy.log(a) > > > > I get as a result (in b): [0, -Inf, NaN] > > > > It's basically the same issue as: > > > > http://numpy-discussion.10968.n7.nabble.com/numpy-log-does-not-raise- > > exceptions-td5854.html > > > > Except that I have explicitly set the error flags to raise exceptions. > It > > works fine for sqrt(), but not for log(). I've checked numpy 1.4.0 and > > 1.7.1 and both have the same behavior. > > > > Is there a way to force the log (and log10) function to raise an > exception > > on invalid input? > > > > What platform are you using? > whether you get exceptions or not depends on your math library. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Wed Oct 22 09:57:22 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 22 Oct 2014 15:57:22 +0200 Subject: [Numpy-discussion] numpy log and exceptions In-Reply-To: References: <54475240.3040007@googlemail.com> Message-ID: On 22 October 2014 15:43, Nathaniel Smith wrote: > I guess we could make this more consistent by hand if we wanted - isnan is > pretty cheap? Can it be made avoiding storing the full bool array? The 1/8 memory overhead can be problematic for large arrays. 
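At the Python level the full-size temporary can already be avoided by scanning the result in blocks; a rough sketch (the has_invalid name and the block size are arbitrary choices):

import numpy as np

def has_invalid(values, blocksize=65536):
    # Check for nan/inf one block at a time so the temporary boolean
    # array never exceeds blocksize elements.
    flat = values.ravel()  # a view for contiguous input, a copy otherwise
    for start in range(0, flat.size, blocksize):
        if not np.all(np.isfinite(flat[start:start + blocksize])):
            return True
    return False

Doing the same check inside the ufunc inner loop, as suggested below, would shrink the overhead to essentially nothing.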
-------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Oct 22 10:02:15 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 22 Oct 2014 15:02:15 +0100 Subject: [Numpy-discussion] numpy log and exceptions In-Reply-To: References: <54475240.3040007@googlemail.com> Message-ID: On 22 Oct 2014 14:57, "Da?id" wrote: > > > On 22 October 2014 15:43, Nathaniel Smith wrote: >> >> I guess we could make this more consistent by hand if we wanted - isnan is pretty cheap? > > > > Can it be made avoiding storing the full bool array? The 1/8 memory overhead can be problematic for large arrays. Yeah, we could push the check into the inner loop so the memory overhead would be, like, 1 byte. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Oct 22 13:32:16 2014 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 22 Oct 2014 13:32:16 -0400 Subject: [Numpy-discussion] Bug in 1.9? Message-ID: Hello, Is this desired behaviour or a regression or a bug? http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray Thanks, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Oct 22 14:00:39 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 22 Oct 2014 12:00:39 -0600 Subject: [Numpy-discussion] Bug in 1.9? In-Reply-To: References: Message-ID: On Wed, Oct 22, 2014 at 11:32 AM, Neil Girdhar wrote: > Hello, > > Is this desired behaviour or a regression or a bug? > > > http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray > > Thanks, > I'd guess that the definition of aligned may have become stricter, that's the only thing I think has changed. Maybe Julian can comment on that. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Oct 22 14:28:24 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 22 Oct 2014 20:28:24 +0200 Subject: [Numpy-discussion] Bug in 1.9? In-Reply-To: References: Message-ID: <5447F748.6050906@googlemail.com> On 22.10.2014 20:00, Charles R Harris wrote: > > > On Wed, Oct 22, 2014 at 11:32 AM, Neil Girdhar > wrote: > > Hello, > > Is this desired behaviour or a regression or a bug? > > http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray > > Thanks, > > > I'd guess that the definition of aligned may have become stricter, > that's the only thing I think has changed. Maybe Julian can comment on that. > structured dtypes have not really a well defined alignment, e.g. the stride of this is 12, so when element 0 is aligned element 1 is always unaligned. Before 1.9 structured dtype always had the aligned flag set, even if they were unaligned. Now we require a minimum alignment of 16 for strings and structured types so copying which sometimes works on the whole compound type instead of each item always works. This was the easiest way to get the testsuite running on sparc after fixing a couple of code paths not updating alignment information which forced some functions to always take super slow unaligned paths (e.g. ufunc.at) But the logic could certainly be improved. From charlesr.harris at gmail.com Wed Oct 22 14:40:42 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 22 Oct 2014 12:40:42 -0600 Subject: [Numpy-discussion] Bug in 1.9? 
In-Reply-To: <5447F748.6050906@googlemail.com> References: <5447F748.6050906@googlemail.com> Message-ID: On Wed, Oct 22, 2014 at 12:28 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 22.10.2014 20:00, Charles R Harris wrote: > > > > > > On Wed, Oct 22, 2014 at 11:32 AM, Neil Girdhar > > wrote: > > > > Hello, > > > > Is this desired behaviour or a regression or a bug? > > > > > http://stackoverflow.com/questions/26497656/how-do-i-align-a-numpy-record-array-recarray > > > > Thanks, > > > > > > I'd guess that the definition of aligned may have become stricter, > > that's the only thing I think has changed. Maybe Julian can comment on > that. > > > > structured dtypes have not really a well defined alignment, e.g. the > stride of this is 12, so when element 0 is aligned element 1 is always > unaligned. > > Before 1.9 structured dtype always had the aligned flag set, even if > they were unaligned. > Now we require a minimum alignment of 16 for strings and structured > types so copying which sometimes works on the whole compound type > instead of each item always works. > This was the easiest way to get the testsuite running on sparc after > fixing a couple of code paths not updating alignment information which > forced some functions to always take super slow unaligned paths (e.g. > ufunc.at) > But the logic could certainly be improved. > The stackexchange example: In [9]: a = np.zeros(4, dtype=dtype([('x', ' In [11]: a = np.zeros(4, dtype=dtype([('x', ' Note that using an aligned dtype yields a different size on my 64 bit system and 64 / 4 = 16. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bwana.marko at yahoo.com Wed Oct 22 19:54:49 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Wed, 22 Oct 2014 23:54:49 +0000 (UTC) Subject: [Numpy-discussion] successful Windows x64 build with acml-5.3.1 ifort64 static libs passes tests Message-ID: <350916962.16291.1414022089586.JavaMail.yahoo@jws100210.mail.ne1.yahoo.com> * Either windows sdk 7 or ms visual studio 2008 is required, however Fortran compilers are not.* Download acml5.3.1-ifort64.exe from?http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/acml-downloads-resources/?and it extracts to c:\AMD\acml5.3.1 * NOTE: There is also a newer acml-6.0.6.17-ifort64 update but it only includes dynamic libraries which depend on msvcr110 (visual studio 2013). * NOTE: There are several other optional libs incl. 
ifort64-mt, ifort64-fma4, ifort64-mp and ifort64-fma4-mp* System info: Windows-7 (64bit), python-2.7.6, numpy-1.9.0, virtualenv-1.11.5, nose-1.3.4* create?site.cfg with: [amd]amd_libs = libacml, libifcoremd, libifportmd, libirc, libmmd, libsvml_disp, libsvml_dispmdlibrary_dirs = C:\AMD\acml5.3.1\ifort64\libinclude_dirs = C:\AMD\acml5.3.1\ifort64\include[blas]blas_libs = libacml, libifcoremd, libifportmd, libirc, libmmd, libsvml_disp, libsvml_dispmdlibrary_dirs = C:\AMD\acml5.3.1\ifort64\libinclude_dirs = C:\AMD\acml5.3.1\ifort64\include[lapack]lapack_libs = libacml, libifcoremd, libifportmd, libirc, libmmd, libsvml_disp, libsvml_dispmdlibrary_dirs = C:\AMD\acml5.3.1\ifort64\libinclude_dirs = C:\AMD\acml5.3.1\ifort64\include * activate virtualenv and inside extracted numpy-1.9.0 folder find setup.py and run it with the following commands (venv) numpy-1.9.0/numpy-1.9.0 $python setup.py config --compiler=msvc * start python, import numpy and run numpy.test()Ran 5458 tests in 29.346s OK (KNOWNFAIL=10, SKIP=18)??As I breath in, I calm my body, as I breath out, I smile? - Thich_Nhat_Hanh -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Oct 22 22:39:20 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 22 Oct 2014 19:39:20 -0700 Subject: [Numpy-discussion] Fwd: [julia-dev] OpenBLAS machines In-Reply-To: References: Message-ID: Not sure if folks saw this one... Might be interesting to contact this team from the python side of things. Cheers f ---------- Forwarded message ---------- From: Zhang Xianyi Date: Sun, Oct 19, 2014 at 12:50 AM Subject: [julia-dev] OpenBLAS machines To: Julia Dev Hi Julia developers, The OpenBLAS team deployed the following machines to support developing and testing the project. We like to share the machines with Julia developers. The interested developer can apply the remote access by sending the mail to OpenBLAS-dev mailing list or traits.zhang at gmail.com. - 2-way Intel Xeon E5-2680 (2.7GHz, Intel Sandy Bridge), 32GB memory, 1TB hard disk, Ubuntu 14.04 64-bit. - 1-way Intel Core i7-4770 (3.4GHz, Intel Haswell), 8GB memory, 512GB hard disk, Ubuntu 14.04 64-bit. - 1-way AMD FX-8320 (3.5GHz, AMD Piledriver), 8GB memory, 512GB hard disk, Ubuntu 14.04 64-bit. - Odroid-U2 board . 1-way Exynos4412 Prime 1.7Ghz ARM Cortex-A9 Quad Cores, 2GB memory, 16GB SD card, Ubuntu 14.04 32-bit. The AMD HSA Apu (Kaveri) machine and Intel KNC MIC are coming soon. Xianyi -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From bwana.marko at yahoo.com Wed Oct 22 23:59:36 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Thu, 23 Oct 2014 03:59:36 +0000 (UTC) Subject: [Numpy-discussion] MKL not available as separate download since 10/16/2014 In-Reply-To: <1100260745.47972.1414036497341.JavaMail.yahoo@jws100137.mail.ne1.yahoo.com> References: <1100260745.47972.1414036497341.JavaMail.yahoo@jws100137.mail.ne1.yahoo.com> Message-ID: <1536218723.44813.1414036776781.JavaMail.yahoo@jws100180.mail.ne1.yahoo.com> Michael, Thanks for your response. 
The Intel Math Kernel Library package with just the static and dynamic libraries for blas, lapack (similar to AMD's acml self extracting zip file) is what I think many Windows users were depending on to build Python's numerical and scientific libraries (numpy and scipy). If it could be made available that would be very welcomed. Currently the only remaining option is AMD's acml here: http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml/ which passes all tests. Currently it is unfortunate that the Intel MKL website has instructions for building numpy and scipy here: https://software.intel.com/en-us/articles/numpyscipy-with-intel-mkland numpy/scipy has them here: http://www.scipy.org/scipylib/building/windows.html#building-numpy-with-the-intel-math-kernel-library-mkl but you can no longer download the MKL files. Thanks for your help! --Mark From dave.hirschfeld at gmail.com Thu Oct 23 13:21:00 2014 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Thu, 23 Oct 2014 17:21:00 +0000 (UTC) Subject: [Numpy-discussion] segfault in np.arange Message-ID: Hi, I accidentally passed a pandas DatetimeIndex to `np.arange` which caused it to segfault. It's a pretty dumb thing to do but I don't think it should cause a segfault! Python 2.7.5 |Continuum Analytics, Inc.| (default, Jul 1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)] on win32 Type "help", "copyright", "credits" or "license" for more information. #>>> import faulthandler #>>> faulthandler.enable() #>>> import numpy as np #>>> np.__version__ '1.9.0' #>>> import pandas as pd #>>> pd.__version__ '0.14.1' #>>> np.arange(pd.DatetimeIndex([])) Fatal Python error: Segmentation fault Current thread 0x00001d18: File "", line 1 in The exception dialog which pops up contains the following information: Unhandled exception at 0x000000000255EB49 (multiarray.pyd) in python.exe: 0xC0000005: Access violation reading location 0x0000000000000008. Thanks, Dave From jtaylor.debian at googlemail.com Thu Oct 23 16:21:29 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 23 Oct 2014 22:21:29 +0200 Subject: [Numpy-discussion] segfault in np.arange In-Reply-To: References: Message-ID: <54496349.6090300@googlemail.com> On 23.10.2014 19:21, Dave Hirschfeld wrote: > Hi, > I accidentally passed a pandas DatetimeIndex to `np.arange` which caused > it to segfault. It's a pretty dumb thing to do but I don't think it > should cause a segfault! thanks for the report, this patch should fix it: https://github.com/numpy/numpy/pull/5225 From bwana.marko at yahoo.com Thu Oct 23 17:45:26 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Thu, 23 Oct 2014 21:45:26 +0000 (UTC) Subject: [Numpy-discussion] successful Windows x64 build with acml-5.3.1 ifort64 static libs passes tests Message-ID: <1755715195.170897.1414100726663.JavaMail.yahoo@jws10067.mail.ne1.yahoo.com> Both the /MT and /MD versions of the acml static libraries build, and pass all tests, except that for me at least there's a weird glitch in the build process. I have to purposefully add something so that config fails. EG: In the previous site.cfg, I did not realize that I had mistyped the vector math libraries as libsvml_disp.lib instead of svml_disp.lib. In fact you don't even need those libraries. Here I use the /MT version of the static libs, but add a dummy lib, "_". Config says NOT AVAILABLE for everything, but the build and install work fine. 
[blas] blas_libs = libacml, libifcoremt, libifport, libirc, libircmt, libmmt, svml_disp, svml_dispmt, _ library_dirs = C:\AMD\acml5.3.1\ifort64\lib\MT include_dirs = C:\AMD\acml5.3.1\ifort64\include [lapack] lapack_libs = libacml, libifcoremt, libifport, libirc, libircmt, libmmt, svml_disp, svml_dispmt, _ library_dirs = C:\AMD\acml5.3.1\ifort64\lib\MT include_dirs = C:\AMD\acml5.3.1\ifort64\include If I try to run the build without the dummy ("_"), then I get a traceback that says that lapack libs with appended underscores are unresolved. I think in the static library and the acml header file the c-wrappers remove the fortran underscore. Also I am using the Intel fortran built libs which I think use ALL CAPS not trailing underscores. I think this is the trailing underscore import issue, covered in this article: http://developer.amd.com/community/blog/2009/06/29/removing-c-wrapper-functions-from-the-amd-core-math-library-acml-to-resolve-linking-issues/ Adding the dummy lib somehow sets the NO_APPEND_FORTRAN macro. Also it seems like the [amd] section in site.cfg does nothing. Also it doesn't matter if I change the library names to remove the lib- prefix, or if I rename the lib and header from acml to amd, makes no difference. THe only reference to AMD/ACML is http://wiki.scipy.org/Installing_SciPy/Windows but this site seems out of date, and in contradiction to other building scipy site that supports MKL out of the box. From bwana.marko at yahoo.com Thu Oct 23 18:34:02 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Thu, 23 Oct 2014 22:34:02 +0000 (UTC) Subject: [Numpy-discussion] successful Windows x64 build with acml-5.3.1 ifort64 static libs passes tests In-Reply-To: <1755715195.170897.1414100726663.JavaMail.yahoo@jws10067.mail.ne1.yahoo.com> References: <1755715195.170897.1414100726663.JavaMail.yahoo@jws10067.mail.ne1.yahoo.com> Message-ID: <1680657649.185095.1414103643033.JavaMail.yahoo@jws100118.mail.ne1.yahoo.com> I am so dumb. It is not using the static libs at all. It is building numpy with lapack_lite and blas_lite. From bwana.marko at yahoo.com Thu Oct 23 22:44:44 2014 From: bwana.marko at yahoo.com (Mark Mikofski) Date: Fri, 24 Oct 2014 02:44:44 +0000 (UTC) Subject: [Numpy-discussion] successful Windows x64 build with acml-5.3.1 ifort64 static libs passes tests Message-ID: <211253678.210167.1414118684567.JavaMail.yahoo@jws100153.mail.ne1.yahoo.com> AMD acml libraries do not include the CBLAS implementation. To build CBLAS against the libacml.lib requires the Intel Fortran compiler. There is a note in the mailing list here http://mail.scipy.org/pipermail/numpy-discussion/2006-February/018379.html This detail should probably be added to the new wiki, but in general it means that the AMD acml is not a drop in library From dave.hirschfeld at gmail.com Fri Oct 24 05:20:31 2014 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Fri, 24 Oct 2014 09:20:31 +0000 (UTC) Subject: [Numpy-discussion] segfault in np.arange References: <54496349.6090300@googlemail.com> Message-ID: Julian Taylor googlemail.com> writes: > > On 23.10.2014 19:21, Dave Hirschfeld wrote: > > Hi, > > I accidentally passed a pandas DatetimeIndex to `np.arange` which caused > > it to segfault. It's a pretty dumb thing to do but I don't think it > > should cause a segfault! > > thanks for the report, this patch should fix it: > > https://github.com/numpy/numpy/pull/5225 > Thanks for the super-fast patch! 
Hopefully it will make it into the next bugfix release, even if it is an apparently little used functionality. -Dave From matthew.brett at gmail.com Fri Oct 24 21:04:47 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 24 Oct 2014 18:04:47 -0700 Subject: [Numpy-discussion] npy_log2 undefined on Linux Message-ID: Hi, We (dipy developers) have a hit a new problem trying to use the ``npy_log`` C function in our code. Specifically, on Linux, but not on Mac or Windows, we are getting errors of form: ImportError: /path/to/extension/distances.cpython-34m.so: undefined symbol: npy_log2 when compiling something like: import numpy as np cimport numpy as cnp cdef extern from "numpy/npy_math.h" nogil: double npy_log(double x) def use_log(double val): return npy_log(val) See : https://github.com/matthew-brett/mincy/tree/npy_log_example for a self-contained example that replicates the failure with ``make``. I guess this means that the code referred to by ``npy_log`` is not on the ordinary runtime path on Linux? What should I do next to debug? Thanks a lot, Matthew From matthew.brett at gmail.com Sat Oct 25 17:15:05 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 25 Oct 2014 14:15:05 -0700 Subject: [Numpy-discussion] npy_log2 undefined on Linux In-Reply-To: References: Message-ID: On Fri, Oct 24, 2014 at 6:04 PM, Matthew Brett wrote: > Hi, > > We (dipy developers) have a hit a new problem trying to use the > ``npy_log`` C function in our code. > > Specifically, on Linux, but not on Mac or Windows, we are getting > errors of form: > > ImportError: /path/to/extension/distances.cpython-34m.so: undefined > symbol: npy_log2 > > when compiling something like: > > > import numpy as np > cimport numpy as cnp > > cdef extern from "numpy/npy_math.h" nogil: > double npy_log(double x) > > > def use_log(double val): > return npy_log(val) > > > See : https://github.com/matthew-brett/mincy/tree/npy_log_example for > a self-contained example that replicates the failure with ``make``. > > I guess this means that the code referred to by ``npy_log`` is not on > the ordinary runtime path on Linux? To answer my own question - npy_log is defined in ``libnpymath.a``, in /core/lib. The hint I needed was in https://github.com/numpy/numpy/blob/master/doc/source/reference/c-api.coremath.rst The correct setup.py is: from distutils.core import setup from distutils.extension import Extension from Cython.Distutils import build_ext from numpy.distutils.misc_util import get_info npm_info = get_info('npymath') ext_modules = [Extension("eg_log", ["eg_log.pyx"], **npm_info)] setup( name = 'eg_log', cmdclass = {'build_ext': build_ext}, ext_modules = ext_modules ) Cheers, Matthew From matthew.brett at gmail.com Sat Oct 25 21:22:40 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 25 Oct 2014 18:22:40 -0700 Subject: [Numpy-discussion] npy_log2 undefined on Linux In-Reply-To: References: Message-ID: On Sat, Oct 25, 2014 at 2:15 PM, Matthew Brett wrote: > On Fri, Oct 24, 2014 at 6:04 PM, Matthew Brett wrote: >> Hi, >> >> We (dipy developers) have a hit a new problem trying to use the >> ``npy_log`` C function in our code. 
>> >> Specifically, on Linux, but not on Mac or Windows, we are getting >> errors of form: >> >> ImportError: /path/to/extension/distances.cpython-34m.so: undefined >> symbol: npy_log2 >> >> when compiling something like: >> >> >> import numpy as np >> cimport numpy as cnp >> >> cdef extern from "numpy/npy_math.h" nogil: >> double npy_log(double x) >> >> >> def use_log(double val): >> return npy_log(val) >> >> >> See : https://github.com/matthew-brett/mincy/tree/npy_log_example for >> a self-contained example that replicates the failure with ``make``. >> >> I guess this means that the code referred to by ``npy_log`` is not on >> the ordinary runtime path on Linux? > > To answer my own question - npy_log is defined in ``libnpymath.a``, in > /core/lib. > > The hint I needed was in > https://github.com/numpy/numpy/blob/master/doc/source/reference/c-api.coremath.rst > > The correct setup.py is: > > > from distutils.core import setup > from distutils.extension import Extension > from Cython.Distutils import build_ext > > from numpy.distutils.misc_util import get_info > npm_info = get_info('npymath') > > ext_modules = [Extension("eg_log", ["eg_log.pyx"], > **npm_info)] > > setup( > name = 'eg_log', > cmdclass = {'build_ext': build_ext}, > ext_modules = ext_modules > ) > Ah, except this doesn't work on Windows, compiling with Visual Studio: LINK : fatal error LNK1181: cannot open input file 'npymath.lib' Investigating, c:\Python27\Lib\site-packages\numpy\core\lib has only `libnpymath.a``; I guess this isn't going to work for Visual Studio. Is this a bug? Cheers, Matthew From cournape at gmail.com Sun Oct 26 02:26:06 2014 From: cournape at gmail.com (David Cournapeau) Date: Sun, 26 Oct 2014 06:26:06 +0000 Subject: [Numpy-discussion] npy_log2 undefined on Linux In-Reply-To: References: Message-ID: Not exactly: if you build numpy with mingw (as is the official binary), you need to build everything that uses numpy C API with it. On Sun, Oct 26, 2014 at 1:22 AM, Matthew Brett wrote: > On Sat, Oct 25, 2014 at 2:15 PM, Matthew Brett > wrote: > > On Fri, Oct 24, 2014 at 6:04 PM, Matthew Brett > wrote: > >> Hi, > >> > >> We (dipy developers) have a hit a new problem trying to use the > >> ``npy_log`` C function in our code. > >> > >> Specifically, on Linux, but not on Mac or Windows, we are getting > >> errors of form: > >> > >> ImportError: /path/to/extension/distances.cpython-34m.so: undefined > >> symbol: npy_log2 > >> > >> when compiling something like: > >> > >> > >> import numpy as np > >> cimport numpy as cnp > >> > >> cdef extern from "numpy/npy_math.h" nogil: > >> double npy_log(double x) > >> > >> > >> def use_log(double val): > >> return npy_log(val) > >> > >> > >> See : https://github.com/matthew-brett/mincy/tree/npy_log_example for > >> a self-contained example that replicates the failure with ``make``. > >> > >> I guess this means that the code referred to by ``npy_log`` is not on > >> the ordinary runtime path on Linux? > > > > To answer my own question - npy_log is defined in ``libnpymath.a``, in > > /core/lib. 
> > > > The hint I needed was in > > > https://github.com/numpy/numpy/blob/master/doc/source/reference/c-api.coremath.rst > > > > The correct setup.py is: > > > > > > from distutils.core import setup > > from distutils.extension import Extension > > from Cython.Distutils import build_ext > > > > from numpy.distutils.misc_util import get_info > > npm_info = get_info('npymath') > > > > ext_modules = [Extension("eg_log", ["eg_log.pyx"], > > **npm_info)] > > > > setup( > > name = 'eg_log', > > cmdclass = {'build_ext': build_ext}, > > ext_modules = ext_modules > > ) > > > > Ah, except this doesn't work on Windows, compiling with Visual Studio: > > LINK : fatal error LNK1181: cannot open input file 'npymath.lib' > > Investigating, c:\Python27\Lib\site-packages\numpy\core\lib has only > `libnpymath.a``; I guess this isn't going to work for Visual Studio. > Is this a bug? > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From saullogiovani at gmail.com Sun Oct 26 04:46:48 2014 From: saullogiovani at gmail.com (Saullo Castro) Date: Sun, 26 Oct 2014 09:46:48 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt Message-ID: I would like to start working on a memory efficient alternative for np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the data while the file iterator is exhausted. The motivation came from this SO question: http://stackoverflow.com/q/26569852/832621 where for huge arrays the current NumPy ASCII readers are really slow and require ~6 times more memory. This case I tested with Pandas' read_csv() and it required 2 times more memory. I would be glad if you could share your experience on this matter. Greetings, Saullo -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Sun Oct 26 04:54:01 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 26 Oct 2014 01:54:01 -0700 Subject: [Numpy-discussion] npy_log2 undefined on Linux In-Reply-To: References: Message-ID: Hi, On Sat, Oct 25, 2014 at 11:26 PM, David Cournapeau wrote: > Not exactly: if you build numpy with mingw (as is the official binary), you > need to build everything that uses numpy C API with it. Some of the interwebs appear to believe that the mingw .a file is compatible with visual studio: http://stackoverflow.com/questions/2096519/from-mingw-static-library-a-to-visual-studio-static-library-lib I can get the program to compile by copying libnpymath.a as npymath.lib, followed by many linker errors like: npymath.lib(npy_math.o) : error LNK2019: unresolved external symbol _cosf referenced in function _npy_cosf Perhaps, as the Onion once said "error found on internet"... 
See you, Matthew Matthew From vbubbly21 at gmail.com Sun Oct 26 05:09:32 2014 From: vbubbly21 at gmail.com (Artur Bercik) Date: Sun, 26 Oct 2014 18:09:32 +0900 Subject: [Numpy-discussion] Subdividing NumPy array into Regular Grid Message-ID: I have a rectangle with the following coordinates: import numpy as np ulx,uly = (110, 60) ##uppper left lon, upper left lat urx,ury = (120, 60) ##uppper right lon, upper right lat lrx, lry = (120, 50) ##lower right lon, lower right lat llx, lly = (110, 50) ##lower left lon, lower left lat I want to divide that single rectangle into 100 regular grids inside that, and want to calculate the (ulx, uly), (urx,ury), (lrx, lry), and (llx, lly) for each grid separately: lats = np.linspace(60, 50, 10) lons = np.linspace(110, 120, 10) lats = np.repeat(lats,10).reshape(10,10) lons = np.tile(lons,10).reshape(10,10) I could not imagine what to do then? Is somebody familiar with such kind of problem? -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Sun Oct 26 07:54:14 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 26 Oct 2014 07:54:14 -0400 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: Message-ID: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> you should have a read here/ http://wesmckinney.com/blog/?p=543 going below the 2x memory usage on read in is non trivial and costly in terms of performance > On Oct 26, 2014, at 4:46 AM, Saullo Castro wrote: > > I would like to start working on a memory efficient alternative for np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the data while the file iterator is exhausted. > > The motivation came from this SO question: > > http://stackoverflow.com/q/26569852/832621 > > where for huge arrays the current NumPy ASCII readers are really slow and require ~6 times more memory. This case I tested with Pandas' read_csv() and it required 2 times more memory. > > I would be glad if you could share your experience on this matter. > > Greetings, > Saullo > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Sun Oct 26 09:21:03 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 26 Oct 2014 14:21:03 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: Im not sure why the memory doubling is necessary. Isnt it possible to preallocate the arrays and write to them? I suppose this might be inefficient though, in case you end up reading only a small subset of rows out of a mostly corrupt file? But that seems to be a rather uncommon corner case. Either way, id say a doubling of memory use is fair game for numpy. Generality is more important than absolute performance. The most important thing is that temporary python datastructures are avoided. That shouldn't be too hard to accomplish, and would realize most of the performance and memory gains, I imagine. 
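A rough sketch of that two-pass idea, assuming a seekable file with a fixed number of all-float columns (read_prealloc and its signature are illustrative only):

import numpy as np

def read_prealloc(path, ncols, delimiter=None):
    # First pass: count the data rows so the result can be allocated up front.
    with open(path) as f:
        nrows = sum(1 for line in f if line.strip() and not line.startswith('#'))
    out = np.empty((nrows, ncols))
    # Second pass: parse each row straight into the preallocated array,
    # so no list-of-lists of Python objects is ever built up.
    i = 0
    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line or line.startswith('#'):
                continue
            out[i] = [float(v) for v in line.split(delimiter)]
            i += 1
    return out

Truly streamed (non-seekable) input would still need a growing buffer of some kind, which is where the extra memory overhead comes from.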
On Sun, Oct 26, 2014 at 12:54 PM, Jeff Reback wrote: > you should have a read here/ > http://wesmckinney.com/blog/?p=543 > > going below the 2x memory usage on read in is non trivial and costly in > terms of performance > > On Oct 26, 2014, at 4:46 AM, Saullo Castro > wrote: > > I would like to start working on a memory efficient alternative for > np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the > data while the file iterator is exhausted. > > The motivation came from this SO question: > > http://stackoverflow.com/q/26569852/832621 > > where for huge arrays the current NumPy ASCII readers are really slow and > require ~6 times more memory. This case I tested with Pandas' read_csv() > and it required 2 times more memory. > > I would be glad if you could share your experience on this matter. > > Greetings, > Saullo > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun Oct 26 09:32:18 2014 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 26 Oct 2014 13:32:18 +0000 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: On Sun, Oct 26, 2014 at 1:21 PM, Eelco Hoogendoorn wrote: > Im not sure why the memory doubling is necessary. Isnt it possible to > preallocate the arrays and write to them? Not without reading the whole file first to know how many rows to preallocate. -- Robert Kern From davidmenhur at gmail.com Sun Oct 26 09:41:07 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Sun, 26 Oct 2014 14:41:07 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: On 26 October 2014 12:54, Jeff Reback wrote: > you should have a read here/ > http://wesmckinney.com/blog/?p=543 > > going below the 2x memory usage on read in is non trivial and costly in > terms of performance > If you know in advance the number of rows (because it is in the header, counted with wc -l, or any other prior information) you can preallocate the array and fill in the numbers as you read, with virtually no overhead. If the number of rows is unknown, an alternative is to use a chunked data container like Bcolz [1] (former carray) instead of Python structures. It may be used as such, or copied back to a ndarray if we want the memory to be aligned. Including a bit of compression we can get the memory overhead to somewhere under 2x (depending on the dataset), at the cost of not so much CPU time, and this could be very useful for large data and slow filesystems. /David. [1] http://bcolz.blosc.org/ -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From derek at astro.physik.uni-goettingen.de Sun Oct 26 09:43:44 2014 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Sun, 26 Oct 2014 14:43:44 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> On 26 Oct 2014, at 02:21 pm, Eelco Hoogendoorn wrote: > Im not sure why the memory doubling is necessary. Isnt it possible to preallocate the arrays and write to them? I suppose this might be inefficient though, in case you end up reading only a small subset of rows out of a mostly corrupt file? But that seems to be a rather uncommon corner case. > > Either way, id say a doubling of memory use is fair game for numpy. Generality is more important than absolute performance. The most important thing is that temporary python datastructures are avoided. That shouldn't be too hard to accomplish, and would realize most of the performance and memory gains, I imagine. Preallocation is not straightforward because the parser needs to be able in general to work with streamed input. I think I even still have a branch on github bypassing this on request (by keyword argument). But a factor 2 is already a huge improvement over that factor ~6 coming from the current text readers buffering the entire input as list of list of Python strings, not to speak of the vast performance gain from using a parser implemented in C like pandas? - in fact one of the last times this subject came up one suggestion was to steal pandas.read_cvs and adopt as required. Someone also posted some code or the draft thereof for using resizable arrays quite a while ago, which would reduce the memory overhead for very large arrays. Cheers, Derek From jeffreback at gmail.com Sun Oct 26 10:09:39 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 26 Oct 2014 10:09:39 -0400 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: <0CE15FA2-2EFA-4D9E-B653-D87B5674D9C6@gmail.com> you are describing a special case where you know the data size apriori (eg not streaming), dtypes are readily apparent from a small sample case and in general your data is not messy I would agree if these can be satisfied then you can achieve closer to a 1x memory overhead using bcolZ is great but prob not a realistic option for a dependency for numpy (you should prob just memory map it directly instead); though this has a big perf impact - so need to weigh these things not all cases deserve the same treatment - chunking is often the best option IMHO - provides a constant memory usage (though ultimately still 2x); but combined with memory mapping can provide a fixed resource utilization > On Oct 26, 2014, at 9:41 AM, Da?id wrote: > > >> On 26 October 2014 12:54, Jeff Reback wrote: >> you should have a read here/ >> http://wesmckinney.com/blog/?p=543 >> >> going below the 2x memory usage on read in is non trivial and costly in terms of performance > > > If you know in advance the number of rows (because it is in the header, counted with wc -l, or any other prior information) you can preallocate the array and fill in the numbers as you read, with virtually no overhead. > > If the number of rows is unknown, an alternative is to use a chunked data container like Bcolz [1] (former carray) instead of Python structures. 
It may be used as such, or copied back to a ndarray if we want the memory to be aligned. Including a bit of compression we can get the memory overhead to somewhere under 2x (depending on the dataset), at the cost of not so much CPU time, and this could be very useful for large data and slow filesystems. > > > /David. > > [1] http://bcolz.blosc.org/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Oct 26 10:16:11 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 26 Oct 2014 14:16:11 +0000 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: On 26 Oct 2014 11:54, "Jeff Reback" wrote: > > you should have a read here/ > http://wesmckinney.com/blog/?p=543 > > going below the 2x memory usage on read in is non trivial and costly in terms of performance On Linux you can probably go below 2x overhead easily, by exploiting the fact that realloc on large memory blocks is basically O(1) (yes really): http://blog.httrack.com/blog/2014/04/05/a-story-of-realloc-and-laziness/ Sadly osx does not provide anything similar and I can't tell for sure about windows. Though on further thought, the numbers Wes quotes there aren't actually the most informative - massif will tell you how much virtual memory you have allocated, but a lot of that is going to be a pure vm accounting trick. The output array memory will actually be allocated incrementally one block at a time as you fill it in. This means that if you can free each temporary chunk immediately after you copy it into the output array, then even simple approaches can have very low overhead. It's possible pandas's actual overhead is already closer to 1x than 2x, and this is just hidden by the tools Wes is using to measure it. -n > On Oct 26, 2014, at 4:46 AM, Saullo Castro wrote: > >> I would like to start working on a memory efficient alternative for np.loadtxt and np.genfromtxt that uses arrays instead of lists to store the data while the file iterator is exhausted. >> >> The motivation came from this SO question: >> >> http://stackoverflow.com/q/26569852/832621 >> >> where for huge arrays the current NumPy ASCII readers are really slow and require ~6 times more memory. This case I tested with Pandas' read_csv() and it required 2 times more memory. >> >> I would be glad if you could share your experience on this matter. >> >> Greetings, >> Saullo >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniele at grinta.net Sun Oct 26 12:42:32 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Sun, 26 Oct 2014 17:42:32 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: Message-ID: <544D2478.8020504@grinta.net> On 26/10/14 09:46, Saullo Castro wrote: > I would like to start working on a memory efficient alternative for > np.loadtxt and np.genfromtxt that uses arrays instead of lists to store > the data while the file iterator is exhausted. ... > I would be glad if you could share your experience on this matter. I'm of the opinion that if your workflow requires you to regularly load large arrays from text files, something else needs to be fixed rather than the numpy speed and memory usage in reading data from text files. There are a number of data formats that are interoperable and allow to store data much more efficiently. hdf5 is one natural choice, maybe with the blosc compressor. Cheers, Daniele From jtaylor.debian at googlemail.com Sun Oct 26 13:13:03 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sun, 26 Oct 2014 18:13:03 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.9.1 release candidate Message-ID: <544D2B9F.9020905@googlemail.com> Hi, We have finally finished the first release candidate of NumOy 1.9.1, sorry for the week delay. The 1.9.1 release will as usual be a bugfix only release to the 1.9.x series. The tarballs and win32 binaries are available on sourceforge: https://sourceforge.net/projects/numpy/files/NumPy/1.9.1rc1/ If no regressions show up the final release is planned next week. The upgrade is recommended for all users of the 1.9.x series. Following issues have been fixed: * gh-5184: restore linear edge behaviour of gradient to as it was in < 1.9. The second order behaviour is available via the `edge_order` keyword * gh-4007: workaround Accelerate sgemv crash on OSX 10.9 * gh-5100: restore object dtype inference from iterable objects without `len()` * gh-5163: avoid gcc-4.1.2 (red hat 5) miscompilation causing a crash * gh-5138: fix nanmedian on arrays containing inf * gh-5203: copy inherited masks in MaskedArray.__array_finalize__ * gh-2317: genfromtxt did not handle filling_values=0 correctly * gh-5067: restore api of npy_PyFile_DupClose in python2 * gh-5063: cannot convert invalid sequence index to tuple * gh-5082: Segmentation fault with argmin() on unicode arrays * gh-5095: don't propagate subtypes from np.where * gh-5104: np.inner segfaults with SciPy's sparse matrices * gh-5136: Import dummy_threading if importing threading fails * gh-5148: Make numpy import when run with Python flag '-OO' * gh-5147: Einsum double contraction in particular order causes ValueError * gh-479: Make f2py work with intent(in out) * gh-5170: Make python2 .npy files readable in python3 * gh-5027: Use 'll' as the default length specifier for long long * gh-4896: fix build error with MSVC 2013 caused by C99 complex support * gh-4465: Make PyArray_PutTo respect writeable flag * gh-5225: fix crash when using arange on datetime without dtype set * gh-5231: fix build in c99 mode Source tarballs, windows installers and release notes can be found at https://sourceforge.net/projects/numpy/files/NumPy/1.9.1rc1/ Cheers, Julian Taylor -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: OpenPGP digital signature URL: From saullogiovani at gmail.com Sun Oct 26 14:27:54 2014 From: saullogiovani at gmail.com (Saullo Castro) Date: Sun, 26 Oct 2014 19:27:54 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt Message-ID: I agree with @Daniele's point, storing huge arrays in text files migh indicate a bad process.... but once these functions can be improved, why not? Unless this turns to be a burden to change. Regarding the estimation of the array size, I don't see a big performance loss when the file iterator is exhausting once more in order to estimate the number of rows and pre-allocate the proper arrays to avoid using list of lists. The hardest part seems to be dealing with arrays of strings (perhaps easily solved with dtype=object) and structured arrays. Cheers, Saullo 2014-10-26 18:00 GMT+01:00 : > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Memory efficient alternative for np.loadtxt and > np.genfromtxt (Daniele Nicolodi) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sun, 26 Oct 2014 17:42:32 +0100 > From: Daniele Nicolodi > Subject: Re: [Numpy-discussion] Memory efficient alternative for > np.loadtxt and np.genfromtxt > To: numpy-discussion at scipy.org > Message-ID: <544D2478.8020504 at grinta.net> > Content-Type: text/plain; charset=windows-1252 > > On 26/10/14 09:46, Saullo Castro wrote: > > I would like to start working on a memory efficient alternative for > > np.loadtxt and np.genfromtxt that uses arrays instead of lists to store > > the data while the file iterator is exhausted. > > ... > > > I would be glad if you could share your experience on this matter. > > I'm of the opinion that if your workflow requires you to regularly load > large arrays from text files, something else needs to be fixed rather > than the numpy speed and memory usage in reading data from text files. > > There are a number of data formats that are interoperable and allow to > store data much more efficiently. hdf5 is one natural choice, maybe with > the blosc compressor. > > Cheers, > Daniele > > > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 97, Issue 57 > ************************************************ > -------------- next part -------------- An HTML attachment was scrubbed... 
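Something close to the raise-or-truncate behaviour can already be emulated with np.fromiter, which takes a count and preallocates accordingly; a sketch for a single-column file (load_column is an illustrative name, not an existing function):

import numpy as np
from itertools import islice

def load_column(path, nrows):
    # np.fromiter allocates count elements up front and raises ValueError
    # if the file is shorter; islice simply stops reading if it is longer.
    with open(path) as f:
        values = (float(line) for line in f if line.strip())
        return np.fromiter(islice(values, nrows), dtype=float, count=nrows)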
URL: From rays at blue-cove.com Sun Oct 26 14:40:52 2014 From: rays at blue-cove.com (RayS) Date: Sun, 26 Oct 2014 11:40:52 -0700 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> Message-ID: <201410261840.s9QIetpA031729@blue-cove.com> At 06:32 AM 10/26/2014, you wrote: >On Sun, Oct 26, 2014 at 1:21 PM, Eelco Hoogendoorn > wrote: > > Im not sure why the memory doubling is necessary. Isnt it possible to > > preallocate the arrays and write to them? > >Not without reading the whole file first to know how many rows to preallocate Seems to me that loadtxt() http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html should have an optional shape. I often know how many rows I have (# of samples of data) from other meta data. Then: - if the file is smaller for some reason (you're not sure and pad your estimate) it could do one of - zero pad array - raise() - return truncated view - if larger - raise() - return data read (this would act like fileObject.read( size ) ) - Ray S -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Sun Oct 26 22:10:39 2014 From: pmhobson at gmail.com (Paul Hobson) Date: Sun, 26 Oct 2014 19:10:39 -0700 Subject: [Numpy-discussion] Subdividing NumPy array into Regular Grid In-Reply-To: References: Message-ID: I think you want np.meshgrid -paul On Sun, Oct 26, 2014 at 2:09 AM, Artur Bercik wrote: > I have a rectangle with the following coordinates: > > import numpy as np > > ulx,uly = (110, 60) ##uppper left lon, upper left lat > urx,ury = (120, 60) ##uppper right lon, upper right lat > lrx, lry = (120, 50) ##lower right lon, lower right lat > llx, lly = (110, 50) ##lower left lon, lower left lat > > I want to divide that single rectangle into 100 regular grids inside that, > and want to calculate the (ulx, uly), (urx,ury), (lrx, lry), and (llx, lly) > for each grid separately: > > lats = np.linspace(60, 50, 10) > lons = np.linspace(110, 120, 10) > > lats = np.repeat(lats,10).reshape(10,10) > lons = np.tile(lons,10).reshape(10,10) > > I could not imagine what to do then? > > Is somebody familiar with such kind of problem? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sunghwanchoi91 at gmail.com Mon Oct 27 04:37:58 2014 From: sunghwanchoi91 at gmail.com (Sunghwan Choi) Date: Mon, 27 Oct 2014 17:37:58 +0900 Subject: [Numpy-discussion] higher accuracy in diagonialzation Message-ID: <015d01cff1c1$51d2def0$f5789cd0$@gmail.com> Dear all, I am now diagonalizing a 200-by-200 symmetric matrix. But the two methods, scipy.linalg.eigh and numpy.linalg.eigh give significantly different result. The results from two methods are different within 10^-4 order. One of them is inaccurate or both two of them are inaccurate within that range. Which one is more accurate? or Are there any ways to control the accuracy for diagonalization? If you have some idea please let me know. Sunghwan Choi Ph. D. candidator Department of Chemistry KAIST (South Korea) -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From stefan at sun.ac.za Mon Oct 27 05:15:01 2014 From: stefan at sun.ac.za (Stefan van der Walt) Date: Mon, 27 Oct 2014 11:15:01 +0200 Subject: [Numpy-discussion] higher accuracy in diagonialzation In-Reply-To: <015d01cff1c1$51d2def0$f5789cd0$@gmail.com> References: <015d01cff1c1$51d2def0$f5789cd0$@gmail.com> Message-ID: <874muplvqi.fsf@sun.ac.za> On 2014-10-27 10:37:58, Sunghwan Choi wrote: > I am now diagonalizing a 200-by-200 symmetric matrix. But the two methods, > scipy.linalg.eigh and numpy.linalg.eigh give significantly different result. > The results from two methods are different within 10^-4 order. One of them > is inaccurate or both two of them are inaccurate within that range. Which > one is more accurate? or Are there any ways to control the accuracy for > diagonalization? If you have some idea please let me know. My first (naive) attempt would be to set up a matrix, M, in sympy and then use M.diagonalize() to find the symbolic expression of the solution. You can then do the same numerically to see which method yields a result closest to the desired answer. St?fan From jenshnielsen at gmail.com Mon Oct 27 05:55:59 2014 From: jenshnielsen at gmail.com (Jens Nielsen) Date: Mon, 27 Oct 2014 09:55:59 +0000 Subject: [Numpy-discussion] ANN: NumPy 1.9.1 release candidate In-Reply-To: <544D2B9F.9020905@googlemail.com> References: <544D2B9F.9020905@googlemail.com> Message-ID: Thanks Julian, Just confirming that this (as expected) solves the issues that we have seen with gradient in Matplotlib with 1.9.0 best regards Jens On Sun, Oct 26, 2014 at 5:13 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > Hi, > > We have finally finished the first release candidate of NumOy 1.9.1, > sorry for the week delay. > The 1.9.1 release will as usual be a bugfix only release to the 1.9.x > series. > The tarballs and win32 binaries are available on sourceforge: > https://sourceforge.net/projects/numpy/files/NumPy/1.9.1rc1/ > > If no regressions show up the final release is planned next week. > The upgrade is recommended for all users of the 1.9.x series. > > Following issues have been fixed: > * gh-5184: restore linear edge behaviour of gradient to as it was in < 1.9. 
> The second order behaviour is available via the `edge_order` keyword > * gh-4007: workaround Accelerate sgemv crash on OSX 10.9 > * gh-5100: restore object dtype inference from iterable objects without > `len()` > * gh-5163: avoid gcc-4.1.2 (red hat 5) miscompilation causing a crash > * gh-5138: fix nanmedian on arrays containing inf > * gh-5203: copy inherited masks in MaskedArray.__array_finalize__ > * gh-2317: genfromtxt did not handle filling_values=0 correctly > * gh-5067: restore api of npy_PyFile_DupClose in python2 > * gh-5063: cannot convert invalid sequence index to tuple > * gh-5082: Segmentation fault with argmin() on unicode arrays > * gh-5095: don't propagate subtypes from np.where > * gh-5104: np.inner segfaults with SciPy's sparse matrices > * gh-5136: Import dummy_threading if importing threading fails > * gh-5148: Make numpy import when run with Python flag '-OO' > * gh-5147: Einsum double contraction in particular order causes ValueError > * gh-479: Make f2py work with intent(in out) > * gh-5170: Make python2 .npy files readable in python3 > * gh-5027: Use 'll' as the default length specifier for long long > * gh-4896: fix build error with MSVC 2013 caused by C99 complex support > * gh-4465: Make PyArray_PutTo respect writeable flag > * gh-5225: fix crash when using arange on datetime without dtype set > * gh-5231: fix build in c99 mode > > Source tarballs, windows installers and release notes can be found at > https://sourceforge.net/projects/numpy/files/NumPy/1.9.1rc1/ > > Cheers, > Julian Taylor > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Mon Oct 27 07:55:05 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 27 Oct 2014 12:55:05 +0100 Subject: [Numpy-discussion] higher accuracy in diagonialzation In-Reply-To: <015d01cff1c1$51d2def0$f5789cd0$@gmail.com> References: <015d01cff1c1$51d2def0$f5789cd0$@gmail.com> Message-ID: On 27 October 2014 09:37, Sunghwan Choi wrote: > One of them is inaccurate or both two of them are inaccurate within that > range. Which one is more accurate? You can check it yourself using the eigenvectors. The cosine distance between v and M.dot(v) will give you the error in the eigenvectors, and the difference between ||lambda*v|| and ||M.dot(v)|| the error in the eigenvalue. I would also check the condition numbers, maybe your matrix is just not well conditioned. You would have to look at preconditioners. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Mon Oct 27 08:14:57 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 27 Oct 2014 08:14:57 -0400 Subject: [Numpy-discussion] multi-dimensional c++ proposal Message-ID: The multi-dimensional c++ stuff is interesting (about time!) http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3851.pdf -- -- Those who don't understand recursion are doomed to repeat it From dmmcf at dmmcf.net Mon Oct 27 09:26:58 2014 From: dmmcf at dmmcf.net (D. 
Michael McFarland) Date: Mon, 27 Oct 2014 08:26:58 -0500 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions Message-ID: <87mw8hhcd9.fsf@dmmcf.net> A recent post raised a question about differences in results obtained with numpy.linalg.eigh() and scipy.linalg.eigh(), documented at http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eigh.html#numpy.linalg.eigh and http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html#scipy.linalg.eigh, respectively. It is clear that these functions address different mathematical problems (among other things, the SciPy routine can solve the generalized as well as standard eigenproblems); I am not concerned here with numerical differences in the results for problems both should be able to solve (the author of the original post received useful replies in that thread). What I would like to ask about is the situation this illustrates, where both NumPy and SciPy provide similar functionality (sometimes identical, to judge by the documentation). Is there some guidance on which is to be preferred? I could argue that using only NumPy when possible avoids unnecessary dependence on SciPy in some code, or that using SciPy consistently makes for a single interface and so is less error prone. Is there a rule of thumb for cases where SciPy names shadow NumPy names? I've used Python for a long time, but have only recently returned to doing serious numerical work with it. The tools are very much improved, but sometimes, like now, I feel I'm missing the obvious. I would appreciate pointers to any relevant documentation, or just a summary of conventional wisdom on the topic. Regards, Michael From gmabey at swri.org Mon Oct 27 11:06:27 2014 From: gmabey at swri.org (Glen Mabey) Date: Mon, 27 Oct 2014 15:06:27 +0000 Subject: [Numpy-discussion] Fwd: numpy.i and std::complex References: Message-ID: Hello, I was very excited to learn about numpy.i for easy numpy+swigification of C code -- it's really handy. Knowing that swig wraps C code, I wasn't too surprised that there was the issue with complex data types (as described at http://docs.scipy.org/doc/numpy/reference/swig.interface-file.html#other-common-types-complex), but still it was pretty disappointing because most of my data is complex, and I'm invoking methods written to use C++'s std::complex class. After quite a bit of puzzling and not much help from previous mailing list posts, I created this very brief but very useful file, which I call numpy_std_complex.i -- /* -*- C -*- (not really, but good for syntax highlighting) */ #ifdef SWIGPYTHON %include "numpy.i" %include %numpy_typemaps(std::complex, NPY_CFLOAT , int) %numpy_typemaps(std::complex, NPY_CDOUBLE, int) #endif /* SWIGPYTHON */ I'd really like for this to be included alongside numpy.i -- but maybe I overestimate the number of numpy users who use complex data (let your voice be heard!) and who also end up using std::complex in C++ land. Or if anyone wants to improve upon this usage I would be very happy to hear about what I'm missing. I'm sure there's a documented way to submit this file to the git repo, but let me simultaneously ask whether list subscribers think this is worthwhile and ask someone to add+push it for me ? 
Thanks, Glen Mabey From sturla.molden at gmail.com Mon Oct 27 11:45:49 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 27 Oct 2014 15:45:49 +0000 (UTC) Subject: [Numpy-discussion] numpy.i and std::complex References: Message-ID: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> Glen Mabey wrote: > I'd really like for this to be included alongside numpy.i -- but maybe I > overestimate the number of numpy users who use complex data (let your > voice be heard!) and who also end up using std::complex in C++ land. I don't think you do. But perhaps you overestimate the number of NumPy users who use Swig? Cython seems to be the preferred wrapping tool today, and it understands complex numbers: cdef double complex J = 0.0 + 1j If you tell Cython to emit C++, this will result in code that uses std::complex. Sturla From edisongustavo at gmail.com Mon Oct 27 11:56:26 2014 From: edisongustavo at gmail.com (Edison Gustavo Muenz) Date: Mon, 27 Oct 2014 13:56:26 -0200 Subject: [Numpy-discussion] Accept numpy arrays on arguments of numpy.testing.assert_approx_equal() Message-ID: I?ve implemented support for numpy.arrays for the arguments of numpy.testing.assert_approx_equal() and have issued a pull-request on Github. I don?t know if I should be sending the message to the list to notify about this, but since I?m new to the *numpy-dev* list I think it never hurts to say hi :) ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From gmabey at swri.org Mon Oct 27 12:04:47 2014 From: gmabey at swri.org (Glen Mabey) Date: Mon, 27 Oct 2014 16:04:47 +0000 Subject: [Numpy-discussion] numpy.i and std::complex In-Reply-To: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> Message-ID: On Oct 27, 2014, at 10:45 AM, Sturla Molden wrote: > Glen Mabey wrote: > >> I'd really like for this to be included alongside numpy.i -- but maybe I >> overestimate the number of numpy users who use complex data (let your >> voice be heard!) and who also end up using std::complex in C++ land. > > I don't think you do. But perhaps you overestimate the number of NumPy > users who use Swig? Likely so. > Cython seems to be the preferred wrapping tool today, and it understands > complex numbers: > > cdef double complex J = 0.0 + 1j > > If you tell Cython to emit C++, this will result in code that uses > std::complex. I chose swig after reviewing the options listed here, and I didn't see cython on the list: http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html I guess that's because cython is different language, right? So, if I want to interactively call C++ functions from say ipython, then is cython really an option? Thanks for the feedback -- Glen From wfspotz at sandia.gov Mon Oct 27 12:13:05 2014 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 27 Oct 2014 10:13:05 -0600 Subject: [Numpy-discussion] [EXTERNAL] Fwd: numpy.i and std::complex In-Reply-To: References: Message-ID: <8BB5F0E1-53EC-4821-8BCC-A06CF498813E@sandia.gov> Glen, Supporting std::complex<> was just low enough priority for me that I decided to wait until someone expressed interest ... and now, many years later, someone finally has. I would be happy to include this into numpy.i, but I would like to see some tests in the numpy repository demonstrating that it works. 
These could be relatively short and simple, and since float and double are the only scalar data types that I could foresee supporting, there would not be a need for testing the large numbers of data types that the other tests cover. I would also want to protect the references to C++ objects with '#ifdef __cplusplus', but that is easy enough. -Bill On Oct 27, 2014, at 9:06 AM, Glen Mabey wrote: > Hello, > > I was very excited to learn about numpy.i for easy numpy+swigification of C code -- it's really handy. > > Knowing that swig wraps C code, I wasn't too surprised that there was the issue with complex data types (as described at http://docs.scipy.org/doc/numpy/reference/swig.interface-file.html#other-common-types-complex), but still it was pretty disappointing because most of my data is complex, and I'm invoking methods written to use C++'s std::complex class. > > After quite a bit of puzzling and not much help from previous mailing list posts, I created this very brief but very useful file, which I call numpy_std_complex.i -- > > /* -*- C -*- (not really, but good for syntax highlighting) */ > #ifdef SWIGPYTHON > > %include "numpy.i" > > %include > > %numpy_typemaps(std::complex, NPY_CFLOAT , int) > %numpy_typemaps(std::complex, NPY_CDOUBLE, int) > > #endif /* SWIGPYTHON */ > > > I'd really like for this to be included alongside numpy.i -- but maybe I overestimate the number of numpy users who use complex data (let your voice be heard!) and who also end up using std::complex in C++ land. > > Or if anyone wants to improve upon this usage I would be very happy to hear about what I'm missing. > > I'm sure there's a documented way to submit this file to the git repo, but let me simultaneously ask whether list subscribers think this is worthwhile and ask someone to add+push it for me ? > > Thanks, > Glen Mabey > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From wfspotz at sandia.gov Mon Oct 27 12:20:25 2014 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 27 Oct 2014 10:20:25 -0600 Subject: [Numpy-discussion] [EXTERNAL] Re: numpy.i and std::complex In-Reply-To: References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> Message-ID: Python is its own language, but it allows you to import C and C++ code, thus creating an interface to these languages. Just as with SWIG, you would import a module written in Cython that gives you access to underlying C/C++ code. Cython is very nice for a lot of applications, but it is not the best tool for every job of designing an interface. SWIG is still preferable if you have a large existing code base to wrap or if you want to support more target languages than just Python. I have a specific need for cross-language polymorphism, and SWIG is much better at that than Cython is. It all depends. Looks like somebody needs to update the c-info.python-as-glue.html page. -Bill On Oct 27, 2014, at 10:04 AM, Glen Mabey wrote: > On Oct 27, 2014, at 10:45 AM, Sturla Molden > wrote: > >> Glen Mabey wrote: >> >>> I'd really like for this to be included alongside numpy.i -- but maybe I >>> overestimate the number of numpy users who use complex data (let your >>> voice be heard!) and who also end up using std::complex in C++ land. >> >> I don't think you do. 
But perhaps you overestimate the number of NumPy >> users who use Swig? > > Likely so. > >> Cython seems to be the preferred wrapping tool today, and it understands >> complex numbers: >> >> cdef double complex J = 0.0 + 1j >> >> If you tell Cython to emit C++, this will result in code that uses >> std::complex. > > I chose swig after reviewing the options listed here, and I didn't see cython on the list: > > http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html > > I guess that's because cython is different language, right? So, if I want to interactively call C++ functions from say ipython, then is cython really an option? > > Thanks for the feedback -- > Glen > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From wfspotz at sandia.gov Mon Oct 27 12:24:59 2014 From: wfspotz at sandia.gov (Bill Spotz) Date: Mon, 27 Oct 2014 10:24:59 -0600 Subject: [Numpy-discussion] [EXTERNAL] Re: numpy.i and std::complex In-Reply-To: References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> Message-ID: <12E68247-A698-4F0C-8E2B-7058614E9037@sandia.gov> Oops, I meant '"Cython" is its own language,' not "Python" (although Python qualifies, too, just not in context). Also, Pyrex, listed in the c-info.python-as-glue.html page, was the pre-cursor to Cython. -Bill On Oct 27, 2014, at 10:20 AM, Bill Spotz wrote: > Python is its own language, but it allows you to import C and C++ code, thus creating an interface to these languages. Just as with SWIG, you would import a module written in Cython that gives you access to underlying C/C++ code. > > Cython is very nice for a lot of applications, but it is not the best tool for every job of designing an interface. SWIG is still preferable if you have a large existing code base to wrap or if you want to support more target languages than just Python. I have a specific need for cross-language polymorphism, and SWIG is much better at that than Cython is. It all depends. > > Looks like somebody needs to update the c-info.python-as-glue.html page. > > -Bill > > On Oct 27, 2014, at 10:04 AM, Glen Mabey wrote: > >> On Oct 27, 2014, at 10:45 AM, Sturla Molden >> wrote: >> >>> Glen Mabey wrote: >>> >>>> I'd really like for this to be included alongside numpy.i -- but maybe I >>>> overestimate the number of numpy users who use complex data (let your >>>> voice be heard!) and who also end up using std::complex in C++ land. >>> >>> I don't think you do. But perhaps you overestimate the number of NumPy >>> users who use Swig? >> >> Likely so. >> >>> Cython seems to be the preferred wrapping tool today, and it understands >>> complex numbers: >>> >>> cdef double complex J = 0.0 + 1j >>> >>> If you tell Cython to emit C++, this will result in code that uses >>> std::complex. >> >> I chose swig after reviewing the options listed here, and I didn't see cython on the list: >> >> http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html >> >> I guess that's because cython is different language, right? So, if I want to interactively call C++ functions from say ipython, then is cython really an option? 
>> >> Thanks for the feedback -- >> Glen >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ** Bill Spotz ** > ** Sandia National Laboratories Voice: (505)845-0170 ** > ** P.O. Box 5800 Fax: (505)284-0154 ** > ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** > > > > > ** Bill Spotz ** ** Sandia National Laboratories Voice: (505)845-0170 ** ** P.O. Box 5800 Fax: (505)284-0154 ** ** Albuquerque, NM 87185-0370 Email: wfspotz at sandia.gov ** From sturla.molden at gmail.com Mon Oct 27 12:27:04 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 27 Oct 2014 16:27:04 +0000 (UTC) Subject: [Numpy-discussion] numpy.i and std::complex References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> Message-ID: <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> Glen Mabey wrote: > I chose swig after reviewing the options listed here, and I didn't see cython on the list: > > http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html It's because that list is old and has not been updated. It has the predecessor to Cython, Pyrex, but they are very different now. Both SciPy and NumPy have Cython as a build dependency, and so do projects like scikit-learn, scikit-image, and statsmodels. If you find C++ projects which use Swig (wxPython, PyWin32) or SIP (PyQt) it is mainly because they are older than Cython. A more recent addition, PyZMQ, uses Cython to wrap C++. > I guess that's because cython is different language, right? So, if I > want to interactively call C++ functions from say ipython, then is cython really an option? You can use Cython to call C++ functions in ipython and the ipython notebook. cythonmagic takes care of that :-) Sturla From sturla.molden at gmail.com Mon Oct 27 12:30:24 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 27 Oct 2014 16:30:24 +0000 (UTC) Subject: [Numpy-discussion] [EXTERNAL] Re: numpy.i and std::complex References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> <12E68247-A698-4F0C-8E2B-7058614E9037@sandia.gov> Message-ID: <983111311436120114.937136sturla.molden-gmail.com@news.gmane.org> Bill Spotz wrote: > Oops, I meant '"Cython" is its own language,' not "Python" (although > Python qualifies, too, just not in context). > > Also, Pyrex, listed in the c-info.python-as-glue.html page, was the pre-cursor to Cython. But when it comes to interfacing NumPy, they are really not comparable. :-) Sturla From vincent at vincentdavis.net Mon Oct 27 12:49:40 2014 From: vincent at vincentdavis.net (Vincent Davis) Date: Mon, 27 Oct 2014 10:49:40 -0600 Subject: [Numpy-discussion] ODE how to? Message-ID: It's been too long since I have done differential equations and I am not sure of the best tools to solve this problem. I am starting with a basic kinematic equation for the balance of forces. P/v - ((A*Cw*Rho*v^2)/2 + m*g*Crl + m*g*slope) = m*a P: power, x: position, v: velocity (x'), a: acceleration (x''), (A*Cw*Rho*v^2)/2: air resistance, m*g*Crl: rolling resistance, m*g*slope: potential energy (elevation). I am modifying the above equation so that air velocity and slope are dependent on location x. Vair = v + f(x) where f(x) is the weather component and a function of location x. Same goes for slope, slope = g(x). Power is a function I want to optimize/find to minimize time but at this time just simulate.
maybe something like: P = 2500/(v+1) I will have restriction on P but not interested in that now. The "course" I what to simulate therefore defines slope and wind speed. and is of a fixed distance. I have played with some of the simple scipy.integrate.odeint examples. I get that I need to define a system of equations but am not really sure the rules for doing so. A little help would be greatly appreciated. Vincent Davis 720-301-3003 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Oct 27 12:58:04 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 27 Oct 2014 17:58:04 +0100 Subject: [Numpy-discussion] multi-dimensional c++ proposal In-Reply-To: References: Message-ID: On 27/10/14 13:14, Neal Becker wrote: > The multi-dimensional c++ stuff is interesting (about time!) > > http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3851.pdf OMG, that API is about as awful as it gets. Obviously it is written by two computer scientists who do not understand what scientific and technical computing actually needs. There is a reason why many scientists still prefer Fortran to C++, and I think this proposal shows us why. An API like that will never be suitable for implementing complex numerical alorithms. It will fail horribly because it is *not readable*. I have no doubt it will be accepted though, because the C++ standards committee tends to accept unusable things. Sturla From benjaminrk at gmail.com Mon Oct 27 14:02:59 2014 From: benjaminrk at gmail.com (MinRK) Date: Mon, 27 Oct 2014 11:02:59 -0700 Subject: [Numpy-discussion] numpy.i and std::complex Message-ID: > > It's because that list is old and has not been updated. It has the > predecessor to Cython, Pyrex, but they are very different now. > > Both SciPy and NumPy has Cython as a build dependency, and also projects > like scikit-learn, scikit-image, statsmodels. > > If you find C++ projects which use Swig (wxPython, PyWin32) or SIP (PyQt) > it is mainly because they are older than Cython. A more recent addition, > PyZMQ, use Cython to wrap C++. > > Just to clarify that, while PyZMQ wraps a library written in C++, libzmq's public API is C, not C++. PyZMQ does not use any of Cython's C++ functionality. I have done other similar (unfortunately not public) projects that wrap actual C++ libraries with Cython, and I have been happy with Cython's C++ support[1]. -MinRK [1] At least as happy as I was with the wrapped C++ code, anyway. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Mon Oct 27 14:24:35 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 27 Oct 2014 19:24:35 +0100 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <87mw8hhcd9.fsf@dmmcf.net> References: <87mw8hhcd9.fsf@dmmcf.net> Message-ID: The same occurred to me when reading that question. My personal opinion is that such functionality should be deprecated from numpy. I don't know who said this, but it really stuck with me: but the power of numpy is first and foremost in it being a fantastic interface, not in being a library. There is nothing more annoying than every project having its own array type. The fact that the whole scientific python stack can so seamlessly communicate is where all good things begin. In my opinion, that is what numpy should focus on; basic data structures, and tools for manipulating them. 
Linear algebra is way too high level for numpy imo, and used by only a small subsets of its 'matlab-like' users. When I get serious about linear algebra or ffts or what have you, id rather import an extra module that wraps a specific library. On Mon, Oct 27, 2014 at 2:26 PM, D. Michael McFarland wrote: > A recent post raised a question about differences in results obtained > with numpy.linalg.eigh() and scipy.linalg.eigh(), documented at > > http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eigh.html#numpy.linalg.eigh > and > > http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html#scipy.linalg.eigh > , > respectively. It is clear that these functions address different > mathematical problems (among other things, the SciPy routine can solve > the generalized as well as standard eigenproblems); I am not concerned > here with numerical differences in the results for problems both should > be able to solve (the author of the original post received useful > replies in that thread). > > What I would like to ask about is the situation this illustrates, where > both NumPy and SciPy provide similar functionality (sometimes identical, > to judge by the documentation). Is there some guidance on which is to > be preferred? I could argue that using only NumPy when possible avoids > unnecessary dependence on SciPy in some code, or that using SciPy > consistently makes for a single interface and so is less error prone. > Is there a rule of thumb for cases where SciPy names shadow NumPy names? > > I've used Python for a long time, but have only recently returned to > doing serious numerical work with it. The tools are very much improved, > but sometimes, like now, I feel I'm missing the obvious. I would > appreciate pointers to any relevant documentation, or just a summary of > conventional wisdom on the topic. > > Regards, > Michael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Mon Oct 27 14:30:12 2014 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Oct 2014 18:30:12 +0000 Subject: [Numpy-discussion] numpy.i and std::complex In-Reply-To: <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Oct 27, 2014 at 4:27 PM, Sturla Molden wrote: > Glen Mabey wrote: > >> I chose swig after reviewing the options listed here, and I didn't see cython on the list: >> >> http://docs.scipy.org/doc/numpy/user/c-info.python-as-glue.html > > It's because that list is old and has not been updated. It has the > predecessor to Cython, Pyrex, but they are very different now. > > Both SciPy and NumPy has Cython as a build dependency, and also projects > like scikit-learn, scikit-image, statsmodels. > > If you find C++ projects which use Swig (wxPython, PyWin32) or SIP (PyQt) > it is mainly because they are older than Cython. A more recent addition, > PyZMQ, use Cython to wrap C++. SWIG is a perfectly reasonable tool that is still used on new projects, and is a supported way of building extensions against numpy. Please stop haranguing the new guy for not knowing things that you know. This thread is about extending that support, a perfectly fine and decent thing to do. 
-- Robert Kern From sturla.molden at gmail.com Mon Oct 27 19:36:56 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 27 Oct 2014 23:36:56 +0000 (UTC) Subject: [Numpy-discussion] numpy.i and std::complex References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> Message-ID: <1040561343436145342.607784sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > Please stop haranguing the new guy for not knowing things that you > know. I am not doing any of that. You are the only one haranguing here. I usually ignore your frequent inpolite comments, but I will do an exception this time and ask you to shut up. Sturla From ndarray at mac.com Mon Oct 27 20:06:49 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 27 Oct 2014 20:06:49 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? Message-ID: Given an n-dim array x, I can do 1. x.flat 2. x.flatten() 3. x.ravel() 4. x.reshape(-1) Each of these expressions returns a flat version of x with some variations. Why does NumPy implement four different ways to do essentially the same thing? -------------- next part -------------- An HTML attachment was scrubbed... URL: From yw5aj at virginia.edu Mon Oct 27 21:41:18 2014 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Mon, 27 Oct 2014 21:41:18 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: Hi Alexander, In my opinion - because they don't do the same thing, especially when you think in terms in lower-level. ndarray.flat returns an iterator; ndarray.flatten() returns a copy; ndarray.ravel() only makes copies when necessary; ndarray.reshape() is more general purpose, even though you can use it to flatten arrays. They are very distinct in behavior - for example, copies and views may store in the memory very differently and you would have to pay attention to the stride size if you are passing them down onto C/Fortran code. (Correct me if I am wrong please) -Shawn On Mon, Oct 27, 2014 at 8:06 PM, Alexander Belopolsky wrote: > Given an n-dim array x, I can do > > 1. x.flat > 2. x.flatten() > 3. x.ravel() > 4. x.reshape(-1) > > Each of these expressions returns a flat version of x with some variations. > Why does NumPy implement four different ways to do essentially the same > thing? > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From josef.pktd at gmail.com Mon Oct 27 22:12:48 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 27 Oct 2014 22:12:48 -0400 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: References: <87mw8hhcd9.fsf@dmmcf.net> Message-ID: On Mon, Oct 27, 2014 at 2:24 PM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > The same occurred to me when reading that question. My personal opinion is > that such functionality should be deprecated from numpy. I don't know who > said this, but it really stuck with me: but the power of numpy is first and > foremost in it being a fantastic interface, not in being a library. > > There is nothing more annoying than every project having its own array > type. 
The fact that the whole scientific python stack can so seamlessly > communicate is where all good things begin. > > In my opinion, that is what numpy should focus on; basic data structures, > and tools for manipulating them. Linear algebra is way too high level for > numpy imo, and used by only a small subsets of its 'matlab-like' users. > > When I get serious about linear algebra or ffts or what have you, id > rather import an extra module that wraps a specific library. > We are not always "getting serious" about linalg, just a quick call to pinv or qr or matrix_rank or similar doesn't necessarily mean we need a linalg library with all advanced options. "@" matrix operations and linear algebra are "basic" stuff. > > On Mon, Oct 27, 2014 at 2:26 PM, D. Michael McFarland > wrote: > >> A recent post raised a question about differences in results obtained >> with numpy.linalg.eigh() and scipy.linalg.eigh(), documented at >> >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.eigh.html#numpy.linalg.eigh >> and >> >> http://docs.scipy.org/doc/scipy/reference/generated/scipy.linalg.eigh.html#scipy.linalg.eigh >> , >> respectively. It is clear that these functions address different >> mathematical problems (among other things, the SciPy routine can solve >> the generalized as well as standard eigenproblems); I am not concerned >> here with numerical differences in the results for problems both should >> be able to solve (the author of the original post received useful >> replies in that thread). >> >> What I would like to ask about is the situation this illustrates, where >> both NumPy and SciPy provide similar functionality (sometimes identical, >> to judge by the documentation). Is there some guidance on which is to >> be preferred? I could argue that using only NumPy when possible avoids >> unnecessary dependence on SciPy in some code, or that using SciPy >> consistently makes for a single interface and so is less error prone. >> Is there a rule of thumb for cases where SciPy names shadow NumPy names? >> >> I've used Python for a long time, but have only recently returned to >> doing serious numerical work with it. The tools are very much improved, >> but sometimes, like now, I feel I'm missing the obvious. I would >> appreciate pointers to any relevant documentation, or just a summary of >> conventional wisdom on the topic. >> > Just as opinion as user: Most of the time I don't care and treat this just as different versions. For example in the linalg case, I use by default numpy.linalg and switch to scipy if I need the extras. pinv is the only one that I ever seriously compared. Some details are nicer, np.linalg.qr(x, mode='r') returns the reduced matrix instead of the full matrix as does scipy.linalg. np.linalg.pinv is faster but maybe slightly less accurate (or defaults that make it less accurate in corner cases). scipy often has more overhead (and isfinite check by default). I just checked, I didn't even know scipy.linalg also has an `inv`. One of my arguments for np.linalg would have been that it's easy to switch between inv and pinv. For fft I use mostly scipy, IIRC. (scipy's fft imports numpy's fft, partially?) Essentially, I don't care most of the time that there are different ways of doing essentially the same thing, but some better information about the differences would be useful. 
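For example, a minimal sketch of the qr/pinv differences mentioned above (the test matrix and its shape are arbitrary, and the results noted in the comments are what I would expect rather than verified output):

import numpy as np
import scipy.linalg

x = np.random.randn(10, 3)

# numpy's mode='r' returns only the reduced (3, 3) R factor ...
r_np = np.linalg.qr(x, mode='r')
# ... while scipy's default mode returns the full (10, 10) Q and (10, 3) R.
q_sp, r_sp = scipy.linalg.qr(x)
print(r_np.shape, r_sp.shape)

# pinv has the same call signature in both, so switching between the two
# (or between inv and pinv when a matrix may be rank deficient) is a
# one-line change.
p_np = np.linalg.pinv(x)
p_sp = scipy.linalg.pinv(x)
print(np.allclose(p_np, p_sp))  # should be True for a well-conditioned x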
Josef > >> Regards, >> Michael >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Oct 27 22:50:07 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 02:50:07 +0000 (UTC) Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> Message-ID: <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> wrote: > For fft I use mostly scipy, IIRC. (scipy's fft imports numpy's fft, > partially?) No. SciPy uses the Fortran library FFTPACK (wrapped with f2py) and NumPy uses a smaller C library called fftpack_lite. Algorithmically they are are similar, but fftpack_lite has fewer features (e.g. no DCT). scipy.fftpack does not import numpy.fft. Neither of these libraries are very "fast", but usually they are "fast enough" for practical purposes. If we really need a kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or Apple's Accelerate Framework, or even use tools like CUDA or OpenCL to run the FFT on the GPU. But using such tools takes more coding (and reading API specifications) than the convinience of just using the FFTs already in NumPy or SciPy. So if you count in your own time as well, it might not be that FFTW or MKL are the "faster" FFTs. Sturla From sturla.molden at gmail.com Mon Oct 27 23:07:12 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 03:07:12 +0000 (UTC) Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> Message-ID: <1737635418436158261.295276sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > If we really need a > kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or > Apple's Accelerate Framework, I should perhaps also mention FFTS here, which claim to be faster than FFTW and has a BSD licence: http://anthonix.com/ffts/index.html From josef.pktd at gmail.com Mon Oct 27 23:31:54 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 27 Oct 2014 23:31:54 -0400 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Oct 27, 2014 at 10:50 PM, Sturla Molden wrote: > wrote: > > > For fft I use mostly scipy, IIRC. (scipy's fft imports numpy's fft, > > partially?) > > No. SciPy uses the Fortran library FFTPACK (wrapped with f2py) and NumPy > uses a smaller C library called fftpack_lite. Algorithmically they are are > similar, but fftpack_lite has fewer features (e.g. no DCT). scipy.fftpack > does not import numpy.fft. Neither of these libraries are very "fast", but > usually they are "fast enough" for practical purposes. If we really need a > kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or > Apple's Accelerate Framework, or even use tools like CUDA or OpenCL to run > the FFT on the GPU. 
But using such tools takes more coding (and reading API > specifications) than the convinience of just using the FFTs already in > NumPy or SciPy. So if you count in your own time as well, it might not be > that FFTW or MKL are the "faster" FFTs. > Ok, I didn't remember correctly. I didn't use much fft recently, I never used DCT. My favorite "fft function" is fftconvolve. https://github.com/scipy/scipy/blob/e758c482efb8829685dcf494bdf71eeca3dd77f0/scipy/signal/signaltools.py#L13 doesn't seem to mind mixing numpy and scipy (quick github search) It's sometimes useful to have simplified functions that are "good enough" where we don't have to figure out all the extras that the docstring of the fancy version is mentioning. Josef > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Oct 27 23:41:29 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 27 Oct 2014 23:41:29 -0400 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Oct 27, 2014 at 11:31 PM, wrote: > > > On Mon, Oct 27, 2014 at 10:50 PM, Sturla Molden > wrote: > >> wrote: >> >> > For fft I use mostly scipy, IIRC. (scipy's fft imports numpy's fft, >> > partially?) >> >> No. SciPy uses the Fortran library FFTPACK (wrapped with f2py) and NumPy >> uses a smaller C library called fftpack_lite. Algorithmically they are are >> similar, but fftpack_lite has fewer features (e.g. no DCT). scipy.fftpack >> does not import numpy.fft. Neither of these libraries are very "fast", but >> usually they are "fast enough" for practical purposes. If we really need a >> kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or >> Apple's Accelerate Framework, or even use tools like CUDA or OpenCL to run >> the FFT on the GPU. But using such tools takes more coding (and reading >> API >> specifications) than the convinience of just using the FFTs already in >> NumPy or SciPy. So if you count in your own time as well, it might not be >> that FFTW or MKL are the "faster" FFTs. >> > > > Ok, I didn't remember correctly. > > I didn't use much fft recently, I never used DCT. My favorite "fft > function" is fftconvolve. > > https://github.com/scipy/scipy/blob/e758c482efb8829685dcf494bdf71eeca3dd77f0/scipy/signal/signaltools.py#L13 > doesn't seem to mind mixing numpy and scipy (quick github search) > > > It's sometimes useful to have simplified functions that are "good enough" > where we don't have to figure out all the extras that the docstring of the > fancy version is mentioning. > I take this back (even if it's true), because IMO the defaults should work, and I have a tendency to pile on options in my code that are intended for experts. Josef > > > Josef > > > >> >> Sturla >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Tue Oct 28 00:06:42 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 27 Oct 2014 21:06:42 -0700 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <1737635418436158261.295276sturla.molden-gmail.com@news.gmane.org> References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> <1737635418436158261.295276sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi, On Mon, Oct 27, 2014 at 8:07 PM, Sturla Molden wrote: > Sturla Molden wrote: > >> If we really need a >> kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or >> Apple's Accelerate Framework, > > I should perhaps also mention FFTS here, which claim to be faster than FFTW > and has a BSD licence: > > http://anthonix.com/ffts/index.html Nice. And a funny New Zealand name too. Is this an option for us? Aren't we a little behind the performance curve on FFT after we lost FFTW? Matthew From njs at pobox.com Tue Oct 28 00:28:37 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 04:28:37 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) Message-ID: On 28 Oct 2014 04:07, "Matthew Brett" wrote: > > Hi, > > On Mon, Oct 27, 2014 at 8:07 PM, Sturla Molden wrote: > > Sturla Molden wrote: > > > >> If we really need a > >> kick-ass fast FFT we need to go to libraries like FFTW, Intel MKL or > >> Apple's Accelerate Framework, > > > > I should perhaps also mention FFTS here, which claim to be faster than FFTW > > and has a BSD licence: > > > > http://anthonix.com/ffts/index.html > > Nice. And a funny New Zealand name too. > > Is this an option for us? Aren't we a little behind the performance > curve on FFT after we lost FFTW? It's definitely attractive. Some potential issues that might need dealing with, based on a quick skim: - seems to have a hard requirement for a processor supporting SSE, AVX, or NEON. No fallback for old CPUs or other architectures. (I'm not even sure whether it has x86-32 support.) - no runtime CPU detection, e.g. SSE vs AVX appears to be a compile time decision - not sure if it can handle non-power-of-two problems at all, or at all efficiently. (FFTPACK isn't great here either but major regressions would be bad.) - not sure if it supports all the modes we care about (e.g. rfft) This stuff is all probably solveable though, so if someone has a hankering to make numpy (or scipy) fft dramatically faster then you should get in touch with the author and see what they think. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Oct 28 01:13:22 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 05:13:22 +0000 (UTC) Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> Message-ID: <1282218786436164186.983364sturla.molden-gmail.com@news.gmane.org> wrote: https://github.com/scipy/scipy/blob/e758c482efb8829685dcf494bdf71eeca3dd77f0/scipy/signal/signaltools.py#L13 > doesn't seem to mind mixing numpy and scipy (quick github search) I believe it is because NumPy's FFTs (beginning with 1.9.0) are thread-safe. But FFTs from numpy.fft and scipy.fftpack should be rather similar in performance. (Except if you use Enthought, in which case the former is much faster.) 
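A minimal sketch for checking that on a particular machine (the transform length and repeat counts are arbitrary):

import timeit

setup = "import numpy as np; import scipy.fftpack; x = np.random.randn(2**16)"
for stmt in ("np.fft.fft(x)", "scipy.fftpack.fft(x)"):
    # best of three runs, 100 transforms per run
    best = min(timeit.repeat(stmt, setup=setup, number=100, repeat=3))
    print("%-22s %.4f s per 100 transforms" % (stmt, best))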
It seems from the code that fftconvolve does not use overlap-add or overlap-save. I seem to remember that it did before, but I might be wrong. Personally I prefer to use overlap-add instead of a very long FFT. There is also a scipy.fftpack.convolve module. I have not used it though. Sturla From sturla.molden at gmail.com Tue Oct 28 01:24:46 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 05:24:46 +0000 (UTC) Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> <1737635418436158261.295276sturla.molden-gmail.com@news.gmane.org> Message-ID: <2093057557436166186.834296sturla.molden-gmail.com@news.gmane.org> Matthew Brett wrote: > Is this an option for us? Aren't we a little behind the performance > curve on FFT after we lost FFTW? It does not run on Windows because it uses POSIX to allocate executable memory for tasklets, as i understand it. By the way, why did we loose FFTW, apart from GPL? One thing to mention here is that MKL supports the FFTW APIs. If we can use MKL for linalg and numpy.dot I don't see why we cannot use it for FFT. On Mac there is also vDSP in Accelerate framework which has an insanely fast FFT (also claimed to be faster than FFTW). Since it is a system library there should be no license problems. There are clearly options if someone wants to work on it and maintain it. Sturla From cournape at gmail.com Tue Oct 28 02:28:05 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 28 Oct 2014 06:28:05 +0000 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <2093057557436166186.834296sturla.molden-gmail.com@news.gmane.org> References: <87mw8hhcd9.fsf@dmmcf.net> <1377373264436156267.918718sturla.molden-gmail.com@news.gmane.org> <1737635418436158261.295276sturla.molden-gmail.com@news.gmane.org> <2093057557436166186.834296sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Oct 28, 2014 at 5:24 AM, Sturla Molden wrote: > Matthew Brett wrote: > > > Is this an option for us? Aren't we a little behind the performance > > curve on FFT after we lost FFTW? > > It does not run on Windows because it uses POSIX to allocate executable > memory for tasklets, as i understand it. > > By the way, why did we loose FFTW, apart from GPL? One thing to mention > here is that MKL supports the FFTW APIs. If we can use MKL for linalg and > numpy.dot I don't see why we cannot use it for FFT. > The problem is APIs: MKL, Accelerate, etc... all use a standard API (BLAS/LAPACK), but for FFT, you need to reimplement pretty much the whole thing. Unsurprisingly, this meant the code was not well maintained. Wrapping non standard, non-BSD libraries makes much more sense in separate libraries in general. David > On Mac there is also vDSP in Accelerate framework which has an insanely > fast FFT (also claimed to be faster than FFTW). Since it is a system > library there should be no license problems. > > There are clearly options if someone wants to work on it and maintain it. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Jerome.Kieffer at esrf.fr Tue Oct 28 03:32:19 2014 From: Jerome.Kieffer at esrf.fr (Jerome Kieffer) Date: Tue, 28 Oct 2014 08:32:19 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: Message-ID: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> On Tue, 28 Oct 2014 04:28:37 +0000 Nathaniel Smith wrote: > It's definitely attractive. Some potential issues that might need dealing > with, based on a quick skim: In my tests, numpy's FFTPACK isn't that bad considering * (virtually) no extra overhead for installation * (virtually) no plan creation time * not that slower for each transformation Because the plan creation was taking ages with FFTw, numpy's FFTPACK was often faster (overall) Cheers, -- J?r?me Kieffer tel +33 476 882 445 From robert.kern at gmail.com Tue Oct 28 05:02:45 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 28 Oct 2014 09:02:45 +0000 Subject: [Numpy-discussion] numpy.i and std::complex In-Reply-To: <1040561343436145342.607784sturla.molden-gmail.com@news.gmane.org> References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> <1040561343436145342.607784sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Oct 27, 2014 at 11:36 PM, Sturla Molden wrote: > Robert Kern wrote: > >> Please stop haranguing the new guy for not knowing things that you >> know. > > I am not doing any of that. You are the only one haranguing here. I understand that it's not your *intention*, so please take this as a well-meant caution from a outside observer that it *is* the effect of your words on other people, and if you intend something else, you may want to consider your words more carefully. The polite, welcoming response to someone coming along with a straightforward, obviously-correct contribution to our SWIG capabilities is "Thank you!", not "perhaps you overestimate the number of NumPy users who use Swig". You are entitled to your opinions on the relative merits of Cython and SWIG and to argue for them, but not every thread mentioning SWIG is an appropriate forum for hashing out that argument. -- Robert Kern From charlesr.harris at gmail.com Tue Oct 28 05:19:15 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 28 Oct 2014 03:19:15 -0600 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: On Tue, Oct 28, 2014 at 1:32 AM, Jerome Kieffer wrote: > On Tue, 28 Oct 2014 04:28:37 +0000 > Nathaniel Smith wrote: > > > It's definitely attractive. Some potential issues that might need dealing > > with, based on a quick skim: > > In my tests, numpy's FFTPACK isn't that bad considering > * (virtually) no extra overhead for installation > * (virtually) no plan creation time > * not that slower for each transformation > > Because the plan creation was taking ages with FFTw, numpy's FFTPACK was > often faster (overall) > > Cheers, > Ondrej says that f90 fftpack (his mod) runs faster than fftw. The main thing missing from fftpack is the handling of transform sizes that are not products of 2,3,4,5. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cournape at gmail.com Tue Oct 28 05:41:43 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 28 Oct 2014 09:41:43 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: On Tue, Oct 28, 2014 at 9:19 AM, Charles R Harris wrote: > > > On Tue, Oct 28, 2014 at 1:32 AM, Jerome Kieffer > wrote: > >> On Tue, 28 Oct 2014 04:28:37 +0000 >> Nathaniel Smith wrote: >> >> > It's definitely attractive. Some potential issues that might need >> dealing >> > with, based on a quick skim: >> >> In my tests, numpy's FFTPACK isn't that bad considering >> * (virtually) no extra overhead for installation >> * (virtually) no plan creation time >> * not that slower for each transformation >> >> Because the plan creation was taking ages with FFTw, numpy's FFTPACK was >> often faster (overall) >> >> Cheers, >> > > Ondrej says that f90 fftpack (his mod) runs faster than fftw. > I would be interested to see the benchmarks for this. The real issue with fftw (besides the license) is the need for plan computation, which are expensive (but are not needed for each transform). Handling this in a way that is user friendly while tweakable for advanced users is not easy, and IMO more appropriate for a separate package. The main thing missing from fftpack is the handling of transform sizes that > are not products of 2,3,4,5. > Strickly speaking, it is handled, just not through an FFT (it goes back to the brute force O(N**2)). I made some experiments with the Bluestein transform to handle prime transforms on fftpack, but the precision seemed to be an issue. Maybe I should revive this work (if I still have it somewhere). David -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Tue Oct 28 05:46:59 2014 From: heng at cantab.net (Henry Gomersall) Date: Tue, 28 Oct 2014 09:46:59 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: <544F6613.1030402@cantab.net> On 28/10/14 09:41, David Cournapeau wrote: > The real issue with fftw (besides the license) is the need for plan > computation, which are expensive (but are not needed for each > transform). Handling this in a way that is user friendly while > tweakable for advanced users is not easy, and IMO more appropriate for > a separate package. Just on this, I like to think I've largely solved the issue with: https://github.com/hgomersall/pyFFTW If you have suggestions on how it can be improved, I'm all ears (there are a few things in the pipeline, like creating FFTW objects for different types of transform more explicit, which is likely to be the main difference for the next major version). Cheers, Henry From heng at cantab.net Tue Oct 28 05:50:43 2014 From: heng at cantab.net (Henry Gomersall) Date: Tue, 28 Oct 2014 09:50:43 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: Message-ID: <544F66F3.9080706@cantab.net> On 28/10/14 04:28, Nathaniel Smith wrote: > > - not sure if it can handle non-power-of-two problems at all, or at > all efficiently. (FFTPACK isn't great here either but major > regressions would be bad.) > From my reading, this seems to be the biggest issue with FFTS (from my reading as well) and where FFTW really wins. 
Having a faster algorithm used when it will work, with fallback to fftpack (or something else) is a good solution IMO. Henry From stefan at sun.ac.za Tue Oct 28 05:53:44 2014 From: stefan at sun.ac.za (Stefan van der Walt) Date: Tue, 28 Oct 2014 11:53:44 +0200 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <87mw8hhcd9.fsf@dmmcf.net> References: <87mw8hhcd9.fsf@dmmcf.net> Message-ID: <87r3xscyfr.fsf@sun.ac.za> Hi Michael On 2014-10-27 15:26:58, D. Michael McFarland wrote: > What I would like to ask about is the situation this illustrates, where > both NumPy and SciPy provide similar functionality (sometimes identical, > to judge by the documentation). Is there some guidance on which is to > be preferred? I could argue that using only NumPy when possible avoids > unnecessary dependence on SciPy in some code, or that using SciPy > consistently makes for a single interface and so is less error prone. > Is there a rule of thumb for cases where SciPy names shadow NumPy names? I'm not sure if you've received an answer to your question so far. My advice: use the SciPy functions. SciPy is often built on more extensive Fortran libraries not available during NumPy compilation, and I am not aware of any cases where a function in NumPy is faster or more extensive than the equivalent in SciPy. If you want code that falls back gracefully when SciPy is not available, you may use the ``numpy.dual`` library. Regards St?fan From sturla.molden at gmail.com Tue Oct 28 05:55:10 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 09:55:10 +0000 (UTC) Subject: [Numpy-discussion] numpy.i and std::complex References: <1330968997436117273.144572sturla.molden-gmail.com@news.gmane.org> <440104262436119434.352522sturla.molden-gmail.com@news.gmane.org> <1040561343436145342.607784sturla.molden-gmail.com@news.gmane.org> Message-ID: <1737889037436182639.003356sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > The polite, welcoming > response to someone coming along with a straightforward, > obviously-correct contribution to our SWIG capabilities is "Thank > you!", not "perhaps you overestimate the number of NumPy users who use > Swig". That was a response to something else. As to why this issue with NumPy and Swig has not been solved before, the OP suggested he might have overestimated the number of NumPy users who also use std::complex in C++. Hence my answer he did not (it is arguably not that uncommon), but maybe they don't use Swig. From sturla.molden at gmail.com Tue Oct 28 06:11:41 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 10:11:41 +0000 (UTC) Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: <499024108436183593.465103sturla.molden-gmail.com@news.gmane.org> Jerome Kieffer wrote: > Because the plan creation was taking ages with FFTw, numpy's FFTPACK was > often faster (overall) Matlab switched from FFTPACK to FFTW because the latter was faster in general. If FFTW guesses a plan it does not take very long. Actual measurements can be slow, however, but those are not needed. 
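For anyone who wants to try that trade-off from Python, Henry's pyFFTW exposes it through its NumPy-like interfaces; a minimal sketch, assuming the planner_effort keyword and cache module as described in the pyFFTW documentation (the transform length and variable names are arbitrary):

import numpy as np
import pyfftw.interfaces.numpy_fft
import pyfftw.interfaces.cache

x = np.random.randn(2**20)

# Keep plans around so repeated transforms of the same shape reuse them.
pyfftw.interfaces.cache.enable()

# FFTW_ESTIMATE: guess a plan heuristically -- negligible setup cost,
# possibly a slightly slower transform.
y1 = pyfftw.interfaces.numpy_fft.rfft(x, planner_effort='FFTW_ESTIMATE')

# FFTW_MEASURE: actually time candidate plans -- slow first call,
# usually faster on later calls of the same size.
y2 = pyfftw.interfaces.numpy_fft.rfft(x, planner_effort='FFTW_MEASURE')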
Sturla From sturla.molden at gmail.com Tue Oct 28 06:11:42 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 10:11:42 +0000 (UTC) Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> David Cournapeau wrote: > The real issue with fftw (besides the license) is the need for plan > computation, which are expensive (but are not needed for each transform). This is not a problem if you thell FFTW to guess a plan instead of making measurements. FFTPACK needs to set up a look-up table too. > I made some experiments with the Bluestein transform to handle prime > transforms on fftpack, but the precision seemed to be an issue. Maybe I > should revive this work (if I still have it somewhere). You have it in a branch on Github. Sturla From pierre at barbierdereuille.net Tue Oct 28 06:23:09 2014 From: pierre at barbierdereuille.net (Pierre Barbier de Reuille) Date: Tue, 28 Oct 2014 10:23:09 +0000 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> Message-ID: I would add one element to the discussion: for some (odd) reasons, SciPy is lacking the functions `rfftn` and `irfftn`, functions using half the memory space compared to their non-real equivalent `fftn` and `ifftn`. However, I haven't (yet) seriously tested `scipy.fftpack.fftn` vs. `np.fft.rfftn` to check if there is a serious performance gain (beside memory usage). Cheers, Pierre On Tue Oct 28 2014 at 10:54:00 Stefan van der Walt wrote: > Hi Michael > > On 2014-10-27 15:26:58, D. Michael McFarland wrote: > > What I would like to ask about is the situation this illustrates, where > > both NumPy and SciPy provide similar functionality (sometimes identical, > > to judge by the documentation). Is there some guidance on which is to > > be preferred? I could argue that using only NumPy when possible avoids > > unnecessary dependence on SciPy in some code, or that using SciPy > > consistently makes for a single interface and so is less error prone. > > Is there a rule of thumb for cases where SciPy names shadow NumPy names? > > I'm not sure if you've received an answer to your question so far. My > advice: use the SciPy functions. SciPy is often built on more extensive > Fortran libraries not available during NumPy compilation, and I am not > aware of any cases where a function in NumPy is faster or more extensive > than the equivalent in SciPy. > > If you want code that falls back gracefully when SciPy is not available, > you may use the ``numpy.dual`` library. > > Regards > St?fan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Tue Oct 28 06:49:38 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 28 Oct 2014 06:49:38 -0400 Subject: [Numpy-discussion] multi-dimensional c++ proposal References: Message-ID: Sturla Molden wrote: > On 27/10/14 13:14, Neal Becker wrote: >> The multi-dimensional c++ stuff is interesting (about time!) >> >> http://www.open-std.org/JTC1/SC22/WG21/docs/papers/2014/n3851.pdf > > OMG, that API is about as awful as it gets. 
Obviously it is written by > two computer scientists who do not understand what scientific and > technical computing actually needs. There is a reason why many > scientists still prefer Fortran to C++, and I think this proposal shows > us why. An API like that will never be suitable for implementing complex > numerical alorithms. It will fail horribly because it is *not readable*. > I have no doubt it will be accepted though, because the C++ standards > committee tends to accept unusable things. > > Sturla That's harsh! Do you have any specific features you dislike? Are you objecting to the syntax? -- -- Those who don't understand recursion are doomed to repeat it From sturla.molden at gmail.com Tue Oct 28 07:40:26 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 11:40:26 +0000 (UTC) Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> Message-ID: <1326045948436187413.283095sturla.molden-gmail.com@news.gmane.org> Pierre Barbier de Reuille wrote: > I would add one element to the discussion: for some (odd) reasons, SciPy is > lacking the functions `rfftn` and `irfftn`, functions using half the memory > space compared to their non-real equivalent `fftn` and `ifftn`. In both NumPy and SciPy the N-dimensional FFTs are implemented in Python. It is just a Python loop over all the axes, calling fft or rfft on each axis. > However, I > haven't (yet) seriously tested `scipy.fftpack.fftn` vs. `np.fft.rfftn` to > check if there is a serious performance gain (beside memory usage). Real-value FFT is implemented with complex-value FFT. You save half the memory, but not quite half the computation. Apart from that, the FFT in SciPy is written in Fortran and the FFT in NumPy is written in C, but they are algorithmically similar. I don't see any good reason why the Fortran code in SciPy should be faster than the C code in NumPy. It used to be the case that Fortran was "faster than C", everything else being equal, but with modern C compilers and CPUs with deep pipelines and branch prediction this is rarely the case. So I would expect the NumPy rfftn to be slightly faster than SciPy fftn, but keep in mind that both have a huge Python overhead. Sturla From sturla.molden at gmail.com Tue Oct 28 08:02:57 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 12:02:57 +0000 (UTC) Subject: [Numpy-discussion] multi-dimensional c++ proposal References: Message-ID: <900584976436189260.072979sturla.molden-gmail.com@news.gmane.org> Neal Becker wrote: > That's harsh! Do you have any specific features you dislike? Are you objecting > to the syntax? I have programmed C++ for almost 15 years. But I cannot look at the proposed code an get a mental image of what it does. It is not a specific feature, but how the code looks in general. This is e.g. not a problem with Eigen or Blitz, if you know C++ it is not particularly hard to read. Not as nice as Fortran or Cython, but ut is still not too bad. Boost multiarray suffers from not being particularly readable, however, but this proposal is even worse. I expect that scientists and engineers will not use an unreadable array API. When we write or maintain numerical algorithms we need to get a mental image of the code, because we actually spend most of the time looking at or reading the code. I agree that C++ needs multidimensional arrays in the STL, but this proposal will do more harm than good. 
In particular it will prevent adoption of a usable array API. And as consequence, it will fuel the problem it is trying to solve: C++ programmers will still used homebrewed multiarray classes, because there is no obvious replacement in the standard library. Sturla From njs at pobox.com Tue Oct 28 10:31:33 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 14:31:33 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: On 28 Oct 2014 07:32, "Jerome Kieffer" wrote: > > On Tue, 28 Oct 2014 04:28:37 +0000 > Nathaniel Smith wrote: > > > It's definitely attractive. Some potential issues that might need dealing > > with, based on a quick skim: > > In my tests, numpy's FFTPACK isn't that bad considering > * (virtually) no extra overhead for installation > * (virtually) no plan creation time > * not that slower for each transformation Well, this is what makes FFTS intriguing :-). It's BSD licensed, so we could distribute it by default like we do fftpack, it uses cache-oblivious algorithms so it has no planning step, and even without planning it benchmarks as faster than FFTW's most expensive planning mode (in the cases that FFTS supports, i.e. power-of-two transforms). The paper has lots of benchmark graphs, including measurements of setup time: http://anthonix.com/ffts/preprints/tsp2013.pdf -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Tue Oct 28 10:47:57 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Tue, 28 Oct 2014 15:47:57 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> Message-ID: If I may 'hyjack' the discussion back to the meta-point: should we be having this discussion on the numpy mailing list at all? Perhaps the 'batteries included' philosophy made sense in the early days of numpy; but given that there are several fft libraries with their own pros and cons, and that most numpy projects will use none of them at all, why should numpy bundle any of them? To have a scipy.linalg and scipy.fft makes sense to me, although import pyfftw or import pyFFTPACK would arguably be better still. Just as in the case of linear algebra, those different libraries represent meaningful differences, and if the user wants to paper over those differences with a named import they are always free to do so themselves, explicitly. To be sure, the maintenance of quality fft libraries should be part of the numpy/scipy-stack in some way or another. But I would argue that the core thing that numpy should do is ndarrays alone. On Tue, Oct 28, 2014 at 11:11 AM, Sturla Molden wrote: > David Cournapeau wrote: > > > The real issue with fftw (besides the license) is the need for plan > > computation, which are expensive (but are not needed for each transform). > > This is not a problem if you thell FFTW to guess a plan instead of making > measurements. FFTPACK needs to set up a look-up table too. > > > I made some experiments with the Bluestein transform to handle prime > > transforms on fftpack, but the precision seemed to be an issue. 
Maybe I > > should revive this work (if I still have it somewhere). > > You have it in a branch on Github. > > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Oct 28 11:06:46 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 28 Oct 2014 15:06:46 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: I On Tue, Oct 28, 2014 at 2:31 PM, Nathaniel Smith wrote: > On 28 Oct 2014 07:32, "Jerome Kieffer" wrote: > > > > On Tue, 28 Oct 2014 04:28:37 +0000 > > Nathaniel Smith wrote: > > > > > It's definitely attractive. Some potential issues that might need > dealing > > > with, based on a quick skim: > > > > In my tests, numpy's FFTPACK isn't that bad considering > > * (virtually) no extra overhead for installation > > * (virtually) no plan creation time > > * not that slower for each transformation > > Well, this is what makes FFTS intriguing :-). It's BSD licensed, so we > could distribute it by default like we do fftpack, it uses cache-oblivious > algorithms so it has no planning step, and even without planning it > benchmarks as faster than FFTW's most expensive planning mode (in the cases > that FFTS supports, i.e. power-of-two transforms). > > The paper has lots of benchmark graphs, including measurements of setup > time: > http://anthonix.com/ffts/preprints/tsp2013.pdf > Nice. In this case, the solution may be to implement the Bluestein transform to deal with prime/near-prime numbers on top of FFTS. I did not look much, but it did not obviously support building on windows as well ? David > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 28 11:20:04 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 15:20:04 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> Message-ID: On 28 Oct 2014 14:48, "Eelco Hoogendoorn" wrote: > > If I may 'hyjack' the discussion back to the meta-point: > > should we be having this discussion on the numpy mailing list at all? Of course we should. > Perhaps the 'batteries included' philosophy made sense in the early days of numpy; but given that there are several fft libraries with their own pros and cons, and that most numpy projects will use none of them at all, why should numpy bundle any of them? Certainly there's a place for fancy 3rd-party fft libraries. But fft is such a basic algorithm that it'd be silly to ask people who just need a quick one-off fft to go evaluate a bunch of third-party libraries. For many users, downloading one of these libraries will take longer than just doing their Fourier transform with an O(N**2) algorithm :-). And besides that there's tons of existing code that uses np.fft. So np.fft will continue to exist, and given that it exists we should make it as good as we can. 
> To have a scipy.linalg and scipy.fft makes sense to me, although import pyfftw or import pyFFTPACK would arguably be better still. Just as in the case of linear algebra, those different libraries represent meaningful differences, and if the user wants to paper over those differences with a named import they are always free to do so themselves, explicitly. To be sure, the maintenance of quality fft libraries should be part of the numpy/scipy-stack in some way or another. But I would argue that the core thing that numpy should do is ndarrays alone. According to some sort of abstract project planning aesthetics, perhaps. But I don't see how fractionating numpy into lots of projects would provide any benefit for users. (If we split numpy into 10 subprojects then probably 7 of them would never release, because we barely have the engineering to do release management now.) CS courses often teach that more modular = more better. That's because they're desperate to stop newbies from creating balls of mush, though, not because it's the whole truth :-). It's always true that an organized codebase is better than a ball of mush, but abstraction barriers, decoupling, etc. have real and important costs, and this needs to be taken into account. (See e.g. the Torvalds/Tenenbaum debate.) And in any case, this ship sailed a long time ago. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue Oct 28 11:50:44 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 28 Oct 2014 15:50:44 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: On Tue, Oct 28, 2014 at 3:06 PM, David Cournapeau wrote: > I > > On Tue, Oct 28, 2014 at 2:31 PM, Nathaniel Smith wrote: > >> On 28 Oct 2014 07:32, "Jerome Kieffer" wrote: >> > >> > On Tue, 28 Oct 2014 04:28:37 +0000 >> > Nathaniel Smith wrote: >> > >> > > It's definitely attractive. Some potential issues that might need >> dealing >> > > with, based on a quick skim: >> > >> > In my tests, numpy's FFTPACK isn't that bad considering >> > * (virtually) no extra overhead for installation >> > * (virtually) no plan creation time >> > * not that slower for each transformation >> >> Well, this is what makes FFTS intriguing :-). It's BSD licensed, so we >> could distribute it by default like we do fftpack, it uses cache-oblivious >> algorithms so it has no planning step, and even without planning it >> benchmarks as faster than FFTW's most expensive planning mode (in the cases >> that FFTS supports, i.e. power-of-two transforms). >> >> The paper has lots of benchmark graphs, including measurements of setup >> time: >> http://anthonix.com/ffts/preprints/tsp2013.pdf >> > > Nice. In this case, the solution may be to implement the Bluestein > transform to deal with prime/near-prime numbers on top of FFTS. > > I did not look much, but it did not obviously support building on windows > as well ? > Ok, I took a quick look at it, and it will be a significant effort to be able to make FFTS work at all with MSVC on windows: - the code is not C89 compatible - it uses code generation using POSIX library. One would need to port that part to using Win32 API as well. - the test suite looks really limited (roundtripping only). The codebase does not seem particularly well written either (but neither is FFTPACK to be fair). 
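As an aside on the "roundtripping only" point above: a forward/inverse pair can agree with each other and still both be wrong, so comparing against an independent reference is the stronger test. A small sketch, using np.fft purely as a stand-in for a trusted reference:

import numpy as np

x = np.random.rand(256)

# a compensating scaling bug still round-trips perfectly...
bad_fft = lambda a: 2.0 * np.fft.fft(a)
bad_ifft = lambda a: 0.5 * np.fft.ifft(a)
np.allclose(bad_ifft(bad_fft(x)), x)    # True, despite the bug

# ...but it fails immediately against an independent reference
np.allclose(bad_fft(x), np.fft.fft(x))  # False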
Nothing impossible (looks like Sony at least uses this code on windows: https://github.com/anthonix/ffts/issues/27#issuecomment-40204403), but not a 2 hours thing either. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Tue Oct 28 12:58:08 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 28 Oct 2014 12:58:08 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Mon, Oct 27, 2014 at 9:41 PM, Yuxiang Wang wrote: > In my opinion - because they don't do the same thing, especially when > you think in terms in lower-level. > > ndarray.flat returns an iterator; ndarray.flatten() returns a copy; > ndarray.ravel() only makes copies when necessary; ndarray.reshape() is > more general purpose, even though you can use it to flatten arrays. > Out of the four ways, I find x.flat the most confusing. Unfortunately, it is also the most obvious name for the operation (and "ravel" is the least, but it is the fault of the English language where "to ravel" means "to unravel."). What x.flat returns, is not really an iterator. It is some hybrid between a view and an iterator. Consider this: >>> x = numpy.arange(6).reshape((2,3)) >>> i = x.flat >>> i.next() 0 >>> i.next() 1 >>> i.next() 2 So far no surprises, but what should i[0] return now? If you think of i as a C pointer you would expect 3, but >>> i[0] 0 What is worse, the above resets the index and now >>> i.index 0 OK, so now I expect that i[5] will reset the index to 5, but no >>> i[5] 5 >>> i.index 0 When would you prefer to use x.flat over x.ravel()? Is x.reshape(-1) always equivalent to x.ravel()? What is x.flat.copy()? Is it the same as x.flatten()? Why does flatiter even have a .copy() method? Isn't i.copy() the same as i.base.flatten(), only slower? And with all these methods, I still don't have the one that would flatten any array including a nested array like this: >>> x = np.array([np.arange(2), np.arange(3), np.arange(4)]) I need yet another function here, for example >>> np.hstack(x) array([0, 1, 0, 1, 2, 0, 1, 2, 3]) and what if I want to flatten a higher dimensional nested array, say >>> y = np.array([x[:1],x[:2],x]) can I do better than >>> np.hstack(np.hstack(y)) array([0, 1, 0, 1, 0, 1, 2, 0, 1, 0, 1, 2, 0, 1, 2, 3]) ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Oct 28 13:21:30 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 28 Oct 2014 17:21:30 +0000 (UTC) Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> Message-ID: <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Eelco Hoogendoorn wrote: > Perhaps the 'batteries included' philosophy made sense in the early days of > numpy; but given that there are several fft libraries with their own pros > and cons, and that most numpy projects will use none of them at all, why > should numpy bundle any of them? Because sometimes we just need to compute a DFT, just like we sometimes need to compute a sine or an exponential. It does that job perfectly well. It is not always about speed. Just typing np.fft.fft(x) is convinient. 
Sturla From njs at pobox.com Tue Oct 28 13:25:05 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 17:25:05 +0000 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On 28 Oct 2014 16:58, "Alexander Belopolsky" wrote: > > On Mon, Oct 27, 2014 at 9:41 PM, Yuxiang Wang wrote: > >> In my opinion - because they don't do the same thing, especially when >> you think in terms in lower-level. >> >> ndarray.flat returns an iterator; ndarray.flatten() returns a copy; >> ndarray.ravel() only makes copies when necessary; ndarray.reshape() is >> more general purpose, even though you can use it to flatten arrays. > > > Out of the four ways, I find x.flat the most confusing. I too would be curious to know why .flat exists (beyond "it seemed like a good idea at the time" ;-)). I've always treated it as some weird legacy thing and ignored it, and this has worked out well for me. Is there any real problem where .flat is really the best solution? Should we deprecate it, or at least warn people off from it officially? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Tue Oct 28 13:37:17 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 28 Oct 2014 18:37:17 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> Message-ID: <544FD44D.1020406@grinta.net> On 28/10/14 16:50, David Cournapeau wrote: > Nothing impossible (looks like Sony at least uses this code on windows: > https://github.com/anthonix/ffts/issues/27#issuecomment-40204403), but > not a 2 hours thing either. One of the downsides of the BSD license :) Cheers, Daniele From alan.isaac at gmail.com Tue Oct 28 13:40:18 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 28 Oct 2014 13:40:18 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: <544FD502.5000007@gmail.com> On 10/28/2014 1:25 PM, Nathaniel Smith wrote: > I too would be curious to know why .flat exists (beyond "it seemed like a good idea at the time" How else would you iterate over all items of a multidimensional array? As an example application, use it to assign to an arbitrary diagonal. (It can be sliced.) I don't recall the specifics at the moment, but I've been happy to have it in the past. Alan Isaac From shoyer at gmail.com Tue Oct 28 13:42:36 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Tue, 28 Oct 2014 10:42:36 -0700 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Tue, Oct 28, 2014 at 10:25 AM, Nathaniel Smith wrote: > I too would be curious to know why .flat exists (beyond "it seemed like a > good idea at the time" ;-)). I've always treated it as some weird legacy > thing and ignored it, and this has worked out well for me. > > Is there any real problem where .flat is really the best solution? Should > we deprecate it, or at least warn people off from it officially? > > .flat lets you iterate over all elements of a N-dimensional array as if it was 1D, without ever needing to make a copy of the array. In contrast, ravel() and reshape(-1) cannot always avoid a copy, because they need to return another ndarray. np.nditer is a reasonable alternative to .flat (and it's documented as such), but it's a rather inelegant, kitchen-sink type function. 
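To make the distinction concrete (the exact copy-vs-view behaviour of ravel() depends on the numpy version, so treat this as illustrative):

>>> a = np.arange(16).reshape(4, 4)[1::2, 1::2]  # non-contiguous view
>>> a.ravel().flags['OWNDATA']  # ravel() has to copy here
True
>>> a.flat[0] = 99              # .flat writes through to the original
>>> a
array([[99,  7],
       [13, 15]])
>>> [int(v) for v in a.flat]    # iterates every element without a copy
[99, 7, 13, 15]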
-------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Oct 28 13:44:31 2014 From: stefan at sun.ac.za (Stefan van der Walt) Date: Tue, 28 Oct 2014 19:44:31 +0200 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <544FD44D.1020406@grinta.net> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <544FD44D.1020406@grinta.net> Message-ID: <87egtsay2o.fsf@sun.ac.za> On 2014-10-28 19:37:17, Daniele Nicolodi wrote: > On 28/10/14 16:50, David Cournapeau wrote: >> Nothing impossible (looks like Sony at least uses this code on windows: >> https://github.com/anthonix/ffts/issues/27#issuecomment-40204403), but >> not a 2 hours thing either. > > One of the downsides of the BSD license :) Perhaps one of the upsides, as they may be willing to contribute back if asked nicely. St?fan From daniele at grinta.net Tue Oct 28 13:55:57 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 28 Oct 2014 18:55:57 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <87egtsay2o.fsf@sun.ac.za> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <544FD44D.1020406@grinta.net> <87egtsay2o.fsf@sun.ac.za> Message-ID: <544FD8AD.2000600@grinta.net> On 28/10/14 18:44, Stefan van der Walt wrote: > On 2014-10-28 19:37:17, Daniele Nicolodi wrote: >> On 28/10/14 16:50, David Cournapeau wrote: >>> Nothing impossible (looks like Sony at least uses this code on windows: >>> https://github.com/anthonix/ffts/issues/27#issuecomment-40204403), but >>> not a 2 hours thing either. >> >> One of the downsides of the BSD license :) > > Perhaps one of the upsides, as they may be willing to contribute back if > asked nicely. If it would be GPL or similar the would have to, and there would not be need to ask nicely. Cheers, Daniele From alan.isaac at gmail.com Tue Oct 28 14:03:34 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 28 Oct 2014 14:03:34 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: <544FDA76.3000508@gmail.com> On 10/28/2014 1:42 PM, Stephan Hoyer wrote: > np.nditer is a reasonable alternative to .flat (and it's documented as such), but it's a rather inelegant, kitchen-sink type function. I'm not sure what "reasonable" means here, other than "in principle, possible to use". In particular, `flat` is much more elegant, and includes an automatic guarantee that the iterations will be in C-contiguous style. Alan From fperez.net at gmail.com Tue Oct 28 14:09:47 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 28 Oct 2014 11:09:47 -0700 Subject: [Numpy-discussion] [ANN] Python for Scientific Computing conference in Boulder, CO; April'15 Message-ID: Hi folks, a colleague from NCAR in Boulder just sent me this link about a conference they are organizing in the spring: https://sea.ucar.edu/conference/2015 I figured this might be of interest to many on these lists. The actual call isn't up yet, so if you're interested, watch that site for an upcoming call when they post it (I'm not directly involved, just passing the message along). Cheers f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From marquett at iap.fr Tue Oct 28 14:23:22 2014 From: marquett at iap.fr (Jean-Baptiste Marquette) Date: Tue, 28 Oct 2014 19:23:22 +0100 Subject: [Numpy-discussion] [ANN] Python for Scientific Computing conference in Boulder, CO; April'15 In-Reply-To: References: Message-ID: Le 28 oct. 2014 ? 19:09, Fernando Perez a ?crit : > a colleague from NCAR in Boulder just sent me this link about a conference they are organizing in the spring: > Wrong year on the web page: April 13 - 17, 2014 Cheers, JB -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.otte at gmail.com Tue Oct 28 14:34:14 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Tue, 28 Oct 2014 19:34:14 +0100 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: Hey, In the last weeks I tested `np.asarray(np.bmat(....))` as `stack` function and it works quite well. So the question persits: If `bmat` already offers something like `stack` should we even bother implementing `stack`? More code leads to more bugs and maintenance work. (However, the current implementation is only 5 lines and by using `bmat` which would reduce that even more.) Best, Stefan On Fri, Sep 19, 2014 at 4:47 PM, Paul Hobson wrote: > Hey Ben, > > Side note: I've had to do the same thing for stitching curvilinear model > grid coordinates together. Usings pandas DataFrames indexed by `i` and `j` > is really good for this. You can offset the indices directly, unstack the > DF, and the pandas will align for you. > > Happy to send an example along if you're curious. > -p > > On Mon, Sep 8, 2014 at 9:55 AM, Benjamin Root wrote: >> >> A use case would be "image stitching" or even data tiling. I have had to >> implement something like this at work (so, I can't share it, unfortunately) >> and it even goes so far as to allow the caller to specify how much the tiles >> can overlap and such. The specification is ungodly hideous and I doubt I >> would be willing to share it even if I could lest I release code-thulu upon >> the world... >> >> I think just having this generalize stack feature would be nice start. >> Tetris could be built on top of that later. (Although, I do vote for at >> least 3 or 4 dimensional stacking, if possible). >> >> Cheers! >> Ben Root >> >> >> On Mon, Sep 8, 2014 at 12:41 PM, Eelco Hoogendoorn >> wrote: >>> >>> Sturla: im not sure if the intention is always unambiguous, for such more >>> flexible arrangements. >>> >>> Also, I doubt such situations arise often in practice; if the arrays arnt >>> a grid, they are probably a nested grid, and the code would most naturally >>> concatenate them with nested calls to a stacking function. >>> >>> However, some form of nd-stack function would be neat in my opinion. >>> >>> On Mon, Sep 8, 2014 at 6:10 PM, Jaime Fern?ndez del R?o >>> wrote: >>>> >>>> On Mon, Sep 8, 2014 at 7:41 AM, Sturla Molden >>>> wrote: >>>>> >>>>> Stefan Otte wrote: >>>>> >>>>> > stack([[a, b], [c, d]]) >>>>> > >>>>> > In my case `stack` replaced `hstack` and `vstack` almost completely. >>>>> > >>>>> > If you're interested in including it in numpy I created a pull >>>>> > request >>>>> > [1]. I'm looking forward to getting some feedback! >>>>> >>>>> As far as I can see, it uses hstack and vstack. 
But that means a and b >>>>> have >>>>> to have the same number of rows, c and d must have the same rumber of >>>>> rows, >>>>> and hstack((a,b)) and hstack((c,d)) must have the same number of >>>>> columns. >>>>> >>>>> Thus it requires a regularity like this: >>>>> >>>>> AAAABB >>>>> AAAABB >>>>> CCCDDD >>>>> CCCDDD >>>>> CCCDDD >>>>> CCCDDD >>>>> >>>>> What if we just ignore this constraint, and only require the output to >>>>> be >>>>> rectangular? Now we have a 'tetris game': >>>>> >>>>> AAAABB >>>>> AAAABB >>>>> CCCCBB >>>>> CCCCBB >>>>> CCCCDD >>>>> CCCCDD >>>>> >>>>> or >>>>> >>>>> AAAABB >>>>> AAAABB >>>>> CCCCBB >>>>> CCCCBB >>>>> CCCCBB >>>>> CCCCBB >>>>> >>>>> This should be 'stackable', yes? Or perhaps we need another stacking >>>>> function for this, say numpy.tetris? >>>>> >>>>> And while we're at it, what about higher dimensions? should there be an >>>>> ndstack function too? >>>> >>>> >>>> This is starting to look like the second time in a row Stefan tries to >>>> extend numpy with a simple convenience function, and he gets tricked into >>>> implementing some sophisticated algorithm... >>>> >>>> For his next PR I expect nothing less than an NP-complete problem. ;-) >>>> >>>> >>>> Jaime >>>> >>>> -- >>>> (\__/) >>>> ( O.o) >>>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus >>>> planes de dominaci?n mundial. >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From fperez.net at gmail.com Tue Oct 28 14:34:11 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Tue, 28 Oct 2014 11:34:11 -0700 Subject: [Numpy-discussion] [ANN] Python for Scientific Computing conference in Boulder, CO; April'15 In-Reply-To: References: Message-ID: thanks, reported. On Tue, Oct 28, 2014 at 11:23 AM, Jean-Baptiste Marquette wrote: > > Le 28 oct. 2014 ? 19:09, Fernando Perez a ?crit : > > a colleague from NCAR in Boulder just sent me this link about a conference > they are organizing in the spring: > > > Wrong year on the web page: *April 13 - 17, 2014* > > Cheers, > JB > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Tue Oct 28 14:46:24 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 18:46:24 +0000 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: On 28 Oct 2014 18:34, "Stefan Otte" wrote: > > Hey, > > In the last weeks I tested `np.asarray(np.bmat(....))` as `stack` > function and it works quite well. So the question persits: If `bmat` > already offers something like `stack` should we even bother > implementing `stack`? More code leads to more > bugs and maintenance work. (However, the current implementation is > only 5 lines and by using `bmat` which would reduce that even more.) In the long run we're trying to reduce usage of np.matrix and ideally deprecate it entirely. So yes, providing ndarray equivalents of matrix functionality (like bmat) is valuable. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Tue Oct 28 16:07:46 2014 From: dineshbvadhia at hotmail.com (Din Vadhia) Date: Tue, 28 Oct 2014 13:07:46 -0700 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: It would be nice if there were a single meta numpy.flatten() function with kind: {'ravel', 'flatten', 'flat', 'reshape'} options, similar to the numpy.sort() function kind : {?quicksort?, ?mergesort?, ?heapsort?} options. It would also make it easier to select the best option for each problem need by reading the doc in one place. -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Oct 28 16:09:09 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 28 Oct 2014 13:09:09 -0700 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> Message-ID: A few thoughts: 1) yes, a faster, more memory efficient text file parser would be great. Yeah, if your workflow relies on parsing lots of huge text files, you probably need another workflow. But it's a really really common thing to nee to do -- why not do it fast? 2) """you are describing a special case where you know the data size apriori (eg not streaming), dtypes are readily apparent from a small sample case and in general your data is not messy """ sure -- that's a special case, but it's a really common special case (OK -- without the know your data size ,anyway...) 3) > Someone also posted some code or the draft thereof for using resizable > arrays quite a while ago, which would > reduce the memory overhead for very large arrays. > That may have been me -- I have a resizable array class, both pure python and not-quite finished Cython version. In practice, if you add stuff to the array row by row (or item by item), it's no faster than putting it all in a list and then converting to an array -- but it IS more memory efficient, which seems to be the issue here. Let me know if you want it -- I really need to get it up on gitHub one of these days. My take: for fast parsing of big files you need: To do the parsing/converting in C -- what wrong with good old fscanf, at least for the basic types -- it's pretty darn fast. 
Memory efficiency -- somethign like my growable array is not all that hard to implement and pretty darn quick -- you just do the usual trick_ over allocate a bit of memory, and when it gets full re-allocate a larger chunk. It turns out, at least on the hardware I tested on, that the performance is not very sensitive to how much you over allocate -- if it's tiny (1 element) performance really sucks, but once you get to a 10% or so (maybe less) over-allocation, you don't notice the difference. Keep the auto-figuring out of the structure / dtypes separate from the high speed parsing code. I"d say write high speed parsing code first -- that requires specification of the data types and structure, then, if you want, write some nice pure python code that tries to auto-detect all that. If it's a small file, it's fast regardless. if it's a large file, then the overhead of teh fancy parsing will be lost, and you'll want the line by line parsing to be as fast as possible. >From a quick loo, it seems that the Panda's code is pretty nice -- maybe the 2X memory footprint should be ignored. -Chris > Cheers, > Derek > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 28 16:24:28 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 28 Oct 2014 20:24:28 +0000 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> Message-ID: On 28 Oct 2014 20:10, "Chris Barker" wrote: > > Memory efficiency -- somethign like my growable array is not all that hard to implement and pretty darn quick -- you just do the usual trick_ over allocate a bit of memory, and when it gets full re-allocate a larger chunk. Can't you just do this with regular numpy using .resize()? What does your special class add? (Just curious.) > From a quick loo, it seems that the Panda's code is pretty nice -- maybe the 2X memory footprint should be ignored. +1 It's fun to sit around and brainstorm clever implementation strategies, but Wes already went ahead and implemented all the tricky bits, and optimized them too. No point in reinventing the wheel. (Plus as I pointed out upthread, it's entirely likely that this "2x overhead" is based on a misunderstanding/oversimplification of how virtual memory works, and the actual practical overhead is much lower.) -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ben.root at ou.edu Tue Oct 28 16:25:52 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 28 Oct 2014 16:25:52 -0400 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> Message-ID: As a bit of an aside, I have just discovered that for fixed-width text data, numpy's text readers seems to edge out pandas' read_fwf(), and numpy has the advantage of being able to specify the dtypes ahead of time (seems that the pandas version just won't allow it, which means I end up with float64's and object dtypes instead of float32's and |S12 dtypes where I want them). Cheers! Ben Root On Tue, Oct 28, 2014 at 4:09 PM, Chris Barker wrote: > A few thoughts: > > 1) yes, a faster, more memory efficient text file parser would be great. > Yeah, if your workflow relies on parsing lots of huge text files, you > probably need another workflow. But it's a really really common thing to > nee to do -- why not do it fast? > > 2) """you are describing a special case where you know the data size > apriori (eg not streaming), dtypes are readily apparent from a small sample > case > and in general your data is not messy """ > > sure -- that's a special case, but it's a really common special case (OK > -- without the know your data size ,anyway...) > > 3) > >> Someone also posted some code or the draft thereof for using resizable >> arrays quite a while ago, which would >> reduce the memory overhead for very large arrays. >> > > That may have been me -- I have a resizable array class, both pure python > and not-quite finished Cython version. In practice, if you add stuff to the > array row by row (or item by item), it's no faster than putting it all in a > list and then converting to an array -- but it IS more memory efficient, > which seems to be the issue here. Let me know if you want it -- I really > need to get it up on gitHub one of these days. > > My take: for fast parsing of big files you need: > > To do the parsing/converting in C -- what wrong with good old fscanf, at > least for the basic types -- it's pretty darn fast. > > Memory efficiency -- somethign like my growable array is not all that hard > to implement and pretty darn quick -- you just do the usual trick_ over > allocate a bit of memory, and when it gets full re-allocate a larger chunk. > It turns out, at least on the hardware I tested on, that the performance is > not very sensitive to how much you over allocate -- if it's tiny (1 > element) performance really sucks, but once you get to a 10% or so (maybe > less) over-allocation, you don't notice the difference. > > Keep the auto-figuring out of the structure / dtypes separate from the > high speed parsing code. I"d say write high speed parsing code first -- > that requires specification of the data types and structure, then, if you > want, write some nice pure python code that tries to auto-detect all that. > If it's a small file, it's fast regardless. if it's a large file, then the > overhead of teh fancy parsing will be lost, and you'll want the line by > line parsing to be as fast as possible. > > From a quick loo, it seems that the Panda's code is pretty nice -- maybe > the 2X memory footprint should be ignored. 
> > -Chris > > > > > > > > > >> Cheers, >> Derek >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Oct 28 16:30:40 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 28 Oct 2014 21:30:40 +0100 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> Message-ID: <544FFCF0.2060609@googlemail.com> On 28.10.2014 21:24, Nathaniel Smith wrote: > On 28 Oct 2014 20:10, "Chris Barker" > wrote: >> >> Memory efficiency -- somethign like my growable array is not all that > hard to implement and pretty darn quick -- you just do the usual trick_ > over allocate a bit of memory, and when it gets full re-allocate a > larger chunk. > > Can't you just do this with regular numpy using .resize()? What does > your special class add? (Just curious.) > >> From a quick loo, it seems that the Panda's code is pretty nice -- > maybe the 2X memory footprint should be ignored. > > +1 > > It's fun to sit around and brainstorm clever implementation strategies, > but Wes already went ahead and implemented all the tricky bits, and > optimized them too. No point in reinventing the wheel. > just to throw it in there, astropy recently also added a faster ascii file reader: https://groups.google.com/forum/#!topic/astropy-dev/biCgb3cF0v0 not familiar with how it compares to pandas. how is pandas' support for unicode text files? unicode is the big weak point of numpy's current text readers and needs to be addressed. From chris.barker at noaa.gov Tue Oct 28 17:41:44 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 28 Oct 2014 14:41:44 -0700 Subject: [Numpy-discussion] Memory efficient alternative for np.loadtxt and np.genfromtxt In-Reply-To: References: <5886D172-130C-4975-BFF3-1D7A2920380F@gmail.com> <373C0BFC-C4B9-4DD6-A593-F5A657A5A595@astro.physik.uni-goettingen.de> Message-ID: On Tue, Oct 28, 2014 at 1:24 PM, Nathaniel Smith wrote: > > Memory efficiency -- somethign like my growable array is not all that > hard to implement and pretty darn quick -- you just do the usual trick_ > over allocate a bit of memory, and when it gets full re-allocate a larger > chunk. > > Can't you just do this with regular numpy using .resize()? What does your > special class add? (Just curious.) > it used resize under the hood -- it just adds the bookkeeping for the over-allocation, etc, and lets you access the data as though it wasn't over-allocated. Like I said, not that difficult.
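The core of it is only a few lines; a toy sketch of that bookkeeping, with made-up names (not the actual accumulator.py):

import numpy as np

class Growable1D:
    """Toy over-allocating 1-D accumulator; illustrative only."""
    def __init__(self, dtype=float):
        self._buf = np.empty(16, dtype=dtype)
        self._n = 0

    def append(self, value):
        if self._n == self._buf.shape[0]:
            # over-allocate ~25% so repeated appends stay cheap on average;
            # refcheck=False skips the reference-count check that in-place
            # resize would otherwise trip on (outside views become invalid)
            self._buf.resize(int(self._buf.shape[0] * 1.25) + 1, refcheck=False)
        self._buf[self._n] = value
        self._n += 1

    @property
    def values(self):
        return self._buf[:self._n]  # view of the filled part, no copy

g = Growable1D()
for v in range(1000):
    g.append(v)
g.values.shape  # (1000,)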
I haven't touched it for a while, but it you are curious I just threw it up on gitHub: https://github.com/PythonCHB/NumpyExtras you want accumulator.py -- there is also a cython version that I didn't quite finish...it theory, it should be a be faster in some cases by reducing the need to round-trip between numpy and python data types... in practice, I don't think I got it to a point where I could do real-world profiling. It's fun to sit around and brainstorm clever implementation strategies, but > Wes already went ahead and implemented all the tricky bits, and optimized > them too. No point in reinventing the wheel. > > (Plus as I pointed out upthread, it's entirely likely that this "2x > overhead" is based on a misunderstanding/oversimplification of how virtual > memory works, and the actual practical overhead is much lower.) > good point. -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Tue Oct 28 18:15:22 2014 From: stefan at sun.ac.za (Stefan van der Walt) Date: Wed, 29 Oct 2014 00:15:22 +0200 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <544FD8AD.2000600@grinta.net> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <544FD44D.1020406@grinta.net> <87egtsay2o.fsf@sun.ac.za> <544FD8AD.2000600@grinta.net> Message-ID: <871tprc03p.fsf@sun.ac.za> On 2014-10-28 19:55:57, Daniele Nicolodi wrote: > On 28/10/14 18:44, Stefan van der Walt wrote: >> On 2014-10-28 19:37:17, Daniele Nicolodi wrote: >>> On 28/10/14 16:50, David Cournapeau wrote: >>>> Nothing impossible (looks like Sony at least uses this code on windows: >>>> https://github.com/anthonix/ffts/issues/27#issuecomment-40204403), but >>>> not a 2 hours thing either. >>> >>> One of the downsides of the BSD license :) >> >> Perhaps one of the upsides, as they may be willing to contribute back if >> asked nicely. > > If it would be GPL or similar the would have to, and there would not be > need to ask nicely. But then they would not have written the code to start off with, so that point is moot. St?fan From ndarray at mac.com Tue Oct 28 20:37:59 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 28 Oct 2014 20:37:59 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Tue, Oct 28, 2014 at 1:42 PM, Stephan Hoyer wrote: > .flat lets you iterate over all elements of a N-dimensional array as if it > was 1D, without ever needing to make a copy of the array. In contrast, > ravel() and reshape(-1) cannot always avoid a copy, because they need to > return another ndarray. In some cases ravel() returns a copy where a view can be easily constructed. For example, >>> x = np.arange(10) >>> y = x[::2] >>> y.ravel().flags['OWNDATA'] True Interestingly, in the same case reshape(-1) returns a view: >>> y.reshape(-1).flags['OWNDATA'] False (This suggests at least a documentation bug - numpy.ravel documentation says that it is equivalent to reshape(-1).) 
It is only in situations like this >>> a = np.arange(16).reshape((4,4)) >>> a[1::2,1::2].ravel() array([ 5, 7, 13, 15]) where flat view cannot be an ndarray, but .flat can still return something that is at least duck-typing compatible with ndarray (if not an ndarray subclass) and behaves as a view into original data. My preferred design would be for x.flat to return a flat view into x. This would be consistent with the way .T and .real attributes are defined and close enough to .imag. An obvious way to obtain a flat copy would be x.flat.copy(). Once we have this, ravel() and flatten() can be deprecated and reshape(-1) discouraged. I think this would be backward compatible except for rather questionable situations like this: >>> i = x.flat >>> list(i) [0, 1, 2, 3, 4, 0, 6, 7, 8, 9] >>> list(i) [] >>> np.array(i) array([0, 1, 2, 3, 4, 0, 6, 7, 8, 9]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 28 21:23:23 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 29 Oct 2014 01:23:23 +0000 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Wed, Oct 29, 2014 at 12:37 AM, Alexander Belopolsky wrote: > > On Tue, Oct 28, 2014 at 1:42 PM, Stephan Hoyer wrote: >> >> .flat lets you iterate over all elements of a N-dimensional array as if it >> was 1D, without ever needing to make a copy of the array. In contrast, >> ravel() and reshape(-1) cannot always avoid a copy, because they need to >> return another ndarray. > > > In some cases ravel() returns a copy where a view can be easily constructed. > For example, > >>>> x = np.arange(10) >>>> y = x[::2] >>>> y.ravel().flags['OWNDATA'] > True > > Interestingly, in the same case reshape(-1) returns a view: > >>>> y.reshape(-1).flags['OWNDATA'] > False > > (This suggests at least a documentation bug - numpy.ravel documentation says > that it is equivalent to reshape(-1).) Well, that's disturbing. Why have one implementation when you can have three... > It is only in situations like this > >>>> a = np.arange(16).reshape((4,4)) >>>> a[1::2,1::2].ravel() > array([ 5, 7, 13, 15]) > > where flat view cannot be an ndarray, but .flat can still return something > that is at least duck-typing compatible with ndarray (if not an ndarray > subclass) and behaves as a view into original data. > > My preferred design would be for x.flat to return a flat view into x. This > would be consistent with the way .T and .real attributes are defined and > close enough to .imag. .flat cannot return a flat view analogous to .T, .real, .imag, because those attributes return ndarray views, and .flat can't guarantee that. OTOH trying to make .flat into a full duck-compatible ndarray-like type is a non-starter; it would take a tremendous amount of work for no clear gain. Counter-proposal: document that .flat is only for iteration and should be avoided otherwise, and add a copy = {True, False, "if-needed"} kwarg to flatten/ravel/reshape. And the only difference between ravel and flatten is the default value of this argument. (And while we're at it, make it so that their implementation is literally to just call .reshape.) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From ndarray at mac.com Tue Oct 28 21:46:50 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 28 Oct 2014 21:46:50 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? 
In-Reply-To: References: Message-ID: On Tue, Oct 28, 2014 at 9:23 PM, Nathaniel Smith wrote: > OTOH trying to make .flat into a full duck-compatible ndarray-like > type is a non-starter; it would take a tremendous amount of work for > no clear gain. > I don't think so - I think all the heavy lifting is already done in flatiter. The missing parts are mostly trivial things like .size or .shape or can be fudged by coercing to true ndarray using existing flatiter.__array__ method. It would be more interesting however if we could always return a true ndarray view. How is ndarray.diagonal() view implemented in 1.9? Can something similar be used to create a flat view? -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Oct 28 22:11:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 29 Oct 2014 02:11:49 +0000 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On 29 Oct 2014 01:47, "Alexander Belopolsky" wrote: > > > On Tue, Oct 28, 2014 at 9:23 PM, Nathaniel Smith wrote: >> >> OTOH trying to make .flat into a full duck-compatible ndarray-like >> type is a non-starter; it would take a tremendous amount of work for >> no clear gain. > > > I don't think so - I think all the heavy lifting is already done in flatiter. The missing parts are mostly trivial things like .size or .shape or can be fudged by coercing to true ndarray using existing flatiter.__array__ method. Now try .resize()... The full ndarray api is vast, and niggling problems would create endless maintenance issues. If your api is going to be that leaky then it's better not to have it at all. > It would be more interesting however if we could always return a true ndarray view. How is ndarray.diagonal() view implemented in 1.9? Can something similar be used to create a flat view? .diagonal has no magic, it just turns out that the diagonal of any strided array is also expressible as a strided array. (Specifically, new_strides = (sum(old_strides),).) There is no analogous theorem for flattening. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Oct 29 00:30:19 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 28 Oct 2014 22:30:19 -0600 Subject: [Numpy-discussion] Deprecate pkgload, PackageLoader Message-ID: Hi All, It is proposed to deprecate, then remove, pkgload and PackageLoader. Complaints? Cries of Anguish? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Oct 29 05:11:04 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Oct 2014 10:11:04 +0100 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: <1414573864.3903.4.camel@sebastian-t440> On Di, 2014-10-28 at 20:37 -0400, Alexander Belopolsky wrote: > > On Tue, Oct 28, 2014 at 1:42 PM, Stephan Hoyer > wrote: > .flat lets you iterate over all elements of a N-dimensional > array as if it was 1D, without ever needing to make a copy of > the array. In contrast, ravel() and reshape(-1) cannot always > avoid a copy, because they need to return another ndarray. > > > In some cases ravel() returns a copy where a view can be easily > constructed. For example, > Yeah, but we just changed that for 1.10, not ravel can even get further if you use order='K'. 
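Roughly, order='K' reads the elements in the order they sit in memory, which lets more cases come back as views; a small illustration (whether the copy is actually avoided depends on the numpy version):

>>> a = np.arange(6).reshape(2, 3).T  # Fortran-ordered view, shape (3, 2)
>>> a.ravel()                         # C-order read: has to copy
array([0, 3, 1, 4, 2, 5])
>>> a.ravel(order='K')                # memory-order read: can stay a view
array([0, 1, 2, 3, 4, 5])
>>> a.ravel(order='K').flags['OWNDATA']
False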
- Sebastian > > >>> x = np.arange(10) > >>> y = x[::2] > >>> y.ravel().flags['OWNDATA'] > True > > > Interestingly, in the same case reshape(-1) returns a view: > > > >>> y.reshape(-1).flags['OWNDATA'] > False > > > (This suggests at least a documentation bug - numpy.ravel > documentation says that it is equivalent to reshape(-1).) > > > It is only in situations like this > > > >>> a = np.arange(16).reshape((4,4)) > >>> a[1::2,1::2].ravel() > array([ 5, 7, 13, 15]) > > > where flat view cannot be an ndarray, but .flat can still return > something that is at least duck-typing compatible with ndarray (if not > an ndarray subclass) and behaves as a view into original data. > > > My preferred design would be for x.flat to return a flat view into x. > This would be consistent with the way .T and .real attributes are > defined and close enough to .imag. An obvious way to obtain a flat > copy would be x.flat.copy(). Once we have this, ravel() and flatten() > can be deprecated and reshape(-1) discouraged. > > > I think this would be backward compatible except for rather > questionable situations like this: > > > >>> i = x.flat > >>> list(i) > [0, 1, 2, 3, 4, 0, 6, 7, 8, 9] > >>> list(i) > [] > >>> np.array(i) > array([0, 1, 2, 3, 4, 0, 6, 7, 8, 9]) > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Wed Oct 29 05:16:51 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Oct 2014 10:16:51 +0100 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: <544FDA76.3000508@gmail.com> References: <544FDA76.3000508@gmail.com> Message-ID: <1414574211.3903.9.camel@sebastian-t440> On Di, 2014-10-28 at 14:03 -0400, Alan G Isaac wrote: > On 10/28/2014 1:42 PM, Stephan Hoyer wrote: > > np.nditer is a reasonable alternative to .flat (and it's documented as such), but it's a rather inelegant, kitchen-sink type function. > > > I'm not sure what "reasonable" means here, > other than "in principle, possible to use". > > In particular, `flat` is much more elegant, > and includes an automatic guarantee that the > iterations will be in C-contiguous style. > I don't really like flat (it is a pretty old part of numpy), but I agree, while you can force nditer to be C-contiguous, nditer has its own problems and is also pretty complex. I would argue that nditer it is a "you know what you are doing" type of function... Or of course if you want to understand the C-Api. Unless you keep in mind how buffers are copied around inside it, using half its features is dangerous. - Sebastian > Alan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From andyfaff at gmail.com Wed Oct 29 05:39:14 2014 From: andyfaff at gmail.com (Andrew Nelson) Date: Wed, 29 Oct 2014 20:39:14 +1100 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication Message-ID: Dear list, I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to perform matrix multiplication of the 'NY' 2x2 matrices, resulting in the matrix B. B would have shape (NX, 2, 2). I believe that np.einsum would be up to the task, but I'm not quite sure of the subscripts I would need to achieve this. Can anyone help, please? cheers, Andrew. -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Wed Oct 29 05:48:05 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Wed, 29 Oct 2014 10:48:05 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: My point isn't about speed; its about the scope of numpy. typing np.fft.fft isn't more or less convenient than using some other symbol from the scientific python stack. Numerical algorithms should be part of the stack, for sure; but should they be part of numpy? I think its cleaner to have them in a separate package. Id rather have us discuss how to facilitate the integration of as many possible fft libraries with numpy behind a maximally uniform interface, rather than having us debate which fft library is 'best'. On Tue, Oct 28, 2014 at 6:21 PM, Sturla Molden wrote: > Eelco Hoogendoorn wrote: > > > Perhaps the 'batteries included' philosophy made sense in the early days > of > > numpy; but given that there are several fft libraries with their own pros > > and cons, and that most numpy projects will use none of them at all, why > > should numpy bundle any of them? > > Because sometimes we just need to compute a DFT, just like we sometimes > need to compute a sine or an exponential. It does that job perfectly well. > It is not always about speed. Just typing np.fft.fft(x) is convinient. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dave.hirschfeld at gmail.com Wed Oct 29 06:18:55 2014 From: dave.hirschfeld at gmail.com (Dave Hirschfeld) Date: Wed, 29 Oct 2014 10:18:55 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?help_using_np=2Eeinsum_for_stacked_m?= =?utf-8?q?atrix=09multiplication?= References: Message-ID: Andrew Nelson writes: > > Dear list,I have a 4D array, A, that has the shape (NX, NY, 2, 2).? I wish to perform matrix multiplication of the 'NY' 2x2 matrices, resulting in the matrix B.? B would have shape (NX, 2, 2).? I believe that np.einsum would be up to the task, but I'm not quite sure of the subscripts I would need to achieve this. > > Can anyone help, please? > > cheers, > Andrew. 
Sorry, I'm a little unclear on what is supposed to be multiplied with what. You've got (NX x NY) 2x2 matricies, how do you end up with NX 2x2 matricies? Perhaps if you show the code using loops with `np.dot` it will be clearer how to translate to using `np.einsum` -Dave From hoogendoorn.eelco at gmail.com Wed Oct 29 06:19:17 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Wed, 29 Oct 2014 11:19:17 +0100 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication In-Reply-To: References: Message-ID: You need to specify your input format. Also, if your output matrix misses the NY dimension, that implies you wish to contract (sum) over it, which contradicts your statement that the 2x2 subblocks form the matrices to multiply with. In general, I think it would help if you give a little more background on your problem. On Wed, Oct 29, 2014 at 10:39 AM, Andrew Nelson wrote: > Dear list, > I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to > perform matrix multiplication of the 'NY' 2x2 matrices, resulting in the > matrix B. B would have shape (NX, 2, 2). I believe that np.einsum would > be up to the task, but I'm not quite sure of the subscripts I would need to > achieve this. > > Can anyone help, please? > > cheers, > Andrew. > > -- > _____________________________________ > Dr. Andrew Nelson > > > _____________________________________ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Wed Oct 29 06:24:57 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Wed, 29 Oct 2014 11:24:57 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: On 29 October 2014 10:48, Eelco Hoogendoorn wrote: > My point isn't about speed; its about the scope of numpy. typing np.fft.fft > isn't more or less convenient than using some other symbol from the > scientific python stack. The problem is in distribution. For many users, installing a new library is not easy (computing cluster, company regulations...). And this assuming the alternative library is held to the same quality standards as Numpy. Another argument is that this should only be living in Scipy, that is, after all, quite standard; but it requires a FORTRAN compiler, that is quite a big dependency. /David. From cournape at gmail.com Wed Oct 29 06:33:08 2014 From: cournape at gmail.com (David Cournapeau) Date: Wed, 29 Oct 2014 10:33:08 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: On Wed, Oct 29, 2014 at 9:48 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > My point isn't about speed; its about the scope of numpy. typing > np.fft.fft isn't more or less convenient than using some other symbol from > the scientific python stack. 
> > Numerical algorithms should be part of the stack, for sure; but should > they be part of numpy? I think its cleaner to have them in a separate > package. Id rather have us discuss how to facilitate the integration of as > many possible fft libraries with numpy behind a maximally uniform > interface, rather than having us debate which fft library is 'best'. > I would agree if it were not already there, but removing it (like Blas/Lapack) is out of the question for backward compatibility reason. Too much code depends on it. David > > On Tue, Oct 28, 2014 at 6:21 PM, Sturla Molden > wrote: > >> Eelco Hoogendoorn wrote: >> >> > Perhaps the 'batteries included' philosophy made sense in the early >> days of >> > numpy; but given that there are several fft libraries with their own >> pros >> > and cons, and that most numpy projects will use none of them at all, why >> > should numpy bundle any of them? >> >> Because sometimes we just need to compute a DFT, just like we sometimes >> need to compute a sine or an exponential. It does that job perfectly well. >> It is not always about speed. Just typing np.fft.fft(x) is convinient. >> >> Sturla >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Oct 29 07:12:16 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Oct 2014 11:12:16 +0000 Subject: [Numpy-discussion] Deprecate pkgload, PackageLoader In-Reply-To: References: Message-ID: On Wed, Oct 29, 2014 at 4:30 AM, Charles R Harris wrote: > Hi All, > > It is proposed to deprecate, then remove, pkgload and PackageLoader. > > Complaints? Cries of Anguish? Tears of joy. -- Robert Kern From sebastian at sipsolutions.net Wed Oct 29 08:05:52 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Oct 2014 13:05:52 +0100 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication In-Reply-To: References: Message-ID: <1414584352.4342.3.camel@sebastian-t440> On Mi, 2014-10-29 at 20:39 +1100, Andrew Nelson wrote: > Dear list, > I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to > perform matrix multiplication of the 'NY' 2x2 matrices, resulting in > the matrix B. B would have shape (NX, 2, 2). I believe that > np.einsum would be up to the task, but I'm not quite sure of the > subscripts I would need to achieve this. > Just remember that you sum over the columns of the first matrix and the rows of the second so those share the index: np.einsum('...ix, ...xj->...', a, b) in the future, the np.dot predecessor (whatever it is) or the @ operator should be better at it though. - Sebastian > Can anyone help, please? > > > cheers, > Andrew. > > > -- > _____________________________________ > Dr. Andrew Nelson > > > _____________________________________ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Wed Oct 29 08:09:38 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Oct 2014 13:09:38 +0100 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication In-Reply-To: <1414584352.4342.3.camel@sebastian-t440> References: <1414584352.4342.3.camel@sebastian-t440> Message-ID: <1414584578.5904.1.camel@sebastian-t440> On Mi, 2014-10-29 at 13:05 +0100, Sebastian Berg wrote: > On Mi, 2014-10-29 at 20:39 +1100, Andrew Nelson wrote: > > Dear list, > > I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to > > perform matrix multiplication of the 'NY' 2x2 matrices, resulting in > > the matrix B. B would have shape (NX, 2, 2). I believe that > > np.einsum would be up to the task, but I'm not quite sure of the > > subscripts I would need to achieve this. > > > > Just remember that you sum over the columns of the first matrix and the > rows of the second so those share the index: > > np.einsum('...ix, ...xj->...', a, b) > Nevermind, didn't read carefully. This is not possible since the reduction operation is a sum, you will have to do at least two operations. > in the future, the np.dot predecessor (whatever it is) or the @ operator > should be better at it though. > > - Sebastian > > > Can anyone help, please? > > > > > > cheers, > > Andrew. > > > > > > -- > > _____________________________________ > > Dr. Andrew Nelson > > > > > > _____________________________________ > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ben.root at ou.edu Wed Oct 29 09:39:42 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 29 Oct 2014 09:39:42 -0400 Subject: [Numpy-discussion] Deprecate pkgload, PackageLoader In-Reply-To: References: Message-ID: /me looks at pydoc numpy.pkgload What in the world?! On Wed, Oct 29, 2014 at 7:12 AM, Robert Kern wrote: > On Wed, Oct 29, 2014 at 4:30 AM, Charles R Harris > wrote: > > Hi All, > > > > It is proposed to deprecate, then remove, pkgload and PackageLoader. > > > > Complaints? Cries of Anguish? > > Tears of joy. > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan.otte at gmail.com Wed Oct 29 10:59:30 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Wed, 29 Oct 2014 15:59:30 +0100 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: Hey, there are several ways how to proceed. - My proposed solution covers the 80% case quite well (at least I use it all the time). I'd convert the doctests into unittests and we're done. 
- We could slightly change the interface to leave out the surrounding square brackets, i.e. turning `stack([[a, b], [c, d]])` into `stack([a, b], [c, d])` - We could extend it even further allowing a "filler value" for non set values and a "shape" argument. This could be done later as well. - `bmat` is not really matrix specific. We could refactor `bmat` a bit to use the same logic in `stack`. Except the `matrix` calls `bmat` and `_from_string` are pretty agnostic to the input. I'm in favor of the first or last approach. The first: because it already works and is quite simple. The last: because the logic and tests of both `bmat` and `stack` would be the same and the feature to specify a string representation of the block matrix is nice. Best, Stefan On Tue, Oct 28, 2014 at 7:46 PM, Nathaniel Smith wrote: > On 28 Oct 2014 18:34, "Stefan Otte" wrote: >> >> Hey, >> >> In the last weeks I tested `np.asarray(np.bmat(....))` as `stack` >> function and it works quite well. So the question persits: If `bmat` >> already offers something like `stack` should we even bother >> implementing `stack`? More code leads to more >> bugs and maintenance work. (However, the current implementation is >> only 5 lines and by using `bmat` which would reduce that even more.) > > In the long run we're trying to reduce usage of np.matrix and ideally > deprecate it entirely. So yes, providing ndarray equivalents of matrix > functionality (like bmat) is valuable. > > -n > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From noel.pierre.andre at gmail.com Wed Oct 29 13:03:20 2014 From: noel.pierre.andre at gmail.com (Pierre-Andre Noel) Date: Wed, 29 Oct 2014 10:03:20 -0700 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: <54511DD8.5020806@gmail.com> >> Id rather have us discuss how to facilitate the integration of as many possible fft libraries with numpy behind a maximally uniform interface, rather than having us debate which fft library is 'best'. I agree with the above. > I would agree if it were not already there, but removing it (like Blas/Lapack) is out of the question for backward compatibility reason. Too much code depends on it. And I definitely agree with that too. I think that numpy.fft should be left there in its current state (although perhaps as deprecated). Now scipy.fft should have a good generic algorithm as default, and easily allow for different implementations to be accessed through the same interface. Pierre-Andr? On 10/29/2014 03:33 AM, David Cournapeau wrote: > > > On Wed, Oct 29, 2014 at 9:48 AM, Eelco Hoogendoorn > > wrote: > > My point isn't about speed; its about the scope of numpy. typing > np.fft.fft isn't more or less convenient than using some other > symbol from the scientific python stack. > > Numerical algorithms should be part of the stack, for sure; but > should they be part of numpy? I think its cleaner to have them in > a separate package. Id rather have us discuss how to facilitate > the integration of as many possible fft libraries with numpy > behind a maximally uniform interface, rather than having us debate > which fft library is 'best'. 
> > > I would agree if it were not already there, but removing it (like > Blas/Lapack) is out of the question for backward compatibility reason. > Too much code depends on it. > > David > > > On Tue, Oct 28, 2014 at 6:21 PM, Sturla Molden > > wrote: > > Eelco Hoogendoorn > wrote: > > > Perhaps the 'batteries included' philosophy made sense in > the early days of > > numpy; but given that there are several fft libraries with > their own pros > > and cons, and that most numpy projects will use none of them > at all, why > > should numpy bundle any of them? > > Because sometimes we just need to compute a DFT, just like we > sometimes > need to compute a sine or an exponential. It does that job > perfectly well. > It is not always about speed. Just typing np.fft.fft(x) is > convinient. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Wed Oct 29 13:07:42 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Wed, 29 Oct 2014 13:07:42 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Tue, Oct 28, 2014 at 10:11 PM, Nathaniel Smith wrote: > > I don't think so - I think all the heavy lifting is already done in > flatiter. The missing parts are mostly trivial things like .size or .shape > or can be fudged by coercing to true ndarray using existing > flatiter.__array__ method. > > Now try .resize()... > Simple: def resize(self, shape): if self.shape == shape: return else: raise ValueError >From ndarray.resize documentation: Raises ------ ValueError If `a` does not own its own data or references or views to it exist, and the data memory must be changed. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Wed Oct 29 13:09:39 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Wed, 29 Oct 2014 13:09:39 -0400 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: References: Message-ID: On Tue, Oct 28, 2014 at 10:11 PM, Nathaniel Smith wrote: > .diagonal has no magic, it just turns out that the diagonal of any strided > array is also expressible as a strided array. (Specifically, new_strides = > (sum(old_strides),).) This is genius! Once you mentioned this, it is obvious how the new diagonal() works and one can only wonder why it took over 20 years to get this feature in NumPy. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Wed Oct 29 13:34:13 2014 From: shoyer at gmail.com (Stephan Hoyer) Date: Wed, 29 Oct 2014 10:34:13 -0700 Subject: [Numpy-discussion] Why ndarray provides four ways to flatten? In-Reply-To: <1414574211.3903.9.camel@sebastian-t440> References: <544FDA76.3000508@gmail.com> <1414574211.3903.9.camel@sebastian-t440> Message-ID: On Wed, Oct 29, 2014 at 2:16 AM, Sebastian Berg wrote: > On Di, 2014-10-28 at 14:03 -0400, Alan G Isaac wrote: > I don't really like flat (it is a pretty old part of numpy), but I > agree, while you can force nditer to be C-contiguous, nditer has its own > problems and is also pretty complex. 
I would argue that nditer it is a > "you know what you are doing" type of function... Or of course if you > want to understand the C-Api. > Unless you keep in mind how buffers are copied around inside it, using > half its features is dangerous. > One other subtle way in which nditer and falt differ is that nditer returns 0-dimensional arrays, not scalars. It turns out there is actually no way in numpy to iterate in the fastest possible order ('K') over an N-dimensional array in Python without the overhead of creating 0-dimensional arrays. This came up recently an as issue for numba: https://github.com/numba/numba/issues/841 So although this discussion has mostly been about consolidating options, this is at least one feature that we might like to add. Honestly, this is likely only to be an issue for projects like Numba that take valid python/numpy code and compile it. If you're writing C/Cython, you can just use the NumPy C API. Cheers, Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Oct 29 14:05:46 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 29 Oct 2014 19:05:46 +0100 Subject: [Numpy-discussion] Deprecate pkgload, PackageLoader In-Reply-To: References: Message-ID: <54512C7A.4030807@googlemail.com> On 29.10.2014 05:30, Charles R Harris wrote: > Hi All, > > It is proposed to deprecate, then remove, pkgload and PackageLoader. > > Complaints? Cries of Anguish? > I don't mind the deprecation, but I have to ask why? is it causing issues? it does look like something some people use in their workflows. From alex.eberspaecher at gmail.com Wed Oct 29 14:23:50 2014 From: alex.eberspaecher at gmail.com (=?ISO-8859-15?Q?Alexander_Ebersp=E4cher?=) Date: Wed, 29 Oct 2014 19:23:50 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <54511DD8.5020806@gmail.com> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <54511DD8.5020806@gmail.com> Message-ID: <545130B6.4010308@gmail.com> On 29.10.2014 18:03, Pierre-Andre Noel wrote: >>> Id rather have us discuss how to facilitate the integration of as > many possible fft libraries with numpy behind a maximally uniform > interface, rather than having us debate which fft library is 'best'. > > I agree with the above. Absolutely. I think the NumPy/Scipy interfaces are easy and convenient enough to serve as a front-end to different FFT codes. >> I would agree if it were not already there, but removing it (like > Blas/Lapack) is out of the question for backward compatibility reason. > Too much code depends on it. > > And I definitely agree with that too. > > I think that numpy.fft should be left there in its current state > (although perhaps as deprecated). Now scipy.fft should have a good > generic algorithm as default, and easily allow for different > implementations to be accessed through the same interface. Definitely. My attempt at streamlining the use of pyfftw even further can be found here: https://github.com/aeberspaecher/transparent_pyfftw This small package does nothing more than to automatically load fftw wisdom on export and add a keyword that gives the number of threads to use to NumPy/Scipy style FFT calls. I think similar attempts could be made with other FFT libraries. 
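Just to sketch what such a mapping could look like (a hypothetical wrapper, not the actual transparent_pyfftw code; the threads/planner_effort keywords are the extra ones pyfftw's drop-in interfaces accept, and it falls back to plain numpy.fft when pyfftw is not installed):

import numpy as np

try:
    import pyfftw.interfaces.numpy_fft as _backend
    _extra = dict(threads=4, planner_effort='FFTW_MEASURE')
except ImportError:
    _backend = np.fft   # bundled fftpack fallback
    _extra = {}

def fft(a, n=None, axis=-1):
    # NumPy-style signature; implementation-specific tuning stays hidden here
    return _backend.fft(a, n=n, axis=axis, **_extra)

The caller only ever sees the familiar fft(a, n, axis) signature, whichever library does the work.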
The mission statement would be to map each library's interface to the simple and convenient SciPy/NumPy-style interface, and also to wisely choose default parameters (such as e.g. pyfftw's planner_effort). Also, I think it's obvious that a generic and easy-to-use implementation cannot deliver exactly the same performance as hand-tuned code, but anyway I see plenty room for improvement for SciPy/NumPy's FFT modules. Alex From robert.kern at gmail.com Wed Oct 29 14:33:12 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Oct 2014 18:33:12 +0000 Subject: [Numpy-discussion] Deprecate pkgload, PackageLoader In-Reply-To: <54512C7A.4030807@googlemail.com> References: <54512C7A.4030807@googlemail.com> Message-ID: On Wed, Oct 29, 2014 at 6:05 PM, Julian Taylor wrote: > On 29.10.2014 05:30, Charles R Harris wrote: >> Hi All, >> >> It is proposed to deprecate, then remove, pkgload and PackageLoader. >> >> Complaints? Cries of Anguish? > > I don't mind the deprecation, but I have to ask why? is it causing issues? > it does look like something some people use in their workflows. Why do you think so? It's old code that was used in scipy/__init__.py, but we removed it from there years ago. It has been unofficially deprecated (and widely considered a bad idea) for years. The only modifications to it in the past 6 years or so have been repo-wide cleanups, Python 3 compatibility and the like, so it *is* adding to the maintenance burden for those tasks. Do you know of anyone that is actually currently using it? -- Robert Kern From heng at cantab.net Wed Oct 29 14:40:38 2014 From: heng at cantab.net (Henry Gomersall) Date: Wed, 29 Oct 2014 18:40:38 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <545130B6.4010308@gmail.com> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <54511DD8.5020806@gmail.com> <545130B6.4010308@gmail.com> Message-ID: <545134A6.4050307@cantab.net> On 29/10/14 18:23, Alexander Ebersp?cher wrote: > Definitely. My attempt at streamlining the use of pyfftw even further > can be found here: > > https://github.com/aeberspaecher/transparent_pyfftw There could be an argument that this sort of capability should be added to the pyfftw package, as a package level config. Something like: import pyfftw pyfftw.default_threads = 4 import pyfftw.interfaces.numpy_fft as fft The wisdom code can be added at the package import level too, but that doesn't need anything more. Cheers, Henry From alex.eberspaecher at gmail.com Wed Oct 29 14:59:51 2014 From: alex.eberspaecher at gmail.com (=?ISO-8859-15?Q?Alexander_Ebersp=E4cher?=) Date: Wed, 29 Oct 2014 19:59:51 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <545134A6.4050307@cantab.net> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <54511DD8.5020806@gmail.com> <545130B6.4010308@gmail.com> <545134A6.4050307@cantab.net> Message-ID: <54513927.9090500@gmail.com> On 29.10.2014 19:40, Henry Gomersall wrote: > There could be an argument that this sort of capability should be added > to the pyfftw package, as a package level config. 
> > Something like: > > import pyfftw > pyfftw.default_threads = 4 I think that would be great, though probably slightly off-topic here. > import pyfftw.interfaces.numpy_fft as fft > > The wisdom code can be added at the package import level too, but that > doesn't need anything more. If NumPy/SciPy provided interfaces to different FFT implementations, implementation specific routines (e.g. wisdom load/save or creation of byte-aligned empty arrays in the pyfftw case) could be made available through a subpackage, e.g. np.fft.implementation_specific. That subpackage then exposed routines specific to the implementation that lives below the simple interfaces. For implementation-specific configuration, perhaps a user-level configuration file or set of environment variables could be read on import of the specific implementation. At the very heart of allowing NumPy to use different FFT implementations could be a definition of an intermediate layer, much like LAPACK is for linear algebra. This probably would have to happen at the C-level. I'm only wildly speculating here as I don't have enough experience with interfaces to different FFT libraries, so I don't know whether the individual interfaces are close enough to be able to define a suitable "common interface". Alex From andyfaff at gmail.com Wed Oct 29 16:27:01 2014 From: andyfaff at gmail.com (Andrew Nelson) Date: Thu, 30 Oct 2014 07:27:01 +1100 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication Message-ID: On Wed, Oct 29, 2014 at 10:39 AM, Andrew Nelson wrote: > Dear list, > I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to > perform matrix multiplication of the 'NY' 2x2 matrices, resulting in the > matrix B. B would have shape (NX, 2, 2). I believe that np.einsum would > be up to the task, but I'm not quite sure of the subscripts I would need to > achieve this. Ok, I'll try to explain in more detail of what I'm trying to do (I'm not skilled in matrix algebra). Say I have a series of matrices, M, which are all 2x2: M_0, M_1, ..., M_{NY - 1}. These all need to be multiplied by each other. i.e. N = M_0 x M_1 x ... x M_{NY - 1}. Note that I want to multiply M_0 by M_1, the result of that by M_2, the result of that by M_3 and so on. I can hold the (NY) matrices in a single array that has shape (NY, 2, 2). The first row in that array would be M_0, the last would be M_{NY-1}. The output of all that matrix multiplication would be a single 2x2 matrix. So I would've thought an operation would do something like this: #there are NY-1 matrix multiplications involved here. M[NY, 2, 2] -----> N[2, 2] Now let's make the next level of complication, I have NX of those M[NY, 2, 2] matrices. So I need to do the above matrix multiplication series NX times. I could hold all this in an array, P, with shape (NX, NY, 2, 2). Each of the NX rows are independent. Currently I am doing this in a nested loop, pseudocode follows: output = np.zeros((NX, 2, 2)) for i in range(NX): temp = np.identity(2) for j in range(NY): temp = np.dot(temp, P[i, j]) output[i] = temp My original question was posted as I would like to remove that doubly nested loop, with something more elegant, as well as a whole load faster. Is there an np.einsum that can furnish that? (Please forgive me if this still isn't clear or precise enough). -- _____________________________________ Dr. Andrew Nelson _____________________________________ -------------- next part -------------- An HTML attachment was scrubbed... 
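For what it's worth, the inner np.dot above can be batched over NX with einsum, leaving only the loop over NY. A sketch, with random data standing in for P; the 'nij,njk->nik' subscripts are a batched 2x2 matrix product:

import numpy as np

NX, NY = 4, 5
P = np.random.rand(NX, NY, 2, 2)

# reference: the doubly nested loop from the post above
expected = np.zeros((NX, 2, 2))
for i in range(NX):
    temp = np.identity(2)
    for j in range(NY):
        temp = np.dot(temp, P[i, j])
    expected[i] = temp

# batched version: einsum multiplies all NX pairs of 2x2 matrices at once
B = P[:, 0].copy()
for j in range(1, NY):
    B = np.einsum('nij,njk->nik', B, P[:, j])

print(np.allclose(B, expected))   # True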
URL: From ndarray at mac.com Wed Oct 29 16:38:20 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Wed, 29 Oct 2014 16:38:20 -0400 Subject: [Numpy-discussion] help using np.einsum for stacked matrix multiplication In-Reply-To: References: Message-ID: On Wed, Oct 29, 2014 at 5:39 AM, Andrew Nelson wrote: > I have a 4D array, A, that has the shape (NX, NY, 2, 2). I wish to > perform matrix multiplication of the 'NY' 2x2 matrices, resulting in the > matrix B. B would have shape (NX, 2, 2). What you are looking for is dot.reduce and NumPy does not implement that. You can save an explicit loop by doing reduce(np.dot, matrices). For example In [6] A Out[6] array([[[ 1., 0.], [ 0., 1.]], [[ 2., 0.], [ 0., 2.]], [[ 3., 0.], [ 0., 3.]]]) In [7] reduce(np.dot, A) Out[7] array([[ 6., 0.], [ 0., 6.]]) -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Oct 29 22:15:51 2014 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 29 Oct 2014 19:15:51 -0700 Subject: [Numpy-discussion] [ANN - JOB] Assistant Researcher - Berkeley Institute for Data Science Message-ID: Hi all, the newly founded Berkeley Institute for Data Science is hiring researchers with a focus on open source tools for scientific computing, please see here for details: https://aprecruit.berkeley.edu/apply/JPF00590 Cheers, f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Oct 29 23:58:31 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 30 Oct 2014 04:58:31 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: On 29/10/14 10:48, Eelco Hoogendoorn wrote: > Id rather have us discuss how to facilitate the integration of > as many possible fft libraries with numpy behind a maximally uniform > interface, rather than having us debate which fft library is 'best'. I am happy with the NumPy interface. There are minor differences between the SciPy and NumPy FFT interfaces (e.g. for rfft, see below). Personally I prefer the NumPy interface because it makes it easier to map Fourier coeffs to frequencies. One thing we could do, without too much hassle, is to use FFTs from MKL or Accelerate (vDSP) if we link with these libararies for BLAS/LAPACK. MKL has an API compatible with FFTW, so FFTW and MKL can be supported with the same C code. FFTW and MKL also have a Fortran 77 API which we could wrap with f2py (no Fortran compiler are needed). It is actually possible to use the FFTs in FFTW and MKL from Python without any C coding at all. We just need to add a Python interface on top of the f2py wrappers, which is similar to what we do for scipy.linalg. The FFTs in Accelerate have a very simple C interface, but only support power-of-two array sizes, so we would need to use them with Bluestein's algorithm. Again, because of their simplicity, it is possible to wrap these FFT functions with f2py. 
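To make the Bluestein part concrete: a power-of-two-only FFT can still deliver arbitrary-length transforms via the chirp-z trick. A minimal pure-NumPy sketch (np.fft stands in here for the hypothetical power-of-two backend):

import numpy as np

def bluestein_dft(x):
    # arbitrary-length DFT built from power-of-two FFTs (chirp-z / Bluestein)
    x = np.asarray(x, dtype=complex)
    n = x.shape[0]
    if n <= 1:
        return x.copy()
    k = np.arange(n)
    w = np.exp(-1j * np.pi * k * k / n)     # chirp
    m = 1 << (2 * n - 1).bit_length()       # power of two >= 2n-1
    a = np.zeros(m, complex)
    a[:n] = x * w
    b = np.zeros(m, complex)
    b[:n] = w.conj()
    b[-(n - 1):] = w[:0:-1].conj()          # wrap the negative indices around
    conv = np.fft.ifft(np.fft.fft(a) * np.fft.fft(b))
    return w * conv[:n]

x = np.random.rand(12) + 1j * np.random.rand(12)      # length 12, not a power of two
print(np.allclose(bluestein_dft(x), np.fft.fft(x)))   # True

The padded length only needs to reach 2*N-1, so the cost stays O(N log N) even for awkward (e.g. prime) lengths.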
We cannot bundle NumPy or SciPy binaries with FFTW due to GPL [*], but as I understand it we already have permission from Intel to bundle binary wheels linked with MKL. Accelerate is a system library, so that does not pose a license problem. [*] Actually, we could, but the binaries would be tainted with a viral license. >>> a = np.random.rand(8) >>> scipy.fftpack.rfft(a)[:,None] array([[ 3.47756851], [-0.45869926], [-0.21730867], [-0.43763425], [-0.67338213], [-0.28799 ], [ 0.17321793], [-0.31514119]]) >>> np.fft.rfft(a)[:,None] array([[ 3.47756851+0.j ], [-0.45869926-0.21730867j], [-0.43763425-0.67338213j], [-0.28799000+0.17321793j], [-0.31514119+0.j ]]) Sturla From njs at pobox.com Thu Oct 30 02:38:39 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 30 Oct 2014 06:38:39 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: On 30 Oct 2014 03:58, "Sturla Molden" wrote: [...] > We cannot bundle NumPy or SciPy binaries with FFTW due to GPL [*], but > as I understand it we already have permission from Intel to bundle > binary wheels linked with MKL. Accelerate is a system library, so that > does not pose a license problem. > > [*] Actually, we could, but the binaries would be tainted with a viral > license. And binaries linked with MKL are tainted by a proprietary license... They have very similar effects, in both cases you can use the binary just fine for whatever you want, but if you redistribute it as part of a larger work, then you must (follow the terms of the GPL/follow the terms of Intel's license). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From heng at cantab.net Thu Oct 30 04:48:11 2014 From: heng at cantab.net (Henry Gomersall) Date: Thu, 30 Oct 2014 08:48:11 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: <5451FB4B.7030301@cantab.net> On 30/10/14 03:58, Sturla Molden wrote: > MKL has an API compatible with FFTW, so FFTW and MKL can be supported > with the same C code. Compatible with big caveats: https://software.intel.com/en-us/node/522278 Henry From sturla.molden at gmail.com Thu Oct 30 07:11:21 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 30 Oct 2014 11:11:21 +0000 (UTC) Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> Message-ID: <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: >> [*] Actually, we could, but the binaries would be tainted with a viral >> license. > > And binaries linked with MKL are tainted by a proprietary license... They > have very similar effects, The MKL license is proprietary but not viral. 
Sturla From robert.kern at gmail.com Thu Oct 30 07:27:53 2014 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 30 Oct 2014 11:27:53 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Oct 30, 2014 at 11:11 AM, Sturla Molden wrote: > Nathaniel Smith wrote: > >>> [*] Actually, we could, but the binaries would be tainted with a viral >>> license. >> >> And binaries linked with MKL are tainted by a proprietary license... They >> have very similar effects, > > The MKL license is proprietary but not viral. For our purposes, it has the same effect, though. As a project policy, we only want to put out official binaries that can be used in both proprietary and GPLed projects. Since proprietary licenses and GPL licenses are mutually incompatible, we cannot use components that are either proprietary or GPLed in our official binaries. -- Robert Kern From njs at pobox.com Thu Oct 30 07:28:22 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 30 Oct 2014 11:28:22 +0000 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> Message-ID: On 30 Oct 2014 11:12, "Sturla Molden" wrote: > > Nathaniel Smith wrote: > > >> [*] Actually, we could, but the binaries would be tainted with a viral > >> license. > > > > And binaries linked with MKL are tainted by a proprietary license... They > > have very similar effects, > > The MKL license is proprietary but not viral. If you like, but I think you are getting confused by the vividness of anti-GPL rhetoric. GPL and proprietary software are identical in that you have to pay some price if you want to legally redistribute derivative works (e.g. numpy + MKL/FFTW + other software). For proprietary software the price is money and other random more or less onerous conditions (e.g. anti-benchmarking and anti-reverse-engineering clauses are common). For GPL software the price is that you have to let people reuse your source code for free. That's literally all that "viral" means. Which of these prices you find more affordable will depend on your circumstances. Either way it's just something to take into account before redistributing "tainted" binaries. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nilsc.becker at gmail.com Thu Oct 30 08:34:04 2014 From: nilsc.becker at gmail.com (Nils Becker) Date: Thu, 30 Oct 2014 13:34:04 +0100 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: <54511DD8.5020806@gmail.com> References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <54511DD8.5020806@gmail.com> Message-ID: > I think that numpy.fft should be left there in its current state (although perhaps as deprecated). Now scipy.fft should have a good generic algorithm as default, and easily allow for different implementations to be accessed through the same interface. I also agree with the above. But I want to add that in this case it would be wise to include some (sophisticated) testing suite to ensure that all possible libraries implement the DFT with high accuracy. numpy's fftpack (or scipy's) has the advantage that it is old and well tested. FFTW also provides extensive benchmarks of speed and accuracy. Other libraries do not. Most users just want an fft function that works and not bother with details like numerical accuracy. When I encountered such an issue (https://github.com/hgomersall/pyFFTW/issues/51) it took me really a long time to track it down to the fft function. One remark to FFTS: does it implement double precision yet? The corresponding issue (https://github.com/anthonix/ffts/issues/24) seems to be still open but I did not look into it. If it does not it is not suited as a fftpack replacement (I hope I did not overlook some comment about that in the thread). Cheers Nils PS: although I am a long time user of numpy, I am fairly new to the list. So hello! -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Oct 30 13:24:38 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 30 Oct 2014 10:24:38 -0700 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Oct 30, 2014 at 4:28 AM, Nathaniel Smith wrote: > On 30 Oct 2014 11:12, "Sturla Molden" wrote: >> >> Nathaniel Smith wrote: >> >> >> [*] Actually, we could, but the binaries would be tainted with a viral >> >> license. >> > >> > And binaries linked with MKL are tainted by a proprietary license... >> > They >> > have very similar effects, >> >> The MKL license is proprietary but not viral. > > If you like, but I think you are getting confused by the vividness of > anti-GPL rhetoric. GPL and proprietary software are identical in that you > have to pay some price if you want to legally redistribute derivative works > (e.g. numpy + MKL/FFTW + other software). For proprietary software the price > is money and other random more or less onerous conditions (e.g. > anti-benchmarking and anti-reverse-engineering clauses are common). For GPL > software the price is that you have to let people reuse your source code for > free. That's literally all that "viral" means. 
I wrote a summary of the MKL license problems here: https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows#blas--lapack-libraries In summary, if you distribute something with the MKL you have to: * require your users to agree to a license forbidding them from reverse-engineering the MKL * indemnify Intel against being sued as a result of using MKL in your binaries I think the users are not allowed to further distribute any part of the MKL libraries, but I am happy to be corrected on that. Cheers, Matthew From matthew.brett at gmail.com Thu Oct 30 13:56:41 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 30 Oct 2014 10:56:41 -0700 Subject: [Numpy-discussion] FFTS for numpy's FFTs (was: Re: Choosing between NumPy and SciPy functions) In-Reply-To: References: <20141028083219.2a0aabef1403a70c6ae107f4@esrf.fr> <1865975564436183038.983522sturla.molden-gmail.com@news.gmane.org> <1072140078436209472.929185sturla.molden-gmail.com@news.gmane.org> <1859492949436360087.493538sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Oct 30, 2014 at 10:24 AM, Matthew Brett wrote: > On Thu, Oct 30, 2014 at 4:28 AM, Nathaniel Smith wrote: >> On 30 Oct 2014 11:12, "Sturla Molden" wrote: >>> >>> Nathaniel Smith wrote: >>> >>> >> [*] Actually, we could, but the binaries would be tainted with a viral >>> >> license. >>> > >>> > And binaries linked with MKL are tainted by a proprietary license... >>> > They >>> > have very similar effects, >>> >>> The MKL license is proprietary but not viral. >> >> If you like, but I think you are getting confused by the vividness of >> anti-GPL rhetoric. GPL and proprietary software are identical in that you >> have to pay some price if you want to legally redistribute derivative works >> (e.g. numpy + MKL/FFTW + other software). For proprietary software the price >> is money and other random more or less onerous conditions (e.g. >> anti-benchmarking and anti-reverse-engineering clauses are common). For GPL >> software the price is that you have to let people reuse your source code for >> free. That's literally all that "viral" means. > > I wrote a summary of the MKL license problems here: > > https://github.com/numpy/numpy/wiki/Numerical-software-on-Windows#blas--lapack-libraries > > In summary, if you distribute something with the MKL you have to: > > * require your users to agree to a license forbidding them from > reverse-engineering the MKL > * indemnify Intel against being sued as a result of using MKL in your binaries Sorry - I should point out that this last 'indemnify' clause is "including attorney's fees". Meaning that, if someone sues Intel because of your software, you have to pay Intel's attorney's fees. 
Matthew From stefan.otte at gmail.com Fri Oct 31 09:13:04 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Fri, 31 Oct 2014 14:13:04 +0100 Subject: [Numpy-discussion] Generalize hstack/vstack --> stack; Block matrices like in matlab In-Reply-To: References: <101656916431878296.890307sturla.molden-gmail.com@news.gmane.org> Message-ID: To make the last point more concrete the implementation could look something like this (note that I didn't test it and that it still takes some work): def bmat(obj, ldict=None, gdict=None): return matrix(stack(obj, ldict, gdict)) def stack(obj, ldict=None, gdict=None): # the old bmat code minus the matrix calls if isinstance(obj, str): if gdict is None: # get previous frame frame = sys._getframe().f_back glob_dict = frame.f_globals loc_dict = frame.f_locals else: glob_dict = gdict loc_dict = ldict return _from_string(obj, glob_dict, loc_dict) if isinstance(obj, (tuple, list)): # [[A,B],[C,D]] arr_rows = [] for row in obj: if isinstance(row, N.ndarray): # not 2-d return concatenate(obj, axis=-1) else: arr_rows.append(concatenate(row, axis=-1)) return concatenate(arr_rows, axis=0) if isinstance(obj, N.ndarray): return obj I basically turned the old `bmat` into `stack` and removed the matrix calls. Best, Stefan On Wed, Oct 29, 2014 at 3:59 PM, Stefan Otte wrote: > Hey, > > there are several ways how to proceed. > > - My proposed solution covers the 80% case quite well (at least I use > it all the time). I'd convert the doctests into unittests and we're > done. > > - We could slightly change the interface to leave out the surrounding > square brackets, i.e. turning `stack([[a, b], [c, d]])` into > `stack([a, b], [c, d])` > > - We could extend it even further allowing a "filler value" for non > set values and a "shape" argument. This could be done later as well. > > - `bmat` is not really matrix specific. We could refactor `bmat` a bit > to use the same logic in `stack`. Except the `matrix` calls `bmat` and > `_from_string` are pretty agnostic to the input. > > I'm in favor of the first or last approach. The first: because it > already works and is quite simple. The last: because the logic and > tests of both `bmat` and `stack` would be the same and the feature to > specify a string representation of the block matrix is nice. > > > Best, > Stefan > > > > On Tue, Oct 28, 2014 at 7:46 PM, Nathaniel Smith wrote: >> On 28 Oct 2014 18:34, "Stefan Otte" wrote: >>> >>> Hey, >>> >>> In the last weeks I tested `np.asarray(np.bmat(....))` as `stack` >>> function and it works quite well. So the question persits: If `bmat` >>> already offers something like `stack` should we even bother >>> implementing `stack`? More code leads to more >>> bugs and maintenance work. (However, the current implementation is >>> only 5 lines and by using `bmat` which would reduce that even more.) >> >> In the long run we're trying to reduce usage of np.matrix and ideally >> deprecate it entirely. So yes, providing ndarray equivalents of matrix >> functionality (like bmat) is valuable. >> >> -n >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From dmmcf at dmmcf.net Fri Oct 31 10:26:36 2014 From: dmmcf at dmmcf.net (D. 
Michael McFarland) Date: Fri, 31 Oct 2014 09:26:36 -0500 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <87r3xscyfr.fsf@sun.ac.za> (Stefan van der Walt's message of "Tue, 28 Oct 2014 11:53:44 +0200") References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> Message-ID: <8761f0wc0z.fsf@dmmcf.net> Stefan van der Walt writes: > On 2014-10-27 15:26:58, D. Michael McFarland wrote: >> What I would like to ask about is the situation this illustrates, where >> both NumPy and SciPy provide similar functionality (sometimes identical, >> to judge by the documentation). Is there some guidance on which is to >> be preferred? > > I'm not sure if you've received an answer to your question so far. My > advice: use the SciPy functions. SciPy is often built on more extensive > Fortran libraries not available during NumPy compilation, and I am not > aware of any cases where a function in NumPy is faster or more extensive > than the equivalent in SciPy. The whole thread has been interesting reading (now that I've finally come back to it...got busy for a few days), but this is the sort of answer I was hoping for. Thank you. Best, Michael From ben.root at ou.edu Fri Oct 31 11:07:03 2014 From: ben.root at ou.edu (Benjamin Root) Date: Fri, 31 Oct 2014 11:07:03 -0400 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: <8761f0wc0z.fsf@dmmcf.net> References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> <8761f0wc0z.fsf@dmmcf.net> Message-ID: Just to throw in my two cents here. I feel that sometimes, features are tried out first elsewhere (possibly in scipy) and then brought down into numpy after sufficient shakedown time. So, in some cases, I wonder if the numpy version is actually more refined than the scipy version? Of course, there is no way to know this from the documentation, which is a problem. Didn't scipy have nanmean() for a while before Numpy added it in version 1.8? Ben Root On Fri, Oct 31, 2014 at 10:26 AM, D. Michael McFarland wrote: > Stefan van der Walt writes: > > > On 2014-10-27 15:26:58, D. Michael McFarland wrote: > >> What I would like to ask about is the situation this illustrates, where > >> both NumPy and SciPy provide similar functionality (sometimes identical, > >> to judge by the documentation). Is there some guidance on which is to > >> be preferred? > > > > I'm not sure if you've received an answer to your question so far. My > > advice: use the SciPy functions. SciPy is often built on more extensive > > Fortran libraries not available during NumPy compilation, and I am not > > aware of any cases where a function in NumPy is faster or more extensive > > than the equivalent in SciPy. > > The whole thread has been interesting reading (now that I've finally > come back to it...got busy for a few days), but this is the sort of > answer I was hoping for. Thank you. > > Best, > Michael > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Oct 31 11:21:36 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 31 Oct 2014 11:21:36 -0400 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> <8761f0wc0z.fsf@dmmcf.net> Message-ID: On Fri, Oct 31, 2014 at 11:07 AM, Benjamin Root wrote: > Just to throw in my two cents here. I feel that sometimes, features are > tried out first elsewhere (possibly in scipy) and then brought down into > numpy after sufficient shakedown time. So, in some cases, I wonder if the > numpy version is actually more refined than the scipy version? Of course, > there is no way to know this from the documentation, which is a problem. > Didn't scipy have nanmean() for a while before Numpy added it in version > 1.8? > That's true for several functions in scipy.stats. And we have more deprecation in scipy.stats in favor of numpy pending. part of polynomials is another case, kind of. But I don't remember any other ones in my time. (There is also a reverse extension for scipy binned_stats based on the np.histogram code.) Josef > > Ben Root > > On Fri, Oct 31, 2014 at 10:26 AM, D. Michael McFarland > wrote: > >> Stefan van der Walt writes: >> >> > On 2014-10-27 15:26:58, D. Michael McFarland wrote: >> >> What I would like to ask about is the situation this illustrates, where >> >> both NumPy and SciPy provide similar functionality (sometimes >> identical, >> >> to judge by the documentation). Is there some guidance on which is to >> >> be preferred? >> > >> > I'm not sure if you've received an answer to your question so far. My >> > advice: use the SciPy functions. SciPy is often built on more extensive >> > Fortran libraries not available during NumPy compilation, and I am not >> > aware of any cases where a function in NumPy is faster or more extensive >> > than the equivalent in SciPy. >> >> The whole thread has been interesting reading (now that I've finally >> come back to it...got busy for a few days), but this is the sort of >> answer I was hoping for. Thank you. >> >> Best, >> Michael >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Oct 31 11:34:24 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 31 Oct 2014 15:34:24 +0000 Subject: [Numpy-discussion] Choosing between NumPy and SciPy functions In-Reply-To: References: <87mw8hhcd9.fsf@dmmcf.net> <87r3xscyfr.fsf@sun.ac.za> <8761f0wc0z.fsf@dmmcf.net> Message-ID: On Fri, Oct 31, 2014 at 3:07 PM, Benjamin Root wrote: > Just to throw in my two cents here. I feel that sometimes, features are > tried out first elsewhere (possibly in scipy) and then brought down into > numpy after sufficient shakedown time. So, in some cases, I wonder if the > numpy version is actually more refined than the scipy version? Of course, > there is no way to know this from the documentation, which is a problem. > Didn't scipy have nanmean() for a while before Numpy added it in version > 1.8? Not that often, and these usually get actively deprecated eventually. Most duplications are of the form Stefan discusses. 
-- Robert Kern