From rhattersley at gmail.com Wed Apr 1 05:17:01 2015 From: rhattersley at gmail.com (R Hattersley) Date: Wed, 1 Apr 2015 10:17:01 +0100 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal Message-ID: There are two different interpretations in common use of how to handle multi-valued (array/sequence) indexes. The numpy style is to consider all multi-valued indices together which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently. For example: >>> type(v)>>> v.shape (240, 37, 49)>>> v[(0, 1), (0, 2, 3)].shape (2, 3, 49)>>> np.array(v)[(0, 1), (0, 2, 3)].shape Traceback (most recent call last): File "", line 1, in IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) In a netcdf4-python GitHub issue the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute. 1. Is there any appetite for adding that attribute (with the value `False`) to ndarray? 2. As suggested by shoyer , is there any appetite for adding an alternative indexer to ndarray where __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)] Richard -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Apr 1 10:06:02 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 1 Apr 2015 07:06:02 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Wed, Apr 1, 2015 at 2:17 AM, R Hattersley wrote: > There are two different interpretations in common use of how to handle > multi-valued (array/sequence) indexes. The numpy style is to consider all > multi-valued indices together which allows arbitrary points to be > extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to > consider each multi-valued index independently. > > For example: > > >>> type(v)>>> v.shape > (240, 37, 49)>>> v[(0, 1), (0, 2, 3)].shape > (2, 3, 49)>>> np.array(v)[(0, 1), (0, 2, 3)].shape > Traceback (most recent call last): > File "", line 1, in IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) > > > In a netcdf4-python GitHub issue > the authors of > various orthogonal indexing packages have been discussing how to > distinguish the two behaviours and have currently settled on a boolean > __orthogonal_indexing__ attribute. > > 1. Is there any appetite for adding that attribute (with the value > `False`) to ndarray? > > 2. As suggested by shoyer > , > is there any appetite for adding an alternative indexer to ndarray where > __orthogonal_indexing__ = True? For example: myarray.ix_[(0,1), (0, 2, 3)] > Is there any other package implementing non-orthogonal indexing aside from numpy? I understand that it would be nice to do: if x.__orthogonal_indexing__: return x[idx] else: return x.ix_[idx] But I think you would get the exact same result doing: if isinstance(x, np.ndarray): return x[np.ix_(*idx)] else: return x[idx] If `not x.__orthogonal_indexing__` is going to be a proxy for `isinstance(x, ndarray)` I don't really see the point of disguising it, explicit is better than implicit and all that. If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all for improving that to provide the full functionality of "orthogonal indexing". 
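To make the contrast concrete, here is a minimal self-contained sketch of the two behaviours, using a plain ndarray in place of the netCDF4 variable `v` (the shape is taken from the example above). Note that np.ix_ currently accepts only integer and boolean index arrays, not slices, which is the limitation mentioned above.

import numpy as np

v = np.zeros((240, 37, 49))  # stand-in for the netCDF4 variable above

# NumPy-style ("fancy") indexing broadcasts the index arrays together,
# so a length-2 and a length-3 index cannot be combined:
try:
    v[(0, 1), (0, 2, 3)]
except IndexError as exc:
    print(exc)  # shape mismatch: indexing arrays could not be broadcast together

# Orthogonal-style indexing treats each index independently. For a plain
# ndarray it can be emulated with np.ix_, which reshapes the indices so
# that they broadcast into an outer product:
print(v[np.ix_((0, 1), (0, 2, 3))].shape)  # (2, 3, 49)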
I just need a little more convincing that those new attributes/indexers are going to ever see any real use. Jaime > > Richard > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Apr 1 12:04:08 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Apr 2015 10:04:08 -0600 Subject: [Numpy-discussion] IDE's for numpy development? Message-ID: Hi All, In a recent exchange Mark Wiebe suggested that the lack of support for numpy development in Visual Studio might limit the number of developers attracted to the project. I'm a vim/console developer myself and make no claim of familiarity with modern development tools, but I wonder if such tools might now be available for Numpy. A quick google search turns up a beta plugin for Visual Studio, , and there is an xcode IDE for the mac that apparently offers some Python support. The two things that I think are required are: 1) support for mixed C, python developement and 2) support for building and testing numpy. I'd be interested in information from anyone with experience in using such an IDE and ideas of how Numpy might make using some of the common IDEs easier. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Apr 1 12:07:34 2015 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 1 Apr 2015 12:07:34 -0400 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: mixed C and python development? I would just wait for the Jupyter folks to create "IC" and maybe even "IC++"! On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris wrote: > Hi All, > > In a recent exchange Mark Wiebe suggested that the lack of support for > numpy development in Visual Studio might limit the number of developers > attracted to the project. I'm a vim/console developer myself and make no > claim of familiarity with modern development tools, but I wonder if such > tools might now be available for Numpy. A quick google search turns up a > beta plugin for Visual Studio, , and there > is an xcode IDE for the mac that apparently offers some Python support. The > two things that I think are required are: 1) support for mixed C, python > developement and 2) support for building and testing numpy. I'd be > interested in information from anyone with experience in using such an IDE > and ideas of how Numpy might make using some of the common IDEs easier. > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From yw5aj at virginia.edu Wed Apr 1 12:43:10 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Wed, 1 Apr 2015 12:43:10 -0400 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: That would really be hilarious - and "IFortran" probably! :) Shawn On Wed, Apr 1, 2015 at 12:07 PM, Benjamin Root wrote: > mixed C and python development? I would just wait for the Jupyter folks to > create "IC" and maybe even "IC++"! 
> > On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris > wrote: >> >> Hi All, >> >> In a recent exchange Mark Wiebe suggested that the lack of support for >> numpy development in Visual Studio might limit the number of developers >> attracted to the project. I'm a vim/console developer myself and make no >> claim of familiarity with modern development tools, but I wonder if such >> tools might now be available for Numpy. A quick google search turns up a >> beta plugin for Visual Studio,, and there is an xcode IDE for the mac that >> apparently offers some Python support. The two things that I think are >> required are: 1) support for mixed C, python developement and 2) support for >> building and testing numpy. I'd be interested in information from anyone >> with experience in using such an IDE and ideas of how Numpy might make using >> some of the common IDEs easier. >> >> Thoughts? >> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From eraldo.pomponi at gmail.com Wed Apr 1 13:04:08 2015 From: eraldo.pomponi at gmail.com (Eraldo Pomponi) Date: Wed, 1 Apr 2015 19:04:08 +0200 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: Sorry for the OT and top-posting but, It reminds me of "ITex" (https://www.youtube.com/watch?v=eKaI78K_rgA) ... On Wed, Apr 1, 2015 at 6:43 PM, Yuxiang Wang wrote: > That would really be hilarious - and "IFortran" probably! :) > > Shawn > > On Wed, Apr 1, 2015 at 12:07 PM, Benjamin Root wrote: > > mixed C and python development? I would just wait for the Jupyter folks > to > > create "IC" and maybe even "IC++"! > > > > On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris > > wrote: > >> > >> Hi All, > >> > >> In a recent exchange Mark Wiebe suggested that the lack of support for > >> numpy development in Visual Studio might limit the number of developers > >> attracted to the project. I'm a vim/console developer myself and make no > >> claim of familiarity with modern development tools, but I wonder if such > >> tools might now be available for Numpy. A quick google search turns up a > >> beta plugin for Visual Studio,, and there is an xcode IDE for the mac > that > >> apparently offers some Python support. The two things that I think are > >> required are: 1) support for mixed C, python developement and 2) > support for > >> building and testing numpy. I'd be interested in information from anyone > >> with experience in using such an IDE and ideas of how Numpy might make > using > >> some of the common IDEs easier. > >> > >> Thoughts? 
> >> > >> Chuck > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Yuxiang "Shawn" Wang > Gerling Research Lab > University of Virginia > yw5aj at virginia.edu > +1 (434) 284-0836 > https://sites.google.com/a/virginia.edu/yw5aj/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 1 13:21:14 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 1 Apr 2015 13:21:14 -0400 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris wrote: > Hi All, > > In a recent exchange Mark Wiebe suggested that the lack of support for numpy > development in Visual Studio might limit the number of developers attracted > to the project. I'm a vim/console developer myself and make no claim of > familiarity with modern development tools, but I wonder if such tools might > now be available for Numpy. A quick google search turns up a beta plugin for > Visual Studio,, and there is an xcode IDE for the mac that apparently offers > some Python support. The two things that I think are required are: 1) > support for mixed C, python developement and 2) support for building and > testing numpy. I'd be interested in information from anyone with experience > in using such an IDE and ideas of how Numpy might make using some of the > common IDEs easier. > > Thoughts? I have no experience with the C/C++ part, but I'm using the C/C++ version of Eclipse with PyDev. It should have all the extra features available, but I don't use them and don't have compiler, debugger and so on for C/C++ connected to Eclipse. It looks like it supports Visual C++ and MingW GCC toolchain. (I'm not sure the same project can be a C/C++ and a PyDev project at the same time.) Josef > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From edisongustavo at gmail.com Wed Apr 1 13:55:23 2015 From: edisongustavo at gmail.com (Edison Gustavo Muenz) Date: Wed, 1 Apr 2015 14:55:23 -0300 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: The PTVS can debug into native code. On Wed, Apr 1, 2015 at 2:21 PM, wrote: > On Wed, Apr 1, 2015 at 12:04 PM, Charles R Harris > wrote: > > Hi All, > > > > In a recent exchange Mark Wiebe suggested that the lack of support for > numpy > > development in Visual Studio might limit the number of developers > attracted > > to the project. I'm a vim/console developer myself and make no claim of > > familiarity with modern development tools, but I wonder if such tools > might > > now be available for Numpy. A quick google search turns up a beta plugin > for > > Visual Studio,, and there is an xcode IDE for the mac that apparently > offers > > some Python support. 
The two things that I think are required are: 1) > > support for mixed C, python developement and 2) support for building and > > testing numpy. I'd be interested in information from anyone with > experience > > in using such an IDE and ideas of how Numpy might make using some of the > > common IDEs easier. > > > > Thoughts? > > I have no experience with the C/C++ part, but I'm using the C/C++ > version of Eclipse with PyDev. > > It should have all the extra features available, but I don't use them > and don't have compiler, debugger and so on for C/C++ connected to > Eclipse. It looks like it supports Visual C++ and MingW GCC toolchain. > (I'm not sure the same project can be a C/C++ and a PyDev project at > the same time.) > > > Josef > > > > > Chuck > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Apr 1 13:55:52 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 1 Apr 2015 17:55:52 +0000 (UTC) Subject: [Numpy-discussion] IDE's for numpy development? References: Message-ID: <1587162722449602516.347602sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > I'd be > interested in information from anyone with experience in using such an IDE > and ideas of how Numpy might make using some of the common IDEs easier. > > Thoughts? I guess we could include project files for Visual Studio (and perhaps Eclipse?), like Python does. But then we would need to make sure the different build systems are kept in sync, and it will be a PITA for those who do not use Windows and Visual Studio. It is already bad enough with Distutils and Bento. I, for one, would really prefer if there only was one build process to care about. One should also note that a Visual Studio project is the only supported build process for Python on Windows. So they are not using this in addition to something else. Eclipse is better than Visual Studio for mixed Python and C development. It is also cross-platform. cmake needs to be mentioned too. It is not fully integrated with Visual Studio, but better than having multiple build processes. But still, there is nothing that prevents the use of Visual Studio as a glorified text editor. Sturla From jaime.frio at gmail.com Wed Apr 1 14:34:29 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 1 Apr 2015 11:34:29 -0700 Subject: [Numpy-discussion] Adding 'where' to ufunc methods? Message-ID: This question on StackOverflow: http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal Got me thinking that I had finally found a use for the 'where' kwarg of ufuncs. Unfortunately it is only provided for the ufunc itself, but not for any of its methods. Is there any fundamental reason these were not implemented back in the day? Any frontal opposition to having them now? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Wed Apr 1 14:43:10 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Apr 2015 12:43:10 -0600 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: <1587162722449602516.347602sturla.molden-gmail.com@news.gmane.org> References: <1587162722449602516.347602sturla.molden-gmail.com@news.gmane.org> Message-ID: On Wed, Apr 1, 2015 at 11:55 AM, Sturla Molden wrote: > Charles R Harris wrote: > > > I'd be > > interested in information from anyone with experience in using such an > IDE > > and ideas of how Numpy might make using some of the common IDEs easier. > > > > Thoughts? > > I guess we could include project files for Visual Studio (and perhaps > Eclipse?), like Python does. But then we would need to make sure the > different build systems are kept in sync, and it will be a PITA for those > who do not use Windows and Visual Studio. It is already bad enough with > Distutils and Bento. I, for one, would really prefer if there only was one > build process to care about. One should also note that a Visual Studio > project is the only supported build process for Python on Windows. So they > are not using this in addition to something else. > > Eclipse is better than Visual Studio for mixed Python and C development. It > is also cross-platform. > > cmake needs to be mentioned too. It is not fully integrated with Visual > Studio, but better than having multiple build processes. > Mark chose cmake for DyND because it supported Visual Studio projects. OTOH, he said it was a PITA to program. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Apr 1 15:25:31 2015 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 1 Apr 2015 15:25:31 -0400 Subject: [Numpy-discussion] Adding 'where' to ufunc methods? In-Reply-To: References: Message-ID: Another usecase would be for MaskedArrays. ma.masked_array.min() wouldn't have to make a copy anymore (there is a github issue about that). It could just pass its mask into the where= argument of min() and be done with it. Problem would be generalizing situations where where= effectively results in "nowhere". Cheers! Ben Root On Wed, Apr 1, 2015 at 2:34 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > This question on StackOverflow: > > > http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal > > Got me thinking that I had finally found a use for the 'where' kwarg of > ufuncs. Unfortunately it is only provided for the ufunc itself, but not for > any of its methods. > > Is there any fundamental reason these were not implemented back in the > day? Any frontal opposition to having them now? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Apr 1 15:47:35 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 1 Apr 2015 12:47:35 -0700 Subject: [Numpy-discussion] Adding 'where' to ufunc methods? 
In-Reply-To: References: Message-ID: On Wed, Apr 1, 2015 at 11:34 AM, Jaime Fern?ndez del R?o wrote: > This question on StackOverflow: > > http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal > > Got me thinking that I had finally found a use for the 'where' kwarg of > ufuncs. Unfortunately it is only provided for the ufunc itself, but not for > any of its methods. > > Is there any fundamental reason these were not implemented back in the day? > Any frontal opposition to having them now? The where= argument stuff was rescued from the last aborted attempt to add missing value support to numpy. The only reason they aren't implemented for the ufunc methods is that Mark didn't get that far. +1 to adding them now. -n -- Nathaniel J. Smith -- http://vorpus.org From josef.pktd at gmail.com Wed Apr 1 15:55:45 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 1 Apr 2015 15:55:45 -0400 Subject: [Numpy-discussion] Adding 'where' to ufunc methods? In-Reply-To: References: Message-ID: On Wed, Apr 1, 2015 at 3:47 PM, Nathaniel Smith wrote: > On Wed, Apr 1, 2015 at 11:34 AM, Jaime Fern?ndez del R?o > wrote: >> This question on StackOverflow: >> >> http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal >> >> Got me thinking that I had finally found a use for the 'where' kwarg of >> ufuncs. Unfortunately it is only provided for the ufunc itself, but not for >> any of its methods. >> >> Is there any fundamental reason these were not implemented back in the day? >> Any frontal opposition to having them now? > > The where= argument stuff was rescued from the last aborted attempt to > add missing value support to numpy. The only reason they aren't > implemented for the ufunc methods is that Mark didn't get that far. > > +1 to adding them now. can you get `where` in ufuncs without missing value support? what's the result for ufuncs that are not reduce operations? what's the result for reduce operations along an axis if there is nothing there (in a row or column or ...)? Josef > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Wed Apr 1 16:02:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 1 Apr 2015 13:02:52 -0700 Subject: [Numpy-discussion] Adding 'where' to ufunc methods? In-Reply-To: References: Message-ID: On Apr 1, 2015 12:55 PM, wrote: > > On Wed, Apr 1, 2015 at 3:47 PM, Nathaniel Smith wrote: > > On Wed, Apr 1, 2015 at 11:34 AM, Jaime Fern?ndez del R?o > > wrote: > >> This question on StackOverflow: > >> > >> http://stackoverflow.com/questions/29394377/minimum-of-numpy-array-ignoring-diagonal > >> > >> Got me thinking that I had finally found a use for the 'where' kwarg of > >> ufuncs. Unfortunately it is only provided for the ufunc itself, but not for > >> any of its methods. > >> > >> Is there any fundamental reason these were not implemented back in the day? > >> Any frontal opposition to having them now? > > > > The where= argument stuff was rescued from the last aborted attempt to > > add missing value support to numpy. The only reason they aren't > > implemented for the ufunc methods is that Mark didn't get that far. > > > > +1 to adding them now. > > can you get `where` in ufuncs without missing value support? where= is implemented since 1.7 iirc, for regular ufunc calls. I.e. 
you can currently do np.add(a, b, where=mask), but not np.add.reduce(a, b, where=mask). > what's the result for ufuncs that are not reduce operations? The operation skips over any entries where the mask is false. So if you pass an out= array, the masked out entries will remain unchanged from before the call; if you don't pass an out= array then one will be allocated for you as if by calling np.empty, and then the masked out entries will remain uninitialized. > what's the result for reduce operations along an axis if there is > nothing there (in a row or column or ...)? The same as a reduce operation on a zero length axis: the identity if the ufunc has one, and an error otherwise. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Wed Apr 1 17:12:21 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 1 Apr 2015 21:12:21 +0000 (UTC) Subject: [Numpy-discussion] How to Force Storage Order References: <0DAB4B4FC42EAA41802458ADA9C2F824301304E8@IRSMSX104.ger.corp.intel.com> Message-ID: <1791137206449615405.619299sturla.molden-gmail.com@news.gmane.org> "Klemm, Michael" wrote: > I have found that the numpy.linalg.svd algorithm creates the resulting U, > sigma, and V matrixes with Fortran storage. Is there any way to force > these kind of algorithms to not change the storage order? That would > make passing the matrixes to the native dgemm operation much easier. NumPy's dot function will call cblas_dgemm in the most efficient way regardless of storage. It knows what to do with C and Fortran arrays. Sturla From shoyer at gmail.com Thu Apr 2 04:29:26 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 2 Apr 2015 01:29:26 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > Is there any other package implementing non-orthogonal indexing aside from > numpy? > I think we can safely say that NumPy's implementation of broadcasting indexing is unique :). The issue is that many other packages rely on numpy for implementation of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not immediately obvious what sort of indexing these objects represent. If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all > for improving that to provide the full functionality of "orthogonal > indexing". I just need a little more convincing that those new > attributes/indexers are going to ever see any real use. > Orthogonal indexing is close to the norm for packages that implement labeled data structures, both because it's easier to understand and implement, and because it's difficult to maintain associations with labels through complex broadcasting indexing. Unfortunately, the lack of a full featured implementation of orthogonal indexing has lead to that wheel being reinvented at least three times (in Iris, xray [1] and pandas). So it would be nice to have a canonical implementation that supports slices and integers in numpy for that reason alone. This could be done by building on the existing `np.ix_` function, but a new indexer seems more elegant: there's just much less noise with `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`. It's also well known that indexing with __getitem__ can be much slower than np.take. 
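As a rough illustration of that last point (timings vary considerably with NumPy version and array shape, and the reply below qualifies the claim), both spellings select the same elements along an axis:

import numpy as np

a = np.random.rand(1000000)
idx = np.random.randint(0, a.size, size=100000)

# Identical selections...
assert np.array_equal(a[idx], np.take(a, idx))

# ...but np.take has often been measurably faster than __getitem__ for
# simple integer-array indexing, e.g. comparing in IPython:
#     %timeit a[idx]
#     %timeit np.take(a, idx)
# (illustrative only; see the follow-up below about NumPy 1.9 and
# "subspace" size).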
It seems plausible to me that a careful implementation of orthogonal indexing could close or eliminate this speed gap, because the model for orthogonal indexing is so much simpler than that for broadcasting indexing: each element of the key tuple can be applied separately along the corresponding axis. So I think there could be a real benefit to having the feature in numpy. In particular, if somebody is up for implementing it in C or Cython, I would be very pleased. Cheers, Stephan [1] Here is my implementation of remapping from orthogonal to broadcasting indexing. It works, but it's a real mess, especially because I try to optimize by minimizing the number of times slices are converted into arrays: https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68 -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 2 05:00:37 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 02 Apr 2015 11:00:37 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: <1427965237.1353.15.camel@sipsolutions.net> On Do, 2015-04-02 at 01:29 -0700, Stephan Hoyer wrote: > On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fern?ndez del R?o > wrote: > Is there any other package implementing non-orthogonal > indexing aside from numpy? > > > I think we can safely say that NumPy's implementation of broadcasting > indexing is unique :). > > > The issue is that many other packages rely on numpy for implementation > of custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's > not immediately obvious what sort of indexing these objects represent. > > > If the functionality is lacking, e,g, use of slices in > `np.ix_`, I'm all for improving that to provide the full > functionality of "orthogonal indexing". I just need a little > more convincing that those new attributes/indexers are going > to ever see any real use. > > > > Orthogonal indexing is close to the norm for packages that implement > labeled data structures, both because it's easier to understand and > implement, and because it's difficult to maintain associations with > labels through complex broadcasting indexing. > > > Unfortunately, the lack of a full featured implementation of > orthogonal indexing has lead to that wheel being reinvented at least > three times (in Iris, xray [1] and pandas). So it would be nice to > have a canonical implementation that supports slices and integers in > numpy for that reason alone. This could be done by building on the > existing `np.ix_` function, but a new indexer seems more elegant: > there's just much less noise with `arr.ix_[:1, 2, [3]]` than > `arr[np.ix_(slice(1), 2, [3])]`. > > > It's also well known that indexing with __getitem__ can be much slower > than np.take. It seems plausible to me that a careful implementation > of orthogonal indexing could close or eliminate this speed gap, > because the model for orthogonal indexing is so much simpler than that > for broadcasting indexing: each element of the key tuple can be > applied separately along the corresponding axis. > Wrong (sorry, couldn't resist ;)), since 1.9. take is not typically faster unless you have a small subspace ("subspace" are the non-indexed/slice-indexed axes, though I guess small subspace is common in some cases, i.e. Nx3 array), it should typically be noticeably slower for large subspaces at the moment. 
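For readers not familiar with the term, a small sketch of what "subspace" means here (the array sizes are arbitrary):

import numpy as np

idx = np.random.randint(0, 1000, size=500)

small = np.random.rand(1000, 3)     # small subspace: the un-indexed trailing (3,)
large = np.random.rand(1000, 5000)  # large subspace: the un-indexed trailing (5000,)

# For a single integer-array index on axis 0, the "subspace" is everything
# the index does not touch, i.e. the trailing block copied for each selected row.
assert small[idx].shape == (500, 3)
assert large[idx].shape == (500, 5000)

# np.take computes the same selection; which spelling is faster depends on
# the subspace size and the NumPy version, as noted above.
assert np.array_equal(large[idx], np.take(large, idx, axis=0))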
Anyway, unfortunately while orthogonal indexing may seem simpler, as you probably noticed, mapping it fully featured to advanced indexing does not seem like a walk in the park due to how axis remapping works when you have a combination of slices and advanced indices. It might be possible to basically implement a second MapIterSwapaxis in addition to adding extra axes to the inputs (which I think would need a post-processing step, but that is not that bad). If you do that, you can mostly reuse the current machinery and avoid most of the really annoying code blocks which set up the iterators for the various special cases. Otherwise, for hacking it of course you can replace the slices by arrays as well ;). > > So I think there could be a real benefit to having the feature in > numpy. In particular, if somebody is up for implementing it in C or > Cython, I would be very pleased. > > > Cheers, > > Stephan > > > [1] Here is my implementation of remapping from orthogonal to > broadcasting indexing. It works, but it's a real mess, especially > because I try to optimize by minimizing the number of times slices are > converted into arrays: > https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68 > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From cournape at gmail.com Thu Apr 2 09:46:12 2015 From: cournape at gmail.com (David Cournapeau) Date: Thu, 2 Apr 2015 14:46:12 +0100 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: <1587162722449602516.347602sturla.molden-gmail.com@news.gmane.org> Message-ID: On Wed, Apr 1, 2015 at 7:43 PM, Charles R Harris wrote: > > > On Wed, Apr 1, 2015 at 11:55 AM, Sturla Molden > wrote: > >> Charles R Harris wrote: >> >> > I'd be >> > interested in information from anyone with experience in using such an >> IDE >> > and ideas of how Numpy might make using some of the common IDEs easier. >> > >> > Thoughts? >> >> I guess we could include project files for Visual Studio (and perhaps >> Eclipse?), like Python does. But then we would need to make sure the >> different build systems are kept in sync, and it will be a PITA for those >> who do not use Windows and Visual Studio. It is already bad enough with >> Distutils and Bento. I, for one, would really prefer if there only was one >> build process to care about. One should also note that a Visual Studio >> project is the only supported build process for Python on Windows. So they >> are not using this in addition to something else. >> >> Eclipse is better than Visual Studio for mixed Python and C development. >> It >> is also cross-platform. >> >> cmake needs to be mentioned too. It is not fully integrated with Visual >> Studio, but better than having multiple build processes. >> > > Mark chose cmake for DyND because it supported Visual Studio projects. > OTOH, he said it was a PITA to program. > I concur on that: For the 350+ packages we support at Enthought, cmake has been a higher pain point than any other build tool (that is including custom ones). And we only support mainstream platforms. But the real question for me is what does visual studio support mean ? Does it really mean solution files ? 
David -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Thu Apr 2 10:15:17 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 2 Apr 2015 07:15:17 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Thu, Apr 2, 2015 at 1:29 AM, Stephan Hoyer wrote: > On Wed, Apr 1, 2015 at 7:06 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> Is there any other package implementing non-orthogonal indexing aside >> from numpy? >> > > I think we can safely say that NumPy's implementation of broadcasting > indexing is unique :). > > The issue is that many other packages rely on numpy for implementation of > custom array objects (e.g., scipy.sparse and scipy.io.netcdf). It's not > immediately obvious what sort of indexing these objects represent. > > If the functionality is lacking, e,g, use of slices in `np.ix_`, I'm all >> for improving that to provide the full functionality of "orthogonal >> indexing". I just need a little more convincing that those new >> attributes/indexers are going to ever see any real use. >> > > Orthogonal indexing is close to the norm for packages that implement > labeled data structures, both because it's easier to understand and > implement, and because it's difficult to maintain associations with labels > through complex broadcasting indexing. > > Unfortunately, the lack of a full featured implementation of orthogonal > indexing has lead to that wheel being reinvented at least three times (in > Iris, xray [1] and pandas). So it would be nice to have a canonical > implementation that supports slices and integers in numpy for that reason > alone. This could be done by building on the existing `np.ix_` function, > but a new indexer seems more elegant: there's just much less noise with > `arr.ix_[:1, 2, [3]]` than `arr[np.ix_(slice(1), 2, [3])]`. > > It's also well known that indexing with __getitem__ can be much slower > than np.take. It seems plausible to me that a careful implementation of > orthogonal indexing could close or eliminate this speed gap, because the > model for orthogonal indexing is so much simpler than that for broadcasting > indexing: each element of the key tuple can be applied separately along the > corresponding axis. > > So I think there could be a real benefit to having the feature in numpy. > In particular, if somebody is up for implementing it in C or Cython, I > would be very pleased. > > Cheers, > Stephan > > [1] Here is my implementation of remapping from orthogonal to broadcasting > indexing. It works, but it's a real mess, especially because I try to > optimize by minimizing the number of times slices are converted into arrays: > > https://github.com/xray/xray/blob/0d164d848401209971ded33aea2880c1fdc892cb/xray/core/indexing.py#L68 > > I believe you can leave all slices unchanged if you later reshuffle your axes. Basically all the fancy-indexed axes go in the front of the shape in order, and the subspace follows, e.g.: >>> a = np.arange(60).reshape(3, 4, 5) >>> a[np.array([1])[:, None], ::2, np.array([1, 2, 3])].shape (1, 3, 2) So you would need to swap the second and last axes and be done. You would not get a contiguous array without a copy, but that's a different story. Assigning to an orthogonally indexed subarray is an entirely different beast, not sure if there is a use case for that. We probably need more traction on the "should this be done?" 
discussion than on the "can this be done?" one, the need for a reordering of the axes swings me slightly in favor, but I mostly don't see it yet. Nathaniel usually has good insights on who we are, where do we come from, where are we going to, type of questions, would be good to have him chime in. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Apr 2 14:03:27 2015 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 02 Apr 2015 08:03:27 -1000 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: <551D846F.3080808@hawaii.edu> On 2015/04/02 4:15 AM, Jaime Fern?ndez del R?o wrote: > We probably need more traction on the "should this be done?" discussion > than on the "can this be done?" one, the need for a reordering of the > axes swings me slightly in favor, but I mostly don't see it yet. As a long-time user of numpy, and an advocate and teacher of Python for science, here is my perspective: Fancy indexing is a horrible design mistake--a case of cleverness run amok. As you can read in the Numpy documentation, it is hard to explain, hard to understand, hard to remember. Its use easily leads to unreadable code and hard-to-see errors. Here is the essence of an example that a student presented me with just this week, in the context of reordering eigenvectors based on argsort applied to eigenvalues: In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4)) In [26]: ii = np.arange(4) In [27]: print(xx[0]) [[ 0 1 2 3] [ 4 5 6 7] [ 8 9 10 11]] In [28]: print(xx[0, :, ii]) [[ 0 4 8] [ 1 5 9] [ 2 6 10] [ 3 7 11]] Quickly now, how many numpy users would look at that last expression and say, "Of course, that is equivalent to transposing xx[0]"? And, "Of course that expression should give a completely different result from xx[0][:, ii]."? I would guess it would be less than 1%. That should tell you right away that we have a real problem here. Fancy indexing can't be *read* by a sub-genius--it has to be laboriously figured out piece by piece, with frequent reference to the baffling descriptions in the Numpy docs. So I think you should turn the question around and ask, "What is the actual real-world use case for fancy indexing?" How often does real code rely on it? I have taken advantage of it occasionally, maybe you have too, but I think a survey of existing code would show that the need for it is *far* less common than the need for simple orthogonal indexing. That tells me that it is fancy indexing, not orthogonal indexing, that should be available through a function and/or special indexing attribute. The question is then how to make that transition. Eric From shoyer at gmail.com Thu Apr 2 14:41:59 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Thu, 2 Apr 2015 11:41:59 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: <551D846F.3080808@hawaii.edu> References: <551D846F.3080808@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing wrote: > Fancy indexing is a horrible design mistake--a case of cleverness run > amok. As you can read in the Numpy documentation, it is hard to > explain, hard to understand, hard to remember. Well put! I also failed to correct predict your example. > So I think you should turn the question around and ask, "What is the > actual real-world use case for fancy indexing?" 
How often does real > code rely on it? I'll just note that Indexing with a boolean array with the same shape as the array (e.g., x[x < 0] when x has greater than 1 dimension) technically falls outside a strict interpretation of orthogonal indexing. But there's not any ambiguity in adding that as an extension to orthogonal indexing (which otherwise does not allow ndim > 1), so I think your point still stands. Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Apr 2 14:59:10 2015 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 2 Apr 2015 14:59:10 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> Message-ID: The distinction that boolean indexing has over the other 2 methods of indexing is that it can guarantee that it references a position at most once. Slicing and scalar indexes are also this way, hence why these methods allow for in-place assignments. I don't see boolean indexing as an extension of orthogonal indexing because of that. Ben Root On Thu, Apr 2, 2015 at 2:41 PM, Stephan Hoyer wrote: > On Thu, Apr 2, 2015 at 11:03 AM, Eric Firing wrote: > >> Fancy indexing is a horrible design mistake--a case of cleverness run >> amok. As you can read in the Numpy documentation, it is hard to >> explain, hard to understand, hard to remember. > > > Well put! > > I also failed to correct predict your example. > > >> So I think you should turn the question around and ask, "What is the >> actual real-world use case for fancy indexing?" How often does real >> code rely on it? > > > I'll just note that Indexing with a boolean array with the same shape as > the array (e.g., x[x < 0] when x has greater than 1 dimension) technically > falls outside a strict interpretation of orthogonal indexing. But there's > not any ambiguity in adding that as an extension to orthogonal indexing > (which otherwise does not allow ndim > 1), so I think your point still > stands. > > Stephan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Apr 2 16:22:16 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Apr 2015 16:22:16 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: <551D846F.3080808@hawaii.edu> References: <551D846F.3080808@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 2:03 PM, Eric Firing wrote: > On 2015/04/02 4:15 AM, Jaime Fern?ndez del R?o wrote: >> We probably need more traction on the "should this be done?" discussion >> than on the "can this be done?" one, the need for a reordering of the >> axes swings me slightly in favor, but I mostly don't see it yet. > > As a long-time user of numpy, and an advocate and teacher of Python for > science, here is my perspective: > > Fancy indexing is a horrible design mistake--a case of cleverness run > amok. As you can read in the Numpy documentation, it is hard to > explain, hard to understand, hard to remember. Its use easily leads to > unreadable code and hard-to-see errors. 
Here is the essence of an > example that a student presented me with just this week, in the context > of reordering eigenvectors based on argsort applied to eigenvalues: > > In [25]: xx = np.arange(2*3*4).reshape((2, 3, 4)) > > In [26]: ii = np.arange(4) > > In [27]: print(xx[0]) > [[ 0 1 2 3] > [ 4 5 6 7] > [ 8 9 10 11]] > > In [28]: print(xx[0, :, ii]) > [[ 0 4 8] > [ 1 5 9] > [ 2 6 10] > [ 3 7 11]] > > Quickly now, how many numpy users would look at that last expression and > say, "Of course, that is equivalent to transposing xx[0]"? And, "Of > course that expression should give a completely different result from > xx[0][:, ii]."? > > I would guess it would be less than 1%. That should tell you right away > that we have a real problem here. Fancy indexing can't be *read* by a > sub-genius--it has to be laboriously figured out piece by piece, with > frequent reference to the baffling descriptions in the Numpy docs. > > So I think you should turn the question around and ask, "What is the > actual real-world use case for fancy indexing?" How often does real > code rely on it? I have taken advantage of it occasionally, maybe you > have too, but I think a survey of existing code would show that the need > for it is *far* less common than the need for simple orthogonal > indexing. That tells me that it is fancy indexing, not orthogonal > indexing, that should be available through a function and/or special > indexing attribute. The question is then how to make that transition. Swapping the axis when slices are mixed with fancy indexing was a design mistake, IMO. But not fancy indexing itself. >>> np.triu_indices(5) (array([0, 0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 3, 3, 4], dtype=int64), array([0, 1, 2, 3, 4, 1, 2, 3, 4, 2, 3, 4, 3, 4, 4], dtype=int64)) >>> m = np.arange(25).reshape(5, 5)[np.triu_indices(5)] >>> m array([ 0, 1, 2, 3, 4, 6, 7, 8, 9, 12, 13, 14, 18, 19, 24]) >>> m2 = np.zeros((5,5)) >>> m2[np.triu_indices(5)] = m >>> m2 array([[ 0., 1., 2., 3., 4.], [ 0., 6., 7., 8., 9.], [ 0., 0., 12., 13., 14.], [ 0., 0., 0., 18., 19.], [ 0., 0., 0., 0., 24.]]) (I don't remember what's "fancy" in indexing, just that broadcasting rules apply.) Josef > > Eric > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From efiring at hawaii.edu Thu Apr 2 16:35:29 2015 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 02 Apr 2015 10:35:29 -1000 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> Message-ID: <551DA811.9050707@hawaii.edu> On 2015/04/02 10:22 AM, josef.pktd at gmail.com wrote: > Swapping the axis when slices are mixed with fancy indexing was a > design mistake, IMO. But not fancy indexing itself. I'm not saying there should be no fancy indexing capability; I am saying that it should be available through a function or method, rather than via the square brackets. Square brackets should do things that people expect them to do--the most common and easy-to-understand style of indexing. Eric From cjw at ncf.ca Thu Apr 2 18:04:42 2015 From: cjw at ncf.ca (Colin J. Williams) Date: Thu, 02 Apr 2015 18:04:42 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. 
orthogonal In-Reply-To: <551DA811.9050707@hawaii.edu> References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> Message-ID: <551DBCFA.2010407@ncf.ca> On 02-Apr-15 4:35 PM, Eric Firing wrote: > On 2015/04/02 10:22 AM, josef.pktd at gmail.com wrote: >> Swapping the axis when slices are mixed with fancy indexing was a >> design mistake, IMO. But not fancy indexing itself. > I'm not saying there should be no fancy indexing capability; I am saying > that it should be available through a function or method, rather than > via the square brackets. Square brackets should do things that people > expect them to do--the most common and easy-to-understand style of indexing. > > Eric +1 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Thu Apr 2 18:50:27 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Apr 2015 16:50:27 -0600 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: <1587162722449602516.347602sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Apr 2, 2015 at 7:46 AM, David Cournapeau wrote: > > > On Wed, Apr 1, 2015 at 7:43 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, Apr 1, 2015 at 11:55 AM, Sturla Molden >> wrote: >> >>> Charles R Harris wrote: >>> >>> > I'd be >>> > interested in information from anyone with experience in using such an >>> IDE >>> > and ideas of how Numpy might make using some of the common IDEs easier. >>> > >>> > Thoughts? >>> >>> I guess we could include project files for Visual Studio (and perhaps >>> Eclipse?), like Python does. But then we would need to make sure the >>> different build systems are kept in sync, and it will be a PITA for those >>> who do not use Windows and Visual Studio. It is already bad enough with >>> Distutils and Bento. I, for one, would really prefer if there only was >>> one >>> build process to care about. One should also note that a Visual Studio >>> project is the only supported build process for Python on Windows. So >>> they >>> are not using this in addition to something else. >>> >>> Eclipse is better than Visual Studio for mixed Python and C development. >>> It >>> is also cross-platform. >>> >>> cmake needs to be mentioned too. It is not fully integrated with Visual >>> Studio, but better than having multiple build processes. >>> >> >> Mark chose cmake for DyND because it supported Visual Studio projects. >> OTOH, he said it was a PITA to program. >> > > I concur on that: For the 350+ packages we support at Enthought, cmake > has been a higher pain point than any other build tool (that is including > custom ones). And we only support mainstream platforms. > > But the real question for me is what does visual studio support mean ? > Does it really mean solution files ? > > I have no useful experience with Visual Studio, so don't really know, but solution files sounds like a step in the right direction. What do solution files provide? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From klemm at phys.ethz.ch Thu Apr 2 19:14:21 2015 From: klemm at phys.ethz.ch (Hanno Klemm) Date: Fri, 3 Apr 2015 01:14:21 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. 
orthogonal In-Reply-To: <551DBCFA.2010407@ncf.ca> References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> Message-ID: <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> > On 03 Apr 2015, at 00:04, Colin J. Williams wrote: > > > > On 02-Apr-15 4:35 PM, Eric Firing wrote: >> On 2015/04/02 10:22 AM, josef.pktd at gmail.com wrote: >>> Swapping the axis when slices are mixed with fancy indexing was a >>> design mistake, IMO. But not fancy indexing itself. >> I'm not saying there should be no fancy indexing capability; I am saying >> that it should be available through a function or method, rather than >> via the square brackets. Square brackets should do things that people >> expect them to do--the most common and easy-to-understand style of indexing. >> >> Eric > +1 Well, I have written quite a bit of code that relies on fancy indexing, and I think the question, if the behaviour of the [] operator should be changed has sailed with numpy now at version 1.9. Given the amount packages that rely on numpy, changing this fundamental behaviour would not be a clever move. If people want to implement orthogonal indexing with another method, by all means I might use it at some point in the future. However, adding even more complexity to the behaviour of the bracket slicing is probably not a good idea. Hanno From cgodshall at enthought.com Thu Apr 2 20:00:36 2015 From: cgodshall at enthought.com (Courtenay Godshall (Enthought)) Date: Thu, 2 Apr 2015 19:00:36 -0500 Subject: [Numpy-discussion] SciPy 2015 Conference Updates - Call for talks extended to 4/10, registration open, keynotes announced, John Hunter Plotting Contest In-Reply-To: <052501d06da1$151b5760$3f520620$@enthought.com> References: <052501d06da1$151b5760$3f520620$@enthought.com> Message-ID: <053001d06da1$37cda290$a768e7b0$@enthought.com> ---------------------------------------------------------------------------- ----------------------------------------------------------- **LAST CALL FOR SCIPY 2015 TALK AND POSTER SUBMISSIONS - EXTENSION TO 4/10* ---------------------------------------------------------------------------- ----------------------------------------------------------- SciPy 2015 will include 3 major topic tracks and 7 mini-symposia tracks. Submit a proposal on the SciPy 2015 website: http://scipy2015.scipy.org. If you have any questions or comments, feel free to contact us at: scipy-organizers at scipy.org . You can also follow @scipyconf on Twitter or sign up for the mailing list on the website for the latest updates! Major topic tracks include: - Scientific Computing in Python (General track) - Python in Data Science - Quantitative Finance and Computational Social Sciences Mini-symposia will include the applications of Python in: - Astronomy and astrophysics - Computational life and medical sciences - Engineering - Geographic information systems (GIS) - Geophysics - Oceanography and meteorology - Visualization, vision and imaging -------------------------------------------------------------------------- **SCIPY 2015 REGISTRATION IS OPEN** Please register ASAP to help us get a good headcount and open the conference to as many people as we can. PLUS, everyone who registers before May 15 will not only get early bird discounts, but will also be entered in a drawing for a free registration (via refund or extra)! 
Register on the website at http://scipy2015.scipy.org -------------------------------------------------------------------------- **SCIPY 2015 KEYNOTE SPEAKERS ANNOUNCED** Keynote speakers were just announced and include Wes McKinney, author of Pandas; Chris Wiggins, Chief Data Scientist for The New York Times; and Jake VanderPlas, director of research at the University of Washington's eScience Institute and core contributor to a number of scientific Python libraries including sci-kit learn and AstroML. -------------------------------------------------------------------------- **ENTER THE SCIPY JOHN HUNTER EXCELLENCE IN PLOTTING CONTEST - DUE 4/13** In memory of John Hunter, creator of matplotlib, we are pleased to announce the Third Annual SciPy John Hunter Excellence in Plotting Competition. This open competition aims to highlight the importance of quality plotting to scientific progress and showcase the capabilities of the current generation of plotting software. Participants are invited to submit scientific plots to be judged by a panel. The winning entries will be announced and displayed at the conference. John Hunter's family is graciously sponsoring cash prizes up to $1,000 for the winners. We look forward to exciting submissions that push the boundaries of plotting! See details here: http://scipy2015.scipy.org/ehome/115969/276538/ Entries must be submitted by April 13, 2015 via e-mail to plotting-contest at scipy.org -------------------------------------------------------------------------- **CALENDAR AND IMPORTANT DATES** --Sprint, Birds of a Feather, Financial Aid and Talk submissions are open NOW --Apr 10, 2015: Talk and Poster submission deadline --Apr 13, 2015: Plotting contest submissions due --Apr 15, 2015: Financial aid application deadline --Apr 17, 2015: Tutorial schedule announced --May 1, 2015: General conference speakers & schedule announced --May 15, 2015 (or 150 registrants): Early-bird registration ends --Jun 1, 2015: BoF submission deadline --Jul 6-7, 2015: SciPy 2015 Tutorials --Jul 8-10, 2015: SciPy 2015 General Conference --Jul 11-12, 2015: SciPy 2015 Sprints -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Thu Apr 2 20:02:05 2015 From: efiring at hawaii.edu (Eric Firing) Date: Thu, 02 Apr 2015 14:02:05 -1000 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> Message-ID: <551DD87D.2000305@hawaii.edu> On 2015/04/02 1:14 PM, Hanno Klemm wrote: > Well, I have written quite a bit of code that relies on fancy > indexing, and I think the question, if the behaviour of the [] > operator should be changed has sailed with numpy now at version 1.9. > Given the amount packages that rely on numpy, changing this > fundamental behaviour would not be a clever move. Are you *positive* that there is no clever way to make a transition? It's not worth any further thought? > > If people want to implement orthogonal indexing with another method, > by all means I might use it at some point in the future. However, > adding even more complexity to the behaviour of the bracket slicing > is probably not a good idea. I'm not advocating adding even more complexity, I'm trying to think about ways to make it *less* complex from the typical user's standpoint. 
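As a purely illustrative sketch of what such "another method" could look like (nothing like this exists in NumPy; the class name and details, e.g. scalars keeping a length-1 axis and no boolean support, are assumptions made up for the example), orthogonal indexing can be prototyped in pure Python on top of np.ix_:

import numpy as np

class OrthogonalIndexer(object):
    """Hypothetical helper: apply each index independently (orthogonally)."""

    def __init__(self, arr):
        self.arr = arr

    def __getitem__(self, key):
        if not isinstance(key, tuple):
            key = (key,)
        # Turn slices and scalars into explicit 1-d index arrays so that
        # np.ix_ can build the outer-product ("orthogonal") index.
        expanded = []
        for k, dim in zip(key, self.arr.shape):
            if isinstance(k, slice):
                expanded.append(np.arange(dim)[k])
            else:
                expanded.append(np.atleast_1d(k))
        return self.arr[np.ix_(*expanded)]

a = np.arange(24).reshape(2, 3, 4)
oix = OrthogonalIndexer(a)
print(oix[(0, 1), ::2, (0, 2, 3)].shape)  # (2, 2, 3): each index applied to its own axis

Something along these lines is essentially what Iris, xray and pandas each re-implement internally, which is the duplication pointed out earlier in the thread.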
Eric From josef.pktd at gmail.com Thu Apr 2 21:09:09 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Apr 2015 21:09:09 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: <551DD87D.2000305@hawaii.edu> References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: > On 2015/04/02 1:14 PM, Hanno Klemm wrote: >> Well, I have written quite a bit of code that relies on fancy >> indexing, and I think the question, if the behaviour of the [] >> operator should be changed has sailed with numpy now at version 1.9. >> Given the amount packages that rely on numpy, changing this >> fundamental behaviour would not be a clever move. > > Are you *positive* that there is no clever way to make a transition? > It's not worth any further thought? I guess it would be similar to python 3 string versus bytes, but without the overwhelming benefits. I don't think I would be in favor of deprecating fancy indexing even if it were possible. In general, my impression is that if there is a trade-off in numpy between powerful machinery versus easy to learn and teach, then the design philosophy when in favor of power. I think numpy indexing is not too difficult and follows a consistent pattern, and I completely avoid mixing slices and index arrays with ndim > 2. I think it should be DOA, except as a discussion topic for numpy 3000. just my opinion Josef > >> >> If people want to implement orthogonal indexing with another method, >> by all means I might use it at some point in the future. However, >> adding even more complexity to the behaviour of the bracket slicing >> is probably not a good idea. > > I'm not advocating adding even more complexity, I'm trying to think > about ways to make it *less* complex from the typical user's standpoint. > > Eric > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Thu Apr 2 21:35:55 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Apr 2015 21:35:55 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 9:09 PM, wrote: > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: >> On 2015/04/02 1:14 PM, Hanno Klemm wrote: >>> Well, I have written quite a bit of code that relies on fancy >>> indexing, and I think the question, if the behaviour of the [] >>> operator should be changed has sailed with numpy now at version 1.9. >>> Given the amount packages that rely on numpy, changing this >>> fundamental behaviour would not be a clever move. >> >> Are you *positive* that there is no clever way to make a transition? >> It's not worth any further thought? > > I guess it would be similar to python 3 string versus bytes, but > without the overwhelming benefits. > > I don't think I would be in favor of deprecating fancy indexing even > if it were possible. In general, my impression is that if there is a > trade-off in numpy between powerful machinery versus easy to learn and > teach, then the design philosophy when in favor of power. 
> > I think numpy indexing is not too difficult and follows a consistent > pattern, and I completely avoid mixing slices and index arrays with > ndim > 2. > > I think it should be DOA, except as a discussion topic for numpy 3000. > > just my opinion is this fancy? >>> vals array([6, 5, 4, 1, 2, 3]) >>> a+b array([[3, 2, 1, 0], [4, 3, 2, 1], [5, 4, 3, 2]]) >>> vals[a+b] array([[1, 4, 5, 6], [2, 1, 4, 5], [3, 2, 1, 4]]) https://github.com/scipy/scipy/blob/v0.14.0/scipy/linalg/special_matrices.py#L178 (I thought about this because I was looking at accessing off-diagonal elements, m2[np.arange(4), np.arange(4) + 1] ) How would you find all the code that would not be correct anymore with a changed definition of indexing and slicing, if there is insufficient test coverage and it doesn't raise an exception? If we find it, who fixes all the legacy code? (I don't think it will be minor unless there is a new method `fix_[...]` (fancy ix) Josef > > Josef > >> >>> >>> If people want to implement orthogonal indexing with another method, >>> by all means I might use it at some point in the future. However, >>> adding even more complexity to the behaviour of the bracket slicing >>> is probably not a good idea. >> >> I'm not advocating adding even more complexity, I'm trying to think >> about ways to make it *less* complex from the typical user's standpoint. >> >> Eric >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Thu Apr 2 22:30:52 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 2 Apr 2015 19:30:52 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: Hi, On Thu, Apr 2, 2015 at 6:09 PM, wrote: > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: >> On 2015/04/02 1:14 PM, Hanno Klemm wrote: >>> Well, I have written quite a bit of code that relies on fancy >>> indexing, and I think the question, if the behaviour of the [] >>> operator should be changed has sailed with numpy now at version 1.9. >>> Given the amount packages that rely on numpy, changing this >>> fundamental behaviour would not be a clever move. >> >> Are you *positive* that there is no clever way to make a transition? >> It's not worth any further thought? > > I guess it would be similar to python 3 string versus bytes, but > without the overwhelming benefits. > > I don't think I would be in favor of deprecating fancy indexing even > if it were possible. In general, my impression is that if there is a > trade-off in numpy between powerful machinery versus easy to learn and > teach, then the design philosophy when in favor of power. > > I think numpy indexing is not too difficult and follows a consistent > pattern, and I completely avoid mixing slices and index arrays with > ndim > 2. 
I'm sure y'all are totally on top of this, but for myself, I would like to distinguish: * fancy indexing with boolean arrays - I use it all the time and don't get confused; * fancy indexing with non-boolean arrays - horrendously confusing, almost never use it, except on a single axis when I can't confuse it with orthogonal indexing: In [3]: a = np.arange(24).reshape(6, 4) In [4]: a Out[4]: array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15], [16, 17, 18, 19], [20, 21, 22, 23]]) In [5]: a[[1, 2, 4]] Out[5]: array([[ 4, 5, 6, 7], [ 8, 9, 10, 11], [16, 17, 18, 19]]) I also remember a discussion with Travis O where he was also saying that this indexing was confusing and that it would be good if there was some way to transition to what he called outer product indexing (I think that's the same as 'orthogonal' indexing). > I think it should be DOA, except as a discussion topic for numpy 3000. I think there are two proposals here: 1) Add some syntactic sugar to allow orthogonal indexing of numpy arrays, no backward compatibility break. That seems like a very good idea to me - were there any big objections to that? 2) Over some long time period, move the default behavior of np.array non-boolean indexing from the current behavior to the orthogonal behavior. That is going to be very tough, because it will cause very confusing breakage of legacy code. On the other hand, maybe it is worth going some way towards that, like this: * implement orthogonal indexing as a method arr.sensible_index[...] * implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...] * deprecate non-boolean fancy indexing as standard arr[...] indexing; * wait a long time; * remove non-boolean fancy indexing as standard arr[...] (errors are preferable to change in behavior) Then if we are brave we could: * wait a very long time; * make orthogonal indexing the default. But the not-brave steps above seem less controversial, and fairly reasonable. What about that as an approach? Cheers, Matthew From josef.pktd at gmail.com Thu Apr 2 23:18:35 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Apr 2015 23:18:35 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 10:30 PM, Matthew Brett wrote: > Hi, > > On Thu, Apr 2, 2015 at 6:09 PM, wrote: >> On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: >>> On 2015/04/02 1:14 PM, Hanno Klemm wrote: >>>> Well, I have written quite a bit of code that relies on fancy >>>> indexing, and I think the question, if the behaviour of the [] >>>> operator should be changed has sailed with numpy now at version 1.9. >>>> Given the amount packages that rely on numpy, changing this >>>> fundamental behaviour would not be a clever move. >>> >>> Are you *positive* that there is no clever way to make a transition? >>> It's not worth any further thought? >> >> I guess it would be similar to python 3 string versus bytes, but >> without the overwhelming benefits. >> >> I don't think I would be in favor of deprecating fancy indexing even >> if it were possible. In general, my impression is that if there is a >> trade-off in numpy between powerful machinery versus easy to learn and >> teach, then the design philosophy when in favor of power. 
>> >> I think numpy indexing is not too difficult and follows a consistent >> pattern, and I completely avoid mixing slices and index arrays with >> ndim > 2. > > I'm sure y'all are totally on top of this, but for myself, I would > like to distinguish: > > * fancy indexing with boolean arrays - I use it all the time and don't > get confused; > * fancy indexing with non-boolean arrays - horrendously confusing, > almost never use it, except on a single axis when I can't confuse it > with orthogonal indexing: > > In [3]: a = np.arange(24).reshape(6, 4) > > In [4]: a > Out[4]: > array([[ 0, 1, 2, 3], > [ 4, 5, 6, 7], > [ 8, 9, 10, 11], > [12, 13, 14, 15], > [16, 17, 18, 19], > [20, 21, 22, 23]]) > > In [5]: a[[1, 2, 4]] > Out[5]: > array([[ 4, 5, 6, 7], > [ 8, 9, 10, 11], > [16, 17, 18, 19]]) > > I also remember a discussion with Travis O where he was also saying > that this indexing was confusing and that it would be good if there > was some way to transition to what he called outer product indexing (I > think that's the same as 'orthogonal' indexing). > >> I think it should be DOA, except as a discussion topic for numpy 3000. > > I think there are two proposals here: > > 1) Add some syntactic sugar to allow orthogonal indexing of numpy > arrays, no backward compatibility break. > > That seems like a very good idea to me - were there any big objections to that? > > 2) Over some long time period, move the default behavior of np.array > non-boolean indexing from the current behavior to the orthogonal > behavior. > > That is going to be very tough, because it will cause very confusing > breakage of legacy code. > > On the other hand, maybe it is worth going some way towards that, like this: > > * implement orthogonal indexing as a method arr.sensible_index[...] > * implement the current non-boolean fancy indexing behavior as a > method - arr.crazy_index[...] > * deprecate non-boolean fancy indexing as standard arr[...] indexing; > * wait a long time; > * remove non-boolean fancy indexing as standard arr[...] (errors are > preferable to change in behavior) > > Then if we are brave we could: > > * wait a very long time; > * make orthogonal indexing the default. > > But the not-brave steps above seem less controversial, and fairly reasonable. > > What about that as an approach? I also thought the transition would have to be something like that or a clear break point, like numpy 3.0. I would be in favor something like this for the axis swapping case with ndim>2. However, before going to that, you would still have to provide a list of behaviors that will be deprecated, and make a poll in various libraries for how much it is actually used. My impression is that fancy indexing is used more often than orthogonal indexing (beyond the trivial case x[:, idx]). Also, many usecases for orthogonal indexing moved to using pandas, and numpy is left with non-orthogonal indexing use cases. And third, fancy indexing is a superset of orthogonal indexing (with proper broadcasting), and you still need to justify why everyone should be restricted to the subset instead of a voluntary constraint to use code that is easier to understand. I checked numpy.random.choice which I would have implemented with fancy indexing, but it uses only `take`, AFAICS. Switching to using a explicit method is not really a problem for maintained library code, but I still don't really see why we should do this. 
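(For reference, the "superset with proper broadcasting" point is exactly what np.ix_ exploits already -- a sketch with throwaway example arrays:

>>> x = np.arange(20).reshape(4, 5)
>>> rows, cols = [0, 2], [1, 3, 4]
>>> np.ix_(rows, cols)                    # index arrays shaped (2, 1) and (1, 3)
(array([[0],
       [2]]), array([[1, 3, 4]]))
>>> x[np.ix_(rows, cols)]                 # orthogonal selection done via fancy indexing
array([[ 1,  3,  4],
       [11, 13, 14]])
>>> x[np.array(rows)[:, None], cols]      # same thing with the broadcasting written out
array([[ 1,  3,  4],
       [11, 13, 14]])

so an orthogonal indexer can in principle be a thin wrapper over the existing fancy-indexing machinery.)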
Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From jaime.frio at gmail.com Thu Apr 2 23:20:06 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 2 Apr 2015 20:20:06 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett wrote: > Hi, > > On Thu, Apr 2, 2015 at 6:09 PM, wrote: > > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: > >> On 2015/04/02 1:14 PM, Hanno Klemm wrote: > >>> Well, I have written quite a bit of code that relies on fancy > >>> indexing, and I think the question, if the behaviour of the [] > >>> operator should be changed has sailed with numpy now at version 1.9. > >>> Given the amount packages that rely on numpy, changing this > >>> fundamental behaviour would not be a clever move. > >> > >> Are you *positive* that there is no clever way to make a transition? > >> It's not worth any further thought? > > > > I guess it would be similar to python 3 string versus bytes, but > > without the overwhelming benefits. > > > > I don't think I would be in favor of deprecating fancy indexing even > > if it were possible. In general, my impression is that if there is a > > trade-off in numpy between powerful machinery versus easy to learn and > > teach, then the design philosophy when in favor of power. > > > > I think numpy indexing is not too difficult and follows a consistent > > pattern, and I completely avoid mixing slices and index arrays with > > ndim > 2. > > I'm sure y'all are totally on top of this, but for myself, I would > like to distinguish: > > * fancy indexing with boolean arrays - I use it all the time and don't > get confused; > * fancy indexing with non-boolean arrays - horrendously confusing, > almost never use it, except on a single axis when I can't confuse it > with orthogonal indexing: > > In [3]: a = np.arange(24).reshape(6, 4) > > In [4]: a > Out[4]: > array([[ 0, 1, 2, 3], > [ 4, 5, 6, 7], > [ 8, 9, 10, 11], > [12, 13, 14, 15], > [16, 17, 18, 19], > [20, 21, 22, 23]]) > > In [5]: a[[1, 2, 4]] > Out[5]: > array([[ 4, 5, 6, 7], > [ 8, 9, 10, 11], > [16, 17, 18, 19]]) > > I also remember a discussion with Travis O where he was also saying > that this indexing was confusing and that it would be good if there > was some way to transition to what he called outer product indexing (I > think that's the same as 'orthogonal' indexing). > > > I think it should be DOA, except as a discussion topic for numpy 3000. > > I think there are two proposals here: > > 1) Add some syntactic sugar to allow orthogonal indexing of numpy > arrays, no backward compatibility break. > > That seems like a very good idea to me - were there any big objections to > that? > > 2) Over some long time period, move the default behavior of np.array > non-boolean indexing from the current behavior to the orthogonal > behavior. > > That is going to be very tough, because it will cause very confusing > breakage of legacy code. > > On the other hand, maybe it is worth going some way towards that, like > this: > > * implement orthogonal indexing as a method arr.sensible_index[...] 
> * implement the current non-boolean fancy indexing behavior as a > method - arr.crazy_index[...] > * deprecate non-boolean fancy indexing as standard arr[...] indexing; > * wait a long time; > * remove non-boolean fancy indexing as standard arr[...] (errors are > preferable to change in behavior) > > Then if we are brave we could: > > * wait a very long time; > * make orthogonal indexing the default. > > But the not-brave steps above seem less controversial, and fairly > reasonable. > > What about that as an approach? > Your option 1 was what was being discussed before the posse was assembled to bring fancy indexing before justice... ;-) My background is in image processing, and I have used fancy indexing in all its fanciness far more often than orthogonal or outer product indexing. I actually have a vivid memory of the moment I fell in love with NumPy: after seeing a code snippet that ran a huge image through a look-up table by indexing the LUT with the image. Beautifully simple. And here is a younger me, learning to ride NumPy without the training wheels. Another obvious use case that you can find all over the place in scikit-image is drawing a curve on an image from the coordinates. If there is such strong agreement on an orthogonal indexer, we might as well go ahead an implement it. But before considering any bolder steps, we should probably give it a couple of releases to see how many people out there really use it. Jaime P.S. As an aside on the remapping of axes when arrays and slices are mixed, there really is no better way. Once you realize that the array indexing a dimension does not have to be 1-D, it should clearly appear that what seems the obvious way does not generalize to the general case. E.g.: One may rightfully think that: >>> a = np.arange(60).reshape(3, 4, 5) >>> a[np.array([1])[:, None], ::2, [0, 1, 3]].shape (1, 3, 2) should not reorder the axes, and return an array of shape (1, 2, 3). But what do you do in the following case? >>> idx0 = np.random.randint(3, size=(10, 1, 10)) >>> idx2 = np.random.randint(5, size=(1, 20, 1)) >>> a[idx0, ::2, idx2].shape (10, 20, 10, 2) What is the right place for that 2 now? -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Apr 2 23:29:11 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 2 Apr 2015 20:29:11 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: Hi, On Thu, Apr 2, 2015 at 8:20 PM, Jaime Fern?ndez del R?o wrote: > On Thu, Apr 2, 2015 at 7:30 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Thu, Apr 2, 2015 at 6:09 PM, wrote: >> > On Thu, Apr 2, 2015 at 8:02 PM, Eric Firing wrote: >> >> On 2015/04/02 1:14 PM, Hanno Klemm wrote: >> >>> Well, I have written quite a bit of code that relies on fancy >> >>> indexing, and I think the question, if the behaviour of the [] >> >>> operator should be changed has sailed with numpy now at version 1.9. >> >>> Given the amount packages that rely on numpy, changing this >> >>> fundamental behaviour would not be a clever move. >> >> >> >> Are you *positive* that there is no clever way to make a transition? >> >> It's not worth any further thought? 
>> > >> > I guess it would be similar to python 3 string versus bytes, but >> > without the overwhelming benefits. >> > >> > I don't think I would be in favor of deprecating fancy indexing even >> > if it were possible. In general, my impression is that if there is a >> > trade-off in numpy between powerful machinery versus easy to learn and >> > teach, then the design philosophy when in favor of power. >> > >> > I think numpy indexing is not too difficult and follows a consistent >> > pattern, and I completely avoid mixing slices and index arrays with >> > ndim > 2. >> >> I'm sure y'all are totally on top of this, but for myself, I would >> like to distinguish: >> >> * fancy indexing with boolean arrays - I use it all the time and don't >> get confused; >> * fancy indexing with non-boolean arrays - horrendously confusing, >> almost never use it, except on a single axis when I can't confuse it >> with orthogonal indexing: >> >> In [3]: a = np.arange(24).reshape(6, 4) >> >> In [4]: a >> Out[4]: >> array([[ 0, 1, 2, 3], >> [ 4, 5, 6, 7], >> [ 8, 9, 10, 11], >> [12, 13, 14, 15], >> [16, 17, 18, 19], >> [20, 21, 22, 23]]) >> >> In [5]: a[[1, 2, 4]] >> Out[5]: >> array([[ 4, 5, 6, 7], >> [ 8, 9, 10, 11], >> [16, 17, 18, 19]]) >> >> I also remember a discussion with Travis O where he was also saying >> that this indexing was confusing and that it would be good if there >> was some way to transition to what he called outer product indexing (I >> think that's the same as 'orthogonal' indexing). >> >> > I think it should be DOA, except as a discussion topic for numpy 3000. >> >> I think there are two proposals here: >> >> 1) Add some syntactic sugar to allow orthogonal indexing of numpy >> arrays, no backward compatibility break. >> >> That seems like a very good idea to me - were there any big objections to >> that? >> >> 2) Over some long time period, move the default behavior of np.array >> non-boolean indexing from the current behavior to the orthogonal >> behavior. >> >> That is going to be very tough, because it will cause very confusing >> breakage of legacy code. >> >> On the other hand, maybe it is worth going some way towards that, like >> this: >> >> * implement orthogonal indexing as a method arr.sensible_index[...] >> * implement the current non-boolean fancy indexing behavior as a >> method - arr.crazy_index[...] >> * deprecate non-boolean fancy indexing as standard arr[...] indexing; >> * wait a long time; >> * remove non-boolean fancy indexing as standard arr[...] (errors are >> preferable to change in behavior) >> >> Then if we are brave we could: >> >> * wait a very long time; >> * make orthogonal indexing the default. >> >> But the not-brave steps above seem less controversial, and fairly >> reasonable. >> >> What about that as an approach? > > > Your option 1 was what was being discussed before the posse was assembled to > bring fancy indexing before justice... ;-) Yes, sorry - I was trying to bring the argument back there. > My background is in image processing, and I have used fancy indexing in all > its fanciness far more often than orthogonal or outer product indexing. I > actually have a vivid memory of the moment I fell in love with NumPy: after > seeing a code snippet that ran a huge image through a look-up table by > indexing the LUT with the image. Beautifully simple. And here is a younger > me, learning to ride NumPy without the training wheels. 
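(The look-up table trick being described is roughly the following -- the array names and sizes are only for illustration:

>>> image = np.random.randint(0, 256, size=(480, 640)).astype(np.uint8)
>>> lut = (255 - np.arange(256)).astype(np.uint8)   # e.g. invert the intensities
>>> inverted = lut[image]                           # index the LUT with the whole image
>>> inverted.shape
(480, 640)

the entire image acts as the index array, and the result keeps the image's shape.)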
> > Another obvious use case that you can find all over the place in > scikit-image is drawing a curve on an image from the coordinates. No question at all that it does have its uses - but then again, no-one thinks that it should not be available, only, maybe, in the very far future, not what you get by default... Cheers, Matthew From njs at pobox.com Thu Apr 2 23:30:02 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 2 Apr 2015 20:30:02 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 6:35 PM, wrote: > (I thought about this because I was looking at accessing off-diagonal > elements, m2[np.arange(4), np.arange(4) + 1] ) Psst: np.diagonal(m2, offset=1) -- Nathaniel J. Smith -- http://vorpus.org From josef.pktd at gmail.com Thu Apr 2 23:42:54 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 2 Apr 2015 23:42:54 -0400 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Thu, Apr 2, 2015 at 11:30 PM, Nathaniel Smith wrote: > On Thu, Apr 2, 2015 at 6:35 PM, wrote: >> (I thought about this because I was looking at accessing off-diagonal >> elements, m2[np.arange(4), np.arange(4) + 1] ) > > Psst: np.diagonal(m2, offset=1) It was just an example (banded or toeplitz) (I know how indexing works, kind off, but don't remember what diag or other functions are exactly doing.) >>> m2b = m2.copy() >>> m2b[np.arange(4), np.arange(4) + 1] array([ 1., 7., 13., 19.]) >>> m2b[np.arange(4), np.arange(4) + 1] = np.nan >>> m2b array([[ 0., nan, 2., 3., 4.], [ 0., 6., nan, 8., 9.], [ 0., 0., 12., nan, 14.], [ 0., 0., 0., 18., nan], [ 0., 0., 0., 0., 24.]]) >>> m2c = m2.copy() >>> np.diagonal(m2c, offset=1) = np.nan SyntaxError: can't assign to function call >>> dd = np.diagonal(m2c, offset=1) >>> dd[:] = np.nan Traceback (most recent call last): File "", line 1, in dd[:] = np.nan ValueError: assignment destination is read-only >>> np.__version__ '1.9.2rc1' >>> m2d = m2.copy() >>> m2d[np.arange(4)[::-1], np.arange(4) + 1] = np.nan Josef > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Fri Apr 3 04:32:02 2015 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 03 Apr 2015 11:32:02 +0300 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: 03.04.2015, 04:09, josef.pktd at gmail.com kirjoitti: [clip] > I think numpy indexing is not too difficult and follows a consistent > pattern, and I completely avoid mixing slices and index arrays with > ndim > 2. > > I think it should be DOA, except as a discussion topic for numpy 3000. If you change how Numpy indexing works, you need to scrap a nontrivial amount of existing code, at which point everybody should just go back to Matlab, which at least provides a stable API. 
From jaime.frio at gmail.com Fri Apr 3 13:59:21 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 3 Apr 2015 10:59:21 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: I have an all-Pyhton implementation of an OrthogonalIndexer class, loosely based on Stephan's code plus some axis remapping, that provides all the needed functionality for getting and setting with orthogonal indices. Would those interested rather see it as a gist to play around with, or as a PR adding an orthogonally indexable `.ix_` argument to ndarray? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From shoyer at gmail.com Fri Apr 3 18:31:51 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 3 Apr 2015 15:31:51 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > I have an all-Pyhton implementation of an OrthogonalIndexer class, loosely > based on Stephan's code plus some axis remapping, that provides all the > needed functionality for getting and setting with orthogonal indices. > Awesome, thanks! > Would those interested rather see it as a gist to play around with, or as > a PR adding an orthogonally indexable `.ix_` argument to ndarray? > My preference would be for a PR (even if it's purely a prototype) because it supports inline comments better than a gist. Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Fri Apr 3 18:49:55 2015 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 03 Apr 2015 12:49:55 -1000 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: <551F1913.1020701@hawaii.edu> On 2015/04/03 7:59 AM, Jaime Fern?ndez del R?o wrote: > I have an all-Pyhton implementation of an OrthogonalIndexer class, > loosely based on Stephan's code plus some axis remapping, that provides > all the needed functionality for getting and setting with orthogonal > indices. Excellent! > > Would those interested rather see it as a gist to play around with, or > as a PR adding an orthogonally indexable `.ix_` argument to ndarray? I think the PR would be easier to test. Eric > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Fri Apr 3 19:54:25 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 3 Apr 2015 16:54:25 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. 
orthogonal In-Reply-To: References: Message-ID: On Apr 1, 2015 2:17 AM, "R Hattersley" wrote: > > There are two different interpretations in common use of how to handle multi-valued (array/sequence) indexes. The numpy style is to consider all multi-valued indices together which allows arbitrary points to be extracted. The orthogonal style (e.g. as provided by netcdf4-python) is to consider each multi-valued index independently. > > For example: > > >>> type(v) > > >>> v.shape > (240, 37, 49) > >>> v[(0, 1), (0, 2, 3)].shape > (2, 3, 49) > >>> np.array(v)[(0, 1), (0, 2, 3)].shape > Traceback (most recent call last): > File "", line 1, in > IndexError: shape mismatch: indexing arrays could not be broadcast together with shapes (2,) (3,) > > > In a netcdf4-python GitHub issue the authors of various orthogonal indexing packages have been discussing how to distinguish the two behaviours and have currently settled on a boolean __orthogonal_indexing__ attribute. I guess my feeling is that this attribute is a fine solution to the wrong problem. If I understand the situation correctly: users are writing two copies of their indexing code to handle two different array-duck-types (those that do broadcasting indexing and those that do Cartesian product indexing), and then have trouble knowing which set of code to use for a given object. The problem that __orthogonal_indexing__ solves is that it makes easier to decide which code to use. It works well for this, great. But, the real problem here is that we have two different array duck types that force everyone to write their code twice. This is a terrible state of affairs! (And exactly analogous to the problems caused by np.ndarray disagreeing with np.matrix & scipy.sparse about the the proper definition of *, which PEP 465 may eventually alleviate.) IMO we should be solving this indexing problem directly, not applying bandaids to its symptoms, and the way to do that is to come up with some common duck type that everyone can agree on. Unfortunately, AFAICT this means our only options here are to have some kind of backcompat break in numpy, some kind of backcompat break in pandas, or to do nothing and continue indefinitely with the status quo where the same indexing operation might silently return different results depending on the types passed in. All of these options have real costs for users, and it isn't at all clear to me what the relative costs will be when we dig into the details of our various options. So I'd be very happy to see worked out proposals for any or all of these approaches. It strikes me as really premature to be issuing proclamations about what changes might be considered. There is really no danger to *considering* a proposal; the worst case is that we end up rejecting it anyway, but based on better information. -n From shoyer at gmail.com Fri Apr 3 20:01:50 2015 From: shoyer at gmail.com (Stephan Hoyer) Date: Fri, 3 Apr 2015 17:01:50 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Fri, Apr 3, 2015 at 4:54 PM, Nathaniel Smith wrote: > Unfortunately, AFAICT this means our only options here are to have > some kind of backcompat break in numpy, some kind of backcompat break > in pandas, or to do nothing and continue indefinitely with the status > quo where the same indexing operation might silently return different > results depending on the types passed in. For what it's worth, DataFrame.__getitem__ is also pretty broken in pandas (even worse than in NumPy). 
Not even the pandas devs can keep straight how it works! https://github.com/pydata/pandas/issues/9595 So we'll probably need a backwards incompatible switch there at some point, too. That said, the issues are somewhat different, and in my experience the strict label and integer based indexers .loc and .iloc work pretty well. I haven't heard any complaints about how they do cartesian indexing rather than fancy indexing. Stephan -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 3 23:00:38 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Apr 2015 21:00:38 -0600 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING Message-ID: Hi All, Just to raise the question if these two options should be removed at some point? The current default value for both is 0, so we have separate compilation and relaxed strides checking by default. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Apr 3 23:01:24 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Apr 2015 21:01:24 -0600 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: Message-ID: On Fri, Apr 3, 2015 at 9:00 PM, Charles R Harris wrote: > Hi All, > > Just to raise the question if these two options should be removed at some > point? The current default value for both is 0, so we have separate > compilation and relaxed strides checking by default. > > Oops, default value is 1, not 0. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Fri Apr 3 23:25:29 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 3 Apr 2015 20:25:29 -0700 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: Message-ID: IIRC there allegedly exist platforms where separate compilation doesn't work right? I'm happy to get rid of it if no one speaks up to defend such platforms, though, we can always add it back later. One case was for statically linking numpy into the interpreter, but I'm skeptical about how much we should care about that case, since that's already a hacky kind of process and there are simple alternative hacks that could be used to strip the offending symbols. Depends on how much it lets us simplify things, I guess. Would we get to remove all the no-export attributes on everything? On Apr 3, 2015 8:01 PM, "Charles R Harris" wrote: > > > On Fri, Apr 3, 2015 at 9:00 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> Just to raise the question if these two options should be removed at some >> point? The current default value for both is 0, so we have separate >> compilation and relaxed strides checking by default. >> >> > Oops, default value is 1, not 0. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 4 03:17:19 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 4 Apr 2015 09:17:19 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. 
orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: > > But, the real problem here is that we have two different array duck > types that force everyone to write their code twice. This is a > terrible state of affairs! (And exactly analogous to the problems > caused by np.ndarray disagreeing with np.matrix & scipy.sparse about > the the proper definition of *, which PEP 465 may eventually > alleviate.) IMO we should be solving this indexing problem directly, > not applying bandaids to its symptoms, and the way to do that is to > come up with some common duck type that everyone can agree on. > > Unfortunately, AFAICT this means our only options here are to have > some kind of backcompat break in numpy, some kind of backcompat break > in pandas, or to do nothing and continue indefinitely with the status > quo where the same indexing operation might silently return different > results depending on the types passed in. All of these options have > real costs for users, and it isn't at all clear to me what the > relative costs will be when we dig into the details of our various > options. I doubt that there is a reasonable way to quantify those costs, especially those of breaking backwards compatibility. If someone has a good method, I'd be interested though. > So I'd be very happy to see worked out proposals for any or > all of these approaches. It strikes me as really premature to be > issuing proclamations about what changes might be considered. There is > really no danger to *considering* a proposal; Sorry, I have to disagree. Numpy is already seen by some as having a poor track record on backwards compatibility. Having core developers say "propose some backcompat break to how indexing works and we'll consider it" makes our stance on that look even worse. Of course everyone is free to make any technical proposal they deem fit and we'll consider the merits of it. However I'd like us to be clear that we do care strongly about backwards compatibility and that the fundamentals of the core of Numpy (things like indexing, broadcasting, dtypes and ufuncs) will not be changed in backwards-incompatible ways. Ralf P.S. also not for a possible numpy 2.0 (or have we learned nothing from Python3?). -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Apr 4 04:54:33 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Apr 2015 01:54:33 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers wrote: > > > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: >> >> >> But, the real problem here is that we have two different array duck >> types that force everyone to write their code twice. This is a >> terrible state of affairs! (And exactly analogous to the problems >> caused by np.ndarray disagreeing with np.matrix & scipy.sparse about >> the the proper definition of *, which PEP 465 may eventually >> alleviate.) IMO we should be solving this indexing problem directly, >> not applying bandaids to its symptoms, and the way to do that is to >> come up with some common duck type that everyone can agree on. 
>> >> Unfortunately, AFAICT this means our only options here are to have >> some kind of backcompat break in numpy, some kind of backcompat break >> in pandas, or to do nothing and continue indefinitely with the status >> quo where the same indexing operation might silently return different >> results depending on the types passed in. All of these options have >> real costs for users, and it isn't at all clear to me what the >> relative costs will be when we dig into the details of our various >> options. > > > I doubt that there is a reasonable way to quantify those costs, especially > those of breaking backwards compatibility. If someone has a good method, I'd > be interested though. I'm a little nervous about how easily this argument might turn into "either A or B is better but we can't be 100% *certain* which it is so instead of doing our best using the data available we should just choose B." Being a maintainer means accepting uncertainty and doing our best anyway. But that said I'm still totally on board with erring on the side of caution (in particular, you can never go back and *un*break backcompat). An obvious challenge to anyone trying to take this forward (in any direction!) would definitely be to gather the most useful data possible. And it's not obviously impossible -- maybe one could do something useful by scanning ASTs of lots of packages (I have a copy of pypi if anyone wants it, that I downloaded with the idea of making some similar arguments for why core python should slightly break backcompat to allow overloading of a < b < c syntax), or adding instrumentation to numpy, or running small-scale usability tests, or surveying people, or ... (I was pretty surprised by some of the data gathered during the PEP 465 process, e.g. on how common dot() calls are relative to existing built-in operators, and on its associativity in practice.) >> >> So I'd be very happy to see worked out proposals for any or >> all of these approaches. It strikes me as really premature to be >> issuing proclamations about what changes might be considered. There is >> really no danger to *considering* a proposal; > > > Sorry, I have to disagree. Numpy is already seen by some as having a poor > track record on backwards compatibility. Having core developers say "propose > some backcompat break to how indexing works and we'll consider it" makes our > stance on that look even worse. Of course everyone is free to make any > technical proposal they deem fit and we'll consider the merits of it. > However I'd like us to be clear that we do care strongly about backwards > compatibility and that the fundamentals of the core of Numpy (things like > indexing, broadcasting, dtypes and ufuncs) will not be changed in > backwards-incompatible ways. > > Ralf > > P.S. also not for a possible numpy 2.0 (or have we learned nothing from > Python3?). I agree 100% that we should and do care strongly about backwards compatibility. But you're saying in one sentence that we should tell people that we won't consider backcompat breaks, and then in the next sentence that of course we actually will consider them (even if we almost always reject them). Basically, I think saying one thing and doing another is not a good way to build people's trust. Core python broke backcompat on a regular basis throughout the python 2 series, and almost certainly will again -- the bar to doing so is *very* high, and they use elaborate mechanisms to ease the way (__future__, etc.), but they do it. 
A few months ago there was even some serious consideration given to changing py3 bytestring indexing to return bytestrings instead of integers. (Consensus was unsurprisingly that this was a bad idea, but there were core devs seriously exploring it, and no-one complained about the optics.) It's true that numpy has something of a bad reputation in this area, and I think it's because until ~1.7 or so, we randomly broke stuff by accident on a pretty regular basis, even in "bug fix" releases. I think the way to rebuild that trust is to honestly say to our users that when we do break backcompat, we will never do it by accident, and we will do it only rarely, after careful consideration, with the smoothest transition possible, only in situations where we are convinced that it the net best possible solution for our users, and only after public discussion and getting buy-in from stakeholders (e.g. major projects affected). And then follow through on that to the best of our ability. We've certainly gotten a lot better at this over the last few years. If we say we'll *never* break backcompat then we'll inevitably end up convincing some people that we're liars, just because one person's bugfix is another's backcompat break. (And they're right, it is a backcompat break; it's just one where the benefits of the fix obviously outweigh the cost of the break.) Or we could actually avoid breaking backcompat by descending into Knuth-style stasis... but even there notice that none of us are actually using Knuth's TeX, we all use forks like XeTeX that have further changes added, which goes to show how futile this would be. In particular, I'd *not* willingly say that we'll never incompatibly change the core pieces of numpy, b/c I'm personally convinced that rewriting how e.g. dtypes work could be a huge win with minimal real-world breakage -- even though technically there's practically nothing we can touch there without breaking backcompat to some extent b/c dtype structs are all public, including even silly things like the ad hoc, barely-used refcounting system. OTOH I'm happy to say that we won't incompatibly change the core of how dtypes work except in ways that make the userbase glad that we did. How's that? :-) -n -- Nathaniel J. Smith -- http://vorpus.org From robert.kern at gmail.com Sat Apr 4 05:15:47 2015 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 4 Apr 2015 10:15:47 +0100 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith wrote: > > On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers wrote: > > > > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: > >> So I'd be very happy to see worked out proposals for any or > >> all of these approaches. It strikes me as really premature to be > >> issuing proclamations about what changes might be considered. There is > >> really no danger to *considering* a proposal; > > > > Sorry, I have to disagree. Numpy is already seen by some as having a poor > > track record on backwards compatibility. Having core developers say "propose > > some backcompat break to how indexing works and we'll consider it" makes our > > stance on that look even worse. Of course everyone is free to make any > > technical proposal they deem fit and we'll consider the merits of it. 
> > However I'd like us to be clear that we do care strongly about backwards > > compatibility and that the fundamentals of the core of Numpy (things like > > indexing, broadcasting, dtypes and ufuncs) will not be changed in > > backwards-incompatible ways. > > > > Ralf > > > > P.S. also not for a possible numpy 2.0 (or have we learned nothing from > > Python3?). > > I agree 100% that we should and do care strongly about backwards > compatibility. But you're saying in one sentence that we should tell > people that we won't consider backcompat breaks, and then in the next > sentence that of course we actually will consider them (even if we > almost always reject them). Basically, I think saying one thing and > doing another is not a good way to build people's trust. There is a difference between politely considering what proposals people send us uninvited and inviting people to work on specific proposals. That is what Ralf was getting at. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Apr 4 05:38:48 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Apr 2015 02:38:48 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 2:15 AM, Robert Kern wrote: > On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith wrote: >> >> On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers >> wrote: >> > >> > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: > >> >> So I'd be very happy to see worked out proposals for any or >> >> all of these approaches. It strikes me as really premature to be >> >> issuing proclamations about what changes might be considered. There is >> >> really no danger to *considering* a proposal; >> > >> > Sorry, I have to disagree. Numpy is already seen by some as having a >> > poor >> > track record on backwards compatibility. Having core developers say >> > "propose >> > some backcompat break to how indexing works and we'll consider it" makes >> > our >> > stance on that look even worse. Of course everyone is free to make any >> > technical proposal they deem fit and we'll consider the merits of it. >> > However I'd like us to be clear that we do care strongly about backwards >> > compatibility and that the fundamentals of the core of Numpy (things >> > like >> > indexing, broadcasting, dtypes and ufuncs) will not be changed in >> > backwards-incompatible ways. >> > >> > Ralf >> > >> > P.S. also not for a possible numpy 2.0 (or have we learned nothing from >> > Python3?). >> >> I agree 100% that we should and do care strongly about backwards >> compatibility. But you're saying in one sentence that we should tell >> people that we won't consider backcompat breaks, and then in the next >> sentence that of course we actually will consider them (even if we >> almost always reject them). Basically, I think saying one thing and >> doing another is not a good way to build people's trust. > > There is a difference between politely considering what proposals people > send us uninvited and inviting people to work on specific proposals. That is > what Ralf was getting at. I mean, I get that Ralf read my bit quoted above and got worried that people would read it as "numpy core team announces they don't care about backcompat", which is fair enough. Sometimes people jump to all kinds of conclusions, esp. when confirmation bias meets skim-reading meets hastily-written emails. 
But it's just not true that I read people's proposals out of politeness; I read them because I'm interested, because they might surprise us by being more practical/awesome/whatever than we expect, and because we all learn things by giving them due consideration regardless of the final outcome. So yeah, I do honestly do want to see people work on specific proposals for important problems (and this indexing thing strikes me as important), even proposals that involve breaking backcompat. Pretending otherwise would still be a lie, at least on my part. So the distinction you're making here doesn't help me much. -n -- Nathaniel J. Smith -- http://vorpus.org From toddrjen at gmail.com Sat Apr 4 07:11:56 2015 From: toddrjen at gmail.com (Todd) Date: Sat, 4 Apr 2015 13:11:56 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Apr 4, 2015 10:54 AM, "Nathaniel Smith" wrote: > > On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers wrote: > > > > > > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith wrote: > >> > >> > >> But, the real problem here is that we have two different array duck > >> types that force everyone to write their code twice. This is a > >> terrible state of affairs! (And exactly analogous to the problems > >> caused by np.ndarray disagreeing with np.matrix & scipy.sparse about > >> the the proper definition of *, which PEP 465 may eventually > >> alleviate.) IMO we should be solving this indexing problem directly, > >> not applying bandaids to its symptoms, and the way to do that is to > >> come up with some common duck type that everyone can agree on. > >> > >> Unfortunately, AFAICT this means our only options here are to have > >> some kind of backcompat break in numpy, some kind of backcompat break > >> in pandas, or to do nothing and continue indefinitely with the status > >> quo where the same indexing operation might silently return different > >> results depending on the types passed in. All of these options have > >> real costs for users, and it isn't at all clear to me what the > >> relative costs will be when we dig into the details of our various > >> options. > > > > > > I doubt that there is a reasonable way to quantify those costs, especially > > those of breaking backwards compatibility. If someone has a good method, I'd > > be interested though. > > I'm a little nervous about how easily this argument might turn into > "either A or B is better but we can't be 100% *certain* which it is so > instead of doing our best using the data available we should just > choose B." Being a maintainer means accepting uncertainty and doing > our best anyway. I think the burden of proof needs to be on the side proposing a change, and the more invasive the change the higher that burden needs to be. When faced with a situation like this, where the proposed change will cause fundamental alterations to the most basic, high-level operation of numpy, and where the is an alternative approach with no backwards-compatibility issues, I think the burden of proof would necessarily be nearly impossibly large. > But that said I'm still totally on board with erring on the side of > caution (in particular, you can never go back and *un*break > backcompat). An obvious challenge to anyone trying to take this > forward (in any direction!) would definitely be to gather the most > useful data possible. 
And it's not obviously impossible -- maybe one > could do something useful by scanning ASTs of lots of packages (I have > a copy of pypi if anyone wants it, that I downloaded with the idea of > making some similar arguments for why core python should slightly > break backcompat to allow overloading of a < b < c syntax), or adding > instrumentation to numpy, or running small-scale usability tests, or > surveying people, or ... > > (I was pretty surprised by some of the data gathered during the PEP > 465 process, e.g. on how common dot() calls are relative to existing > built-in operators, and on its associativity in practice.) Surveys like this have the problem of small sample size and selection bias. Usability studies can't measure the effect of the compatibility break, not to mention the effect on numpy's reputation. This is considerably more difficult to scan existing projects for than .dot because it depends on the type being passed (which may not even be defined in the same project). And I am not sure I much like the idea of numpy "phoning home" by default, and an opt-in had the same issues as a survey. So to make a long story short, in this sort of situation I have a hard time imaging ways to get enough reliable, representative data to justify this level of backwards compatibility break. > Core python broke backcompat on a regular basis throughout the python > 2 series, and almost certainly will again -- the bar to doing so is > *very* high, and they use elaborate mechanisms to ease the way > (__future__, etc.), but they do it. A few months ago there was even > some serious consideration given to changing py3 bytestring indexing > to return bytestrings instead of integers. (Consensus was > unsurprisingly that this was a bad idea, but there were core devs > seriously exploring it, and no-one complained about the optics.) There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. It would be better to have a new language, or in this case anew project. > It's true that numpy has something of a bad reputation in this area, > and I think it's because until ~1.7 or so, we randomly broke stuff by > accident on a pretty regular basis, even in "bug fix" releases. I > think the way to rebuild that trust is to honestly say to our users > that when we do break backcompat, we will never do it by accident, and > we will do it only rarely, after careful consideration, with the > smoothest transition possible, only in situations where we are > convinced that it the net best possible solution for our users, and > only after public discussion and getting buy-in from stakeholders > (e.g. major projects affected). And then follow through on that to the > best of our ability. We've certainly gotten a lot better at this over > the last few years. > > If we say we'll *never* break backcompat then we'll inevitably end up > convincing some people that we're liars, just because one person's > bugfix is another's backcompat break. (And they're right, it is a > backcompat break; it's just one where the benefits of the fix > obviously outweigh the cost of the break.) Or we could actually avoid > breaking backcompat by descending into Knuth-style stasis... 
but even > there notice that none of us are actually using Knuth's TeX, we all > use forks like XeTeX that have further changes added, which goes to > show how futile this would be. I think it is fair to say that some things are just too fundamental to what makes numpy numpy that they are off-limits, that people will always be able to count on those working. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 4 09:30:24 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 4 Apr 2015 15:30:24 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 1:11 PM, Todd wrote: > There was no break as large as this. In fact I would say this is even a > larger change than any individual change we saw in the python 2 to 3 > switch. > Well, the impact of what Python3 did to everyone's string handling code caused so much work that it's close to impossible to top that within numpy I'd say:) Ralf The basic mechanics of indexing are just too fundamental and touch on too > many things to make this sort of change feasible. It would be better to > have a new language, or in this case anew project. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 4 09:43:59 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 4 Apr 2015 15:43:59 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 11:38 AM, Nathaniel Smith wrote: > On Sat, Apr 4, 2015 at 2:15 AM, Robert Kern wrote: > > On Sat, Apr 4, 2015 at 9:54 AM, Nathaniel Smith wrote: > >> > >> On Sat, Apr 4, 2015 at 12:17 AM, Ralf Gommers > >> wrote: > >> > > >> > On Sat, Apr 4, 2015 at 1:54 AM, Nathaniel Smith > wrote: > > > >> >> So I'd be very happy to see worked out proposals for any or > >> >> all of these approaches. It strikes me as really premature to be > >> >> issuing proclamations about what changes might be considered. There > is > >> >> really no danger to *considering* a proposal; > >> > > >> > Sorry, I have to disagree. Numpy is already seen by some as having a > >> > poor > >> > track record on backwards compatibility. Having core developers say > >> > "propose > >> > some backcompat break to how indexing works and we'll consider it" > makes > >> > our > >> > stance on that look even worse. Of course everyone is free to make any > >> > technical proposal they deem fit and we'll consider the merits of it. > >> > However I'd like us to be clear that we do care strongly about > backwards > >> > compatibility and that the fundamentals of the core of Numpy (things > >> > like > >> > indexing, broadcasting, dtypes and ufuncs) will not be changed in > >> > backwards-incompatible ways. > >> > > >> > Ralf > >> > > >> > P.S. also not for a possible numpy 2.0 (or have we learned nothing > from > >> > Python3?). > >> > >> I agree 100% that we should and do care strongly about backwards > >> compatibility. But you're saying in one sentence that we should tell > >> people that we won't consider backcompat breaks, and then in the next > >> sentence that of course we actually will consider them (even if we > >> almost always reject them). Basically, I think saying one thing and > >> doing another is not a good way to build people's trust. 
> > > > There is a difference between politely considering what proposals people > > send us uninvited and inviting people to work on specific proposals. > That is > > what Ralf was getting at. > > I mean, I get that Ralf read my bit quoted above and got worried that > people would read it as "numpy core team announces they don't care > about backcompat", which is fair enough. Sometimes people jump to all > kinds of conclusions, esp. when confirmation bias meets skim-reading > meets hastily-written emails. > > But it's just not true that I read people's proposals out of > politeness; I read them because I'm interested, because they might > surprise us by being more practical/awesome/whatever than we expect, > and because we all learn things by giving them due consideration > regardless of the final outcome. Thanks for explaining, good perspective. > So yeah, I do honestly do want to see > people work on specific proposals for important problems (and this > indexing thing strikes me as important), even proposals that involve > breaking backcompat. Pretending otherwise would still be a lie, at > least on my part. So the distinction you're making here doesn't help > me much. > A change in semantics would help already. If you'd phrased it for example as: "I'd personally be interested in seeing a description of what changes, including backwards-incompatible ones, would need to be made to numpy indexing behavior to resolve this situation. We could learn a lot from such an exercise.", that would have invited the same investigation from interested people without creating worries about Numpy stability. And without potentially leading new enthusiastic contributors to believe that this is an opportunity to make an important change to Numpy: >99.9% chance that they'd be disappointed after having their well thought out proposal rejected. Cheers, Ralf > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 4 11:52:23 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 4 Apr 2015 17:52:23 +0200 Subject: [Numpy-discussion] numpy vendor repo Message-ID: Hi, Today I wanted to add something to https://github.com/numpy/vendor and realised that this repo is in pretty bad shape. A couple of years ago Ondrej took a copy of the ATLAS binaries in that repo and started a new repo (not a fork) at https://github.com/certik/numpy-vendor. The latest improvements were made by Julian and live at https://github.com/juliantaylor/numpy-vendor. I'd like to start from numpy/vendor, then add all commits from Julian's numpy-vendor on top of it, then move things around so we have the binaries/sources/tools layout back and finally update the README so it's clear how to build both the ATLAS binaries and Numpy releases. Any objections or better ideas? Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From jakirkham at gmail.com Sat Apr 4 11:52:19 2015 From: jakirkham at gmail.com (John Kirkham) Date: Sat, 4 Apr 2015 11:52:19 -0400 Subject: [Numpy-discussion] Fix masked arrays to properly edit views In-Reply-To: References: Message-ID: Hey Eric, That's a good point. I remember seeing this behavior before and thought it was a bit odd. 
Best, John > On Mar 16, 2015, at 2:20 AM, numpy-discussion-request at scipy.org wrote: > > Send NumPy-Discussion mailing list submissions to > numpy-discussion at scipy.org > > To subscribe or unsubscribe via the World Wide Web, visit > http://mail.scipy.org/mailman/listinfo/numpy-discussion > or, via email, send a message with subject or body 'help' to > numpy-discussion-request at scipy.org > > You can reach the person managing the list at > numpy-discussion-owner at scipy.org > > When replying, please edit your Subject line so it is more specific > than "Re: Contents of NumPy-Discussion digest..." > > > Today's Topics: > > 1. Re: Fix masked arrays to properly edit views (Eric Firing) > 2. Rewrite np.histogram in c? (Robert McGibbon) > 3. numpy.stack -- which function, if any, deserves the name? > (Stephan Hoyer) > 4. Re: Rewrite np.histogram in c? (Jaime Fern?ndez del R?o) > 5. Re: Rewrite np.histogram in c? (Robert McGibbon) > 6. Re: Rewrite np.histogram in c? (Robert McGibbon) > > > ---------------------------------------------------------------------- > > Message: 1 > Date: Sat, 14 Mar 2015 14:01:04 -1000 > From: Eric Firing > Subject: Re: [Numpy-discussion] Fix masked arrays to properly edit > views > To: numpy-discussion at scipy.org > Message-ID: <5504CBC0.1080502 at hawaii.edu> > Content-Type: text/plain; charset=windows-1252; format=flowed > >> On 2015/03/14 1:02 PM, John Kirkham wrote: >> The sample case of the issue ( >> https://github.com/numpy/numpy/issues/5558 ) is shown below. A proposal >> to address this behavior can be found here ( >> https://github.com/numpy/numpy/pull/5580 ). Please give me your feedback. >> >> >> I tried to change the mask of `a` through a subindexed view, but was >> unable. Using this setup I can reproduce this in the 1.9.1 version of NumPy. >> >> import numpy as np >> >> a = np.arange(6).reshape(2,3) >> a = np.ma.masked_array(a, mask=np.ma.getmaskarray(a), shrink=False) >> >> b = a[1:2,1:2] >> >> c = np.zeros(b.shape, b.dtype) >> c = np.ma.masked_array(c, mask=np.ma.getmaskarray(c), shrink=False) >> c[:] = np.ma.masked >> >> This yields what one would expect for `a`, `b`, and `c` (seen below). >> >> masked_array(data = >> [[0 1 2] >> [3 4 5]], >> mask = >> [[False False False] >> [False False False]], >> fill_value = 999999) >> >> masked_array(data = >> [[4]], >> mask = >> [[False]], >> fill_value = 999999) >> >> masked_array(data = >> [[--]], >> mask = >> [[ True]], >> fill_value = 999999) >> >> Now, it would seem reasonable that to copy data into `b` from `c` one >> can use `__setitem__` (seen below). >> >> b[:] = c >> >> This results in new data and mask for `b`. >> >> masked_array(data = >> [[--]], >> mask = >> [[ True]], >> fill_value = 999999) >> >> This should, in turn, change `a`. However, the mask of `a` remains >> unchanged (seen below). >> >> masked_array(data = >> [[0 1 2] >> [3 0 5]], >> mask = >> [[False False False] >> [False False False]], >> fill_value = 999999) > > I agree that this behavior is wrong. 
A related oddity is this: > > In [24]: a = np.arange(6).reshape(2,3) > In [25]: a = np.ma.array(a, mask=np.ma.getmaskarray(a), shrink=False) > In [27]: a.sharedmask > True > In [28]: a.unshare_mask() > In [30]: b = a[1:2, 1:2] > In [31]: b[:] = np.ma.masked > In [32]: b.sharedmask > False > In [33]: a > masked_array(data = > [[0 1 2] > [3 -- 5]], > mask = > [[False False False] > [False True False]], > fill_value = 999999) > > It looks like the sharedmask property simply is not being set and > interpreted correctly--a freshly initialized array has sharedmask True; > and after setting it to False, changing the mask of a new view *does* > change the mask in the original. > > Eric > >> >> Best, >> John >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > ------------------------------ > > Message: 2 > Date: Sun, 15 Mar 2015 21:32:49 -0700 > From: Robert McGibbon > Subject: [Numpy-discussion] Rewrite np.histogram in c? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > Hi, > > Numpy.histogram is implemented in python, and is a little sluggish. This > has been discussed previously on the mailing list, [1, 2]. It came up in a > project that I maintain, where a new feature is bottlenecked by > numpy.histogram, and one developer suggested a faster implementation in > cython [3]. > > Would it make sense to reimplement this function in c? or cython? Is moving > functions like this from python to c to improve performance within the > scope of the development roadmap for numpy? I started implementing this a > little bit in c, [4] but I figured I should check in here first. > > -Robert > > [1] > http://scipy-user.10969.n7.nabble.com/numpy-histogram-is-slow-td17208.html > [2] http://numpy-discussion.10968.n7.nabble.com/Fast-histogram-td9359.html > [3] https://github.com/mdtraj/mdtraj/pull/734 > [4] https://github.com/rmcgibbo/numpy/tree/histogram > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20150315/84ca916d/attachment-0001.html > > ------------------------------ > > Message: 3 > Date: Sun, 15 Mar 2015 22:12:40 -0700 > From: Stephan Hoyer > Subject: [Numpy-discussion] numpy.stack -- which function, if any, > deserves the name? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > In the past months there have been two proposals for new numpy functions > using the name "stack": > > 1. np.stack for stacking like np.asarray(np.bmat(...)) > http://thread.gmane.org/gmane.comp.python.numeric.general/58748/ > https://github.com/numpy/numpy/pull/5057 > > 2. np.stack for stacking along an arbitrary new axis (this was my proposal) > http://thread.gmane.org/gmane.comp.python.numeric.general/59850/ > https://github.com/numpy/numpy/pull/5605 > > Both functions generalize the notion of stacking arrays from the existing > hstack, vstack and dstack, but in two very different ways. Both could be > useful -- but we can only call one "stack". Which one deserves that name? > > The existing *stack functions use the word "stack" to refer to combining > arrays in two similarly different ways: > a. For ND -> ND stacking along an existing dimensions (like > numpy.concatenate and proposal 1) > b. For ND -> (N+1)D stacking along new dimensions (like proposal 2). 
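For concreteness, here is a small sketch of the two operations being distinguished in a. and b. above. It uses only functions available in released NumPy; the proposed np.stack itself is not assumed, and the "new dimension" variant is emulated with expand_dims plus concatenate:

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6, 12).reshape(2, 3)

# (a) combine along an *existing* dimension: 2-D in, 2-D out
np.concatenate([a, b], axis=0).shape                  # (4, 3)
np.vstack([a, b]).shape                               # (4, 3); concatenate-like for 2-D input

# (b) combine along a *new* dimension: 2-D in, 3-D out (what proposal 2 calls "stack")
np.concatenate([np.expand_dims(a, 0),
                np.expand_dims(b, 0)], axis=0).shape  # (2, 2, 3)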
> > I think it would be much cleaner API design if we had different words to > denote these two different operations. Concatenate for "combine along an > existing dimension" already exists, so my thought (when I wrote proposal > 2), was that the verb "stack" could be reserved (going forward) for > "combine along a new dimension." This also has the advantage of suggesting > that "concatenate" and "stack" are the two fundamental operations for > combining N-dimensional arrays. The documentation on this is currently > quite confusing, mostly because no function like that in proposal 2 > currently exists. > > Of course, the *stack functions have existed for quite some time, and in > many cases vstack and hstack are indeed used for concatenate like > functionality (e.g., whenever they are used for 2D arrays/matrices). So the > case is not entirely clear-cut. (We'll never be able to remove this > functionality from NumPy.) > > In any case, I would appreciate your thoughts. > > Best, > Stephan > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20150315/5a72a8bb/attachment-0001.html > > ------------------------------ > > Message: 4 > Date: Sun, 15 Mar 2015 23:00:33 -0700 > From: Jaime Fern?ndez del R?o > Subject: Re: [Numpy-discussion] Rewrite np.histogram in c? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > >> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon wrote: >> >> Hi, >> >> Numpy.histogram is implemented in python, and is a little sluggish. This >> has been discussed previously on the mailing list, [1, 2]. It came up in a >> project that I maintain, where a new feature is bottlenecked by >> numpy.histogram, and one developer suggested a faster implementation in >> cython [3]. >> >> Would it make sense to reimplement this function in c? or cython? Is >> moving functions like this from python to c to improve performance within >> the scope of the development roadmap for numpy? I started implementing this >> a little bit in c, [4] but I figured I should check in here first. > > Where do you think the performance gains will come from? The PR in your > project that claims a 10x speed-up uses a method that is only fit for > equally spaced bins. I want to think that implementing that exact same > algorithm in Python with NumPy would be comparably fast, say within 2x. > > For the general case, NumPy is already doing most of the heavy lifting (the > sorting and the searching) in C: simply replicating the same algorithmic > approach entirely in C is unlikely to provide any major speed-up. And if > the change is to the algorithm, then we should first try it out in Python. > > That said, if you can speed things up 10x, I don't think there is going to > be much opposition to moving it to C! > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20150315/ab2c26a9/attachment-0001.html > > ------------------------------ > > Message: 5 > Date: Sun, 15 Mar 2015 23:06:43 -0700 > From: Robert McGibbon > Subject: Re: [Numpy-discussion] Rewrite np.histogram in c? 
> To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > It might make sense to dispatch to difference c implements if the bins are > equally spaced (as created by using an integer for the np.histogram bins > argument), vs. non-equally-spaced bins. > > In that case, getting the bigger speedup may be easier, at least for one > common use case. > > -Robert > > On Sun, Mar 15, 2015 at 11:00 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon >> wrote: >> >>> Hi, >>> >>> Numpy.histogram is implemented in python, and is a little sluggish. This >>> has been discussed previously on the mailing list, [1, 2]. It came up in a >>> project that I maintain, where a new feature is bottlenecked by >>> numpy.histogram, and one developer suggested a faster implementation in >>> cython [3]. >>> >>> Would it make sense to reimplement this function in c? or cython? Is >>> moving functions like this from python to c to improve performance within >>> the scope of the development roadmap for numpy? I started implementing this >>> a little bit in c, [4] but I figured I should check in here first. >> >> Where do you think the performance gains will come from? The PR in your >> project that claims a 10x speed-up uses a method that is only fit for >> equally spaced bins. I want to think that implementing that exact same >> algorithm in Python with NumPy would be comparably fast, say within 2x. >> >> For the general case, NumPy is already doing most of the heavy lifting >> (the sorting and the searching) in C: simply replicating the same >> algorithmic approach entirely in C is unlikely to provide any major >> speed-up. And if the change is to the algorithm, then we should first try >> it out in Python. >> >> That said, if you can speed things up 10x, I don't think there is going to >> be much opposition to moving it to C! >> >> Jaime >> >> -- >> (\__/) >> ( O.o) >> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >> de dominaci?n mundial. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20150315/0dffb1eb/attachment-0001.html > > ------------------------------ > > Message: 6 > Date: Sun, 15 Mar 2015 23:19:59 -0700 > From: Robert McGibbon > Subject: Re: [Numpy-discussion] Rewrite np.histogram in c? > To: Discussion of Numerical Python > Message-ID: > > Content-Type: text/plain; charset="utf-8" > > My apologies for the typo: 'implements' -> 'implementations' > > -Robert > > On Sun, Mar 15, 2015 at 11:06 PM, Robert McGibbon > wrote: > >> It might make sense to dispatch to difference c implements if the bins are >> equally spaced (as created by using an integer for the np.histogram bins >> argument), vs. non-equally-spaced bins. >> >> In that case, getting the bigger speedup may be easier, at least for one >> common use case. >> >> -Robert >> >> On Sun, Mar 15, 2015 at 11:00 PM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Sun, Mar 15, 2015 at 9:32 PM, Robert McGibbon >>> wrote: >>> >>>> Hi, >>>> >>>> Numpy.histogram is implemented in python, and is a little sluggish. This >>>> has been discussed previously on the mailing list, [1, 2]. 
It came up in a >>>> project that I maintain, where a new feature is bottlenecked by >>>> numpy.histogram, and one developer suggested a faster implementation in >>>> cython [3]. >>>> >>>> Would it make sense to reimplement this function in c? or cython? Is >>>> moving functions like this from python to c to improve performance within >>>> the scope of the development roadmap for numpy? I started implementing this >>>> a little bit in c, [4] but I figured I should check in here first. >>> >>> Where do you think the performance gains will come from? The PR in your >>> project that claims a 10x speed-up uses a method that is only fit for >>> equally spaced bins. I want to think that implementing that exact same >>> algorithm in Python with NumPy would be comparably fast, say within 2x. >>> >>> For the general case, NumPy is already doing most of the heavy lifting >>> (the sorting and the searching) in C: simply replicating the same >>> algorithmic approach entirely in C is unlikely to provide any major >>> speed-up. And if the change is to the algorithm, then we should first try >>> it out in Python. >>> >>> That said, if you can speed things up 10x, I don't think there is going >>> to be much opposition to moving it to C! >>> >>> Jaime >>> >>> -- >>> (\__/) >>> ( O.o) >>> ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes >>> de dominaci?n mundial. >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- > An HTML attachment was scrubbed... > URL: http://mail.scipy.org/pipermail/numpy-discussion/attachments/20150315/d22f7d7d/attachment.html > > ------------------------------ > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > End of NumPy-Discussion Digest, Vol 102, Issue 21 > ************************************************* From charlesr.harris at gmail.com Sat Apr 4 12:55:13 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Apr 2015 10:55:13 -0600 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 9:52 AM, Ralf Gommers wrote: > Hi, > > Today I wanted to add something to https://github.com/numpy/vendor and > realised that this repo is in pretty bad shape. A couple of years ago > Ondrej took a copy of the ATLAS binaries in that repo and started a new > repo (not a fork) at https://github.com/certik/numpy-vendor. The latest > improvements were made by Julian and live at > https://github.com/juliantaylor/numpy-vendor. > > I'd like to start from numpy/vendor, then add all commits from Julian's > numpy-vendor on top of it, then move things around so we have the > binaries/sources/tools layout back and finally update the README so it's > clear how to build both the ATLAS binaries and Numpy releases. > > Any objections or better ideas? > > No objections from me, getting all the good stuff together in an easily found place is a plus. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Apr 4 17:38:18 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 4 Apr 2015 14:38:18 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. 
orthogonal In-Reply-To: References: Message-ID: On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > On Apr 4, 2015 10:54 AM, "Nathaniel Smith" wrote: > > > > Core python broke backcompat on a regular basis throughout the python > > 2 series, and almost certainly will again -- the bar to doing so is > > *very* high, and they use elaborate mechanisms to ease the way > > (__future__, etc.), but they do it. A few months ago there was even > > some serious consideration given to changing py3 bytestring indexing > > to return bytestrings instead of integers. (Consensus was > > unsurprisingly that this was a bad idea, but there were core devs > > seriously exploring it, and no-one complained about the optics.) > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. I may well agree with you when I do see it; I just prefer to base important decisions on as much data as possible. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Sat Apr 4 19:01:43 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 5 Apr 2015 01:01:43 +0200 Subject: [Numpy-discussion] GSoC students: please read In-Reply-To: References: Message-ID: On Mon, Mar 23, 2015 at 10:42 PM, Ralf Gommers wrote: Hi Stephan, all, On Mon, Mar 23, 2015 at 10:29 PM, Stephan Hoyer wrote: > >> On Mon, Mar 23, 2015 at 2:21 PM, Ralf Gommers >> wrote: >> >>> It's great to see that this year there are a lot of students interested >>> in doing a GSoC project with Numpy or Scipy. So far five proposals have >>> been submitted, and it looks like several more are being prepared now. >>> >> >> Hi Ralf, >> >> Is there a centralized place for non-mentors to view proposals and give >> feedback? >> > > Hi Stephan, there isn't really. All students post their drafts to the > mailing list, where they can get feedback. They're free to keep that draft > wherever they want - blogs, Github, StackEdit, ftp sites and more are all > being used. The central overview is in Melange (the official GSoC tool), > but that's not publicly accessible. > This was actually a very good idea, for next year we should require proposals on Github and added to an overview page. For this year it was a bit late to require all students to make this change, but I've compiled an overview of all proposals that have been submitted including links to Melange and the public drafts that students posted to the mailing lists: https://github.com/scipy/scipy/wiki/GSoC-project-ideas#student-applications-for-2015-to-scipy-and-numpy I hope that this helps. Everyone who is signed up as a mentor can comment (privately or publicly) in Melange, and everyone who's interested can now more easily find back the mailing list threads on this and comment there. Cheers, Ralf > Note that an overview of project ideas can be found at > https://github.com/scipy/scipy/wiki/GSoC-project-ideas. If you're > particularly interested in one or more of those, it should be easy to find > back in the mailing list archive what students sent draft proposals for > feedback. Your comments on individual proposals will be much appreciated. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jaime.frio at gmail.com Sun Apr 5 03:45:13 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sun, 5 Apr 2015 00:45:13 -0700 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > I have an all-Pyhton implementation of an OrthogonalIndexer class, loosely > based on Stephan's code plus some axis remapping, that provides all the > needed functionality for getting and setting with orthogonal indices. > > Would those interested rather see it as a gist to play around with, or as > a PR adding an orthogonally indexable `.ix_` argument to ndarray? > A PR it is, #5749 to be precise. I think it has all the bells and whistles: integers, boolean and integer 1-D arrays, slices, ellipsis, and even newaxis, both for getting and setting. No tests yet, so correctness of the implementation is dubious at best. As a small example: >>> a = np.arange(60).reshape(3, 4, 5) >>> a.ix_ >>> a.ix_[[0, 1], :, [True, False, True, False, True]] array([[[ 0, 2, 4], [ 5, 7, 9], [10, 12, 14], [15, 17, 19]], [[20, 22, 24], [25, 27, 29], [30, 32, 34], [35, 37, 39]]]) >>> a.ix_[[0, 1], :, [True, False, True, False, True]] = 0 >>> a array([[[ 0, 1, 0, 3, 0], [ 0, 6, 0, 8, 0], [ 0, 11, 0, 13, 0], [ 0, 16, 0, 18, 0]], [[ 0, 21, 0, 23, 0], [ 0, 26, 0, 28, 0], [ 0, 31, 0, 33, 0], [ 0, 36, 0, 38, 0]], [[40, 41, 42, 43, 44], [45, 46, 47, 48, 49], [50, 51, 52, 53, 54], [55, 56, 57, 58, 59]]]) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Sun Apr 5 06:09:08 2015 From: cournape at gmail.com (David Cournapeau) Date: Sun, 5 Apr 2015 11:09:08 +0100 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 4:25 AM, Nathaniel Smith wrote: > IIRC there allegedly exist platforms where separate compilation doesn't > work right? I'm happy to get rid of it if no one speaks up to defend such > platforms, though, we can always add it back later. One case was for > statically linking numpy into the interpreter, but I'm skeptical about how > much we should care about that case, since that's already a hacky kind of > process and there are simple alternative hacks that could be used to strip > the offending symbols. > > Depends on how much it lets us simplify things, I guess. Would we get to > remove all the no-export attributes on everything? > No, the whole point of the no-export is to support the separate compilation use case. David > On Apr 3, 2015 8:01 PM, "Charles R Harris" > wrote: > >> >> >> On Fri, Apr 3, 2015 at 9:00 PM, Charles R Harris < >> charlesr.harris at gmail.com> wrote: >> >>> Hi All, >>> >>> Just to raise the question if these two options should be removed at >>> some point? The current default value for both is 0, so we have separate >>> compilation and relaxed strides checking by default. >>> >>> >> Oops, default value is 1, not 0. 
>> >> Chuck >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Apr 5 06:37:01 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 05 Apr 2015 12:37:01 +0200 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: Message-ID: <1428230221.16305.2.camel@sipsolutions.net> On Fr, 2015-04-03 at 21:00 -0600, Charles R Harris wrote: > Hi All, > > > Just to raise the question if these two options should be removed at > some point? The current default value for both is 0, so we have > separate compilation and relaxed strides checking by default. > I still have some small doubts that leaving relaxed strides as default will work out for 1.10, plus we will have to make "debugging mode" switchable (default off), and abusing the flag with different values for it is probably simplest. So my guess is, we should wait at least one version with it. - Sebastian > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From robert.kern at gmail.com Sun Apr 5 07:03:45 2015 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 5 Apr 2015 12:03:45 +0100 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: > > On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > > > > On Apr 4, 2015 10:54 AM, "Nathaniel Smith" wrote: > > > > > > Core python broke backcompat on a regular basis throughout the python > > > 2 series, and almost certainly will again -- the bar to doing so is > > > *very* high, and they use elaborate mechanisms to ease the way > > > (__future__, etc.), but they do it. A few months ago there was even > > > some serious consideration given to changing py3 bytestring indexing > > > to return bytestrings instead of integers. (Consensus was > > > unsurprisingly that this was a bad idea, but there were core devs > > > seriously exploring it, and no-one complained about the optics.) > > > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. > > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. 
These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sun Apr 5 08:08:16 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sun, 5 Apr 2015 05:08:16 -0700 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: Message-ID: On Apr 5, 2015 3:09 AM, "David Cournapeau" wrote: > > On Sat, Apr 4, 2015 at 4:25 AM, Nathaniel Smith wrote: >> >> IIRC there allegedly exist platforms where separate compilation doesn't work right? I'm happy to get rid of it if no one speaks up to defend such platforms, though, we can always add it back later. One case was for statically linking numpy into the interpreter, but I'm skeptical about how much we should care about that case, since that's already a hacky kind of process and there are simple alternative hacks that could be used to strip the offending symbols. >> >> Depends on how much it lets us simplify things, I guess. Would we get to remove all the no-export attributes on everything? > > > No, the whole point of the no-export is to support the separate compilation use case. > Oog, on further checking I guess this is still true as long as we are using our heirloom mingw compiler on Windows. AFAIK all other compilers we care about support -fvisibility=hidden or equivalent. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Apr 5 08:13:22 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 05 Apr 2015 14:13:22 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> Message-ID: <1428236002.16305.39.camel@sipsolutions.net> On So, 2015-04-05 at 00:45 -0700, Jaime Fern?ndez del R?o wrote: > On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fern?ndez del R?o > > > A PR it is, #5749 to be precise. I think it has all the bells and > whistles: integers, boolean and integer 1-D arrays, slices, ellipsis, > and even newaxis, both for getting and setting. No tests yet, so > correctness of the implementation is dubious at best. As a small > example: > Looks neat, I am sure there will be some details. Just a quick thought, I wonder if it might make sense to even introduce a context manager. Not sure how easy it is to make sure that it is thread safe, etc? If the code is not too difficult, maybe it can even be moved to C. Though I have to think about it, I think currently we parse from first index to last, maybe it would be plausible to parse from last to first so that adding dimensions could be done easily inside the preparation function. The second axis remapping is probably reasonably easy (if, like the first thing, tedious). - Sebastian PS: One side comment about the discussion. I don't think anyone suggests that we should not/do not even consider proposals as such, even if it might looks like that. 
Not that I can compare, but my guess is that numpy is actually very open (though no idea if it appears like that, too). But also to me it does seem like a lost cause to try to actually change indexing itself. So maybe that does not sound diplomatic, but without a specific reasoning about how the change does not wreak havoc, talking about switching indexing behaviour seems a waste time to me. Please try to surprise me, but until then.... > > >>> a = np.arange(60).reshape(3, 4, 5) > >>> a.ix_ > > Jaime > > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > planes de dominaci?n mundial. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From charlesr.harris at gmail.com Sun Apr 5 09:08:29 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 5 Apr 2015 07:08:29 -0600 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: <1428230221.16305.2.camel@sipsolutions.net> References: <1428230221.16305.2.camel@sipsolutions.net> Message-ID: On Sun, Apr 5, 2015 at 4:37 AM, Sebastian Berg wrote: > On Fr, 2015-04-03 at 21:00 -0600, Charles R Harris wrote: > > Hi All, > > > > > > Just to raise the question if these two options should be removed at > > some point? The current default value for both is 0, so we have > > separate compilation and relaxed strides checking by default. > > > > I still have some small doubts that leaving relaxed strides as default > will work out for 1.10, plus we will have to make "debugging mode" > switchable (default off), and abusing the flag with different values for > it is probably simplest. > So my guess is, we should wait at least one version with it. > Agree, I'm thinking one or two release down the road. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Apr 5 09:25:36 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 05 Apr 2015 15:25:36 +0200 Subject: [Numpy-discussion] NPY_SEPARATE_COMPILATION and RELAXED_STRIDES_CHECKING In-Reply-To: References: <1428230221.16305.2.camel@sipsolutions.net> Message-ID: <1428240336.16305.47.camel@sipsolutions.net> On So, 2015-04-05 at 07:08 -0600, Charles R Harris wrote: > > > On Sun, Apr 5, 2015 at 4:37 AM, Sebastian Berg > wrote: > On Fr, 2015-04-03 at 21:00 -0600, Charles R Harris wrote: > > Hi All, > > > > > > Just to raise the question if these two options should be > removed at > > some point? The current default value for both is 0, so we > have > > separate compilation and relaxed strides checking by > default. > > > > > I still have some small doubts that leaving relaxed strides as > default > will work out for 1.10, plus we will have to make "debugging > mode" > switchable (default off), and abusing the flag with different > values for > it is probably simplest. > So my guess is, we should wait at least one version with it. > > > Agree, I'm thinking one or two release down the road. > Ah ok, misunderstood it. I suppose it will depend on whether the debug feature of messing up strides will be used. 
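For readers following this thread without the background: relaxed strides checking changes how the contiguity flags are computed, in that the strides of dimensions of size one are ignored, and the "debugging mode" / stride-messing feature referred to above is meant to fill those ignored strides with deliberately bogus values so that code which wrongly relies on them fails loudly. A minimal illustration of the user-visible difference, assuming a float64 array (the second flag reads True only on a build with relaxed strides checking enabled):

import numpy as np

a = np.ones((10, 1), order='C')    # strides are typically (8, 8) for float64
a.flags['C_CONTIGUOUS']            # True under either setting
a.flags['F_CONTIGUOUS']            # True with relaxed strides checking, False with strict checking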
I don't think we will have to support it for disabling relaxed strides at that point though (so maybe a rename makes sense by then). > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Sun Apr 5 09:50:15 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 05 Apr 2015 15:50:15 +0200 Subject: [Numpy-discussion] Advanced indexing: "fancy" vs. orthogonal In-Reply-To: <1428236002.16305.39.camel@sipsolutions.net> References: <551D846F.3080808@hawaii.edu> <551DA811.9050707@hawaii.edu> <551DBCFA.2010407@ncf.ca> <9D27C526-73D6-4C1B-A0F4-1280E45085F5@phys.ethz.ch> <551DD87D.2000305@hawaii.edu> <1428236002.16305.39.camel@sipsolutions.net> Message-ID: <1428241815.16305.61.camel@sipsolutions.net> On So, 2015-04-05 at 14:13 +0200, Sebastian Berg wrote: > On So, 2015-04-05 at 00:45 -0700, Jaime Fern?ndez del R?o wrote: > > On Fri, Apr 3, 2015 at 10:59 AM, Jaime Fern?ndez del R?o > > > > > > > A PR it is, #5749 to be precise. I think it has all the bells and > > whistles: integers, boolean and integer 1-D arrays, slices, ellipsis, > > and even newaxis, both for getting and setting. No tests yet, so > > correctness of the implementation is dubious at best. As a small > > example: > > > > Looks neat, I am sure there will be some details. Just a quick thought, > I wonder if it might make sense to even introduce a context manager. Not > sure how easy it is to make sure that it is thread safe, etc? Also wondering, because while I think that actually changing numpy is probably impossible, I do think we can talk about something like: np.enable_outer_indexing() or along the lines of: from numpy.future import outer_indexing or some such, to do a module wide switch and maybe also allow at some point to make it easier to write code that is compatible between a possible followup such as blaze (or also pandas I guess), that uses incompatible indexing. I have no clue if this is technically feasible, though. The python equivalent would be teaching someone to use: from __future__ import division even though you don't even tell them that python 3 exists ;), just because you like the behaviour more. > > > >>> a = np.arange(60).reshape(3, 4, 5) > > >>> a.ix_ > > > > > Jaime > > > > > > -- > > (\__/) > > ( O.o) > > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus > > planes de dominaci?n mundial. > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From msuzen at gmail.com Mon Apr 6 14:33:23 2015 From: msuzen at gmail.com (Suzen, Mehmet) Date: Mon, 6 Apr 2015 19:33:23 +0100 Subject: [Numpy-discussion] IDE's for numpy development? Message-ID: Hi Chuck, Spider is good. If you are coming from Matlab world. 
http://spyder-ide.blogspot.co.uk/ I don't think it supports C. But Maybe you are after Eclipse. Best, -m From charlesr.harris at gmail.com Mon Apr 6 15:22:11 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Apr 2015 13:22:11 -0600 Subject: [Numpy-discussion] 1.10 release again. Message-ID: Hi All, I'd like to mark current PR's for inclusion in 1.10. If there is something that you want to have in the release, please mention it here by PR #.I think new enhancement PR's should be considered for 1.11 rather than 1.10, but bug fixes will go in. There is some flexibility, of course, as there are always last minute items that come up when release contents are begin decided. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Apr 6 17:01:18 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 6 Apr 2015 23:01:18 +0200 Subject: [Numpy-discussion] 1.10 release again. In-Reply-To: References: Message-ID: On Mon, Apr 6, 2015 at 9:22 PM, Charles R Harris wrote: > Hi All, > > I'd like to mark current PR's for inclusion in 1.10. > Good idea. If you're going to do this, it may be helpful to create a new 1.10 milestone and keep but clean up the "1.10 blockers" milestone so there are only real blockers in there. > If there is something that you want to have in the release, please mention > it here by PR #.I think new enhancement PR's should be considered for 1.11 > rather than 1.10, but bug fixes will go in. > Assuming you mean "no guarantees for anything that comes in from now on", rather then "no one is allowed to merge new enhancements PRs before the release split" - makes sense. There is some flexibility, of course, as there are always last minute items > that come up when release contents are begin decided. > I had a look through the complete set again. Of the ones that are not yet marked for 1.10, those that look important to get in are: - new "contract" function (#5488) - the whole set of numpy.ma PRs - the two numpy.distutils PRs (#4378, #5597) - rewrite of docs on indexing (#4331) - deciding on a bool indexing deprecation (#4353) - weighted covariance for corrcoef (#4960) There are too many PRs marked as "1.10 blockers", I think the only real blockers are: - __numpy_ufunc__ PRs (#4815, #4855) - sgemv segfault workaround (#5237) - fix for alignment issue (#5656) - resolving the debate on diagonal (#5407) Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Apr 6 19:39:42 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 07 Apr 2015 01:39:42 +0200 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: On 06/04/15 20:33, Suzen, Mehmet wrote: > Hi Chuck, > > Spider is good. If you are coming from Matlab world. > > http://spyder-ide.blogspot.co.uk/ > > I don't think it supports C. But Maybe you are after Eclipse. Spyder supports C. Sturla From njs at pobox.com Mon Apr 6 19:49:52 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 6 Apr 2015 16:49:52 -0700 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: Hi all, Starting with 1.9.1, the official numpy OS X wheels (the ones you get by doing "pip install numpy") have been built to use Apple's Accelerate library for linear algebra. This is fast, but it breaks multiprocessing in obscure ways (e.g. see this user report: https://github.com/numpy/numpy/issues/5752). 
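For context, a minimal sketch of the kind of program that can trigger the failure: the usual explanation is that Accelerate is not fork-safe, so making a BLAS call in the parent and then more BLAS calls in fork()-based multiprocessing workers (the default on Unix) can deadlock. This is an illustrative reproduction under that assumption, not a guaranteed one:

import numpy as np
import multiprocessing as mp

def child_dot(_):
    a = np.random.rand(500, 500)
    return np.dot(a, a).sum()             # BLAS call inside the forked worker

if __name__ == '__main__':
    a = np.random.rand(500, 500)
    np.dot(a, a)                          # BLAS call in the parent, before the fork
    pool = mp.Pool(2)                     # fork()-based workers
    print(pool.map(child_dot, range(4)))  # can hang when numpy is linked against Accelerate
    pool.close()
    pool.join()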
Unfortunately, there is no obvious best solution to what linear algebra package to use, so we have to make a decision as to which set of compromises we prefer. Options: Accelerate: fast, but breaks multiprocessing as above. OpenBLAS: fast, but Julian raised concerns about its trustworthiness last year ( http://mail.scipy.org/pipermail/numpy-discussion/2014-March/069659.html). Possibly things have improved since then (I get the impression that they've gotten some additional developer attention from the Julia community), but I don't know. Atlas: slower (faster than reference blas but definitely slower than fancy options like the above), but solid. My feeling is that for wheels in particular it's more important that everything "just work" than that we get the absolute fastest speeds. And this is especially true for the multiprocessing issue, given that it's a widely used part of the stdlib, the failures are really obscure/confusing, and there is no workaround for python 2 which is still where a majority of our users still are. So I'd vote for using either atlas or OpenBLAS. (And would defer to Julian and Matthew about which to choose between these.) Any opinions, objections? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Apr 6 19:55:46 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Apr 2015 17:55:46 -0600 Subject: [Numpy-discussion] 1.10 release again. In-Reply-To: References: Message-ID: On Mon, Apr 6, 2015 at 3:01 PM, Ralf Gommers wrote: > > > On Mon, Apr 6, 2015 at 9:22 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Hi All, >> >> I'd like to mark current PR's for inclusion in 1.10. >> > > Good idea. If you're going to do this, it may be helpful to create a new > 1.10 milestone and keep but clean up the "1.10 blockers" milestone so there > are only real blockers in there. > Good idea. > > >> If there is something that you want to have in the release, please >> mention it here by PR #.I think new enhancement PR's should be considered >> for 1.11 rather than 1.10, but bug fixes will go in. >> > > Assuming you mean "no guarantees for anything that comes in from now on", > rather then "no one is allowed to merge new enhancements PRs before the > release split" - makes sense. > > There is some flexibility, of course, as there are always last minute >> items that come up when release contents are begin decided. >> > > I had a look through the complete set again. Of the ones that are not yet > marked for 1.10, those that look important to get in are: > Thanks for taking a look. > - new "contract" function (#5488) > - the whole set of numpy.ma PRs > - the two numpy.distutils PRs (#4378, #5597) > - rewrite of docs on indexing (#4331) > - deciding on a bool indexing deprecation (#4353) > - weighted covariance for corrcoef (#4960) > > There are too many PRs marked as "1.10 blockers", I think the only real > blockers are: > - __numpy_ufunc__ PRs (#4815, #4855) > - sgemv segfault workaround (#5237) > - fix for alignment issue (#5656) > - resolving the debate on diagonal (#5407) > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 6 19:59:47 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 6 Apr 2015 16:59:47 -0700 Subject: [Numpy-discussion] 1.10 release again. 
In-Reply-To: References: Message-ID: On Apr 6, 2015 2:01 PM, "Ralf Gommers" wrote: > > There are too many PRs marked as "1.10 blockers", I think the only real blockers are: > - __numpy_ufunc__ PRs (#4815, #4855) The main blocker here is figuring out how to coordinate __numpy_ufunc__ and __binop__ dispatch, e.g. PR #5748. We need to either resolve this or disable __numpy_ufunc__ for another release (which would suck). This needs some careful attention, so it'd be great if people could take a look. > - sgemv segfault workaround (#5237) > - fix for alignment issue (#5656) Agreed on these. > - resolving the debate on diagonal (#5407) Not really a blocker IMHO -- if we release 1.10 with the same settings as 1.9, then no harm will be done. (I guess some docs might be slightly off.) And IMO that's the proper resolution for the moment anyway :-). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Mon Apr 6 20:13:15 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 07 Apr 2015 02:13:15 +0200 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: On 07/04/15 01:49, Nathaniel Smith wrote: > Any opinions, objections? Accelerate does not break multiprocessing, quite the opposite. The bug is in multiprocessing and has been fixed in Python 3.4. My vote would nevertheless be for OpenBLAS if we can use it without producing test failures in NumPy and SciPy. Most of the test failures with OpenBLAS and Carl Kleffner's toolchain on Windows are due to differences between Microsoft and MinGW runtime libraries and not due to OpenBLAS itself. These test failures are not relevant on Mac. ATLAS can easily reduce the speed of a matrix product or a linear algebra call with a factor of 20 compared to Accelerate, MKL or OpenBLAS. It would give us bad karma. Sturla From matthew.brett at gmail.com Mon Apr 6 20:19:09 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 6 Apr 2015 17:19:09 -0700 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: Hi, On Mon, Apr 6, 2015 at 5:13 PM, Sturla Molden wrote: > On 07/04/15 01:49, Nathaniel Smith wrote: > >> Any opinions, objections? > > Accelerate does not break multiprocessing, quite the opposite. The bug > is in multiprocessing and has been fixed in Python 3.4. > > My vote would nevertheless be for OpenBLAS if we can use it without > producing test failures in NumPy and SciPy. > > Most of the test failures with OpenBLAS and Carl Kleffner's toolchain on > Windows are due to differences between Microsoft and MinGW runtime > libraries and not due to OpenBLAS itself. These test failures are not > relevant on Mac. > > ATLAS can easily reduce the speed of a matrix product or a linear > algebra call with a factor of 20 compared to Accelerate, MKL or > OpenBLAS. It would give us bad karma. ATLAS compiled with gcc also gives us some more license complication: http://numpy-discussion.10968.n7.nabble.com/Copyright-status-of-NumPy-binaries-on-Windows-OS-X-tp38793p38824.html I agree that big slowdowns would be dangerous for numpy's reputation. Sturla - do you have a citable source for your factor of 20 figure? 
Cheers, Matthew From sturla.molden at gmail.com Mon Apr 6 20:18:06 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 07 Apr 2015 02:18:06 +0200 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: On 07/04/15 02:13, Sturla Molden wrote: > Most of the test failures with OpenBLAS and Carl Kleffner's toolchain on > Windows are due to differences between Microsoft and MinGW runtime > libraries ... and also differences in FPU precision. Sturla From charlesr.harris at gmail.com Mon Apr 6 20:28:34 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 6 Apr 2015 18:28:34 -0600 Subject: [Numpy-discussion] 1.10 release again. In-Reply-To: References: Message-ID: On Mon, Apr 6, 2015 at 5:59 PM, Nathaniel Smith wrote: > On Apr 6, 2015 2:01 PM, "Ralf Gommers" wrote: > > > > There are too many PRs marked as "1.10 blockers", I think the only real > blockers are: > > - __numpy_ufunc__ PRs (#4815, #4855) > > The main blocker here is figuring out how to coordinate __numpy_ufunc__ > and __binop__ dispatch, e.g. PR #5748. We need to either resolve this or > disable __numpy_ufunc__ for another release (which would suck). > > This needs some careful attention, so it'd be great if people could take a > look. > > > - sgemv segfault workaround (#5237) > > - fix for alignment issue (#5656) > I think #5316 is the alignment fix. > Agreed on these. > > > - resolving the debate on diagonal (#5407) > > Not really a blocker IMHO -- if we release 1.10 with the same settings as > 1.9, then no harm will be done. (I guess some docs might be slightly off.) > And IMO that's the proper resolution for the moment anyway :-). > Asked for this to be reopened anyway, as it was closed by accident. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 6 20:41:48 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 6 Apr 2015 17:41:48 -0700 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: On Apr 6, 2015 5:13 PM, "Sturla Molden" wrote: > > On 07/04/15 01:49, Nathaniel Smith wrote: > > > Any opinions, objections? > > Accelerate does not break multiprocessing, quite the opposite. The bug > is in multiprocessing and has been fixed in Python 3.4. I disagree, but it hardly matters: you can call it a bug in accelerate, or call it a bug in python, but either way it's an issue that affects our users and we need to either work around it or not. > ATLAS can easily reduce the speed of a matrix product or a linear > algebra call with a factor of 20 compared to Accelerate, MKL or > OpenBLAS. It would give us bad karma. Sure, but in some cases accelerate reduces speed by a factor of infinity by hanging, and OpenBLAS may or may not give wrong answers (but quickly!) since apparently they don't do regression tests, so we have to pick our poison. -n -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Mon Apr 6 20:41:49 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 07 Apr 2015 02:41:49 +0200 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: On 07/04/15 02:19, Matthew Brett wrote: > ATLAS compiled with gcc also gives us some more license complication: > > http://numpy-discussion.10968.n7.nabble.com/Copyright-status-of-NumPy-binaries-on-Windows-OS-X-tp38793p38824.html Ok, then I have a question regarding OpenBLAS: Do we use the f2c'd lapack_lite or do we build LAPACK with gfortran and link into OpenBLAS? In the latter case we might get the libquadmath linked into the OpenBLAS binary as well. > I agree that big slowdowns would be dangerous for numpy's reputation. > > Sturla - do you have a citable source for your factor of 20 figure? I will look it up. The best thing would be to do a new benchmark though. Another thing is it depends on the hardware. ATLAS is not very scalable on multiple processors, so it will be worse on a Mac Pro than a Macbook. It will also we worse with AVX than without. Sturla From sturla.molden at gmail.com Mon Apr 6 20:48:08 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 07 Apr 2015 02:48:08 +0200 Subject: [Numpy-discussion] OS X wheels: speed versus multiprocessing In-Reply-To: References: Message-ID: On 07/04/15 02:41, Nathaniel Smith wrote: > Sure, but in some cases accelerate reduces speed by a factor of infinity > by hanging, and OpenBLAS may or may not give wrong answers (but > quickly!) since apparently they don't do regression tests, so we have to > pick our poison. OpenBLAS is safer on Mac than Windows (no MinGW related errors on Mac) so we should try it and see what happens. GotoBLAS2 used to be great so it can't be that bad :-) From misnomer at gmail.com Mon Apr 6 19:49:34 2015 From: misnomer at gmail.com (Nicholas Devenish) Date: Tue, 7 Apr 2015 00:49:34 +0100 Subject: [Numpy-discussion] Multidimensional Indexing Message-ID: With the indexing example from the documentation: y = np.arange(35).reshape(5,7) Why does selecting an item from explicitly every row work as I?d expect: >>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])] array([ 0, 7, 14, 21, 28]) But doing so from a full slice (which, I would naively expect to mean ?Every Row?) has some?other? behaviour: >>> y[:,np.array([0,0,0,0,0])] array([[ 0, 0, 0, 0, 0], [ 7, 7, 7, 7, 7], [14, 14, 14, 14, 14], [21, 21, 21, 21, 21], [28, 28, 28, 28, 28]]) What is going on in this example, and how do I get what I expect? By explicitly passing in an extra array with value===index? What is the rationale for this difference in behaviour? Thanks, Nick From n59_ru at hotmail.com Tue Apr 7 09:02:24 2015 From: n59_ru at hotmail.com (Nikolay Mayorov) Date: Tue, 7 Apr 2015 18:02:24 +0500 Subject: [Numpy-discussion] Multidimensional Indexing In-Reply-To: References: Message-ID: I think the rationale is to allow selection of whole rows / columns. If you want to choose a single element from each row/column, then, yes, you have to pass np.arange(...). There is also np.choose function, but not recommended to use for such cases as far as I understand. I'm not an expert, though. Nikolay. 
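To make the difference concrete, a small sketch restating the two behaviours with the y array from the question (nothing here is new API, just the existing semantics spelled out):

    import numpy as np

    y = np.arange(35).reshape(5, 7)

    # A full slice keeps every row and pulls out each listed column whole:
    y[:, np.array([0, 0, 0, 0, 0])].shape        # (5, 5)

    # To pick one element per row, index both axes with arrays; the two
    # index arrays are iterated together, element by element:
    y[np.arange(5), np.array([0, 0, 0, 0, 0])]   # array([ 0,  7, 14, 21, 28])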
> From: misnomer at gmail.com > Date: Tue, 7 Apr 2015 00:49:34 +0100 > To: numpy-discussion at scipy.org > Subject: [Numpy-discussion] Multidimensional Indexing > > With the indexing example from the documentation: > > y = np.arange(35).reshape(5,7) > > Why does selecting an item from explicitly every row work as I?d expect: > >>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])] > array([ 0, 7, 14, 21, 28]) > > But doing so from a full slice (which, I would naively expect to mean ?Every Row?) has some?other? behaviour: > > >>> y[:,np.array([0,0,0,0,0])] > array([[ 0, 0, 0, 0, 0], > [ 7, 7, 7, 7, 7], > [14, 14, 14, 14, 14], > [21, 21, 21, 21, 21], > [28, 28, 28, 28, 28]]) > > What is going on in this example, and how do I get what I expect? By explicitly passing in an extra array with value===index? What is the rationale for this difference in behaviour? > > Thanks, > > Nick > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Apr 7 21:06:04 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 7 Apr 2015 18:06:04 -0700 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) Message-ID: On Apr 5, 2015 7:04 AM, "Robert Kern" wrote: > > On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: > > > > On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > > > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. > > > > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. > > It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal. Ok, let me try to make my point another way. I don't actually care at this stage in the discussion whether the change is ultimately viable. And I don't think you should either. (For values of "you" that includes everyone in the discussion, not picking on Robert in particular :-).) My point is that rational, effective discussion requires giving ideas room to breath. Sometimes ideas turn out to be not as bad as they looked. Sometimes it turns out that they are, but there's some clever tweak that gives you 95% of the benefits for 5% of the cost. Sometimes you generate a better understanding of the tradeoffs that subsequently informs later design decisions. 
Sometimes working through the details makes both sides realize that there's a third option that solves both their problems. Sometimes you merely get a very specific understanding of why the whole approach is unreasonable that you can then, say, take to the pandas and netcdf developers as evidence of that you made a good faith effort and ask them to meet you half way. And all these things require understanding the specifics of what *exactly* works or doesn't work about about idea. IMHO, it's extremely misleading at this stage to make any assertion about whether Jaime's approach gives the "majority of benefits of such a change" is extremely misleading at this stage: not because it's wrong, but because it totally short-circuits the discussion about what benefits we care about. Jaime's patch certainly has merits, but does it help us make numpy and pandas/netcdf's more compatible? Does it make it easier for Eric to teach? Those are some specific merits that we might care about a lot, and for which Jaime's patch may or may not help much. But that kind of nuance gets lost when we jump straight to debating thumbs-up versus thumbs-down. I cross-my-heart promise that under the current regime, no PR breaking fancy indexing would ever get anywhere *near* numpy master without *extensive* discussion and warnings on the list. The core devs just spent weeks quibbling about whether a PR that adds a new field to the end of the dtype struct would break ABI backcompat (we're now pretty sure it doesn't), and the current standard we enforce is that every PR that touches public API needs a list discussion, even minor extensions with no compatibility issues at all. No one is going to sneak anything by anyone. Plus, I dunno, our current approach to discussions just seems to make things hostile and shouty and unpleasant. If a grad student or junior colleague comes to you with an idea where you see some potentially critical flaw, do you yell THAT WILL NEVER WORK and kick them out of your office? Or, do you maybe ask a few leading questions and see where they go? I think things will work better if the next time something like this comes up, *one* person just says "hmm, interesting idea, but the backcompat issues seem pretty severe; do you have any ideas about how to mitigate that?", and then we let that point be taken as having been made and see where the discussion goes. Maybe we can all give it a try? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Wed Apr 8 13:38:29 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 8 Apr 2015 18:38:29 +0100 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: Message-ID: On Wed, Apr 8, 2015 at 2:06 AM, Nathaniel Smith wrote: > > On Apr 5, 2015 7:04 AM, "Robert Kern" wrote: > > > > On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: > > > > > > On Apr 4, 2015 4:12 AM, "Todd" wrote: > > > > > > > > There was no break as large as this. In fact I would say this is even a larger change than any individual change we saw in the python 2 to 3 switch. The basic mechanics of indexing are just too fundamental and touch on too many things to make this sort of change feasible. > > > > > > I'm afraid I'm not clever enough to know how large or feasible a change is without even seeing the proposed change. > > > > It doesn't take any cleverness. The change in question was to make the default indexing semantics to orthogonal indexing. 
No matter the details of the ultimate proposal to achieve that end, it has known minimum consequences, at least in the broad outline. Current documentation and books become obsolete for a fundamental operation. Current code must be modified by some step to continue working. These are consequences inherent in the end, not just the means to the end; we don't need a concrete proposal in front of us to know what they are. There are ways to mitigate these consequences, but there are no silver bullets that eliminate them. And we can compare those consequences to approaches like Jaime's that achieve a majority of the benefits of such a change without any of the negative consequences. That comparison does not bode well for any proposal. > > Ok, let me try to make my point another way. > > I don't actually care at this stage in the discussion whether the change is ultimately viable. And I don't think you should either. (For values of "you" that includes everyone in the discussion, not picking on Robert in particular :-).) > > My point is that rational, effective discussion requires giving ideas room to breath. Sometimes ideas turn out to be not as bad as they looked. Sometimes it turns out that they are, but there's some clever tweak that gives you 95% of the benefits for 5% of the cost. Sometimes you generate a better understanding of the tradeoffs that subsequently informs later design decisions. Sometimes working through the details makes both sides realize that there's a third option that solves both their problems. Sometimes you merely get a very specific understanding of why the whole approach is unreasonable that you can then, say, take to the pandas and netcdf developers as evidence of that you made a good faith effort and ask them to meet you half way. And all these things require understanding the specifics of what *exactly* works or doesn't work about about idea. IMHO, it's extremely misleading at this stage to make any assertion about whether Jaime's approach gives the "majority of benefits of such a change" is extremely misleading at this stage: not because it's wrong, but because it totally short-circuits the discussion about what benefits we care about. Jaime's patch certainly has merits, but does it help us make numpy and pandas/netcdf's more compatible? Does it make it easier for Eric to teach? Those are some specific merits that we might care about a lot, and for which Jaime's patch may or may not help much. But that kind of nuance gets lost when we jump straight to debating thumbs-up versus thumbs-down. And we can get all of that discussion from discussing Jaime's proposal. I would argue that we will get better, more focused discussion from it since it is actually a concrete proposal and not just a wish that numpy's indexing semantics were something else. I think that a full airing and elaboration of Jaime's proposal (as the final PR should look quite different than the initial one to incorporate the what is found in the discussion) will give us a satisficing solution. I certainly think that that is *more likely* to arrive at a satisficing solution than an attempt to change the default indexing semantics. I can name specific improvements that would specifically address the concerns you named if you would like. Maybe it won't be *quite* as good (for some parties) than if Numeric chose orthogonal indexing from the get-go, but it will likely be much better for everyone than if numpy broke backward compatibility on this feature now. 
> I cross-my-heart promise that under the current regime, no PR breaking fancy indexing would ever get anywhere *near* numpy master without *extensive* discussion and warnings on the list. The core devs just spent weeks quibbling about whether a PR that adds a new field to the end of the dtype struct would break ABI backcompat (we're now pretty sure it doesn't), and the current standard we enforce is that every PR that touches public API needs a list discussion, even minor extensions with no compatibility issues at all. No one is going to sneak anything by anyone. That is not the issue. Ralf asked you not to invite such PRs in the first place. No one thinks that such a PR would get "snuck" in. That's not anyone's concern. > Plus, I dunno, our current approach to discussions just seems to make things hostile and shouty and unpleasant. If a grad student or junior colleague comes to you with an idea where you see some potentially critical flaw, do you yell THAT WILL NEVER WORK and kick them out of your office? Or, do you maybe ask a few leading questions and see where they go? > > I think things will work better if the next time something like this comes up, *one* person just says "hmm, interesting idea, but the backcompat issues seem pretty severe; do you have any ideas about how to mitigate that?", and then we let that point be taken as having been made and see where the discussion goes. Maybe we can all give it a try? You do remember that I said we should be "politely considering [...] proposals people send us uninvited", right? The "politely" was a key part of that. Prospectively inviting backwards-incompatible proposals for a full airing goes beyond this. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From blake.a.griffith at gmail.com Tue Apr 7 13:03:53 2015 From: blake.a.griffith at gmail.com (Blake Griffith) Date: Tue, 7 Apr 2015 12:03:53 -0500 Subject: [Numpy-discussion] Behavior of np.random.multivariate_normal with bad covariance matrices In-Reply-To: References: Message-ID: I like your idea Josef, I'll add it to the PR. Just to be clear, we should have something like: Have a single "check_valid" keyword arg, which will default to warn, since that is the current behavior. It will check approximate symmetry, PSDness, and for NaN & infs. Other options on the check_valid keyword arg will be ignore, and raise. What should happen when "fix" is passed for check_valid? Set negative eigenvalues to 0 and symmetrize the matrix? On Mon, Mar 30, 2015 at 8:34 AM, wrote: > On Sun, Mar 29, 2015 at 7:39 PM, Blake Griffith > wrote: > > I have an open PR which lets users control the checks on the input > > covariance matrix. The matrix is required to be symmetric and positve > > semi-definite (PSD). The current behavior is that NumPy raises a warning > if > > the matrix is not PSD, and does not even check for symmetry. > > > > I added a symmetry check, which raises a warning when the input is not > > symmetric. And added two keyword args which users can use to turn off the > > checks/warnings when the matrix is ill formed. So this would only cause > > another new warning to be raised in existing code. > > > > This is needed because sometimes the covariance matrix is only *almost* > > symmetric or PSD due to roundoff error. > > > > Thoughts? > > My only question is why is **exact** symmetry relevant? > > AFAIU > A empirical covariance matrix might not be exactly symmetric unless we > specifically force it to be. 
But I don't see why some roundoff errors > that violate symmetry should be relevant. > > use allclose with floating point rtol or equivalent? > > Some user code might suddenly get irrelevant warnings. > > BTW: > neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0) > doesn't need to be calculated if cov_psd is false. > > ----- > > some more: > > svd can hang if the values are not finite, i.e. nan or infs > > counter proposal would be to add a `check_valid` keyword with option > ignore. warn, raise, and "fix" > > and raise an error if there are nans and check_valid is not ignore. > > --------- > > aside: > np.random.multivariate_normal is only relevant if you have a new cov > each call (or don't mind repeated possibly expensive calculations), > so, I guess, adding checks by default won't upset many users. > > > Josef > > > > > > > > PR: https://github.com/numpy/numpy/pull/5726 > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Wed Apr 8 14:09:54 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 08 Apr 2015 14:09:54 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: Message-ID: <55256EF2.6040602@gmail.com> That analogy fails because it suggests a private conversation. This list is extremely public. For example, I am just a user, and I am on it. I can tell you that as a long-time numpy user my reaction to the proposal to change indexing semantics was (i) OMG YMBFKM and then (ii) take a breath; this too will fade away. It is very reasonable to worry that some users will start at the same place but them move in a different direction, and that worry should affect how such proposals are floated and discussed. I am personally grateful that the idea's reception has been so chilly; it's very reassuring. fwiw, Alan On 4/7/2015 9:06 PM, Nathaniel Smith wrote: > If a grad student or junior colleague comes to you with an > idea where you see some potentially critical flaw, do you > yell THAT WILL NEVER WORK and kick them out of your > office? Or, do you maybe ask a few leading questions and > see where they go? From lists at hilboll.de Tue Apr 7 11:14:45 2015 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 07 Apr 2015 17:14:45 +0200 Subject: [Numpy-discussion] FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. Message-ID: <5523F465.9000001@hilboll.de> Hi all, I'm commonly using function signatures like def myfunc(a, b, c=None): if c is None: # do something ... ... where c is an optional array argument. For some time now, I'm getting a FutureWarning: comparison to `None` will result in an elementwise object comparison in the future from the "c is None" comparison. I'm wondering what would be the best way to do this check in a future-proof way? Best, -- Andreas. 
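If the check really is written with `is`, it should not produce this warning; as the follow-ups later in the thread work out, the elementwise FutureWarning comes from `==`/`!=` comparisons against None on arrays. A minimal sketch of the future-proof pattern (the function body and the zeros_like default are invented purely for illustration):

    import numpy as np

    def myfunc(a, b, c=None):
        if c is None:             # identity test: no broadcasting, no warning
            c = np.zeros_like(a)  # made-up default, just for illustration
        # By contrast, `c == None` is an elementwise comparison when c is an
        # array, and that is what triggers the FutureWarning.
        return a + b + c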
From totonixsame at gmail.com Tue Apr 7 09:54:37 2015 From: totonixsame at gmail.com (Thiago Franco Moraes) Date: Tue, 07 Apr 2015 13:54:37 +0000 Subject: [Numpy-discussion] Research position in the Brazilian Research Institute for Science and Neurotechnology - BRAINN Message-ID: Research position in the Brazilian Research Institute for Science and Neurotechnology ? BRAINN Postdoc researcher to work with software development for medical imaging The Brazilian Research Institute for Neuroscience and Neurotechnology (BRAINN) (www.brainn.org.br) focuses on the investigation of basic mechanisms leading to epilepsy and stroke, and the injury mechanisms that follow disease onset and progression. This research has important applications related to prevention, diagnosis, treatment and rehabilitation and will serve as a model for better understanding normal and abnormal brain function. The BRAINN Institute is composed of 10 institutions from Brazil and abroad and hosted by State University of Campinas (UNICAMP). Among the associated institutions is Renato Archer Information Technology Center (CTI) that has a specialized team in open-source software development for medical imaging (www.cti.gov.br/invesalius) and 3D printing applications for healthcare. CTI is located close the UNICAMP in the city of Campinas, State of S?o Paulo in a very technological region of Brazil and is looking for a postdoc researcher to work with software development for medical imaging related to the imaging analysis, diagnosis and treatment of brain diseases. The postdoc position is for two years with the possibility of being renovated for more two years. Education - PhD in computer science, computer engineering, mathematics, physics or related. Requirements - Digital image processing (Medical imaging) - Computer graphics (basic) Benefits 6.143,40 Reais per month free of taxes (about US$ 2.000,00); 15% technical reserve for conferences participation and specific materials acquisition; Interested Send curriculum to: jorge.silva at cti.gov.br with subject ?Postdoc position? Applications reviews will begin April 30, 2015 and continue until the position is filled. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 8 14:24:26 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Apr 2015 14:24:26 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: Message-ID: On Wed, Apr 8, 2015 at 1:38 PM, Robert Kern wrote: > On Wed, Apr 8, 2015 at 2:06 AM, Nathaniel Smith wrote: >> >> On Apr 5, 2015 7:04 AM, "Robert Kern" wrote: >> > >> > On Sat, Apr 4, 2015 at 10:38 PM, Nathaniel Smith wrote: >> > > >> > > On Apr 4, 2015 4:12 AM, "Todd" wrote: >> > > > >> > > > There was no break as large as this. In fact I would say this is >> > > > even a larger change than any individual change we saw in the python 2 to 3 >> > > > switch. The basic mechanics of indexing are just too fundamental and touch >> > > > on too many things to make this sort of change feasible. >> > > >> > > I'm afraid I'm not clever enough to know how large or feasible a >> > > change is without even seeing the proposed change. >> > >> > It doesn't take any cleverness. The change in question was to make the >> > default indexing semantics to orthogonal indexing. No matter the details of >> > the ultimate proposal to achieve that end, it has known minimum >> > consequences, at least in the broad outline. 
Current documentation and books >> > become obsolete for a fundamental operation. Current code must be modified >> > by some step to continue working. These are consequences inherent in the >> > end, not just the means to the end; we don't need a concrete proposal in >> > front of us to know what they are. There are ways to mitigate these >> > consequences, but there are no silver bullets that eliminate them. And we >> > can compare those consequences to approaches like Jaime's that achieve a >> > majority of the benefits of such a change without any of the negative >> > consequences. That comparison does not bode well for any proposal. >> >> Ok, let me try to make my point another way. >> >> I don't actually care at this stage in the discussion whether the change >> is ultimately viable. And I don't think you should either. (For values of >> "you" that includes everyone in the discussion, not picking on Robert in >> particular :-).) >> >> My point is that rational, effective discussion requires giving ideas room >> to breath. Sometimes ideas turn out to be not as bad as they looked. >> Sometimes it turns out that they are, but there's some clever tweak that >> gives you 95% of the benefits for 5% of the cost. Sometimes you generate a >> better understanding of the tradeoffs that subsequently informs later design >> decisions. Sometimes working through the details makes both sides realize >> that there's a third option that solves both their problems. Sometimes you >> merely get a very specific understanding of why the whole approach is >> unreasonable that you can then, say, take to the pandas and netcdf >> developers as evidence of that you made a good faith effort and ask them to >> meet you half way. And all these things require understanding the specifics >> of what *exactly* works or doesn't work about about idea. IMHO, it's >> extremely misleading at this stage to make any assertion about whether >> Jaime's approach gives the "majority of benefits of such a change" is >> extremely misleading at this stage: not because it's wrong, but because it >> totally short-circuits the discussion about what benefits we care about. >> Jaime's patch certainly has merits, but does it help us make numpy and >> pandas/netcdf's more compatible? Does it make it easier for Eric to teach? >> Those are some specific merits that we might care about a lot, and for which >> Jaime's patch may or may not help much. But that kind of nuance gets lost >> when we jump straight to debating thumbs-up versus thumbs-down. > > And we can get all of that discussion from discussing Jaime's proposal. I > would argue that we will get better, more focused discussion from it since > it is actually a concrete proposal and not just a wish that numpy's indexing > semantics were something else. I think that a full airing and elaboration of > Jaime's proposal (as the final PR should look quite different than the > initial one to incorporate the what is found in the discussion) will give us > a satisficing solution. I certainly think that that is *more likely* to > arrive at a satisficing solution than an attempt to change the default > indexing semantics. I can name specific improvements that would specifically > address the concerns you named if you would like. Maybe it won't be *quite* > as good (for some parties) than if Numeric chose orthogonal indexing from > the get-go, but it will likely be much better for everyone than if numpy > broke backward compatibility on this feature now. 
> >> I cross-my-heart promise that under the current regime, no PR breaking >> fancy indexing would ever get anywhere *near* numpy master without >> *extensive* discussion and warnings on the list. The core devs just spent >> weeks quibbling about whether a PR that adds a new field to the end of the >> dtype struct would break ABI backcompat (we're now pretty sure it doesn't), >> and the current standard we enforce is that every PR that touches public API >> needs a list discussion, even minor extensions with no compatibility issues >> at all. No one is going to sneak anything by anyone. > > That is not the issue. Ralf asked you not to invite such PRs in the first > place. No one thinks that such a PR would get "snuck" in. That's not > anyone's concern. > >> Plus, I dunno, our current approach to discussions just seems to make >> things hostile and shouty and unpleasant. If a grad student or junior >> colleague comes to you with an idea where you see some potentially critical >> flaw, do you yell THAT WILL NEVER WORK and kick them out of your office? Or, >> do you maybe ask a few leading questions and see where they go? >> >> I think things will work better if the next time something like this comes >> up, *one* person just says "hmm, interesting idea, but the backcompat issues >> seem pretty severe; do you have any ideas about how to mitigate that?", and >> then we let that point be taken as having been made and see where the >> discussion goes. Maybe we can all give it a try? > > You do remember that I said we should be "politely considering [...] > proposals people send us uninvited", right? The "politely" was a key part of > that. Prospectively inviting backwards-incompatible proposals for a full > airing goes beyond this. If a suggestion like changing the default indexing behavior and dropping fancy indexing has a ex ante chance of succeeding of less than 0.1%, then we should say so. Adding an improved additional features is then a useful alternative, and a better way to spend our or your time. A while ago we had the request on the mailing list to make numpy broadcasting behavior optional, the discussion "died" pretty fast. Fancy indexing and similar is a great feature (even if not many said so in the thread) and as far as I can tell it is heavily entrenched in the existing usage of numpy. You can always discuss proposals, as long as it is clear that these are low probability events. Josef > > -- > Robert Kern > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Wed Apr 8 14:30:11 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 8 Apr 2015 11:30:11 -0700 Subject: [Numpy-discussion] FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. In-Reply-To: <5523F465.9000001@hilboll.de> References: <5523F465.9000001@hilboll.de> Message-ID: On Apr 8, 2015 2:16 PM, "Andreas Hilboll" wrote: > > Hi all, > > I'm commonly using function signatures like > > def myfunc(a, b, c=None): > if c is None: > # do something ... > ... > > where c is an optional array argument. For some time now, I'm getting a > > FutureWarning: comparison to `None` will result in an elementwise > object comparison in the future > > from the "c is None" comparison. I'm wondering what would be the best > way to do this check in a future-proof way? 
As far as I know, you should be getting the warning when you write c == None and the fix should be to write c is None instead. (And this is definitely an important fix -- it's basically a bug in numpy that the == form ever worked.) Are you certain that you're getting warnings from 'c is None'? -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From chadfulton at gmail.com Wed Apr 8 14:36:51 2015 From: chadfulton at gmail.com (Chad Fulton) Date: Wed, 8 Apr 2015 11:36:51 -0700 Subject: [Numpy-discussion] Multidimensional Indexing In-Reply-To: References: Message-ID: On Mon, Apr 6, 2015 at 4:49 PM, Nicholas Devenish wrote: > With the indexing example from the documentation: > > y = np.arange(35).reshape(5,7) > > Why does selecting an item from explicitly every row work as I?d expect: >>>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])] > array([ 0, 7, 14, 21, 28]) > > But doing so from a full slice (which, I would naively expect to mean ?Every Row?) has some?other? behaviour: > >>>> y[:,np.array([0,0,0,0,0])] > array([[ 0, 0, 0, 0, 0], > [ 7, 7, 7, 7, 7], > [14, 14, 14, 14, 14], > [21, 21, 21, 21, 21], > [28, 28, 28, 28, 28]]) > > What is going on in this example, and how do I get what I expect? By explicitly passing in an extra array with value===index? What is the rationale for this difference in behaviour? > To understand this example, it is important to understand that for multi-dimensional arrays, Numpy attempts to make the index array along each dimension the same size, using broadcasting. So in your original example, y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])], the arrays are the same size, and the behavior is as you'd expect. In the second case, the first index is a slice, and the second index is an array. Documentation for this case can be found in the indexing docs under "Combining index arrays with slices". Here's the relevant portion: > In effect, the slice is converted to an [new] index array ... that is broadcast with the [other] index array So in your case, the slice ":" is *first* being converted to np.arange(5), *then* is broadcast across the shape of the [other] index array so that it is ultimately transformed into something like np.repeat(np.arange(5)[:,np.newaxis], 5, axis=1), giving you: array([[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3], [4, 4, 4, 4, 4]]) Now at this point you have converted your slice to an [new] index array of shape (5,5), and your [other] index array is shaped (5,). So now numpy applies broadcasting rules to the second array to get it into shape 5. This operation is identical to what just occurred, so your [other] index array *also* looks like: array([[0, 0, 0, 0, 0], [1, 1, 1, 1, 1], [2, 2, 2, 2, 2], [3, 3, 3, 3, 3], [4, 4, 4, 4, 4]]) Which then gives the result you saw. Now, you may say: once the slice was converted to np.arange(5), why was it then broadcast to shape (5,5) rather than kept at shape (5,) which would work. The reason (I suspect at least) is to keep it consistent with other types of slices. Consider if you did something like: y[1:3, np.array([0,0,0,0,0])] Then the same operation would apply as above, except that when the slice was converted to an array, it would be converted to np.arange(1,3) which has shape (2,). Obviously this isn't compatible with the second index array of shape (5,), so it *has* to be broadcast. 
One final note: in this case, you can instead use either of the following: y[np.array([0,1,2,3,4]), 0] or y[:, 0] using the same steps above, the slice is converted to an np.arange(5), and then the shapes are compared, (5,) versus (). Then the integer index is broadcast to shape (5,) which gives you what you want. Hope that helps. From lists at hilboll.de Wed Apr 8 14:42:27 2015 From: lists at hilboll.de (Andreas Hilboll) Date: Wed, 08 Apr 2015 20:42:27 +0200 Subject: [Numpy-discussion] FutureWarning: comparison to `None` will result in an elementwise object comparison in the future. In-Reply-To: References: <5523F465.9000001@hilboll.de> Message-ID: <55257693.3000602@hilboll.de> On 08.04.2015 20:30, Nathaniel Smith wrote: > On Apr 8, 2015 2:16 PM, "Andreas Hilboll" > wrote: >> >> Hi all, >> >> I'm commonly using function signatures like >> >> def myfunc(a, b, c=None): >> if c is None: >> # do something ... >> ... >> >> where c is an optional array argument. For some time now, I'm getting a >> >> FutureWarning: comparison to `None` will result in an elementwise >> object comparison in the future >> >> from the "c is None" comparison. I'm wondering what would be the best >> way to do this check in a future-proof way? > > As far as I know, you should be getting the warning when you write > c == None > and the fix should be to write > c is None > instead. (And this is definitely an important fix -- it's basically a > bug in numpy that the == form ever worked.) Are you certain that you're > getting warnings from 'c is None'? My mistake; I was actually doing if p1 is None == h1 is None: instead of if (p1 is None) == (h1 is None): Sorry for the noise. -- Andreas. From msuzen at gmail.com Tue Apr 7 02:46:50 2015 From: msuzen at gmail.com (Suzen, Mehmet) Date: Tue, 7 Apr 2015 07:46:50 +0100 Subject: [Numpy-discussion] IDE's for numpy development? Message-ID: > Spyder supports C. Thanks for correcting this. I wasn't aware of it. How was your experience with it? Best, -m From efiring at hawaii.edu Wed Apr 8 15:05:08 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 08 Apr 2015 09:05:08 -1000 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55256EF2.6040602@gmail.com> References: <55256EF2.6040602@gmail.com> Message-ID: <55257BE4.1040205@hawaii.edu> On 2015/04/08 8:09 AM, Alan G Isaac wrote: > That analogy fails because it suggests a private conversation. This list is extremely public. > For example, I am just a user, and I am on it. I can tell you that as a long-time numpy user > my reaction to the proposal to change indexing semantics was (i) OMG YMBFKM and then > (ii) take a breath; this too will fade away. It is very reasonable to worry that some users > will start at the same place but them move in a different direction, and that worry should > affect how such proposals are floated and discussed. I am personally grateful that the > idea's reception has been so chilly; it's very reassuring. OK, so I was not sufficiently tactful when I tried to illustrate the real practical problem associated with a *core* aspect of numpy. My intent was not to alarm users, and I apologize if I have done so. I'm glad you have been reassured. I know perfectly well that back-compatibility and stability are highly important. What I wanted to do was to stimulate thought about how to handle a serious challenge to numpy's future--short-term, and long-term. 
Jaime's PR is a very welcome response to that challenge, but it might not be the end of the story. Matthew nicely sketched out one possible scenario, or actually a range of scenarios. Now, can we please get back to consideration of reasonable options? What sequence of steps might reduce the disconnect between numpy and the rest of the array-handling world? And make it a little friendlier for students? Are there *any* changes to indexing, whether by default or as an option, that would help? Consider the example I started with, in which indexing with [1, :, array] gives results that many find surprising and hard to understand. Might it make sense to *slowly* deprecate this? Or are such indexing expressions actually useful? If they are, would it be out of the question to have them *optionally* trigger a warning, so that numpy could be configured to be a little less likely to trip up a non-expert user? Eric > > fwiw, > Alan > > > On 4/7/2015 9:06 PM, Nathaniel Smith wrote: >> If a grad student or junior colleague comes to you with an >> idea where you see some potentially critical flaw, do you >> yell THAT WILL NEVER WORK and kick them out of your >> office? Or, do you maybe ask a few leading questions and >> see where they go? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From yw5aj at virginia.edu Wed Apr 8 15:19:10 2015 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Wed, 8 Apr 2015 15:19:10 -0400 Subject: [Numpy-discussion] IDE's for numpy development? In-Reply-To: References: Message-ID: I think spyder supports code highlighting in C and that's all... There's no way to compile in Spyder, is there? Shawn On Tue, Apr 7, 2015 at 2:46 AM, Suzen, Mehmet wrote: >> Spyder supports C. > > Thanks for correcting this. I wasn't aware of it. > How was your experience with it? > > Best, > -m > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ From robert.kern at gmail.com Wed Apr 8 15:28:57 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 8 Apr 2015 20:28:57 +0100 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55257BE4.1040205@hawaii.edu> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: On Wed, Apr 8, 2015 at 8:05 PM, Eric Firing wrote: > Now, can we please get back to consideration of reasonable options? Sure, but I recommend going back to the actually topical thread (or a new one), as this one is meta. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Wed Apr 8 15:40:54 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 8 Apr 2015 21:40:54 +0200 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55257BE4.1040205@hawaii.edu> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: On Wed, Apr 8, 2015 at 9:05 PM, Eric Firing wrote: > On 2015/04/08 8:09 AM, Alan G Isaac wrote: > > That analogy fails because it suggests a private conversation. This list > is extremely public. 
> > For example, I am just a user, and I am on it. I can tell you that as a > long-time numpy user > > my reaction to the proposal to change indexing semantics was (i) OMG > YMBFKM and then > > (ii) take a breath; this too will fade away. It is very reasonable to > worry that some users > > will start at the same place but them move in a different direction, and > that worry should > > affect how such proposals are floated and discussed. I am personally > grateful that the > > idea's reception has been so chilly; it's very reassuring. > > OK, so I was not sufficiently tactful when I tried to illustrate the > real practical problem associated with a *core* aspect of numpy. My > intent was not to alarm users, and I apologize if I have done so. I'm > glad you have been reassured. I know perfectly well that > back-compatibility and stability are highly important. What I wanted to > do was to stimulate thought about how to handle a serious challenge to > numpy's future--short-term, and long-term. Jaime's PR is a very welcome > response to that challenge, but it might not be the end of the story. > Matthew nicely sketched out one possible scenario, or actually a range > of scenarios. > > Now, can we please get back to consideration of reasonable options? > Well, in many people's definition of reasonable, that's basically Jaime's proposal and maybe the original __orthogonal_indexing__ one. Those both have a chance of actually being implemented, and presumably the original proposers have a use for the latter. Their proposal is not being discussed; instead that potentially useful discussion is being completely derailed by insisting on wanting to talk about changes to numpy's indexing behavior. To address in detail the list of Matthew you mention above: * implement orthogonal indexing as a method arr.sensible_index[...] That's basically Jaime's PR. * implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...] Not that harmful, but only makes sense in combination with the next steps. * deprecate non-boolean fancy indexing as standard arr[...] indexing; No, see negative reaction by many people. * wait a long time; * remove non-boolean fancy indexing as standard arr[...] (errors are preferable to change in behavior) Most definitely no. Ralf P.S. random thought: maybe we should have a "numpy-ideas" list, just like python-dev has python-ideas -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Wed Apr 8 16:08:26 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 08 Apr 2015 10:08:26 -1000 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: <55258ABA.3030202@hawaii.edu> On 2015/04/08 9:40 AM, Ralf Gommers wrote: > Their proposal is not being discussed; instead that potentially useful > discussion is being completely derailed by insisting on wanting to talk > about changes to numpy's indexing behavior. Good point. That was an unintended consequence of my message. Eric From alan.isaac at gmail.com Wed Apr 8 16:02:58 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 08 Apr 2015 16:02:58 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. 
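To keep the proposals above concrete, a short sketch of the distinction (the array shape is made up for illustration): np.ix_ already gives orthogonal semantics for 1-D integer index arrays, and is roughly what an `arr.sensible_index[...]`-style property would wrap.

    import numpy as np

    v = np.arange(5 * 6 * 7).reshape(5, 6, 7)

    # Current fancy indexing: the index arrays are broadcast together and
    # select individual points along the first two axes.
    v[np.array([0, 1]), np.array([0, 2])].shape   # (2, 7)

    # Orthogonal indexing: each index applies to its own axis.  Today this
    # is spelled with np.ix_.
    v[np.ix_([0, 1], [0, 2, 3])].shape            # (2, 3, 7)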
orthogonal) In-Reply-To: <55257BE4.1040205@hawaii.edu> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: <55258972.6050501@gmail.com> 1. I use numpy in teaching. I have never heard a complaint about its indexing behavior. Have you heard such complaints? 2. One reason I use numpy in teaching is its indexing behavior. What specific language provides a better indexing model, in your opinion? 3. I admit, my students are NOT using non-boolen fancy indexing on multidimensional arrays. (As far as I know.) Are yours? Cheers, Alan On 4/8/2015 3:05 PM, Eric Firing wrote: > What sequence of steps might reduce the disconnect between numpy and the > rest of the array-handling world? And make it a little friendlier for > students? From efiring at hawaii.edu Wed Apr 8 16:16:41 2015 From: efiring at hawaii.edu (Eric Firing) Date: Wed, 08 Apr 2015 10:16:41 -1000 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55258972.6050501@gmail.com> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> Message-ID: <55258CA9.9000302@hawaii.edu> On 2015/04/08 10:02 AM, Alan G Isaac wrote: > 3. I admit, my students are NOT using non-boolen fancy indexing on > multidimensional arrays. (As far as I know.) Are yours? Yes, one attempted to, essentially by accident. That was in my original message. Please refer back to that. The earlier part of this thread, under its original name, is also relevant to your other questions. I'm not going to discuss this further. The thread is now closed as far as I am concerned. Eric From robert.kern at gmail.com Wed Apr 8 16:20:33 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 8 Apr 2015 21:20:33 +0100 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: On Wed, Apr 8, 2015 at 8:40 PM, Ralf Gommers wrote: > To address in detail the list of Matthew you mention above: > > * implement orthogonal indexing as a method arr.sensible_index[...] > That's basically Jaime's PR. > > * implement the current non-boolean fancy indexing behavior as a method - arr.crazy_index[...] > Not that harmful, but only makes sense in combination with the next steps. Well, since we got the peanut butter in our meta, I might as well join in here. I think this step is useful even without the deprecation steps. First, allow me to rename things in a less judgy fashion: * arr.ortho_ix is a property that allows for orthogonal indexing, a la Jaime's PR. * arr.fancy_ix is a property that implements the current numpy-standard fancy indexing. Even though arr.fancy_ix "only" replicating the default semantics, having it opens up some possibilities. Other array-like objects can implement both of these with the same names. Thus, to write generic code that exploits the particular behaviors of one of these semantics, you just make sure to use the property that behaves the way you want. Then you don't care what the default syntax does on that object. You don't have to test if an object has arr.__orthogonal_indexing__ and have two different code paths; you just use the property that behaves the way you want. It also allows us to issue warnings when the default indexing syntax is used for some of the behaviors that are weird corner cases, like [1, :, array]. 
This is one of those corner cases where the behavior is probably not what anyone actually *wants*; it was just the only thing we could do that is consistent with the desired semantics of the rest of the cases. I think it would be reasonable to issue a warning if the default indexing syntax was used with it. It's probably a sign that the user thought that indexing worked like orthogonal indexing. The warning would *not* be issued if the arr.fancy_ix property was used, since that is an explicit signal that the user is specifically requesting a particular set of behaviors. I probably won't want to ever *deprecate* the behavior for the default syntax, but a warning is easy to deal with even with old code that you don't want to modify directly. Lastly, I would appreciate having some signal to tell readers "pay attention; this is nontrivial index manipulation; here is a googleable term so you can look up what this means". I almost always use the default fancy indexing, but I'd use the arr.fancy_ix property for the nontrivial cases just for this alone. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From fperez.net at gmail.com Wed Apr 8 16:32:27 2015 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 8 Apr 2015 13:32:27 -0700 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: Message-ID: Trying to bring the meta back into this thread (sorry for Robert's PB :)... The only thing I'd like to add, is that it's perhaps worth messaging that: a PR is just (as the Github folks like to say) "a conversation based on code". It is NOT necessarily something intended explicitly for merging. In IPython, we do sometimes create PRs that we mark explicitly as *Not for merging*, just so that we can play with a concrete implementation of an idea, even when we know in advance it's not going to go in. But it may be a useful way to explore the problem with code everyone can easily grab and run, to have a thread of discussion right there next to the code, to evolve the code together with the discussions as insights arise, and to finally document anything learned, all in one place. So, with a bit of messaging, "encouraging PRs" doesn't need to be seen as "the numpy core devs would like to see every last crazy idea you have in mind to see what we can merge in our next drunken stupor". But rather, "some ideas, even crazy ones, are easier to understand when accompanied by code, could you send a PR for what you have in mind, knowing we're nearly certain we won't merge it, but it will make it easier for us to have a fruitful discussion with you". Back to lurking ;) f -- Fernando Perez (@fperez_org; http://fperez.org) fperez.net-at-gmail: mailing lists only (I ignore this when swamped!) fernando.perez-at-berkeley: contact me here for any direct mail -------------- next part -------------- An HTML attachment was scrubbed... 
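For readers who want something runnable, a rough, hypothetical illustration of the kind of accessor Robert sketches above; none of this is existing numpy API, and a real implementation would also need to handle slices, scalars and booleans rather than only 1-D integer sequences.

    import numpy as np

    class _OrthoIndexer(object):
        # Hypothetical helper that an `arr.ortho_ix` property could return.
        def __init__(self, arr):
            self._arr = arr

        def __getitem__(self, key):
            if not isinstance(key, tuple):
                key = (key,)
            # np.ix_ builds the open mesh that makes each index apply to
            # its own axis.
            return self._arr[np.ix_(*key)]

    a = np.arange(12).reshape(3, 4)
    _OrthoIndexer(a)[[0, 2], [1, 3]].shape   # (2, 2): orthogonal result
    a[[0, 2], [1, 3]].shape                  # (2,):  current fancy result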
URL: From sebastian at sipsolutions.net Tue Apr 7 03:58:59 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 07 Apr 2015 09:58:59 +0200 Subject: [Numpy-discussion] Multidimensional Indexing In-Reply-To: References: Message-ID: <1428393539.7538.7.camel@sipsolutions.net> On Di, 2015-04-07 at 00:49 +0100, Nicholas Devenish wrote: > With the indexing example from the documentation: > > y = np.arange(35).reshape(5,7) > > Why does selecting an item from explicitly every row work as I?d expect: > >>> y[np.array([0,1,2,3,4]),np.array([0,0,0,0,0])] > array([ 0, 7, 14, 21, 28]) > > But doing so from a full slice (which, I would naively expect to mean ?Every Row?) has some?other? behaviour: > > >>> y[:,np.array([0,0,0,0,0])] > array([[ 0, 0, 0, 0, 0], > [ 7, 7, 7, 7, 7], > [14, 14, 14, 14, 14], > [21, 21, 21, 21, 21], > [28, 28, 28, 28, 28]]) > > What is going on in this example, and how do I get what I expect? By explicitly passing in an extra array with value===index? What is the rationale for this difference in behaviour? > The rationale is historic. Indexing with arrays (advanced indexing) works different from slicing. So two arrays will be iterated together, while slicing is not (we sometimes call it outer/orthogonal indexing for that matter, there is just a big discussion about this). These are different beasts, you can basically get the slicing like behaviour by adding appropriate axes to your indexing arrays: y[np.array([[0],[1],[2],[3],[4]]),np.array([0,0,0,0,0])] The other way around is not possible. Note that if it was the case: y[:, :] would give the diagonal (if possible) and not the full array as you would probably also expect. One warning: If you index with more then one array (scalars are also arrays in this sense -- so `[0, :, array]` is an example) in combination with slices, the result can be transposed in a confusing way (it is not that difficult, but usually unexpected). - Sebastian > Thanks, > > Nick > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From ralf.gommers at gmail.com Wed Apr 8 16:54:19 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Wed, 8 Apr 2015 22:54:19 +0200 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> Message-ID: On Wed, Apr 8, 2015 at 10:20 PM, Robert Kern wrote: > On Wed, Apr 8, 2015 at 8:40 PM, Ralf Gommers > wrote: > > > To address in detail the list of Matthew you mention above: > > > > * implement orthogonal indexing as a method arr.sensible_index[...] > > That's basically Jaime's PR. > > > > * implement the current non-boolean fancy indexing behavior as a > method - arr.crazy_index[...] > > Not that harmful, but only makes sense in combination with the next > steps. > > Well, since we got the peanut butter in our meta, I might as well join in > here. > Yeah, sorry about that. I don't even like peanut butter. > I think this step is useful even without the deprecation steps. > Your arguments make sense. Ralf -------------- next part -------------- An HTML attachment was scrubbed... 
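A tiny example of the transposition Sebastian warns about (the array here is just a placeholder): when the index arrays are separated by a slice, the broadcast part of the result is moved to the front of the shape.

    import numpy as np

    a = np.zeros((5, 6, 7))

    # Scalar + slice + index array: the "array" dimensions come first.
    a[0, :, [1, 2]].shape    # (2, 6)

    # The same selection done in two steps keeps the naively expected order.
    a[0][:, [1, 2]].shape    # (6, 2)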
URL: From mistersheik at gmail.com Wed Apr 8 19:34:52 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 8 Apr 2015 19:34:52 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors Message-ID: Numpy's outer product works fine with vectors. However, I seem to always want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). Wolfram-alpha seems to agree https://reference.wolfram.com/language/ref/Outer.html with respect to matrix outer products. My suggestion is to define outer as defined below. I've contrasted it with numpy's current outer product. In [36]: def a(n): return np.ones(n) In [37]: b = a(()) In [38]: c = a(4) In [39]: d = a(5) In [40]: np.outer(b, d).shape Out[40]: (1, 5) In [41]: np.outer(c, d).shape Out[41]: (4, 5) In [42]: np.outer(c, b).shape Out[42]: (4, 1) In [43]: def outer(a, b): return a[(...,) + len(b.shape) * (np.newaxis,)] * b ....: In [44]: outer(b, d).shape Out[44]: (5,) In [45]: outer(c, d).shape Out[45]: (4, 5) In [46]: outer(c, b).shape Out[46]: (4,) Best, Neil -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Thu Apr 9 01:40:51 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2015 07:40:51 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: <1428558051.26878.6.camel@sipsolutions.net> On Mi, 2015-04-08 at 19:34 -0400, Neil Girdhar wrote: > Numpy's outer product works fine with vectors. However, I seem to > always want len(outer(a, b).shape) to be equal to len(a.shape) + > len(b.shape). Wolfram-alpha seems to agree > https://reference.wolfram.com/language/ref/Outer.html with respect to > matrix outer products. My suggestion is to define outer as defined > below. I've contrasted it with numpy's current outer product. > Actually this behaviour already exists (I was not aware it was different) for all ufuncs, though it is not that well known, and I tend to broadcast manually which may be a pity ;): np.multiply.outer(a, b) does exactly what you want, though I am not sure if np.outer might leverage blas or not. - Sebastian > > In [36]: def a(n): return np.ones(n) > > > In [37]: b = a(()) > > > In [38]: c = a(4) > > > In [39]: d = a(5) > > > In [40]: np.outer(b, d).shape > Out[40]: (1, 5) > > > In [41]: np.outer(c, d).shape > Out[41]: (4, 5) > > > In [42]: np.outer(c, b).shape > Out[42]: (4, 1) > > > In [43]: def outer(a, b): > return a[(...,) + len(b.shape) * (np.newaxis,)] * b > ....: > > > In [44]: outer(b, d).shape > Out[44]: (5,) > > > In [45]: outer(c, d).shape > Out[45]: (4, 5) > > > In [46]: outer(c, b).shape > Out[46]: (4,) > > > Best, > > > Neil > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From njs at pobox.com Thu Apr 9 01:44:28 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Apr 2015 01:44:28 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar wrote: > Numpy's outer product works fine with vectors. 
However, I seem to always > want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). > Wolfram-alpha seems to agree > https://reference.wolfram.com/language/ref/Outer.html with respect to matrix > outer products. You're probably right that this is the correct definition of the outer product in an n-dimensional world. But this seems to go beyond being just a bug in handling 0-d arrays (which is the kind of corner case we've fixed in the past); np.outer is documented to always ravel its inputs to 1d. In fact the implementation is literally just: a = asarray(a) b = asarray(b) return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) Sebastian's np.multiply.outer is much more generic and effective. Maybe we should just deprecate np.outer? I don't see what use it serves. (When and whether it actually got removed after being deprecated would depend on how much use it actually gets in real code, which I certainly don't know while typing a quick email. But we could start telling people not to use it any time.) -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Thu Apr 9 01:57:02 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Apr 2015 01:57:02 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55256EF2.6040602@gmail.com> References: <55256EF2.6040602@gmail.com> Message-ID: On Wed, Apr 8, 2015 at 2:09 PM, Alan G Isaac wrote: > That analogy fails because it suggests a private conversation. This list is extremely public. > For example, I am just a user, and I am on it. I can tell you that as a long-time numpy user > my reaction to the proposal to change indexing semantics was (i) OMG YMBFKM and then > (ii) take a breath; this too will fade away. It is very reasonable to worry that some users > will start at the same place but them move in a different direction, and that worry should > affect how such proposals are floated and discussed. I am personally grateful that the > idea's reception has been so chilly; it's very reassuring. Thanks, this is really useful feedback. I can totally understand that panic flare and it sucks. Do you think there's anything we could be doing to reduce this kind of adrenaline reaction while still allowing for relaxed discussion about out-there ideas? In my mind the "relaxed" part is actually a huge part of the goal: reaching the point where everyone can be confident that their voice will be heard, etc., so that things become less fraught and it becomes easier to focus on ideas. -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Thu Apr 9 02:22:17 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Apr 2015 02:22:17 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <55258972.6050501@gmail.com> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> Message-ID: On Wed, Apr 8, 2015 at 4:02 PM, Alan G Isaac wrote: > 1. I use numpy in teaching. > I have never heard a complaint about its indexing behavior. > Have you heard such complaints? 
Some observations: 1) There's an unrelated thread on numpy-discussion right now in which a user is baffled by the interaction between slicing and integer fancy indexing: http://thread.gmane.org/gmane.comp.python.numeric.general/60321 And one of the three replies AFAICT also doesn't actually make sense, in that its explanation relies on broadcasting two arrays with shape (5,) against each other to produce an array with shape (5, 5). (Which is not how broadcasting works.) To be fair, though, this isn't the poster's fault, because they are quoting the documentation! 2) Again, entirely by coincidence, literally this week a numpy user at Berkeley felt spontaneously moved to send a warning message to the campus py4science list just to warn everyone about the bizarre behaviour they had stumbled on where arr[0, :, idx] produced inexplicable results. They had already found the docs and worked out what was going on, they just felt it was necessary to warn everyone else to be careful out there. 3) I personally regularly get confused by integer fancy indexing. I actually understand it substantially better due to thinking it through while reading these threads, but I'm a bit disturbed that I had that much left to learn. (New key insight: you can think of *scalar* indexing arr[i, j, k] as a function f(i, j, k) -> value. If you take that function and make it a ufunc, then you have integer fancy indexing. ...Though there's still an extra pound of explanation needed to describe the mixed slice/fancy cases, it at least captures the basic intuition. Maybe this was already obvious to everyone else, but it helped me.) 4) Even with my New and Improved Explanatory Powers, when this thread came up chatting with Thomas Kluyver today, I attempted to provide a simple, accurate description of how numpy indexing works so that the debate would make sense, and his conclusion was (paraphrased) "okay, now I don't understand numpy indexing anymore and never did". I say this not to pick on Thomas, but to make that point that Thomas is a pretty smart guy so maybe this is actually confusing. (Or maybe I'm just terrible at explaining things.) I actually think the evidence is very very strong that numpy's current way of mixing integer fancy indexing and slice-based indexing is a mistake. It's just not clear whether there's anything we can do to mitigate that mistake (or indeed, what would actually be better even if we could start over from scratch). (Which we can't.) > 2. One reason I use numpy in teaching is its indexing behavior. > What specific language provides a better indexing model, > in your opinion? > > 3. I admit, my students are NOT using non-boolen fancy indexing on > multidimensional arrays. (As far as I know.) Are yours? Well, okay, this would explain it, since integer fancy indexing is exactly the confusing case :-) On the plus side, this also means that even if pigs started doing barrel-rolls through hell's winter-vortex-chilled air tomorrow and we simply removed integer fancy indexing, your students would be unaffected :-) -n -- Nathaniel J. Smith -- http://vorpus.org From sebastian at sipsolutions.net Thu Apr 9 02:28:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2015 08:28:50 +0200 Subject: [Numpy-discussion] SIAM meeting in Snowbird anyone? 
Message-ID: <1428560930.26878.11.camel@sipsolutions.net> Hey, since I am not in the US often and the SIAM conference is pretty large, I was wondering if some more of our community will be at the SIAM conference in Snowbird around May 17th-21st and would like to meet up then. - Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Thu Apr 9 02:50:43 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2015 08:50:43 +0200 Subject: [Numpy-discussion] Non-meta indexing improvements discussion Message-ID: <1428562243.26878.26.camel@sipsolutions.net> Hi all, Let me take a shot at summing up some suggestions to make the indexing less surprising, and maybe we can gather some more in a more concentrated way now. 1. Implement something like `arr.fancy_index[...]` and `arr.ortho_index[...]` (i.e. Jaimes PR is the start for trying this) 2. Add warnings for non-consecutive advanced indexing (i.e. the original example `arr[0, :, index_array]`). 3. I do not know if it possible or useful, but I could imagine a module wide switch (similar to __future__ imports) to change the default indexing behaviour. One more thing, implementing this (especially the "new" indexing) is non-trivial, so as always help beyond just a discussion is appreciated and in my opinion the best way to push an actual change to happen sooner rather then in some far off future. I do not have time for concentrating much on an implementation for a while myself for a while at least. - Sebastian -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Thu Apr 9 03:01:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2015 09:01:26 +0200 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> Message-ID: <1428562886.26878.34.camel@sipsolutions.net> On Do, 2015-04-09 at 02:22 -0400, Nathaniel Smith wrote: > On Wed, Apr 8, 2015 at 4:02 PM, Alan G Isaac wrote: > > 1. I use numpy in teaching. > > I have never heard a complaint about its indexing behavior. > > Have you heard such complaints? > > Some observations: > > 1) There's an unrelated thread on numpy-discussion right now in which > a user is baffled by the interaction between slicing and integer fancy > indexing: > http://thread.gmane.org/gmane.comp.python.numeric.general/60321 > And one of the three replies AFAICT also doesn't actually make sense, > in that its explanation relies on broadcasting two arrays with shape > (5,) against each other to produce an array with shape (5, 5). (Which > is not how broadcasting works.) To be fair, though, this isn't the > poster's fault, because they are quoting the documentation! > > 2) Again, entirely by coincidence, literally this week a numpy user at > Berkeley felt spontaneously moved to send a warning message to the > campus py4science list just to warn everyone about the bizarre > behaviour they had stumbled on where arr[0, :, idx] produced > inexplicable results. 
They had already found the docs and worked out > what was going on, they just felt it was necessary to warn everyone > else to be careful out there. > > 3) I personally regularly get confused by integer fancy indexing. I > actually understand it substantially better due to thinking it through > while reading these threads, but I'm a bit disturbed that I had that > much left to learn. (New key insight: you can think of *scalar* > indexing arr[i, j, k] as a function f(i, j, k) -> value. If you take > that function and make it a ufunc, then you have integer fancy > indexing. ...Though there's still an extra pound of explanation needed > to describe the mixed slice/fancy cases, it at least captures the > basic intuition. Maybe this was already obvious to everyone else, but > it helped me.) > > 4) Even with my New and Improved Explanatory Powers, when this thread > came up chatting with Thomas Kluyver today, I attempted to provide a > simple, accurate description of how numpy indexing works so that the > debate would make sense, and his conclusion was (paraphrased) "okay, > now I don't understand numpy indexing anymore and never did". I say > this not to pick on Thomas, but to make that point that Thomas is a > pretty smart guy so maybe this is actually confusing. (Or maybe I'm > just terrible at explaining things.) > > I actually think the evidence is very very strong that numpy's current > way of mixing integer fancy indexing and slice-based indexing is a > mistake. It's just not clear whether there's anything we can do to > mitigate that mistake (or indeed, what would actually be better even > if we could start over from scratch). (Which we can't.) > I think the best way to think about the mixing is to think about "subspaces" defined by all of the slices which are taken for each individual fancy indexing "element". I.e. each subspaces is something like: new[:, 0, :] = arr[:, fancy1[0], fancy2[0], :] then you iterate the fancy indexes so the subspaces moves ahead: new[:, 1, :] = arr[:, fancy1[1], fancy2[1], :] new[:, 2, :] = arr[:, fancy1[2], fancy2[2], :] and so on. This is also how it is implemented. Plus of course the transposing to the front when the fancy indices are not consecutive and you cannot add the fancy dimensions to where they were. I think you mentioned an error in the docu, I thought I cleared some of them, but proabably that did not make it more understandable sometimes. The whole subspace way of is used, but there is a lot of improvement possible and I would be happy if more feel like stepping up to fill that void, since you do not need to know the implementation details for that. - Sebastian > > 2. One reason I use numpy in teaching is its indexing behavior. > > What specific language provides a better indexing model, > > in your opinion? > > > > 3. I admit, my students are NOT using non-boolen fancy indexing on > > multidimensional arrays. (As far as I know.) Are yours? > > Well, okay, this would explain it, since integer fancy indexing is > exactly the confusing case :-) On the plus side, this also means that > even if pigs started doing barrel-rolls through hell's > winter-vortex-chilled air tomorrow and we simply removed integer fancy > indexing, your students would be unaffected :-) > > -n > -------------- next part -------------- A non-text attachment was scrubbed... 
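A shapes-only sketch of the subspace iteration Sebastian describes above, using an assumed 4-d example array (not one taken from the thread):

import numpy as np

arr = np.zeros((2, 3, 4, 5))
fancy1 = np.array([0, 2, 1])
fancy2 = np.array([3, 0, 2])

# consecutive advanced indices: the broadcast fancy shape (3,) replaces
# the two indexed axes in place
print(arr[:, fancy1, fancy2, :].shape)       # (2, 3, 5)

# which is the same as filling the result one subspace at a time
new = np.empty((2, 3, 5))
for k in range(3):
    new[:, k, :] = arr[:, fancy1[k], fancy2[k], :]

# non-consecutive advanced indices (a slice in between): the fancy
# dimension is transposed to the front, which is the surprising part
print(arr[:, fancy1, :, fancy2].shape)       # (3, 2, 4)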
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Thu Apr 9 03:07:59 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Thu, 09 Apr 2015 09:07:59 +0200 Subject: [Numpy-discussion] Non-meta indexing improvements discussion In-Reply-To: <1428562243.26878.26.camel@sipsolutions.net> References: <1428562243.26878.26.camel@sipsolutions.net> Message-ID: <1428563279.26878.40.camel@sipsolutions.net> On Do, 2015-04-09 at 08:50 +0200, Sebastian Berg wrote: > Hi all, > > Let me take a shot at summing up some suggestions to make the indexing > less surprising, and maybe we can gather some more in a more > concentrated way now. > Did not want to comment on the first mail > 1. Implement something like `arr.fancy_index[...]` and > `arr.ortho_index[...]` (i.e. Jaimes PR is the start for trying this) > I like this, personally. There is not much to be lost and I fully agree with Robert on this. It opens up a lot of possibilities for us and especially also others. > 2. Add warnings for non-consecutive advanced indexing (i.e. the original > example `arr[0, :, index_array]`). This could be annoying sometimes, but then warnings do not break legacy code, and I think in new code again Robert is right for these cases using arr.fancy_index[...] is more explicit and a nice warning/google help to the confused reader. > > 3. I do not know if it possible or useful, but I could imagine a module > wide switch (similar to __future__ imports) to change the default > indexing behaviour. > OK, my suggestion.... But actually I do not know if I like it all that much (nor if it can be done) since 1. and 2. seem to me like enough. But if someone feels strongly about fancy indexing being bad, I wanted to point out that there may be ways to go down the road to "switch" numpy without actually switching. - Sebastian > > One more thing, implementing this (especially the "new" indexing) is > non-trivial, so as always help beyond just a discussion is appreciated > and in my opinion the best way to push an actual change to happen sooner > rather then in some far off future. I do not have time for concentrating > much on an implementation for a while myself for a while at least. > > - Sebastian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From robert.kern at gmail.com Thu Apr 9 06:13:01 2015 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 9 Apr 2015 11:13:01 +0100 Subject: [Numpy-discussion] Non-meta indexing improvements discussion In-Reply-To: <1428563279.26878.40.camel@sipsolutions.net> References: <1428562243.26878.26.camel@sipsolutions.net> <1428563279.26878.40.camel@sipsolutions.net> Message-ID: On Thu, Apr 9, 2015 at 8:07 AM, Sebastian Berg wrote: > > On Do, 2015-04-09 at 08:50 +0200, Sebastian Berg wrote: > > 3. I do not know if it possible or useful, but I could imagine a module > > wide switch (similar to __future__ imports) to change the default > > indexing behaviour. > > OK, my suggestion.... But actually I do not know if I like it all that > much (nor if it can be done) since 1. and 2. seem to me like enough. 
But > if someone feels strongly about fancy indexing being bad, I wanted to > point out that there may be ways to go down the road to "switch" numpy > without actually switching. I can't think of a way to actually make that work (I can list all the ways I thought of that *don't* work if anyone insists, but it's a tedious dead end), but it also seems to me to be a step backward. Assuming that we have both .ortho_ix and .fancy_ix to work with, it seems to me that the explicitness is a good thing. Even if in this module you only want to exploit one of those semantics, your module's readers live in a wider context where both semantics play a role. Moving your marker of "this is the non-default semantics I am using" to some module or function header away from where the semantics are actually used makes the code harder to read. A newish user trying to read some nontrivial indexing code may come to the list and ask "what exactly is this expression doing here?" and give us just the line of code with the expression (anecdotally, this is usually how this scenario goes down). If we have to answer "it depends; is there an @ortho_indexing decorator at the top of the function?", that's probably a cure worse than the disease. The properties are a good way to provide googleable signposts right where the tricky semantics are being used. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From markbak at gmail.com Thu Apr 9 06:14:17 2015 From: markbak at gmail.com (Mark Bakker) Date: Thu, 9 Apr 2015 12:14:17 +0200 Subject: [Numpy-discussion] what files to include with compiled fortran extension on Mac Message-ID: Hello list, I want to send somebody my compiled fortran extension on a Mac (compiled with f2py and gfortran). Problem is that it doesn't work on other Macs unless they also instal xcode (2 GB, yikes!) and gfortran. So apparently there are some additional files missing when I just send the compiled extension. Does anybody know what other files to include or (better) how to compile a fortran extension without needing to send any additional files? Thanks, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Thu Apr 9 08:25:48 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 09 Apr 2015 08:25:48 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> Message-ID: <55266FCC.1010304@gmail.com> On 4/9/2015 1:57 AM, Nathaniel Smith wrote: > Do you think there's anything we could be > doing to reduce this kind of adrenaline reaction while still allowing > for relaxed discussion about out-there ideas? numpy3000 at scipy.org :-) From alan.isaac at gmail.com Thu Apr 9 10:11:54 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 09 Apr 2015 10:11:54 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> Message-ID: <552688AA.4050305@gmail.com> > Alan wrote: >> 3. I admit, my students are NOT using non-boolen fancy indexing on >> >multidimensional arrays. (As far as I know.) Are yours? 
On 4/9/2015 2:22 AM, Nathaniel Smith wrote: > Well, okay, this would explain it, since integer fancy indexing is > exactly the confusing case:-) On the plus side, this also means that > even if pigs started doing barrel-rolls through hell's > winter-vortex-chilled air tomorrow and we simply removed integer fancy > indexing, your students would be unaffected:-) Except that they do use statsmodels, which I believe (?) does make use of integer fancy-indexing. Alan From josef.pktd at gmail.com Thu Apr 9 12:12:54 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 Apr 2015 12:12:54 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <552688AA.4050305@gmail.com> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> Message-ID: On Thu, Apr 9, 2015 at 10:11 AM, Alan G Isaac wrote: > > Alan wrote: >>> 3. I admit, my students are NOT using non-boolen fancy indexing on >>> >multidimensional arrays. (As far as I know.) Are yours? The only confusing case is mixing slices and integer array indexing for ndim > 2. The rest looks unsurprising, AFAIR (AFAICS, my last fancy indexing mailing list discussion is at least 4 years old, with Jonathan Taylor. I don't remember when I discovered the usefulness of the axis argument in take which covers many 3 or higher dimensional indexing use cases.) > > > On 4/9/2015 2:22 AM, Nathaniel Smith wrote: >> Well, okay, this would explain it, since integer fancy indexing is >> exactly the confusing case:-) On the plus side, this also means that >> even if pigs started doing barrel-rolls through hell's >> winter-vortex-chilled air tomorrow and we simply removed integer fancy >> indexing, your students would be unaffected:-) > > > Except that they do use statsmodels, which I believe (?) does make use of > integer fancy-indexing. And maybe all work would come to a standstill, because every library is using fancy integer indexing. I still don't know what all constitutes fancy indexing. The two most common use cases for me (statsmodels) are indexing for selecting elements like diag_indices, triu_indices and maybe nonzero, and expanding from a unique array like inverse index in np.unique. And there are just a few, AFAIR, orthogonal indexing cases with broadcasting index arrays to select rectangular pieces of an array. Josef > > Alan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From andrew.collette at gmail.com Thu Apr 9 14:00:24 2015 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 9 Apr 2015 12:00:24 -0600 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 Message-ID: Announcing HDF5 for Python (h5py) 2.5.0 ======================================== The h5py team is happy to announce the availability of h5py 2.5.0. This release introduces experimental support for the highly-anticipated "Single Writer Multiple Reader" (SWMR) feature in the upcoming HDF5 1.10 release. SWMR allows sharing of a single HDF5 file between multiple processes without the complexity of MPI or multiprocessing-based solutions. This is an experimental feature that should NOT be used in production code. We are interested in getting feedback from the broader community with respect to performance and the API design. 
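As a rough sketch of the intended usage pattern (the call names here follow the SWMR guide linked below and should be treated as illustrative, not authoritative):

import h5py
import numpy as np

# writer: use the latest file format, create all objects, then enable SWMR
f = h5py.File("swmr_demo.h5", "w", libver="latest")
dset = f.create_dataset("data", shape=(0,), maxshape=(None,), dtype="f8")
f.swmr_mode = True
dset.resize((100,))
dset[...] = np.arange(100.0)
dset.flush()            # make the new data visible to readers

# reader (a separate process in practice): open with swmr=True
r = h5py.File("swmr_demo.h5", "r", libver="latest", swmr=True)
rdset = r["data"]
rdset.refresh()         # pick up the writer's latest flushed state
print(rdset.shape)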
For more details, check out the h5py user guide: http://docs.h5py.org/en/latest/swmr.html SWMR support was contributed by Ulrik Pedersen. What's h5py? ------------ The h5py package is a Pythonic interface to the HDF5 binary data format. It lets you store huge amounts of numerical data, and easily manipulate that data from NumPy. For example, you can slice into multi-terabyte datasets stored on disk, as if they were real NumPy arrays. Thousands of datasets can be stored in a single file, categorized and tagged however you want. Documentation is at: http://docs.h5py.org Changes ------- * Experimental SWMR support * Group and AttributeManager classes now inherit from the appropriate ABCs * Fixed an issue with 64-bit float VLENS * Cython warning cleanups related to "const" * Entire code base ported to "six"; 2to3 removed from setup.py Acknowledgements --------------- This release incorporates changes from, among others: * Ulrik Pedersen * James Tocknell * Will Parkin * Antony Lee * Peter H. Li * Peter Colberg * Ghislain Antony Vaillant Where to get it --------------- Downloads, documentation, and more are available at the h5py website: http://www.h5py.org From njs at pobox.com Thu Apr 9 14:41:08 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Apr 2015 14:41:08 -0400 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 In-Reply-To: References: Message-ID: (Off-list) Congrats! Also btw, you might want to switch to a new subject line format for these emails -- the mention of Python 2.5 getting hdf5 support made me do a serious double take before I figured out what was going on, and 2.6 and 2.7 will be even worse :-) On Apr 9, 2015 2:07 PM, "Andrew Collette" wrote: > Announcing HDF5 for Python (h5py) 2.5.0 > ======================================== > > The h5py team is happy to announce the availability of h5py 2.5.0. > > This release introduces experimental support for the highly-anticipated > "Single Writer Multiple Reader" (SWMR) feature in the upcoming HDF5 1.10 > release. SWMR allows sharing of a single HDF5 file between multiple > processes without the complexity of MPI or multiprocessing-based > solutions. > > This is an experimental feature that should NOT be used in production > code. We are interested in getting feedback from the broader community > with respect to performance and the API design. > > For more details, check out the h5py user guide: > http://docs.h5py.org/en/latest/swmr.html > > SWMR support was contributed by Ulrik Pedersen. > > > What's h5py? > ------------ > > The h5py package is a Pythonic interface to the HDF5 binary data format. > > It lets you store huge amounts of numerical data, and easily manipulate > that data from NumPy. For example, you can slice into multi-terabyte > datasets stored on disk, as if they were real NumPy arrays. Thousands of > datasets can be stored in a single file, categorized and tagged however > you want. > > Documentation is at: > > http://docs.h5py.org > > > Changes > ------- > > * Experimental SWMR support > * Group and AttributeManager classes now inherit from the appropriate ABCs > * Fixed an issue with 64-bit float VLENS > * Cython warning cleanups related to "const" > * Entire code base ported to "six"; 2to3 removed from setup.py > > > Acknowledgements > --------------- > > This release incorporates changes from, among others: > > * Ulrik Pedersen > * James Tocknell > * Will Parkin > * Antony Lee > * Peter H. 
Li > * Peter Colberg > * Ghislain Antony Vaillant > > > Where to get it > --------------- > > Downloads, documentation, and more are available at the h5py website: > > http://www.h5py.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Apr 9 14:55:56 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 9 Apr 2015 14:55:56 -0400 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 In-Reply-To: References: Message-ID: On Thu, Apr 9, 2015 at 2:41 PM, Nathaniel Smith wrote: > (Off-list) > > Congrats! Also btw, you might want to switch to a new subject line format > for these emails -- the mention of Python 2.5 getting hdf5 support made me > do a serious double take before I figured out what was going on, and 2.6 and > 2.7 will be even worse :-) (offlist) I also had to read the subject line and the first paragraph several times to see who is using python 2.5 Josef :} > > On Apr 9, 2015 2:07 PM, "Andrew Collette" wrote: >> >> Announcing HDF5 for Python (h5py) 2.5.0 >> ======================================== >> >> The h5py team is happy to announce the availability of h5py 2.5.0. >> >> This release introduces experimental support for the highly-anticipated >> "Single Writer Multiple Reader" (SWMR) feature in the upcoming HDF5 1.10 >> release. SWMR allows sharing of a single HDF5 file between multiple >> processes without the complexity of MPI or multiprocessing-based >> solutions. >> >> This is an experimental feature that should NOT be used in production >> code. We are interested in getting feedback from the broader community >> with respect to performance and the API design. >> >> For more details, check out the h5py user guide: >> http://docs.h5py.org/en/latest/swmr.html >> >> SWMR support was contributed by Ulrik Pedersen. >> >> >> What's h5py? >> ------------ >> >> The h5py package is a Pythonic interface to the HDF5 binary data format. >> >> It lets you store huge amounts of numerical data, and easily manipulate >> that data from NumPy. For example, you can slice into multi-terabyte >> datasets stored on disk, as if they were real NumPy arrays. Thousands of >> datasets can be stored in a single file, categorized and tagged however >> you want. >> >> Documentation is at: >> >> http://docs.h5py.org >> >> >> Changes >> ------- >> >> * Experimental SWMR support >> * Group and AttributeManager classes now inherit from the appropriate ABCs >> * Fixed an issue with 64-bit float VLENS >> * Cython warning cleanups related to "const" >> * Entire code base ported to "six"; 2to3 removed from setup.py >> >> >> Acknowledgements >> --------------- >> >> This release incorporates changes from, among others: >> >> * Ulrik Pedersen >> * James Tocknell >> * Will Parkin >> * Antony Lee >> * Peter H. 
Li >> * Peter Colberg >> * Ghislain Antony Vaillant >> >> >> Where to get it >> --------------- >> >> Downloads, documentation, and more are available at the h5py website: >> >> http://www.h5py.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From njs at pobox.com Thu Apr 9 15:35:21 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 9 Apr 2015 15:35:21 -0400 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 In-Reply-To: References: Message-ID: On Apr 9, 2015 2:41 PM, "Nathaniel Smith" wrote: > > (Off-list) Doh, we do reply-to munging, don't we. Oh well. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From andrew.collette at gmail.com Thu Apr 9 15:41:34 2015 From: andrew.collette at gmail.com (Andrew Collette) Date: Thu, 9 Apr 2015 13:41:34 -0600 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 In-Reply-To: References: Message-ID: > Congrats! Also btw, you might want to switch to a new subject line format > for these emails -- the mention of Python 2.5 getting hdf5 support made me > do a serious double take before I figured out what was going on, and 2.6 and > 2.7 will be even worse :-) Ha! Didn't even think of that. For our next release I guess we'll have to go straight to h5py 3.5. Andrew From derek at astro.physik.uni-goettingen.de Thu Apr 9 15:53:12 2015 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Thu, 9 Apr 2015 21:53:12 +0200 Subject: [Numpy-discussion] ANN: HDF5 for Python 2.5.0 In-Reply-To: References: Message-ID: <7AE22878-8A8E-47F3-84AA-870C6BE8BDFF@astro.physik.uni-goettingen.de> On 9 Apr 2015, at 9:41 pm, Andrew Collette wrote: > >> Congrats! Also btw, you might want to switch to a new subject line format >> for these emails -- the mention of Python 2.5 getting hdf5 support made me >> do a serious double take before I figured out what was going on, and 2.6 and >> 2.7 will be even worse :-) > > Ha! Didn't even think of that. For our next release I guess we'll > have to go straight to h5py 3.5. You may have to hurry though ;-) "Monday, March 30, 2015 Python 3.5.0a3 has been released. This is the third alpha release of Python 3.5, which will be the next major release of Python. Python 3.5 is still under heavy development and is far from complete.? 3 alpha releases in 7 weeks? On a more serious note though, h5py 2.5.x in the subject would be perfectly clear enough, I think, and also help to distinguish from pytables releases. Derek From ndarray at mac.com Thu Apr 9 20:23:44 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 9 Apr 2015 20:23:44 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> Message-ID: On Thu, Apr 9, 2015 at 12:12 PM, wrote: > On Thu, Apr 9, 2015 at 10:11 AM, Alan G Isaac > wrote: > > > Alan wrote: > >>> 3. I admit, my students are NOT using non-boolen fancy indexing on > >>> >multidimensional arrays. (As far as I know.) Are yours? > > The only confusing case is mixing slices and integer array indexing > for ndim > 2. 
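That case in a nutshell, as a quick sketch with an assumed (3, 4, 5) array:

import numpy as np

a = np.zeros((3, 4, 5))

# advanced index on the last axis with a slice in the middle: the
# length-2 fancy dimension jumps in front of the sliced one
print(a[0, :, [0, 1]].shape)      # (2, 4), not the (4, 2) one might expect

# with the advanced indices next to each other, nothing moves around
print(a[0, [0, 1], :].shape)      # (2, 5)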
The rest looks unsurprising, AFAIR What I find somewhat annoying is the difficulty of simultaneously selecting a subset of rows and columns from a given matrix. Suppose I have >>> a array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34], [41, 42, 43, 44]]) If I want to select the first two rows and first two columns, I do >>> a[:2,:2] array([[11, 12], [21, 22]]) If I want rows 1 and 2 or columns 1 and 2, it is easy and natural >>> a[[1,2]] array([[21, 22, 23, 24], [31, 32, 33, 34]]) >>> a[:,[1,2]] array([[12, 13], [22, 23], [32, 33], [42, 43]]) but if I try to do both, I get the diagonal instead >>> a[[1,2],[1,2]] array([22, 33]) I could do >>> a[[1,2]][:,[1,2]] array([[22, 23], [32, 33]]) but this creates an extra copy. The best solution I can think of involves something like >>> i = np.array([[1,2]]) >>> a.flat[i + len(a)*i.T] array([[22, 23], [32, 33]]) which is hardly elegant or obvious. -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Thu Apr 9 20:41:40 2015 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 10 Apr 2015 02:41:40 +0200 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> Message-ID: <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> On 10 Apr 2015, at 2:23 am, Alexander Belopolsky wrote: > > What I find somewhat annoying is the difficulty of simultaneously selecting a subset of rows and columns from a given matrix. > > Suppose I have > > >>> a > array([[11, 12, 13, 14], > [21, 22, 23, 24], > [31, 32, 33, 34], > [41, 42, 43, 44]]) > > If I want to select the first two rows and first two columns, I do > > >>> a[:2,:2] > array([[11, 12], > [21, 22]]) > > If I want rows 1 and 2 or columns 1 and 2, it is easy and natural > > >>> a[[1,2]] > array([[21, 22, 23, 24], > [31, 32, 33, 34]]) > > >>> a[:,[1,2]] > array([[12, 13], > [22, 23], > [32, 33], > [42, 43]]) > > but if I try to do both, I get the diagonal instead > > >>> a[[1,2],[1,2]] > array([22, 33]) > a[1:3,1:3]? Can?t be generalised to arbitrary selections of rows,columns, though (e.g. a[1::2,::2] still works?) Derek From ndarray at mac.com Thu Apr 9 23:26:08 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Thu, 9 Apr 2015 23:26:08 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> Message-ID: On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > > but if I try to do both, I get the diagonal instead > > > > >>> a[[1,2],[1,2]] > > array([22, 33]) > > > a[1:3,1:3]? > > Can?t be generalised to arbitrary selections of rows,columns, though (e.g. > a[1::2,::2] still works?) > I am interested in the arbitrary selection of rows and columns given by indices or by boolean selectors. -------------- next part -------------- An HTML attachment was scrubbed... 
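One sketch that also covers the boolean-selector half of the question (np.ix_ accepts boolean masks and uses their nonzero positions, so the same call handles both cases):

import numpy as np

a = np.array([[11, 12, 13, 14],
              [21, 22, 23, 24],
              [31, 32, 33, 34],
              [41, 42, 43, 44]])

rows = np.array([False, True, True, False])   # boolean selector for rows
cols = np.array([False, True, True, False])   # boolean selector for columns
print(a[np.ix_(rows, cols)])
# [[22 23]
#  [32 33]]
print(a[np.ix_([1, 2], [1, 2])])              # integer indices give the same block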
URL: From jaime.frio at gmail.com Thu Apr 9 23:37:22 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Thu, 9 Apr 2015 20:37:22 -0700 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> Message-ID: On Thu, Apr 9, 2015 at 8:26 PM, Alexander Belopolsky wrote: > > On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de> wrote: > >> > but if I try to do both, I get the diagonal instead >> > >> > >>> a[[1,2],[1,2]] >> > array([22, 33]) >> > >> a[1:3,1:3]? >> >> Can?t be generalised to arbitrary selections of rows,columns, though >> (e.g. a[1::2,::2] still works?) >> > > I am interested in the arbitrary selection of rows and columns given by > indices or by boolean selectors. > This is what you are looking for: a[np.ix_([1, 2], [1, 2])] Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From permafacture at gmail.com Fri Apr 10 00:30:14 2015 From: permafacture at gmail.com (Elliot) Date: Thu, 9 Apr 2015 21:30:14 -0700 (MST) Subject: [Numpy-discussion] Trouble subclassing ndarray Message-ID: <1428640214104-40176.post@n7.nabble.com> Hi all, Sorry if this is the wrong forum for a question like this. I'm trying to create an object with multiple inheritance, one of which is from numpy.ndarray. The other gives it cacheable properties and defines a __getattr__ to deliver the cached properties. The initial instantiation is successful, but ufuncs and slices cause an infinite recursion where the __getattr__ function is used but the _cacheable attribute is not set (ie: from __init__ ) I am using docs.scipy.org/doc/numpy/user/basics.subclassing.html as a reference. 
Here is code that shows my problem (python 2.7, numpy 1.8.2) ===================== from __future__ import print_function import numpy as np class Cacheable(object): def __init__(self,*args,**kwargs): self._cacheable = {} def __getattr__(self,key): print("getting %s"%key) if key in self._cacheable: print(" found it") self._cacheable[key]() return self.__dict__[key] #if chache function does't update # data you're going to have a bad time else: raise AttributeError def _clear(self): '''clears derived properties''' for key in self._cacheable: if key in self.__dict__: del self.__dict__[key] class BaseGeometry(np.ndarray,Cacheable): '''Numpy array with extra attributes that are cacheable arrays''' def __new__(cls,input_array,dtype=np.float64,*args,**kwargs): # Input array is an already formed ndarray instance # We first cast to be our class type obj = np.asarray(input_array,dtype=dtype).view(cls) # Finally, we must return the newly created object: return obj def __init__(self,dims=None,dtype=np.float64,readonly=True): #TODO: sort through args and kwargs to make better self.readonly=readonly if readonly: self.flags.writeable=False self.dims=dims self._dtype = dtype Cacheable.__init__(self) def writeable_copy(self): ret = np.copy(self) ret.flags.writeable = True return ret def __array_finalize_(self,obj): #New object, will be created in __new__ print("array_finalize") if obj is None: return # created from slice or template print("finalizing slice") self._cacheable = getattr(obj, '_cacheable', None) if __name__ == "__main__": n = 5 test = BaseGeometry(np.random.randint(-25,25,(n,2))) print("this works:",test._cacheable) broken = test[1:4] #interestingly, no problem here print(broken) #infinite recursion =================== array_finalize is never called. output is: this works: {} getting _cacheable getting _cacheable getting _cacheable [and on and on] getting _cacheable getting _cacheable ) failed: RuntimeError: maximum recursion depth exceeded while calling a Python object> -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/Trouble-subclassing-ndarray-tp40176.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From njs at pobox.com Fri Apr 10 01:05:32 2015 From: njs at pobox.com (Nathaniel Smith) Date: Fri, 10 Apr 2015 01:05:32 -0400 Subject: [Numpy-discussion] Trouble subclassing ndarray In-Reply-To: <1428640214104-40176.post@n7.nabble.com> References: <1428640214104-40176.post@n7.nabble.com> Message-ID: On Fri, Apr 10, 2015 at 12:30 AM, Elliot wrote: > I'm trying to create an object with multiple inheritance, one of which is > from numpy.ndarray. Sorry that this is an unhelpful answer, but I want to quickly say that this sentence sets off all kinds of alarm bells in my mind. Subclassing ndarray is almost always a bad idea (really it is always a bad idea, just sometimes you have absolutely no alternative), and multiple inheritance is almost always a bad idea (well, personally I think it actually always is a bad idea, but I recognize that opinions differ), and I am 99.999% sure that any design that can be described by the sentence quoted above is a design that you will look back on and regret. Sorry to be the bearer of bad news. Maybe you can just have a simple object that implements the cacheable behaviour and also has an ndarray as an attribute (i.e., your object could HAS-A ndarray instead of IS-A ndarray)? 
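A bare-bones sketch of that composition-based alternative (the class and method names here are made up for illustration, not taken from the original code):

import numpy as np

class CachedGeometry(object):
    # wraps an ndarray instead of subclassing it
    def __init__(self, data):
        self.data = np.asarray(data, dtype=np.float64)
        self._cache = {}

    def centroid(self):
        # compute lazily on first access, then serve from the cache
        if 'centroid' not in self._cache:
            self._cache['centroid'] = self.data.mean(axis=0)
        return self._cache['centroid']

    def clear(self):
        # invalidate everything after the data changes
        self._cache.clear()

geom = CachedGeometry(np.random.randint(-25, 25, (5, 2)))
print(geom.centroid())       # cached after the first call
print(geom.data[1:4])        # plain ndarray slicing, no subclass quirks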
Hopefully someone with a bit more time will be more helpful and figure out what is actually going on in your example, I'm sure it's some wackily weird issue... -n -- Nathaniel J. Smith -- http://vorpus.org From sebastian at sipsolutions.net Fri Apr 10 03:13:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 10 Apr 2015 09:13:26 +0200 Subject: [Numpy-discussion] Trouble subclassing ndarray In-Reply-To: <1428640214104-40176.post@n7.nabble.com> References: <1428640214104-40176.post@n7.nabble.com> Message-ID: <1428650006.3270.2.camel@sipsolutions.net> On Do, 2015-04-09 at 21:30 -0700, Elliot wrote: > Hi all, > > Sorry if this is the wrong forum for a question like this. > > I'm trying to create an object with multiple inheritance, one of which is > from numpy.ndarray. The other gives it cacheable properties and defines a > __getattr__ to deliver the cached properties. The initial instantiation is > successful, but ufuncs and slices cause an infinite recursion where the > __getattr__ function is used but the _cacheable attribute is not set (ie: > from __init__ ) > You have a typo in your __array_finalize__ it misses the last underscore, that is probably why it is never called. About the infinite recursion, not sure on first sight. > > I am using docs.scipy.org/doc/numpy/user/basics.subclassing.html as a > reference. > Here is code that shows my problem (python 2.7, numpy 1.8.2) > > ===================== > > from __future__ import print_function > import numpy as np > > class Cacheable(object): > > def __init__(self,*args,**kwargs): > self._cacheable = {} > > def __getattr__(self,key): > print("getting %s"%key) > if key in self._cacheable: > print(" found it") > self._cacheable[key]() > return self.__dict__[key] #if chache function does't update > # data you're going to have a bad time > else: > raise AttributeError > > def _clear(self): > '''clears derived properties''' > for key in self._cacheable: > if key in self.__dict__: > del self.__dict__[key] > > > class BaseGeometry(np.ndarray,Cacheable): > '''Numpy array with extra attributes that are cacheable arrays''' > > def __new__(cls,input_array,dtype=np.float64,*args,**kwargs): > # Input array is an already formed ndarray instance > # We first cast to be our class type > obj = np.asarray(input_array,dtype=dtype).view(cls) > # Finally, we must return the newly created object: > return obj > > def __init__(self,dims=None,dtype=np.float64,readonly=True): > #TODO: sort through args and kwargs to make better > self.readonly=readonly > if readonly: > self.flags.writeable=False > self.dims=dims > self._dtype = dtype > Cacheable.__init__(self) > > def writeable_copy(self): > ret = np.copy(self) > ret.flags.writeable = True > return ret > > def __array_finalize_(self,obj): > #New object, will be created in __new__ > print("array_finalize") > if obj is None: return > # created from slice or template > print("finalizing slice") > self._cacheable = getattr(obj, '_cacheable', None) > > > if __name__ == "__main__": > n = 5 > test = BaseGeometry(np.random.randint(-25,25,(n,2))) > print("this works:",test._cacheable) > broken = test[1:4] #interestingly, no problem here > print(broken) #infinite recursion > > =================== > > array_finalize is never called. 
> > output is: > > this works: {} > getting _cacheable > getting _cacheable > getting _cacheable > > [and on and on] > > getting _cacheable > getting _cacheable > ) failed: RuntimeError: > maximum recursion depth exceeded while calling a Python object> > > > > > -- > View this message in context: http://numpy-discussion.10968.n7.nabble.com/Trouble-subclassing-ndarray-tp40176.html > Sent from the Numpy-discussion mailing list archive at Nabble.com. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From pav at iki.fi Fri Apr 10 06:50:50 2015 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 10 Apr 2015 13:50:50 +0300 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> Message-ID: 10.04.2015, 03:23, Alexander Belopolsky kirjoitti: [clip] > Suppose I have > >>>> a > array([[11, 12, 13, 14], > [21, 22, 23, 24], > [31, 32, 33, 34], > [41, 42, 43, 44]]) [clip] > but if I try to do both, I get the diagonal instead > >>>> a[[1,2],[1,2]] > array([22, 33]) You want from numpy import ix_, array a = array([[11, 12, 13, 14], [21, 22, 23, 24], [31, 32, 33, 34], [41, 42, 43, 44]]) print(a[ix_([1,2], [1,2])]) What it ix_ actually does can be understood looking at print(ix_([1,2],[1,2])) From alan.isaac at gmail.com Fri Apr 10 12:22:09 2015 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 10 Apr 2015 12:22:09 -0400 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> Message-ID: <5527F8B1.3030907@gmail.com> On 4/9/2015 11:26 PM, Alexander Belopolsky wrote: > > but if I try to do both, I get the diagonal instead > > > > >>> a[[1,2],[1,2]] > > array([22, 33]) > > > On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier > wrote: > a[1:3,1:3]? > Can?t be generalised to arbitrary selections of rows,columns, though (e.g. a[1::2,::2] still works?) On 4/9/2015 11:26 PM, Alexander Belopolsky wrote: > I am interested in the arbitrary selection of rows and columns given by indices or by boolean selectors. You mean like this? import numpy as np a = np.arange(20).reshape((4,5)) rows = [0,3] cols = [1,2,4] print a[rows][:,cols] Alan Isaac From Permafacture at gmail.com Fri Apr 10 12:40:33 2015 From: Permafacture at gmail.com (Elliot Hallmark) Date: Fri, 10 Apr 2015 11:40:33 -0500 Subject: [Numpy-discussion] Trouble subclassing ndarray In-Reply-To: <1428650006.3270.2.camel@sipsolutions.net> References: <1428640214104-40176.post@n7.nabble.com> <1428650006.3270.2.camel@sipsolutions.net> Message-ID: > You have a typo in your __array_finalize__ it misses the last underscore, that is probably why it is never called. About the infinite recursion, not sure on first sight. Oh gosh, it was the underscrore! infinite recursion no longer. I was searching all over for a misspelled "_cacheable". 
>Subclassing ndarray is almost always a bad idea (really it is always a bad idea, just sometimes you have absolutely no alternative), and multiple inheritance is almost always a bad idea (well, personally I think it actually always is a bad idea, but I recognize that opinions differ), and I am 99.999% sure that any design that can be described by the sentence quoted above is a design that you will look back on and regret. So, now that this works, I'm open to hear more about why this is an awful idea (if it is). Why might I regret this later? And will this add object creation overhead to every ufunc and slice or otherwise degrade performance? Thanks -------------- next part -------------- An HTML attachment was scrubbed... URL: From derek at astro.physik.uni-goettingen.de Fri Apr 10 12:58:56 2015 From: derek at astro.physik.uni-goettingen.de (Derek Homeier) Date: Fri, 10 Apr 2015 18:58:56 +0200 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <5527F8B1.3030907@gmail.com> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> <5527F8B1.3030907@gmail.com> Message-ID: <459565D7-B19B-4678-8A9F-CB5510F01DA4@astro.physik.uni-goettingen.de> On 10 Apr 2015, at 06:22 pm, Alan G Isaac wrote: >> >> On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier > wrote: >> a[1:3,1:3]? >> Can?t be generalised to arbitrary selections of rows,columns, though (e.g. a[1::2,::2] still works?) > > > > On 4/9/2015 11:26 PM, Alexander Belopolsky wrote: >> I am interested in the arbitrary selection of rows and columns given by indices or by boolean selectors. > > > > You mean like this? > import numpy as np > a = np.arange(20).reshape((4,5)) > rows = [0,3] > cols = [1,2,4] > print a[rows][:,cols] This creates a copy, same apparently with np.ix_ - an objection I had cut from the original post? Compare to b = a[::2,1::2] b *= 2 print(a) On 10 Apr 2015, at 02:23 am, Alexander Belopolsky wrote: > I could do > > >>> a[[1,2]][:,[1,2]] > array([[22, 23], > [32, 33]]) > > but this creates an extra copy. > > The best solution I can think of involves something like > > >>> i = np.array([[1,2]]) > >>> a.flat[i + len(a)*i.T] > array([[22, 23], > [32, 33]]) > > which is hardly elegant or obvious. > From efiring at hawaii.edu Fri Apr 10 13:19:07 2015 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 10 Apr 2015 07:19:07 -1000 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <459565D7-B19B-4678-8A9F-CB5510F01DA4@astro.physik.uni-goettingen.de> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> <5527F8B1.3030907@gmail.com> <459565D7-B19B-4678-8A9F-CB5510F01DA4@astro.physik.uni-goettingen.de> Message-ID: <5528060B.2010505@hawaii.edu> On 2015/04/10 6:58 AM, Derek Homeier wrote: > On 10 Apr 2015, at 06:22 pm, Alan G Isaac wrote: > >>> >>> On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier > wrote: >>> a[1:3,1:3]? >>> Can?t be generalised to arbitrary selections of rows,columns, though (e.g. a[1::2,::2] still works?) >> >> >> >> On 4/9/2015 11:26 PM, Alexander Belopolsky wrote: >>> I am interested in the arbitrary selection of rows and columns given by indices or by boolean selectors. 
>> >> >> >> You mean like this? >> import numpy as np >> a = np.arange(20).reshape((4,5)) >> rows = [0,3] >> cols = [1,2,4] >> print a[rows][:,cols] I think this will actually make a copy, and then another copy; it is doing fancy indexing twice, sequentially. There is now way to get around making at least one copy, though, because an ndarray has strided memory access. > > This creates a copy, same apparently with np.ix_ - an objection I had cut from the original post? > Compare to > b = a[::2,1::2] Slicing doesn't require a copy because it remains compatible with strided access. Eric > b *= 2 > print(a) > > On 10 Apr 2015, at 02:23 am, Alexander Belopolsky wrote: > >> I could do >> >>>>> a[[1,2]][:,[1,2]] >> array([[22, 23], >> [32, 33]]) >> >> but this creates an extra copy. >> >> The best solution I can think of involves something like >> >>>>> i = np.array([[1,2]]) >>>>> a.flat[i + len(a)*i.T] >> array([[22, 23], >> [32, 33]]) >> >> which is hardly elegant or obvious. >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jaime.frio at gmail.com Fri Apr 10 13:25:30 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 10 Apr 2015 10:25:30 -0700 Subject: [Numpy-discussion] On responding to dubious ideas (was: Re: Advanced indexing: "fancy" vs. orthogonal) In-Reply-To: <459565D7-B19B-4678-8A9F-CB5510F01DA4@astro.physik.uni-goettingen.de> References: <55256EF2.6040602@gmail.com> <55257BE4.1040205@hawaii.edu> <55258972.6050501@gmail.com> <552688AA.4050305@gmail.com> <96A2A6DA-2F57-408C-9DD8-3BBF2792D256@astro.physik.uni-goettingen.de> <5527F8B1.3030907@gmail.com> <459565D7-B19B-4678-8A9F-CB5510F01DA4@astro.physik.uni-goettingen.de> Message-ID: On Fri, Apr 10, 2015 at 9:58 AM, Derek Homeier < derek at astro.physik.uni-goettingen.de> wrote: > On 10 Apr 2015, at 06:22 pm, Alan G Isaac wrote: > > >> > >> On Thu, Apr 9, 2015 at 8:41 PM, Derek Homeier < > derek at astro.physik.uni-goettingen.de derek at astro.physik.uni-goettingen.de>> wrote: > >> a[1:3,1:3]? > >> Can?t be generalised to arbitrary selections of rows,columns, though > (e.g. a[1::2,::2] still works?) > > > > > > > > On 4/9/2015 11:26 PM, Alexander Belopolsky wrote: > >> I am interested in the arbitrary selection of rows and columns given by > indices or by boolean selectors. > > > > > > > > You mean like this? > > import numpy as np > > a = np.arange(20).reshape((4,5)) > > rows = [0,3] > > cols = [1,2,4] > > print a[rows][:,cols] > > This creates a copy, same apparently with np.ix_ - an objection I had cut > from the original post? > Compare to > b = a[::2,1::2] > b *= 2 > print(a) > Well, the numpy ndarray model requires constant strides along each dimension, so yes, for consistency fancy indexing always makes a copy. I believe Alexander's complaint was not that a[[1, 2]][:, [1, 2]] makes one copy, but that it makes two. Also, with that double fancy indexing approach you cannot assign to the subarray, something that np.ix_ does enable: >>> a = np.arange(16).reshape(4, 4) >>> a array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11], [12, 13, 14, 15]]) >>> a[np.ix_([1, 2], [1, 2])] *= 2 >>> a array([[ 0, 1, 2, 3], [ 4, 10, 12, 7], [ 8, 18, 20, 11], [12, 13, 14, 15]]) Where it gets complicated is when you have an at least 3D array, and you want to orthogonally index the first and last dimensions, while extracting a slice from the middle one. 
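A quick sketch of that situation with assumed shapes, showing why it is awkward at the moment:

import numpy as np

a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
first = np.array([0, 1])     # orthogonal selection along axis 0
last = np.array([1, 3])      # orthogonal selection along axis 2

# broadcasting the index arrays works, but because the two advanced
# indices are separated by a slice the result comes back transposed
out = a[first[:, None], :, last]              # shape (2, 2, 3)
out = out.transpose(0, 2, 1)                  # back to the expected (2, 3, 2)

# np.ix_ only takes index arrays, so the middle slice has to be spelled
# out as an explicit arange
out2 = a[np.ix_(first, np.arange(a.shape[1]), last)]    # shape (2, 3, 2)
assert (out == out2).all()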
There's no easy answer to that one, and that's where this whole discussion on modifying numpy's indexing starts to get some real traction. Jaime > > On 10 Apr 2015, at 02:23 am, Alexander Belopolsky wrote: > > > I could do > > > > >>> a[[1,2]][:,[1,2]] > > array([[22, 23], > > [32, 33]]) > > > > but this creates an extra copy. > > > > The best solution I can think of involves something like > > > > >>> i = np.array([[1,2]]) > > >>> a.flat[i + len(a)*i.T] > > array([[22, 23], > > [32, 33]]) > > > > which is hardly elegant or obvious. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Fri Apr 10 13:43:50 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 10 Apr 2015 19:43:50 +0200 Subject: [Numpy-discussion] Trouble subclassing ndarray In-Reply-To: References: <1428640214104-40176.post@n7.nabble.com> <1428650006.3270.2.camel@sipsolutions.net> Message-ID: <1428687830.3270.18.camel@sipsolutions.net> On Fr, 2015-04-10 at 11:40 -0500, Elliot Hallmark wrote: > > You have a typo in your __array_finalize__ it misses the last > underscore, that is probably why it is never called. About the > infinite recursion, not sure on first sight. > > Oh gosh, it was the underscrore! infinite recursion no longer. I was > searching all over for a misspelled "_cacheable". > > >Subclassing ndarray is almost always a bad idea (really it is always > a bad idea, just sometimes you have absolutely no alternative), and > multiple inheritance is almost always a bad idea (well, personally I > think it actually always is a bad idea, but I recognize that opinions > differ), and I am 99.999% sure that any design that can be described > by the sentence quoted above is a design that you will look back on > and regret. > > > So, now that this works, I'm open to hear more about why this is an > awful idea (if it is). Why might I regret this later? And will this > add object creation overhead to every ufunc and slice or otherwise > degrade performance? > Performance wise it should not matter significantly, that is not the problem. However, know that you will never get it to work quite right with some functionality. So if at some point you find a quirk and do not know how to fix it.... It is quite likely there is no fix. On the other hand, if you do not care much about losing your subclass (i.e. getting a normal array) for some numpy functions, nor do weirder things (messing with the shape or such) you are probably mostly fine. For bigger projects I might still worry if it is the right path, but for something more limited, maybe it is the simplest way to get to something good enough. - Sebastian > > Thanks > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From jaime.frio at gmail.com Fri Apr 10 14:27:14 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Fri, 10 Apr 2015 11:27:14 -0700 Subject: [Numpy-discussion] Non-meta indexing improvements discussion In-Reply-To: <1428562243.26878.26.camel@sipsolutions.net> References: <1428562243.26878.26.camel@sipsolutions.net> Message-ID: On Wed, Apr 8, 2015 at 11:50 PM, Sebastian Berg wrote: > Hi all, > > Let me take a shot at summing up some suggestions to make the indexing > less surprising, and maybe we can gather some more in a more > concentrated way now. > > 1. Implement something like `arr.fancy_index[...]` and > `arr.ortho_index[...]` (i.e. Jaimes PR is the start for trying this) > > 2. Add warnings for non-consecutive advanced indexing (i.e. the original > example `arr[0, :, index_array]`). > > 3. I do not know if it possible or useful, but I could imagine a module > wide switch (similar to __future__ imports) to change the default > indexing behaviour. > > > One more thing, implementing this (especially the "new" indexing) is > non-trivial, so as always help beyond just a discussion is appreciated > and in my opinion the best way to push an actual change to happen sooner > rather then in some far off future. I do not have time for concentrating > much on an implementation for a while myself for a while at least. > I don't think that should be the biggest hurdle if we decided to go for it. I'm pretty sure that, with some adult supervision, I could pull that one off. And you don't always get a chance to put your hands on code as cool as numpy's indexing! ;-) But before we go down that route, I think we should spend a non-trivial amount of effort on figuring out exactly what we want. While the proposal that has started to emerge from this discussion, with the two indexers, sort of makes sense, we may very well be doing this ... To paraphrase General Ripper , I have neither the time, the training, nor the inclination for strategic thought. But we should probably write this into a NEP, and share it with the involved parties (netcdf, xay, pandas, Blaze, dynd...) and try to come to some agreement. We are already late for SciPy15, but perhaps a talk on the proposal would be in order for EuroSciPy, or for one of the PyData conferences, to try and gather some feedback from potential users. Without that, this isn't going anywhere... Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Sat Apr 11 12:06:16 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 11 Apr 2015 12:06:16 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: > > On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar > wrote: > > > * Numpy's outer product works fine with vectors. However, I seem to always > *> > * want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). * > > > * Wolfram-alpha seems to agree *> > * https://reference.wolfram.com/language/ref/Outer.html > with respect to > matrix *> > * outer products. * You're probably right that this is the correct > definition of the outer > product in an n-dimensional world. 
But this seems to go beyond being > just a bug in handling 0-d arrays (which is the kind of corner case > we've fixed in the past); np.outer is documented to always ravel its > inputs to 1d. > In fact the implementation is literally just: > a = asarray(a) > b = asarray(b) > return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) > Sebastian's np.multiply.outer is much more generic and effective. > Maybe we should just deprecate np.outer? I don't see what use it > serves. (When and whether it actually got removed after being > deprecated would depend on how much use it actually gets in real code, > which I certainly don't know while typing a quick email. But we could > start telling people not to use it any time.) > +1 with everything you said. (And thanks Sebastian for the pointer to np.multiply.outer!) > -n On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar wrote: > Numpy's outer product works fine with vectors. However, I seem to always > want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). > Wolfram-alpha seems to agree > https://reference.wolfram.com/language/ref/Outer.html with respect to > matrix outer products. My suggestion is to define outer as defined below. > I've contrasted it with numpy's current outer product. > > In [36]: def a(n): return np.ones(n) > > In [37]: b = a(()) > > In [38]: c = a(4) > > In [39]: d = a(5) > > In [40]: np.outer(b, d).shape > Out[40]: (1, 5) > > In [41]: np.outer(c, d).shape > Out[41]: (4, 5) > > In [42]: np.outer(c, b).shape > Out[42]: (4, 1) > > In [43]: def outer(a, b): > return a[(...,) + len(b.shape) * (np.newaxis,)] * b > ....: > > In [44]: outer(b, d).shape > Out[44]: (5,) > > In [45]: outer(c, d).shape > Out[45]: (4, 5) > > In [46]: outer(c, b).shape > Out[46]: (4,) > > Best, > > Neil > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Apr 11 12:29:01 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 11 Apr 2015 12:29:01 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar wrote: >> On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar >> wrote: >> > Numpy's outer product works fine with vectors. However, I seem to always >> > want len(outer(a, b).shape) to be equal to len(a.shape) + len(b.shape). >> > Wolfram-alpha seems to agree >> > https://reference.wolfram.com/language/ref/Outer.html with respect to >> > matrix >> > outer products. >> You're probably right that this is the correct definition of the outer >> product in an n-dimensional world. But this seems to go beyond being >> just a bug in handling 0-d arrays (which is the kind of corner case >> we've fixed in the past); np.outer is documented to always ravel its >> inputs to 1d. >> In fact the implementation is literally just: >> a = asarray(a) >> b = asarray(b) >> return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) >> Sebastian's np.multiply.outer is much more generic and effective. >> Maybe we should just deprecate np.outer? I don't see what use it >> serves. (When and whether it actually got removed after being >> deprecated would depend on how much use it actually gets in real code, >> which I certainly don't know while typing a quick email. But we could >> start telling people not to use it any time.) > > > +1 with everything you said. Want to write a PR? :-) -- Nathaniel J. 
Smith -- http://vorpus.org From mistersheik at gmail.com Sat Apr 11 12:39:44 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Sat, 11 Apr 2015 12:39:44 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: I would be happy to, but I'm not sure what that involves? It's just a documentation changelist? On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith wrote: > On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar > wrote: > >> On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar > >> wrote: > >> > Numpy's outer product works fine with vectors. However, I seem to > always > >> > want len(outer(a, b).shape) to be equal to len(a.shape) + > len(b.shape). > >> > Wolfram-alpha seems to agree > >> > https://reference.wolfram.com/language/ref/Outer.html with respect to > >> > matrix > >> > outer products. > >> You're probably right that this is the correct definition of the outer > >> product in an n-dimensional world. But this seems to go beyond being > >> just a bug in handling 0-d arrays (which is the kind of corner case > >> we've fixed in the past); np.outer is documented to always ravel its > >> inputs to 1d. > >> In fact the implementation is literally just: > >> a = asarray(a) > >> b = asarray(b) > >> return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) > >> Sebastian's np.multiply.outer is much more generic and effective. > >> Maybe we should just deprecate np.outer? I don't see what use it > >> serves. (When and whether it actually got removed after being > >> deprecated would depend on how much use it actually gets in real code, > >> which I certainly don't know while typing a quick email. But we could > >> start telling people not to use it any time.) > > > > > > +1 with everything you said. > > Want to write a PR? :-) > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Apr 11 12:49:32 2015 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 11 Apr 2015 12:49:32 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: Documentation and a call to warnings.warn(DeprecationWarning(...)), I guess. On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar wrote: > I would be happy to, but I'm not sure what that involves? It's just a > documentation changelist? > > On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith wrote: >> >> On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar >> wrote: >> >> On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar >> >> wrote: >> >> > Numpy's outer product works fine with vectors. However, I seem to >> >> > always >> >> > want len(outer(a, b).shape) to be equal to len(a.shape) + >> >> > len(b.shape). >> >> > Wolfram-alpha seems to agree >> >> > https://reference.wolfram.com/language/ref/Outer.html with respect to >> >> > matrix >> >> > outer products. >> >> You're probably right that this is the correct definition of the outer >> >> product in an n-dimensional world. But this seems to go beyond being >> >> just a bug in handling 0-d arrays (which is the kind of corner case >> >> we've fixed in the past); np.outer is documented to always ravel its >> >> inputs to 1d. 
>> >> In fact the implementation is literally just: >> >> a = asarray(a) >> >> b = asarray(b) >> >> return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) >> >> Sebastian's np.multiply.outer is much more generic and effective. >> >> Maybe we should just deprecate np.outer? I don't see what use it >> >> serves. (When and whether it actually got removed after being >> >> deprecated would depend on how much use it actually gets in real code, >> >> which I certainly don't know while typing a quick email. But we could >> >> start telling people not to use it any time.) >> > >> > >> > +1 with everything you said. >> >> Want to write a PR? :-) >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Nathaniel J. Smith -- http://vorpus.org From nayyarv at gmail.com Sun Apr 12 03:19:20 2015 From: nayyarv at gmail.com (Varun) Date: Sun, 12 Apr 2015 07:19:20 +0000 (UTC) Subject: [Numpy-discussion] Automatic number of bins for numpy histograms Message-ID: http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/statistics/Automating%20Binwidth%20Choice%20for%20Histogram.ipynb Long story short, histogram visualisations that depend on numpy (such as matplotlib, or nearly all of them) have poor default behaviour, as I have to constantly play around with the number of bins to get a good idea of what I'm looking at. The bins=10 works ok for up to 1000 points or very normal data, but has poor performance for anything else, and doesn't account for variability either. I don't have a method easily available to scale the number of bins given the data. R doesn't suffer from these problems and provides methods for use with its hist method. I would like to provide similar functionality for matplotlib, to at least provide some kind of good starting point, as histograms are very useful for initial data discovery. The notebook above provides an explanation of the problem as well as some proposed alternatives. Use different datasets (type and size) to see the performance of the suggestions. All of the methods proposed exist in R and the literature. I've put together an implementation to add this new functionality, but am hesitant to make a pull request as I would like some feedback from a maintainer before doing so. https://github.com/numpy/numpy/compare/master...nayyarv:master I've provided them as functions for easy refactoring, as it can be argued that it should be in its own function/file/class, or alternatively can be turned into simple if, elif statements. I believe this belongs in numpy, as it is where the functionality exists for histogram methods that most libraries build on, and it would be useful for them to not require scipy, for example. I will update the documentation accordingly before making a pull request, and add in more tests to show its functionality. I can adapt my ipython notebook into a quick tutorial/help file if need be.
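For a quick sense of what such a rule looks like without opening the notebook, here is a rough sketch of two standard estimators from the literature (Sturges, the default in R's hist, and Freedman-Diaconis); the function names are illustrative only and not necessarily what the linked branch uses:

import numpy as np

def sturges_bins(x):
    # Sturges' rule: k = ceil(log2(n)) + 1, reasonable for roughly normal data
    # (illustrative helper, not the actual patch)
    return int(np.ceil(np.log2(x.size))) + 1

def fd_bins(x):
    # Freedman-Diaconis: bin width h = 2 * IQR * n**(-1/3), robust to outliers
    iqr = np.subtract(*np.percentile(x, [75, 25]))
    h = 2 * iqr * x.size ** (-1.0 / 3)
    if h == 0:
        return sturges_bins(x)  # degenerate IQR, fall back to Sturges
    return int(np.ceil((x.max() - x.min()) / h))

x = np.random.standard_normal(10000)
counts, edges = np.histogram(x, bins=fd_bins(x))

Either rule scales the bin count with the data instead of hard-coding 10, which is the behaviour being proposed here.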
I've already attempted to add this into matplotlib before being redirected here https://github.com/matplotlib/matplotlib/issues/4316 From jaime.frio at gmail.com Sun Apr 12 03:45:12 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Sun, 12 Apr 2015 00:45:12 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: > > http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta > tistics/A utomating%20Binwidth%20Choice%20for%20Histogram.ipynb > > Long story short, histogram visualisations that depend on numpy (such as > matplotlib, or nearly all of them) have poor default behaviour as I have > to > constantly play around with the number of bins to get a good idea of what > I'm > looking at. The bins=10 works ok for up to 1000 points or very normal > data, > but has poor performance for anything else, and doesn't account for > variability either. I don't have a method easily available to scale the > number > of bins given the data. > > R doesn't suffer from these problems and provides methods for use with it's > hist method. I would like to provide similar functionality for > matplotlib, to > at least provide some kind of good starting point, as histograms are very > useful for initial data discovery. > > The notebook above provides an explanation of the problem as well as some > proposed alternatives. Use different datasets (type and size) to see the > performance of the suggestions. All of the methods proposed exist in R and > literature. > > I've put together an implementation to add this new functionality, but am > hesitant to make a pull request as I would like some feedback from a > maintainer before doing so. > +1 on the PR. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nayyarv at gmail.com Sun Apr 12 03:46:58 2015 From: nayyarv at gmail.com (Varun) Date: Sun, 12 Apr 2015 07:46:58 +0000 (UTC) Subject: [Numpy-discussion] Automatic number of bins for numpy histograms References: Message-ID: Using a URL shortener for the notebook to get around the 80 char width limit http://goo.gl/JmfTRJ From ralf.gommers at gmail.com Sun Apr 12 04:02:36 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 12 Apr 2015 10:02:36 +0200 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: > >> >> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >> tistics/A >> >> utomating%20Binwidth%20Choice%20for%20Histogram.ipynb >> >> Long story short, histogram visualisations that depend on numpy (such as >> matplotlib, or nearly all of them) have poor default behaviour as I have >> to >> constantly play around with the number of bins to get a good idea of >> what I'm >> looking at. The bins=10 works ok for up to 1000 points or very normal >> data, >> but has poor performance for anything else, and doesn't account for >> variability either. I don't have a method easily available to scale the >> number >> of bins given the data. >> >> R doesn't suffer from these problems and provides methods for use with >> it's >> hist method. 
I would like to provide similar functionality for >> matplotlib, to >> at least provide some kind of good starting point, as histograms are very >> useful for initial data discovery. >> >> The notebook above provides an explanation of the problem as well as some >> proposed alternatives. Use different datasets (type and size) to see the >> performance of the suggestions. All of the methods proposed exist in R >> and >> literature. >> >> I've put together an implementation to add this new functionality, but am >> hesitant to make a pull request as I would like some feedback from a >> maintainer before doing so. >> > > +1 on the PR. > +1 as well. Unfortunately we can't change the default of 10, but a number of string methods, with a "bins=auto" or some such name prominently recommended in the docstring, would be very good to have. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pkerpedjiev at gmail.com Sun Apr 12 10:15:17 2015 From: pkerpedjiev at gmail.com (Peter Kerpedjiev) Date: Sun, 12 Apr 2015 16:15:17 +0200 Subject: [Numpy-discussion] Numpy compilation error Message-ID: <552A7DF5.1090709@gmail.com> Dear all, Upon trying to install numpy using 'pip install numpy' in a virtualenv, I get the following error messages: creating build/temp.linux-x86_64-2.7/numpy/random/mtrand compile options: '-D_FILE_OFFSET_BITS=64 -D_LARGEFILE_SOURCE=1 -D_LARGEFILE64_SOURCE=1 -Inumpy/core/include -Ibuild/src.linux-x86_64-2.7/numpy/core/include/numpy -Inumpy/core/src/private -Inumpy/core/src -Inumpy/core -Inumpy/core/src/npymath -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/src/npysort -Inumpy/core/include -I/usr/include/python2.7 -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -Ibuild/src.linux-x86_64-2.7/numpy/core/src/private -c' gcc: numpy/random/mtrand/distributions.c numpy/random/mtrand/distributions.c: In function ?loggam?: numpy/random/mtrand/distributions.c:892:1: internal compiler error: Illegal instruction } ^ Please submit a full bug report, with preprocessed source if appropriate. See for instructions. Preprocessed source stored into /tmp/ccjkBSd2.out file, please attach this to your bugreport. This leads to the compilation process failing with this error: Cleaning up... Command /home/mescalin/pkerp/.virtualenvs/notebooks/bin/python -c "import setuptools;__file__='/home/mescalin/pkerp/.virtualenvs/notebooks/build/numpy/setup.py';exec(compile(open(__file__).read().replace('\r\n', '\n'), __file __, 'exec'))" install --record /tmp/pip-c_Cd7B-record/install-record.txt --single-version-externally-managed --install-headers /home/mescalin/pkerp/.virtualenvs/notebooks/include/site/python2.7 failed with error code 1 in /ho me/mescalin/pkerp/.virtualenvs/notebooks/build/numpy Traceback (most recent call last): File "/home/mescalin/pkerp/.virtualenvs/notebooks/bin/pip", line 9, in load_entry_point('pip==1.4.1', 'console_scripts', 'pip')() File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/__init__.py", line 148, in main return command.main(args[1:], options) File "/home/mescalin/pkerp/.virtualenvs/notebooks/lib/python2.7/site-packages/pip/basecommand.py", line 169, in main text = '\n'.join(complete_log) UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 72: ordinal not in range(128) Have any of you encountered a similar problem before? 
Thanks in advance, -Peter ================================================ The gcc version is: [pkerp at fluidspace ~]$ gcc --version gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-7) Copyright (C) 2013 Free Software Foundation, Inc. This is free software; see the source for copying conditions. There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. PS: -------------- next part -------------- A non-text attachment was scrubbed... Name: ccjkBSd2.out Type: chemical/x-gulp Size: 101537 bytes Desc: not available URL: From pav at iki.fi Sun Apr 12 10:59:09 2015 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 12 Apr 2015 17:59:09 +0300 Subject: [Numpy-discussion] Numpy compilation error In-Reply-To: <552A7DF5.1090709@gmail.com> References: <552A7DF5.1090709@gmail.com> Message-ID: 12.04.2015, 17:15, Peter Kerpedjiev kirjoitti: [clip] > numpy/random/mtrand/distributions.c:892:1: internal compiler error: > Illegal instruction An internal compiler error means your compiler (in this case, gcc) is broken. The easiest solution is to use a newer version of the compiler, assuming the compiler bug in question has been fixed. Here, it probably has, since I have not seen similar error reports before from this code. From faltet at gmail.com Tue Apr 14 12:07:58 2015 From: faltet at gmail.com (Francesc Alted) Date: Tue, 14 Apr 2015 18:07:58 +0200 Subject: [Numpy-discussion] ANN: numexpr 2.4.1 released Message-ID: ========================= Announcing Numexpr 2.4.1 ========================= Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. What's new ========== In this version there is improved support for newer MKL library as well as other minor improvements. This version is meant for production. In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/wiki/Release-Notes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Tue Apr 14 08:13:19 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 08:13:19 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: It also appears that cumsum has a lot of unnecessary overhead over add.accumulate: In [51]: %timeit np.add.accumulate(a) The slowest run took 46.31 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 372 ns per loop In [52]: %timeit np.cum np.cumprod np.cumproduct np.cumsum In [52]: %timeit np.cumsum(a) The slowest run took 18.44 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 912 ns per loop In [53]: %timeit np.add.accumulate(a.flatten()) The slowest run took 25.59 times longer than the fastest. This could mean that an intermediate result is being cached 1000000 loops, best of 3: 834 ns per loop On Tue, Apr 14, 2015 at 7:42 AM, Neil Girdhar wrote: > Okay, but by the same token, why do we have cumsum? Isn't it identical to > > np.add.accumulate > > ? or if you're passing in multidimensional data ? > > np.add.accumulate(a.flatten()) > > ? > > add.accumulate feels more generic, would make the other ufunc things more > discoverable, and is self-documenting. > > Similarly, cumprod is just np.multiply.accumulate. > > Best, > > Neil > > > On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith wrote: > >> Documentation and a call to warnings.warn(DeprecationWarning(...)), I >> guess. >> >> On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar >> wrote: >> > I would be happy to, but I'm not sure what that involves? It's just a >> > documentation changelist? >> > >> > On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith >> wrote: >> >> >> >> On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar >> >> wrote: >> >> >> On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar > > >> >> >> wrote: >> >> >> > Numpy's outer product works fine with vectors. However, I seem to >> >> >> > always >> >> >> > want len(outer(a, b).shape) to be equal to len(a.shape) + >> >> >> > len(b.shape). >> >> >> > Wolfram-alpha seems to agree >> >> >> > https://reference.wolfram.com/language/ref/Outer.html with >> respect to >> >> >> > matrix >> >> >> > outer products. >> >> >> You're probably right that this is the correct definition of the >> outer >> >> >> product in an n-dimensional world. But this seems to go beyond being >> >> >> just a bug in handling 0-d arrays (which is the kind of corner case >> >> >> we've fixed in the past); np.outer is documented to always ravel its >> >> >> inputs to 1d. >> >> >> In fact the implementation is literally just: >> >> >> a = asarray(a) >> >> >> b = asarray(b) >> >> >> return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) >> >> >> Sebastian's np.multiply.outer is much more generic and effective. >> >> >> Maybe we should just deprecate np.outer? I don't see what use it >> >> >> serves. (When and whether it actually got removed after being >> >> >> deprecated would depend on how much use it actually gets in real >> code, >> >> >> which I certainly don't know while typing a quick email. But we >> could >> >> >> start telling people not to use it any time.) >> >> > >> >> > >> >> > +1 with everything you said. >> >> >> >> Want to write a PR? :-) >> >> >> >> -- >> >> Nathaniel J. 
Smith -- http://vorpus.org >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 07:42:26 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 07:42:26 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: Okay, but by the same token, why do we have cumsum? Isn't it identical to np.add.accumulate ? or if you're passing in multidimensional data ? np.add.accumulate(a.flatten()) ? add.accumulate feels more generic, would make the other ufunc things more discoverable, and is self-documenting. Similarly, cumprod is just np.multiply.accumulate. Best, Neil On Sat, Apr 11, 2015 at 12:49 PM, Nathaniel Smith wrote: > Documentation and a call to warnings.warn(DeprecationWarning(...)), I > guess. > > On Sat, Apr 11, 2015 at 12:39 PM, Neil Girdhar > wrote: > > I would be happy to, but I'm not sure what that involves? It's just a > > documentation changelist? > > > > On Sat, Apr 11, 2015 at 12:29 PM, Nathaniel Smith wrote: > >> > >> On Sat, Apr 11, 2015 at 12:06 PM, Neil Girdhar > >> wrote: > >> >> On Wed, Apr 8, 2015 at 7:34 PM, Neil Girdhar > >> >> wrote: > >> >> > Numpy's outer product works fine with vectors. However, I seem to > >> >> > always > >> >> > want len(outer(a, b).shape) to be equal to len(a.shape) + > >> >> > len(b.shape). > >> >> > Wolfram-alpha seems to agree > >> >> > https://reference.wolfram.com/language/ref/Outer.html with > respect to > >> >> > matrix > >> >> > outer products. > >> >> You're probably right that this is the correct definition of the > outer > >> >> product in an n-dimensional world. But this seems to go beyond being > >> >> just a bug in handling 0-d arrays (which is the kind of corner case > >> >> we've fixed in the past); np.outer is documented to always ravel its > >> >> inputs to 1d. > >> >> In fact the implementation is literally just: > >> >> a = asarray(a) > >> >> b = asarray(b) > >> >> return multiply(a.ravel()[:, newaxis], b.ravel()[newaxis,:], out) > >> >> Sebastian's np.multiply.outer is much more generic and effective. > >> >> Maybe we should just deprecate np.outer? I don't see what use it > >> >> serves. (When and whether it actually got removed after being > >> >> deprecated would depend on how much use it actually gets in real > code, > >> >> which I certainly don't know while typing a quick email. But we could > >> >> start telling people not to use it any time.) > >> > > >> > > >> > +1 with everything you said. > >> > >> Want to write a PR? :-) > >> > >> -- > >> Nathaniel J. 
Smith -- http://vorpus.org > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Apr 14 15:37:54 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Apr 2015 15:37:54 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: On Apr 14, 2015 2:48 PM, "Neil Girdhar" wrote: > > Okay, but by the same token, why do we have cumsum? Isn't it identical to > > np.add.accumulate > > ? or if you're passing in multidimensional data ? > > np.add.accumulate(a.flatten()) > > ? > > add.accumulate feels more generic, would make the other ufunc things more discoverable, and is self-documenting. > > Similarly, cumprod is just np.multiply.accumulate. Yeah, but these do have several differences than np.outer: - they get used much more - their definitions are less obviously broken (cumsum has no obvious definition for an n-d array so you have to pick one; outer does have an obvious definition and np.outer got it wrong) - they're more familiar from other systems (R, MATLAB) - they allow for special dispatch rules (e.g. np.sum(a) will try calling a.sum() before it tries coercing a to an ndarray, so e.g. on np.ma objects np.sum works and np.add.accumulate doesn't. Eventually this will perhaps be obviated by __numpy_ufunc__, but that is still some ways off.) So the situation is much less clear cut. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 15:48:48 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 15:48:48 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: Yes, I totally agree with you regarding np.sum and np.product, which is why I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure whether cumsum and cumprod might be on the line in your judgment. Best, Neil On Tue, Apr 14, 2015 at 3:37 PM, Nathaniel Smith wrote: > On Apr 14, 2015 2:48 PM, "Neil Girdhar" wrote: > > > > Okay, but by the same token, why do we have cumsum? Isn't it identical > to > > > > np.add.accumulate > > > > ? or if you're passing in multidimensional data ? > > > > np.add.accumulate(a.flatten()) > > > > ? > > > > add.accumulate feels more generic, would make the other ufunc things > more discoverable, and is self-documenting. > > > > Similarly, cumprod is just np.multiply.accumulate. > > Yeah, but these do have several differences than np.outer: > > - they get used much more > - their definitions are less obviously broken (cumsum has no obvious > definition for an n-d array so you have to pick one; outer does have an > obvious definition and np.outer got it wrong) > - they're more familiar from other systems (R, MATLAB) > - they allow for special dispatch rules (e.g. 
np.sum(a) will try calling > a.sum() before it tries coercing a to an ndarray, so e.g. on np.ma > objects np.sum works and np.add.accumulate doesn't. Eventually this will > perhaps be obviated by __numpy_ufunc__, but that is still some ways off.) > > So the situation is much less clear cut. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Apr 13 08:02:27 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 13 Apr 2015 08:02:27 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: Can I suggest that we instead add the P-square algorithm for the dynamic calculation of histograms? ( http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf ) This is already implemented in C++'s boost library ( http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp ) I implemented it in Boost Python as a module, which I'm happy to share. This is much better than fixed-width histograms in practice. Rather than adjusting the number of bins, it adjusts what you really want, which is the resolution of the bins throughout the domain. Best, Neil On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers wrote: > > > On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: >> >>> >>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >>> tistics/A >>> >>> utomating%20Binwidth%20Choice%20for%20Histogram.ipynb >>> >>> Long story short, histogram visualisations that depend on numpy (such as >>> matplotlib, or nearly all of them) have poor default behaviour as I >>> have to >>> constantly play around with the number of bins to get a good idea of >>> what I'm >>> looking at. The bins=10 works ok for up to 1000 points or very normal >>> data, >>> but has poor performance for anything else, and doesn't account for >>> variability either. I don't have a method easily available to scale the >>> number >>> of bins given the data. >>> >>> R doesn't suffer from these problems and provides methods for use with >>> it's >>> hist method. I would like to provide similar functionality for >>> matplotlib, to >>> at least provide some kind of good starting point, as histograms are >>> very >>> useful for initial data discovery. >>> >>> The notebook above provides an explanation of the problem as well as some >>> proposed alternatives. Use different datasets (type and size) to see the >>> performance of the suggestions. All of the methods proposed exist in R >>> and >>> literature. >>> >>> I've put together an implementation to add this new functionality, but am >>> hesitant to make a pull request as I would like some feedback from a >>> maintainer before doing so. >>> >> >> +1 on the PR. >> > > +1 as well. > > Unfortunately we can't change the default of 10, but a number of string > methods, with a "bins=auto" or some such name prominently recommended in > the docstring, would be very good to have. 
> > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From antony.lee at berkeley.edu Tue Apr 14 17:02:05 2015 From: antony.lee at berkeley.edu (Antony Lee) Date: Tue, 14 Apr 2015 14:02:05 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: Another improvement would be to make sure, for integer-valued datasets, that all bins cover the same number of integer, as it is easy to end up otherwise with bins "effectively" wider than others: hist(np.random.randint(11, size=10000)) shows a peak in the last bin, as it covers both 9 and 10. Antony 2015-04-13 5:02 GMT-07:00 Neil Girdhar : > Can I suggest that we instead add the P-square algorithm for the dynamic > calculation of histograms? ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf > ) > > This is already implemented in C++'s boost library ( > http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp > ) > > I implemented it in Boost Python as a module, which I'm happy to share. > This is much better than fixed-width histograms in practice. Rather than > adjusting the number of bins, it adjusts what you really want, which is the > resolution of the bins throughout the domain. > > Best, > > Neil > > On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers > wrote: > >> >> >> On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: >>> >>>> >>>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >>>> tistics/A >>>> >>>> utomating%20Binwidth%20Choice%20for%20Histogram.ipynb >>>> >>>> Long story short, histogram visualisations that depend on numpy (such as >>>> matplotlib, or nearly all of them) have poor default behaviour as I >>>> have to >>>> constantly play around with the number of bins to get a good idea of >>>> what I'm >>>> looking at. The bins=10 works ok for up to 1000 points or very normal >>>> data, >>>> but has poor performance for anything else, and doesn't account for >>>> variability either. I don't have a method easily available to scale the >>>> number >>>> of bins given the data. >>>> >>>> R doesn't suffer from these problems and provides methods for use with >>>> it's >>>> hist method. I would like to provide similar functionality for >>>> matplotlib, to >>>> at least provide some kind of good starting point, as histograms are >>>> very >>>> useful for initial data discovery. >>>> >>>> The notebook above provides an explanation of the problem as well as >>>> some >>>> proposed alternatives. Use different datasets (type and size) to see >>>> the >>>> performance of the suggestions. All of the methods proposed exist in R >>>> and >>>> literature. >>>> >>>> I've put together an implementation to add this new functionality, but >>>> am >>>> hesitant to make a pull request as I would like some feedback from a >>>> maintainer before doing so. >>>> >>> >>> +1 on the PR. >>> >> >> +1 as well. >> >> Unfortunately we can't change the default of 10, but a number of string >> methods, with a "bins=auto" or some such name prominently recommended in >> the docstring, would be very good to have. 
>> >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Tue Apr 14 17:08:34 2015 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 14 Apr 2015 14:08:34 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar wrote: > Can I suggest that we instead add the P-square algorithm for the dynamic > calculation of histograms? ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf > ) > This look slike a great thing to have in numpy. However, I suspect that a lot of the downstream code that uses histogram expects equally-spaced bins. So this should probably be a "in addition to", rather than an "instead of" -CHB > > This is already implemented in C++'s boost library ( > http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp > ) > > I implemented it in Boost Python as a module, which I'm happy to share. > This is much better than fixed-width histograms in practice. Rather than > adjusting the number of bins, it adjusts what you really want, which is the > resolution of the bins throughout the domain. > > Best, > > Neil > > On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers > wrote: > >> >> >> On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fern?ndez del R?o < >> jaime.frio at gmail.com> wrote: >> >>> On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: >>> >>>> >>>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >>>> tistics/A >>>> >>>> utomating%20Binwidth%20Choice%20for%20Histogram.ipynb >>>> >>>> Long story short, histogram visualisations that depend on numpy (such as >>>> matplotlib, or nearly all of them) have poor default behaviour as I >>>> have to >>>> constantly play around with the number of bins to get a good idea of >>>> what I'm >>>> looking at. The bins=10 works ok for up to 1000 points or very normal >>>> data, >>>> but has poor performance for anything else, and doesn't account for >>>> variability either. I don't have a method easily available to scale the >>>> number >>>> of bins given the data. >>>> >>>> R doesn't suffer from these problems and provides methods for use with >>>> it's >>>> hist method. I would like to provide similar functionality for >>>> matplotlib, to >>>> at least provide some kind of good starting point, as histograms are >>>> very >>>> useful for initial data discovery. >>>> >>>> The notebook above provides an explanation of the problem as well as >>>> some >>>> proposed alternatives. Use different datasets (type and size) to see >>>> the >>>> performance of the suggestions. All of the methods proposed exist in R >>>> and >>>> literature. >>>> >>>> I've put together an implementation to add this new functionality, but >>>> am >>>> hesitant to make a pull request as I would like some feedback from a >>>> maintainer before doing so. >>>> >>> >>> +1 on the PR. >>> >> >> +1 as well. 
>> >> Unfortunately we can't change the default of 10, but a number of string >> methods, with a "bins=auto" or some such name prominently recommended in >> the docstring, would be very good to have. >> >> Ralf >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 17:28:57 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 17:28:57 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: Yes, you're right. Although in practice, people almost always want adaptive bins. On Tue, Apr 14, 2015 at 5:08 PM, Chris Barker wrote: > On Mon, Apr 13, 2015 at 5:02 AM, Neil Girdhar > wrote: > >> Can I suggest that we instead add the P-square algorithm for the dynamic >> calculation of histograms? ( >> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf >> ) >> > > This look slike a great thing to have in numpy. However, I suspect that a > lot of the downstream code that uses histogram expects equally-spaced bins. > > So this should probably be a "in addition to", rather than an "instead of" > > -CHB > > > >> >> This is already implemented in C++'s boost library ( >> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp >> ) >> >> I implemented it in Boost Python as a module, which I'm happy to share. >> This is much better than fixed-width histograms in practice. Rather than >> adjusting the number of bins, it adjusts what you really want, which is the >> resolution of the bins throughout the domain. >> >> Best, >> >> Neil >> >> On Sun, Apr 12, 2015 at 4:02 AM, Ralf Gommers >> wrote: >> >>> >>> >>> On Sun, Apr 12, 2015 at 9:45 AM, Jaime Fern?ndez del R?o < >>> jaime.frio at gmail.com> wrote: >>> >>>> On Sun, Apr 12, 2015 at 12:19 AM, Varun wrote: >>>> >>>>> >>>>> http://nbviewer.ipython.org/github/nayyarv/matplotlib/blob/master/examples/sta >>>>> tistics/A >>>>> >>>>> utomating%20Binwidth%20Choice%20for%20Histogram.ipynb >>>>> >>>>> Long story short, histogram visualisations that depend on numpy (such >>>>> as >>>>> matplotlib, or nearly all of them) have poor default behaviour as I >>>>> have to >>>>> constantly play around with the number of bins to get a good idea of >>>>> what I'm >>>>> looking at. The bins=10 works ok for up to 1000 points or very normal >>>>> data, >>>>> but has poor performance for anything else, and doesn't account for >>>>> variability either. I don't have a method easily available to scale >>>>> the number >>>>> of bins given the data. >>>>> >>>>> R doesn't suffer from these problems and provides methods for use with >>>>> it's >>>>> hist method. I would like to provide similar functionality for >>>>> matplotlib, to >>>>> at least provide some kind of good starting point, as histograms are >>>>> very >>>>> useful for initial data discovery. 
>>>>> >>>>> The notebook above provides an explanation of the problem as well as >>>>> some >>>>> proposed alternatives. Use different datasets (type and size) to see >>>>> the >>>>> performance of the suggestions. All of the methods proposed exist in >>>>> R and >>>>> literature. >>>>> >>>>> I've put together an implementation to add this new functionality, but >>>>> am >>>>> hesitant to make a pull request as I would like some feedback from a >>>>> maintainer before doing so. >>>>> >>>> >>>> +1 on the PR. >>>> >>> >>> +1 as well. >>> >>> Unfortunately we can't change the default of 10, but a number of string >>> methods, with a "bins=auto" or some such name prominently recommended in >>> the docstring, would be very good to have. >>> >>> Ralf >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Apr 14 19:12:15 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Apr 2015 19:12:15 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar wrote: > Can I suggest that we instead add the P-square algorithm for the dynamic > calculation of histograms? > (http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf) > > This is already implemented in C++'s boost library > (http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp) > > I implemented it in Boost Python as a module, which I'm happy to share. > This is much better than fixed-width histograms in practice. Rather than > adjusting the number of bins, it adjusts what you really want, which is the > resolution of the bins throughout the domain. This definitely sounds like a useful thing to have in numpy or scipy (though if it's possible to do without using Boost/C++ that would be nice). But yeah, we should leave the existing histogram alone (in this regard) and add a new name for this like "adaptive_histogram" or something. Then you can set about convincing matplotlib and friends to use it by default :-) -n -- Nathaniel J. Smith -- http://vorpus.org From njs at pobox.com Tue Apr 14 19:16:27 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Apr 2015 19:16:27 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar wrote: > Yes, I totally agree with you regarding np.sum and np.product, which is why > I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure whether > cumsum and cumprod might be on the line in your judgment. Ah, I see. 
I think we should treat them the same for now -- all the comments I made apply to a lesser or greater extent (in particular, cumsum and cumprod both do the thing where they dispatch to .cumsum() .cumprod() method). -n -- Nathaniel J. Smith -- http://vorpus.org From jaime.frio at gmail.com Tue Apr 14 19:24:55 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 14 Apr 2015 16:24:55 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar > wrote: > > Can I suggest that we instead add the P-square algorithm for the dynamic > > calculation of histograms? > > ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf > ) > > > > This is already implemented in C++'s boost library > > ( > http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp > ) > > > > I implemented it in Boost Python as a module, which I'm happy to share. > > This is much better than fixed-width histograms in practice. Rather than > > adjusting the number of bins, it adjusts what you really want, which is > the > > resolution of the bins throughout the domain. > > This definitely sounds like a useful thing to have in numpy or scipy > (though if it's possible to do without using Boost/C++ that would be > nice). But yeah, we should leave the existing histogram alone (in this > regard) and add a new name for this like "adaptive_histogram" or > something. Then you can set about convincing matplotlib and friends to > use it by default :-) > Would having a negative number of bins mean "this many, but with optimized boundaries" be too clever an interface? I have taken a look at the paper linked, and the P-2 algorithm would not be too complicated to implement from scratch, although it would require writing some C code I'm afraid. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 21:16:46 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 21:16:46 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: If you're going to C, is there a reason not to go to C++ and include the already-written Boost code? Otherwise, why not use Python? On Tue, Apr 14, 2015 at 7:24 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > >> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar >> wrote: >> > Can I suggest that we instead add the P-square algorithm for the dynamic >> > calculation of histograms? >> > ( >> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf >> ) >> > >> > This is already implemented in C++'s boost library >> > ( >> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp >> ) >> > >> > I implemented it in Boost Python as a module, which I'm happy to share. >> > This is much better than fixed-width histograms in practice. Rather >> than >> > adjusting the number of bins, it adjusts what you really want, which is >> the >> > resolution of the bins throughout the domain. 
>> >> This definitely sounds like a useful thing to have in numpy or scipy >> (though if it's possible to do without using Boost/C++ that would be >> nice). But yeah, we should leave the existing histogram alone (in this >> regard) and add a new name for this like "adaptive_histogram" or >> something. Then you can set about convincing matplotlib and friends to >> use it by default :-) >> > > Would having a negative number of bins mean "this many, but with optimized > boundaries" be too clever an interface? > > I have taken a look at the paper linked, and the P-2 algorithm would not > be too complicated to implement from scratch, although it would require > writing some C code I'm afraid. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 21:17:24 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 21:17:24 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: Ok, I didn't know that. Are you at pycon by any chance? On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith wrote: > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar > wrote: > > Yes, I totally agree with you regarding np.sum and np.product, which is > why > > I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure > whether > > cumsum and cumprod might be on the line in your judgment. > > Ah, I see. I think we should treat them the same for now -- all the > comments I made apply to a lesser or greater extent (in particular, > cumsum and cumprod both do the thing where they dispatch to .cumsum() > .cumprod() method). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pmhobson at gmail.com Tue Apr 14 22:00:25 2015 From: pmhobson at gmail.com (Paul Hobson) Date: Tue, 14 Apr 2015 19:00:25 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Tue, Apr 14, 2015 at 4:24 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: > >> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar >> wrote: >> > Can I suggest that we instead add the P-square algorithm for the dynamic >> > calculation of histograms? >> > ( >> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf >> ) >> > >> > This is already implemented in C++'s boost library >> > ( >> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp >> ) >> > >> > I implemented it in Boost Python as a module, which I'm happy to share. >> > This is much better than fixed-width histograms in practice. Rather >> than >> > adjusting the number of bins, it adjusts what you really want, which is >> the >> > resolution of the bins throughout the domain. 
>> >> This definitely sounds like a useful thing to have in numpy or scipy >> (though if it's possible to do without using Boost/C++ that would be >> nice). But yeah, we should leave the existing histogram alone (in this >> regard) and add a new name for this like "adaptive_histogram" or >> something. Then you can set about convincing matplotlib and friends to >> use it by default :-) >> > > Would having a negative number of bins mean "this many, but with optimized > boundaries" be too clever an interface? > As a user, I think so. Wouldn't np.histogram(..., adaptive=True) do well enough? -p -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Tue Apr 14 22:05:18 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Tue, 14 Apr 2015 22:05:18 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: By the way, the p^2 algorithm still needs to know how many bins you want. It just adapts the endpoints of the bins. I like adaptive=True. However, you will have to find a way to return both the bins and and their calculated endpoints. The P^2 algorithm can also give approximate answers to numpy.percentile, numpy.median. How approximate they are depends on the number of bins you let it keep track of. I believe the authors bound the error as a function of number of points and bins. On Tue, Apr 14, 2015 at 10:00 PM, Paul Hobson wrote: > > > On Tue, Apr 14, 2015 at 4:24 PM, Jaime Fern?ndez del R?o < > jaime.frio at gmail.com> wrote: > >> On Tue, Apr 14, 2015 at 4:12 PM, Nathaniel Smith wrote: >> >>> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar >>> wrote: >>> > Can I suggest that we instead add the P-square algorithm for the >>> dynamic >>> > calculation of histograms? >>> > ( >>> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf >>> ) >>> > >>> > This is already implemented in C++'s boost library >>> > ( >>> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp >>> ) >>> > >>> > I implemented it in Boost Python as a module, which I'm happy to share. >>> > This is much better than fixed-width histograms in practice. Rather >>> than >>> > adjusting the number of bins, it adjusts what you really want, which >>> is the >>> > resolution of the bins throughout the domain. >>> >>> This definitely sounds like a useful thing to have in numpy or scipy >>> (though if it's possible to do without using Boost/C++ that would be >>> nice). But yeah, we should leave the existing histogram alone (in this >>> regard) and add a new name for this like "adaptive_histogram" or >>> something. Then you can set about convincing matplotlib and friends to >>> use it by default :-) >>> >> >> Would having a negative number of bins mean "this many, but with >> optimized boundaries" be too clever an interface? >> > > As a user, I think so. Wouldn't np.histogram(..., adaptive=True) do well > enough? > -p > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
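For readers following along, a rough sketch of my own (not from the thread): the effect of the proposed adaptive=True -- which is only a suggestion above, not an existing np.histogram argument -- can be approximated today by computing equal-count bin edges with np.percentile and handing them to np.histogram as explicit edges:

import numpy as np

def equal_count_edges(data, nbins):
    # edges chosen so that each bin holds roughly the same number of samples
    return np.percentile(data, np.linspace(0, 100, nbins + 1))

data = np.random.RandomState(0).lognormal(size=10000)
edges = equal_count_edges(data, 20)
counts, edges = np.histogram(data, bins=edges)
# counts comes out roughly flat (about 500 per bin); it is the bin widths,
# not the counts, that adapt to the local density of the data

Unlike P^2 this needs the whole dataset in memory, but it gives the edges exactly and in one shot.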
URL: From njs at pobox.com Tue Apr 14 22:18:17 2015 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 14 Apr 2015 22:18:17 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: I am, yes. On Apr 14, 2015 9:17 PM, "Neil Girdhar" wrote: > Ok, I didn't know that. Are you at pycon by any chance? > > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith wrote: > >> On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar >> wrote: >> > Yes, I totally agree with you regarding np.sum and np.product, which is >> why >> > I didn't suggest np.add.reduce, np.multiply.reduce. I wasn't sure >> whether >> > cumsum and cumprod might be on the line in your judgment. >> >> Ah, I see. I think we should treat them the same for now -- all the >> comments I made apply to a lesser or greater extent (in particular, >> cumsum and cumprod both do the thing where they dispatch to .cumsum() >> .cumprod() method). >> >> -n >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Apr 15 01:48:37 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 14 Apr 2015 22:48:37 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar wrote: > If you're going to C, is there a reason not to go to C++ and include the > already-written Boost code? Otherwise, why not use Python? > I think we have an explicit rule against C++, although I may be wrong. Not sure how much of boost we would have to make part of numpy to use that, the whole accumulators lib I'm guessing? Seems like an awful lot given what we are after. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Apr 15 04:32:03 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 15 Apr 2015 10:32:03 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: Message-ID: <1429086723.5810.5.camel@sipsolutions.net> Just a general thing, if someone has a few minutes, I think it would make sense to add the ufunc.reduce thing to all of these functions at least in the "See Also" or "Notes" section in the documentation. These special attributes are not that well known, and I think that might be a nice way to make it easier to find. - Sebastian On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: > I am, yes. > > On Apr 14, 2015 9:17 PM, "Neil Girdhar" wrote: > Ok, I didn't know that. Are you at pycon by any chance? > > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith > wrote: > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar > wrote: > > Yes, I totally agree with you regarding np.sum and > np.product, which is why > > I didn't suggest np.add.reduce, np.multiply.reduce. 
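For reference, a sketch of my own (assuming a recent NumPy, and not part of the message being quoted) of the equivalences referred to here:

import numpy as np

x = np.arange(1, 6)

np.sum(x) == np.add.reduce(x)                         # True: reduction
np.prod(x) == np.multiply.reduce(x)                   # True: reduction
np.array_equal(np.cumsum(x), np.add.accumulate(x))    # True: accumulation
np.array_equal(np.cumsum(x), x.cumsum())              # True: np.cumsum dispatches to the method

np.outer itself matches np.multiply.outer only for 1-d inputs, which is what the rest of this thread turns on.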
> I wasn't sure whether > > cumsum and cumprod might be on the line in your > judgment. > > Ah, I see. I think we should treat them the same for > now -- all the > comments I made apply to a lesser or greater extent > (in particular, > cumsum and cumprod both do the thing where they > dispatch to .cumsum() > .cumprod() method). > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From mistersheik at gmail.com Wed Apr 15 07:35:09 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 07:35:09 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429086723.5810.5.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Yes, I totally agree. If I get started on the PR to deprecate np.outer, maybe I can do it as part of the same PR? On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg wrote: > Just a general thing, if someone has a few minutes, I think it would > make sense to add the ufunc.reduce thing to all of these functions at > least in the "See Also" or "Notes" section in the documentation. > > These special attributes are not that well known, and I think that might > be a nice way to make it easier to find. > > - Sebastian > > On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: > > I am, yes. > > > > On Apr 14, 2015 9:17 PM, "Neil Girdhar" wrote: > > Ok, I didn't know that. Are you at pycon by any chance? > > > > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith > > wrote: > > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar > > wrote: > > > Yes, I totally agree with you regarding np.sum and > > np.product, which is why > > > I didn't suggest np.add.reduce, np.multiply.reduce. > > I wasn't sure whether > > > cumsum and cumprod might be on the line in your > > judgment. > > > > Ah, I see. I think we should treat them the same for > > now -- all the > > comments I made apply to a lesser or greater extent > > (in particular, > > cumsum and cumprod both do the thing where they > > dispatch to .cumsum() > > .cumprod() method). > > > > -n > > > > -- > > Nathaniel J. 
Smith -- http://vorpus.org > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Apr 15 07:36:48 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 07:36:48 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: Yeah, I'm not arguing, I'm just curious about your reasoning. That explains why not C++. Why would you want to do this in C and not Python? On Wed, Apr 15, 2015 at 1:48 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Apr 14, 2015 at 6:16 PM, Neil Girdhar > wrote: > >> If you're going to C, is there a reason not to go to C++ and include the >> already-written Boost code? Otherwise, why not use Python? >> > > I think we have an explicit rule against C++, although I may be wrong. Not > sure how much of boost we would have to make part of numpy to use that, the > whole accumulators lib I'm guessing? Seems like an awful lot given what we > are after. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Apr 15 10:02:57 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 15 Apr 2015 07:02:57 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar wrote: > Yeah, I'm not arguing, I'm just curious about your reasoning. That > explains why not C++. Why would you want to do this in C and not Python? > Well, the algorithm has to iterate over all the inputs, updating the estimated percentile positions at every iteration. Because the estimated percentiles may change in every iteration, I don't think there is an easy way of vectorizing the calculation with numpy. So I think it would be very slow if done in Python. Looking at this in some more details, how is this typically used? Because it gives you approximate values that should split your sample into similarly filled bins, but because the values are approximate, to compute a proper histogram you would still need to do the binning to get the exact results, right? Even with this drawback P-2 does have an algorithmic advantage, so for huge inputs and many bins it should come ahead. But for many medium sized problems it may be faster to simply use np.partition, which gives you the whole thing in a single go. 
And it would be much simpler to implement. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Wed Apr 15 11:06:48 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 11:06:48 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: You got it. I remember this from when I worked at Google and we would process (many many) logs. With enough bins, the approximation is still really close. It's great if you want to make an automatic plot of data. Calling numpy.partition a hundred times is probably slower than calling P^2 with n=100 bins. I don't think it does O(n) computations per point. I think it's more like O(log(n)). Best, Neil On Wed, Apr 15, 2015 at 10:02 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Wed, Apr 15, 2015 at 4:36 AM, Neil Girdhar > wrote: > >> Yeah, I'm not arguing, I'm just curious about your reasoning. That >> explains why not C++. Why would you want to do this in C and not Python? >> > > Well, the algorithm has to iterate over all the inputs, updating the > estimated percentile positions at every iteration. Because the estimated > percentiles may change in every iteration, I don't think there is an easy > way of vectorizing the calculation with numpy. So I think it would be very > slow if done in Python. > > Looking at this in some more details, how is this typically used? Because > it gives you approximate values that should split your sample into > similarly filled bins, but because the values are approximate, to compute a > proper histogram you would still need to do the binning to get the exact > results, right? Even with this drawback P-2 does have an algorithmic > advantage, so for huge inputs and many bins it should come ahead. But for > many medium sized problems it may be faster to simply use np.partition, > which gives you the whole thing in a single go. And it would be much > simpler to implement. > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Wed Apr 15 11:24:36 2015 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 15 Apr 2015 11:24:36 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: "Then you can set about convincing matplotlib and friends to use it by default" Just to note, this proposal was originally made over in the matplotlib project. We sent it over here where its benefits would have wider reach. Matplotlib's plan is not to change the defaults, but to offload as much as possible to numpy so that it can support these new features if they are available. We might need to do some input validation so that users running older version of numpy can get a sensible error message. Cheers! Ben Root On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith wrote: > On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar > wrote: > > Can I suggest that we instead add the P-square algorithm for the dynamic > > calculation of histograms? 
> > ( > http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf > ) > > > > This is already implemented in C++'s boost library > > ( > http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp > ) > > > > I implemented it in Boost Python as a module, which I'm happy to share. > > This is much better than fixed-width histograms in practice. Rather than > > adjusting the number of bins, it adjusts what you really want, which is > the > > resolution of the bins throughout the domain. > > This definitely sounds like a useful thing to have in numpy or scipy > (though if it's possible to do without using Boost/C++ that would be > nice). But yeah, we should leave the existing histogram alone (in this > regard) and add a new name for this like "adaptive_histogram" or > something. Then you can set about convincing matplotlib and friends to > use it by default :-) > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ewm at redtetrahedron.org Wed Apr 15 12:14:59 2015 From: ewm at redtetrahedron.org (Eric Moore) Date: Wed, 15 Apr 2015 12:14:59 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: This blog post, and the links within also seem relevant. Appears to have python code available to try things out as well. https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest -Eric On Wed, Apr 15, 2015 at 11:24 AM, Benjamin Root wrote: > "Then you can set about convincing matplotlib and friends to > use it by default" > > Just to note, this proposal was originally made over in the matplotlib > project. We sent it over here where its benefits would have wider reach. > Matplotlib's plan is not to change the defaults, but to offload as much as > possible to numpy so that it can support these new features if they are > available. We might need to do some input validation so that users running > older version of numpy can get a sensible error message. > > Cheers! > Ben Root > > > On Tue, Apr 14, 2015 at 7:12 PM, Nathaniel Smith wrote: > >> On Mon, Apr 13, 2015 at 8:02 AM, Neil Girdhar >> wrote: >> > Can I suggest that we instead add the P-square algorithm for the dynamic >> > calculation of histograms? >> > ( >> http://pierrechainais.ec-lille.fr/Centrale/Option_DAD/IMPACT_files/Dynamic%20quantiles%20calcultation%20-%20P2%20Algorythm.pdf >> ) >> > >> > This is already implemented in C++'s boost library >> > ( >> http://www.boost.org/doc/libs/1_44_0/boost/accumulators/statistics/extended_p_square.hpp >> ) >> > >> > I implemented it in Boost Python as a module, which I'm happy to share. >> > This is much better than fixed-width histograms in practice. Rather >> than >> > adjusting the number of bins, it adjusts what you really want, which is >> the >> > resolution of the bins throughout the domain. >> >> This definitely sounds like a useful thing to have in numpy or scipy >> (though if it's possible to do without using Boost/C++ that would be >> nice). But yeah, we should leave the existing histogram alone (in this >> regard) and add a new name for this like "adaptive_histogram" or >> something. 
Then you can set about convincing matplotlib and friends to >> use it by default :-) >> >> -n >> >> -- >> Nathaniel J. Smith -- http://vorpus.org >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Apr 15 12:40:58 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 15 Apr 2015 09:40:58 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar wrote: > You got it. I remember this from when I worked at Google and we would > process (many many) logs. With enough bins, the approximation is still > really close. It's great if you want to make an automatic plot of data. > Calling numpy.partition a hundred times is probably slower than calling P^2 > with n=100 bins. I don't think it does O(n) computations per point. I > think it's more like O(log(n)). > Looking at it again, it probably is O(n) after all: it does a binary search, which is O(log n), but it then goes on to update all the n bin counters and estimations, so O(n) I'm afraid. So there is no algorithmic advantage over partition/percentile: if there are m samples and n bins, P-2 that O(n) m times, while partition does O(m) n times, so both end up being O(m n). It seems to me that the big thing of P^2 is not having to hold the full dataset in memory. Online statistics (is that the name for this?), even if only estimations, is a cool thing, but I am not sure numpy is the place for them. That's not to say that we couldn't eventually have P^2 implemented for histogram, but I would start off with a partition based one. Would SciPy have a place for online statistics? Perhaps there's room for yet another scikit? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Apr 15 12:52:51 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Wed, 15 Apr 2015 09:52:51 -0700 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: On Wed, Apr 15, 2015 at 9:14 AM, Eric Moore wrote: > This blog post, and the links within also seem relevant. Appears to have > python code available to try things out as well. > > > https://dataorigami.net/blogs/napkin-folding/19055451-percentile-and-quantile-estimation-of-big-data-the-t-digest > Very cool indeed... The original works is licensed under an Apache 2.0 license (https://github.com/tdunning/t-digest/blob/master/LICENSE). I am not fluent in legalese, so not sure whether that means we can use it or not, seems awfully more complicated than what we normally use. Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... 
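As a side note (my own sketch, not from the thread): the "partition based" approach mentioned above amounts to selecting order statistics without a full sort, along these lines:

import numpy as np

def approx_percentile(data, p):
    # k-th order statistic as a cheap stand-in for the p-th percentile;
    # np.partition places element k in its sorted position in linear time
    k = int(round(p / 100.0 * (len(data) - 1)))
    return np.partition(data, k)[k]

data = np.random.RandomState(1).randn(1000000)
approx_percentile(data, 50)   # close to np.median(data), without sorting everything

np.percentile itself interpolates between neighbouring order statistics, so the two can differ slightly when the target position is not an integer.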
URL: From valentin at haenel.co Wed Apr 15 14:56:17 2015 From: valentin at haenel.co (Valentin Haenel) Date: Wed, 15 Apr 2015 20:56:17 +0200 Subject: [Numpy-discussion] [ANN] python-blosc v1.2.5 Message-ID: <20150415185617.GA8455@kudu.in-berlin.de> ============================= Announcing python-blosc 1.2.5 ============================= What is new? ============ This release contains support for Blosc v1.5.4 including changes to how the GIL is kept. This was required because Blosc was refactored in the v1.5.x line to remove global variables and to use context objects instead. As such, it became necessary to keep the GIL while calling Blosc from Python code that uses the multiprocessing module. In addition, is now possible to change the blocksize used by Blosc using ``set_blocksize``. When using this however, bear in mind that the blocksize has been finely tuned to be a good default value and that randomly messing with this value may have unforeseen and unpredictable consequences on the performance of Blosc. Additionally, we can now compile on Posix architectures, thanks again to Andreas Schwab for that one. For more info, you can have a look at the release notes in: https://github.com/Blosc/python-blosc/wiki/Release-notes More docs and examples are available in the documentation site: http://python-blosc.blosc.org What is it? =========== Blosc (http://www.blosc.org) is a high performance compressor optimized for binary data. It has been designed to transmit data to the processor cache faster than the traditional, non-compressed, direct memory fetch approach via a memcpy() OS call. Blosc is the first compressor that is meant not only to reduce the size of large datasets on-disk or in-memory, but also to accelerate object manipulations that are memory-bound (http://www.blosc.org/docs/StarvingCPUs.pdf). See http://www.blosc.org/synthetic-benchmarks.html for some benchmarks on how much speed it can achieve in some datasets. Blosc works well for compressing numerical arrays that contains data with relatively low entropy, like sparse data, time series, grids with regular-spaced values, etc. python-blosc (http://python-blosc.blosc.org/) is the Python wrapper for the Blosc compression library. There is also a handy tool built on Blosc called Bloscpack (https://github.com/Blosc/bloscpack). It features a commmand line interface that allows you to compress large binary datafiles on-disk. It also comes with a Python API that has built-in support for serializing and deserializing Numpy arrays both on-disk and in-memory at speeds that are competitive with regular Pickle/cPickle machinery. Installing ========== python-blosc is in PyPI repository, so installing it is easy: $ pip install -U blosc # yes, you should omit the python- prefix Download sources ================ The sources are managed through github services at: http://github.com/Blosc/python-blosc Documentation ============= There is Sphinx-based documentation site at: http://python-blosc.blosc.org/ Mailing list ============ There is an official mailing list for Blosc at: blosc at googlegroups.com http://groups.google.es/group/blosc Licenses ======== Both Blosc and its Python wrapper are distributed using the MIT license. See: https://github.com/Blosc/python-blosc/blob/master/LICENSES for more details. ---- **Enjoy data!** From joseph.martinot-lagarde at m4x.org Wed Apr 15 15:27:12 2015 From: joseph.martinot-lagarde at m4x.org (Joseph Martinot-Lagarde) Date: Wed, 15 Apr 2015 21:27:12 +0200 Subject: [Numpy-discussion] IDE's for numpy development? 
In-Reply-To: References: Message-ID: <552EBB90.3010800@m4x.org> Le 08/04/2015 21:19, Yuxiang Wang a ?crit : > I think spyder supports code highlighting in C and that's all... > There's no way to compile in Spyder, is there? > Well, you could write a compilation script using Scons and run it from spyder ! :) But no, spyder is very python-oriented and there is no way to compile C in spyder. For information the next version should have a better support for plugins so it could be done as a third-party extension. Joseph From josef.pktd at gmail.com Wed Apr 15 17:29:05 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Apr 2015 17:29:05 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar wrote: > Yes, I totally agree. If I get started on the PR to deprecate np.outer, > maybe I can do it as part of the same PR? > > On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg > wrote: >> >> Just a general thing, if someone has a few minutes, I think it would >> make sense to add the ufunc.reduce thing to all of these functions at >> least in the "See Also" or "Notes" section in the documentation. >> >> These special attributes are not that well known, and I think that might >> be a nice way to make it easier to find. >> >> - Sebastian >> >> On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: >> > I am, yes. >> > >> > On Apr 14, 2015 9:17 PM, "Neil Girdhar" wrote: >> > Ok, I didn't know that. Are you at pycon by any chance? >> > >> > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith >> > wrote: >> > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar >> > wrote: >> > > Yes, I totally agree with you regarding np.sum and >> > np.product, which is why >> > > I didn't suggest np.add.reduce, np.multiply.reduce. >> > I wasn't sure whether >> > > cumsum and cumprod might be on the line in your >> > judgment. >> > >> > Ah, I see. I think we should treat them the same for >> > now -- all the >> > comments I made apply to a lesser or greater extent >> > (in particular, >> > cumsum and cumprod both do the thing where they >> > dispatch to .cumsum() >> > .cumprod() method). >> > >> > -n >> > >> > -- >> > Nathaniel J. Smith -- http://vorpus.org >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I'm just looking at this thread. 
I see outer used quite often corrcoef = cov / np.outer(std, std) (even I use it sometimes instead of cov / std[:,None] / std Josef From mistersheik at gmail.com Wed Apr 15 17:31:41 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 17:31:41 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Does it work for you to set outer = np.multiply.outer ? It's actually faster on my machine. On Wed, Apr 15, 2015 at 5:29 PM, wrote: > On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar > wrote: > > Yes, I totally agree. If I get started on the PR to deprecate np.outer, > > maybe I can do it as part of the same PR? > > > > On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg < > sebastian at sipsolutions.net> > > wrote: > >> > >> Just a general thing, if someone has a few minutes, I think it would > >> make sense to add the ufunc.reduce thing to all of these functions at > >> least in the "See Also" or "Notes" section in the documentation. > >> > >> These special attributes are not that well known, and I think that might > >> be a nice way to make it easier to find. > >> > >> - Sebastian > >> > >> On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: > >> > I am, yes. > >> > > >> > On Apr 14, 2015 9:17 PM, "Neil Girdhar" > wrote: > >> > Ok, I didn't know that. Are you at pycon by any chance? > >> > > >> > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith > >> > wrote: > >> > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar > >> > wrote: > >> > > Yes, I totally agree with you regarding np.sum and > >> > np.product, which is why > >> > > I didn't suggest np.add.reduce, np.multiply.reduce. > >> > I wasn't sure whether > >> > > cumsum and cumprod might be on the line in your > >> > judgment. > >> > > >> > Ah, I see. I think we should treat them the same for > >> > now -- all the > >> > comments I made apply to a lesser or greater extent > >> > (in particular, > >> > cumsum and cumprod both do the thing where they > >> > dispatch to .cumsum() > >> > .cumprod() method). > >> > > >> > -n > >> > > >> > -- > >> > Nathaniel J. Smith -- http://vorpus.org > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > I'm just looking at this thread. 
> > I see outer used quite often > > corrcoef = cov / np.outer(std, std) > > (even I use it sometimes instead of > cov / std[:,None] / std > > Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 15 18:08:45 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Apr 2015 18:08:45 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar wrote: > Does it work for you to set > > outer = np.multiply.outer > > ? > > It's actually faster on my machine. I assume it does because np.corrcoeff uses it, and it's the same type of use cases. However, I'm not using it very often (I prefer broadcasting), but I've seen it often enough when reviewing code. This is mainly to point out that it could be a popular function (that maybe shouldn't be deprecated) https://github.com/search?utf8=%E2%9C%93&q=np.outer 416914 Josef > > On Wed, Apr 15, 2015 at 5:29 PM, wrote: >> >> On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar >> wrote: >> > Yes, I totally agree. If I get started on the PR to deprecate np.outer, >> > maybe I can do it as part of the same PR? >> > >> > On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg >> > >> > wrote: >> >> >> >> Just a general thing, if someone has a few minutes, I think it would >> >> make sense to add the ufunc.reduce thing to all of these functions at >> >> least in the "See Also" or "Notes" section in the documentation. >> >> >> >> These special attributes are not that well known, and I think that >> >> might >> >> be a nice way to make it easier to find. >> >> >> >> - Sebastian >> >> >> >> On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: >> >> > I am, yes. >> >> > >> >> > On Apr 14, 2015 9:17 PM, "Neil Girdhar" >> >> > wrote: >> >> > Ok, I didn't know that. Are you at pycon by any chance? >> >> > >> >> > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith >> >> > wrote: >> >> > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar >> >> > wrote: >> >> > > Yes, I totally agree with you regarding np.sum and >> >> > np.product, which is why >> >> > > I didn't suggest np.add.reduce, np.multiply.reduce. >> >> > I wasn't sure whether >> >> > > cumsum and cumprod might be on the line in your >> >> > judgment. >> >> > >> >> > Ah, I see. I think we should treat them the same for >> >> > now -- all the >> >> > comments I made apply to a lesser or greater extent >> >> > (in particular, >> >> > cumsum and cumprod both do the thing where they >> >> > dispatch to .cumsum() >> >> > .cumprod() method). >> >> > >> >> > -n >> >> > >> >> > -- >> >> > Nathaniel J. 
Smith -- http://vorpus.org >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> >> >> I'm just looking at this thread. >> >> I see outer used quite often >> >> corrcoef = cov / np.outer(std, std) >> >> (even I use it sometimes instead of >> cov / std[:,None] / std >> >> Josef >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Wed Apr 15 18:12:06 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Apr 2015 18:12:06 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Wed, Apr 15, 2015 at 6:08 PM, wrote: > On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar wrote: >> Does it work for you to set >> >> outer = np.multiply.outer >> >> ? >> >> It's actually faster on my machine. > > I assume it does because np.corrcoeff uses it, and it's the same type > of use cases. > However, I'm not using it very often (I prefer broadcasting), but I've > seen it often enough when reviewing code. > > This is mainly to point out that it could be a popular function (that > maybe shouldn't be deprecated) > > https://github.com/search?utf8=%E2%9C%93&q=np.outer > 416914 After thinking another minute: I think it should not be deprecated, it's like toepliz. We can use it also to normalize 2d arrays where columns and rows are different not symmetric as in the corrcoef case. Josef > > Josef > > >> >> On Wed, Apr 15, 2015 at 5:29 PM, wrote: >>> >>> On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar >>> wrote: >>> > Yes, I totally agree. If I get started on the PR to deprecate np.outer, >>> > maybe I can do it as part of the same PR? >>> > >>> > On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg >>> > >>> > wrote: >>> >> >>> >> Just a general thing, if someone has a few minutes, I think it would >>> >> make sense to add the ufunc.reduce thing to all of these functions at >>> >> least in the "See Also" or "Notes" section in the documentation. >>> >> >>> >> These special attributes are not that well known, and I think that >>> >> might >>> >> be a nice way to make it easier to find. 
>>> >> >>> >> - Sebastian >>> >> >>> >> On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: >>> >> > I am, yes. >>> >> > >>> >> > On Apr 14, 2015 9:17 PM, "Neil Girdhar" >>> >> > wrote: >>> >> > Ok, I didn't know that. Are you at pycon by any chance? >>> >> > >>> >> > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith >>> >> > wrote: >>> >> > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar >>> >> > wrote: >>> >> > > Yes, I totally agree with you regarding np.sum and >>> >> > np.product, which is why >>> >> > > I didn't suggest np.add.reduce, np.multiply.reduce. >>> >> > I wasn't sure whether >>> >> > > cumsum and cumprod might be on the line in your >>> >> > judgment. >>> >> > >>> >> > Ah, I see. I think we should treat them the same for >>> >> > now -- all the >>> >> > comments I made apply to a lesser or greater extent >>> >> > (in particular, >>> >> > cumsum and cumprod both do the thing where they >>> >> > dispatch to .cumsum() >>> >> > .cumprod() method). >>> >> > >>> >> > -n >>> >> > >>> >> > -- >>> >> > Nathaniel J. Smith -- http://vorpus.org >>> >> > _______________________________________________ >>> >> > NumPy-Discussion mailing list >>> >> > NumPy-Discussion at scipy.org >>> >> > >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>> >> > >>> >> > >>> >> > >>> >> > _______________________________________________ >>> >> > NumPy-Discussion mailing list >>> >> > NumPy-Discussion at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> > >>> >> > _______________________________________________ >>> >> > NumPy-Discussion mailing list >>> >> > NumPy-Discussion at scipy.org >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >>> >> >>> >> _______________________________________________ >>> >> NumPy-Discussion mailing list >>> >> NumPy-Discussion at scipy.org >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >>> > >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> >>> >>> I'm just looking at this thread. >>> >>> I see outer used quite often >>> >>> corrcoef = cov / np.outer(std, std) >>> >>> (even I use it sometimes instead of >>> cov / std[:,None] / std >>> >>> Josef >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From mistersheik at gmail.com Wed Apr 15 18:16:18 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 18:16:18 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: I don't understand. Are you at pycon by any chance? On Wed, Apr 15, 2015 at 6:12 PM, wrote: > On Wed, Apr 15, 2015 at 6:08 PM, wrote: > > On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar > wrote: > >> Does it work for you to set > >> > >> outer = np.multiply.outer > >> > >> ? > >> > >> It's actually faster on my machine. > > > > I assume it does because np.corrcoeff uses it, and it's the same type > > of use cases. 
> > However, I'm not using it very often (I prefer broadcasting), but I've > > seen it often enough when reviewing code. > > > > This is mainly to point out that it could be a popular function (that > > maybe shouldn't be deprecated) > > > > https://github.com/search?utf8=%E2%9C%93&q=np.outer > > 416914 > > After thinking another minute: > > I think it should not be deprecated, it's like toepliz. We can use it > also to normalize 2d arrays where columns and rows are different not > symmetric as in the corrcoef case. > > Josef > > > > > > Josef > > > > > >> > >> On Wed, Apr 15, 2015 at 5:29 PM, wrote: > >>> > >>> On Wed, Apr 15, 2015 at 7:35 AM, Neil Girdhar > >>> wrote: > >>> > Yes, I totally agree. If I get started on the PR to deprecate > np.outer, > >>> > maybe I can do it as part of the same PR? > >>> > > >>> > On Wed, Apr 15, 2015 at 4:32 AM, Sebastian Berg > >>> > > >>> > wrote: > >>> >> > >>> >> Just a general thing, if someone has a few minutes, I think it would > >>> >> make sense to add the ufunc.reduce thing to all of these functions > at > >>> >> least in the "See Also" or "Notes" section in the documentation. > >>> >> > >>> >> These special attributes are not that well known, and I think that > >>> >> might > >>> >> be a nice way to make it easier to find. > >>> >> > >>> >> - Sebastian > >>> >> > >>> >> On Di, 2015-04-14 at 22:18 -0400, Nathaniel Smith wrote: > >>> >> > I am, yes. > >>> >> > > >>> >> > On Apr 14, 2015 9:17 PM, "Neil Girdhar" > >>> >> > wrote: > >>> >> > Ok, I didn't know that. Are you at pycon by any chance? > >>> >> > > >>> >> > On Tue, Apr 14, 2015 at 7:16 PM, Nathaniel Smith > >>> >> > wrote: > >>> >> > On Tue, Apr 14, 2015 at 3:48 PM, Neil Girdhar > >>> >> > wrote: > >>> >> > > Yes, I totally agree with you regarding np.sum > and > >>> >> > np.product, which is why > >>> >> > > I didn't suggest np.add.reduce, > np.multiply.reduce. > >>> >> > I wasn't sure whether > >>> >> > > cumsum and cumprod might be on the line in your > >>> >> > judgment. > >>> >> > > >>> >> > Ah, I see. I think we should treat them the same > for > >>> >> > now -- all the > >>> >> > comments I made apply to a lesser or greater > extent > >>> >> > (in particular, > >>> >> > cumsum and cumprod both do the thing where they > >>> >> > dispatch to .cumsum() > >>> >> > .cumprod() method). > >>> >> > > >>> >> > -n > >>> >> > > >>> >> > -- > >>> >> > Nathaniel J. 
Smith -- http://vorpus.org > >>> >> > _______________________________________________ > >>> >> > NumPy-Discussion mailing list > >>> >> > NumPy-Discussion at scipy.org > >>> >> > > >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> >> > > >>> >> > > >>> >> > > >>> >> > > >>> >> > _______________________________________________ > >>> >> > NumPy-Discussion mailing list > >>> >> > NumPy-Discussion at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> >> > > >>> >> > _______________________________________________ > >>> >> > NumPy-Discussion mailing list > >>> >> > NumPy-Discussion at scipy.org > >>> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> >> > >>> >> > >>> >> _______________________________________________ > >>> >> NumPy-Discussion mailing list > >>> >> NumPy-Discussion at scipy.org > >>> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> >> > >>> > > >>> > > >>> > _______________________________________________ > >>> > NumPy-Discussion mailing list > >>> > NumPy-Discussion at scipy.org > >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >>> > > >>> > >>> > >>> I'm just looking at this thread. > >>> > >>> I see outer used quite often > >>> > >>> corrcoef = cov / np.outer(std, std) > >>> > >>> (even I use it sometimes instead of > >>> cov / std[:,None] / std > >>> > >>> Josef > >>> _______________________________________________ > >>> NumPy-Discussion mailing list > >>> NumPy-Discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > >> > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Apr 15 18:40:45 2015 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 15 Apr 2015 18:40:45 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Wed, Apr 15, 2015 at 6:08 PM, wrote: > On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar wrote: >> Does it work for you to set >> >> outer = np.multiply.outer >> >> ? >> >> It's actually faster on my machine. > > I assume it does because np.corrcoeff uses it, and it's the same type > of use cases. > However, I'm not using it very often (I prefer broadcasting), but I've > seen it often enough when reviewing code. > > This is mainly to point out that it could be a popular function (that > maybe shouldn't be deprecated) > > https://github.com/search?utf8=%E2%9C%93&q=np.outer > 416914 For future reference, that's not the number -- you have to click through to "Code" and then look at a single-language result to get anything remotely meaningful. In this case b/c they're different by an order of magnitude, and in general because sometimes the "top line" number is completely made up (like it has no relation to the per-language numbers on the left and then changes around randomly if you simply reload the page). (So 29,397 is what you want in this case.) Also that count then tends to have tons of duplicates (e.g. 
b/c there are hundreds of copies of numpy itself on github), so you need a big grain of salt when looking at the absolute number, but it can be useful, esp. for relative comparisons. -n From mistersheik at gmail.com Wed Apr 15 18:46:40 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 15 Apr 2015 18:46:40 -0400 Subject: [Numpy-discussion] Automatic number of bins for numpy histograms In-Reply-To: References: Message-ID: Cool, thanks for looking at this. P2 might still be better even if the whole dataset is in memory because of cache misses. Partition, which I guess is based on quickselect, is going to run over all of the data as many times as there are bins roughly, whereas p2 only runs over it once. From a cache miss standpoint, I think p2 is better? Anyway, it might be worth maybe coding to verify any performance advantages? Not sure if it should be in numpy or not since it really should accept an iterable rather than a numpy vector, right? Best, Neil On Wed, Apr 15, 2015 at 12:40 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Wed, Apr 15, 2015 at 8:06 AM, Neil Girdhar > wrote: > >> You got it. I remember this from when I worked at Google and we would >> process (many many) logs. With enough bins, the approximation is still >> really close. It's great if you want to make an automatic plot of data. >> Calling numpy.partition a hundred times is probably slower than calling P^2 >> with n=100 bins. I don't think it does O(n) computations per point. I >> think it's more like O(log(n)). >> > > Looking at it again, it probably is O(n) after all: it does a binary > search, which is O(log n), but it then goes on to update all the n bin > counters and estimations, so O(n) I'm afraid. So there is no algorithmic > advantage over partition/percentile: if there are m samples and n bins, P-2 > that O(n) m times, while partition does O(m) n times, so both end up being > O(m n). It seems to me that the big thing of P^2 is not having to hold the > full dataset in memory. Online statistics (is that the name for this?), > even if only estimations, is a cool thing, but I am not sure numpy is the > place for them. That's not to say that we couldn't eventually have P^2 > implemented for histogram, but I would start off with a partition based one. > > Would SciPy have a place for online statistics? Perhaps there's room for > yet another scikit? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Wed Apr 15 20:02:23 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 15 Apr 2015 20:02:23 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith wrote: > On Wed, Apr 15, 2015 at 6:08 PM, wrote: >> On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar wrote: >>> Does it work for you to set >>> >>> outer = np.multiply.outer >>> >>> ? >>> >>> It's actually faster on my machine. >> >> I assume it does because np.corrcoeff uses it, and it's the same type >> of use cases. 
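Since the message above suggests it might be worth coding something up: below is a rough, untested Python sketch of the single-quantile P^2 update (five markers, following Jain and Chlamtac's description as I understand it). It is written purely for illustration and is not the Boost implementation discussed in the thread; the extended multi-bin variant adds more markers, but the per-observation work has the same shape.

import numpy as np

class P2Quantile(object):
    """Streaming estimate of the p-th quantile (classic P^2, five markers)."""

    def __init__(self, p):
        self.p = p
        self.q = []                                   # marker heights
        self.n = np.arange(1.0, 6.0)                  # marker positions
        self.target = np.array([1.0, 1 + 2*p, 1 + 4*p, 3 + 2*p, 5.0])
        self.step = np.array([0.0, p/2.0, p, (1 + p)/2.0, 1.0])

    def add(self, x):
        q, n = self.q, self.n
        if len(q) < 5:                                # warm-up: just collect and sort
            q.append(float(x))
            q.sort()
            return
        # locate the cell containing x, extending the extreme markers if needed
        if x < q[0]:
            q[0] = float(x)
            k = 0
        elif x >= q[4]:
            q[4] = float(x)
            k = 3
        else:
            k = 0
            while x >= q[k + 1]:
                k += 1
        n[k + 1:] += 1                                # markers above x shift right
        self.target += self.step                      # and so do the desired positions
        for i in (1, 2, 3):                           # re-centre the middle markers
            d = self.target[i] - n[i]
            if (d >= 1 and n[i + 1] - n[i] > 1) or (d <= -1 and n[i - 1] - n[i] < -1):
                d = 1.0 if d > 0 else -1.0
                # piecewise-parabolic prediction of the new marker height
                qp = q[i] + d / (n[i + 1] - n[i - 1]) * (
                    (n[i] - n[i - 1] + d) * (q[i + 1] - q[i]) / (n[i + 1] - n[i]) +
                    (n[i + 1] - n[i] - d) * (q[i] - q[i - 1]) / (n[i] - n[i - 1]))
                if q[i - 1] < qp < q[i + 1]:
                    q[i] = qp
                else:                                 # fall back to linear interpolation
                    j = i + int(d)
                    q[i] = q[i] + d * (q[j] - q[i]) / (n[j] - n[i])
                n[i] += d

    def value(self):
        if len(self.q) < 5:
            return float(np.median(self.q)) if self.q else float('nan')
        return self.q[2]                              # the middle marker estimates the quantile

est = P2Quantile(0.5)
for x in np.random.RandomState(2).randn(200000):
    est.add(x)
# est.value() should land close to np.median of the same data (about 0.0 here),
# while only ever holding five marker heights in memory

Each call to add() touches all the markers, which is the O(number of markers) per-observation cost discussed earlier in the thread.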
>> However, I'm not using it very often (I prefer broadcasting), but I've >> seen it often enough when reviewing code. >> >> This is mainly to point out that it could be a popular function (that >> maybe shouldn't be deprecated) >> >> https://github.com/search?utf8=%E2%9C%93&q=np.outer >> 416914 > > For future reference, that's not the number -- you have to click > through to "Code" and then look at a single-language result to get > anything remotely meaningful. In this case b/c they're different by an > order of magnitude, and in general because sometimes the "top line" > number is completely made up (like it has no relation to the > per-language numbers on the left and then changes around randomly if > you simply reload the page). > > (So 29,397 is what you want in this case.) > > Also that count then tends to have tons of duplicates (e.g. b/c there > are hundreds of copies of numpy itself on github), so you need a big > grain of salt when looking at the absolute number, but it can be > useful, esp. for relative comparisons. My mistake, rushing too much. github show only 25 code references in numpy itself. in quotes, python only (namespace conscious packages on github) (I think github counts modules not instances) "np.cumsum" 11,022 "np.cumprod" 1,290 "np.outer" 6,838 statsmodels "np.cumsum" 21 "np.cumprod" 2 "np.outer" 15 Josef > > -n > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mistersheik at gmail.com Thu Apr 16 10:53:10 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 16 Apr 2015 10:53:10 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Would it be possible to deprecate np.outer's usage on non one-dimensional vectors for a few versions, and then reintroduce it with definition np.outer == np.multiply.outer? On Wed, Apr 15, 2015 at 8:02 PM, wrote: > On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith wrote: > > On Wed, Apr 15, 2015 at 6:08 PM, wrote: > >> On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar > wrote: > >>> Does it work for you to set > >>> > >>> outer = np.multiply.outer > >>> > >>> ? > >>> > >>> It's actually faster on my machine. > >> > >> I assume it does because np.corrcoeff uses it, and it's the same type > >> of use cases. > >> However, I'm not using it very often (I prefer broadcasting), but I've > >> seen it often enough when reviewing code. > >> > >> This is mainly to point out that it could be a popular function (that > >> maybe shouldn't be deprecated) > >> > >> https://github.com/search?utf8=%E2%9C%93&q=np.outer > >> 416914 > > > > For future reference, that's not the number -- you have to click > > through to "Code" and then look at a single-language result to get > > anything remotely meaningful. In this case b/c they're different by an > > order of magnitude, and in general because sometimes the "top line" > > number is completely made up (like it has no relation to the > > per-language numbers on the left and then changes around randomly if > > you simply reload the page). > > > > (So 29,397 is what you want in this case.) > > > > Also that count then tends to have tons of duplicates (e.g. b/c there > > are hundreds of copies of numpy itself on github), so you need a big > > grain of salt when looking at the absolute number, but it can be > > useful, esp. for relative comparisons. 
> > My mistake, rushing too much. > github show only 25 code references in numpy itself. > > in quotes, python only (namespace conscious packages on github) > (I think github counts modules not instances) > > "np.cumsum" 11,022 > "np.cumprod" 1,290 > "np.outer" 6,838 > > statsmodels > "np.cumsum" 21 > "np.cumprod" 2 > "np.outer" 15 > > Josef > > > > > -n > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Apr 16 18:19:50 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 16 Apr 2015 18:19:50 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Actually, looking at the docs, numpy.outer is *only* defined for 1-d vectors. Should anyone who used it with multi-dimensional arrays have an expectation that it will keep working in the same way? On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar wrote: > Would it be possible to deprecate np.outer's usage on non one-dimensional > vectors for a few versions, and then reintroduce it with definition > np.outer == np.multiply.outer? > > On Wed, Apr 15, 2015 at 8:02 PM, wrote: > >> On Wed, Apr 15, 2015 at 6:40 PM, Nathaniel Smith wrote: >> > On Wed, Apr 15, 2015 at 6:08 PM, wrote: >> >> On Wed, Apr 15, 2015 at 5:31 PM, Neil Girdhar >> wrote: >> >>> Does it work for you to set >> >>> >> >>> outer = np.multiply.outer >> >>> >> >>> ? >> >>> >> >>> It's actually faster on my machine. >> >> >> >> I assume it does because np.corrcoeff uses it, and it's the same type >> >> of use cases. >> >> However, I'm not using it very often (I prefer broadcasting), but I've >> >> seen it often enough when reviewing code. >> >> >> >> This is mainly to point out that it could be a popular function (that >> >> maybe shouldn't be deprecated) >> >> >> >> https://github.com/search?utf8=%E2%9C%93&q=np.outer >> >> 416914 >> > >> > For future reference, that's not the number -- you have to click >> > through to "Code" and then look at a single-language result to get >> > anything remotely meaningful. In this case b/c they're different by an >> > order of magnitude, and in general because sometimes the "top line" >> > number is completely made up (like it has no relation to the >> > per-language numbers on the left and then changes around randomly if >> > you simply reload the page). >> > >> > (So 29,397 is what you want in this case.) >> > >> > Also that count then tends to have tons of duplicates (e.g. b/c there >> > are hundreds of copies of numpy itself on github), so you need a big >> > grain of salt when looking at the absolute number, but it can be >> > useful, esp. for relative comparisons. >> >> My mistake, rushing too much. >> github show only 25 code references in numpy itself. 
>> >> in quotes, python only (namespace conscious packages on github) >> (I think github counts modules not instances) >> >> "np.cumsum" 11,022 >> "np.cumprod" 1,290 >> "np.outer" 6,838 >> >> statsmodels >> "np.cumsum" 21 >> "np.cumprod" 2 >> "np.outer" 15 >> >> Josef >> >> > >> > -n >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Thu Apr 16 18:28:34 2015 From: matthew.brett at gmail.com (Matthew Brett) Date: Thu, 16 Apr 2015 15:28:34 -0700 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Hi, On Thu, Apr 16, 2015 at 3:19 PM, Neil Girdhar wrote: > Actually, looking at the docs, numpy.outer is *only* defined for 1-d > vectors. Should anyone who used it with multi-dimensional arrays have an > expectation that it will keep working in the same way? > > On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar > wrote: >> >> Would it be possible to deprecate np.outer's usage on non one-dimensional >> vectors for a few versions, and then reintroduce it with definition np.outer >> == np.multiply.outer? I think the general idea is that a) people often miss deprecation warnings b) there is lots of legacy code out there, and c) it's very bad if legacy code silently gives different answers in newer numpy versions d) it's not so bad if newer numpy gives an intelligible error for code that used to work. So, how about a slight modification of your proposal? 1) Raise deprecation warning for np.outer for non 1D arrays for a few versions, with depraction in favor of np.multiply.outer, then 2) Raise error for np.outer on non 1D arrays Best, Matthew From njs at pobox.com Thu Apr 16 18:32:44 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Apr 2015 18:32:44 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Thu, Apr 16, 2015 at 6:19 PM, Neil Girdhar wrote: > Actually, looking at the docs, numpy.outer is *only* defined for 1-d > vectors. Should anyone who used it with multi-dimensional arrays have an > expectation that it will keep working in the same way? Yes. Generally what we do is more important than what we say we do. Changing behaviour can break code. Changing docs can change whose "fault" this is, but broken code is still broken code. And if you put on your user hat, what do you do when numpy acts weird -- shake your fist at the heavens and give up, or sigh and update your code to match? It's pretty common for even undocumented behaviour to still be depended on. Also FWIW, np.outer's docstring says "Input is flattened if not already 1-dimensional", so we actually did document this. -n -- Nathaniel J. 
Smith -- http://vorpus.org From mistersheik at gmail.com Thu Apr 16 18:37:36 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 16 Apr 2015 18:37:36 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: That sounds good to me. I can always put np.outer = np.multiply.outer at the start of my code to get what I want. Or could that break things? On Thu, Apr 16, 2015 at 6:28 PM, Matthew Brett wrote: > Hi, > > On Thu, Apr 16, 2015 at 3:19 PM, Neil Girdhar > wrote: > > Actually, looking at the docs, numpy.outer is *only* defined for 1-d > > vectors. Should anyone who used it with multi-dimensional arrays have an > > expectation that it will keep working in the same way? > > > > On Thu, Apr 16, 2015 at 10:53 AM, Neil Girdhar > > wrote: > >> > >> Would it be possible to deprecate np.outer's usage on non > one-dimensional > >> vectors for a few versions, and then reintroduce it with definition > np.outer > >> == np.multiply.outer? > > I think the general idea is that > > a) people often miss deprecation warnings > b) there is lots of legacy code out there, and > c) it's very bad if legacy code silently gives different answers in > newer numpy versions > d) it's not so bad if newer numpy gives an intelligible error for code > that used to work. > > So, how about a slight modification of your proposal? > > 1) Raise deprecation warning for np.outer for non 1D arrays for a few > versions, with depraction in favor of np.multiply.outer, then > 2) Raise error for np.outer on non 1D arrays > > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Thu Apr 16 18:38:09 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 16 Apr 2015 18:38:09 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Thu, Apr 16, 2015 at 6:32 PM, Nathaniel Smith wrote: > On Thu, Apr 16, 2015 at 6:19 PM, Neil Girdhar > wrote: > > Actually, looking at the docs, numpy.outer is *only* defined for 1-d > > vectors. Should anyone who used it with multi-dimensional arrays have an > > expectation that it will keep working in the same way? > > Yes. Generally what we do is more important than what we say we do. > Changing behaviour can break code. Changing docs can change whose > "fault" this is, but broken code is still broken code. And if you put > on your user hat, what do you do when numpy acts weird -- shake your > fist at the heavens and give up, or sigh and update your code to > match? It's pretty common for even undocumented behaviour to still be > depended on. > > Also FWIW, np.outer's docstring says "Input is flattened if not > already 1-dimensional", so we actually did document this. > > Ah, yeah, somehow I missed that! > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Thu Apr 16 18:44:13 2015 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 16 Apr 2015 18:44:13 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: On Thu, Apr 16, 2015 at 6:37 PM, Neil Girdhar wrote: > I can always put np.outer = np.multiply.outer at the start of my code to get > what I want. Or could that break things? Please don't do this. It means that there are any calls to np.outer in libraries you are using (or other libraries that are also used by anyone who is using your code), they will silently get np.multiply.outer instead of np.outer. And then if this breaks things we end up getting extremely confusing bug reports from angry users who think we broke np.outer. Just do 'outer = np.multiply.outer' and leave the np namespace alone :-) -n -- Nathaniel J. Smith -- http://vorpus.org From mistersheik at gmail.com Thu Apr 16 18:44:56 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Thu, 16 Apr 2015 18:44:56 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: Right. On Thu, Apr 16, 2015 at 6:44 PM, Nathaniel Smith wrote: > On Thu, Apr 16, 2015 at 6:37 PM, Neil Girdhar > wrote: > > I can always put np.outer = np.multiply.outer at the start of my code to > get > > what I want. Or could that break things? > > Please don't do this. It means that there are any calls to np.outer in > libraries you are using (or other libraries that are also used by > anyone who is using your code), they will silently get > np.multiply.outer instead of np.outer. And then if this breaks things > we end up getting extremely confusing bug reports from angry users who > think we broke np.outer. > > Just do 'outer = np.multiply.outer' and leave the np namespace alone :-) > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Fri Apr 17 07:38:05 2015 From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier) Date: Fri, 17 Apr 2015 13:38:05 +0200 Subject: [Numpy-discussion] EuroScipy 2015 : Call for talks, posters and tutorials [Reminder] Message-ID: <1D3622CF-21F1-4603-A60D-02BEBA818323@inria.fr> [Apology for cross-posting] Dear all, EuroScipy 2015, the annual conference on Python in science will take place in Cambridge, UK on 26-30 August 2015. The conference features two days of tutorials followed by two days of scientific talks & posters and an extra day dedicated to developer sprints. It is the major event in Europe in the field of technical/scientific computing within the Python ecosystem. Data scientists, analysts, quants, PhD's, scientists and students from more than 20 countries attended the conference last year. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. 
Submissions for posters, talks & tutorials (beginner and advanced) are welcome on our website at http://www.euroscipy.org/2015/ Sprint proposals should be addressed directly to the organisation at euroscipy-org at python.org Important dates Mar 24, 2015 Call for talks, posters & tutorials Apr 30, 2015 Talk and tutorials submission deadline May 1, 2015 Registration opens May 30, 2015 Final program announced Jun 15, 2015 Early-bird registration ends Aug 26-27, 2015 Tutorials Aug 28-29, 2015 Main conference Aug 30, 2015 Sprints We look forward to an exciting conference and hope to see you in Cambridge The EuroSciPy 2015 Team - http://www.euroscipy.org/2015/ From sebastian at sipsolutions.net Fri Apr 17 10:07:49 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 17 Apr 2015 16:07:49 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> Message-ID: <1429279669.3440.2.camel@sipsolutions.net> On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > Hi, > > > So, how about a slight modification of your proposal? > > 1) Raise deprecation warning for np.outer for non 1D arrays for a few > versions, with depraction in favor of np.multiply.outer, then > 2) Raise error for np.outer on non 1D arrays > I think that was Neil's proposal a bit earlier, too. +1 for it in any case, since at least for the moment I doubt outer is used a lot for non 1-d arrays. Possible step 3) make it work on higher dims after a long period. - Sebastian > Best, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Fri Apr 17 10:47:47 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 10:47:47 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429279669.3440.2.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg wrote: > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> Hi, >> > >> >> So, how about a slight modification of your proposal? >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few >> versions, with depraction in favor of np.multiply.outer, then >> 2) Raise error for np.outer on non 1D arrays >> > > I think that was Neil's proposal a bit earlier, too. +1 for it in any > case, since at least for the moment I doubt outer is used a lot for non > 1-d arrays. Possible step 3) make it work on higher dims after a long > period. sounds ok to me Some random comments of what I remember or guess in terms of usage I think there are at most very few np.outer usages with 2d or higher dimension. (statsmodels has two models that switch between 2d and 1d parameterization where we don't use outer but it has similar characteristics. However, we need to control the ravel order, which IIRC is Fortran) The current behavior of 0-D scalars in the initial post might be useful if a numpy function returns a scalar instead of a 1-D array in size=1. 
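A small illustration of that point (one reading of it): np.outer treats a 0-d scalar as a length-1 vector and still returns a 2-d result, while np.multiply.outer keeps the scalar's zero dimensions:

>>> import numpy as np
>>> v = np.array([1., 2.])
>>> np.outer(3.0, v).shape            # 0-d input becomes a length-1 vector
(1, 2)
>>> np.multiply.outer(3.0, v).shape   # 0-d input keeps its zero dimensions
(2,)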
np.diag which is a common case, doesn't return a scalar (in my version of numpy). I don't know any use case where I would ever want to have the 2d behavior of np.multiply.outer. I guess we will or would have applications for outer along an axis, for example if x.shape = (100, 10), then we have x[:,None, :] * x[:, :, None] (I guess) Something like this shows up reasonably often in econometrics as "Outer Product". However in most cases we can avoid constructing this matrix and get the final results in a more memory efficient or faster way. (example an array of covariance matrices) Josef > > - Sebastian > > >> Best, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Apr 17 10:59:58 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 17 Apr 2015 16:59:58 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: <1429282798.3167.1.camel@sipsolutions.net> On Fr, 2015-04-17 at 10:47 -0400, josef.pktd at gmail.com wrote: > On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > wrote: > > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> Hi, > >> > > > >> > >> So, how about a slight modification of your proposal? > >> > >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few > >> versions, with depraction in favor of np.multiply.outer, then > >> 2) Raise error for np.outer on non 1D arrays > >> > > > > I think that was Neil's proposal a bit earlier, too. +1 for it in any > > case, since at least for the moment I doubt outer is used a lot for non > > 1-d arrays. Possible step 3) make it work on higher dims after a long > > period. > > sounds ok to me > > Some random comments of what I remember or guess in terms of usage > > I think there are at most very few np.outer usages with 2d or higher dimension. > (statsmodels has two models that switch between 2d and 1d > parameterization where we don't use outer but it has similar > characteristics. However, we need to control the ravel order, which > IIRC is Fortran) > > The current behavior of 0-D scalars in the initial post might be > useful if a numpy function returns a scalar instead of a 1-D array in > size=1. np.diag which is a common case, doesn't return a scalar (in my > version of numpy). > > I don't know any use case where I would ever want to have the 2d > behavior of np.multiply.outer. > I guess we will or would have applications for outer along an axis, > for example if x.shape = (100, 10), then we have > x[:,None, :] * x[:, :, None] (I guess) > Something like this shows up reasonably often in econometrics as > "Outer Product". However in most cases we can avoid constructing this > matrix and get the final results in a more memory efficient or faster > way. > (example an array of covariance matrices) > So basically outer product of stacked vectors (fitting basically into how np.linalg functions now work). I think that might be a good idea, but even then we first need to do the deprecation and it would be a long term project. 
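For concreteness, a minimal sketch of the "outer along an axis" / stacked-vectors idea mentioned above, written with plain broadcasting; the values and shapes are only for illustration:

>>> import numpy as np
>>> x = np.arange(12.).reshape(4, 3)          # 4 stacked length-3 vectors
>>> stacked = x[:, :, None] * x[:, None, :]   # one outer product per row
>>> stacked.shape
(4, 3, 3)
>>> np.allclose(stacked[0], np.outer(x[0], x[0]))
True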
Or you add np.linalg.outer or such sooner and in the longer run it will be an alias to that instead of np.multiple.outer. > Josef > > > > > > > > - Sebastian > > > > > >> Best, > >> > >> Matthew > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Fri Apr 17 11:11:32 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 11:11:32 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429282798.3167.1.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429282798.3167.1.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg wrote: > On Fr, 2015-04-17 at 10:47 -0400, josef.pktd at gmail.com wrote: >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg >> wrote: >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> >> Hi, >> >> >> > >> >> >> >> So, how about a slight modification of your proposal? >> >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few >> >> versions, with depraction in favor of np.multiply.outer, then >> >> 2) Raise error for np.outer on non 1D arrays >> >> >> > >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any >> > case, since at least for the moment I doubt outer is used a lot for non >> > 1-d arrays. Possible step 3) make it work on higher dims after a long >> > period. >> >> sounds ok to me >> >> Some random comments of what I remember or guess in terms of usage >> >> I think there are at most very few np.outer usages with 2d or higher dimension. >> (statsmodels has two models that switch between 2d and 1d >> parameterization where we don't use outer but it has similar >> characteristics. However, we need to control the ravel order, which >> IIRC is Fortran) >> >> The current behavior of 0-D scalars in the initial post might be >> useful if a numpy function returns a scalar instead of a 1-D array in >> size=1. np.diag which is a common case, doesn't return a scalar (in my >> version of numpy). >> >> I don't know any use case where I would ever want to have the 2d >> behavior of np.multiply.outer. >> I guess we will or would have applications for outer along an axis, >> for example if x.shape = (100, 10), then we have >> x[:,None, :] * x[:, :, None] (I guess) >> Something like this shows up reasonably often in econometrics as >> "Outer Product". However in most cases we can avoid constructing this >> matrix and get the final results in a more memory efficient or faster >> way. >> (example an array of covariance matrices) >> > > So basically outer product of stacked vectors (fitting basically into > how np.linalg functions now work). 
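Purely as a sketch of what such a stacked-vector outer could compute, and not an existing numpy function, something along these lines reproduces np.outer on each pair of trailing vectors while broadcasting any leading axes:

>>> import numpy as np
>>> def stacked_outer(a, b):
...     # hypothetical helper, not a numpy API: outer product over the last
...     # axis of a and b, with leading (stacked) axes broadcast together
...     a = np.asarray(a)
...     b = np.asarray(b)
...     return a[..., :, None] * b[..., None, :]
...
>>> a = np.arange(6.).reshape(2, 3)
>>> b = np.arange(4.).reshape(2, 2)
>>> stacked_outer(a, b).shape
(2, 3, 2)
>>> np.allclose(stacked_outer(a, b)[1], np.outer(a[1], b[1]))
True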
I think that might be a good idea, > but even then we first need to do the deprecation and it would be a long > term project. Or you add np.linalg.outer or such sooner and in the > longer run it will be an alias to that instead of np.multiple.outer. Essentially yes, but I don't have an opinion about location or implementation in numpy, nor do I know enough. I always considered np.outer conceptually as belonging to linalg that provides a more convenient interface than np.dot if both arrays are 1-D. (no need to add extra axis and transpose) Josef > > >> Josef >> >> >> >> >> > >> > - Sebastian >> > >> > >> >> Best, >> >> >> >> Matthew >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mistersheik at gmail.com Fri Apr 17 11:22:41 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 17 Apr 2015 11:22:41 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 10:47 AM, wrote: > On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > wrote: > > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> Hi, > >> > > > >> > >> So, how about a slight modification of your proposal? > >> > >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few > >> versions, with depraction in favor of np.multiply.outer, then > >> 2) Raise error for np.outer on non 1D arrays > >> > > > > I think that was Neil's proposal a bit earlier, too. +1 for it in any > > case, since at least for the moment I doubt outer is used a lot for non > > 1-d arrays. Possible step 3) make it work on higher dims after a long > > period. > > sounds ok to me > > Some random comments of what I remember or guess in terms of usage > > I think there are at most very few np.outer usages with 2d or higher > dimension. > (statsmodels has two models that switch between 2d and 1d > parameterization where we don't use outer but it has similar > characteristics. However, we need to control the ravel order, which > IIRC is Fortran) > > The current behavior of 0-D scalars in the initial post might be > useful if a numpy function returns a scalar instead of a 1-D array in > size=1. np.diag which is a common case, doesn't return a scalar (in my > version of numpy). > > I don't know any use case where I would ever want to have the 2d > behavior of np.multiply.outer. > My use case is pretty simple. Given an input vector x, and a weight matrix W, and a model y=Wx, I calculate the gradient of the loss L with respect W. It is the outer product of x with the vector of gradients dL/dy. So the code is simply: W -= outer(x, dL_by_dy) Sometimes, I have some x_indices and y_indices. 
Now I want to do: W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) Unfortunately, if x_indices or y_indices are "int" or slice in some way that removes a dimension, the left side will have fewer dimensions than the right. np.multipy.outer does the right thing without the ugly cases: if isinstance(x_indices, int): ? # ugly hacks follow. I guess we will or would have applications for outer along an axis, > for example if x.shape = (100, 10), then we have > x[:,None, :] * x[:, :, None] (I guess) > Something like this shows up reasonably often in econometrics as > "Outer Product". However in most cases we can avoid constructing this > matrix and get the final results in a more memory efficient or faster > way. > (example an array of covariance matrices) > Not sure I see this. outer(a, b) should return something that has shape: (a.shape + b.shape). If you're doing it "along an axis", you mean you're reshuffling the resulting shape vector? > > Josef > > > > > > > > - Sebastian > > > > > >> Best, > >> > >> Matthew > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Apr 17 11:30:03 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 17 Apr 2015 11:30:03 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429282798.3167.1.camel@sipsolutions.net> Message-ID: This relationship between outer an dot only holds for vectors. For tensors, and other kinds of vector spaces, I'm not sure if outer products and dot products have anything to do with each other. On Fri, Apr 17, 2015 at 11:11 AM, wrote: > On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg > wrote: > > On Fr, 2015-04-17 at 10:47 -0400, josef.pktd at gmail.com wrote: > >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > >> wrote: > >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> >> Hi, > >> >> > >> > > >> >> > >> >> So, how about a slight modification of your proposal? > >> >> > >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few > >> >> versions, with depraction in favor of np.multiply.outer, then > >> >> 2) Raise error for np.outer on non 1D arrays > >> >> > >> > > >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any > >> > case, since at least for the moment I doubt outer is used a lot for > non > >> > 1-d arrays. Possible step 3) make it work on higher dims after a long > >> > period. > >> > >> sounds ok to me > >> > >> Some random comments of what I remember or guess in terms of usage > >> > >> I think there are at most very few np.outer usages with 2d or higher > dimension. > >> (statsmodels has two models that switch between 2d and 1d > >> parameterization where we don't use outer but it has similar > >> characteristics. 
However, we need to control the ravel order, which > >> IIRC is Fortran) > >> > >> The current behavior of 0-D scalars in the initial post might be > >> useful if a numpy function returns a scalar instead of a 1-D array in > >> size=1. np.diag which is a common case, doesn't return a scalar (in my > >> version of numpy). > >> > >> I don't know any use case where I would ever want to have the 2d > >> behavior of np.multiply.outer. > >> I guess we will or would have applications for outer along an axis, > >> for example if x.shape = (100, 10), then we have > >> x[:,None, :] * x[:, :, None] (I guess) > >> Something like this shows up reasonably often in econometrics as > >> "Outer Product". However in most cases we can avoid constructing this > >> matrix and get the final results in a more memory efficient or faster > >> way. > >> (example an array of covariance matrices) > >> > > > > So basically outer product of stacked vectors (fitting basically into > > how np.linalg functions now work). I think that might be a good idea, > > but even then we first need to do the deprecation and it would be a long > > term project. Or you add np.linalg.outer or such sooner and in the > > longer run it will be an alias to that instead of np.multiple.outer. > > > Essentially yes, but I don't have an opinion about location or > implementation in numpy, nor do I know enough. > > I always considered np.outer conceptually as belonging to linalg that > provides a more convenient interface than np.dot if both arrays are > 1-D. (no need to add extra axis and transpose) > > Josef > > > > > > >> Josef > >> > >> > >> > >> > >> > > >> > - Sebastian > >> > > >> > > >> >> Best, > >> >> > >> >> Matthew > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri Apr 17 11:59:21 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 11:59:21 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429282798.3167.1.camel@sipsolutions.net> Message-ID: Neil, please reply inline or at the bottom which is customary for numpy scipy related mailing lists. It's sometimes difficult to figure out what the context of your reply is. (and the context is all over the place) On Fri, Apr 17, 2015 at 11:30 AM, Neil Girdhar wrote: > This relationship between outer an dot only holds for vectors. 
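The vector case referred to here, spelled out; for 1-d a and b the familiar spellings agree:

>>> import numpy as np
>>> a = np.array([1., 2., 3.])
>>> b = np.array([4., 5.])
>>> np.allclose(np.outer(a, b), np.dot(a[:, None], b[None, :]))
True
>>> np.allclose(np.outer(a, b), a[:, None] * b[None, :])
True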
For tensors, > and other kinds of vector spaces, I'm not sure if outer products and dot > products have anything to do with each other. That may be the case, and I never figured out what to do with dot in more than 2 dimensions. 90% (a guess) of what I work on or see is in a 2-D or vectorized 3-D world with 2-D linalg, or can be reduced to it. (general tensor algebra creates endless loops in my brain :) Josef > > On Fri, Apr 17, 2015 at 11:11 AM, wrote: >> >> On Fri, Apr 17, 2015 at 10:59 AM, Sebastian Berg >> wrote: >> > On Fr, 2015-04-17 at 10:47 -0400, josef.pktd at gmail.com wrote: >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg >> >> wrote: >> >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> >> >> Hi, >> >> >> >> >> > >> >> >> >> >> >> So, how about a slight modification of your proposal? >> >> >> >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a >> >> >> few >> >> >> versions, with depraction in favor of np.multiply.outer, then >> >> >> 2) Raise error for np.outer on non 1D arrays >> >> >> >> >> > >> >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any >> >> > case, since at least for the moment I doubt outer is used a lot for >> >> > non >> >> > 1-d arrays. Possible step 3) make it work on higher dims after a long >> >> > period. >> >> >> >> sounds ok to me >> >> >> >> Some random comments of what I remember or guess in terms of usage >> >> >> >> I think there are at most very few np.outer usages with 2d or higher >> >> dimension. >> >> (statsmodels has two models that switch between 2d and 1d >> >> parameterization where we don't use outer but it has similar >> >> characteristics. However, we need to control the ravel order, which >> >> IIRC is Fortran) >> >> >> >> The current behavior of 0-D scalars in the initial post might be >> >> useful if a numpy function returns a scalar instead of a 1-D array in >> >> size=1. np.diag which is a common case, doesn't return a scalar (in my >> >> version of numpy). >> >> >> >> I don't know any use case where I would ever want to have the 2d >> >> behavior of np.multiply.outer. >> >> I guess we will or would have applications for outer along an axis, >> >> for example if x.shape = (100, 10), then we have >> >> x[:,None, :] * x[:, :, None] (I guess) >> >> Something like this shows up reasonably often in econometrics as >> >> "Outer Product". However in most cases we can avoid constructing this >> >> matrix and get the final results in a more memory efficient or faster >> >> way. >> >> (example an array of covariance matrices) >> >> >> > >> > So basically outer product of stacked vectors (fitting basically into >> > how np.linalg functions now work). I think that might be a good idea, >> > but even then we first need to do the deprecation and it would be a long >> > term project. Or you add np.linalg.outer or such sooner and in the >> > longer run it will be an alias to that instead of np.multiple.outer. >> >> >> Essentially yes, but I don't have an opinion about location or >> implementation in numpy, nor do I know enough. >> >> I always considered np.outer conceptually as belonging to linalg that >> provides a more convenient interface than np.dot if both arrays are >> 1-D. 
(no need to add extra axis and transpose) >> >> Josef >> >> > >> > >> >> Josef >> >> >> >> >> >> >> >> >> >> > >> >> > - Sebastian >> >> > >> >> > >> >> >> Best, >> >> >> >> >> >> Matthew >> >> >> _______________________________________________ >> >> >> NumPy-Discussion mailing list >> >> >> NumPy-Discussion at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> > >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Fri Apr 17 12:09:27 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 12:09:27 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar wrote: > > > On Fri, Apr 17, 2015 at 10:47 AM, wrote: >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg >> wrote: >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> >> Hi, >> >> >> > >> >> >> >> So, how about a slight modification of your proposal? >> >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few >> >> versions, with depraction in favor of np.multiply.outer, then >> >> 2) Raise error for np.outer on non 1D arrays >> >> >> > >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any >> > case, since at least for the moment I doubt outer is used a lot for non >> > 1-d arrays. Possible step 3) make it work on higher dims after a long >> > period. >> >> sounds ok to me >> >> Some random comments of what I remember or guess in terms of usage >> >> I think there are at most very few np.outer usages with 2d or higher >> dimension. >> (statsmodels has two models that switch between 2d and 1d >> parameterization where we don't use outer but it has similar >> characteristics. However, we need to control the ravel order, which >> IIRC is Fortran) >> >> The current behavior of 0-D scalars in the initial post might be >> useful if a numpy function returns a scalar instead of a 1-D array in >> size=1. np.diag which is a common case, doesn't return a scalar (in my >> version of numpy). >> >> I don't know any use case where I would ever want to have the 2d >> behavior of np.multiply.outer. > I only understand part of your example, but it looks similar to what we are doing in statsmodels. > > My use case is pretty simple. Given an input vector x, and a weight matrix > W, and a model y=Wx, I calculate the gradient of the loss L with respect W. 
> It is the outer product of x with the vector of gradients dL/dy. So the > code is simply: > > W -= outer(x, dL_by_dy) if you sum/subtract over all the values, isn't this the same as np.dot(x, dL_by_dy) > > Sometimes, I have some x_indices and y_indices. Now I want to do: > > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) > > Unfortunately, if x_indices or y_indices are "int" or slice in some way that > removes a dimension, the left side will have fewer dimensions than the > right. np.multipy.outer does the right thing without the ugly cases: > > if isinstance(x_indices, int): ? # ugly hacks follow. My usual hacks are either to use np.atleast_1d or np.atleast_1d or np.squeeze if there is shape mismatch in some cases. > >> I guess we will or would have applications for outer along an axis, >> for example if x.shape = (100, 10), then we have >> x[:,None, :] * x[:, :, None] (I guess) >> Something like this shows up reasonably often in econometrics as >> "Outer Product". However in most cases we can avoid constructing this >> matrix and get the final results in a more memory efficient or faster >> way. >> (example an array of covariance matrices) > > > Not sure I see this. outer(a, b) should return something that has shape: > (a.shape + b.shape). If you're doing it "along an axis", you mean you're > reshuffling the resulting shape vector? No I'm not reshaping the full tensor product. It's a vectorized version of looping over independent outer products np.array([outer(xi, yi) for xi,yi in zip(x, y)]) (which I would never use with outer) but I have code that works similar for a reduce (or reduce_at) loop over this. Josef >> >> >> Josef >> >> >> >> >> > >> > - Sebastian >> > >> > >> >> Best, >> >> >> >> Matthew >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mistersheik at gmail.com Fri Apr 17 12:16:01 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 17 Apr 2015 12:16:01 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 12:09 PM, wrote: > On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar > wrote: > > > > > > On Fri, Apr 17, 2015 at 10:47 AM, wrote: > >> > >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > >> wrote: > >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> >> Hi, > >> >> > >> > > >> >> > >> >> So, how about a slight modification of your proposal? > >> >> > >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few > >> >> versions, with depraction in favor of np.multiply.outer, then > >> >> 2) Raise error for np.outer on non 1D arrays > >> >> > >> > > >> > I think that was Neil's proposal a bit earlier, too. 
+1 for it in any > >> > case, since at least for the moment I doubt outer is used a lot for > non > >> > 1-d arrays. Possible step 3) make it work on higher dims after a long > >> > period. > >> > >> sounds ok to me > >> > >> Some random comments of what I remember or guess in terms of usage > >> > >> I think there are at most very few np.outer usages with 2d or higher > >> dimension. > >> (statsmodels has two models that switch between 2d and 1d > >> parameterization where we don't use outer but it has similar > >> characteristics. However, we need to control the ravel order, which > >> IIRC is Fortran) > >> > >> The current behavior of 0-D scalars in the initial post might be > >> useful if a numpy function returns a scalar instead of a 1-D array in > >> size=1. np.diag which is a common case, doesn't return a scalar (in my > >> version of numpy). > >> > >> I don't know any use case where I would ever want to have the 2d > >> behavior of np.multiply.outer. > > > > I only understand part of your example, but it looks similar to what > we are doing in statsmodels. > > > > > My use case is pretty simple. Given an input vector x, and a weight > matrix > > W, and a model y=Wx, I calculate the gradient of the loss L with respect > W. > > It is the outer product of x with the vector of gradients dL/dy. So the > > code is simply: > > > > W -= outer(x, dL_by_dy) > > if you sum/subtract over all the values, isn't this the same as > np.dot(x, dL_by_dy) > > What? Matrix subtraction is element-wise: In [1]: x = np.array([2,3,4]) In [2]: dL_by_dy = np.array([7,9]) In [5]: W = np.zeros((3, 2)) In [6]: W -= np.outer(x, dL_by_dy) In [7]: W Out[7]: array([[-14., -18.], [-21., -27.], [-28., -36.]]) > > > Sometimes, I have some x_indices and y_indices. Now I want to do: > > > > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) > > > > Unfortunately, if x_indices or y_indices are "int" or slice in some way > that > > removes a dimension, the left side will have fewer dimensions than the > > right. np.multipy.outer does the right thing without the ugly cases: > > > > if isinstance(x_indices, int): ? # ugly hacks follow. > > My usual hacks are either to use np.atleast_1d or np.atleast_1d or > np.squeeze if there is shape mismatch in some cases. > Yes, but in this case, the left side is the problem, which has too few dimensions. So atleast_1d doesn't work. I was conditionally squeezing, but that is extremely ugly. Especially if you're conditionally squeezing based on both x_indices and y_indices. > > > > >> I guess we will or would have applications for outer along an axis, > >> for example if x.shape = (100, 10), then we have > >> x[:,None, :] * x[:, :, None] (I guess) > >> Something like this shows up reasonably often in econometrics as > >> "Outer Product". However in most cases we can avoid constructing this > >> matrix and get the final results in a more memory efficient or faster > >> way. > >> (example an array of covariance matrices) > > > > > > Not sure I see this. outer(a, b) should return something that has shape: > > (a.shape + b.shape). If you're doing it "along an axis", you mean you're > > reshuffling the resulting shape vector? > > No I'm not reshaping the full tensor product. > > It's a vectorized version of looping over independent outer products > > np.array([outer(xi, yi) for xi,yi in zip(x, y)]) > (which I would never use with outer) > > but I have code that works similar for a reduce (or reduce_at) loop over > this. 
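A minimal reconstruction of the shape problem described above; the index values are made up for illustration:

>>> import numpy as np
>>> W = np.zeros((3, 2))
>>> x = np.array([2., 3., 4.])
>>> dL_by_dy = np.array([7., 9.])
>>> np.outer(x[0], dL_by_dy[1]).shape            # np.outer always returns a 2-d result
(1, 1)
>>> np.multiply.outer(x[0], dL_by_dy[1]).shape   # multiply.outer mirrors the 0-d inputs
()
>>> W[0, 1] -= np.multiply.outer(x[0], dL_by_dy[1])
>>> W[0, 1]
-18.0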
> > Josef > > > >> > >> > >> Josef > >> > >> > >> > >> > >> > > >> > - Sebastian > >> > > >> > > >> >> Best, > >> >> > >> >> Matthew > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Fri Apr 17 12:19:21 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Fri, 17 Apr 2015 12:19:21 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 12:09 PM, wrote: > On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar > wrote: > > > > > > On Fri, Apr 17, 2015 at 10:47 AM, wrote: > >> > >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > >> wrote: > >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> >> Hi, > >> >> > >> > > >> >> > >> >> So, how about a slight modification of your proposal? > >> >> > >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a few > >> >> versions, with depraction in favor of np.multiply.outer, then > >> >> 2) Raise error for np.outer on non 1D arrays > >> >> > >> > > >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any > >> > case, since at least for the moment I doubt outer is used a lot for > non > >> > 1-d arrays. Possible step 3) make it work on higher dims after a long > >> > period. > >> > >> sounds ok to me > >> > >> Some random comments of what I remember or guess in terms of usage > >> > >> I think there are at most very few np.outer usages with 2d or higher > >> dimension. > >> (statsmodels has two models that switch between 2d and 1d > >> parameterization where we don't use outer but it has similar > >> characteristics. However, we need to control the ravel order, which > >> IIRC is Fortran) > >> > >> The current behavior of 0-D scalars in the initial post might be > >> useful if a numpy function returns a scalar instead of a 1-D array in > >> size=1. np.diag which is a common case, doesn't return a scalar (in my > >> version of numpy). > >> > >> I don't know any use case where I would ever want to have the 2d > >> behavior of np.multiply.outer. > > > > I only understand part of your example, but it looks similar to what > we are doing in statsmodels. > > > > > My use case is pretty simple. Given an input vector x, and a weight > matrix > > W, and a model y=Wx, I calculate the gradient of the loss L with respect > W. > > It is the outer product of x with the vector of gradients dL/dy. 
So the > > code is simply: > > > > W -= outer(x, dL_by_dy) > > if you sum/subtract over all the values, isn't this the same as > np.dot(x, dL_by_dy) > > > > > > Sometimes, I have some x_indices and y_indices. Now I want to do: > > > > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) > > > > Unfortunately, if x_indices or y_indices are "int" or slice in some way > that > > removes a dimension, the left side will have fewer dimensions than the > > right. np.multipy.outer does the right thing without the ugly cases: > > > > if isinstance(x_indices, int): ? # ugly hacks follow. > > My usual hacks are either to use np.atleast_1d or np.atleast_1d or > np.squeeze if there is shape mismatch in some cases. > > > > >> I guess we will or would have applications for outer along an axis, > >> for example if x.shape = (100, 10), then we have > >> x[:,None, :] * x[:, :, None] (I guess) > >> Something like this shows up reasonably often in econometrics as > >> "Outer Product". However in most cases we can avoid constructing this > >> matrix and get the final results in a more memory efficient or faster > >> way. > >> (example an array of covariance matrices) > > > > > > Not sure I see this. outer(a, b) should return something that has shape: > > (a.shape + b.shape). If you're doing it "along an axis", you mean you're > > reshuffling the resulting shape vector? > > No I'm not reshaping the full tensor product. > > It's a vectorized version of looping over independent outer products > > np.array([outer(xi, yi) for xi,yi in zip(x, y)]) > (which I would never use with outer) > > but I have code that works similar for a reduce (or reduce_at) loop over > this. > Hmmm? I see what your'e writing. This doesn't really have a geometrical meaning as far as I can tell. You're interpreting the first index of x, y, and your result, as if it were a list ? as if x and y are lists of vectors, and you want a list of matrices. That really should be written as a loop in my opinion. > > Josef > > > >> > >> > >> Josef > >> > >> > >> > >> > >> > > >> > - Sebastian > >> > > >> > > >> >> Best, > >> >> > >> >> Matthew > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Fri Apr 17 12:40:09 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 12:40:09 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar wrote: > > > On Fri, Apr 17, 2015 at 12:09 PM, wrote: >> >> On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar >> wrote: >> > >> > >> > On Fri, Apr 17, 2015 at 10:47 AM, wrote: >> >> >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg >> >> wrote: >> >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> >> >> Hi, >> >> >> >> >> > >> >> >> >> >> >> So, how about a slight modification of your proposal? >> >> >> >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a >> >> >> few >> >> >> versions, with depraction in favor of np.multiply.outer, then >> >> >> 2) Raise error for np.outer on non 1D arrays >> >> >> >> >> > >> >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any >> >> > case, since at least for the moment I doubt outer is used a lot for >> >> > non >> >> > 1-d arrays. Possible step 3) make it work on higher dims after a long >> >> > period. >> >> >> >> sounds ok to me >> >> >> >> Some random comments of what I remember or guess in terms of usage >> >> >> >> I think there are at most very few np.outer usages with 2d or higher >> >> dimension. >> >> (statsmodels has two models that switch between 2d and 1d >> >> parameterization where we don't use outer but it has similar >> >> characteristics. However, we need to control the ravel order, which >> >> IIRC is Fortran) >> >> >> >> The current behavior of 0-D scalars in the initial post might be >> >> useful if a numpy function returns a scalar instead of a 1-D array in >> >> size=1. np.diag which is a common case, doesn't return a scalar (in my >> >> version of numpy). >> >> >> >> I don't know any use case where I would ever want to have the 2d >> >> behavior of np.multiply.outer. >> > >> >> I only understand part of your example, but it looks similar to what >> we are doing in statsmodels. >> >> > >> > My use case is pretty simple. Given an input vector x, and a weight >> > matrix >> > W, and a model y=Wx, I calculate the gradient of the loss L with respect >> > W. >> > It is the outer product of x with the vector of gradients dL/dy. So the >> > code is simply: >> > >> > W -= outer(x, dL_by_dy) >> >> if you sum/subtract over all the values, isn't this the same as >> np.dot(x, dL_by_dy) >> > > What? Matrix subtraction is element-wise: > > In [1]: x = np.array([2,3,4]) > > In [2]: dL_by_dy = np.array([7,9]) > > In [5]: W = np.zeros((3, 2)) > > In [6]: W -= np.outer(x, dL_by_dy) > > In [7]: W > Out[7]: > array([[-14., -18.], > [-21., -27.], > [-28., -36.]]) Ok, different use case mine are more like variations on the following >>> a1 = np.arange(18).reshape(6,3) >>> a2 = np.arange(12).reshape(6, 2) >>> index = [1, 2, 5] text book version >>> np.sum([np.outer(a1[i], a2[i]) for i in index], 0) array([[180, 204], [196, 223], [212, 242]]) simpler >>> np.dot(a1[index].T, a2[index]) array([[180, 204], [196, 223], [212, 242]]) > >> > >> > Sometimes, I have some x_indices and y_indices. 
Now I want to do: >> > >> > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) >> > >> > Unfortunately, if x_indices or y_indices are "int" or slice in some way >> > that >> > removes a dimension, the left side will have fewer dimensions than the >> > right. np.multipy.outer does the right thing without the ugly cases: >> > >> > if isinstance(x_indices, int): ? # ugly hacks follow. >> >> My usual hacks are either to use np.atleast_1d or np.atleast_1d or >> np.squeeze if there is shape mismatch in some cases. > > > Yes, but in this case, the left side is the problem, which has too few > dimensions. So atleast_1d doesn't work. I was conditionally squeezing, but > that is extremely ugly. Especially if you're conditionally squeezing based > on both x_indices and y_indices. I don't remember if I ever used something like this >>> a1[0, 1] 1 >>> a1[np.atleast_1d(0), np.atleast_1d(1)] array([1]) >>> a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]] >>> a1[0, 1] = [[100]] Traceback (most recent call last): File "", line 1, in a1[0, 1] = [[100]] ValueError: setting an array element with a sequence. Josef > >> >> >> > >> >> I guess we will or would have applications for outer along an axis, >> >> for example if x.shape = (100, 10), then we have >> >> x[:,None, :] * x[:, :, None] (I guess) >> >> Something like this shows up reasonably often in econometrics as >> >> "Outer Product". However in most cases we can avoid constructing this >> >> matrix and get the final results in a more memory efficient or faster >> >> way. >> >> (example an array of covariance matrices) >> > >> > >> > Not sure I see this. outer(a, b) should return something that has >> > shape: >> > (a.shape + b.shape). If you're doing it "along an axis", you mean >> > you're >> > reshuffling the resulting shape vector? >> >> No I'm not reshaping the full tensor product. >> >> It's a vectorized version of looping over independent outer products >> >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) >> (which I would never use with outer) >> >> but I have code that works similar for a reduce (or reduce_at) loop over >> this. 
>> >> Josef >> >> >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> > >> >> > - Sebastian >> >> > >> >> > >> >> >> Best, >> >> >> >> >> >> Matthew >> >> >> _______________________________________________ >> >> >> NumPy-Discussion mailing list >> >> >> NumPy-Discussion at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> > >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Apr 17 14:56:26 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 17 Apr 2015 20:56:26 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> Message-ID: <1429296986.12410.0.camel@sipsolutions.net> On Fr, 2015-04-17 at 12:40 -0400, josef.pktd at gmail.com wrote: > On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar wrote: > > > > > > On Fri, Apr 17, 2015 at 12:09 PM, wrote: > >> > >> On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar > >> wrote: > >> > > >> > > >> > On Fri, Apr 17, 2015 at 10:47 AM, wrote: > >> >> > >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > >> >> wrote: > >> >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > >> >> >> Hi, > >> >> >> > >> >> > > >> >> >> > >> >> >> So, how about a slight modification of your proposal? > >> >> >> > >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a > >> >> >> few > >> >> >> versions, with depraction in favor of np.multiply.outer, then > >> >> >> 2) Raise error for np.outer on non 1D arrays > >> >> >> > >> >> > > >> >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any > >> >> > case, since at least for the moment I doubt outer is used a lot for > >> >> > non > >> >> > 1-d arrays. Possible step 3) make it work on higher dims after a long > >> >> > period. > >> >> > >> >> sounds ok to me > >> >> > >> >> Some random comments of what I remember or guess in terms of usage > >> >> > >> >> I think there are at most very few np.outer usages with 2d or higher > >> >> dimension. > >> >> (statsmodels has two models that switch between 2d and 1d > >> >> parameterization where we don't use outer but it has similar > >> >> characteristics. However, we need to control the ravel order, which > >> >> IIRC is Fortran) > >> >> > >> >> The current behavior of 0-D scalars in the initial post might be > >> >> useful if a numpy function returns a scalar instead of a 1-D array in > >> >> size=1. np.diag which is a common case, doesn't return a scalar (in my > >> >> version of numpy). 
> >> >> > >> >> I don't know any use case where I would ever want to have the 2d > >> >> behavior of np.multiply.outer. > >> > > >> > >> I only understand part of your example, but it looks similar to what > >> we are doing in statsmodels. > >> > >> > > >> > My use case is pretty simple. Given an input vector x, and a weight > >> > matrix > >> > W, and a model y=Wx, I calculate the gradient of the loss L with respect > >> > W. > >> > It is the outer product of x with the vector of gradients dL/dy. So the > >> > code is simply: > >> > > >> > W -= outer(x, dL_by_dy) > >> > >> if you sum/subtract over all the values, isn't this the same as > >> np.dot(x, dL_by_dy) > >> > > > > What? Matrix subtraction is element-wise: > > > > In [1]: x = np.array([2,3,4]) > > > > In [2]: dL_by_dy = np.array([7,9]) > > > > In [5]: W = np.zeros((3, 2)) > > > > In [6]: W -= np.outer(x, dL_by_dy) > > > > In [7]: W > > Out[7]: > > array([[-14., -18.], > > [-21., -27.], > > [-28., -36.]]) > > > Ok, different use case > > mine are more like variations on the following > > >>> a1 = np.arange(18).reshape(6,3) > >>> a2 = np.arange(12).reshape(6, 2) > >>> index = [1, 2, 5] > > > text book version > >>> np.sum([np.outer(a1[i], a2[i]) for i in index], 0) > array([[180, 204], > [196, 223], > [212, 242]]) > > simpler > >>> np.dot(a1[index].T, a2[index]) > array([[180, 204], > [196, 223], > [212, 242]]) > > > > > >> > > >> > Sometimes, I have some x_indices and y_indices. Now I want to do: > >> > > >> > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) > >> > > >> > Unfortunately, if x_indices or y_indices are "int" or slice in some way > >> > that > >> > removes a dimension, the left side will have fewer dimensions than the > >> > right. np.multipy.outer does the right thing without the ugly cases: > >> > > >> > if isinstance(x_indices, int): ? # ugly hacks follow. > >> > >> My usual hacks are either to use np.atleast_1d or np.atleast_1d or > >> np.squeeze if there is shape mismatch in some cases. > > > > > > Yes, but in this case, the left side is the problem, which has too few > > dimensions. So atleast_1d doesn't work. I was conditionally squeezing, but > > that is extremely ugly. Especially if you're conditionally squeezing based > > on both x_indices and y_indices. > > I don't remember if I ever used something like this > > >>> a1[0, 1] > 1 > >>> a1[np.atleast_1d(0), np.atleast_1d(1)] > array([1]) > > >>> a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]] > > >>> a1[0, 1] = [[100]] > Traceback (most recent call last): > File "", line 1, in > a1[0, 1] = [[100]] > ValueError: setting an array element with a sequence. > Hehe, yeah, that difference. But if you really want that, you can usually do a1[0, 1, ...] if you don't mind the ugliness. > Josef > > > > > >> > >> > >> > > >> >> I guess we will or would have applications for outer along an axis, > >> >> for example if x.shape = (100, 10), then we have > >> >> x[:,None, :] * x[:, :, None] (I guess) > >> >> Something like this shows up reasonably often in econometrics as > >> >> "Outer Product". However in most cases we can avoid constructing this > >> >> matrix and get the final results in a more memory efficient or faster > >> >> way. > >> >> (example an array of covariance matrices) > >> > > >> > > >> > Not sure I see this. outer(a, b) should return something that has > >> > shape: > >> > (a.shape + b.shape). If you're doing it "along an axis", you mean > >> > you're > >> > reshuffling the resulting shape vector? 
> >> > >> No I'm not reshaping the full tensor product. > >> > >> It's a vectorized version of looping over independent outer products > >> > >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) > >> (which I would never use with outer) > >> > >> but I have code that works similar for a reduce (or reduce_at) loop over > >> this. > >> > >> Josef > >> > >> > >> >> > >> >> > >> >> Josef > >> >> > >> >> > >> >> > >> >> > >> >> > > >> >> > - Sebastian > >> >> > > >> >> > > >> >> >> Best, > >> >> >> > >> >> >> Matthew > >> >> >> _______________________________________________ > >> >> >> NumPy-Discussion mailing list > >> >> >> NumPy-Discussion at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> >> > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > NumPy-Discussion mailing list > >> >> > NumPy-Discussion at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Fri Apr 17 15:18:16 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 15:18:16 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429296986.12410.0.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429296986.12410.0.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 2:56 PM, Sebastian Berg wrote: > On Fr, 2015-04-17 at 12:40 -0400, josef.pktd at gmail.com wrote: >> On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar wrote: >> > >> > >> > On Fri, Apr 17, 2015 at 12:09 PM, wrote: >> >> >> >> On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar >> >> wrote: >> >> > >> >> > >> >> > On Fri, Apr 17, 2015 at 10:47 AM, wrote: >> >> >> >> >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg >> >> >> wrote: >> >> >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: >> >> >> >> Hi, >> >> >> >> >> >> >> > >> >> >> >> >> >> >> >> So, how about a slight modification of your proposal? 
>> >> >> >> >> >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a >> >> >> >> few >> >> >> >> versions, with depraction in favor of np.multiply.outer, then >> >> >> >> 2) Raise error for np.outer on non 1D arrays >> >> >> >> >> >> >> > >> >> >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any >> >> >> > case, since at least for the moment I doubt outer is used a lot for >> >> >> > non >> >> >> > 1-d arrays. Possible step 3) make it work on higher dims after a long >> >> >> > period. >> >> >> >> >> >> sounds ok to me >> >> >> >> >> >> Some random comments of what I remember or guess in terms of usage >> >> >> >> >> >> I think there are at most very few np.outer usages with 2d or higher >> >> >> dimension. >> >> >> (statsmodels has two models that switch between 2d and 1d >> >> >> parameterization where we don't use outer but it has similar >> >> >> characteristics. However, we need to control the ravel order, which >> >> >> IIRC is Fortran) >> >> >> >> >> >> The current behavior of 0-D scalars in the initial post might be >> >> >> useful if a numpy function returns a scalar instead of a 1-D array in >> >> >> size=1. np.diag which is a common case, doesn't return a scalar (in my >> >> >> version of numpy). >> >> >> >> >> >> I don't know any use case where I would ever want to have the 2d >> >> >> behavior of np.multiply.outer. >> >> > >> >> >> >> I only understand part of your example, but it looks similar to what >> >> we are doing in statsmodels. >> >> >> >> > >> >> > My use case is pretty simple. Given an input vector x, and a weight >> >> > matrix >> >> > W, and a model y=Wx, I calculate the gradient of the loss L with respect >> >> > W. >> >> > It is the outer product of x with the vector of gradients dL/dy. So the >> >> > code is simply: >> >> > >> >> > W -= outer(x, dL_by_dy) >> >> >> >> if you sum/subtract over all the values, isn't this the same as >> >> np.dot(x, dL_by_dy) >> >> >> > >> > What? Matrix subtraction is element-wise: >> > >> > In [1]: x = np.array([2,3,4]) >> > >> > In [2]: dL_by_dy = np.array([7,9]) >> > >> > In [5]: W = np.zeros((3, 2)) >> > >> > In [6]: W -= np.outer(x, dL_by_dy) >> > >> > In [7]: W >> > Out[7]: >> > array([[-14., -18.], >> > [-21., -27.], >> > [-28., -36.]]) >> >> >> Ok, different use case >> >> mine are more like variations on the following >> >> >>> a1 = np.arange(18).reshape(6,3) >> >>> a2 = np.arange(12).reshape(6, 2) >> >>> index = [1, 2, 5] >> >> >> text book version >> >>> np.sum([np.outer(a1[i], a2[i]) for i in index], 0) >> array([[180, 204], >> [196, 223], >> [212, 242]]) >> >> simpler >> >>> np.dot(a1[index].T, a2[index]) >> array([[180, 204], >> [196, 223], >> [212, 242]]) >> >> >> > >> >> > >> >> > Sometimes, I have some x_indices and y_indices. Now I want to do: >> >> > >> >> > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) >> >> > >> >> > Unfortunately, if x_indices or y_indices are "int" or slice in some way >> >> > that >> >> > removes a dimension, the left side will have fewer dimensions than the >> >> > right. np.multipy.outer does the right thing without the ugly cases: >> >> > >> >> > if isinstance(x_indices, int): ? # ugly hacks follow. >> >> >> >> My usual hacks are either to use np.atleast_1d or np.atleast_1d or >> >> np.squeeze if there is shape mismatch in some cases. >> > >> > >> > Yes, but in this case, the left side is the problem, which has too few >> > dimensions. So atleast_1d doesn't work. 
I was conditionally squeezing, but >> > that is extremely ugly. Especially if you're conditionally squeezing based >> > on both x_indices and y_indices. >> >> I don't remember if I ever used something like this >> >> >>> a1[0, 1] >> 1 >> >>> a1[np.atleast_1d(0), np.atleast_1d(1)] >> array([1]) >> >> >>> a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]] >> >> >>> a1[0, 1] = [[100]] >> Traceback (most recent call last): >> File "", line 1, in >> a1[0, 1] = [[100]] >> ValueError: setting an array element with a sequence. >> > > Hehe, yeah, that difference. But if you really want that, you can > usually do a1[0, 1, ...] if you don't mind the ugliness. I'm not sure what you mean, although it sounds like a nice trick. This doesn't work for me >>> a1[0, 1, ...] = [[100]] Traceback (most recent call last): File "", line 1, in a1[0, 1, ...] = [[100]] ValueError: assignment to 0-d array >>> np.__version__ '1.9.2rc1' >>> a1[0, 1, Josef > >> Josef >> >> >> > >> >> >> >> >> >> > >> >> >> I guess we will or would have applications for outer along an axis, >> >> >> for example if x.shape = (100, 10), then we have >> >> >> x[:,None, :] * x[:, :, None] (I guess) >> >> >> Something like this shows up reasonably often in econometrics as >> >> >> "Outer Product". However in most cases we can avoid constructing this >> >> >> matrix and get the final results in a more memory efficient or faster >> >> >> way. >> >> >> (example an array of covariance matrices) >> >> > >> >> > >> >> > Not sure I see this. outer(a, b) should return something that has >> >> > shape: >> >> > (a.shape + b.shape). If you're doing it "along an axis", you mean >> >> > you're >> >> > reshuffling the resulting shape vector? >> >> >> >> No I'm not reshaping the full tensor product. >> >> >> >> It's a vectorized version of looping over independent outer products >> >> >> >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) >> >> (which I would never use with outer) >> >> >> >> but I have code that works similar for a reduce (or reduce_at) loop over >> >> this. 
>> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> > - Sebastian >> >> >> > >> >> >> > >> >> >> >> Best, >> >> >> >> >> >> >> >> Matthew >> >> >> >> _______________________________________________ >> >> >> >> NumPy-Discussion mailing list >> >> >> >> NumPy-Discussion at scipy.org >> >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > NumPy-Discussion mailing list >> >> >> > NumPy-Discussion at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> >> >> _______________________________________________ >> >> >> NumPy-Discussion mailing list >> >> >> NumPy-Discussion at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sebastian at sipsolutions.net Fri Apr 17 15:54:31 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 17 Apr 2015 21:54:31 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429296986.12410.0.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429296986.12410.0.camel@sipsolutions.net> Message-ID: <1429300471.14976.2.camel@sipsolutions.net> On Fr, 2015-04-17 at 20:56 +0200, Sebastian Berg wrote: > On Fr, 2015-04-17 at 12:40 -0400, josef.pktd at gmail.com wrote: > > On Fri, Apr 17, 2015 at 12:16 PM, Neil Girdhar wrote: > > > > > > > > > On Fri, Apr 17, 2015 at 12:09 PM, wrote: > > >> > > >> On Fri, Apr 17, 2015 at 11:22 AM, Neil Girdhar > > >> wrote: > > >> > > > >> > > > >> > On Fri, Apr 17, 2015 at 10:47 AM, wrote: > > >> >> > > >> >> On Fri, Apr 17, 2015 at 10:07 AM, Sebastian Berg > > >> >> wrote: > > >> >> > On Do, 2015-04-16 at 15:28 -0700, Matthew Brett wrote: > > >> >> >> Hi, > > >> >> >> > > >> >> > > > >> >> >> > > >> >> >> So, how about a slight modification of your proposal? > > >> >> >> > > >> >> >> 1) Raise deprecation warning for np.outer for non 1D arrays for a > > >> >> >> few > > >> >> >> versions, with depraction in favor of np.multiply.outer, then > > >> >> >> 2) Raise error for np.outer on non 1D arrays > > >> >> >> > > >> >> > > > >> >> > I think that was Neil's proposal a bit earlier, too. +1 for it in any > > >> >> > case, since at least for the moment I doubt outer is used a lot for > > >> >> > non > > >> >> > 1-d arrays. Possible step 3) make it work on higher dims after a long > > >> >> > period. 
> > >> >> > > >> >> sounds ok to me > > >> >> > > >> >> Some random comments of what I remember or guess in terms of usage > > >> >> > > >> >> I think there are at most very few np.outer usages with 2d or higher > > >> >> dimension. > > >> >> (statsmodels has two models that switch between 2d and 1d > > >> >> parameterization where we don't use outer but it has similar > > >> >> characteristics. However, we need to control the ravel order, which > > >> >> IIRC is Fortran) > > >> >> > > >> >> The current behavior of 0-D scalars in the initial post might be > > >> >> useful if a numpy function returns a scalar instead of a 1-D array in > > >> >> size=1. np.diag which is a common case, doesn't return a scalar (in my > > >> >> version of numpy). > > >> >> > > >> >> I don't know any use case where I would ever want to have the 2d > > >> >> behavior of np.multiply.outer. > > >> > > > >> > > >> I only understand part of your example, but it looks similar to what > > >> we are doing in statsmodels. > > >> > > >> > > > >> > My use case is pretty simple. Given an input vector x, and a weight > > >> > matrix > > >> > W, and a model y=Wx, I calculate the gradient of the loss L with respect > > >> > W. > > >> > It is the outer product of x with the vector of gradients dL/dy. So the > > >> > code is simply: > > >> > > > >> > W -= outer(x, dL_by_dy) > > >> > > >> if you sum/subtract over all the values, isn't this the same as > > >> np.dot(x, dL_by_dy) > > >> > > > > > > What? Matrix subtraction is element-wise: > > > > > > In [1]: x = np.array([2,3,4]) > > > > > > In [2]: dL_by_dy = np.array([7,9]) > > > > > > In [5]: W = np.zeros((3, 2)) > > > > > > In [6]: W -= np.outer(x, dL_by_dy) > > > > > > In [7]: W > > > Out[7]: > > > array([[-14., -18.], > > > [-21., -27.], > > > [-28., -36.]]) > > > > > > Ok, different use case > > > > mine are more like variations on the following > > > > >>> a1 = np.arange(18).reshape(6,3) > > >>> a2 = np.arange(12).reshape(6, 2) > > >>> index = [1, 2, 5] > > > > > > text book version > > >>> np.sum([np.outer(a1[i], a2[i]) for i in index], 0) > > array([[180, 204], > > [196, 223], > > [212, 242]]) > > > > simpler > > >>> np.dot(a1[index].T, a2[index]) > > array([[180, 204], > > [196, 223], > > [212, 242]]) > > > > > > > > > >> > > > >> > Sometimes, I have some x_indices and y_indices. Now I want to do: > > >> > > > >> > W[x_indices, y_indices] -= outer(x[x_indices], dL_by_dy[y_indices]) > > >> > > > >> > Unfortunately, if x_indices or y_indices are "int" or slice in some way > > >> > that > > >> > removes a dimension, the left side will have fewer dimensions than the > > >> > right. np.multipy.outer does the right thing without the ugly cases: > > >> > > > >> > if isinstance(x_indices, int): ? # ugly hacks follow. > > >> > > >> My usual hacks are either to use np.atleast_1d or np.atleast_1d or > > >> np.squeeze if there is shape mismatch in some cases. > > > > > > > > > Yes, but in this case, the left side is the problem, which has too few > > > dimensions. So atleast_1d doesn't work. I was conditionally squeezing, but > > > that is extremely ugly. Especially if you're conditionally squeezing based > > > on both x_indices and y_indices. 
> > > > I don't remember if I ever used something like this > > > > >>> a1[0, 1] > > 1 > > >>> a1[np.atleast_1d(0), np.atleast_1d(1)] > > array([1]) > > > > >>> a1[np.atleast_1d(0), np.atleast_1d(1)] = [[100]] > > > > >>> a1[0, 1] = [[100]] > > Traceback (most recent call last): > > File "", line 1, in > > a1[0, 1] = [[100]] > > ValueError: setting an array element with a sequence. > > > > Hehe, yeah, that difference. But if you really want that, you can > usually do a1[0, 1, ...] if you don't mind the ugliness. > Though actually I think I would usually prefer the other way around: a1[None, None, 0, 1] = [[100]] or instead a1[0, 1] = np.array([[100]])[0, 0] > > Josef > > > > > > > > > >> > > >> > > >> > > > >> >> I guess we will or would have applications for outer along an axis, > > >> >> for example if x.shape = (100, 10), then we have > > >> >> x[:,None, :] * x[:, :, None] (I guess) > > >> >> Something like this shows up reasonably often in econometrics as > > >> >> "Outer Product". However in most cases we can avoid constructing this > > >> >> matrix and get the final results in a more memory efficient or faster > > >> >> way. > > >> >> (example an array of covariance matrices) > > >> > > > >> > > > >> > Not sure I see this. outer(a, b) should return something that has > > >> > shape: > > >> > (a.shape + b.shape). If you're doing it "along an axis", you mean > > >> > you're > > >> > reshuffling the resulting shape vector? > > >> > > >> No I'm not reshaping the full tensor product. > > >> > > >> It's a vectorized version of looping over independent outer products > > >> > > >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) > > >> (which I would never use with outer) > > >> > > >> but I have code that works similar for a reduce (or reduce_at) loop over > > >> this. 
> > >> > > >> Josef > > >> > > >> > > >> >> > > >> >> > > >> >> Josef > > >> >> > > >> >> > > >> >> > > >> >> > > >> >> > > > >> >> > - Sebastian > > >> >> > > > >> >> > > > >> >> >> Best, > > >> >> >> > > >> >> >> Matthew > > >> >> >> _______________________________________________ > > >> >> >> NumPy-Discussion mailing list > > >> >> >> NumPy-Discussion at scipy.org > > >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >> >> >> > > >> >> > > > >> >> > > > >> >> > _______________________________________________ > > >> >> > NumPy-Discussion mailing list > > >> >> > NumPy-Discussion at scipy.org > > >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >> >> > > > >> >> _______________________________________________ > > >> >> NumPy-Discussion mailing list > > >> >> NumPy-Discussion at scipy.org > > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >> > > > >> > > > >> > > > >> > _______________________________________________ > > >> > NumPy-Discussion mailing list > > >> > NumPy-Discussion at scipy.org > > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > >> > > > >> _______________________________________________ > > >> NumPy-Discussion mailing list > > >> NumPy-Discussion at scipy.org > > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > _______________________________________________ > > > NumPy-Discussion mailing list > > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From sebastian at sipsolutions.net Fri Apr 17 16:03:04 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 17 Apr 2015 22:03:04 +0200 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429296986.12410.0.camel@sipsolutions.net> Message-ID: <1429300984.14976.7.camel@sipsolutions.net> On Fr, 2015-04-17 at 15:18 -0400, josef.pktd at gmail.com wrote: > On Fri, Apr 17, 2015 at 2:56 PM, Sebastian Berg > > Hehe, yeah, that difference. But if you really want that, you can > > usually do a1[0, 1, ...] if you don't mind the ugliness. > > I'm not sure what you mean, although it sounds like a nice trick. > This doesn't work for me > Oh, mindslip. I thought the problem was that maybe scalar assignment does not remove trailing dimensions. But the actual reason was that you do not have an array on the right hand side. And the assignment code isn't sure if you might want to do object assignment in that case, so it can't do the funny broadcasting of the left hand side (or trailing dimension removing, whichever way around you like to think of it). > >>> a1[0, 1, ...] = [[100]] > Traceback (most recent call last): > File "", line 1, in > a1[0, 1, ...] 
= [[100]] > ValueError: assignment to 0-d array > > >>> np.__version__ > '1.9.2rc1' > >>> a1[0, 1, > > Josef > > > > > > >> Josef > >> > >> > >> > > >> >> > >> >> > >> >> > > >> >> >> I guess we will or would have applications for outer along an axis, > >> >> >> for example if x.shape = (100, 10), then we have > >> >> >> x[:,None, :] * x[:, :, None] (I guess) > >> >> >> Something like this shows up reasonably often in econometrics as > >> >> >> "Outer Product". However in most cases we can avoid constructing this > >> >> >> matrix and get the final results in a more memory efficient or faster > >> >> >> way. > >> >> >> (example an array of covariance matrices) > >> >> > > >> >> > > >> >> > Not sure I see this. outer(a, b) should return something that has > >> >> > shape: > >> >> > (a.shape + b.shape). If you're doing it "along an axis", you mean > >> >> > you're > >> >> > reshuffling the resulting shape vector? > >> >> > >> >> No I'm not reshaping the full tensor product. > >> >> > >> >> It's a vectorized version of looping over independent outer products > >> >> > >> >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) > >> >> (which I would never use with outer) > >> >> > >> >> but I have code that works similar for a reduce (or reduce_at) loop over > >> >> this. > >> >> > >> >> Josef > >> >> > >> >> > >> >> >> > >> >> >> > >> >> >> Josef > >> >> >> > >> >> >> > >> >> >> > >> >> >> > >> >> >> > > >> >> >> > - Sebastian > >> >> >> > > >> >> >> > > >> >> >> >> Best, > >> >> >> >> > >> >> >> >> Matthew > >> >> >> >> _______________________________________________ > >> >> >> >> NumPy-Discussion mailing list > >> >> >> >> NumPy-Discussion at scipy.org > >> >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> >> >> > >> >> >> > > >> >> >> > > >> >> >> > _______________________________________________ > >> >> >> > NumPy-Discussion mailing list > >> >> >> > NumPy-Discussion at scipy.org > >> >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> >> > > >> >> >> _______________________________________________ > >> >> >> NumPy-Discussion mailing list > >> >> >> NumPy-Discussion at scipy.org > >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > > >> >> > > >> >> > > >> >> > _______________________________________________ > >> >> > NumPy-Discussion mailing list > >> >> > NumPy-Discussion at scipy.org > >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From josef.pktd at gmail.com Fri Apr 17 16:30:21 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 17 Apr 2015 16:30:21 -0400 Subject: [Numpy-discussion] Consider improving numpy.outer's behavior with zero-dimensional vectors In-Reply-To: <1429300984.14976.7.camel@sipsolutions.net> References: <1429086723.5810.5.camel@sipsolutions.net> <1429279669.3440.2.camel@sipsolutions.net> <1429296986.12410.0.camel@sipsolutions.net> <1429300984.14976.7.camel@sipsolutions.net> Message-ID: On Fri, Apr 17, 2015 at 4:03 PM, Sebastian Berg wrote: > On Fr, 2015-04-17 at 15:18 -0400, josef.pktd at gmail.com wrote: >> On Fri, Apr 17, 2015 at 2:56 PM, Sebastian Berg > >> > Hehe, yeah, that difference. But if you really want that, you can >> > usually do a1[0, 1, ...] if you don't mind the ugliness. >> >> I'm not sure what you mean, although it sounds like a nice trick. >> This doesn't work for me >> > > Oh, mindslip. I thought the problem was that maybe scalar assignment > does not remove trailing dimensions. But the actual reason was that you > do not have an array on the right hand side. And the assignment code > isn't sure if you might want to do object assignment in that case, so it > can't do the funny broadcasting of the left hand side (or trailing > dimension removing, whichever way around you like to think of it). Now I'm getting confused I had thought that these two are the same >>> a1[0, 1] = np.array([[100]]) >>> a1[0, 1] = [[100]] but trying it out and from your explanation, they are not I thought Neil's initial use case was that a1[0, 1] = np.outer(5, 1) doesn't work, because of >>> np.outer(5, 1).shape (1, 1) But that works for me. In any case, the thread is getting long, and I explained my perception of use cases for np.outer. Josef > >> >>> a1[0, 1, ...] = [[100]] >> Traceback (most recent call last): >> File "", line 1, in >> a1[0, 1, ...] = [[100]] >> ValueError: assignment to 0-d array >> >> >>> np.__version__ >> '1.9.2rc1' >> >>> a1[0, 1, >> >> Josef >> >> >> >> > >> >> Josef >> >> >> >> >> >> > >> >> >> >> >> >> >> >> >> > >> >> >> >> I guess we will or would have applications for outer along an axis, >> >> >> >> for example if x.shape = (100, 10), then we have >> >> >> >> x[:,None, :] * x[:, :, None] (I guess) >> >> >> >> Something like this shows up reasonably often in econometrics as >> >> >> >> "Outer Product". However in most cases we can avoid constructing this >> >> >> >> matrix and get the final results in a more memory efficient or faster >> >> >> >> way. >> >> >> >> (example an array of covariance matrices) >> >> >> > >> >> >> > >> >> >> > Not sure I see this. outer(a, b) should return something that has >> >> >> > shape: >> >> >> > (a.shape + b.shape). If you're doing it "along an axis", you mean >> >> >> > you're >> >> >> > reshuffling the resulting shape vector? >> >> >> >> >> >> No I'm not reshaping the full tensor product. >> >> >> >> >> >> It's a vectorized version of looping over independent outer products >> >> >> >> >> >> np.array([outer(xi, yi) for xi,yi in zip(x, y)]) >> >> >> (which I would never use with outer) >> >> >> >> >> >> but I have code that works similar for a reduce (or reduce_at) loop over >> >> >> this. 
>> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> Josef >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> >> > >> >> >> >> > - Sebastian >> >> >> >> > >> >> >> >> > >> >> >> >> >> Best, >> >> >> >> >> >> >> >> >> >> Matthew >> >> >> >> >> _______________________________________________ >> >> >> >> >> NumPy-Discussion mailing list >> >> >> >> >> NumPy-Discussion at scipy.org >> >> >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> >> >> >> >> > >> >> >> >> > >> >> >> >> > _______________________________________________ >> >> >> >> > NumPy-Discussion mailing list >> >> >> >> > NumPy-Discussion at scipy.org >> >> >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> > >> >> >> >> _______________________________________________ >> >> >> >> NumPy-Discussion mailing list >> >> >> >> NumPy-Discussion at scipy.org >> >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> >> >> > >> >> >> > >> >> >> > _______________________________________________ >> >> >> > NumPy-Discussion mailing list >> >> >> > NumPy-Discussion at scipy.org >> >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> >> >> _______________________________________________ >> >> >> NumPy-Discussion mailing list >> >> >> NumPy-Discussion at scipy.org >> >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> > >> >> > >> >> > _______________________________________________ >> >> > NumPy-Discussion mailing list >> >> > NumPy-Discussion at scipy.org >> >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla.molden at gmail.com Sat Apr 18 13:04:40 2015 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 18 Apr 2015 17:04:40 +0000 (UTC) Subject: [Numpy-discussion] Automatic number of bins for numpy histograms References: Message-ID: <133734897451069298.278467sturla.molden-gmail.com@news.gmane.org> Jaime Fern?ndez del R?o wrote: > I think we have an explicit rule against C++, although I may be wrong. Currently there is Python, C and Cython in NumPy. SciPy also has C++ and Fortran code. Sturla From faltet at gmail.com Tue Apr 21 09:04:08 2015 From: faltet at gmail.com (Francesc Alted) Date: Tue, 21 Apr 2015 15:04:08 +0200 Subject: [Numpy-discussion] ANN: PyTables 3.2.0 release candidate 1 is out Message-ID: =========================== Announcing PyTables 3.2.0rc1 =========================== We are happy to announce PyTables 3.2.0rc1. ******************************* IMPORTANT NOTICE: If you are a user of PyTables, it needs your help to keep going. Please read the next thread as it contains important information about the future of the project: https://groups.google.com/forum/#!topic/pytables-users/yY2aUa4H7W4 Thanks! 
******************************* What's new ========== This is a major release of PyTables and it is the result of more than a year of accumulated patches, but most specially it fixes a nasty problem with indexed queries not returning the correct results in some scenarios. There are many usablity and performance improvements too. In case you want to know more in detail what has changed in this version, please refer to: http://pytables.github.io/release_notes.html You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/3.2.0rc1 For an online version of the manual, visit: http://pytables.github.io/usersguide/index.html What it is? =========== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, allowing to perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second. Resources ========= About PyTables: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- **Enjoy data!** -- The PyTables Developers -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at inria.fr Mon Apr 27 03:07:07 2015 From: Nicolas.Rougier at inria.fr (Nicolas P. Rougier) Date: Mon, 27 Apr 2015 09:07:07 +0200 Subject: [Numpy-discussion] EuroScipy 2015: Submission deadline in 3 days !!! Message-ID: <4940B405-E1F2-4DB2-AB9D-B283581825A7@inria.fr> --------------------------------- Submission deadline in 3 days !!! --------------------------------- EuroScipy 2015, the annual conference on Python in science will take place in Cambridge, UK on 26-30 August 2015. The conference features two days of tutorials followed by two days of scientific talks & posters and an extra day dedicated to developer sprints. It is the major event in Europe in the field of technical/scientific computing within the Python ecosystem. Data scientists, analysts, quants, PhD's, scientists and students from more than 20 countries attended the conference last year. The topics presented at EuroSciPy are very diverse, with a focus on advanced software engineering and original uses of Python and its scientific libraries, either in theoretical or experimental research, from both academia and the industry. 
Submissions for posters, talks & tutorials (beginner and advanced) are welcome on our website at http://www.euroscipy.org/2015/ Sprint proposals should be addressed directly to the organisation at euroscipy-org at python.org Important dates =============== Mar 24, 2015 Call for talks, posters & tutorials Apr 30, 2015 Talk and tutorials submission deadline May 1, 2015 Registration opens May 30, 2015 Final program announced Jun 15, 2015 Early-bird registration ends Aug 26-27, 2015 Tutorials Aug 28-29, 2015 Main conference Aug 30, 2015 Sprints We look forward to an exciting conference and hope to see you in Cambridge The EuroSciPy 2015 Team - http://www.euroscipy.org/2015/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Apr 27 08:04:28 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 27 Apr 2015 14:04:28 +0200 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Sat, Apr 4, 2015 at 6:55 PM, Charles R Harris wrote: > > > On Sat, Apr 4, 2015 at 9:52 AM, Ralf Gommers > wrote: > >> Hi, >> >> Today I wanted to add something to https://github.com/numpy/vendor and >> realised that this repo is in pretty bad shape. A couple of years ago >> Ondrej took a copy of the ATLAS binaries in that repo and started a new >> repo (not a fork) at https://github.com/certik/numpy-vendor. The latest >> improvements were made by Julian and live at >> https://github.com/juliantaylor/numpy-vendor. >> >> I'd like to start from numpy/vendor, then add all commits from Julian's >> numpy-vendor on top of it, then move things around so we have the >> binaries/sources/tools layout back and finally update the README so it's >> clear how to build both the ATLAS binaries and Numpy releases. >> >> Any objections or better ideas? >> >> > No objections from me, getting all the good stuff together in an easily > found place is a plus. > Done in the master branch of https://github.com/rgommers/vendor. I think that "numpy-vendor" is a better repo name than "vendor" (which is pretty much meaningless outside of the numpy github org), so I propose to push my master branch to https://github.com/numpy/numpy-vendor and remove the current https://github.com/numpy/vendor repo. I'll do this in a couple of days, unless there are objections by then. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Mon Apr 27 08:17:05 2015 From: cournape at gmail.com (David Cournapeau) Date: Mon, 27 Apr 2015 13:17:05 +0100 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers wrote: > > > On Sat, Apr 4, 2015 at 6:55 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Sat, Apr 4, 2015 at 9:52 AM, Ralf Gommers >> wrote: >> >>> Hi, >>> >>> Today I wanted to add something to https://github.com/numpy/vendor and >>> realised that this repo is in pretty bad shape. A couple of years ago >>> Ondrej took a copy of the ATLAS binaries in that repo and started a new >>> repo (not a fork) at https://github.com/certik/numpy-vendor. The latest >>> improvements were made by Julian and live at >>> https://github.com/juliantaylor/numpy-vendor. 
>>> >>> I'd like to start from numpy/vendor, then add all commits from Julian's >>> numpy-vendor on top of it, then move things around so we have the >>> binaries/sources/tools layout back and finally update the README so it's >>> clear how to build both the ATLAS binaries and Numpy releases. >>> >>> Any objections or better ideas? >>> >>> >> No objections from me, getting all the good stuff together in an easily >> found place is a plus. >> > > Done in the master branch of https://github.com/rgommers/vendor. I think > that "numpy-vendor" is a better repo name than "vendor" (which is pretty > much meaningless outside of the numpy github org), so I propose to push my > master branch to https://github.com/numpy/numpy-vendor and remove the > current https://github.com/numpy/vendor repo. > > I'll do this in a couple of days, unless there are objections by then. > As the original creator of vendor, I am +1 on this as well ! David > > Ralf > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Mon Apr 27 11:04:36 2015 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Mon, 27 Apr 2015 16:04:36 +0100 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers wrote: > > Done in the master branch of https://github.com/rgommers/vendor. I think > that "numpy-vendor" is a better repo name than "vendor" (which is pretty > much meaningless outside of the numpy github org), so I propose to push my > master branch to https://github.com/numpy/numpy-vendor and remove the > current https://github.com/numpy/vendor repo. > > I'll do this in a couple of days, unless there are objections by then. > > Ralf Can you not just rename the repository on GitHub? Peter From ralf.gommers at gmail.com Mon Apr 27 11:20:48 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 27 Apr 2015 17:20:48 +0200 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 5:04 PM, Peter Cock wrote: > On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers > wrote: > > > > Done in the master branch of https://github.com/rgommers/vendor. I think > > that "numpy-vendor" is a better repo name than "vendor" (which is pretty > > much meaningless outside of the numpy github org), so I propose to push > my > > master branch to https://github.com/numpy/numpy-vendor and remove the > > current https://github.com/numpy/vendor repo. > > > > I'll do this in a couple of days, unless there are objections by then. > > > > Ralf > > Can you not just rename the repository on GitHub? > Yes, that is possible. The difference is small in this case (retaining the 1 closed PR fixing a typo; there are no issues), but after looking it up I think renaming is a bit less work than creating a new repo. So I'll rename. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at gmail.com Mon Apr 27 11:50:14 2015 From: faltet at gmail.com (Francesc Alted) Date: Mon, 27 Apr 2015 17:50:14 +0200 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released Message-ID: Announcing Numexpr 2.4.3 ========================= Numexpr is a fast numerical expression evaluator for NumPy. 
With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. It wears multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for a some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring more heavy dependencies. What's new ========== This is a maintenance release to cope with an old bug affecting comparisons with empty strings. Fixes #121 and PyTables #184. In case you want to know more in detail what has changed in this version, see: https://github.com/pydata/numexpr/wiki/Release-Notes or have a look at RELEASE_NOTES.txt in the tarball. Where I can find Numexpr? ========================= The project is hosted at GitHub in: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Mon Apr 27 16:32:36 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Mon, 27 Apr 2015 22:32:36 +0200 Subject: [Numpy-discussion] GSoC'15 accepted students for Scipy/Numpy Message-ID: Hi all, Google has just announced which students got accepted for this year's GSoC. For Scipy these are: - Nikolay Mayorov, "Improve nonlinear least squares minimization functionality in SciPy" mentors: Chuck & Evgeni - Abraham Escalante, "SciPy: scipy.stats improvements" mentor: Ralf (Evgeni is backup mentor) Furthermore, this proposal was accepted for Scikit-image: - Aman Singh, "Scikit-Image: rewriting scipy.ndimage to cython" mentors: Jaime, Ralf & the scikit-image devs Congratulations to all of you! We had a lot of interest this year, which is great to see. GSoC applications are competitive, and unfortunately there are students who didn't make it. To those students I would say: please stay involved, and you're very welcome to apply again next year! Today is also the start of the "community bonding period", where the students aren't yet expected to start working on their project but do get time to further figure out how things work, interact with the community and ensure that they can hit the ground running on day 1 of the coding period: http://googlesummerofcode.blogspot.nl/2007/04/so-what-is-this-community-bonding-all.html. It looks like it'll be an interesting and productive summer! Cheers, Ralf P.S. all proposals are linked on https://github.com/scipy/scipy/wiki/GSoC-project-ideas#student-applications-for-2015-to-scipy-and-numpy for who's interested in the details. P.P.S. some students have asked to get some feedback about why they were/weren't accepted, in order to learn from it for a next time. Until today we weren't allowed to say much, but now that Google has announced the results I'd be happy to give some feedback - please contact me in private if you want. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mistersheik at gmail.com Mon Apr 27 16:44:19 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 16:44:19 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: I've always wondered why numexpr accepts strings rather than looking a function's source code, using ast to parse it, and then transforming the AST. I just looked at another project, pyautodiff, which does that. And I think numba does that for llvm code generation. Wouldn't it be nicer to just apply a decorator to a function than to write the function as a Python string? On Mon, Apr 27, 2015 at 11:50 AM, Francesc Alted wrote: > Announcing Numexpr 2.4.3 > ========================= > > Numexpr is a fast numerical expression evaluator for NumPy. With it, > expressions that operate on arrays (like "3*a+4*b") are accelerated > and use less memory than doing the same calculation in Python. > > It wears multi-threaded capabilities, as well as support for Intel's > MKL (Math Kernel Library), which allows an extremely fast evaluation > of transcendental functions (sin, cos, tan, exp, log...) while > squeezing the last drop of performance out of your multi-core > processors. Look here for a some benchmarks of numexpr using MKL: > > https://github.com/pydata/numexpr/wiki/NumexprMKL > > Its only dependency is NumPy (MKL is optional), so it works well as an > easy-to-deploy, easy-to-use, computational engine for projects that > don't want to adopt other solutions requiring more heavy dependencies. > > What's new > ========== > > This is a maintenance release to cope with an old bug affecting > comparisons with empty strings. Fixes #121 and PyTables #184. > > In case you want to know more in detail what has changed in this > version, see: > > https://github.com/pydata/numexpr/wiki/Release-Notes > > or have a look at RELEASE_NOTES.txt in the tarball. > > Where I can find Numexpr? > ========================= > > The project is hosted at GitHub in: > > https://github.com/pydata/numexpr > > You can get the packages from PyPI as well (but not for RC releases): > > http://pypi.python.org/pypi/numexpr > > Share your experience > ===================== > > Let us know of any bugs, suggestions, gripes, kudos, etc. you may > have. > > > Enjoy data! > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 27 19:14:50 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 27 Apr 2015 16:14:50 -0700 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: On Apr 27, 2015 1:44 PM, "Neil Girdhar" wrote: > > I've always wondered why numexpr accepts strings rather than looking a function's source code, using ast to parse it, and then transforming the AST. I just looked at another project, pyautodiff, which does that. And I think numba does that for llvm code generation. Wouldn't it be nicer to just apply a decorator to a function than to write the function as a Python string? Numba works from byte code, not the ast. There's no way to access the ast reliably at runtime in python -- it gets thrown away during compilation. -n -------------- next part -------------- An HTML attachment was scrubbed... 
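A rough sketch of the re-parse-the-source workaround discussed next in the thread, and of the decorator idea floated above. The helper names are made up for illustration, and the whole approach assumes inspect can locate current source for the function -- which fails for interactive definitions, .pyc-only modules, or stale files, i.e. exactly the fragility pointed out here.

import ast
import inspect
import textwrap

def recovered_ast(func):
    # Re-parse the source text; the AST the compiler built is long gone.
    try:
        source = textwrap.dedent(inspect.getsource(func))
    except (IOError, OSError, TypeError):
        return None                       # no usable source available
    return ast.parse(source)              # fresh ast.Module containing the def

def numpyish(func):
    # Decorator-style entry point: a real tool would transform the tree
    # (e.g. rewrite array expressions) and recompile; here it is only printed.
    tree = recovered_ast(func)
    if tree is not None:
        print(ast.dump(tree.body[0]))     # the FunctionDef node
    return func

@numpyish
def f(a, b):
    return 3 * a + 4 * b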
URL: From mistersheik at gmail.com Mon Apr 27 19:23:57 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 19:23:57 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: I was told that numba did similar ast parsing, but maybe that's not true. Regarding the ast, I don't know about reliability, but take a look at get_ast in pyautodiff: https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py It looks up the __file__ attribute and passes that through compile to get the ast. Of course that won't work when you don't have source code (a .pyc only module, or when else?) Since I'm looking into this kind of solution for the future of my code, I'm curious if you think that's too unreliable for some reason? From a usability standpoint, I do think that's better than feeding in strings, which: * are not syntax highlighted, and * require porting code from regular numpy expressions to numexpr strings (applying a decorator is so much easier). Best, Neil On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith wrote: > On Apr 27, 2015 1:44 PM, "Neil Girdhar" wrote: > > > > I've always wondered why numexpr accepts strings rather than looking a > function's source code, using ast to parse it, and then transforming the > AST. I just looked at another project, pyautodiff, which does that. And I > think numba does that for llvm code generation. Wouldn't it be nicer to > just apply a decorator to a function than to write the function as a Python > string? > > Numba works from byte code, not the ast. There's no way to access the ast > reliably at runtime in python -- it gets thrown away during compilation. > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Apr 27 19:35:51 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 19:35:51 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: Also, FYI: http://numba.pydata.org/numba-doc/0.6/doc/modules/transforms.html It appears that numba does get the ast similar to pyautodiff and only get the ast from source code as a fallback? On Mon, Apr 27, 2015 at 7:23 PM, Neil Girdhar wrote: > I was told that numba did similar ast parsing, but maybe that's not true. > Regarding the ast, I don't know about reliability, but take a look at > get_ast in pyautodiff: > https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py > It looks up the __file__ attribute and passes that through compile to get > the ast. Of course that won't work when you don't have source code (a .pyc > only module, or when else?) > > Since I'm looking into this kind of solution for the future of my code, > I'm curious if you think that's too unreliable for some reason? From a > usability standpoint, I do think that's better than feeding in strings, > which: > * are not syntax highlighted, and > * require porting code from regular numpy expressions to numexpr strings > (applying a decorator is so much easier). 
> > Best, > > Neil > > On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith wrote: > >> On Apr 27, 2015 1:44 PM, "Neil Girdhar" wrote: >> > >> > I've always wondered why numexpr accepts strings rather than looking a >> function's source code, using ast to parse it, and then transforming the >> AST. I just looked at another project, pyautodiff, which does that. And I >> think numba does that for llvm code generation. Wouldn't it be nicer to >> just apply a decorator to a function than to write the function as a Python >> string? >> >> Numba works from byte code, not the ast. There's no way to access the ast >> reliably at runtime in python -- it gets thrown away during compilation. >> >> -n >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 27 19:42:26 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 27 Apr 2015 16:42:26 -0700 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar wrote: > I was told that numba did similar ast parsing, but maybe that's not true. > Regarding the ast, I don't know about reliability, but take a look at > get_ast in pyautodiff: > https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py > It looks up the __file__ attribute and passes that through compile to get > the ast. Of course that won't work when you don't have source code (a .pyc > only module, or when else?) > > Since I'm looking into this kind of solution for the future of my code, I'm > curious if you think that's too unreliable for some reason? I'd certainly hesitate to rely on it for anything I cared about or would be used by a lot of people... it's just intrinsically pretty hacky. No guarantee that the source code you find via __file__ will match what was used to compile the function, doesn't work when working interactively or from the ipython notebook, etc. Or else you have to trust a decompiler, which is a pretty serious complex chunk of code just to avoid typing quote marks. > From a > usability standpoint, I do think that's better than feeding in strings, > which: > * are not syntax highlighted, and > * require porting code from regular numpy expressions to numexpr strings > (applying a decorator is so much easier). Yes, but then you have to write a program that knows how to port code from numpy expressions to numexpr strings :-). numexpr only knows a tiny restricted subset of Python... The general approach I'd take to solve these kinds of problems would be similar to that used by Theano or dask -- use regular python source code that generates an expression graph in memory. E.g. this could look like def do_stuff(arr1, arr2): arr1 = deferred(arr1) arr2 = deferred(arr2) arr3 = np.sum(arr1 + (arr2 ** 2)) return force(arr3 / np.sum(arr3)) -n -- Nathaniel J. Smith -- http://vorpus.org From mistersheik at gmail.com Mon Apr 27 20:29:53 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 20:29:53 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith wrote: > On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar > wrote: > > I was told that numba did similar ast parsing, but maybe that's not true. 
> > Regarding the ast, I don't know about reliability, but take a look at > > get_ast in pyautodiff: > > > https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py > > It looks up the __file__ attribute and passes that through compile to get > > the ast. Of course that won't work when you don't have source code (a > .pyc > > only module, or when else?) > > > > Since I'm looking into this kind of solution for the future of my code, > I'm > > curious if you think that's too unreliable for some reason? > > I'd certainly hesitate to rely on it for anything I cared about or > would be used by a lot of people... it's just intrinsically pretty > hacky. No guarantee that the source code you find via __file__ will > match what was used to compile the function, doesn't work when working > interactively or from the ipython notebook, etc. Or else you have to > trust a decompiler, which is a pretty serious complex chunk of code > just to avoid typing quote marks. > Those are all good points. However, it's more than just typing quote marks. The code might have non-numpy things mixed in. It might have context managers and function calls and so on. More comments below. > > > From a > > usability standpoint, I do think that's better than feeding in strings, > > which: > > * are not syntax highlighted, and > > * require porting code from regular numpy expressions to numexpr strings > > (applying a decorator is so much easier). > > Yes, but then you have to write a program that knows how to port code > from numpy expressions to numexpr strings :-). numexpr only knows a > tiny restricted subset of Python... > > The general approach I'd take to solve these kinds of problems would > be similar to that used by Theano or dask -- use regular python source > code that generates an expression graph in memory. E.g. this could > look like > > def do_stuff(arr1, arr2): > arr1 = deferred(arr1) > arr2 = deferred(arr2) > arr3 = np.sum(arr1 + (arr2 ** 2)) > return force(arr3 / np.sum(arr3)) > > -n > > Right, there are three basic approaches: string processing, AST processing, and compile-time expression graphs. The big advantage to AST processing over the other two is that you can write and test your code as regular numpy code along with regular tests. Then, with the application of a decorator, you get the speedup you're looking for. The problem with porting the numpy code to numexpr strings or Theano-like expression-graphs is that porting can introduce bugs, and even if you're careful, every time you make a change to the numpy version of the code, you have port it again. Also, I personally want to do more than just AST transformations of the numpy code. For example, I have some methods that call super. The super calls can be collapsed since the mro is known at compile time. Best, Neil > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Mon Apr 27 21:07:43 2015 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 27 Apr 2015 21:07:43 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith wrote: > There's no way to access the ast reliably at runtime in python -- it gets > thrown away during compilation. 
The "meta" package supports bytecode to ast translation. See < http://meta.readthedocs.org/en/latest/api/decompile.html>. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Apr 27 21:19:16 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 21:19:16 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: Wow, cool! Are there any users of this package? On Mon, Apr 27, 2015 at 9:07 PM, Alexander Belopolsky wrote: > > On Mon, Apr 27, 2015 at 7:14 PM, Nathaniel Smith wrote: > >> There's no way to access the ast reliably at runtime in python -- it gets >> thrown away during compilation. > > > The "meta" package supports bytecode to ast translation. See < > http://meta.readthedocs.org/en/latest/api/decompile.html>. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Apr 27 22:47:37 2015 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 27 Apr 2015 19:47:37 -0700 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: On Apr 27, 2015 5:30 PM, "Neil Girdhar" wrote: > > > > On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith wrote: >> >> On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar wrote: >> > I was told that numba did similar ast parsing, but maybe that's not true. >> > Regarding the ast, I don't know about reliability, but take a look at >> > get_ast in pyautodiff: >> > https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py >> > It looks up the __file__ attribute and passes that through compile to get >> > the ast. Of course that won't work when you don't have source code (a .pyc >> > only module, or when else?) >> > >> > Since I'm looking into this kind of solution for the future of my code, I'm >> > curious if you think that's too unreliable for some reason? >> >> I'd certainly hesitate to rely on it for anything I cared about or >> would be used by a lot of people... it's just intrinsically pretty >> hacky. No guarantee that the source code you find via __file__ will >> match what was used to compile the function, doesn't work when working >> interactively or from the ipython notebook, etc. Or else you have to >> trust a decompiler, which is a pretty serious complex chunk of code >> just to avoid typing quote marks. > > > Those are all good points. However, it's more than just typing quote marks. The code might have non-numpy things mixed in. It might have context managers and function calls and so on. More comments below. > >> >> >> > From a >> > usability standpoint, I do think that's better than feeding in strings, >> > which: >> > * are not syntax highlighted, and >> > * require porting code from regular numpy expressions to numexpr strings >> > (applying a decorator is so much easier). >> >> Yes, but then you have to write a program that knows how to port code >> from numpy expressions to numexpr strings :-). numexpr only knows a >> tiny restricted subset of Python... >> >> The general approach I'd take to solve these kinds of problems would >> be similar to that used by Theano or dask -- use regular python source >> code that generates an expression graph in memory. E.g. 
this could >> look like >> >> def do_stuff(arr1, arr2): >> arr1 = deferred(arr1) >> arr2 = deferred(arr2) >> arr3 = np.sum(arr1 + (arr2 ** 2)) >> return force(arr3 / np.sum(arr3)) >> >> -n >> > > Right, there are three basic approaches: string processing, AST processing, and compile-time expression graphs. > > The big advantage to AST processing over the other two is that you can write and test your code as regular numpy code along with regular tests. Then, with the application of a decorator, you get the speedup you're looking for. The problem with porting the numpy code to numexpr strings or Theano-like expression-graphs is that porting can introduce bugs, and even if you're careful, every time you make a change to the numpy version of the code, you have port it again. > > Also, I personally want to do more than just AST transformations of the numpy code. For example, I have some methods that call super. The super calls can be collapsed since the mro is known at compile time. If you want something that handles arbitrary python code ('with' etc.), and produces results identical to cpython (so tests are reliable), except in cases where it violates the semantics for speed (super), then yeah, you want a full replacement python implementation, and I agree that the proper input to a python implementation is .py files :-). That's getting a bit far afield from numexpr's goals though... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From mistersheik at gmail.com Mon Apr 27 22:59:59 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Mon, 27 Apr 2015 22:59:59 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: I don't think I'm asking for so much. Somewhere inside numexpr it builds an AST of its own, which it converts into the optimized code. It would be more useful to me if that AST were in the same format as the one returned by Python's ast module. This way, I could glue in the bits of numexpr that I like with my code. For my purpose, this would have been the more ideal design. On Mon, Apr 27, 2015 at 10:47 PM, Nathaniel Smith wrote: > On Apr 27, 2015 5:30 PM, "Neil Girdhar" wrote: > > > > > > > > On Mon, Apr 27, 2015 at 7:42 PM, Nathaniel Smith wrote: > >> > >> On Mon, Apr 27, 2015 at 4:23 PM, Neil Girdhar > wrote: > >> > I was told that numba did similar ast parsing, but maybe that's not > true. > >> > Regarding the ast, I don't know about reliability, but take a look at > >> > get_ast in pyautodiff: > >> > > https://github.com/LowinData/pyautodiff/blob/7973e26f1c233570ed4bb10d08634ec7378e2152/autodiff/context.py > >> > It looks up the __file__ attribute and passes that through compile to > get > >> > the ast. Of course that won't work when you don't have source code > (a .pyc > >> > only module, or when else?) > >> > > >> > Since I'm looking into this kind of solution for the future of my > code, I'm > >> > curious if you think that's too unreliable for some reason? > >> > >> I'd certainly hesitate to rely on it for anything I cared about or > >> would be used by a lot of people... it's just intrinsically pretty > >> hacky. No guarantee that the source code you find via __file__ will > >> match what was used to compile the function, doesn't work when working > >> interactively or from the ipython notebook, etc. Or else you have to > >> trust a decompiler, which is a pretty serious complex chunk of code > >> just to avoid typing quote marks. > > > > > > Those are all good points. 
However, it's more than just typing quote > marks. The code might have non-numpy things mixed in. It might have > context managers and function calls and so on. More comments below. > > > >> > >> > >> > From a > >> > usability standpoint, I do think that's better than feeding in > strings, > >> > which: > >> > * are not syntax highlighted, and > >> > * require porting code from regular numpy expressions to numexpr > strings > >> > (applying a decorator is so much easier). > >> > >> Yes, but then you have to write a program that knows how to port code > >> from numpy expressions to numexpr strings :-). numexpr only knows a > >> tiny restricted subset of Python... > >> > >> The general approach I'd take to solve these kinds of problems would > >> be similar to that used by Theano or dask -- use regular python source > >> code that generates an expression graph in memory. E.g. this could > >> look like > >> > >> def do_stuff(arr1, arr2): > >> arr1 = deferred(arr1) > >> arr2 = deferred(arr2) > >> arr3 = np.sum(arr1 + (arr2 ** 2)) > >> return force(arr3 / np.sum(arr3)) > >> > >> -n > >> > > > > Right, there are three basic approaches: string processing, AST > processing, and compile-time expression graphs. > > > > The big advantage to AST processing over the other two is that you can > write and test your code as regular numpy code along with regular tests. > Then, with the application of a decorator, you get the speedup you're > looking for. The problem with porting the numpy code to numexpr strings or > Theano-like expression-graphs is that porting can introduce bugs, and even > if you're careful, every time you make a change to the numpy version of the > code, you have port it again. > > > > Also, I personally want to do more than just AST transformations of the > numpy code. For example, I have some methods that call super. The super > calls can be collapsed since the mro is known at compile time. > > If you want something that handles arbitrary python code ('with' etc.), > and produces results identical to cpython (so tests are reliable), except > in cases where it violates the semantics for speed (super), then yeah, you > want a full replacement python implementation, and I agree that the proper > input to a python implementation is .py files :-). That's getting a bit far > afield from numexpr's goals though... > > -n > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From solipsis at pitrou.net Tue Apr 28 04:50:00 2015 From: solipsis at pitrou.net (Antoine Pitrou) Date: Tue, 28 Apr 2015 10:50:00 +0200 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released References: Message-ID: <20150428105000.33d1107f@fsol> On Mon, 27 Apr 2015 19:35:51 -0400 Neil Girdhar wrote: > Also, FYI: http://numba.pydata.org/numba-doc/0.6/doc/modules/transforms.html > > It appears that numba does get the ast similar to pyautodiff and only get > the ast from source code as a fallback? That documentation is terribly obsolete (the latest Numba version is 0.18.2). Modern Numba starts from the CPython bytecode, it doesn't look at the AST. We explain the architecture in some detail here: http://numba.pydata.org/numba-doc/dev/developer/architecture.html Regards Antoine. 
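To make the source-based approach discussed earlier in this thread concrete, here is a minimal sketch of recovering a function's AST at runtime, in the spirit of pyautodiff's get_ast. It only works when the source file is actually available -- so not for interactive sessions or .pyc-only modules, as noted above -- and the show_ast decorator is purely illustrative, not an existing API:

    import ast
    import inspect
    import textwrap

    def get_function_ast(func):
        # Raises IOError/OSError when the source cannot be located,
        # e.g. for functions defined interactively.
        source = textwrap.dedent(inspect.getsource(func))
        return ast.parse(source)

    def show_ast(func):
        # Illustrative decorator: dumps the AST and returns the function
        # unchanged; a real rewriter would transform the tree instead.
        print(ast.dump(get_function_ast(func)))
        return func

    @show_ast
    def f(a, b):
        return 3 * a + 4 * b
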
From faltet at gmail.com Tue Apr 28 06:08:06 2015 From: faltet at gmail.com (Francesc Alted) Date: Tue, 28 Apr 2015 12:08:06 +0200 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: 2015-04-28 4:59 GMT+02:00 Neil Girdhar : > I don't think I'm asking for so much. Somewhere inside numexpr it builds > an AST of its own, which it converts into the optimized code. It would be > more useful to me if that AST were in the same format as the one returned > by Python's ast module. This way, I could glue in the bits of numexpr that > I like with my code. For my purpose, this would have been the more ideal > design. > I don't think implementing this for numexpr would be that complex. So for example, one could add a new numexpr.eval_ast(ast_expr) function. Pull requests are welcome. At any rate, which is your use case? I am curious. -- Francesc Alted -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Apr 28 10:00:41 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 28 Apr 2015 10:00:41 -0400 Subject: [Numpy-discussion] how to set a fixed sized dtype suitable for bitwise operations Message-ID: I have a need to have a numpy array of 17 byte (more specifically, at least 147 bits) values that I would be doing some bit twiddling on. I have found that doing a dtype of "i17" yields a dtype of int32, which is completely not what I intended. Doing 'u17' gets an "data type not understood". I have tried 'a17', but then bitwise_or() and left_shift() do not work (returns "NotImplemented"). How should I be going about this? Cheers! Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Tue Apr 28 10:19:20 2015 From: jaime.frio at gmail.com (=?UTF-8?Q?Jaime_Fern=C3=A1ndez_del_R=C3=ADo?=) Date: Tue, 28 Apr 2015 07:19:20 -0700 Subject: [Numpy-discussion] how to set a fixed sized dtype suitable for bitwise operations In-Reply-To: References: Message-ID: On Tue, Apr 28, 2015 at 7:00 AM, Benjamin Root wrote: > I have a need to have a numpy array of 17 byte (more specifically, at > least 147 bits) values that I would be doing some bit twiddling on. I have > found that doing a dtype of "i17" yields a dtype of int32, which is > completely not what I intended. Doing 'u17' gets an "data type not > understood". I have tried 'a17', but then bitwise_or() and left_shift() do > not work (returns "NotImplemented"). > > How should I be going about this? > The correct type to use would be a void dtype: >>> dt = np.dtype('V17') >>> dt.itemsize 17 Unfortunately, it does not support bitwise operations either, which seems like an oddity to me: >>> a = np.empty(2, dt) >>> a[0] = 'abcdef' >>> a[1] = bytearray([64, 56, 78]) >>> a[0] | a[1] Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for |: 'numpy.void' and 'numpy.void' Any fundamental reason for this? Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Apr 28 11:59:15 2015 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 28 Apr 2015 11:59:15 -0400 Subject: [Numpy-discussion] how to set a fixed sized dtype suitable for bitwise operations In-Reply-To: References: Message-ID: Yeah, I am not seeing any way around it at the moment. I guess I will have to use the bitarray package for now. 
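(One numpy-only fallback worth noting here: keep each 17-byte value as a row of a uint8 array, where the bitwise ufuncs do work elementwise. This is only a sketch -- shifts that have to carry across byte boundaries would still need hand-written logic:)

    import numpy as np

    n, width = 4, 17                    # four 17-byte (136-bit) values
    a = np.zeros((n, width), dtype=np.uint8)
    b = np.zeros((n, width), dtype=np.uint8)
    a[:, -1] = 0x0f                     # set some low-order bits for illustration
    b[:, -1] = 0xf0

    np.bitwise_or(a, b)                 # per-byte OR works fine on uint8
    np.bitwise_and(a, b)
    np.bitwise_xor(a, b)
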
I was hoping for some fast per-element processing, but at the moment, I guess I will have to sacrifice that just to have something that worked correctly. Ben Root On Tue, Apr 28, 2015 at 10:19 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > On Tue, Apr 28, 2015 at 7:00 AM, Benjamin Root wrote: > >> I have a need to have a numpy array of 17 byte (more specifically, at >> least 147 bits) values that I would be doing some bit twiddling on. I have >> found that doing a dtype of "i17" yields a dtype of int32, which is >> completely not what I intended. Doing 'u17' gets an "data type not >> understood". I have tried 'a17', but then bitwise_or() and left_shift() do >> not work (returns "NotImplemented"). >> > >> How should I be going about this? >> > > The correct type to use would be a void dtype: > > >>> dt = np.dtype('V17') > >>> dt.itemsize > 17 > > Unfortunately, it does not support bitwise operations either, which seems > like an oddity to me: > > >>> a = np.empty(2, dt) > >>> a[0] = 'abcdef' > >>> a[1] = bytearray([64, 56, 78]) > >>> a[0] | a[1] > Traceback (most recent call last): > File "", line 1, in > TypeError: unsupported operand type(s) for |: 'numpy.void' and 'numpy.void' > > Any fundamental reason for this? > > Jaime > > -- > (\__/) > ( O.o) > ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes > de dominaci?n mundial. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Tue Apr 28 16:38:58 2015 From: rays at blue-cove.com (R Schumacher) Date: Tue, 28 Apr 2015 13:38:58 -0700 Subject: [Numpy-discussion] scipy.fftpack.diff operation question Message-ID: <201504282038.t3SKcsid019628@blue-cove.com> We are looking to plot to time series accelerometer data as velocity and displacement. To this end we tried scipy.fftpack.diff, but in looking at the test code direct_diff() we get odd results, and, why is the doc using "sqrt(-1)*j" in its explanation? So, I tried a few different integration methods. We were first looking at the doc at http://docs.scipy.org/doc/scipy-0.15.1/reference/generated/scipy.fftpack.diff.html and code at https://github.com/scipy/scipy/blob/v0.15.1/scipy/fftpack/pseudo_diffs.py#L26 as well as the tests at https://github.com/scipy/scipy/blob/master/benchmarks/benchmarks/fftpack_pseudo_diffs.py The direct_diff() test function in bench_pseudo_diffs.py seems odd, since I have to pass period=-1 to get a matching sign plot for the first integration. And why the scaling differences required, even for diff() and direct_diff()? Is my understanding fundamentally flawed? Emacs! Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: 2a6b1958.jpg Type: image/jpeg Size: 362369 bytes Desc: not available URL: -------------- next part -------------- #----------------------------------------------------------------------------- # Name: plotter.py # Purpose: # # Author: Ray Schumacher # # Created: 2015/04/22 # RCS-ID: $Id: plotter.py $ # Copyright: (c) 2015 # Licence: #----------------------------------------------------------------------------- """ """ import numpy import time #------------ stand-alone test -------------------------------------------------- def intf(a, fs, f_lo=0.0, f_hi=1.0e12, times=1, winlen=1, unwin=False): """ Numerically integrate a time series in the frequency domain. This function integrates a time series in the frequency domain using 'Omega Arithmetic', over a defined frequency band. Parameters ---------- a : array_like Inumpyut time series. fs : int Sampling rate (Hz) of the inumpyut time series. f_lo : float, optional Lower frequency bound over which integration takes place. Defaults to 0 Hz. f_hi : float, optional Upper frequency bound over which integration takes place. Defaults to the Nyquist frequency ( = fs / 2). times : int, optional Number of times to integrate inumpyut time series a. Can be either 0, 1 or 2. If 0 is used, function effectively applies a 'brick wall' frequency domain filter to a. Defaults to 1. winlen : int, optional Number of seconds at the beginning and end of a file to apply half a Hanning window to. Limited to half the record length. Defaults to 1 second. unwin : Boolean, optional Whether or not to remove the window applied to the inumpyut time series from the output time series. Returns ------- out : complex ndarray The zero-, single- or double-integrated acceleration time series. Versions ---------- 1.1 First development version. Uses rfft to avoid complex return values. Checks for even length time series; if not, end-pad with single zero. 1.2 Zero-means time series to avoid spurious errors when applying Hanning window. """ t0 = time.clock() EPS = numpy.finfo(float).eps *2**8 a = a - a.mean() # Convert time series to zero-mean if numpy.mod(a.size,2) != 0: # Check for even length time series odd = True a = numpy.append(a, 0) # If not, append zero to array else: odd = False f_hi = min(fs/2, f_hi) # Upper frequency limited to Nyquist winlen = min(a.size/2, winlen) # Limit window to half record length ni = a.size # No. of points in data (int) nf = float(ni) # No. of points in data (float) fs = float(fs) # Sampling rate (Hz) df = fs/nf # Frequency increment in FFT stf_i = int(f_lo/df) # Index of lower frequency bound enf_i = int(f_hi/df) # Index of upper frequency bound window = numpy.ones(ni) # Create window function es = int(winlen*fs) # No. 
of samples to window from ends edge_win = numpy.hanning(es) # Hanning window edge window[:es/2] = edge_win[:es/2] window[-es/2:] = edge_win[-es/2:] a_w = a*window FFTspec_a = numpy.fft.rfft(a_w) # Calculate complex FFT of inumpyut FFTfreq = numpy.fft.fftfreq(ni, d=1/fs)[:ni/2+1] w = (2*numpy.pi*FFTfreq) # Omega iw = (0+1j)*w # i*Omega mask = numpy.zeros(ni/2+1) # Half-length mask for +ve freqs mask[stf_i:enf_i] = 1.0 # Mask = 1 for desired +ve freqs if times == 2: # Double integration FFTspec = -FFTspec_a*w / (w+EPS)**3 elif times == 1: # Single integration FFTspec = FFTspec_a*iw / (iw+EPS)**2 elif times == 0: # No integration FFTspec = FFTspec_a else: print 'Error' FFTspec *= mask # Select frequencies to use out_w = numpy.fft.irfft(FFTspec) # Return to time domain if unwin == True: out = out_w*window/(window+EPS)**2 # Remove window from time series else: out = out_w print 'elapsed', time.clock()-t0 if odd == True: # Check for even length time series return out[:-1] # If not, remove last entry else: return out def direct_diff(x,k=1,period=None): fx = numpy.fft.fft(x) n = len (fx) if period is None: period = 2*numpy.pi w = numpy.fft.fftfreq(n)*2j*numpy.pi/period*n if k<0: with numpy.errstate(divide="ignore", invalid="ignore"): w = 1 / w**k w[0] = 0.0 else: w = w**k if n>2000: w[250:n-250] = 0.0 return numpy.fft.ifft(w*fx).real def dc_int(sine, sample_rate): # fourier transform ft = numpy.fft.rfft(sine) N = len(sine) # bin frequencies bin_width = float(sample_rate) / N bin_freqs = numpy.arange(N//2 + 1) * bin_width # include extra frequency for Nyquist bin # OK, don't divide by zero, and the Nyquist bin actually has a negative frequency bin_freqs[0] = bin_freqs[1] bin_freqs[-1] *= -1 # integrate! integ_ft = ft / (2j*numpy.pi*bin_freqs) integ_ft[0] = 0 # zero out the DC bin # inverse rfft return numpy.fft.irfft(integ_ft) def int2(ys): N = ys.shape[0] w = (numpy.arange(N) - N /2.) / float(N) # integration Fys = numpy.fft.fft(ys) with numpy.errstate(divide="ignore", invalid="ignore"): modFys = numpy.fft.ifftshift(1./ (2 * numpy.pi * 1j * w) * numpy.fft.fftshift(Fys)) # modFys[0] will hold the result of dividing the DC component of y by 0, so it # will be nan or inf. Setting modFys[0] to 0 amounts to choosing a specific # constant of integration. modFys[0] = 0 return numpy.fft.ifft(modFys).real / float(N) def test(pth=r'C:\Users\Ray\Dropbox (Jan Medical)\SD Datasets\Normals\Other Subject Data\Data from JH Unit\001_2008Sep12_142457.csv'): """ run on a sample CSV file b: blue g: green r: red c: cyan m: magenta y: yellow k: black w: white """ import os.path import matplotlib.pyplot as plt from scipy.fftpack import diff from mpl_toolkits.axes_grid1 import host_subplot import mpl_toolkits.axisartist as AA length = 2.**14 fh=open(pth,'r') ## for this sample data sampleRate=1024 x = numpy.arange(length)*2*numpy.pi/length f = numpy.sin(x)*numpy.cos(4*x) plt.plot(f, label="data") intg_data = diff(f,-1) plt.plot(intg_data-.01, label="scipy diff - .01") intg_data1 = direct_diff(f, -1, period=-1) adj0 = intg_data.max()/intg_data1.max() plt.plot(intg_data1*adj0, label="direct_diff, adj:"+str(adj0)) # diverges wildly... 
intg_data3 = dc_int(f, sampleRate) adjdc = intg_data.max()/intg_data3.max() intg_data3 *= adjdc plt.plot(intg_data3+.01, label="int dc +.01, adj:"+str(adjdc)) int2_res = int2(f) adj = intg_data.max()/int2_res.max() int2_res *= adj plt.plot(int2_res, label="int2 adj:"+str(adj)) plt.legend() plt.draw() plt.show() intg_data = diff(f,-2) plt.plot(intg_data-.01, label="scipy diff - .01") intg_data1 = direct_diff(f, -2, period=1) adj0 = intg_data.max()/intg_data1.max() plt.plot(intg_data1*adj0, label="direct_diff, adj:"+str(adj0)) # diverges wildly... intg_data3 = dc_int(dc_int(f, sampleRate), sampleRate) adjdc = intg_data.max()/intg_data3.max() intg_data3 *= adjdc plt.plot(intg_data3+.01, label="int dc +.01, adj:"+str(adjdc)) int2_res = int2(int2(f)) adj = intg_data.max()/int2_res.max() int2_res *= adj plt.plot(int2_res, label="int2 adj:"+str(adj)) plt.legend() plt.draw() plt.show() if __name__ == '__main__': import sys if len(sys.argv)>1: test(sys.argv[1]) else: test() From mistersheik at gmail.com Wed Apr 29 07:26:08 2015 From: mistersheik at gmail.com (Neil Girdhar) Date: Wed, 29 Apr 2015 07:26:08 -0400 Subject: [Numpy-discussion] ANN: numexpr 2.4.3 released In-Reply-To: References: Message-ID: Sorry for the late reply. I will definitely consider submitting a pull request to numexpr if it's the direction I decide to go. Right now I'm still evaluating all of the many options for my project. I am implementing a machine learning algorithm as part of my thesis work. I'm in the "make it work", but quickly approaching the "make it fast" part. With research, you usually want to iterate quickly, and so whatever solution I choose has to be automated. I can't be coding things in an intuitive, natural way, and then porting it to a different implementation to make it fast. What I want is for that conversion to be automated. I'm still evaluating how to best achieve that. On Tue, Apr 28, 2015 at 6:08 AM, Francesc Alted wrote: > 2015-04-28 4:59 GMT+02:00 Neil Girdhar : > >> I don't think I'm asking for so much. Somewhere inside numexpr it builds >> an AST of its own, which it converts into the optimized code. It would be >> more useful to me if that AST were in the same format as the one returned >> by Python's ast module. This way, I could glue in the bits of numexpr that >> I like with my code. For my purpose, this would have been the more ideal >> design. >> > > I don't think implementing this for numexpr would be that complex. So for > example, one could add a new numexpr.eval_ast(ast_expr) function. Pull > requests are welcome. > > At any rate, which is your use case? I am curious. > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From afylot at gmail.com Wed Apr 29 11:05:49 2015 From: afylot at gmail.com (simona bellavista) Date: Wed, 29 Apr 2015 17:05:49 +0200 Subject: [Numpy-discussion] performance of numpy.array() Message-ID: I work on two distinct scientific clusters. I have run the same python code on the two clusters and I have noticed that one is faster by an order of magnitude than the other (1min vs 10min, this is important because I run this function many times). I have investigated with a profiler and I have found that the cause of this is that (same code and same data) is the function numpy.array that is being called 10^5 times. 
On cluster A it takes 2 s in total, whereas on cluster B it takes ~6 min. For what regards the other functions, they are generally faster on cluster A. I understand that the clusters are quite different, both as hardware and installed libraries. It strikes me that on this particular function the performance is so different. I would have though that this is due to a difference in the available memory, but actually by looking with `top` the memory seems to be used only at 0.1% on cluster B. In theory numpy is compiled with atlas on cluster B, and on cluster A it is not clear, because numpy.__config__.show() returns NOT AVAILABLE for anything. Does anybody has any insight on that, and if I can improve the performance on cluster B? -------------- next part -------------- An HTML attachment was scrubbed... URL: From nickpapior at gmail.com Wed Apr 29 11:18:07 2015 From: nickpapior at gmail.com (Nick Papior Andersen) Date: Wed, 29 Apr 2015 17:18:07 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: Compile it yourself to know the limitations/benefits of the dependency libraries. Otherwise, have you checked which versions of numpy they are, i.e. are they the same version? 2015-04-29 17:05 GMT+02:00 simona bellavista : > I work on two distinct scientific clusters. I have run the same python > code on the two clusters and I have noticed that one is faster by an order > of magnitude than the other (1min vs 10min, this is important because I run > this function many times). > > I have investigated with a profiler and I have found that the cause of > this is that (same code and same data) is the function numpy.array that is > being called 10^5 times. On cluster A it takes 2 s in total, whereas on > cluster B it takes ~6 min. For what regards the other functions, they are > generally faster on cluster A. I understand that the clusters are quite > different, both as hardware and installed libraries. It strikes me that on > this particular function the performance is so different. I would have > though that this is due to a difference in the available memory, but > actually by looking with `top` the memory seems to be used only at 0.1% on > cluster B. In theory numpy is compiled with atlas on cluster B, and on > cluster A it is not clear, because numpy.__config__.show() returns NOT > AVAILABLE for anything. > > Does anybody has any insight on that, and if I can improve the performance > on cluster B? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Kind regards Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From afylot at gmail.com Wed Apr 29 11:40:21 2015 From: afylot at gmail.com (simona bellavista) Date: Wed, 29 Apr 2015 17:40:21 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: on cluster A 1.9.0 and on cluster B 1.8.2 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen : > Compile it yourself to know the limitations/benefits of the dependency > libraries. > > Otherwise, have you checked which versions of numpy they are, i.e. are > they the same version? > > 2015-04-29 17:05 GMT+02:00 simona bellavista : > >> I work on two distinct scientific clusters. 
I have run the same python >> code on the two clusters and I have noticed that one is faster by an order >> of magnitude than the other (1min vs 10min, this is important because I run >> this function many times). >> >> I have investigated with a profiler and I have found that the cause of >> this is that (same code and same data) is the function numpy.array that is >> being called 10^5 times. On cluster A it takes 2 s in total, whereas on >> cluster B it takes ~6 min. For what regards the other functions, they are >> generally faster on cluster A. I understand that the clusters are quite >> different, both as hardware and installed libraries. It strikes me that on >> this particular function the performance is so different. I would have >> though that this is due to a difference in the available memory, but >> actually by looking with `top` the memory seems to be used only at 0.1% on >> cluster B. In theory numpy is compiled with atlas on cluster B, and on >> cluster A it is not clear, because numpy.__config__.show() returns NOT >> AVAILABLE for anything. >> >> Does anybody has any insight on that, and if I can improve the >> performance on cluster B? >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Kind regards Nick > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nickpapior at gmail.com Wed Apr 29 11:41:02 2015 From: nickpapior at gmail.com (Nick Papior Andersen) Date: Wed, 29 Apr 2015 17:41:02 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: You could try and install your own numpy to check whether that resolves the problem. 2015-04-29 17:40 GMT+02:00 simona bellavista : > on cluster A 1.9.0 and on cluster B 1.8.2 > > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen : > >> Compile it yourself to know the limitations/benefits of the dependency >> libraries. >> >> Otherwise, have you checked which versions of numpy they are, i.e. are >> they the same version? >> >> 2015-04-29 17:05 GMT+02:00 simona bellavista : >> >>> I work on two distinct scientific clusters. I have run the same python >>> code on the two clusters and I have noticed that one is faster by an order >>> of magnitude than the other (1min vs 10min, this is important because I run >>> this function many times). >>> >>> I have investigated with a profiler and I have found that the cause of >>> this is that (same code and same data) is the function numpy.array that is >>> being called 10^5 times. On cluster A it takes 2 s in total, whereas on >>> cluster B it takes ~6 min. For what regards the other functions, they are >>> generally faster on cluster A. I understand that the clusters are quite >>> different, both as hardware and installed libraries. It strikes me that on >>> this particular function the performance is so different. I would have >>> though that this is due to a difference in the available memory, but >>> actually by looking with `top` the memory seems to be used only at 0.1% on >>> cluster B. In theory numpy is compiled with atlas on cluster B, and on >>> cluster A it is not clear, because numpy.__config__.show() returns NOT >>> AVAILABLE for anything. 
>>> >>> Does anybody has any insight on that, and if I can improve the >>> performance on cluster B? >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> >> -- >> Kind regards Nick >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Kind regards Nick -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Wed Apr 29 11:47:30 2015 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 29 Apr 2015 17:47:30 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: <1430322450.16041.4.camel@sipsolutions.net> There was a major improvement to np.array in some cases. You can probably work around this by using np.concatenate instead of np.array in your case (depends on the usecase, but I will guess you have code doing: np.array([arr1, arr2, arr3]) or similar. If your use case is different, you may be out of luck and only an upgrade would help. On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote: > You could try and install your own numpy to check whether that > resolves the problem. > > 2015-04-29 17:40 GMT+02:00 simona bellavista : > on cluster A 1.9.0 and on cluster B 1.8.2 > > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen > : > Compile it yourself to know the limitations/benefits > of the dependency libraries. > > > Otherwise, have you checked which versions of numpy > they are, i.e. are they the same version? > > > 2015-04-29 17:05 GMT+02:00 simona bellavista > : > > I work on two distinct scientific clusters. I > have run the same python code on the two > clusters and I have noticed that one is faster > by an order of magnitude than the other (1min > vs 10min, this is important because I run this > function many times). > > > I have investigated with a profiler and I have > found that the cause of this is that (same > code and same data) is the function > numpy.array that is being called 10^5 times. > On cluster A it takes 2 s in total, whereas on > cluster B it takes ~6 min. For what regards > the other functions, they are generally faster > on cluster A. I understand that the clusters > are quite different, both as hardware and > installed libraries. It strikes me that on > this particular function the performance is so > different. I would have though that this is > due to a difference in the available memory, > but actually by looking with `top` the memory > seems to be used only at 0.1% on cluster B. In > theory numpy is compiled with atlas on cluster > B, and on cluster A it is not clear, because > numpy.__config__.show() returns NOT AVAILABLE > for anything. > > > Does anybody has any insight on that, and if I > can improve the performance on cluster B? 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Kind regards Nick > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > -- > Kind regards Nick > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 819 bytes Desc: This is a digitally signed message part URL: From robert.kern at gmail.com Wed Apr 29 11:50:26 2015 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 29 Apr 2015 16:50:26 +0100 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista wrote: > > I work on two distinct scientific clusters. I have run the same python code on the two clusters and I have noticed that one is faster by an order of magnitude than the other (1min vs 10min, this is important because I run this function many times). > > I have investigated with a profiler and I have found that the cause of this is that (same code and same data) is the function numpy.array that is being called 10^5 times. On cluster A it takes 2 s in total, whereas on cluster B it takes ~6 min. For what regards the other functions, they are generally faster on cluster A. I understand that the clusters are quite different, both as hardware and installed libraries. It strikes me that on this particular function the performance is so different. I would have though that this is due to a difference in the available memory, but actually by looking with `top` the memory seems to be used only at 0.1% on cluster B. In theory numpy is compiled with atlas on cluster B, and on cluster A it is not clear, because numpy.__config__.show() returns NOT AVAILABLE for anything. > > Does anybody has any insight on that, and if I can improve the performance on cluster B? Check to see if you have the "Transparent Hugepages" (THP) Linux kernel feature enabled on each cluster. You may want to try turning it off. I have recently run into a problem with a large-memory multicore machine with THP for programs that had many large numpy.array() memory allocations. Usually, THP helps memory-hungry applications (you can Google for the reasons), but it does require defragmenting the memory space to get contiguous hugepages. The system can get into a state where the memory space is so fragmented such that trying to get each new hugepage requires a lot of extra work to create the contiguous memory regions. In my case, a perfectly well-performing program would suddenly slow down immensely during it's memory-allocation-intensive actions. When I turned THP off, it started working normally again. If you have root, try using `perf top` to see what C functions in user space and kernel space are taking up the most time in your process. If you see anything like `do_page_fault()`, this, or a similar issue, is your problem. 
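For completeness, the THP settings mentioned above can be inspected without root. The sysfs paths below are the usual mainline-kernel ones and are an assumption (the Red Hat backport uses different paths), so treat this as a sketch:

    from __future__ import print_function
    import os.path

    THP_DIR = "/sys/kernel/mm/transparent_hugepage"   # assumed mainline path

    for name in ("enabled", "defrag"):
        path = os.path.join(THP_DIR, name)
        if os.path.exists(path):
            with open(path) as fh:
                # the active setting is shown in brackets, e.g. "always [madvise] never"
                print(name, "=", fh.read().strip())
        else:
            print(path, "not present (older kernel or distribution backport)")
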
-- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Wed Apr 29 14:08:40 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 29 Apr 2015 20:08:40 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: <55411E28.70403@googlemail.com> numpy 1.9 makes array(list) performance similar in performance to vstack in 1.8 its very slow. On 29.04.2015 17:40, simona bellavista wrote: > on cluster A 1.9.0 and on cluster B 1.8.2 > > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen >: > > Compile it yourself to know the limitations/benefits of the > dependency libraries. > > Otherwise, have you checked which versions of numpy they are, i.e. > are they the same version? > > 2015-04-29 17:05 GMT+02:00 simona bellavista >: > > I work on two distinct scientific clusters. I have run the same > python code on the two clusters and I have noticed that one is > faster by an order of magnitude than the other (1min vs 10min, > this is important because I run this function many times). > > I have investigated with a profiler and I have found that the > cause of this is that (same code and same data) is the function > numpy.array that is being called 10^5 times. On cluster A it > takes 2 s in total, whereas on cluster B it takes ~6 min. For > what regards the other functions, they are generally faster on > cluster A. I understand that the clusters are quite different, > both as hardware and installed libraries. It strikes me that on > this particular function the performance is so different. I > would have though that this is due to a difference in the > available memory, but actually by looking with `top` the memory > seems to be used only at 0.1% on cluster B. In theory numpy is > compiled with atlas on cluster B, and on cluster A it is not > clear, because numpy.__config__.show() returns NOT AVAILABLE for > anything. > > Does anybody has any insight on that, and if I can improve the > performance on cluster B? > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > Kind regards Nick > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jtaylor.debian at googlemail.com Wed Apr 29 14:13:59 2015 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 29 Apr 2015 20:13:59 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: Message-ID: <55411F67.7090309@googlemail.com> On 29.04.2015 17:50, Robert Kern wrote: > On Wed, Apr 29, 2015 at 4:05 PM, simona bellavista > wrote: >> >> I work on two distinct scientific clusters. I have run the same python > code on the two clusters and I have noticed that one is faster by an > order of magnitude than the other (1min vs 10min, this is important > because I run this function many times). >> >> I have investigated with a profiler and I have found that the cause of > this is that (same code and same data) is the function numpy.array that > is being called 10^5 times. On cluster A it takes 2 s in total, whereas > on cluster B it takes ~6 min. 
For what regards the other functions, > they are generally faster on cluster A. I understand that the clusters > are quite different, both as hardware and installed libraries. It > strikes me that on this particular function the performance is so > different. I would have though that this is due to a difference in the > available memory, but actually by looking with `top` the memory seems to > be used only at 0.1% on cluster B. In theory numpy is compiled with > atlas on cluster B, and on cluster A it is not clear, because > numpy.__config__.show() returns NOT AVAILABLE for anything. >> >> Does anybody has any insight on that, and if I can improve the > performance on cluster B? > > Check to see if you have the "Transparent Hugepages" (THP) Linux kernel > feature enabled on each cluster. You may want to try turning it off. I > have recently run into a problem with a large-memory multicore machine > with THP for programs that had many large numpy.array() memory > allocations. Usually, THP helps memory-hungry applications (you can > Google for the reasons), but it does require defragmenting the memory > space to get contiguous hugepages. The system can get into a state where > the memory space is so fragmented such that trying to get each new > hugepage requires a lot of extra work to create the contiguous memory > regions. In my case, a perfectly well-performing program would suddenly > slow down immensely during it's memory-allocation-intensive actions. > When I turned THP off, it started working normally again. > > If you have root, try using `perf top` to see what C functions in user > space and kernel space are taking up the most time in your process. If > you see anything like `do_page_fault()`, this, or a similar issue, is > your problem. > this issue it has nothing to do with thp, its a change in array in numpy 1.9. Its now as fast as vstack, while before it was really really slow. But the memory compaction is indeed awful, especially the backport redhat did for their enterprise linux. Typically it is enough to only disable the automatic defragmentation on allocation only, not the full thps, e.g. via echo never | sudo tee /sys/kernel/mm/transparent_hugepage/defrag (on redhat backports its a different path) You still have the hugepaged running defrags at times of low load and in limited fashion, you can also manually trigger a defrag by writting to: /prog/sys/vm/compact_memory Though the hugepaged which runs only occasionally should already do a good job. From charlesr.harris at gmail.com Wed Apr 29 16:51:15 2015 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 29 Apr 2015 14:51:15 -0600 Subject: [Numpy-discussion] Weighted covariance. Message-ID: The weighted covariance function in PR #4960 is evolving to the following, where frequency weights are `f` and reliability weights are `a`. Assume that the observations are in the columns of the observation matrix. the steps to compute the weighted covariance are as follows:: >>> w = f * a >>> v1 = np.sum(w) >>> v2 = np.sum(a * w) >>> m -= np.sum(m * w, axis=1, keepdims=True) / v1 >>> cov = np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2) Note that when ``a == 1``, the normalization factor ``v1 / (v1**2 - ddof * v2)`` goes over to ``1 / (np.sum(f) - ddof)`` as it should. This is probably a good time for comments from all the kibitzers out there. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
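Wrapped up as a function, those steps look roughly like this (the name, signature and defaults are illustrative only, not the API proposed in the PR):

    import numpy as np

    def weighted_cov(m, f=None, a=None, ddof=1):
        # Observations are the columns of m, variables the rows;
        # f are frequency weights, a are reliability weights.
        m = np.array(m, dtype=float)        # work on a copy, since we demean in place
        n = m.shape[1]
        f = np.ones(n) if f is None else np.asarray(f, dtype=float)
        a = np.ones(n) if a is None else np.asarray(a, dtype=float)
        w = f * a
        v1 = np.sum(w)
        v2 = np.sum(a * w)
        m -= np.sum(m * w, axis=1, keepdims=True) / v1
        return np.dot(m * w, m.T) * v1 / (v1**2 - ddof * v2)

With a == 1 this reduces to the familiar 1 / (np.sum(f) - ddof) normalization, as noted above.
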
URL: From afylot at gmail.com Thu Apr 30 10:03:05 2015 From: afylot at gmail.com (simona bellavista) Date: Thu, 30 Apr 2015 16:03:05 +0200 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: <1430322450.16041.4.camel@sipsolutions.net> References: <1430322450.16041.4.camel@sipsolutions.net> Message-ID: I have seen a big improvement in performance with numpy 1.9.2 with python 2.7.8, numpy.array takes 5 s instead of 300s. On the other side, I have also tried numpy 1.9.2 and 1.9.0 with python 3.4 and the results are terrible: numpy.array takes 20s, but the other routines are slowed down, for example concatenate and astype and copy and uniform. Most of all, the sort function of numpy.dnarray is slowed down by a factor at least 10. On the other cluster I am using python 3.3 with numpy 1.9.0 and it is working very well (but I think it is so also because of the hardware). I was trying to install python 3.3 on this cluster, but because of other issues (error at compile time of h5py library and bug at runtime in the dill library) I cannot test it right now. 2015-04-29 17:47 GMT+02:00 Sebastian Berg : > There was a major improvement to np.array in some cases. > > You can probably work around this by using np.concatenate instead of > np.array in your case (depends on the usecase, but I will guess you have > code doing: > > np.array([arr1, arr2, arr3]) > > or similar. If your use case is different, you may be out of luck and > only an upgrade would help. > > > On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote: > > You could try and install your own numpy to check whether that > > resolves the problem. > > > > 2015-04-29 17:40 GMT+02:00 simona bellavista : > > on cluster A 1.9.0 and on cluster B 1.8.2 > > > > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen > > : > > Compile it yourself to know the limitations/benefits > > of the dependency libraries. > > > > > > Otherwise, have you checked which versions of numpy > > they are, i.e. are they the same version? > > > > > > 2015-04-29 17:05 GMT+02:00 simona bellavista > > : > > > > I work on two distinct scientific clusters. I > > have run the same python code on the two > > clusters and I have noticed that one is faster > > by an order of magnitude than the other (1min > > vs 10min, this is important because I run this > > function many times). > > > > > > I have investigated with a profiler and I have > > found that the cause of this is that (same > > code and same data) is the function > > numpy.array that is being called 10^5 times. > > On cluster A it takes 2 s in total, whereas on > > cluster B it takes ~6 min. For what regards > > the other functions, they are generally faster > > on cluster A. I understand that the clusters > > are quite different, both as hardware and > > installed libraries. It strikes me that on > > this particular function the performance is so > > different. I would have though that this is > > due to a difference in the available memory, > > but actually by looking with `top` the memory > > seems to be used only at 0.1% on cluster B. In > > theory numpy is compiled with atlas on cluster > > B, and on cluster A it is not clear, because > > numpy.__config__.show() returns NOT AVAILABLE > > for anything. > > > > > > Does anybody has any insight on that, and if I > > can improve the performance on cluster B? 
> > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > Kind regards Nick > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > > > > > -- > > Kind regards Nick > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rnelsonchem at gmail.com Thu Apr 30 10:24:40 2015 From: rnelsonchem at gmail.com (Ryan Nelson) Date: Thu, 30 Apr 2015 10:24:40 -0400 Subject: [Numpy-discussion] performance of numpy.array() In-Reply-To: References: <1430322450.16041.4.camel@sipsolutions.net> Message-ID: I have had good luck with Continuum's Miniconda Python distributions on Linux. http://conda.pydata.org/miniconda.html The `conda` command makes it very easy to create specific testing environments for Python 2 and 3 with many different packages. Everything is precompiled, so you won't have to worry about system library differences between the two clusters. Hope that helps. Ryan On Thu, Apr 30, 2015 at 10:03 AM, simona bellavista wrote: > I have seen a big improvement in performance with numpy 1.9.2 with python > 2.7.8, numpy.array takes 5 s instead of 300s. > > On the other side, I have also tried numpy 1.9.2 and 1.9.0 with python 3.4 > and the results are terrible: numpy.array takes 20s, but the other routines > are slowed down, for example concatenate and astype and copy and uniform. > Most of all, the sort function of numpy.dnarray is slowed down by a factor > at least 10. > > On the other cluster I am using python 3.3 with numpy 1.9.0 and it is > working very well (but I think it is so also because of the hardware). I > was trying to install python 3.3 on this cluster, but because of other > issues (error at compile time of h5py library and bug at runtime in the > dill library) I cannot test it right now. > > 2015-04-29 17:47 GMT+02:00 Sebastian Berg : > >> There was a major improvement to np.array in some cases. >> >> You can probably work around this by using np.concatenate instead of >> np.array in your case (depends on the usecase, but I will guess you have >> code doing: >> >> np.array([arr1, arr2, arr3]) >> >> or similar. If your use case is different, you may be out of luck and >> only an upgrade would help. >> >> >> On Mi, 2015-04-29 at 17:41 +0200, Nick Papior Andersen wrote: >> > You could try and install your own numpy to check whether that >> > resolves the problem. >> > >> > 2015-04-29 17:40 GMT+02:00 simona bellavista : >> > on cluster A 1.9.0 and on cluster B 1.8.2 >> > >> > 2015-04-29 17:18 GMT+02:00 Nick Papior Andersen >> > : >> > Compile it yourself to know the limitations/benefits >> > of the dependency libraries. >> > >> > >> > Otherwise, have you checked which versions of numpy >> > they are, i.e. 
are they the same version? >> > >> > >> > 2015-04-29 17:05 GMT+02:00 simona bellavista >> > : >> > >> > I work on two distinct scientific clusters. I >> > have run the same python code on the two >> > clusters and I have noticed that one is faster >> > by an order of magnitude than the other (1min >> > vs 10min, this is important because I run this >> > function many times). >> > >> > >> > I have investigated with a profiler and I have >> > found that the cause of this is that (same >> > code and same data) is the function >> > numpy.array that is being called 10^5 times. >> > On cluster A it takes 2 s in total, whereas on >> > cluster B it takes ~6 min. For what regards >> > the other functions, they are generally faster >> > on cluster A. I understand that the clusters >> > are quite different, both as hardware and >> > installed libraries. It strikes me that on >> > this particular function the performance is so >> > different. I would have though that this is >> > due to a difference in the available memory, >> > but actually by looking with `top` the memory >> > seems to be used only at 0.1% on cluster B. In >> > theory numpy is compiled with atlas on cluster >> > B, and on cluster A it is not clear, because >> > numpy.__config__.show() returns NOT AVAILABLE >> > for anything. >> > >> > >> > Does anybody has any insight on that, and if I >> > can improve the performance on cluster B? >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > >> > -- >> > Kind regards Nick >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> > >> > >> > >> > -- >> > Kind regards Nick >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Apr 30 14:24:50 2015 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 30 Apr 2015 14:24:50 -0400 Subject: [Numpy-discussion] code snippet: assert all close or large Message-ID: -------------- next part -------------- An HTML attachment was scrubbed... URL: From cgodshall at enthought.com Thu Apr 30 14:24:57 2015 From: cgodshall at enthought.com (Courtenay Godshall (Enthought)) Date: Thu, 30 Apr 2015 13:24:57 -0500 Subject: [Numpy-discussion] ANN: SciPy 2015 Tutorial Schedule Posted - Register Today - Already 30% Sold Out Message-ID: <013801d08372$f6f1b350$e4d519f0$@enthought.com> **The #SciPy2015 Conference (Scientific Computing with #Python) Tutorial Schedule is up! It is 1st come, 1st served and already 30% sold out. 
Register today!** http://www.scipy2015.scipy.org/ehome/115969/289057/

This year you can choose from 16 different SciPy tutorials OR select the
2-day Software Carpentry course on scientific Python that assumes some
programming experience but no Python knowledge. Please share!

Tutorials include:

*Introduction to NumPy (Beginner)
*Machine Learning with Scikit-Learn (Intermediate)
*Cython: Blend of the Best of Python and C/C++ (Intermediate)
*Image Analysis in Python with SciPy and Scikit-Image (Intermediate)
*Analyzing and Manipulating Data with Pandas (Beginner)
*Machine Learning with Scikit-Learn (Advanced)
*Building Python Data Applications with Blaze and Bokeh (Intermediate)
*Multibody Dynamics and Control with Python (Intermediate)
*Anatomy of Matplotlib (Beginner)
*Computational Statistics I (Intermediate)
*Efficient Python for High-Performance Parallel Computing (Intermediate)
*Geospatial Data with Open Source Tools in Python (Intermediate)
*Decorating Drones: Using Drones to Delve Deeper into Intermediate Python (Intermediate)
*Computational Statistics II (Intermediate)
*Modern Optimization Methods in Python (Advanced)
*Jupyter Advanced Topics Tutorial (Advanced)
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From josef.pktd at gmail.com  Thu Apr 30 14:27:39 2015
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Thu, 30 Apr 2015 14:27:39 -0400
Subject: [Numpy-discussion] code snippet: assert all close or large
In-Reply-To:
References:
Message-ID:

Sorry, hit the wrong key.

Just an example that I think is not covered by numpy.testing: an assert
with absolute tolerance for `inf`, i.e. "assert x and y are allclose or x
is large if y is inf".

On Thu, Apr 30, 2015 at 2:24 PM,  wrote:

> def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
>     """ assert x and y are allclose or x is large if y is inf """
>     mask_inf = np.isinf(y) & ~np.isinf(x)
>     assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
>     assert_array_less(ltol, x[mask_inf])
>

Josef
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
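To make the intent of that helper concrete, here is a self-contained usage
sketch. It restates the function so it runs on its own, and the data values
are made up for illustration:

import numpy as np
from numpy.testing import assert_allclose, assert_array_less

def assert_allclose_large(x, y, rtol=1e-6, atol=0, ltol=1e30):
    """assert x and y are allclose or x is large if y is inf"""
    mask_inf = np.isinf(y) & ~np.isinf(x)
    assert_allclose(x[~mask_inf], y[~mask_inf], rtol=rtol, atol=atol)
    assert_array_less(ltol, x[mask_inf])

# y has an exact pole where x merely overflows to a huge finite value.
x = np.array([1.0, 2.0, 1e300])
y = np.array([1.0, 2.0, np.inf])

assert_allclose_large(x, y)   # passes: finite entries match, and 1e300 > ltol
# A plain assert_allclose(x, y) would fail here, since 1e300 is not inf.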
The only thing that's not covered is the install and use of Vagrant itself, because the former is platform-dependent and the latter is basically only the very first terminal line of http://docs.vagrantup.com/v2/getting-started/ (but read on for a bit, it's useful). Feedback on how to document that repo better are very welcome. I'll be looking at improving the documentation of how to release and what a release manager does as well soon. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Thu Apr 30 16:40:18 2015 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Thu, 30 Apr 2015 22:40:18 +0200 Subject: [Numpy-discussion] numpy vendor repo In-Reply-To: References: Message-ID: On Thu, Apr 30, 2015 at 9:32 PM, Ralf Gommers wrote: > > > On Mon, Apr 27, 2015 at 5:20 PM, Ralf Gommers > wrote: > >> >> >> >> On Mon, Apr 27, 2015 at 5:04 PM, Peter Cock >> wrote: >> >>> On Mon, Apr 27, 2015 at 1:04 PM, Ralf Gommers >>> wrote: >>> > >>> > Done in the master branch of https://github.com/rgommers/vendor. I >>> think >>> > that "numpy-vendor" is a better repo name than "vendor" (which is >>> pretty >>> > much meaningless outside of the numpy github org), so I propose to >>> push my >>> > master branch to https://github.com/numpy/numpy-vendor and remove the >>> > current https://github.com/numpy/vendor repo. >>> > >>> > I'll do this in a couple of days, unless there are objections by then. >>> > >>> > Ralf >>> >>> Can you not just rename the repository on GitHub? >>> >> >> Yes, that is possible. The difference is small in this case (retaining >> the 1 closed PR fixing a typo; there are no issues), but after looking it >> up I think renaming is a bit less work than creating a new repo. So I'll >> rename. >> > > This is done now. > One other thing: would be good to agree how we deal with updates to that repo. The users of numpy-vendor can be counted on one hand at the moment, so we probably should be less formal about it then for our other repos. How about everyone can push simple doc and maintenance updates directly, and more interesting changes go through a PR that the author can merge himself after a couple of days? That at least ensures that everyone who follows the repo gets notified on nontrivial changes. Ralf > > If anyone wants to give it a try, the instructions in README.txt should > work for producing working Windows installers. The only thing that's not > covered is the install and use of Vagrant itself, because the former is > platform-dependent and the latter is basically only the very first terminal > line of http://docs.vagrantup.com/v2/getting-started/ (but read on for a > bit, it's useful). > > Feedback on how to document that repo better are very welcome. I'll be > looking at improving the documentation of how to release and what a release > manager does as well soon. > > Cheers, > Ralf > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: