From ralf.gommers at gmail.com Sat Feb 1 10:40:16 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 1 Feb 2014 16:40:16 +0100 Subject: [Numpy-discussion] De Bruijn sequence In-Reply-To: References: Message-ID: On Fri, Jan 24, 2014 at 12:26 AM, Vincent Davis wrote: > I happen to be working with De Bruijn sequences. Is there any interest in > this being part of numpy/scipy? > > https://gist.github.com/vincentdavis/8588879 > That looks like an old copy of GPL code from Sage: http://git.sagemath.org/sage.git/tree/src/sage/combinat/debruijn_sequence.pyx Besides the licensing issue, it doesn't really belong in scipy and certainly not in numpy imho. Ralf > Vincent Davis > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From p.j.a.cock at googlemail.com Sat Feb 1 10:57:32 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Sat, 1 Feb 2014 15:57:32 +0000 Subject: [Numpy-discussion] De Bruijn sequence In-Reply-To: References: Message-ID: On Sat, Feb 1, 2014 at 3:40 PM, Ralf Gommers wrote: > > On Fri, Jan 24, 2014 at 12:26 AM, Vincent Davis > wrote: >> >> I happen to be working with De Bruijn sequences. Is there any interest in >> this being part of numpy/scipy? >> >> https://gist.github.com/vincentdavis/8588879 > > That looks like an old copy of GPL code from Sage: > http://git.sagemath.org/sage.git/tree/src/sage/combinat/debruijn_sequence.pyx > > Besides the licensing issue, it doesn't really belong in scipy and certainly > not in numpy imho. > > Ralf If it is GPL code that would be a problem, but in terms of scope it might fit under Biopython given how much De Bruijn graphs are used in current sequence analysis. 
Regards, Peter

From vincent at vincentdavis.net Sat Feb 1 11:28:57 2014
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 1 Feb 2014 09:28:57 -0700
Subject: [Numpy-discussion] De Bruijn sequence
In-Reply-To: References: Message-ID:

Sage cites "The algorithm used is from Frank Ruskey's 'Combinatorial Generation'." It looks like Frank Ruskey's original publication on the algorithm was Joe Sawada and Frank Ruskey, "An Efficient Algorithm for Generating Necklaces with Fixed Density", SIAM Journal on Computing 29:671-684, 1999.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.2.5237&rep=rep1&type=pdf

If you take a look at this page http://www.theory.csc.uvic.ca/~cos/inf/neck/NecklaceInfo.html you can get a C or Pascal program; the license in the attached program is:

/****************************************************************************
 * C program to generate necklaces, Lyndon words, and De Bruijn             *
 * sequences. The algorithm is CAT and is described in the book             *
 * "Combinatorial Generation." This program was obtained from the           *
 * (Combinatorial) Object Server, COS, at http://www.theory.csc.uvic.ca/    *
 * The inputs are n, the length of the string, k, the arity of the          *
 * string, and density, the maximum number of non-0's in the string.        *
 * The De Bruijn option doesn't make sense unless density >= n.             *
 * The program can be modified, translated to other languages, etc.,        *
 * so long as proper acknowledgement is given (author and source).          *
 * Programmer: Frank Ruskey (1994), translated to C by Joe Sawada           *
 ****************************************************************************/

Vincent Davis
720-301-3003

On Sat, Feb 1, 2014 at 8:57 AM, Peter Cock wrote:
> On Sat, Feb 1, 2014 at 3:40 PM, Ralf Gommers wrote:
> > On Fri, Jan 24, 2014 at 12:26 AM, Vincent Davis <vincent at vincentdavis.net> wrote:
> >> I happen to be working with De Bruijn sequences. Is there any interest in
> >> this being part of numpy/scipy?
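[Editor's note: the necklace construction Vincent describes above is easy to re-derive from the published FKM (Fredricksen/Kessler/Maiorana) recursion rather than from the Sage or COS sources, which sidesteps the licensing question entirely. The sketch below is such a re-derivation in pure Python; the function name `de_bruijn` is my own, not from the thread.]

```python
def de_bruijn(k, n):
    """Sketch of a de Bruijn sequence B(k, n) generator via the FKM
    construction: concatenate, in lexicographic order, the aperiodic
    prefixes of the necklaces of length n over a k-letter alphabet."""
    a = [0] * (k * n)       # working string, a[0] is a sentinel
    sequence = []

    def db(t, p):
        if t > n:
            # keep the prefix only when the period p divides n
            if n % p == 0:
                sequence.extend(a[1:p + 1])
        else:
            a[t] = a[t - p]
            db(t + 1, p)
            for j in range(a[t - p] + 1, k):
                a[t] = j
                db(t + 1, t)

    db(1, 1)
    return sequence

print(de_bruijn(2, 3))  # [0, 0, 0, 1, 0, 1, 1, 1], i.e. the cyclic word 00010111
```

Read cyclically, the k**n symbols contain every length-n word over the alphabet exactly once, which is the defining property of B(k, n).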
> >> > >> https://gist.github.com/vincentdavis/8588879 > > > > That looks like an old copy of GPL code from Sage: > > http://git.sagemath.org/sage.git/tree/src/sage/combinat/debruijn_sequence.pyx > > > > Besides the licensing issue, it doesn't really belong in scipy and certainly > > not in numpy imho. > > > > Ralf > > If it is GPL code that would be a problem, but in terms of scope it might > fit under Biopython given how much De Bruijn graphs are used in > current sequence analysis. > > Regards, > > Peter -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmkleffner at gmail.com Sun Feb 2 07:25:19 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sun, 2 Feb 2014 13:25:19 +0100 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: Message-ID: Concerning numpy-MKL licence I refer to this question on the Intel Forum: http://software.intel.com/en-us/forums/topic/328344 "Question on Redistribution related to numpy/scipy" In the case of numpy-MKL the MKL binaries are statically linked to the pyd-files. Given the usefulness, performance and robustness of the MKL-based binaries a definite answer to this question would be desirable. Say: "Can I use and re-redistribute a product with a precompiled numpy-MKL in a commercial enviroment without the need to by a Intel licence? Carl 2014-01-26 Dinesh Vadhia : > This conversation gets discussed often with Numpy developers but since > the requirement for optimized Blas is pretty common these days, how about > distributing Numpy with OpenBlas by default? People who don't want > optimized BLAS or OpenBLAS can then edit the site.cfg file to add/remove. > I can never remember if Numpy comes with Atlas by default but either way, > if using MKL is not feasible because of its licensing issues then Numpy > has to be re-compiled with OpenBLAS (for example). Why not make it easier > for developers to use Numpy with an in-built optimized Blas. 
> > Btw, just in case some folks from Intel are listening: how about > releasing MKL binaries for all platforms for developers to do with it what > they want ie. free. You know it makes sense! > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Sun Feb 2 08:27:00 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 2 Feb 2014 13:27:00 +0000 (UTC) Subject: [Numpy-discussion] MKL and OpenBLAS References: Message-ID: <1802721243413039714.559959sturla.molden-gmail.com@news.gmane.org> Carl Kleffner wrote: > In the case of numpy-MKL the MKL binaries are statically linked to the > pyd-files. Given the usefulness, performance and robustness of the > MKL-based binaries a definite answer to this question would be desirable. > Say: "Can I use and re-redistribute a product with a precompiled > numpy-MKL in a commercial enviroment without the need to by a Intel licence? > I don't see why this is relevant. If you make commercial software chances are you can afford a commercial license for Intel's C++ or Fortran compiler. If you don't, you don't charge your customers enough. Also consider this: Can software packed and linked with MKL be sold in a store? That is also redistribution, and the store is likely not to own an MKL license. There is thus only one reasonable answer. And besides, if you make commercial software, chances are your solicitor verifies the license rights. If you don't consult a solicitor, that would be at your own risk. Sturla From charlesr.harris at gmail.com Sun Feb 2 12:06:57 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 2 Feb 2014 10:06:57 -0700 Subject: [Numpy-discussion] Indexing changes in 1.9 Message-ID: Sebastian has done a lot of work to refactor/rationalize numpy indexing. 
The changes are extensive enough that it would be good to have more public review, so here is the release note. The NumPy indexing has seen a complete rewrite in this version. This makes > most advanced integer indexing operations much faster and should have no > other implications. > However some subtle changes and deprecations were introduced in advanced > indexing operations: > > * Boolean indexing into scalar arrays will always return a new 1-d array. > This means that ``array(1)[array(True)]`` gives ``array([1])`` and > not the original array. > * Advanced indexing into one dimensional arrays used to have > (undocumented) > special handling regarding repeating the value array in assignments > when the shape of the value array was too small or did not match. > Code using this will raise an error. For compatibility you can use > ``arr.flat[index] = values``, which uses the old code branch. > * The iteration order over advanced indexes used to be always C-order. > In NumPy 1.9. the iteration order adapts to the inputs and is not > guaranteed (with the exception of a *single* advanced index which is > never reversed for compatibility reasons). This means that the result > is > undefined if multiple values are assigned to the same element. > An example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may > set ``arr[0, 1]`` to either 1 or 2. > * Equivalent to the iteration order, the memory layout of the advanced > indexing result is adapted for faster indexing and cannot be predicted. > * All indexing operations return a view or a copy. No indexing operation > will return the original array object. > * In the future Boolean array-likes (such as lists of python bools) > will always be treated as Boolean indexes and Boolean scalars > (including > python `True`) will be a legal *boolean* index. At this time, this is > already the case for scalar arrays to allow the general > ``positive = a[a > 0]`` to work when ``a`` is zero dimensional. 
> * In NumPy 1.8 it was possible to use `array(True)` and `array(False)` > equivalent to 1 and 0 if the result of the operation was a scalar. > This will raise an error in NumPy 1.9 and, as noted above, treated as a > boolean index in the future. > * All non-integer array-likes are deprecated, object arrays of custom > integer like objects may have to be cast explicitly. > * The error reporting for advanced indexing is more informative, however > the error type has changed in some cases. (Broadcasting errors of > indexing arrays are reported as `IndexError`) > * Indexing with more then one ellipsis (`...`) is deprecated. > Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun Feb 2 12:17:59 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 2 Feb 2014 10:17:59 -0700 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: References: Message-ID: On Sun, Feb 2, 2014 at 10:06 AM, Charles R Harris wrote: > Sebastian has done a lot of work to refactor/rationalize numpy indexing. > The changes are extensive enough that it would be good to have more public > review, so here is the release note. > > The NumPy indexing has seen a complete rewrite in this version. This makes >> most advanced integer indexing operations much faster and should have no >> other implications. >> However some subtle changes and deprecations were introduced in advanced >> indexing operations: >> >> * Boolean indexing into scalar arrays will always return a new 1-d >> array. >> This means that ``array(1)[array(True)]`` gives ``array([1])`` and >> not the original array. >> * Advanced indexing into one dimensional arrays used to have >> (undocumented) >> special handling regarding repeating the value array in assignments >> when the shape of the value array was too small or did not match. >> Code using this will raise an error. 
For compatibility you can use >> ``arr.flat[index] = values``, which uses the old code branch. >> * The iteration order over advanced indexes used to be always C-order. >> In NumPy 1.9. the iteration order adapts to the inputs and is not >> guaranteed (with the exception of a *single* advanced index which is >> never reversed for compatibility reasons). This means that the result >> is >> undefined if multiple values are assigned to the same element. >> An example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may >> set ``arr[0, 1]`` to either 1 or 2. >> * Equivalent to the iteration order, the memory layout of the advanced >> indexing result is adapted for faster indexing and cannot be >> predicted. >> * All indexing operations return a view or a copy. No indexing operation >> will return the original array object. >> * In the future Boolean array-likes (such as lists of python bools) >> will always be treated as Boolean indexes and Boolean scalars >> (including >> python `True`) will be a legal *boolean* index. At this time, this is >> already the case for scalar arrays to allow the general >> ``positive = a[a > 0]`` to work when ``a`` is zero dimensional. >> * In NumPy 1.8 it was possible to use `array(True)` and `array(False)` >> equivalent to 1 and 0 if the result of the operation was a scalar. >> This will raise an error in NumPy 1.9 and, as noted above, treated as >> a >> boolean index in the future. >> * All non-integer array-likes are deprecated, object arrays of custom >> integer like objects may have to be cast explicitly. >> * The error reporting for advanced indexing is more informative, however >> the error type has changed in some cases. (Broadcasting errors of >> indexing arrays are reported as `IndexError`) >> * Indexing with more then one ellipsis (`...`) is deprecated. >> > > Thoughts? > > The PR is #3798 if you want to test it. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From travis at continuum.io Sun Feb 2 14:11:01 2014 From: travis at continuum.io (Travis Oliphant) Date: Sun, 2 Feb 2014 13:11:01 -0600 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: References: Message-ID: This sounds like a great and welcome work and improvements. Does it make sense to also do something about the behavior of advanced indexing when slices are interleaved between lists and integers. I know that jay borque has some preliminary work to fix this. There are a some straightforward fixes -- like doing iterative application of indexing in those cases which would be more sensical in the cases where current code gets tripped up. Travis On Feb 2, 2014 11:07 AM, "Charles R Harris" wrote: > Sebastian has done a lot of work to refactor/rationalize numpy indexing. > The changes are extensive enough that it would be good to have more public > review, so here is the release note. > > The NumPy indexing has seen a complete rewrite in this version. This makes >> most advanced integer indexing operations much faster and should have no >> other implications. >> However some subtle changes and deprecations were introduced in advanced >> indexing operations: >> >> * Boolean indexing into scalar arrays will always return a new 1-d >> array. >> This means that ``array(1)[array(True)]`` gives ``array([1])`` and >> not the original array. >> * Advanced indexing into one dimensional arrays used to have >> (undocumented) >> special handling regarding repeating the value array in assignments >> when the shape of the value array was too small or did not match. >> Code using this will raise an error. For compatibility you can use >> ``arr.flat[index] = values``, which uses the old code branch. >> * The iteration order over advanced indexes used to be always C-order. >> In NumPy 1.9. the iteration order adapts to the inputs and is not >> guaranteed (with the exception of a *single* advanced index which is >> never reversed for compatibility reasons). 
This means that the result >> is >> undefined if multiple values are assigned to the same element. >> An example for this is ``arr[[0, 0], [1, 1]] = [1, 2]``, which may >> set ``arr[0, 1]`` to either 1 or 2. >> * Equivalent to the iteration order, the memory layout of the advanced >> indexing result is adapted for faster indexing and cannot be >> predicted. >> * All indexing operations return a view or a copy. No indexing operation >> will return the original array object. >> * In the future Boolean array-likes (such as lists of python bools) >> will always be treated as Boolean indexes and Boolean scalars >> (including >> python `True`) will be a legal *boolean* index. At this time, this is >> already the case for scalar arrays to allow the general >> ``positive = a[a > 0]`` to work when ``a`` is zero dimensional. >> * In NumPy 1.8 it was possible to use `array(True)` and `array(False)` >> equivalent to 1 and 0 if the result of the operation was a scalar. >> This will raise an error in NumPy 1.9 and, as noted above, treated as >> a >> boolean index in the future. >> * All non-integer array-likes are deprecated, object arrays of custom >> integer like objects may have to be cast explicitly. >> * The error reporting for advanced indexing is more informative, however >> the error type has changed in some cases. (Broadcasting errors of >> indexing arrays are reported as `IndexError`) >> * Indexing with more then one ellipsis (`...`) is deprecated. >> > > Thoughts? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
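[Editor's note: the behavioural changes listed in the release note above can be spot-checked directly against a NumPy that contains the rewrite, i.e. 1.9 or later. A small sketch, using only behaviour the note itself describes:]

```python
import numpy as np

a = np.array(1)                 # zero-dimensional array

# Boolean indexing into a scalar array returns a new 1-d array...
r = a[np.array(True)]
assert r.shape == (1,)
assert r is not a               # ...never the original array object

# ...which is what lets the general ``positive = a[a > 0]`` idiom
# work even when ``a`` is zero dimensional:
positive = a[a > 0]
assert positive.shape == (1,)

# "All indexing operations return a view or a copy. No indexing
# operation will return the original array object":
b = np.arange(6)
v = b[...]
assert v is not b               # a fresh view object...
assert v.base is b              # ...onto the same memory
```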
URL: From cmkleffner at gmail.com Sun Feb 2 14:11:26 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Sun, 2 Feb 2014 20:11:26 +0100 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: <1802721243413039714.559959sturla.molden-gmail.com@news.gmane.org> References: <1802721243413039714.559959sturla.molden-gmail.com@news.gmane.org> Message-ID: If you work in an academia world it can be relevant once third parties are involved in a bigger project. A situation may be reached, where you just have to prove the license situation of all of your software components. Numpy and scipy is 'selled' as BSD or MIT based foundation for scientific software without components with copyleft licences. For the MKL part a clear statement would be welcome. Otherwise the usage of MKL based binaries has to be avoided in such situations, even if you don't sell something. 2014-02-02 Sturla Molden : > Carl Kleffner wrote: > > > In the case of numpy-MKL the MKL binaries are statically linked to the > > pyd-files. Given the usefulness, performance and robustness of the > > MKL-based binaries a definite answer to this question would be desirable. > > Say: "Can I use and re-redistribute a product with a precompiled > > numpy-MKL in a commercial enviroment without the need to by a Intel > licence? > > > > I don't see why this is relevant. > > If you make commercial software chances are you can afford a commercial > license for Intel's C++ or Fortran compiler. If you don't, you don't charge > your customers enough. > > Also consider this: Can software packed and linked with MKL be sold in a > store? That is also redistribution, and the store is likely not to own an > MKL license. There is thus only one reasonable answer. > > And besides, if you make commercial software, chances are your solicitor > verifies the license rights. If you don't consult a solicitor, that would > be at your own risk. 
>
> Sturla

From mads.ipsen at gmail.com Sun Feb 2 14:58:14 2014
From: mads.ipsen at gmail.com (Mads Ipsen)
Date: Sun, 02 Feb 2014 20:58:14 +0100
Subject: [Numpy-discussion] Fast decrementation of indices
Message-ID: <52EEA356.1070707@gmail.com>

Hi,

I have run into a potential 'for loop' bottleneck. Let me outline:

The following array describes bonds (connections) in a benzene molecule

b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11],
     [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5]]

i.e. bond 0 connects atoms 0 and 5, bond 1 connects atoms 0 and 6, etc. In practical examples, the list can be much larger (N > 100,000 connections).

Suppose atoms with indices a = [1,2,3,7,8] are deleted; then all bonds connecting those atoms must be deleted. I achieve this by doing

i_0 = numpy.in1d(b[0], a)
i_1 = numpy.in1d(b[1], a)
b_i = numpy.where(i_0 | i_1)[0]
b = b[:,~(i_0 | i_1)]

If you find this approach lacking, feel free to comment.

This results in the following updated bond list

b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11],
     [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]]

This list is however not correct: since atoms [1,2,3,7,8] have been deleted, the remaining atoms with indices larger than the deleted atoms must be decremented. I do this as follows:

for i in a:
    b = numpy.where(b > i, b - 1, b)    (*)

yielding the correct result

b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6],
     [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]]

The Python for loop in (*) may easily run 50,000 iterations. Is there a smart way to utilize numpy functionality to avoid this?
Thanks and best regards, Mads -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From ndarray at mac.com Sun Feb 2 17:43:39 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Sun, 2 Feb 2014 17:43:39 -0500 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: <52EEA356.1070707@gmail.com> References: <52EEA356.1070707@gmail.com> Message-ID: On Sun, Feb 2, 2014 at 2:58 PM, Mads Ipsen wrote: > Since atoms [1,2,3,7,8] have been > deleted, the remaining atoms with indices larger than the deleted atoms > must be decremented. > Let >>> x array([[ 0, 1, 2, 3], [ 4, 5, 6, 7], [ 8, 9, 10, 11]]) and >>> i = [1, 0, 2] Create a shape of x matrix with 1's at (k, i[k]) and zeros elsewhere >>> b = zeros_like(x) >>> b.put(i + arange(3)*4 + 1, 1) # there must be a simpler way >>> x - b.cumsum(1) array([[ 0, 1, 1, 2], [ 4, 4, 5, 6], [ 8, 9, 10, 10]]) seems to be the result you want. -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Sun Feb 2 17:57:51 2014 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sun, 2 Feb 2014 14:57:51 -0800 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: <52EEA356.1070707@gmail.com> References: <52EEA356.1070707@gmail.com> Message-ID: Cannot test right now, but np.unique(b, return_inverse=True)[1].reshape(2, -1) should do what you are after, I think. On Feb 2, 2014 11:58 AM, "Mads Ipsen" wrote: > Hi, > > I have run into a potential 'for loop' bottleneck. 
Let me outline: > > The following array describes bonds (connections) in a benzene molecule > > b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, > 8, 9, 10, 11], > [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, > 2, 3, 4, 5]] > > ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In > practical examples, the list can be much larger (N > 100.000 connections. > > Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds > connecting those atoms must be deleted. I achieve this doing > > i_0 = numpy.in1d(b[0], a) > i_1 = numpy.in1d(b[1], a) > b_i = numpy.where(i_0 | i_1)[0] > b = b[:,~(i_0 | i_1)] > > If you find this approach lacking, feel free to comment. > > This results in the following updated bond list > > b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11] > [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]] > > This list is however not correct: Since atoms [1,2,3,7,8] have been > deleted, the remaining atoms with indices larger than the deleted atoms > must be decremented. I do this as follows: > > for i in a: > b = numpy.where(b > i, bonds-1, bonds) (*) > > yielding the correct result > > b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6], > [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]] > > The Python for loop in (*) may easily contain 50.000 iteration. Is there > a smart way to utilize numpy functionality to avoid this? > > Thanks and best regards, > > Mads > > -- > +---------------------------------------------------------+ > | Mads Ipsen | > +----------------------+----------------------------------+ > | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | > | DK-2500 Valby | email: mads.ipsen at gmail.com | > | Denmark | map : www.tinyurl.com/ns52fpa | > +----------------------+----------------------------------+ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
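[Editor's note: one more vectorized route, not proposed in the thread itself (my addition, so treat it as a sketch): each surviving index must drop by the number of deleted indices below it, and ``np.searchsorted`` computes exactly that count in one call. Unlike relabelling with ``np.unique``, which compacts only the atom indices that actually occur in the bond list, this reproduces Mads's expected numbering even when a surviving atom (atom 9 here) appears in no bond.]

```python
import numpy as np

# Data from the thread: the filtered bond list after dropping all
# bonds that touch the deleted atoms a = [1, 2, 3, 7, 8]
b = np.array([[0, 0, 4, 4, 5, 5, 5, 6, 10, 11],
              [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]])
a = np.array([1, 2, 3, 7, 8])   # must be sorted for searchsorted

# searchsorted(a, v) is the number of deleted indices strictly
# smaller than v, which is precisely the decrement v needs:
b_new = b - np.searchsorted(a, b)
print(b_new)
# [[0 0 1 1 2 2 2 3 5 6]
#  [2 3 5 2 1 6 0 0 1 2]]
```

This replaces the loop over the deleted indices with a single O(M log K) operation (M bonds, K deletions), independent of how many atoms were removed.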
URL:

From sturla.molden at gmail.com Sun Feb 2 19:36:24 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 3 Feb 2014 00:36:24 +0000 (UTC)
Subject: [Numpy-discussion] MKL and OpenBLAS
References: <1802721243413039714.559959sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1204694997413078883.454443sturla.molden-gmail.com@news.gmane.org>

Carl Kleffner wrote:

> If you work in an academia world it can be relevant once third parties
> are involved in a bigger project. A situation may be reached, where you
> just have to prove the license situation of all of your software components.

If you involve third parties outside academia, you need a commercial license. Binaries with an academic license are for academic use only.

Personally I pay Enthought Inc. to provide me with NumPy, and then it's their responsibility to work out the license details in their software stack. I am licensing my Python software from Enthought Inc., and I cannot go through and verify every single one of their licenses. If asked I will just refer to the license that comes with my Canopy subscription, and that will be the end of it.

> Numpy and scipy is 'selled' as BSD or MIT based foundation for scientific
> software without components with copyleft licences. For the MKL part a
> clear statement would be welcome. Otherwise the usage of MKL based
> binaries has to be avoided in such situations, even if you don't sell something.

That is utter nonsense. MKL is no different from any other commercial software. With this backwards way of thinking, no commercial software could ever be used. You could e.g. never use Windows, because you might be asked to prove Microsoft's license for third-party libraries used by their operating system. That is just bullshit. I might be asked to prove my license with Microsoft, but it's Microsoft's responsibility to work out the internal license details for the software they sell.
Sturla

From sturla.molden at gmail.com Sun Feb 2 19:42:20 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 3 Feb 2014 00:42:20 +0000 (UTC)
Subject: [Numpy-discussion] MKL and OpenBLAS
References: <20140126170640.GB4256@shinobi> <52E55065.5020002@googlemail.com> <566884211412464364.536717sturla.molden-gmail.com@news.gmane.org> <52E5887D.3070805@googlemail.com> <27092663412469230.633377sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1816484237413080660.495142sturla.molden-gmail.com@news.gmane.org>

Sturla Molden wrote:

> Yes, it seems to be a GNU problem:
>
> http://bisqwit.iki.fi/story/howto/openmp/#OpenmpAndFork
>
> This Howto also claims Intel compilers are not affected.

It seems another patch has been proposed to the libgomp team today:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035

Unfortunately I don't think the libgomp team really appreciates the absurdity that a call to a BLAS function like DGEMM can interfere with a subsequent call to fork. Mixing multithreading and forks is one thing, but the raison d'etre for OpenMP is to simplify vectorization of serial code.

Sturla

From dineshbvadhia at hotmail.com Mon Feb 3 03:41:29 2014
From: dineshbvadhia at hotmail.com (Dinesh Vadhia)
Date: Mon, 3 Feb 2014 00:41:29 -0800
Subject: [Numpy-discussion] Indexing changes in 1.9
In-Reply-To: References: Message-ID:

Does the numpy indexing refactoring address the performance of fancy indexing highlighted in Wes McKinney's blog some years back - http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to be preferable to fancy indexing?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From davidmenhur at gmail.com Mon Feb 3 04:04:49 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 3 Feb 2014 10:04:49 +0100 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: <52EEA356.1070707@gmail.com> References: <52EEA356.1070707@gmail.com> Message-ID: On 2 February 2014 20:58, Mads Ipsen wrote: > ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In > practical examples, the list can be much larger (N > 100.000 connections. > Perhaps you should consider an alternative approach. You could consider it a graph, and you could use Networkx or Scipy to work with them (provided it actually works well with the rest of your problem) In the case of Scipy, the graph is described by its adjacency matrix, and you just want to delete a row and a column. But, in any case, not knowing at all what is your overall project, renumbering nodes is not something one has to usually do when working with graphs, except for final results. The labels are that, labels, with no further meaning. /David. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Mon Feb 3 06:38:48 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Mon, 3 Feb 2014 12:38:48 +0100 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: References: <52EEA356.1070707@gmail.com> Message-ID: Seconding Jaime; I use this trick in mesh manipulations a lot as well. There are a lot of graph-type manipulations you can express effectively in numpy using np.unique and related functionality. On Sun, Feb 2, 2014 at 11:57 PM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > Cannot test right now, but np.unique(b, return_inverse=True)[1].reshape(2, > -1) should do what you are after, I think. > On Feb 2, 2014 11:58 AM, "Mads Ipsen" wrote: > >> Hi, >> >> I have run into a potential 'for loop' bottleneck. 
Let me outline: >> >> The following array describes bonds (connections) in a benzene molecule >> >> b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, >> 8, 9, 10, 11], >> [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, >> 2, 3, 4, 5]] >> >> ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. In >> practical examples, the list can be much larger (N > 100.000 connections. >> >> Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds >> connecting those atoms must be deleted. I achieve this doing >> >> i_0 = numpy.in1d(b[0], a) >> i_1 = numpy.in1d(b[1], a) >> b_i = numpy.where(i_0 | i_1)[0] >> b = b[:,~(i_0 | i_1)] >> >> If you find this approach lacking, feel free to comment. >> >> This results in the following updated bond list >> >> b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11] >> [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]] >> >> This list is however not correct: Since atoms [1,2,3,7,8] have been >> deleted, the remaining atoms with indices larger than the deleted atoms >> must be decremented. I do this as follows: >> >> for i in a: >> b = numpy.where(b > i, bonds-1, bonds) (*) >> >> yielding the correct result >> >> b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6], >> [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]] >> >> The Python for loop in (*) may easily contain 50.000 iteration. Is there >> a smart way to utilize numpy functionality to avoid this? >> >> Thanks and best regards, >> >> Mads >> >> -- >> +---------------------------------------------------------+ >> | Mads Ipsen | >> +----------------------+----------------------------------+ >> | G?seb?ksvej 7, 4. 
tv | phone: +45-29716388 | >> | DK-2500 Valby | email: mads.ipsen at gmail.com | >> | Denmark | map : www.tinyurl.com/ns52fpa | >> +----------------------+----------------------------------+ >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rlw at stsci.edu Mon Feb 3 08:36:21 2014 From: rlw at stsci.edu (Rick White) Date: Mon, 3 Feb 2014 13:36:21 +0000 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: References: Message-ID: <12786008-CA50-457B-BC56-4A4D68E6C738@stsci.edu> I think you'll find the algorithm below to be a lot faster, especially if the arrays are big. Checking each array index against the list of included or excluded elements is must slower than simply creating a secondary array and looking up whether the elements are included or not. b = np.array([ [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11], [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5] ]) a = [1,2,3,7,8] keepdata = np.ones(12, dtype=np.bool) keepdata[a] = False w = np.where(keepdata[b[0]] & keepdata[b[1]]) newindex = keepdata.cumsum()-1 c = newindex[b[:,w[0]]] Cheers, Rick On 2 February 2014 20:58, Mads Ipsen wrote: > Hi, > > I have run into a potential 'for loop' bottleneck. Let me outline: > > The following array describes bonds (connections) in a benzene molecule > > b = [[0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, > 8, 9, 10, 11], > [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, > 2, 3, 4, 5]] > > ie. bond 0 connects atoms 0 and 5, bond 1 connects atom 0 and 6, etc. 
In > practical examples, the list can be much larger (N > 100.000 connections. > > Suppose atoms with indices a = [1,2,3,7,8] are deleted, then all bonds > connecting those atoms must be deleted. I achieve this doing > > i_0 = numpy.in1d(b[0], a) > i_1 = numpy.in1d(b[1], a) > b_i = numpy.where(i_0 | i_1)[0] > b = b[:,~(i_0 | i_1)] > > If you find this approach lacking, feel free to comment. > > This results in the following updated bond list > > b = [[0, 0, 4, 4, 5, 5, 5, 6, 10, 11] > [5, 6, 10, 5, 4, 11, 0, 0, 4, 5]] > > This list is however not correct: Since atoms [1,2,3,7,8] have been > deleted, the remaining atoms with indices larger than the deleted atoms > must be decremented. I do this as follows: > > for i in a: > b = numpy.where(b > i, bonds-1, bonds) (*) > > yielding the correct result > > b = [[0, 0, 1, 1, 2, 2, 2, 3, 5, 6], > [2, 3, 5, 2, 1, 6, 0, 0, 1, 2]] > > The Python for loop in (*) may easily contain 50.000 iteration. Is there > a smart way to utilize numpy functionality to avoid this? > > Thanks and best regards, > > Mads > > -- > +---------------------------------------------------------+ > | Mads Ipsen | > +----------------------+----------------------------------+ > | G?seb?ksvej 7, 4. tv | phone: +45-29716388 | > | DK-2500 Valby | email: mads.ipsen at gmail.com | > | Denmark | map : www.tinyurl.com/ns52fpa | > +----------------------+----------------------------------+ From sebastian at sipsolutions.net Mon Feb 3 13:19:57 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 03 Feb 2014 19:19:57 +0100 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: References: Message-ID: <1391451597.2972.35.camel@sebastian-laptop> On Mon, 2014-02-03 at 00:41 -0800, Dinesh Vadhia wrote: > Does the numpy indexing refactorizing address the performance of fancy > indexing highlighted in wes mckinney's blog some years back - > http://wesmckinney.com/blog/?p=215 - where numpy.take() was shown to > be preferable than fancy indexing? 
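For reference, the operation benchmarked in that blog post can be sketched as below; both spellings select exactly the same rows, so any timing gap between them is pure per-call overhead (the array and index names here are illustrative, not taken from the post):

```python
import numpy as np

arr = np.random.randn(10000, 5)          # data to gather rows from
indexer = np.random.permutation(10000)   # shuffled row indices

rows_fancy = arr[indexer]                # advanced ("fancy") indexing
rows_take = arr.take(indexer, axis=0)    # same selection via np.take

# identical results; only the internal code paths differ
assert np.array_equal(rows_fancy, rows_take)
```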
Well, technically there is a pretty big difference in how those two approach the problem and the indexing is much more general[1] and (especially now) optimized for large subspaces[2]. So while the new code is faster even for small subspaces, the linked test hits the performance wise worst spot (of course make the thing 10000x2 will be a bit worse): In [4]: arr = np.random.randn(10000, 5) In [5]: indexer = np.arange(10000) In [6]: random.shuffle(indexer) In [7]: %timeit arr[indexer] 1000 loops, best of 3: 300 us per loop In [8]: %timeit arr.take(indexer, axis=0) 10000 loops, best of 3: 95.4 us per loop # With a larger subspace the result is (now) different though: In [9]: arr = np.random.randn(10000, 100) In [10]: %timeit arr[indexer] 100 loops, best of 3: 4.85 ms per loop In [11]: %timeit arr.take(indexer, axis=0) 100 loops, best of 3: 5.02 ms per loop So the performance in this use case improved (maybe a factor of 3, which is neither shabby nor sensational), but the subspace design and no big special cases for this operation leads to overhead. You could likely squeeze out a bit, but squeezing out a lot is non-trivial [3]. However this should be one of the few cases where take should still outperform advanced indexing. - Sebastian [1] Mostly multiple integer array indices, which take does not support and arbitrary memory order. [2] subspace here means the non-advanced indexing part. In the case of arr[integer_array, :] the (or a) subspace is arr[0, :]. [3] Basically, we have the value and the indexing arrays to iterate, since they are arbitrary (unlike take which assumes contiguous) and broadcasting etc. I did not try to squeeze them into a single iterator when there is a subspace (it is complicating, basically I want buffering for the index arrays, but if you use buffering on the value array you can't easily get the pointer into the subspace, etc.). 
This means you have two iterators, which again means that the external loop optimization is not quite trivial (since the two inner loops can have different sizes). So *if* you were to just add the logic for the different sized inner loops, then you could use the external loop optimization of NpyIter and probably remove the overhead causing (most of) this discrepancy. Though I would suggest timing first to be sure this is the reason ^^. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Mon Feb 3 13:26:28 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Mon, 03 Feb 2014 19:26:28 +0100 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: References: Message-ID: <1391451988.2972.41.camel@sebastian-laptop> On Sun, 2014-02-02 at 13:11 -0600, Travis Oliphant wrote: > This sounds like a great and welcome work and improvements. > > Does it make sense to also do something about the behavior of advanced > indexing when slices are interleaved between lists and integers. > > I know that jay borque has some preliminary work to fix this. There > are a some straightforward fixes -- like doing iterative application > of indexing in those cases which would be more sensical in the cases > where current code gets tripped up. > I guess you are talking about the funky transposing logic and maybe the advanced indexing logic as such? I didn't really think about changing any of that, not sure if we easily can? Personally, I always wondered if it would make sense to add some new type of indexing mechanism to switch to R/matlab style non-advanced integer-array indexing. I don't think this will make it substantially easier to do (the basic logic remains the same -- we need an extra/different preparation and then transpose the result differently), though it might be a bit more obvious where/how to plug it in. 
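To make the contrast concrete, here is a small sketch (illustrative values, not from the original mails) of how NumPy's advanced indexing differs from the R/Matlab-style selection mentioned above; NumPy already offers the latter, outer-product behaviour through np.ix_:

```python
import numpy as np

a = np.arange(12).reshape(3, 4)

# NumPy advanced indexing pairs the index arrays element-wise ("zip"):
zipped = a[[0, 2], [1, 3]]            # picks a[0, 1] and a[2, 3]

# R/Matlab-style integer-array indexing takes the cross product of the
# index sets; in NumPy that is spelled with np.ix_:
outer = a[np.ix_([0, 2], [1, 3])]     # rows {0, 2} crossed with cols {1, 3}

assert zipped.tolist() == [1, 11]
assert outer.tolist() == [[1, 3], [9, 11]]
```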
But it seems very unlikely I will look into that in the near future (but if someone wants hints on how to go about it, just ask). - Sebastian > Travis > > On Feb 2, 2014 11:07 AM, "Charles R Harris" > wrote: > Sebastian has done a lot of work to refactor/rationalize numpy > indexing. The changes are extensive enough that it would be > good to have more public review, so here is the release note. > > The NumPy indexing has seen a complete rewrite in this > version. This makes > most advanced integer indexing operations much faster > and should have no > other implications. > However some subtle changes and deprecations were > introduced in advanced > indexing operations: > > * Boolean indexing into scalar arrays will always > return a new 1-d array. > This means that ``array(1)[array(True)]`` gives > ``array([1])`` and > not the original array. > * Advanced indexing into one dimensional arrays used > to have (undocumented) > special handling regarding repeating the value > array in assignments > when the shape of the value array was too small or > did not match. > Code using this will raise an error. For > compatibility you can use > ``arr.flat[index] = values``, which uses the old > code branch. > * The iteration order over advanced indexes used to > be always C-order. > In NumPy 1.9. the iteration order adapts to the > inputs and is not > guaranteed (with the exception of a *single* > advanced index which is > never reversed for compatibility reasons). This > means that the result is > undefined if multiple values are assigned to the > same element. > An example for this is ``arr[[0, 0], [1, 1]] = [1, > 2]``, which may > set ``arr[0, 1]`` to either 1 or 2. > * Equivalent to the iteration order, the memory > layout of the advanced > indexing result is adapted for faster indexing and > cannot be predicted. > * All indexing operations return a view or a copy. > No indexing operation > will return the original array object. 
> * In the future Boolean array-likes (such as lists > of python bools) > will always be treated as Boolean indexes and > Boolean scalars (including > python `True`) will be a legal *boolean* index. At > this time, this is > already the case for scalar arrays to allow the > general > ``positive = a[a > 0]`` to work when ``a`` is zero > dimensional. > * In NumPy 1.8 it was possible to use `array(True)` > and `array(False)` > equivalent to 1 and 0 if the result of the > operation was a scalar. > This will raise an error in NumPy 1.9 and, as > noted above, treated as a > boolean index in the future. > * All non-integer array-likes are deprecated, object > arrays of custom > integer like objects may have to be cast > explicitly. > * The error reporting for advanced indexing is more > informative, however > the error type has changed in some cases. > (Broadcasting errors of > indexing arrays are reported as `IndexError`) > * Indexing with more then one ellipsis (`...`) is > deprecated. > > > Thoughts? > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From travis at continuum.io Mon Feb 3 16:32:33 2014 From: travis at continuum.io (Travis Oliphant) Date: Mon, 3 Feb 2014 15:32:33 -0600 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: <1391451988.2972.41.camel@sebastian-laptop> References: <1391451988.2972.41.camel@sebastian-laptop> Message-ID: Hey Sebastien, I didn't mean to imply that you would need to necessarily work on it. But the work Jay has done could use review. There are also conversations to have about what to do to resolve the ambiguity that led to the current behavior. Thank you or all the great work on the indexing code paths. 
Is their a roadmap for 1.9? Travis On Feb 3, 2014 1:26 PM, "Sebastian Berg" wrote: > On Sun, 2014-02-02 at 13:11 -0600, Travis Oliphant wrote: > > This sounds like a great and welcome work and improvements. > > > > Does it make sense to also do something about the behavior of advanced > > indexing when slices are interleaved between lists and integers. > > > > I know that jay borque has some preliminary work to fix this. There > > are a some straightforward fixes -- like doing iterative application > > of indexing in those cases which would be more sensical in the cases > > where current code gets tripped up. > > > > I guess you are talking about the funky transposing logic and maybe the > advanced indexing logic as such? I didn't really think about changing > any of that, not sure if we easily can? > Personally, I always wondered if it would make sense to add some new > type of indexing mechanism to switch to R/matlab style non-advanced > integer-array indexing. I don't think this will make it substantially > easier to do (the basic logic remains the same -- we need an > extra/different preparation and then transpose the result differently), > though it might be a bit more obvious where/how to plug it in. > > But it seems very unlikely I will look into that in the near future (but > if someone wants hints on how to go about it, just ask). > > - Sebastian > > > Travis > > > > On Feb 2, 2014 11:07 AM, "Charles R Harris" > > wrote: > > Sebastian has done a lot of work to refactor/rationalize numpy > > indexing. The changes are extensive enough that it would be > > good to have more public review, so here is the release note. > > > > The NumPy indexing has seen a complete rewrite in this > > version. This makes > > most advanced integer indexing operations much faster > > and should have no > > other implications. 
> > However some subtle changes and deprecations were > > introduced in advanced > > indexing operations: > > > > * Boolean indexing into scalar arrays will always > > return a new 1-d array. > > This means that ``array(1)[array(True)]`` gives > > ``array([1])`` and > > not the original array. > > * Advanced indexing into one dimensional arrays used > > to have (undocumented) > > special handling regarding repeating the value > > array in assignments > > when the shape of the value array was too small or > > did not match. > > Code using this will raise an error. For > > compatibility you can use > > ``arr.flat[index] = values``, which uses the old > > code branch. > > * The iteration order over advanced indexes used to > > be always C-order. > > In NumPy 1.9. the iteration order adapts to the > > inputs and is not > > guaranteed (with the exception of a *single* > > advanced index which is > > never reversed for compatibility reasons). This > > means that the result is > > undefined if multiple values are assigned to the > > same element. > > An example for this is ``arr[[0, 0], [1, 1]] = [1, > > 2]``, which may > > set ``arr[0, 1]`` to either 1 or 2. > > * Equivalent to the iteration order, the memory > > layout of the advanced > > indexing result is adapted for faster indexing and > > cannot be predicted. > > * All indexing operations return a view or a copy. > > No indexing operation > > will return the original array object. > > * In the future Boolean array-likes (such as lists > > of python bools) > > will always be treated as Boolean indexes and > > Boolean scalars (including > > python `True`) will be a legal *boolean* index. At > > this time, this is > > already the case for scalar arrays to allow the > > general > > ``positive = a[a > 0]`` to work when ``a`` is zero > > dimensional. > > * In NumPy 1.8 it was possible to use `array(True)` > > and `array(False)` > > equivalent to 1 and 0 if the result of the > > operation was a scalar. 
> > This will raise an error in NumPy 1.9 and, as > > noted above, treated as a > > boolean index in the future. > > * All non-integer array-likes are deprecated, object > > arrays of custom > > integer like objects may have to be cast > > explicitly. > > * The error reporting for advanced indexing is more > > informative, however > > the error type has changed in some cases. > > (Broadcasting errors of > > indexing arrays are reported as `IndexError`) > > * Indexing with more then one ellipsis (`...`) is > > deprecated. > > > > > > Thoughts? > > > > > > Chuck > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 3 16:48:06 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 3 Feb 2014 14:48:06 -0700 Subject: [Numpy-discussion] Indexing changes in 1.9 In-Reply-To: References: <1391451988.2972.41.camel@sebastian-laptop> Message-ID: On Mon, Feb 3, 2014 at 2:32 PM, Travis Oliphant wrote: > Hey Sebastien, > > I didn't mean to imply that you would need to necessarily work on it. > But the work Jay has done could use review. > > There are also conversations to have about what to do to resolve the > ambiguity that led to the current behavior. > > Thank you or all the great work on the indexing code paths. > > Is their a roadmap for 1.9? > We don't have a roadmap, but the indexing code is something I'd like to see go in. 
Are you asking in relation to Jay's work? I have a loose idea to branch 1.9 sometime in late April/May. The other important bit to fix up, which isn't started yet, is datetime. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Feb 4 02:16:39 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 4 Feb 2014 08:16:39 +0100 Subject: [Numpy-discussion] ANN: Scipy 0.13.3 release Message-ID: Hi, I'm happy to announce the availability of the scipy 0.13.3 release. This is a bugfix only release; it contains fixes for regressions in ndimage and weave. Source tarballs can be found at https://sourceforge.net/projects/scipy/files/scipy/0.13.3/ and on PyPi. Release notes copied below, binaries will follow later (the regular build machine is not available for the next two weeks). Cheers, Ralf ========================== SciPy 0.13.3 Release Notes ========================== SciPy 0.13.3 is a bug-fix release with no new features compared to 0.13.2. Both the weave and the ndimage.label bugs were severe regressions in 0.13.0, hence this release. Issues fixed ------------ - 3148: fix a memory leak in ``ndimage.label``. - 3216: fix weave issue with too long file names for MSVC. Other changes ------------- - Update Sphinx theme used for html docs so ``>>>`` in examples can be toggled. Checksums ========= 0547c1f8e8afad4009cc9b5ef17a2d4d release/installers/scipy-0.13.3.tar.gz 20ff3a867cc5925ef1d654aed2ff7e88 release/installers/scipy-0.13.3.zip -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mads.ipsen at gmail.com Tue Feb 4 02:58:24 2014 From: mads.ipsen at gmail.com (Mads Ipsen) Date: Tue, 04 Feb 2014 08:58:24 +0100 Subject: [Numpy-discussion] Fast decrementation of indices In-Reply-To: <12786008-CA50-457B-BC56-4A4D68E6C738@stsci.edu> References: <12786008-CA50-457B-BC56-4A4D68E6C738@stsci.edu> Message-ID: <52F09DA0.2000809@gmail.com> Hi, Thanks to everybody for all you valuable responses. This approach by Rick White seems to nail it all down: >> b = np.array([ >> [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11], >> [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5] >> ]) >> >> a = [1,2,3,7,8] >> >> keepdata = np.ones(12, dtype=np.bool) >> keepdata[a] = False >> w = np.where(keepdata[b[0]] & keepdata[b[1]]) >> newindex = keepdata.cumsum()-1 >> c = newindex[b[:,w[0]]] Also, I'd like to mention that I did think about using the graph module from SciPy. But the index bookkeeping done by numpy is in fact index pointers to memory location in a C++ driver - and not just labels. An when atoms are deleted, there memory chunks are also cleared, and therefore all pointers to these must be decremented. So using numpy for the bookkeeping seems a natural choice. Best regards, Mads On 02/03/2014 02:36 PM, Rick White wrote: > b = np.array([ > [0, 0, 0, 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 7, 8, 9, 10, 11], > [5, 6, 1, 0, 2, 7, 3, 8, 1, 4, 9, 2, 10, 5, 3, 4, 11, 0, 0, 1, 2, 3, 4, 5] > ]) > > a = [1,2,3,7,8] > > keepdata = np.ones(12, dtype=np.bool) > keepdata[a] = False > w = np.where(keepdata[b[0]] & keepdata[b[1]]) > newindex = keepdata.cumsum()-1 > c = newindex[b[:,w[0]]] -- +---------------------------------------------------------+ | Mads Ipsen | +----------------------+----------------------------------+ | G?seb?ksvej 7, 4. 
tv | phone: +45-29716388 | | DK-2500 Valby | email: mads.ipsen at gmail.com | | Denmark | map : www.tinyurl.com/ns52fpa | +----------------------+----------------------------------+ From ralf.gommers at gmail.com Tue Feb 4 06:21:58 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 4 Feb 2014 12:21:58 +0100 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: <20140131103459.GA24791@gmail.com> References: <20140131103459.GA24791@gmail.com> Message-ID: On Fri, Jan 31, 2014 at 11:34 AM, St?fan van der Walt wrote: > On Fri, 31 Jan 2014 04:31:01 +0530, jennifer stone wrote: > > 3. As stated earlier, we have spherical harmonic functions (with much > scope > > for dev) we are yet to have elliptical and cylindrical harmonic function, > > which may be developed. > > As stated before, I am personally interested in seeing the spherical > harmonics > in SciPy improve. > Finding a suitable mentor for whatever project Jennifer chooses is an important factor in the choice of project, so I have to ask: do you have the bandwidth to be a mentor or help out this summer? Ralf > > > 5. Further reading the road-map given by Mr.Ralf, I would like to develop > > the Bluestein's FFT algorithm. > > https://gist.github.com/endolith/2783807 > > Regards > St?fan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Tue Feb 4 08:42:34 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Tue, 4 Feb 2014 14:42:34 +0100 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: On Fri, Jan 31, 2014 at 12:01 AM, jennifer stone wrote: > With GSoC 2014 being round the corner, I hereby put up few projects for > discussion that I would love to pursue as a student. 
> Guidance, suggestions are cordially welcome:- > > 1. If I am not mistaken, contour integration is not supported by SciPy; in > fact even line integrals of real functions is yet to be implemented in > SciPy, which is surprising. Though we at present have SymPy for line > Integrals, I doubt if there is any open-source python package supporting > the calculation of Contour Integrals. With integrate module of SciPy > already having been properly developed for definite integration, > implementation of line as well as contour integrals, I presume; would not > require work from scratch and shall be a challenging but fruitful project. > > 2. I really have no idea if the purpose of NumPy or SciPy would encompass > this but we are yet to have indefinite integration. An implementation of > that, though highly challenging, may open doors for innumerable other > functions like the ones to calculate the Laplace transform, Hankel > transform and many more. > > 3. As stated earlier, we have spherical harmonic functions (with much > scope for dev) we are yet to have elliptical and cylindrical harmonic > function, which may be developed. > > 4. Lastly, we are yet to have Inverse Laplace transforms which as Ralf has > rightly pointed out it may be too challenging to implement. > > 5. Further reading the road-map given by Mr.Ralf, I would like to develop > the Bluestein's FFT algorithm. > > Thanks for reading along till the end. I shall append to this mail as when > I am struck with ideas. Please do give your valuable guidance > Another idea: add support for discrete wavelet transforms in scipy.signal. There's a fair bit of interest for those here I think. It would start by integrating https://github.com/rgommers/pywt, then adding some new features. Feature ideas: - 1-D and 2-D inverse SWT (have been requested several times on the PyWavelets list and issue tracker). - signal denoising (SureShrink & co, for scipy.signal) - image compression/denoising/... 
algorithms (for scikit-image) Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Tue Feb 4 10:01:48 2014 From: rays at blue-cove.com (RayS) Date: Tue, 04 Feb 2014 07:01:48 -0800 Subject: [Numpy-discussion] striding through arbitrarily large files Message-ID: <201402041501.s14F1qN2020922@blue-cove.com> I was struggling with methods of reading large disk files into numpy efficiently (not FITS or .npy, just raw files of IEEE floats from numpy.tostring()). When loading arbitrarily large files it would be nice to not bother reading more than the plot can display before zooming in. There apparently are no built-in methods that allow skipping/striding... With a 2GB file, I want to read n (like 4096) evenly sampled points out of it. I tried making a dtype, and other tricks, to read "Pythonically", but failed. I broke down and used a for loop with fh.seek() and fromfile(). The file will be open()ed once, but data read many times.

num_channels = 9
desired_len = 4096
bytes_per_val = numpy.dtype(numpy.float32).itemsize

f_obj = open(path, 'rb')
f_obj.seek(0, 2)
file_length = f_obj.tell()
f_obj.seek(0, 0)

bytes_per_smp = num_channels * bytes_per_val
num_samples = file_length // bytes_per_smp
stride_smps = num_samples // desired_len  ## an int
stride_bytes = (stride_smps - 1) * bytes_per_smp

arr = numpy.zeros((desired_len, num_channels))
for i in range(0, desired_len, 1):
    f_obj.seek(i * stride_bytes, 0)
    arr[i] = numpy.fromfile(f_obj, dtype=numpy.float32, count=num_channels)

So, is there a better way to move the pointer through the file without a for loop? A generator? The dtype and other methods like mmap fail with MemoryError since they still try to load the whole file, although apparently you can mmap with 64-bit systems, which I might try soon with a new 64-bit install. - Ray -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From daniele at grinta.net Tue Feb 4 10:09:07 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 04 Feb 2014 16:09:07 +0100 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: <201402041501.s14F1qN2020922@blue-cove.com> References: <201402041501.s14F1qN2020922@blue-cove.com> Message-ID: <52F10293.10901@grinta.net> On 04/02/2014 16:01, RayS wrote: > I was struggling with methods of reading large disk files into numpy > efficiently (not FITS or .npy, just raw files of IEEE floats from > numpy.tostring()). When loading arbitrarily large files it would be nice > to not bother reading more than the plot can display before zooming in. > There apparently are no built in methods that allow skipping/striding... If you mmap the data file with np.memmap() you can access the data in a strided way through the numpy array interface and the OS will handle the scheduling of the reads from the disc. Note however if that the data samples you need are quite dense, there is no real advantage in doing this because the OS will have to read a whole page anyway for each read. Cheers, Daniele From rays at blue-cove.com Tue Feb 4 10:27:16 2014 From: rays at blue-cove.com (RayS) Date: Tue, 04 Feb 2014 07:27:16 -0800 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: <52F10293.10901@grinta.net> References: <201402041501.s14F1qN2020922@blue-cove.com> <52F10293.10901@grinta.net> Message-ID: <201402041527.s14FRHZa030598@blue-cove.com> At 07:09 AM 2/4/2014, you wrote: >On 04/02/2014 16:01, RayS wrote: > > I was struggling with methods of reading large disk files into numpy > > efficiently (not FITS or .npy, just raw files of IEEE floats from > > numpy.tostring()). When loading arbitrarily large files it would be nice > > to not bother reading more than the plot can display before zooming in. > > There apparently are no built in methods that allow skipping/striding... 
> >If you mmap the data file with np.memmap() you can access the data in a >strided way through the numpy array interface and the OS will handle the >scheduling of the reads from the disc. > >Note however if that the data samples you need are quite dense, there is >no real advantage in doing this because the OS will have to read a whole >page anyway for each read. Thanks Daniele, I'll be trying mmap with Python64. With 32 bit the mmap method throws MemoryError with 2.5GB files... The idea is that we allow the users to inspect the huge files graphically, then they can "zoom" into regions of interest and then load a ~100 MB en block for the usual spectral analysis. - Ray From jtaylor.debian at googlemail.com Tue Feb 4 10:35:08 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 4 Feb 2014 16:35:08 +0100 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: <201402041527.s14FRHZa030598@blue-cove.com> References: <201402041501.s14F1qN2020922@blue-cove.com> <52F10293.10901@grinta.net> <201402041527.s14FRHZa030598@blue-cove.com> Message-ID: On Tue, Feb 4, 2014 at 4:27 PM, RayS wrote: > At 07:09 AM 2/4/2014, you wrote: > >On 04/02/2014 16:01, RayS wrote: > > > I was struggling with methods of reading large disk files into numpy > > > efficiently (not FITS or .npy, just raw files of IEEE floats from > > > numpy.tostring()). When loading arbitrarily large files it would be > nice > > > to not bother reading more than the plot can display before zooming in. > > > There apparently are no built in methods that allow > skipping/striding... > > > >If you mmap the data file with np.memmap() you can access the data in a > >strided way through the numpy array interface and the OS will handle the > >scheduling of the reads from the disc. > > > >Note however if that the data samples you need are quite dense, there is > >no real advantage in doing this because the OS will have to read a whole > >page anyway for each read. 
> > Thanks Daniele, I'll be trying mmap with Python64. With 32 bit the > mmap method throws MemoryError with 2.5GB files... > The idea is that we allow the users to inspect the huge files > graphically, then they can "zoom" into regions of interest and then > load a ~100 MB en block for the usual spectral analysis. > > memory maps are limited to the size of the available address space (31 bits with sign), so you would have to slide them, see e.g. the smmap module. But its not likely this is going to be much faster than a loop with explicit seeks depending on the sparseness of the data. memory maps have relatively high overheads at the operating system level. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Tue Feb 4 10:46:56 2014 From: rays at blue-cove.com (RayS) Date: Tue, 04 Feb 2014 07:46:56 -0800 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: References: <201402041501.s14F1qN2020922@blue-cove.com> <52F10293.10901@grinta.net> <201402041527.s14FRHZa030598@blue-cove.com> Message-ID: <201402041546.s14FkwKf006641@blue-cove.com> At 07:35 AM 2/4/2014, Julian Taylor wrote: >On Tue, Feb 4, 2014 at 4:27 PM, RayS ><rays at blue-cove.com> wrote: >At 07:09 AM 2/4/2014, you wrote: > >On 04/02/2014 16:01, RayS wrote: > > > I was struggling with methods of reading large disk files into numpy > > > efficiently (not FITS or .npy, just raw files of IEEE floats from > > > numpy.tostring()). When loading arbitrarily large files it would be nice > > > to not bother reading more than the plot can display before zooming in. > > > There apparently are no built in methods that allow skipping/striding... > > > >If you mmap the data file with np.memmap() you can access the data in a > >strided way through the numpy array interface and the OS will handle the > >scheduling of the reads from the disc. 
>
> >Note however that if the data samples you need are quite dense, there is
> >no real advantage in doing this because the OS will have to read a whole
> >page anyway for each read.

>Thanks Daniele, I'll be trying mmap with Python64. With 32 bit the
>mmap method throws MemoryError with 2.5GB files...
>The idea is that we allow the users to inspect the huge files
>graphically, then they can "zoom" into regions of interest and then
>load a ~100 MB en bloc for the usual spectral analysis.
>
>memory maps are limited to the size of the available address space
>(31 bits with sign), so you would have to slide them, see e.g. the
>smmap module.
>But it's not likely this is going to be much faster than a loop with
>explicit seeks depending on the sparseness of the data. memory maps
>have relatively high overheads at the operating system level.

yes, very sparse data - ~4k out of 50 million

I hadn't tried smmap, https://github.com/Byron/smmap, thanks

- Ray
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From chris.laumann at gmail.com  Tue Feb  4 12:26:32 2014
From: chris.laumann at gmail.com (Chris Laumann)
Date: Tue, 4 Feb 2014 09:26:32 -0800
Subject: [Numpy-discussion] Memory leak?
In-Reply-To: <52EBDDF0.2040006@googlemail.com>
References: <52EBDDF0.2040006@googlemail.com>
Message-ID: 

Hi all-

Thanks for the info re: memory leak. In trying to work around it, I
think I've discovered another (still using SuperPack). This leaks ~30MB
/ run:

hists = zeros((50,64), dtype=int)
for i in range(50):
    for j in range(2**13):
        hists[i,j%64] += 1

The code leaks using hists[i,j] = hists[i,j] + 1 as well.

Is this the same leak or different? Doesn't seem to have much in common.

Incidentally, using

a = ones(v.shape[0])
a.dot(v)

Instead of np.sum (in the previous example that I sent) does not leak.

Re: superpack..
As a fairly technically proficient user, I'm aware that the super pack
installs dev builds and that they may therefore be somewhat less
reliable. I'm okay with that tradeoff and I don't expect you guys to
actually treat the super pack as a stable release -- I also try to
report that I'm using the superpack when I report bugs. I sometimes run
git versions of ipython, numpy, etc in order to fiddle with the code and
make tiny bug fixes/contributions myself. I don't know the statistics
re: superpack users but there is no link from scipy.org's main install
page so most new users won't find it easily. Fonnesbeck's webpage does
say they are dev builds only two sentences into the paragraph.

Best, Chris

--
Chris Laumann
Sent with Airmail

On January 31, 2014 at 9:31:40 AM, Julian Taylor
(jtaylor.debian at googlemail.com) wrote:

On 31.01.2014 18:12, Nathaniel Smith wrote:
> On Fri, Jan 31, 2014 at 4:29 PM, Benjamin Root wrote:
>> Just to chime in here about the SciPy Superpack... this distribution tracks
>> the master branch of many projects, and then puts out releases, on the
>> assumption that master contains pristine code, I guess. I have gone down
>> strange rabbit holes thinking that a particular bug was fixed already and
>> the user telling me a version number that would confirm that, only to
>> discover that the superpack actually packaged matplotlib about a month prior
>> to releasing a version.
>>
>> I will not comment on how good or bad of an idea it is for the Superpack to
>> do that, but I just wanted to make other developers aware of this to keep
>> them from falling down the same rabbit hole.
>
> Wow, that is good to know. Esp. since the web page:
> http://fonnesbeck.github.io/ScipySuperpack/
> simply advertises that it gives you things like numpy 1.9 and scipy
> 0.14, which don't exist. (With some note about dev versions buried in
> prose a few sentences later.)
>
> Empirically, development versions of numpy have always contained bugs,
> regressions, and compatibility breaks that were fixed in the released
> version; and we make absolutely no guarantees about compatibility
> between dev versions and any release versions. And it sort of has to
> be that way for us to be able to make progress. But if too many people
> start using dev versions for daily use, then we and downstream
> dependencies will have to start adding compatibility hacks and stuff
> to support those dev versions. Which would be a nightmare for
> developers and users both.
>
> Recommending this build for daily use by non-developers strikes me as
> dangerous for both users and the wider ecosystem.
>

while probably not good for the user I think it's very good for us.
This is the second bug I introduced that was found by superpack users.
This one might have gone unnoticed into the next release as it is pretty
much impossible to find via tests. Even in valgrind reports it's hard to
find as it's lumped in with all of Python's hundreds of memory arena
still-reachable leaks.

Concerning the fix, it seems if Python sees tp_free == PyObject_Del/Free
it replaces it with the tp_free of the base type which is int_free in
this case. int_free uses a special allocator for even lower overhead so
we start leaking.
We either need to find the right flag to set for our scalars so it stops
doing that, add an indirection so the function pointers don't match or
stop using the object allocator as we are apparently digging too deep
into Python's internal implementation details by doing so.
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From jenny.stone125 at gmail.com  Tue Feb  4 13:30:14 2014
From: jenny.stone125 at gmail.com (jennifer stone)
Date: Wed, 5 Feb 2014 00:00:14 +0530
Subject: [Numpy-discussion] Suggestions for GSoC Projects
In-Reply-To: 
References: 
Message-ID: 

3. As stated earlier, we have spherical harmonic functions (with much scope
>> for dev) we are yet to have elliptical and cylindrical harmonic function,
>> which may be developed.
>>
> > This sounds very doable. How much work do you think would be involved?
>
>>
>> As Stefan so rightly pointed out, the function for spherical harmonic
function, sph_harm at present calls lpmn thus evaluating all orders

From jtaylor.debian at googlemail.com  Tue Feb  4 13:37:24 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 04 Feb 2014 19:37:24 +0100
Subject: [Numpy-discussion] Memory leak?
In-Reply-To: 
References: <52EBDDF0.2040006@googlemail.com>
Message-ID: <52F13364.90603@googlemail.com>

On 04.02.2014 18:26, Chris Laumann wrote:
> Hi all-
>
> Thanks for the info re: memory leak. In trying to work around it, I
> think I've discovered another (still using SuperPack). This leaks ~30MB
> / run:
>
> hists = zeros((50,64), dtype=int)
> for i in range(50):
>     for j in range(2**13):
>         hists[i,j%64] += 1
>
> The code leaks using hists[i,j] = hists[i,j] + 1 as well.
>
> Is this the same leak or different? Doesn't seem to have much in common.

It's the same leak; basically all scalar integers appearing somewhere
leaked. It is now fixed in git master.
Sorry for the inconvenience I caused.

>
> Incidentally, using
>
> a = ones(v.shape[0])
> a.dot(v)
>
> Instead of np.sum (in the previous example that I sent) does not leak.
>
> Re: superpack.. As a fairly technically proficient user, I'm aware that
> the super pack installs dev builds and that they may therefore be
> somewhat less reliable. I'm okay with that tradeoff and I don't expect
> you guys to actually treat the super pack as a stable release --
I also
> try to report that I'm using the superpack when I report bugs. I
> sometimes run git versions of ipython, numpy, etc in order to fiddle
> with the code and make tiny bug fixes/contributions myself. I don't know
> the statistics re: superpack users but there is no link from scipy.org's
> main install page so most new users won't find it easily. Fonnesbeck's
> webpage does say they are dev builds only two sentences into the paragraph.
>
> Best, Chris
>
>
>
> --
> Chris Laumann
> Sent with Airmail
>
> On January 31, 2014 at 9:31:40 AM, Julian Taylor
> (jtaylor.debian at googlemail.com ) wrote:
>
>> On 31.01.2014 18:12, Nathaniel Smith wrote:
>> > On Fri, Jan 31, 2014 at 4:29 PM, Benjamin Root wrote:
>> >> Just to chime in here about the SciPy Superpack... this distribution tracks
>> >> the master branch of many projects, and then puts out releases, on the
>> >> assumption that master contains pristine code, I guess. I have gone down
>> >> strange rabbit holes thinking that a particular bug was fixed already and
>> >> the user telling me a version number that would confirm that, only to
>> >> discover that the superpack actually packaged matplotlib about a month prior
>> >> to releasing a version.
>> >>
>> >> I will not comment on how good or bad of an idea it is for the Superpack to
>> >> do that, but I just wanted to make other developers aware of this to keep
>> >> them from falling down the same rabbit hole.
>> >
>> > Wow, that is good to know. Esp. since the web page:
>> > http://fonnesbeck.github.io/ScipySuperpack/
>> > simply advertises that it gives you things like numpy 1.9 and scipy
>> > 0.14, which don't exist. (With some note about dev versions buried in
>> > prose a few sentences later.)
>> >
>> > Empirically, development versions of numpy have always contained bugs,
>> > regressions, and compatibility breaks that were fixed in the released
>> > version; and we make absolutely no guarantees about compatibility
>> > between dev versions and any release versions. And it sort of has to
>> > be that way for us to be able to make progress. But if too many people
>> > start using dev versions for daily use, then we and downstream
>> > dependencies will have to start adding compatibility hacks and stuff
>> > to support those dev versions. Which would be a nightmare for
>> > developers and users both.
>> >
>> > Recommending this build for daily use by non-developers strikes me as
>> > dangerous for both users and the wider ecosystem.
>> >
>>
>> while probably not good for the user I think it's very good for us.
>> This is the second bug I introduced that was found by superpack users.
>> This one might have gone unnoticed into the next release as it is pretty
>> much impossible to find via tests. Even in valgrind reports it's hard to
>> find as it's lumped in with all of Python's hundreds of memory arena
>> still-reachable leaks.
>>
>> Concerning the fix, it seems if Python sees tp_free == PyObject_Del/Free
>> it replaces it with the tp_free of the base type which is int_free in
>> this case. int_free uses a special allocator for even lower overhead so
>> we start leaking.
>> We either need to find the right flag to set for our scalars so it stops
>> doing that, add an indirection so the function pointers don't match or
>> stop using the object allocator as we are apparently digging too deep
>> into Python's internal implementation details by doing so.
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From jenny.stone125 at gmail.com  Tue Feb  4 13:41:27 2014
From: jenny.stone125 at gmail.com (jennifer stone)
Date: Wed, 5 Feb 2014 00:11:27 +0530
Subject: [Numpy-discussion] Suggestions for GSoC Projects
In-Reply-To: 
References: <20140131103459.GA24791@gmail.com>
Message-ID: 

>
> On Fri, Jan 31, 2014 at 11:34 AM, Stéfan van der Walt wrote:
>
>> On Fri, 31 Jan 2014 04:31:01 +0530, jennifer stone wrote:
>> > 3. As stated earlier, we have spherical harmonic functions (with much
>> scope
>> > for dev) we are yet to have elliptical and cylindrical harmonic
>> function,
>> > which may be developed.
>>
>> As stated before, I am personally interested in seeing the spherical
>> harmonics
>> in SciPy improve.
>>
> > Finding a suitable mentor for whatever project Jennifer chooses is an
> important factor in the choice of project, so I have to ask: do you have
> the bandwidth to be a mentor or help out this summer?
>
> Ralf
>

Thanks a ton Ralf, for bringing this up. It would be a dream come true
to work under any of you guys. However tough the project may be, the
guidance of a willing mentor will make it a smooth and fruitful
experience. Please let me know if you can help out. I would be really
grateful to you.

Regards
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sturla.molden at gmail.com  Tue Feb  4 19:14:19 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Wed, 5 Feb 2014 00:14:19 +0000 (UTC)
Subject: [Numpy-discussion] striding through arbitrarily large files
References: <201402041501.s14F1qN2020922@blue-cove.com>
 <52F10293.10901@grinta.net> <201402041527.s14FRHZa030598@blue-cove.com>
Message-ID: <1279387130413251728.587822sturla.molden-gmail.com@news.gmane.org>

RayS wrote:

> Thanks Daniele, I'll be trying mmap with Python64.
With 32 bit the
> mmap method throws MemoryError with 2.5GB files...
> The idea is that we allow the users to inspect the huge files
> graphically, then they can "zoom" into regions of interest and then
> load a ~100 MB en bloc for the usual spectral analysis.

Transfer the file to PyTables, and you can zoom as you like. Or use a
recent Python so mmap actually has an offset argument.

Sturla

From bryanv at continuum.io  Tue Feb  4 20:47:29 2014
From: bryanv at continuum.io (Bryan Van de Ven)
Date: Tue, 4 Feb 2014 19:47:29 -0600
Subject: [Numpy-discussion] ANN: Bokeh 0.4 Release
Message-ID: <540B9DBA-6D09-4BF9-94BC-F3A0D77FFF38@continuum.io>

I am pleased to announce the release of Bokeh version 0.4!

Bokeh is a Python library for visualizing large and realtime datasets on
the web. Its goal is to provide elegant, concise construction of novel
graphics in the style of Protovis/D3, while delivering high-performance
interactivity to thin clients. Bokeh includes its own Javascript library
(BokehJS) that implements a reactive scenegraph representation of the
plot, and renders efficiently to HTML5 Canvas. Bokeh works well with
IPython Notebook, but can generate standalone graphics that embed into
regular HTML.
Check out the full documentation and interactive gallery at
http://bokeh.pydata.org

If you are using Anaconda, you can install with conda:

    conda install bokeh

Alternatively, you can install with pip:

    pip install bokeh

Some of the new features in this release include:

* Preliminary work on Matplotlib support: convert MPL figures to Bokeh plots
* Free public beta of Bokeh plot hosting at http://bokehplots.com
* Tool improvements:
  - "always on" pan tool and wheel zoom tool (with shift key)
  - box zoom tool
  - viewport reset tool
* Enhanced datetime axis, with better performance and nicer ticking
* Expanded testing, including TravisCI integrations and static image
  output using PhantomJS
* RGBA and color mapped image plots now available from Python
* Python 3 supported
* Vastly improved documentation for glyphs, with inline examples and
  JSFiddle integration

Also, we've fixed lots of little bugs - see the CHANGELOG for full details.

Bokeh will be having a free "Office Hours" later this week! Join us this
Thursday at 2pm CST on EngineHere at
https://www.enginehere.com/stream/437/bokeh-04-release/ for a live
informational session about the latest release. We'll be covering all the
newest features and updates through a combination of live lecture, Q&A,
and pair programming. It's all free, just sign up to the EngineHere
learning platform.

BokehJS is also available by CDN for use in standalone javascript
applications:

http://cdn.pydata.org/bokeh-0.4.js
http://cdn.pydata.org/bokeh-0.4.css
http://cdn.pydata.org/bokeh-0.4.min.js
http://cdn.pydata.org/bokeh-0.4.min.css

Some examples of BokehJS use can be found on the Bokeh JSFiddle page:
http://jsfiddle.net/user/bokeh/fiddles/

The release of Bokeh 0.5 is planned for late March. Some notable features
we plan to include are:

* Abstract Rendering for semantically meaningful downsampling of large datasets
* Better grid-based layout system, using Cassowary.js
* Selection tools, tooltips, etc.
Issues, enhancement requests, and pull requests can be made on the Bokeh Github page: https://github.com/continuumio/bokeh Questions can be directed to the Bokeh mailing list: bokeh at continuum.io Special thanks to recent contributors: Janek Klawe, Samantha Hughes, Rebecca Paz, and Benedikt Sauer. Regards, Bryan Van de Ven Continuum Analytics http://continuum.io From rhattersley at gmail.com Wed Feb 5 15:11:54 2014 From: rhattersley at gmail.com (Richard Hattersley) Date: Wed, 5 Feb 2014 20:11:54 +0000 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: <201402041546.s14FkwKf006641@blue-cove.com> References: <201402041501.s14F1qN2020922@blue-cove.com> <52F10293.10901@grinta.net> <201402041527.s14FRHZa030598@blue-cove.com> <201402041546.s14FkwKf006641@blue-cove.com> Message-ID: On 4 February 2014 15:01, RayS wrote: > I was struggling with methods of reading large disk files into numpy > efficiently (not FITS or .npy, just raw files of IEEE floats from > numpy.tostring()). When loading arbitrarily large files it would be nice to > not bother reading more than the plot can display before zooming in. There > apparently are no built in methods that allow skipping/striding... > Since you mentioned the plural "files", are your datasets entirely contained within a single file? If not, you might be interested in Biggus ( https://pypi.python.org/pypi/Biggus). It's a small pure-Python module that lets you "glue-together" arrays (such as those from smmap) into a single arbitrarily large virtual array. You can then step over the virtual array and it maps it back to the underlying sources. Richard -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From rays at blue-cove.com Wed Feb 5 15:37:22 2014 From: rays at blue-cove.com (RayS) Date: Wed, 05 Feb 2014 12:37:22 -0800 Subject: [Numpy-discussion] striding through arbitrarily large files In-Reply-To: References: <201402041501.s14F1qN2020922@blue-cove.com> <52F10293.10901@grinta.net> <201402041527.s14FRHZa030598@blue-cove.com> <201402041546.s14FkwKf006641@blue-cove.com> Message-ID: <201402052037.s15KbNFl019346@blue-cove.com> At 12:11 PM 2/5/2014, Richard Hattersley wrote: >On 4 February 2014 15:01, RayS ><rays at blue-cove.com> wrote: >I was struggling with methods of reading large disk files into >numpy efficiently (not FITS or .npy, just raw files of IEEE floats >from numpy.tostring()). When loading arbitrarily large files it >would be nice to not bother reading more than the plot can display >before zooming in. There apparently are no built in methods that >allow skipping/striding... > > >Since you mentioned the plural "files", are your datasets entirely >contained within a single file? If not, you might be interested in >Biggus >(https://pypi.python.org/pypi/Biggus). >It's a small pure-Python module that lets you "glue-together" arrays >(such as those from smmap) into a single arbitrarily large virtual >array. You can then step over the virtual array and it maps it back >to the underlying sources. > >Richard ooh, that might help they are individual GB files from medical trial studies I see there are some examples about https://github.com/SciTools/biggus/wiki/Sample-usage http://nbviewer.ipython.org/gist/pelson/6139282 Thanks! -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From antonio.valentino at tiscali.it Wed Feb 5 15:37:33 2014 From: antonio.valentino at tiscali.it (Antonio Valentino) Date: Wed, 05 Feb 2014 21:37:33 +0100 Subject: [Numpy-discussion] ANN: PyTables 3.1.0 released Message-ID: <52F2A10D.5050804@tiscali.it> =========================== Announcing PyTables 3.1.0 =========================== We are happy to announce PyTables 3.1.0. This is a feature release. The upgrading is recommended for users that are running PyTables in production environments. What's new ========== Probably the most relevant changes in this release are internal improvements like the new node cache that is now compatible with the upcoming Python 3.4 and the registry for open files has been deeply reworked. The caching feature of file handlers has been completely dropped so now PyTables is a little bit more "thread friendly". New, user visible, features include: - a new lossy filter for HDF5 datasets (EArray, CArray, VLArray and Table objects). The *quantization* filter truncates floating point data to a specified precision before writing to disk. This can significantly improve the performance of compressors (many thanks to Andreas Hilboll). - support for the H5FD_SPLIT HDF5 driver (thanks to simleo) - all new features introduced in the Blosc_ 1.3.x series, and in particular the ability to leverage different compressors within Blosc_ are now available in PyTables via the blosc filter (a big thank you to Francesc) - the ability to save/restore the default value of :class:`EnumAtom` types Also, installations of the HDF5 library that have a broken support for the *long double* data type (see the `Issues with H5T_NATIVE_LDOUBLE`_ thread on the HFG5 forum) are detected by PyTables 3.1.0 and the corresponding features are automatically disabled. Users that need support for the *long double* data type should ensure to build PyTables against an installation of the HDF5 library that is not affected by the bug. .. 
_`Issues with H5T_NATIVE_LDOUBLE`: http://hdf-forum.184993.n3.nabble.com/Issues-with-H5T-NATIVE-LDOUBLE-tt4026450.html As always, a large amount of bugs have been addressed and squashed as well. In case you want to know more in detail what has changed in this version, please refer to: http://pytables.github.io/release_notes.html You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://sourceforge.net/projects/pytables/files/pytables/3.1.0 For an online version of the manual, visit: http://pytables.github.io/usersguide/index.html What it is? =========== PyTables is a library for managing hierarchical datasets and designed to efficiently cope with extremely large amounts of data with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and NumPy package for achieving maximum throughput and convenient use. PyTables includes OPSI, a new indexing technology, allowing to perform data lookups in tables exceeding 10 gigarows (10**10 rows) in less than a tenth of a second. Resources ========= About PyTables: http://www.pytables.org About the HDF5 library: http://hdfgroup.org/HDF5/ About NumPy: http://numpy.scipy.org/ Acknowledgments =============== Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for a (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy makers. Without them, PyTables simply would not exist. Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. 
----

  **Enjoy data!**

-- The PyTables Developers

From sturla.molden at gmail.com  Thu Feb  6 05:10:32 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 6 Feb 2014 10:10:32 +0000 (UTC)
Subject: [Numpy-discussion] MKL and OpenBLAS
References: 
Message-ID: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org>

I just thought I'd mention this:

Why not use Eigen? It has a full BLAS implementation and passes the BLAS
test suite, and is generally faster than MKL except on Core i7. Also,
Eigen requires no build process, just include the header files.

Yes, Eigen is based on C++, but OpenBLAS is partially coded in assembly.
Whatever language is used internally in the BLAS implementation should
be of no concern to NumPy.

BTW: The performance of OpenBLAS is far behind Eigen, MKL and ACML, but
better than ATLAS and Accelerate.

Sturla

"Dinesh Vadhia" wrote:
> This conversation gets discussed often with Numpy developers but since
> the requirement for optimized Blas is pretty common these days, how about
> distributing Numpy with OpenBlas by default? People who don't want
> optimized BLAS or OpenBLAS can then edit the site.cfg file to add/remove.
> I can never remember if Numpy comes with Atlas by default but either
> way, if using MKL is not feasible because of its licensing issues then
> Numpy has to be re-compiled with OpenBLAS (for example). Why not make it
> easier for developers to use Numpy with an in-built optimized Blas.
>
> Btw, just in case some folks from Intel are listening: how about
> releasing MKL binaries for all platforms for developers to do with it
> what they want ie. free. You know it makes sense!
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From stefan at sun.ac.za  Thu Feb  6 05:59:12 2014
From: stefan at sun.ac.za (Stéfan van der Walt)
Date: Thu, 6 Feb 2014 12:59:12 +0200
Subject: [Numpy-discussion] Suggestions for GSoC Projects
In-Reply-To: 
References: <20140131103459.GA24791@gmail.com>
Message-ID: <20140206105912.GH4638@gmail.com>

On Tue, 04 Feb 2014 12:21:58 +0100, Ralf Gommers wrote:
> Finding a suitable mentor for whatever project Jennifer chooses is an
> important factor in the choice of project, so I have to ask: do you have
> the bandwidth to be a mentor or help out this summer?

I completely agree. I have time to be co-mentor, and I have ideas for
other potential mentors (of course, Ralf would be on top of that list :).
Members of the dipy team would also be interested.

Stéfan

From thomas_unterthiner at web.de  Thu Feb  6 07:11:43 2014
From: thomas_unterthiner at web.de (Thomas Unterthiner)
Date: Thu, 06 Feb 2014 13:11:43 +0100
Subject: [Numpy-discussion] MKL and OpenBLAS
In-Reply-To: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org>
References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org>
Message-ID: <52F37BFF.2050400@web.de>

On 2014-02-06 11:10, Sturla Molden wrote:
> BTW: The performance of OpenBLAS is far behind Eigen, MKL and ACML, but
> better than ATLAS and Accelerate.

Hi there!

Sorry for going a bit off-topic, but: do you have any links to the
benchmarks? I googled around, but I haven't found anything. FWIW, on my
own machines OpenBLAS is on par with MKL (on an i5 laptop and an older
Xeon server) and actually slightly faster than ACML (on an FX8150) for
my use cases (I mainly tested DGEMM/SGEMM, and a few LAPACK calls). So
your claim is very surprising to me.
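[Editor's note: DGEMM comparisons of the kind Thomas describes are easy to reproduce from NumPy itself, since np.dot dispatches to whatever BLAS the build is linked against. A minimal sketch follows; the matrix size and single-run timing are arbitrary illustrative choices, not anyone's actual benchmark setup.]

```python
import time

import numpy as np

n = 500
a = np.random.rand(n, n)
b = np.random.rand(n, n)

t0 = time.time()
c = np.dot(a, b)  # dispatches to the linked BLAS's dgemm
elapsed = max(time.time() - t0, 1e-9)  # guard against timer resolution

# A square matrix product costs about 2*n**3 floating-point operations.
gflops = 2.0 * n ** 3 / elapsed / 1e9
print("DGEMM %dx%d: %.2f GFLOP/s" % (n, n, gflops))
```

Running this against two NumPy builds linked to different BLAS libraries gives a rough apples-to-apples comparison of the kind discussed in this thread.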
Also, I'd be highly surprised if OpenBLAS would be slower than Eigen, given than the developers themselves say that Eigen is "nearly as fast as GotoBLAS"[1], and that OpenBLAS was originally forked from GotoBLAS. Cheers Thomas [1] http://eigen.tuxfamily.org/index.php?title=3.0 From jtaylor.debian at googlemail.com Thu Feb 6 07:27:16 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 6 Feb 2014 13:27:16 +0100 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: <52F37BFF.2050400@web.de> References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: On Thu, Feb 6, 2014 at 1:11 PM, Thomas Unterthiner < thomas_unterthiner at web.de> wrote: > On 2014-02-06 11:10, Sturla Molden wrote: > > BTW: The performance of OpenBLAS is far behind Eigen, MKL and ACML, but > > better than ATLAS and Accelerate. > Hi there! > > Sorry for going a bit off-topic, but: do you have any links to the > benchmarks? I googled around, but I haven't found anything. FWIW, on my > own machines OpenBLAS is on par with MKL (on an i5 laptop and an older > Xeon server) and actually slightly faster than ACML (on an FX8150) for > my use cases (I mainly tested DGEMM/SGEMM, and a few LAPACK calls). So > your claim is very surprising for me. > > Also, I'd be highly surprised if OpenBLAS would be slower than Eigen, > given than the developers themselves say that Eigen is "nearly as fast > as GotoBLAS"[1], and that OpenBLAS was originally forked from GotoBLAS. > > I'm also a little sceptical about the benchmarks, e.g. according to the FAQ eigen does not seem to support AVX which is relatively important for blas level 3 performance. The lazy evaluation is probably eigens main selling point, which is something we cannot make use of in numpy currently. But nevertheless eigen could be an interesting alternative for our binary releases on windows. 
Having the stuff as headers makes it probably easier to build than ATLAS we are currently using. -------------- next part -------------- An HTML attachment was scrubbed... URL: From opossumnano at gmail.com Thu Feb 6 07:51:47 2014 From: opossumnano at gmail.com (Tiziano Zito) Date: Thu, 6 Feb 2014 13:51:47 +0100 Subject: [Numpy-discussion] [ANN] Summer School "Advanced Scientific Programming in Python" in Split, Croatia Message-ID: <20140206125147.GD5464@bio230.biologie.hu-berlin.de> Advanced Scientific Programming in Python ========================================= a Summer School by the G-Node and the Faculty of Electrical Engineering, Mechanical Engineering and Naval Architecture (FESB), University of Split Scientists spend more and more time writing, maintaining, and debugging software. While techniques for doing this efficiently have evolved, only few scientists have been trained to use them. As a result, instead of doing their research, they spend far too much time writing deficient code and reinventing the wheel. In this course we will present a selection of advanced programming techniques, incorporating theoretical lectures and practical exercises tailored to the needs of a programming scientist. New skills will be tested in a real programming project: we will team up to develop an entertaining scientific computer game. We use the Python programming language for the entire course. Python works as a simple programming language for beginners, but more importantly, it also works great in scientific simulations and data analysis. We show how clean language design, ease of extensibility, and the great wealth of open source libraries for scientific computing and data visualization are driving Python to become a standard tool for the programming scientist. This school is targeted at Master or PhD students and Post-docs from all areas of science. Competence in Python or in another language such as Java, C/C++, MATLAB, or Mathematica is absolutely required. 
Basic knowledge of Python is assumed. Participants without any prior
experience with Python should work through the proposed introductory
materials before the course.

Date and Location
=================
September 8–13, 2014. Split, Croatia

Preliminary Program
===================
Day 0 (Mon Sept 8) – Best Programming Practices
• Best Practices for Scientific Computing
• Version control with git and how to contribute to Open Source with github
• Object-oriented programming & design patterns
Day 1 (Tue Sept 9) – Software Carpentry
• Test-driven development, unit testing & quality assurance
• Debugging, profiling and benchmarking techniques
• Advanced Python I: idioms, useful built-in data structures, generators
Day 2 (Wed Sept 10) – Scientific Tools for Python
• Advanced NumPy
• The Quest for Speed (intro): Interfacing to C with Cython
• Programming in teams
Day 3 (Thu Sept 11) – The Quest for Speed
• Writing parallel applications in Python
• Python 3: why should I care
• Programming project
Day 4 (Fri Sept 12) – Efficient Memory Management
• When parallelization does not help: the starving CPUs problem
• Advanced Python II: decorators and context managers
• Programming project
Day 5 (Sat Sept 13) – Practical Software Development
• Programming project
• The Pelita Tournament

Every evening we will have the tutors' consultation hour: Tutors will
answer your questions and give suggestions for your own projects.

Applications
============
You can apply on-line at http://python.g-node.org

Applications must be submitted before 23:59 UTC, May 1, 2014.
Notifications of acceptance will be sent by June 1, 2014.

No fee is charged but participants should take care of travel, living,
and accommodation expenses. Candidates will be selected on the basis of
their profile. Places are limited: acceptance rate is usually around 20%.

Prerequisites: You are supposed to know the basics of Python to
participate in the lectures.
You are encouraged to go through the introductory material available on the website. Faculty ======= • Francesc Alted, Continuum Analytics Inc., USA • Pietro Berkes, Enthought Inc., UK • Kathryn D. Huff, Department of Nuclear Engineering, University of California - Berkeley, USA • Zbigniew Jędrzejewski-Szmek, Krasnow Institute, George Mason University, USA • Eilif Muller, Blue Brain Project, École Polytechnique Fédérale de Lausanne, Switzerland • Rike-Benjamin Schuppner, Technologit GbR, Germany • Nelle Varoquaux, Centre for Computational Biology Mines ParisTech, Institut Curie, U900 INSERM, Paris, France • Stéfan van der Walt, Applied Mathematics, Stellenbosch University, South Africa • Niko Wilbert, TNG Technology Consulting GmbH, Germany • Tiziano Zito, Institute for Theoretical Biology, Humboldt-Universität zu Berlin, Germany Organized by Tiziano Zito (head) and Zbigniew Jędrzejewski-Szmek for the German Neuroinformatics Node of the INCF (Germany), Lana Periša for the Numerical and applied mathematics group, FESB, University of Split (Croatia), Ivana Kajić from the Bernstein Center for Computational Neuroscience Berlin (Germany), Ivana Balažević from the Technical University Berlin (Germany), and Filip Petkovski from IN2 Ltd. Skopje (Macedonia). Website: http://python.g-node.org Contact: python-info at g-node.org From alan.isaac at gmail.com Thu Feb 6 08:46:49 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 06 Feb 2014 08:46:49 -0500 Subject: [Numpy-discussion] create numerical arrays from strings Message-ID: <52F39249.5060803@gmail.com> NumPy matrix construction includes as a convenience feature the construction of matrices with a Matlab-like syntax. E.g., np.mat('1 2;3 4'). Is it correct that this syntax is not supported for direct (i.e., not using `mat`) ndarray creation? You may ask, where would this possibly matter? The answer: in the undergraduate classroom.
Compare np.mat('1 2; 3 4') to np.array([[1, 2], [3, 4]]) for readability and intimidation factor. Little things matter when getting started with students who lack programming background. Thanks, Alan Isaac From stefan at sun.ac.za Thu Feb 6 09:04:35 2014 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Thu, 6 Feb 2014 16:04:35 +0200 Subject: [Numpy-discussion] create numerical arrays from strings In-Reply-To: <52F39249.5060803@gmail.com> References: <52F39249.5060803@gmail.com> Message-ID: <20140206140435.GC9991@gmail.com> Hi Alan On Thu, 06 Feb 2014 08:46:49 -0500, Alan G Isaac wrote: > You may ask, where would this possibly matter? > The answer: in the undergraduate classroom. As a lecturer, I understand where you are coming from, but I don't think we can ultimately make API decisions based on teachability. The ndarray constructor already has behavior defined for strings: np.array('1 2 3; 4 5 6') array('1 2 3; 4 5 6', dtype='|S12') So we can't easily change that now. The best is probably to write a small utility library for your students that helps them easily construct arrays. Also, if you teach them inside an IPython Notebook, they can easily type np.array([[1, 2], [3, 4]]) which is quite readable and makes use of standard Python objects. Regards Stéfan From chris.barker at noaa.gov Thu Feb 6 11:42:38 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Thu, 6 Feb 2014 08:42:38 -0800 Subject: [Numpy-discussion] create numerical arrays from strings In-Reply-To: <52F39249.5060803@gmail.com> References: <52F39249.5060803@gmail.com> Message-ID: On Thu, Feb 6, 2014 at 5:46 AM, Alan G Isaac wrote: > NumPy matrix construction includes as a convenience feature > the construction of matrices with a Matlab-like syntax. > E.g., np.mat('1 2;3 4'). > > Is it correct that this syntax is not supported for > direct (i.e., not using `mat`) ndarray creation? > > You may ask, where would this possibly matter?
> The answer: in the undergraduate classroom. > > Compare np.mat('1 2; 3 4') > to np.array([[1, 2], [3, 4]]) > for readability and intimidation factor. > Little things matter when getting started > with students who lack programming background. > 1) so use np.mat ! 2) The "right" way involves a few more keystrokes -- is this really a big deal? "Commas separate elements, each row is enclosed in square brackets" vs: "whitespace separates elements, semi-colons separate rows." I'm not sure it's that much harder to understand for a newbie. I'm sure it is for someone used to MATLAB, but do we really want to encourage folks to keep their MATLAB habits? 3) Even if it is substantially easier for a newbie, I think we need to be very careful in teaching to select for "easy to learn first" over "the right way to do it" -- in general, I think it's more important to establish good habits and understanding of what's under the covers than maximizing the ability to type in their first array literal. 4) we really don't want to go down the perl-esque route of "strings are interpreted as numbers if they happen to be numbers" IMHO, and all that.... -CHB -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Feb 6 15:09:33 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 6 Feb 2014 13:09:33 -0700 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: On Thu, Feb 6, 2014 at 5:27 AM, Julian Taylor wrote: > > On Thu, Feb 6, 2014 at 1:11 PM, Thomas Unterthiner < > thomas_unterthiner at web.de> wrote: > >> On 2014-02-06 11:10, Sturla Molden wrote: >> > BTW: The performance of OpenBLAS is far behind Eigen, MKL and ACML, but >> > better than ATLAS and Accelerate. >> Hi there! >> >> Sorry for going a bit off-topic, but: do you have any links to the >> benchmarks? I googled around, but I haven't found anything. FWIW, on my >> own machines OpenBLAS is on par with MKL (on an i5 laptop and an older >> Xeon server) and actually slightly faster than ACML (on an FX8150) for >> my use cases (I mainly tested DGEMM/SGEMM, and a few LAPACK calls). So >> your claim is very surprising for me. >> >> Also, I'd be highly surprised if OpenBLAS would be slower than Eigen, >> given than the developers themselves say that Eigen is "nearly as fast >> as GotoBLAS"[1], and that OpenBLAS was originally forked from GotoBLAS. >> >> > I'm also a little sceptical about the benchmarks, e.g. according to the > FAQ eigen does not seem to support AVX which is relatively important for > blas level 3 performance. > The lazy evaluation is probably eigens main selling point, which is > something we cannot make use of in numpy currently. > > But nevertheless eigen could be an interesting alternative for our binary > releases on windows. Having the stuff as headers makes it probably easier > to build than ATLAS we are currently using. > > The Eigen license is MPL-2. That doesn't look to be incompatible with BSD, but it may complicate things. 
Q8: I want to distribute (outside my organization) executable programs or libraries that I have compiled from someone else's unchanged MPL-licensed source code, either standalone or part of a larger work. What do I have to do? You must inform the recipients where they can get the source for the executable program you are distributing (i.e., you must comply with Section 3.2). You may also distribute any executables you create under a license of your choosing, as long as that license does not interfere with the recipients' rights to the source under the terms of the MPL. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Thu Feb 6 16:45:01 2014 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 6 Feb 2014 21:45:01 +0000 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: According to the discussions on the ML, they switched from GPL to MPL to enable the kind of distribution numpy/scipy is looking for. They had some hesitations between BSD and MPL, but IIRC their official stand is to allow inclusion inside BSD-licensed code. Cheers, Matthieu 2014-02-06 20:09 GMT+00:00 Charles R Harris : > > > > On Thu, Feb 6, 2014 at 5:27 AM, Julian Taylor > wrote: >> >> >> On Thu, Feb 6, 2014 at 1:11 PM, Thomas Unterthiner >> wrote: >>> >>> On 2014-02-06 11:10, Sturla Molden wrote: >>> > BTW: The performance of OpenBLAS is far behind Eigen, MKL and ACML, but >>> > better than ATLAS and Accelerate. >>> Hi there! >>> >>> Sorry for going a bit off-topic, but: do you have any links to the >>> benchmarks? I googled around, but I haven't found anything. FWIW, on my >>> own machines OpenBLAS is on par with MKL (on an i5 laptop and an older >>> Xeon server) and actually slightly faster than ACML (on an FX8150) for >>> my use cases (I mainly tested DGEMM/SGEMM, and a few LAPACK calls). 
So >>> your claim is very surprising for me. >>> >>> Also, I'd be highly surprised if OpenBLAS would be slower than Eigen, >>> given than the developers themselves say that Eigen is "nearly as fast >>> as GotoBLAS"[1], and that OpenBLAS was originally forked from GotoBLAS. >>> >> >> I'm also a little sceptical about the benchmarks, e.g. according to the >> FAQ eigen does not seem to support AVX which is relatively important for >> blas level 3 performance. >> The lazy evaluation is probably eigens main selling point, which is >> something we cannot make use of in numpy currently. >> >> But nevertheless eigen could be an interesting alternative for our binary >> releases on windows. Having the stuff as headers makes it probably easier to >> build than ATLAS we are currently using. >> > > The Eigen license is MPL-2. That doesn't look to be incompatible with BSD, > but it may complicate things. > > Q8: I want to distribute (outside my organization) executable programs or > libraries that I have compiled from someone else's unchanged MPL-licensed > source code, either standalone or part of a larger work. What do I have to > do? > > You must inform the recipients where they can get the source for the > executable program you are distributing (i.e., you must comply with Section > 3.2). You may also distribute any executables you create under a license of > your choosing, as long as that license does not interfere with the > recipients' rights to the source under the terms of the MPL. > > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. 
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ From d.l.goldsmith at gmail.com Thu Feb 6 18:03:57 2014 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 6 Feb 2014 15:03:57 -0800 Subject: [Numpy-discussion] create numerical arrays from strings Message-ID: Date: Thu, 6 Feb 2014 08:42:38 -0800 > From: Chris Barker > Subject: Re: [Numpy-discussion] create numerical arrays from strings > To: Discussion of Numerical Python > Message-ID: > < > CALGmxEKVNQok6wtY-jbjzgaeU5ewHh1_FLmSQXJsUJfcLExh2w at mail.gmail.com> > Content-Type: text/plain; charset="utf-8" > > 1) so use np.mat ! > To elaborate on this (because I, for one, was not aware that mat supported this API, and, significantly, the fact that it does, does not appear in its docstring: import numpy as np help(np.mat) Help on function asmatrix in module numpy.matrixlib.defmatrix: asmatrix(data, dtype=None) Interpret the input as a matrix. Unlike `matrix`, `asmatrix` does not make a copy if the input is already a matrix or an ndarray. Equivalent to ``matrix(data, copy=False)``. Parameters ---------- data : array_like Input data. Returns ------- mat : matrix `data` interpreted as a matrix. Examples -------- >>> x = np.array([[1, 2], [3, 4]]) >>> m = np.asmatrix(x) >>> x[0,0] = 5 >>> m matrix([[5, 2], [3, 4]]) ) However, we do have: a=np.mat('1 2;3 4') a matrix([[1, 2], [3, 4]]) b = np.array(a) b array([[1, 2], [3, 4]]) and so, as we should expect: c=np.array(np.mat('1 2;3 4')) c array([[1, 2], [3, 4]]) So the substance of the utility function Stefan suggests is one line: def numstr2numarr(s): """ 's' is a Matlab-style string containing the entries of the numerical array """ return np.array(np.mat(s)) In essence, numpy "almost" provides the API you're asking for. DG -------------- next part -------------- An HTML attachment was scrubbed...
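[Editor's note: David's one-liner round-trips through ``np.mat`` to get the Matlab-style parsing. The same teaching convenience can be had without touching the matrix class at all. The sketch below is an illustration, not NumPy API; the helper name ``str2arr`` is invented.]

```python
import numpy as np

def str2arr(s):
    # Hypothetical teaching helper: parse a Matlab-style string such as
    # '1 2; 3 4' into a 2-D float ndarray, without going through np.matrix.
    rows = [[float(tok) for tok in row.split()] for row in s.split(';')]
    return np.array(rows)

print(str2arr('1 2; 3 4'))
```

Unlike the ``np.mat`` round-trip, this returns a plain ``ndarray`` from the start, so students never encounter the ``matrix`` type at all.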
URL: From sturla.molden at gmail.com Thu Feb 6 18:27:51 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 6 Feb 2014 23:27:51 +0000 (UTC) Subject: [Numpy-discussion] MKL and OpenBLAS References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: <809172291413421586.678747sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > The Eigen license is MPL-2. That doesn't look to be incompatible with > BSD, but it may complicate things. Q8: I want to distribute (outside my > organization) executable programs or libraries that I have compiled from > someone else's unchanged MPL-licensed source code, either standalone or > part of a larger work. What do I have to do? <http://www.mozilla.org/MPL/2.0/FAQ.html#distribute-my-binaries> > > You must inform the recipients where they can get the source for the > executable program you are distributing (i.e., you must comply with > Section 3.2). You may also distribute any executables you create under a > license of your choosing, as long as that license does not interfere with > the recipients' rights to the source under the terms of the MPL. MPL-2 is not viral like the GPL. The obligation to inform about the source code only applies to the MPL code. That is, if we use Eigen we must inform where the Eigen source code can be obtained. MPL is a compromise between BSD/MIT and GPL. Using MPL code in binaries should be fine.
Sturla From alan.isaac at gmail.com Thu Feb 6 19:19:10 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 06 Feb 2014 19:19:10 -0500 Subject: [Numpy-discussion] create numerical arrays from strings In-Reply-To: References: Message-ID: <52F4267E.6000204@gmail.com> On 2/6/2014 6:03 PM, David Goldsmith wrote: > So the substance of the utility function Stefan suggests is one line: It's even easier than that: np.mat('1 2;3 4').A However the context is the introduction of the language to students who have no programming experience, not my personal convenience (which this affects not at all). Cheers, Alan From robert.kern at gmail.com Fri Feb 7 04:31:30 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 7 Feb 2014 09:31:30 +0000 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: On Thu, Feb 6, 2014 at 9:45 PM, Matthieu Brucher wrote: > According to the discussions on the ML, they switched from GPL to MPL > to enable the kind of distribution numpy/scipy is looking for. They > had some hesitations between BSD and MPL, but IIRC their official > stand is to allow inclusion inside BSD-licensed code. If they want BSD-licensed projects to incorporate their code, they need to license it under the BSD license (or similar). They are not in a position to "allow" their MPL-licensed code to be included in a BSD-licensed project. That just doesn't mean anything. We never needed their permission. We could be "BSD-licensed except for this one bit which is MPLed", but we don't want to be. -- Robert Kern -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sturla.molden at gmail.com Fri Feb 7 07:44:46 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 7 Feb 2014 12:44:46 +0000 (UTC) Subject: [Numpy-discussion] MKL and OpenBLAS References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: <1174251510413468851.517388sturla.molden-gmail.com@news.gmane.org> Thomas Unterthiner wrote: > Sorry for going a bit off-topic, but: do you have any links to the > benchmarks? I googled around, but I haven't found anything. FWIW, on my > own machines OpenBLAS is on par with MKL (on an i5 laptop and an older > Xeon server) and actually slightly faster than ACML (on an FX8150) for > my use cases (I mainly tested DGEMM/SGEMM, and a few LAPACK calls). So > your claim is very surprising for me. I was thinking about the benchmarks on Eigen's website, but it might be a bit old now and possibly biased: http://eigen.tuxfamily.org/index.php?title=Benchmark It uses a single thread only, but for smaller matrix sizes Eigen tends to do better. Carl Kleffner alerted me to this benchmark today: http://gcdart.blogspot.de/2013/06/fast-matrix-multiply-and-ml.html It shows superb performance and unparalleled scalability for OpenBLAS on Opteron. MKL might be better on Intel CPUs though. ATLAS is doing quite well too, better than I would expect, and generally better than Eigen. It is also interesting that ACML is crap, except with a single-threaded BLAS.
Sturla From nouiz at nouiz.org Fri Feb 7 09:46:31 2014 From: nouiz at nouiz.org (Frédéric Bastien) Date: Fri, 7 Feb 2014 09:46:31 -0500 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: On Fri, Feb 7, 2014 at 4:31 AM, Robert Kern wrote: > On Thu, Feb 6, 2014 at 9:45 PM, Matthieu Brucher > wrote: >> According to the discussions on the ML, they switched from GPL to MPL >> to enable the kind of distribution numpy/scipy is looking for. They >> had some hesitations between BSD and MPL, but IIRC their official >> stand is to allow inclusion inside BSD-licensed code. > > If they want BSD-licensed projects to incorporate their code, they need to > license it under the BSD license (or similar). They are not in a position to > "allow" their MPL-licensed code to be included in a BSD-licensed project. > That just doesn't mean anything. We never needed their permission. We could > be "BSD-licensed except for this one bit which is MPLed", but we don't want > to be. I agree that we shouldn't include Eigen code in NumPy. But what about distributing Windows binaries that include Eigen headers? They wrote this on their web site: """ Virtually any software may use Eigen. For example, closed-source software may use Eigen without having to disclose its own source code. Many proprietary and closed-source software projects are using Eigen right now, as well as many BSD-licensed projects.
""" Fred From robert.kern at gmail.com Fri Feb 7 12:08:16 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 7 Feb 2014 17:08:16 +0000 Subject: [Numpy-discussion] MKL and OpenBLAS In-Reply-To: References: <951978224413373619.887869sturla.molden-gmail.com@news.gmane.org> <52F37BFF.2050400@web.de> Message-ID: On Fri, Feb 7, 2014 at 2:46 PM, Frédéric Bastien wrote: > On Fri, Feb 7, 2014 at 4:31 AM, Robert Kern wrote: >> On Thu, Feb 6, 2014 at 9:45 PM, Matthieu Brucher >> wrote: >>> According to the discussions on the ML, they switched from GPL to MPL >>> to enable the kind of distribution numpy/scipy is looking for. They >>> had some hesitations between BSD and MPL, but IIRC their official >>> stand is to allow inclusion inside BSD-licensed code. >> >> If they want BSD-licensed projects to incorporate their code, they need to >> license it under the BSD license (or similar). They are not in a position to >> "allow" their MPL-licensed code to be included in a BSD-licensed project. >> That just doesn't mean anything. We never needed their permission. We could >> be "BSD-licensed except for this one bit which is MPLed", but we don't want >> to be. > > I agree that we shouldn't include Eigen code in NumPy. > > But what about distributing Windows binaries that include Eigen headers? > > They wrote this on their web site: > > """ > Virtually any software may use Eigen. For example, closed-source > software may use Eigen without having to disclose its own source code. > Many proprietary and closed-source software projects are using Eigen > right now, as well as many BSD-licensed projects. > """ I don't mind anyone distributing such binaries. I think it might even be reasonable for the project itself to distribute such binaries through a page on numpy.org. I do *not* think it would be wise to do so through our PyPI page or the download section of Sourceforge.
Those bare lists of files provide insufficient context for us to be able to say "this one particular build includes MPL-licensed code in addition to the usual BSD-licensed code". -- Robert Kern From ralf.gommers at gmail.com Fri Feb 7 21:51:03 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sat, 8 Feb 2014 03:51:03 +0100 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: <20140206105912.GH4638@gmail.com> References: <20140131103459.GA24791@gmail.com> <20140206105912.GH4638@gmail.com> Message-ID: On Thu, Feb 6, 2014 at 11:59 AM, Stéfan van der Walt wrote: > On Tue, 04 Feb 2014 12:21:58 +0100, Ralf Gommers wrote: > > Finding a suitable mentor for whatever project Jennifer chooses is an > > important factor in the choice of project, so I have to ask: do you have > > the bandwidth to be a mentor or help out this summer? > > I completely agree. I have time to be co-mentor, and I have ideas for > other potential mentors Great! > (of course, Ralf would be on top of that list > :). That depends a bit on the topic - I don't know much about scipy.special, Pauli is our resident guru. > Members of the dipy team would also be interested. > That's specifically for the spherical harmonics topic right? Ralf > > Stéfan > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Fri Feb 7 23:16:53 2014 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Sat, 8 Feb 2014 06:16:53 +0200 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: <20140131103459.GA24791@gmail.com> <20140206105912.GH4638@gmail.com> Message-ID: On 8 Feb 2014 04:51, "Ralf Gommers" wrote: > >> Members of the dipy team would also be interested. > > That's specifically for the spherical harmonics topic right? Right.
Spherical harmonics are used as bases in many of DiPy's reconstruction algorithms. You are right, though, that GSoC would also require an expert in special functions. Stéfan -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Sat Feb 8 20:44:32 2014 From: rays at blue-cove.com (RayS) Date: Sat, 08 Feb 2014 17:44:32 -0800 Subject: [Numpy-discussion] create numerical arrays from strings In-Reply-To: <52F39249.5060803@gmail.com> References: <52F39249.5060803@gmail.com> Message-ID: <201402090144.s191iWRO032673@blue-cove.com> At 05:46 AM 2/6/2014, Alan G Isaac wrote: >Compare np.mat('1 2; 3 4') >to np.array([[1, 2], [3, 4]]) >for readability and intimidation factor. >Little things matter when getting started >with students who lack programming background. my $.02: >'1 2; 3 4' is a non-obvious and non-intuitive way to describe a 2D array or matrix - and try explaining later that the values are actually stored in memory as 1,3,2,4 and why and watch the freshman chins drop... >>> np.array([[1,2], ... [3,4]]) ... array([[1, 2], [3, 4]]) Why use both significant whitespace and punctuation to separate elements? I've billed many months rewriting old Matlab code into Python - please don't saddle future engineers with a closed, non-objective, expensive product based on FORTRAN and written in C that breaks old code with every release. There are so many fine, easy tutorials like http://wiki.scipy.org/Tentative_NumPy_Tutorial http://www.loria.fr/~rougier/teaching/matplotlib/ - Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sun Feb 9 16:59:14 2014 From: argriffi at ncsu.edu (alex) Date: Sun, 9 Feb 2014 16:59:14 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix Message-ID: Hello list, I wrote this mini-nep for numpy but I've been advised it is more appropriate for discussion on the list.
""" The ``numpy.matrix`` API provides a low barrier to using Python for linear algebra, just as the pre-3 Python ``input`` function and ``print`` statement provided low barriers to using Python for automatically evaluating input and for printing output. On the other hand, it really needs to be deprecated. Let's deprecate ``numpy.matrix``. """ I understand that numpy.matrix will not be deprecated any time soon, but I hope this will register as a vote to help nudge its deprecation closer to the realm of acceptable discussion. Alex From alan.isaac at gmail.com Sun Feb 9 17:12:04 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sun, 09 Feb 2014 17:12:04 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: <52F7FD34.1000003@gmail.com> On 2/9/2014 4:59 PM, alex wrote: > """ > The ``numpy.matrix`` API provides a low barrier to using Python > for linear algebra, just as the pre-3 Python ``input`` function > and ``print`` statement provided low barriers to using Python for > automatically evaluating input and for printing output. > > On the other hand, it really needs to be deprecated. > Let's deprecate ``numpy.matrix``. > """ > > I understand that numpy.matrix will not be deprecated any time soon, > but I hope this will register as a vote to help nudge its deprecation > closer to the realm of acceptable discussion. I believe you will want to see the archived discussions of this controversial issue before broaching it again, and then when you do so, broach it in terms of the points that have been raised **in great detail** in the past. 
fwiw, Alan Isaac From argriffi at ncsu.edu Sun Feb 9 17:55:49 2014 From: argriffi at ncsu.edu (alex) Date: Sun, 9 Feb 2014 17:55:49 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F7FD34.1000003@gmail.com> References: <52F7FD34.1000003@gmail.com> Message-ID: On Sun, Feb 9, 2014 at 5:12 PM, Alan G Isaac wrote: > On 2/9/2014 4:59 PM, alex wrote: >> """ >> The ``numpy.matrix`` API provides a low barrier to using Python >> for linear algebra, just as the pre-3 Python ``input`` function >> and ``print`` statement provided low barriers to using Python for >> automatically evaluating input and for printing output. >> >> On the other hand, it really needs to be deprecated. >> Let's deprecate ``numpy.matrix``. >> """ >> >> I understand that numpy.matrix will not be deprecated any time soon, >> but I hope this will register as a vote to help nudge its deprecation >> closer to the realm of acceptable discussion. > > I believe you will want to see the archived discussions of this > controversial issue before broaching it again, and then > when you do so, broach it in terms of the points that > have been raised **in great detail** in the past. I haven't followed the numpy mailing list, but some googling found a six-year-old thread http://mail.scipy.org/pipermail/scipy-user/2008-May/016738.html where Nathan Bell raised the question of numpy.matrix deprecation and you replied that you find them helpful for teaching and that you personally find them convenient to use. Perhaps coincidentally (or not...)
I understand that unless I am aware of the history of this discussion there is no point in my broaching this controversial issue directly, but perhaps I could broach it indirectly by asking if anyone with a deeper understanding of the background of this issue has compiled some document enumerating or summarizing the points that have been made? Alex From jeffreback at gmail.com Sun Feb 9 18:02:21 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Sun, 9 Feb 2014 18:02:21 -0500 Subject: [Numpy-discussion] [pydata] ANN: pandas 0.13.1 released References: Message-ID: <09D339D1-CBC0-4966-AF9C-105FA68E0A6E@gmail.com> > Hello, > > This is a minor release from 0.13.0 and includes a small number of API changes, several new features, enhancements, and > performance improvements along with a large number of bug fixes. > > We recommend that all users upgrade to this version. > > Highlights include: > > - Added infer_datetime_format keyword to read_csv/to_datetime to allow speedups for homogeneously formatted datetimes. > - Will intelligently limit display precision for datetime/timedelta formats. > - Enhanced Panel apply() method. > - Suggested tutorials in new Tutorials section. > - Our pandas ecosystem is growing, We now feature related projects in a new Pandas Ecosystem section. > - Much work has been taking place on improving the docs, and a new Contributing section has been added. 
> v0.13.1 Whatsnew Page > http://pandas-docs.github.io/pandas-docs-travis/whatsnew.html#v0-13-1-february-3-2014 > > v0.13.1 Documentation Page > http://pandas.pydata.org/pandas-docs/stable/ > > Please visit here for the source tarball: > https://github.com/pydata/pandas/releases/tag/v0.13.1 > > Windows binaries are available from Christoph Gohlke's collection: > http://www.lfd.uci.edu/~gohlke/pythonlibs/#pandas > > tarballs and windows binaries are available on PyPi: > https://pypi.python.org/pypi/pandas > > We are looking forward to a next planned release of v0.14.0 in about three months' time. > > Some things that we would like to include: > > - A big upgrade to SQL to/from interop with support for all major DBs, leveraging SQLAlchemy. > - Template-based displays for dataframes, with conditional formatting and roll-your-own output generation. > - Reduced memory dataframe construction from known-length iterators. > - Your PRs. > > Thanks > > The Pandas Team > > > Contributors to the 0.13.1 release > > $ git log v0.12.1..v0.13.1 --pretty='%aN##%s' | grep -v 'Merge pull' | grep -Po '^[^#]+' | sort | uniq -c | sort -rn > > 146 y-p > 97 jreback > 14 Joris Van den Bossche > 8 Phillip Cloud > 8 Andy Hayden > 6 unutbu > 4 Skipper Seabold > 3 TomAugspurger > 3 Jeff Tratner > 3 DSM > 3 Douglas McNeil > 3 Dan Birken > 3 Chapman Siu > 2 Tom Augspurger > 2 Naveen Michaud-Agrawal > 2 Michael Schatzow > 2 Kieran O'Mahony > 2 Jacob Schaer > 2 Doran Deluz > 2 danielballan > 2 Clark Fitzgerald > 2 chapman siu > 2 Caleb Epstein > 2 Brad Buran > 2 Andrew Burrows > 2 Alex Rothberg > 1 Spencer Lyon > 1 Roman Pekar > 1 Patrick O'Keeffe > 1 mwaskom > 1 lexual > 1 Julia Evans > 1 John McNamara > 1 Jan Wagner > 1 immerrr > 1 Guillaume Gay > 1 George Kuan > 1 Felix Lawrence > 1 Elliot S > 1 Dražen Lučanin > 1 Douglas Rudd > 1 David Wolever > 1 davidshinn > 1 david > 1 Daniel Waeber > 1 Chase Albert > 1 bwignall > 1 bmu > 1 Bjorn Arneson > 1 Alok Singhal > 1 akittredge > 1 acorbe > > -- > You
received this message because you are subscribed to the Google Groups "PyData" group. > To unsubscribe from this group and stop receiving emails from it, send an email to pydata+unsubscribe at googlegroups.com. > For more options, visit https://groups.google.com/groups/opt_out. From matthew.brett at gmail.com Sun Feb 9 18:07:35 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sun, 9 Feb 2014 15:07:35 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> Message-ID: Hi, On Sun, Feb 9, 2014 at 2:55 PM, alex wrote: > On Sun, Feb 9, 2014 at 5:12 PM, Alan G Isaac wrote: >> On 2/9/2014 4:59 PM, alex wrote: >>> """ >>> The ``numpy.matrix`` API provides a low barrier to using Python >>> for linear algebra, just as the pre-3 Python ``input`` function >>> and ``print`` statement provided low barriers to using Python for >>> automatically evaluating input and for printing output. >>> >>> On the other hand, it really needs to be deprecated. >>> Let's deprecate ``numpy.matrix``. >>> """ >>> >>> I understand that numpy.matrix will not be deprecated any time soon, >>> but I hope this will register as a vote to help nudge its deprecation >>> closer to the realm of acceptable discussion. >> >> I believe you will want to see the archived discussions of this >> controversial issue before broaching it again, and then >> when you do so, broach it in terms of the points that >> have been raised **in great detail** in the past. > > I haven't followed the numpy mailing list, but some googling found a > six years old thread > http://mail.scipy.org/pipermail/scipy-user/2008-May/016738.html > where Nathan Bell raised the question of numpy.matrix deprecation and > you replied that you find them helpful for teaching and that you > personally find them convenient to use. Perhaps coincidentally (or > not...) 
I'm working on the same kinds of problems in scipy development > (functions involving sparse matrices and abstract linear operators) as > Nathan Bell had been working on when he made this post. > > I understand that unless I am aware of the history of this discussion > there is no point in my broaching this controversial issue directly, > but perhaps I could broach it indirectly by asking if anyone with a > deeper understanding of the background of this issue has compiled some > document enumerating or summarizing the points that have been made? As I was saying over in the starting github issue thread, I have found that I use Sympy for demonstrating linear algebra on matrices. Here's an example: http://nbviewer.ipython.org/urls/raw2.github.com/practical-neuroimaging/pna-notebooks/master/Introducing%20the%20General%20Linear%20Model.ipynb Best, Matthew From njs at pobox.com Mon Feb 10 09:26:59 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Feb 2014 09:26:59 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: > Hello list, > > I wrote this mini-nep for numpy but I've been advised it is more > appropriate for discussion on the list. > > """ > The ``numpy.matrix`` API provides a low barrier to using Python > for linear algebra, just as the pre-3 Python ``input`` function > and ``print`` statement provided low barriers to using Python for > automatically evaluating input and for printing output. > > On the other hand, it really needs to be deprecated. > Let's deprecate ``numpy.matrix``. > """ > > I understand that numpy.matrix will not be deprecated any time soon, > but I hope this will register as a vote to help nudge its deprecation > closer to the realm of acceptable discussion. To make this more productive, maybe it would be useful to elaborate on what exactly we should do here. 
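To make the kind of surprise at issue concrete, here is a quick sketch (a toy example of my own, plain stock NumPy assumed) of how the same code means different things for arrays and matrices:

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
m = np.matrix(a)

# Indexing a row: an array drops to 1-D, a matrix stays 2-D.
assert a[0].shape == (2,)
assert m[0].shape == (1, 2)

# ravel(): an array flattens to 1-D, a matrix stays (1, n).
assert a.ravel().shape == (4,)
assert m.ravel().shape == (1, 4)

# `*` is elementwise for arrays but a matrix product for matrices,
# so the same expression silently computes two different things.
assert (a * a)[0, 0] == 1   # 1 * 1
assert (m * m)[0, 0] == 7   # 1*1 + 2*3
```

Code written against one of the two types tends to break quietly when handed the other, which is exactly what bites people who mix libraries.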
I can't imagine we'll actually remove 'matrix' from the numpy namespace at any point in the near future. I do have the sense that when people choose to use it, they eventually come to regret this choice. It's a bit buggy and has confusing behaviours, and due to limitations of numpy's subclassing model, will probably always be buggy and have confusing behaviours. And it's marketed as being for new users, who are exactly the kind of users who aren't sophisticated enough to recognize these dangers. Maybe there should be a big warning to this effect in the np.matrix docstring? Maybe using np.matrix should raise a DeprecationWarning? (DeprecationWarning doesn't have to mean that something will be disappearing -- e.g. the Python stdlib deprecates stuff all the time, but never actually removes it. It's just a warning flag that there are better options available.) Or what? -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From alan.isaac at gmail.com Mon Feb 10 10:09:44 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 10:09:44 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> Message-ID: <52F8EBB8.7070205@gmail.com> On 2/9/2014 5:55 PM, alex wrote: > I'm working on the same kinds of problems in scipy development > (functions involving sparse matrices and abstract linear operators) And how is numpy's matrix object getting in your way? Your initial post simply treated the desirability of deprecation as a given and did not lay out reasons. A strong reason would be e.g. if the matrix object is creating a serious maintenance headache. Eliminating this should be a big enough gain to offset any lost interest in numpy from users of Matlab, GAUSS, IDL etc. from the disappearance of a user-friendly notation. I accept that a numpy matrix has some warts. In the past, I've proposed changes to address these. 
E.g., https://www.mail-archive.com/numpy-discussion at scipy.org/msg06780.html However these went nowhere, so effectively the status quo was defended. I can live with that. A bit of the notational advantage of the `matrix` object was undercut by the addition of the `dot` method to arrays. If `matrix` is deprecated, I would hope that a matrix-power method would be added. (One that works correctly with boolean arrays and has a short name.) Ideally, an inverse method would be added as well (with a short name). I think adding the Hermitian transpose as `.H()` already has some support, but I forget its current status. Right now, to give a simple example, students can write a simple projection matrix as `X * (X.T * X).I * X.T` instead of `X.dot(la.inv(X.T.dot(X))).dot(X.T)`. The advantage is obvious and even bigger with more complex expressions. If we were to get `.I` for matrix inverse of an array (which I expect to be vociferously resisted) it would be `X.dot(X.T.dot(X).I).dot(X.T)` which at the moment I'm inclined to see as acceptable for teaching. (Not sure.) Just to forestall the usual "just start them with arrays, eventually they'll be grateful" reply, I would want to hear that suggestion only from someone who has used it successfully with undergraduates in the social sciences. Alan Isaac From ndarray at mac.com Mon Feb 10 11:16:23 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Mon, 10 Feb 2014 11:16:23 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: > On the other hand, it really needs to be deprecated. While numpy.matrix may have its problems, a NEP should list a better rationale than the above to gain acceptance. Personally, I decided not to use numpy.matrix in production code about 10 years ago and never looked back to that decision. I've heard however that some of the worst inheritance warts have been fixed over the years.
I also resisted introducing inheritance in the implementation of masked arrays, but I lost that argument. For better or worse, inheritance from ndarray is here to stay and I would rather see numpy.matrix stay as a test-bed for fixing inheritance issues rather than see it deprecated and have the same issues pop up in ma or elsewhere. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Feb 10 11:27:42 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 11:27:42 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F8EBB8.7070205@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 10:09 AM, Alan G Isaac wrote: > On 2/9/2014 5:55 PM, alex wrote: > > I'm working on the same kinds of problems in scipy development > > (functions involving sparse matrices and abstract linear operators) > > > And how is numpy's matrix object getting in your way? > Your initial post simply treated the desirability of > deprecation as a given and did not lay out reasons. > A strong reason would be e.g. if the matrix object > is creating a serious maintenance headache. Eliminating > this should be a big enough gain to offset any lost interest > in numpy from users of Matlab, GAUSS, IDL etc. from the > disappearance of a user-friendly notation. > > I accept that a numpy matrix has some warts. In the past, > I've proposed changes to address these. E.g., > https://www.mail-archive.com/numpy-discussion at scipy.org/msg06780.html > However these went nowhere, so effectively the status quo was > defended. I can live with that. > > A bit of the notational advantage of the `matrix` object was undercut > by the addition of the `dot` method to arrays. If `matrix` is deprecated, > I would hope that a matrix-power method would be added. (One that works > correctly with boolean arrays and has a short name.) 
I ideally an inverse > method would be added as well (with a short name). I think adding the > hermitian transpose as `.H()` already has some support, but I forget its > current > status. > > Right now, to give a simple example, students can write a simple projection > matrix as `X * (X.T * X).I * X.T` instead of > `X.dot(la.inv(X.T.dot(X))).dot(X.T)`. > X.dot(la.pinv(X)) or even better assign pinv(X) to a name and reuse it. Josef (I never taught statistics or econometrics to undergraduates in Social Sciences.) How do we calculate the diagonal of the hat matrix without using N by N matrices? > The advantage is obvious and even bigger with more complex expressions. > If we were to get `.I` for matrix inverse of an array (which I expect to be > vociferously resisted) it would be `X.dot(X.T.dot(X).I).dot(X.T)` which > at the moment I'm inclined to see as acceptable for teaching. (Not sure.) > > Just to forestall the usual "just start them with arrays, eventually > they'll > be grateful" reply, I would want to hear that suggestion only from someone > who has used it successfully with undergraduates in the social sciences. > > Alan Isaac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Feb 10 11:31:20 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Feb 2014 11:31:20 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 11:16 AM, Alexander Belopolsky wrote: > > On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: >> >> On the other hand, it really needs to be deprecated. > > > While numpy.matrix may have its problems, a NEP should list a better > rationale than the above to gain acceptance. 
> > Personally, I decided not to use numpy.matrix in production code about 10 > years ago and never looked back to that decision. I've heard however that > some of the worst inheritance warts have been fixed over the years. I also > resisted introducing inheritance in the implementation of masked arrays, > but I lost that argument. For better or worse, inheritance from ndarray is > here to stay and I would rather see numpy.matrix stay as a test-bed for > fixing inheritance issues rather than see it deprecated and have the same > issues pop up in ma or elsewhere. In practice, the existence of np.matrix doesn't seem to have any effect on whether inheritance issues get fixed. And in the long run, I think the goal is to move people away from inheriting from np.ndarray. Really the only good reason to inherit from np.ndarray right now is if there's something you want to do that is impossible without using inheritance. But we're working on fixing those issues (e.g., __numpy_ufunc__ in the next release). And AFAICT most of the remaining issues with inheritance simply cannot be fixed, because the requirements are ill-defined and contradictory. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From dineshbvadhia at hotmail.com Mon Feb 10 11:58:56 2014 From: dineshbvadhia at hotmail.com (Dinesh Vadhia) Date: Mon, 10 Feb 2014 08:58:56 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: Scipy sparse uses matrices - I was under the impression that scipy sparse only works with matrices or have things moved on? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From argriffi at ncsu.edu Mon Feb 10 12:00:27 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 12:00:27 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 11:27 AM, wrote: > How do we calculate the diagonal of the hat matrix without using N by N > matrices? Not sure if this was a rhetorical question or what, but this seems to work leverages = np.square(scipy.linalg.qr(X, mode='economic')[0]).sum(axis=1) http://www4.ncsu.edu/~ipsen/ps/slides_CSE2013.pdf Sorry for off-topic... From matthieu.brucher at gmail.com Mon Feb 10 12:02:16 2014 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Mon, 10 Feb 2014 17:02:16 +0000 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: Yes, but these will be scipy.sparse matrices, nothing to do with numpy (dense) matrices. Cheers, Matthieu 2014-02-10 Dinesh Vadhia : > Scipy sparse uses matrices - I was under the impression that scipy sparse > only works with matrices or have things moved on? > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ From njs at pobox.com Mon Feb 10 12:16:01 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 10 Feb 2014 12:16:01 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 12:02 PM, Matthieu Brucher wrote: > Yes, but these will be scipy.sparse matrices, nothing to do with numpy > (dense) matrices. 
Unfortunately when scipy.sparse matrices interact with dense ndarrays (e.g., sparse matrix * dense vector), then you always get back np.matrix objects instead of np.ndarray objects. So it's impossible to avoid np.matrix entirely if using scipy.sparse. -n From argriffi at ncsu.edu Mon Feb 10 12:19:24 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 12:19:24 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 12:16 PM, Nathaniel Smith wrote: > On Mon, Feb 10, 2014 at 12:02 PM, Matthieu Brucher > wrote: >> Yes, but these will be scipy.sparse matrices, nothing to do with numpy >> (dense) matrices. > > Unfortunately when scipy.sparse matrices interact with dense ndarrays > (e.g., sparse matrix * dense vector), then you always get back > np.matrix objects >>> csr_matrix([[1, 2], [3, 4]]) * array([5, 6]) array([17, 39]) From e.antero.tammi at gmail.com Mon Feb 10 14:03:43 2014 From: e.antero.tammi at gmail.com (eat) Date: Mon, 10 Feb 2014 21:03:43 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 7:00 PM, alex wrote: > On Mon, Feb 10, 2014 at 11:27 AM, wrote: > > How do we calculate the diagonal of the hat matrix without using N by N > > matrices? > > Not sure if this was a rhetorical question or what, but this seems to work > leverages = np.square(scipy.linalg.qr(X, mode='economic')[0]).sum(axis=1) > http://www4.ncsu.edu/~ipsen/ps/slides_CSE2013.pdf Rhetorical or not, but FWIW I'll prefer to take singular value decomposition (u, s, vt= svd(x)) and then based on the singular values s I'll estimate a "numerically feasible rank" r. Thus the diagonal of such hat matrix would be (u[:, :r]** 2).sum(1). Regards, -eat > > Sorry for off-topic...
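For what it's worth, on a small full-column-rank example both recipes match the explicit hat-matrix diagonal — a quick sketch (the variable names and the rank tolerance are mine):

```python
import numpy as np
from scipy.linalg import qr, svd

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 3))   # tall, full-column-rank design matrix

# The explicit N x N hat matrix H = X (X'X)^{-1} X' -- fine at this size,
# but exactly the thing to avoid for large N.
H = X @ np.linalg.inv(X.T @ X) @ X.T
lev_hat = np.diag(H)

# QR recipe: row sums of the squared economy-size Q.
Q = qr(X, mode='economic')[0]
lev_qr = np.square(Q).sum(axis=1)

# SVD recipe: thin SVD, keep only the numerically nonzero singular values.
u, s, vt = svd(X, full_matrices=False)
r = int(np.sum(s > s[0] * np.finfo(float).eps * max(X.shape)))
lev_svd = (u[:, :r] ** 2).sum(axis=1)

assert np.allclose(lev_hat, lev_qr)
assert np.allclose(lev_hat, lev_svd)
```

Note that the leverages also sum to the rank of X (the trace of the hat matrix), which is a handy sanity check.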
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Mon Feb 10 14:08:57 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 14:08:57 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: > Rhetorical or not, but FWIW I'll prefer to take singular value decomposition > (u, s, vt= svd(x)) and then based on the singular values s I'll estimate a > "numerically feasible rank" r. Thus the diagonal of such hat matrix would be > (u[:, :r]** 2).sum(1). It's a small detail but you probably want svd(x, full_matrices=False) to avoid anything NxN. From e.antero.tammi at gmail.com Mon Feb 10 14:12:19 2014 From: e.antero.tammi at gmail.com (eat) Date: Mon, 10 Feb 2014 21:12:19 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 9:08 PM, alex wrote: > On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: > > Rhetorical or not, but FWIW I'll prefer to take singular value > decomposition > > (u, s, vt= svd(x)) and then based on the singular values s I'll estimate > a > > "numerically feasible rank" r. Thus the diagonal of such hat matrix > would be > > (u[:, :r]** 2).sum(1). > > It's a small detail but you probably want svd(x, full_matrices=False) > to avoid anything NxN. > Indeed. Thanks, -eat > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From matthew.brett at gmail.com Mon Feb 10 14:36:00 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 11:36:00 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: Hi, On Mon, Feb 10, 2014 at 6:26 AM, Nathaniel Smith wrote: > On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: >> Hello list, >> >> I wrote this mini-nep for numpy but I've been advised it is more >> appropriate for discussion on the list. >> >> """ >> The ``numpy.matrix`` API provides a low barrier to using Python >> for linear algebra, just as the pre-3 Python ``input`` function >> and ``print`` statement provided low barriers to using Python for >> automatically evaluating input and for printing output. >> >> On the other hand, it really needs to be deprecated. >> Let's deprecate ``numpy.matrix``. >> """ >> >> I understand that numpy.matrix will not be deprecated any time soon, >> but I hope this will register as a vote to help nudge its deprecation >> closer to the realm of acceptable discussion. > > To make this more productive, maybe it would be useful to elaborate on > what exactly we should do here. > > I can't imagine we'll actually remove 'matrix' from the numpy > namespace at any point in the near future. {out of order paste}: > Maybe there should be a big warning to this effect in the np.matrix docstring? That seems reasonable to me. Maybe, to avoid heat and fast changes the NEP could lay out different options with advantages and disadvantages. > I do have the sense that when people choose to use it, they eventually > come to regret this choice. It's a bit buggy and has confusing > behaviours, and due to limitations of numpy's subclassing model, will > probably always be buggy and have confusing behaviours. And it's > marketed as being for new users, who are exactly the kind of users who > aren't sophisticated enough to recognize these dangers. 
This paragraph is a good summary of why the current situation of np.matrix could cause harm. It would be really useful to have some hard evidence of who's using it though. Are there projects that use np.matrix extensively? If so, maybe some code from these could be use-cases to see if (pseudo-) deprecation is practical? Alex - do you have time to lay this stuff out? I bet the NEP would be a good way of helping the discussion stay on track. At the very least it could be a reference point the next time this comes up. Thanks for bringing this up, Matthew From josef.pktd at gmail.com Mon Feb 10 14:44:16 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 14:44:16 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 2:12 PM, eat wrote: > > > > On Mon, Feb 10, 2014 at 9:08 PM, alex wrote: >> >> On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: >> > Rhetorical or not, but FWIW I'll prefer to take singular value >> > decomposition >> > (u, s, vt= svd(x)) and then based on the singular values s I'll >> > estimate a >> > "numerically feasible rank" r. Thus the diagonal of such hat matrix >> > would be >> > (u[:, :r]** 2).sum(1). >> >> It's a small detail but you probably want svd(x, full_matrices=False) >> to avoid anything NxN. >> > Indeed. > I meant the entire diagonal, not the trace of the projection matrix. My (not articulated) thought was that I use elementwise multiplication together with dot products instead of the three dot products; however, elementwise algebra is not very common in linear algebra based textbooks. The question is whether students and new users coming from `matrix` languages can translate formulas into code, or just copy formulas to code. (It took me a while to get used to numpy and take advantage of its features coming from GAUSS and Matlab.) OT since the presence or absence of matrix in numpy doesn't affect me.
Josef > > Thanks, > -eat > >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 10 14:49:06 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 11:49:06 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 11:44 AM, wrote: > > > On Mon, Feb 10, 2014 at 2:12 PM, eat wrote: >> >> >> >> >> On Mon, Feb 10, 2014 at 9:08 PM, alex wrote: >>> >>> On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: >>> > Rhetorical or not, but FWIW I'll prefer to take singular value >>> > decomposition >>> > (u, s, vt= svd(x)) and then based on the singular values s I'll >>> > estimate a >>> > "numerically feasible rank" r. Thus the diagonal of such hat matrix >>> > would be >>> > (u[:, :r]** 2).sum(1). >>> >>> It's a small detail but you probably want svd(x, full_matrices=False) >>> to avoid anything NxN. >> >> Indeed. > > > I meant the entire diagonal not the trace of the projection matrix. > > My (not articulated) thought was that I use element wise multiplication > together with dot products instead of the three dot products, however > elementwise algebra is not very common in linear algebra based textbooks. > > The question is whether students and new user coming from `matrix` languages > can translate formulas into code, or just copy formulas to code. > (It took me a while to get used to numpy and take advantage of it's features > coming from GAUSS and Matlab.) 
> > OT since the precense or absence of matrix in numpy doesn't affect me. Josef - as a data point - does statsmodels use np.matrix? Cheers, Matthew From josef.pktd at gmail.com Mon Feb 10 14:50:22 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 14:50:22 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F8EBB8.7070205@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 10:09 AM, Alan G Isaac wrote: > On 2/9/2014 5:55 PM, alex wrote: > > I'm working on the same kinds of problems in scipy development > > (functions involving sparse matrices and abstract linear operators) > > > And how is numpy's matrix object getting in your way? > Your initial post simply treated the desirability of > deprecation as a given and did not lay out reasons. > A strong reason would be e.g. if the matrix object > is creating a serious maintenance headache. Eliminating > this should be a big enough gain to offset any lost interest > in numpy from users of Matlab, GAUSS, IDL etc. from the > disappearance of a user-friendly notation. > > I accept that a numpy matrix has some warts. In the past, > I've proposed changes to address these. E.g., > https://www.mail-archive.com/numpy-discussion at scipy.org/msg06780.html > However these went nowhere, so effectively the status quo was > defended. I can live with that. > > A bit of the notational advantage of the `matrix` object was undercut > by the addition of the `dot` method to arrays. Just another one that makes arrays nicer (although I'm on old versions and don't use it yet): the keepdims option for reduce operations, like mean. demean each row? Josef > If `matrix` is deprecated, > I would hope that a matrix-power method would be added. (One that works > correctly with boolean arrays and has a short name.)
I think adding the > hermitian transpose as `.H()` already has some support, but I forget its > current > status. > > Right now, to give a simple example, students can write a simple projection > matrix as `X * (X.T * X).I * X.T` instead of > `X.dot(la.inv(X.T.dot(X))).dot(X.T)`. > The advantage is obvious and even bigger with more complex expressions. > If we were to get `.I` for matrix inverse of an array (which I expect to be > vociferously resisted) it would be `X.dot(X.T.dot(X).I).dot(X.T)` which > at the moment I'm inclined to see as acceptable for teaching. (Not sure.) > > Just to forestall the usual "just start them with arrays, eventually > they'll > be grateful" reply, I would want to hear that suggestion only from someone > who has used it successfully with undergraduates in the social sciences. > > Alan Isaac > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsseabold at gmail.com Mon Feb 10 14:58:09 2014 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 10 Feb 2014 14:58:09 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 2:49 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 10, 2014 at 11:44 AM, wrote: > > > > > > On Mon, Feb 10, 2014 at 2:12 PM, eat wrote: > >> > >> > >> > >> > >> On Mon, Feb 10, 2014 at 9:08 PM, alex wrote: > >>> > >>> On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: > >>> > Rhetorical or not, but FWIW I'll prefer to take singular value > >>> > decomposition > >>> > (u, s, vt= svd(x)) and then based on the singular values s I'll > >>> > estimate a > >>> > "numerically feasible rank" r. Thus the diagonal of such hat matrix > >>> > would be > >>> > (u[:, :r]** 2).sum(1). 
> >>> > >>> It's a small detail but you probably want svd(x, full_matrices=False) > >>> to avoid anything NxN. > >> > >> Indeed. > > > > > > I meant the entire diagonal not the trace of the projection matrix. > > > > My (not articulated) thought was that I use element wise multiplication > > together with dot products instead of the three dot products, however > > elementwise algebra is not very common in linear algebra based textbooks. > > > > The question is whether students and new user coming from `matrix` > languages > > can translate formulas into code, or just copy formulas to code. > > (It took me a while to get used to numpy and take advantage of it's > features > > coming from GAUSS and Matlab.) > > > > OT since the precense or absence of matrix in numpy doesn't affect me. > > Josef - as a data point - does statsmodels use np.matrix? > > No. Skipper -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Feb 10 15:04:17 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 15:04:17 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 2:49 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 10, 2014 at 11:44 AM, wrote: > > > > > > On Mon, Feb 10, 2014 at 2:12 PM, eat wrote: > >> > >> > >> > >> > >> On Mon, Feb 10, 2014 at 9:08 PM, alex wrote: > >>> > >>> On Mon, Feb 10, 2014 at 2:03 PM, eat wrote: > >>> > Rhetorical or not, but FWIW I'll prefer to take singular value > >>> > decomposition > >>> > (u, s, vt= svd(x)) and then based on the singular values s I'll > >>> > estimate a > >>> > "numerically feasible rank" r. Thus the diagonal of such hat matrix > >>> > would be > >>> > (u[:, :r]** 2).sum(1). > >>> > >>> It's a small detail but you probably want svd(x, full_matrices=False) > >>> to avoid anything NxN. > >> > >> Indeed. 
> >>> > >>> It's a small detail but you probably want svd(x, full_matrices=False) > >>> to avoid anything NxN. > >> > >> Indeed. > > > > > > I meant the entire diagonal not the trace of the projection matrix. > > > > My (not articulated) thought was that I use element wise multiplication > > together with dot products instead of the three dot products, however > > elementwise algebra is not very common in linear algebra based textbooks. > > > > The question is whether students and new user coming from `matrix` > languages > > can translate formulas into code, or just copy formulas to code. > > (It took me a while to get used to numpy and take advantage of it's > features > > coming from GAUSS and Matlab.) > > > > OT since the precense or absence of matrix in numpy doesn't affect me. > > Josef - as a data point - does statsmodels use np.matrix? > No (*). It's too much work to pay attention to whether something is an array or a matrix, although we have a few sparse matrices, and pandas.DataFrames have a few tricky corners in between array and matrix. (*) grep finds two cases of `np.matrix` in the sandbox. (old unused code) Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 10 15:04:59 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 12:04:59 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F8EBB8.7070205@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 7:09 AM, Alan G Isaac wrote: [snip] > Just to forestall the usual "just start them with arrays, eventually they'll > be grateful" reply, I would want to hear that suggestion only from someone > who has used it successfully with undergraduates in the social sciences.
I teach psychologists and neuroscientists mainly - you can get an idea of the level I'm teaching at from the notebook I posted earlier in the thread. I can't speak to my success in any objective way, but I didn't hear the students complain about the X.dot(Y). This may be because a) only some of them have much experience of or liking for matlab b) some of them have the impression that Python is the way to go, and they accept that this will mean some changes c) not much of the code they see is of the form: X * (X.T * X).I * X.T . In fact, the notebook I posted was the closest to that stuff. In any case I personally found it easier to show the ideas using sympy. Cheers, Matthew From alan.isaac at gmail.com Mon Feb 10 15:23:08 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 15:23:08 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: <52F9352C.4030808@gmail.com> On 2/10/2014 3:04 PM, Matthew Brett wrote: > I teach psychologists and neuroscientists mainly I must suspect that notebook was not for **undergraduate** psychology students. At least, not the ones I usually meet. SymPy is great but for those without background it is at best awkward. It certainly does not offer an equivalent to the notational convenience of numpy's matrix object. As far as I have been able to discern, the underlying motivation for eliminating the matrix class is that some developers want to stop supporting in any form the subclassing of numpy arrays. Do I have that right? So the real question is not about numpy's matrix class, but about whether subclassing will be supported. (If I'm correctly reading the tea leaves.)
Alan Isaac From matthew.brett at gmail.com Mon Feb 10 15:33:35 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 12:33:35 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F9352C.4030808@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 12:23 PM, Alan G Isaac wrote: > On 2/10/2014 3:04 PM, Matthew Brett wrote: >> I teach psychologists and neuroscientists mainly > > I must suspect that notebook was not for > **undergraduate** psychology students. > At least, not the ones I usually meet. Well - in this case a mix. The class was this one: practical-neuroimaging.github.com I realize I'm not sure what you are teaching that is less complicated than the notebook, but nevertheless has a reasonable amount of stuff like X * (X.T * X).I * X.T ? Have you got any teaching materials to hand that would help us understand what you mean? > SymPy is great but for those without background > it is at best awkward. It certainly does not > offer an equivalent to the notational convenience > of numpy's matrix object. > > > As far as I have been able to discern, the underlying > motivation for eliminating the matrix class is that > some developers want to stop supporting in any form > the subclassing of numpy arrays. Do I have that right? No I don't think so, and I believe that line would be distracting. The question as I understand it, is very directly about whether the benefit of the notational convenience of np.matrix might be outweighed by the later cost of switching to np.array, and the confusion that comes up when a new user has to choose between them. But - I guess this will be stuff that has to go into the NEP. 
Cheers, Matthew From josef.pktd at gmail.com Mon Feb 10 15:39:17 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 15:39:17 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 3:04 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 10, 2014 at 7:09 AM, Alan G Isaac > wrote: > [snip] > > Just to forestall the usual "just start them with arrays, eventually > they'll > > be grateful" reply, I would want to hear that suggestion only from > someone > > who has used it successfully with undergraduates in the social sciences. > > I teach psychologists and neuroscientists mainly - you can get an idea > of the level I'm teaching at from the notebook I posted earlier in the > thread. > > I can't speak to my success in any objective way, but I didn't hear > the students complain about the X.dot(Y). This may be because > > a) only some of them have much experience of or liking for matlab > b) some of them have the impression that Python is the way to go, and > they accept that this will mean some changes > c) not much of the code they see is of the form: X * (X.T * X).I * X.T > . In fact, the notebook I posted was the closest to that stuff. In > any case I personally found it easier show the ideas using sympy. > In support of Alan's view: Linear models in econometrics is all linear algebra, and GAUSS is still popular among econometricians because you can write a lot of code just like in the paper. (although GAUSS isn't as popular as it was some time ago, but matlab is not much different.) https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/gmm.py#L1194 statsmodels doesn't use masked arrays; structured dtypes and recarrays are only used for input, and might be replaced by pandas.DataFrames, pandas is creeping into more core areas of statsmodels. 
I'm not voting in favor of removing everything in numpy that I'm not using. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 10 15:44:59 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Feb 2014 13:44:59 -0700 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F9352C.4030808@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 1:23 PM, Alan G Isaac wrote: > On 2/10/2014 3:04 PM, Matthew Brett wrote: > > I teach psychologists and neuroscientists mainly > > > I must suspect that notebook was not for > **undergraduate** psychology students. > At least, not the ones I usually meet. > > SymPy is great but for those without background > it is at best awkward. It certainly does not > offer an equivalent to the notational convenience > of numpy's matrix object. > > > As far as I have been able to discern, the underlying > motivation for eliminating the matrix class is that > some developers want to stop supporting in any form > the subclassing of numpy arrays. Do I have that right? > > So the real question is not about numpy's matrix class, > but about whether subclassing will be supported. > (If I'm correctly reading the tea leaves.) > > I don't see any reason to remove the Matrix object. It has its limitations, I don't use it myself, but it costs little and I don't see the value of forcing users to change. As to subclassing ndarray, it is not recommended because it seldom saves much work (see masked arrays), and can have side effects that are difficult to deal with. 
The result of the latter is that numpy itself is called upon to support array method overrides, of sum and mean for example. That makes for a mess. That said, there is no movement to forbid subclassing ndarray, but there will probably be more resistance to accommodating and fixing problems arising from that design choice. At least that is my own feeling. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Mon Feb 10 15:45:41 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 15:45:41 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 2:36 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 10, 2014 at 6:26 AM, Nathaniel Smith wrote: >> On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: >>> Hello list, >>> >>> I wrote this mini-nep for numpy but I've been advised it is more >>> appropriate for discussion on the list. >>> >>> """ >>> The ``numpy.matrix`` API provides a low barrier to using Python >>> for linear algebra, just as the pre-3 Python ``input`` function >>> and ``print`` statement provided low barriers to using Python for >>> automatically evaluating input and for printing output. >>> >>> On the other hand, it really needs to be deprecated. >>> Let's deprecate ``numpy.matrix``. >>> """ >>> >>> I understand that numpy.matrix will not be deprecated any time soon, >>> but I hope this will register as a vote to help nudge its deprecation >>> closer to the realm of acceptable discussion. >> >> To make this more productive, maybe it would be useful to elaborate on >> what exactly we should do here. >> >> I can't imagine we'll actually remove 'matrix' from the numpy >> namespace at any point in the near future. > {out of order paste}: >> Maybe there should be a big warning to this effect in the np.matrix docstring? > > That seems reasonable to me. 
Maybe, to avoid heat and fast changes > the NEP could lay out different options with advantages and > disadvantages. > >> I do have the sense that when people choose to use it, they eventually >> come to regret this choice. It's a bit buggy and has confusing >> behaviours, and due to limitations of numpy's subclassing model, will >> probably always be buggy and have confusing behaviours. And it's >> marketed as being for new users, who are exactly the kind of users who >> aren't sophisticated enough to recognize these dangers. > > This paragraph is a good summary of why the current situation of > np.matrix could cause harm. > > It would be really useful to have some hard evidence of who's using it > though. Are there projects that use np.matrix extensively? If so, > maybe some code from these could be use-cases to see if (pseudo-) > deprecation is practical? > > Alex - do you have time to lay this stuff out? I bet the NEP would be > a good way of helping the discussion stay on track. At the very least it > could be a reference point the next time this comes up. I don't think I have enough perspective to write a real NEP, but maybe as a starting point we could begin a list somewhere, like on a wiki or possibly in the numpy github repo, surveying an early 2014 snapshot of the linear algebra APIs used by various Python projects. For example according to the responses in this thread, statsmodels seems to avoid using numpy.matrix except possibly for interfacing with pandas, and at least one professor relies on the numpy.matrix interface for classroom teaching. The list could include short quotes from people involved in the projects, if they want to share an opinion.
It wouldn't be my intention to treat such a list as a vote, but rather as data and as an excuse to make a list of cool projects; I suspect that members of most projects would say "we don't use numpy.matrix but we don't mind if other people use it" and that most teachers or students who benefit from the gentler syntax of numpy.matrix would not even be reached by such a survey. Alex From matthew.brett at gmail.com Mon Feb 10 15:47:29 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 12:47:29 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 12:44 PM, Charles R Harris wrote: > > > > On Mon, Feb 10, 2014 at 1:23 PM, Alan G Isaac wrote: >> >> On 2/10/2014 3:04 PM, Matthew Brett wrote: >> > I teach psychologists and neuroscientists mainly >> >> >> I must suspect that notebook was not for >> **undergraduate** psychology students. >> At least, not the ones I usually meet. >> >> SymPy is great but for those without background >> it is at best awkward. It certainly does not >> offer an equivalent to the notational convenience >> of numpy's matrix object. >> >> >> As far as I have been able to discern, the underlying >> motivation for eliminating the matrix class is that >> some developers want to stop supporting in any form >> the subclassing of numpy arrays. Do I have that right? >> >> So the real question is not about numpy's matrix class, >> but about whether subclassing will be supported. >> (If I'm correctly reading the tea leaves.) >> > > I don't see any reason to remove the Matrix object. It has its limitations, > I don't use it myself, but it costs little and I don't see the value of > forcing users to change. Maybe it would help to take 'remove the Matrix object' off the table so we don't get side-tracked. Does anyone disagree with the proposal to take that off the table? 
Cheers, Matthew From matthew.brett at gmail.com Mon Feb 10 15:53:52 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 12:53:52 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 12:39 PM, wrote: > > > On Mon, Feb 10, 2014 at 3:04 PM, Matthew Brett > wrote: >> >> Hi, >> >> On Mon, Feb 10, 2014 at 7:09 AM, Alan G Isaac >> wrote: >> [snip] >> > Just to forestall the usual "just start them with arrays, eventually >> > they'll >> > be grateful" reply, I would want to hear that suggestion only from >> > someone >> > who has used it successfully with undergraduates in the social sciences. >> >> I teach psychologists and neuroscientists mainly - you can get an idea >> of the level I'm teaching at from the notebook I posted earlier in the >> thread. >> >> I can't speak to my success in any objective way, but I didn't hear >> the students complain about the X.dot(Y). This may be because >> >> a) only some of them have much experience of or liking for matlab >> b) some of them have the impression that Python is the way to go, and >> they accept that this will mean some changes >> c) not much of the code they see is of the form: X * (X.T * X).I * X.T >> . In fact, the notebook I posted was the closest to that stuff. In >> any case I personally found it easier show the ideas using sympy. > > > In support of Alan's view: > > Linear models in econometrics is all linear algebra, and GAUSS is still > popular among econometricians because you can write a lot of code just like > in the paper. (although GAUSS isn't as popular as it was some time ago, but > matlab is not much different.) 
> > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/gmm.py#L1194 Maybe it would be helpful to draw the distinction between 1) Teaching people to do numerical coding 2) Using code to demonstrate mathematical concepts For 1) - it looks like people writing serious code don't generally use np.matrix - but maybe we're missing some code-bases. For 2) - I personally think sympy is better for this. There might be some middle-ground (1.5) where the idea is to get people comfortable with writing 10-50 line scripts to do linear algebra-type things. I guess these people will be particularly difficult to persuade that it's a good idea to switch computer languages. Cheers, Matthew From josef.pktd at gmail.com Mon Feb 10 15:58:06 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 15:58:06 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 3:45 PM, alex wrote: > On Mon, Feb 10, 2014 at 2:36 PM, Matthew Brett > wrote: > > Hi, > > > > On Mon, Feb 10, 2014 at 6:26 AM, Nathaniel Smith wrote: > >> On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: > >>> Hello list, > >>> > >>> I wrote this mini-nep for numpy but I've been advised it is more > >>> appropriate for discussion on the list. > >>> > >>> """ > >>> The ``numpy.matrix`` API provides a low barrier to using Python > >>> for linear algebra, just as the pre-3 Python ``input`` function > >>> and ``print`` statement provided low barriers to using Python for > >>> automatically evaluating input and for printing output. > >>> > >>> On the other hand, it really needs to be deprecated. > >>> Let's deprecate ``numpy.matrix``. > >>> """ > >>> > >>> I understand that numpy.matrix will not be deprecated any time soon, > >>> but I hope this will register as a vote to help nudge its deprecation > >>> closer to the realm of acceptable discussion. 
> >> > >> To make this more productive, maybe it would be useful to elaborate on > >> what exactly we should do here. > >> > >> I can't imagine we'll actually remove 'matrix' from the numpy > >> namespace at any point in the near future. > > {out of order paste}: > >> Maybe there should be a big warning to this effect in the np.matrix > docstring? > > > > That seems reasonable to me. Maybe, to avoid heat and fast changes > > the NEP could lay out different options with advantages and > > disadvantages. > > > >> I do have the sense that when people choose to use it, they eventually > >> come to regret this choice. It's a bit buggy and has confusing > >> behaviours, and due to limitations of numpy's subclassing model, will > >> probably always be buggy and have confusing behaviours. And it's > >> marketed as being for new users, who are exactly the kind of users who > >> aren't sophisticated enough to recognize these dangers. > > > > This paragraph is a good summary of why the current situation of > > np.matrix could cause harm. > > > > It would really useful to have some hard evidence of who's using it > > though. Are there projects that use np.matrix extensively? If so, > > maybe some code from these could be use-cases to see if (pseudo-) > > deprecation is practical? > > > > Alex - do you have time to lay this stuff out? I bet the NEP would be > > a good way of helping the discussion stays on track. At very least it > > could be a reference point the next time this comes up. > > I don't think I have enough perspective to write a real NEP, but maybe > as a starting point we could begin a list somewhere, like on a wiki or > possibly in the numpy github repo, surveying an early 2014 snapshot of > the linear algebra APIs used by various Python projects. 
For example > according to the responses in this thread, statsmodels seems to avoid > using numpy.matrix except possibly for interfacing with pandas, and at > least one professor relies on the numpy.matrix interface for classroom > teaching. The list could include short quotes from people involved in > the projects, if they want to share an opinion. > > It wouldn't be my intention to treat such a list as a vote, but rather > as data and as an excuse to make a list of cool projects; I suspect > that members of most projects would say "we don't use numpy.matrix but > we don't mind if other people use it" and that most teachers or > students who benefit from the gentler syntax of numpy.matrix would not > even be reached by such a survey. > My impression: As long as there is no big maintenance cost (which there isn't), I don't see any reason to remove matrix and to debate it every few years. All the users that are participating or reading the mailing list have been indoctrinated for years not to use matrix. Alan is one of the only active proponents. What about the silent hundred thousand users of numpy? I have no idea what they are doing. stage 1 use loops stage 2 use matrix stage 3 use arrays stage 1 and stage 2 is how undergraduate econometrics starts out. Josef Let's remove loops, users should vectorize. > > Alex > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From josef.pktd at gmail.com Mon Feb 10 16:00:12 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 16:00:12 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 3:39 PM, wrote: > > > On Mon, Feb 10, 2014 at 3:04 PM, Matthew Brett wrote: > >> Hi, >> >> On Mon, Feb 10, 2014 at 7:09 AM, Alan G Isaac >> wrote: >> [snip] >> > Just to forestall the usual "just start them with arrays, eventually >> they'll >> > be grateful" reply, I would want to hear that suggestion only from >> someone >> > who has used it successfully with undergraduates in the social sciences. >> >> I teach psychologists and neuroscientists mainly - you can get an idea >> of the level I'm teaching at from the notebook I posted earlier in the >> thread. >> >> I can't speak to my success in any objective way, but I didn't hear >> the students complain about the X.dot(Y). This may be because >> >> a) only some of them have much experience of or liking for matlab >> b) some of them have the impression that Python is the way to go, and >> they accept that this will mean some changes >> c) not much of the code they see is of the form: X * (X.T * X).I * X.T >> . In fact, the notebook I posted was the closest to that stuff. In >> any case I personally found it easier show the ideas using sympy. >> > > In support of Alan's view: > > Linear models in econometrics is all linear algebra, and GAUSS is still > popular among econometricians because you can write a lot of code just like > in the paper. (although GAUSS isn't as popular as it was some time ago, but > matlab is not much different.) 
> > > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/gmm.py#L1194 > I should have added this http://en.wikipedia.org/wiki/Generalized_method_of_moments#Asymptotic_normality covariance of the estimator, second equation line Josef > > statsmodels doesn't use masked arrays; structured dtypes and recarrays are > only used for input, and might be replaced by pandas.DataFrames, pandas is > creeping into more core areas of statsmodels. > > I'm not voting in favor of removing everything in numpy that I'm not using. > > Josef > > >> >> Cheers, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Feb 10 16:03:03 2014 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 10 Feb 2014 23:03:03 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F9352C.4030808@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: 10.02.2014 22:23, Alan G Isaac kirjoitti: [clip] > As far as I have been able to discern, the underlying > motivation for eliminating the matrix class is that > some developers want to stop supporting in any form > the subclassing of numpy arrays. Do I have that right? What sparked this discussion (on Github) is that it is not possible to write duck-typed code that works correctly for: - ndarrays - matrices - scipy.sparse sparse matrices The semantics of all three are different; scipy.sparse is somewhere between matrices and ndarrays with some things working randomly like matrices and others not. With some hyperbole added, one could say that from the developer point of view, np.matrix is doing and has already done evil just by existing, by messing up the unstated rules of ndarray semantics in Python.
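A minimal sketch of the failure mode, using NumPy only (scipy.sparse adds a third set of behaviours on top of this):

```python
import numpy as np

def squared_row_norms(a):
    # Duck-typed intent: elementwise square, then sum each row.
    return (a * a).sum(axis=1)

x = np.array([[1.0, 2.0],
              [3.0, 4.0]])

print(squared_row_norms(x))             # ndarray: [ 5. 25.]

# Same code, matrix input: `*` silently becomes a matrix product,
# so the result is a 2x1 matrix of wrong numbers.
print(squared_row_norms(np.matrix(x)))  # matrix([[17.], [37.]])
```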
-- Pauli Virtanen From matthew.brett at gmail.com Mon Feb 10 16:08:29 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 13:08:29 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: Hi, On Mon, Feb 10, 2014 at 12:58 PM, wrote: > > > On Mon, Feb 10, 2014 at 3:45 PM, alex wrote: >> >> On Mon, Feb 10, 2014 at 2:36 PM, Matthew Brett >> wrote: >> > Hi, >> > >> > On Mon, Feb 10, 2014 at 6:26 AM, Nathaniel Smith wrote: >> >> On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: >> >>> Hello list, >> >>> >> >>> I wrote this mini-nep for numpy but I've been advised it is more >> >>> appropriate for discussion on the list. >> >>> >> >>> """ >> >>> The ``numpy.matrix`` API provides a low barrier to using Python >> >>> for linear algebra, just as the pre-3 Python ``input`` function >> >>> and ``print`` statement provided low barriers to using Python for >> >>> automatically evaluating input and for printing output. >> >>> >> >>> On the other hand, it really needs to be deprecated. >> >>> Let's deprecate ``numpy.matrix``. >> >>> """ >> >>> >> >>> I understand that numpy.matrix will not be deprecated any time soon, >> >>> but I hope this will register as a vote to help nudge its deprecation >> >>> closer to the realm of acceptable discussion. >> >> >> >> To make this more productive, maybe it would be useful to elaborate on >> >> what exactly we should do here. >> >> >> >> I can't imagine we'll actually remove 'matrix' from the numpy >> >> namespace at any point in the near future. >> > {out of order paste}: >> >> Maybe there should be a big warning to this effect in the np.matrix >> >> docstring? >> > >> > That seems reasonable to me. Maybe, to avoid heat and fast changes >> > the NEP could lay out different options with advantages and >> > disadvantages. >> > >> >> I do have the sense that when people choose to use it, they eventually >> >> come to regret this choice. 
It's a bit buggy and has confusing >> >> behaviours, and due to limitations of numpy's subclassing model, will >> >> probably always be buggy and have confusing behaviours. And it's >> >> marketed as being for new users, who are exactly the kind of users who >> >> aren't sophisticated enough to recognize these dangers. >> > >> > This paragraph is a good summary of why the current situation of >> > np.matrix could cause harm. >> > >> > It would really useful to have some hard evidence of who's using it >> > though. Are there projects that use np.matrix extensively? If so, >> > maybe some code from these could be use-cases to see if (pseudo-) >> > deprecation is practical? >> > >> > Alex - do you have time to lay this stuff out? I bet the NEP would be >> > a good way of helping the discussion stays on track. At very least it >> > could be a reference point the next time this comes up. >> >> I don't think I have enough perspective to write a real NEP, but maybe >> as a starting point we could begin a list somewhere, like on a wiki or >> possibly in the numpy github repo, surveying an early 2014 snapshot of >> the linear algebra APIs used by various Python projects. For example >> according to the responses in this thread, statsmodels seems to avoid >> using numpy.matrix except possibly for interfacing with pandas, and at >> least one professor relies on the numpy.matrix interface for classroom >> teaching. The list could include short quotes from people involved in >> the projects, if they want to share an opinion. >> >> It wouldn't be my intention to treat such a list as a vote, but rather >> as data and as an excuse to make a list of cool projects; I suspect >> that members of most projects would say "we don't use numpy.matrix but >> we don't mind if other people use it" and that most teachers or >> students who benefit from the gentler syntax of numpy.matrix would not >> even be reached by such a survey. 
> > > My impression: > > As long as there is no big maintenance cost (which there isn't), I don't see > any reason to remove matrix and to debate it every few years. No, all agree I think - let's not remove it. > All the users that are participating or reading the mailing list have been > indoctrinated for years not to use matrix. Here is the rub. This discussion does come up - 'np.array or np.matrix'. It came up in a Software Carpentry boot camp I was teaching on - and the instructors disagreed. I think the active questions here are: * Should we collect the discussion in coherent form somewhere? * Should we add something to the np.matrix docstring and if so what? * (Pauli's point): to what extent should we try to emulate the np.matrix API. Best, Matthew From alan.isaac at gmail.com Mon Feb 10 16:13:07 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 16:13:07 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: <52F940E3.1020905@gmail.com> On 2/10/2014 4:03 PM, Pauli Virtanen wrote: > What sparked this discussion (on Github) is that it is not possible to > write duck-typed code that works correctly for: Do you mean one must start out with an 'asarray'? Or more than that? As I detailed in past discussion, the one thing I really do not like about the `matrix` design is that indexing always returns a matrix. I speculate this is the primary problem you're running into? Thanks, Alan Isaac From alan.isaac at gmail.com Mon Feb 10 16:26:26 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 16:26:26 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: <52F94402.2000601@gmail.com> On 2/10/2014 4:08 PM, Matthew Brett wrote: > I think the active questions here are: > * Should we collect the discussion in coherent form somewhere? 
> * Should we add something to the np.matrix docstring and if so what? > * (Pauli's point): to what extent should we try to emulate the np.matrix API. Somewhat related to that last point: could an array grow an `inv` method? (Perhaps returning a pinv for ill-conditioned cases.) Here are the primary things that make matrices convenient (particularly in a teaching setting): * (partly addressed when `dot` method added) ** (could be partly addressed with an `mpow` method) .I (could be partly addressed with an `inv` method) .H (currently too controversial for ndarray) Some might also add the behavior of indexing, but I could only give qualified agreement to that. Alan Isaac From pav at iki.fi Mon Feb 10 16:28:03 2014 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 10 Feb 2014 23:28:03 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F940E3.1020905@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> Message-ID: 10.02.2014 23:13, Alan G Isaac kirjoitti: > On 2/10/2014 4:03 PM, Pauli Virtanen wrote: >> What sparked this discussion (on Github) is that it is not >> possible to write duck-typed code that works correctly for: > > Do you mean one must start out with an 'asarray'? Or more than > that? Starting with asarray won't work: sparse matrices are not subclasses of ndarray. Matrix-free linear operators are not such either. In Python code, you usually very seldom coerce your inputs to a specific type. The situation here is a bit as if there were two different stream object types in Python, and their .write() methods did completely different things, so that code doing I/O would need to always be careful with which type of a stream was in question. > As I detailed in past discussion, the one thing I really do not > like about the `matrix` design is that indexing always returns a > matrix. I speculate this is the primary problem you're running > into?
The fact that reductions to 1D return 2D objects is also a problem, but the matrix multiplication vs. elementwise multiplication and division is also an issue. -- Pauli Virtanen From alan.isaac at gmail.com Mon Feb 10 16:40:14 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 16:40:14 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> Message-ID: <52F9473E.8070104@gmail.com> On 2/10/2014 4:28 PM, Pauli Virtanen wrote: > Starting with asarray won't work: sparse matrices are not subclasses > of ndarray. I was focused on the `matrix` object. For this object, an initial asarray is all it takes to use array code. (Or ... not?) And it is a view, not a copy. I don't have the background to know how scipy ended up with a sparse matrix object instead of a sparse array object. In any case, it seems like a different question. Alan Isaac From argriffi at ncsu.edu Mon Feb 10 16:40:15 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 16:40:15 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 3:47 PM, Matthew Brett wrote: > Hi, > > On Mon, Feb 10, 2014 at 12:44 PM, Charles R Harris > wrote: >> >> >> >> On Mon, Feb 10, 2014 at 1:23 PM, Alan G Isaac wrote: >>> >>> On 2/10/2014 3:04 PM, Matthew Brett wrote: >>> > I teach psychologists and neuroscientists mainly >>> >>> >>> I must suspect that notebook was not for >>> **undergraduate** psychology students. >>> At least, not the ones I usually meet. >>> >>> SymPy is great but for those without background >>> it is at best awkward. It certainly does not >>> offer an equivalent to the notational convenience >>> of numpy's matrix object. 
>>> >>> >>> As far as I have been able to discern, the underlying >>> motivation for eliminating the matrix class is that >>> some developers want to stop supporting in any form >>> the subclassing of numpy arrays. Do I have that right? >>> >>> So the real question is not about numpy's matrix class, >>> but about whether subclassing will be supported. >>> (If I'm correctly reading the tea leaves.) >>> >> >> I don't see any reason to remove the Matrix object. It has its limitations, >> I don't use it myself, but it costs little and I don't see the value of >> forcing users to change. > > Maybe it would help to take 'remove the Matrix object' off the table > so we don't get side-tracked. Does anyone disagree with the proposal > to take that off the table? No I really want to remove it :) If a non-frivolous NEP is written, this can be a token extreme opinion to be immediately discounted as not a practical solution. From alan.isaac at gmail.com Mon Feb 10 16:42:39 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 16:42:39 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> Message-ID: <52F947CF.9080709@gmail.com> On 2/10/2014 4:40 PM, alex wrote: > I really want to remove it Can you articulate the problem created by its existence that leads you to this view? 
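For concreteness, the behaviours being debated - and the asarray view mentioned earlier in the thread - can be checked in a few lines (a sketch):

```python
import numpy as np

a = np.arange(6.0).reshape(3, 2)
m = np.matrix(a)

# Indexing: an array row is 1-D, a matrix row stays 2-D.
print(a[0].shape)  # (2,)
print(m[0].shape)  # (1, 2)

# Reductions: summing along an axis drops a dimension for arrays only.
print(a.sum(axis=1).shape)  # (3,)
print(m.sum(axis=1).shape)  # (3, 1)

# asarray on a matrix is a cheap view, not a copy, so array code can
# be applied to matrix input once you remember to convert.
v = np.asarray(m)
print(v.shape, v.base is not None)  # (3, 2) True
```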
Alan Isaac From charlesr.harris at gmail.com Mon Feb 10 16:52:03 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Feb 2014 14:52:03 -0700 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 2:28 PM, Pauli Virtanen wrote: > 10.02.2014 23:13, Alan G Isaac kirjoitti: > > On 2/10/2014 4:03 PM, Pauli Virtanen wrote: > >> What sparked this discussion (on Github) is that it is not > >> possible to write duck-typed code that works correctly for: > > > > Do you mean one must start out with an 'asarray'? Or more than > > that? > > Starting with asarray won't work: sparse matrices are not subclasses > of ndarray. Matrix-free linear operators are not such either. > > In Python code, you usually very seldom coerce your inputs to a > specific type. The situation here is a bit as if there were two > different stream object types in Python, and their .write() methods > did completely different things, so that code doing I/O would need to > always be careful with which type of a stream was in question. > > > As I detailed in past discussion, the one thing I really do not > > like about the `matrix` design is that indexing always returns a > > matrix. I speculate this is the primary problem you're running > > into? > > The fact that reductions to 1D return 2D objects is also a problem, > but the matrix multiplication vs. elementwise multiplication and > division is also an issue. > > Is there a need for every package in numpy/scipy to support matrices? I can see leaving in the Matrix object for basic teaching/linear algebra, but perhaps it would be reasonable for more advanced applications to forgo support. That would fall into the class of not going out of the way to accommodate subclasses of ndarray that override methods. 
I support that approach in the long run because trying to keep all subclasses happy is extra effort that could be better spent elsewhere. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 10 16:57:10 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 13:57:10 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 1:52 PM, Charles R Harris wrote: > > > > On Mon, Feb 10, 2014 at 2:28 PM, Pauli Virtanen wrote: >> >> 10.02.2014 23:13, Alan G Isaac kirjoitti: >> > On 2/10/2014 4:03 PM, Pauli Virtanen wrote: >> >> What sparked this discussion (on Github) is that it is not >> >> possible to write duck-typed code that works correctly for: >> > >> > Do you mean one must start out with an 'asarray'? Or more than >> > that? >> >> Starting with asarray won't work: sparse matrices are not subclasses >> of ndarray. Matrix-free linear operators are not such either. >> >> In Python code, you usually very seldom coerce your inputs to a >> specific type. The situation here is a bit as if there were two >> different stream object types in Python, and their .write() methods >> did completely different things, so that code doing I/O would need to >> always be careful with which type of a stream was in question. >> >> > As I detailed in past discussion, the one thing I really do not >> > like about the `matrix` design is that indexing always returns a >> > matrix. I speculate this is the primary problem you're running >> > into? >> >> The fact that reductions to 1D return 2D objects is also a problem, >> but the matrix multiplication vs. elementwise multiplication and >> division is also an issue. >> > > Is there a need for every package in numpy/scipy to support matrices? 
I can > see leaving in the Matrix object for basic teaching/linear algebra, but > perhaps it would be reasonable for more advanced applications to forgo > support. That would fall into the class of not going out of the way to > accommodate subclasses of ndarray that override methods. I support that > approach in the long run because trying to keep all subclasses happy is > extra effort that could be better spent elsewhere. Yes, I bet there is a solution in that direction that everyone could live with. Alex - yes - I think it would be hugely useful to write up this discussion as a wiki page or a NEP or a wiki page that might become a NEP. It seems to me there is a great deal of agreement here which could fruitfully be recorded. Cheers, Matthew From pav at iki.fi Mon Feb 10 17:11:55 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 00:11:55 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F9473E.8070104@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> Message-ID: 10.02.2014 23:40, Alan G Isaac kirjoitti: > On 2/10/2014 4:28 PM, Pauli Virtanen wrote: >> Starting with asarray won't work: sparse matrices are not >> subclasses of ndarray. > > I was focused on the `matrix` object. For this object, an initial > asarray is all it takes to use array code. (Or ... not?) And it is > a view, not a copy. > > I don't have the background to know how scipy ended up with a > sparse matrix object instead of a sparse array object. In any case, > it seems like a different question. I think this is very relevant question, and I believe one of the main motivations for the continuous reappearance of this discussion. The existence of np.matrix messes up the general agreement on ndarray semantics in Python. 
The meaning of very basic code such as

A * B
A.sum(0)
A[0]

where A and B are NxN matrices of some sort now depends on the types of A and B. This makes writing duck typed code impossible when both semantics are in play. This is more of a community and ecosystem question rather than about np.matrix and asarray(). I think the existence of np.matrix and its influence has set back the development of a way to express generic linear algebra (dense, sparse, matrix-free) algorithms in Python. -- Pauli Virtanen From argriffi at ncsu.edu Mon Feb 10 17:16:06 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 10 Feb 2014 17:16:06 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F947CF.9080709@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F947CF.9080709@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 4:42 PM, Alan G Isaac wrote: > On 2/10/2014 4:40 PM, alex wrote: >> I really want to remove it > > Can you articulate the problem created by its existence > that leads you to this view? In my opinion, Pauli has articulated these problems well in this thread. From chris.barker at noaa.gov Mon Feb 10 17:16:28 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 10 Feb 2014 14:16:28 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F940E3.1020905@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 1:13 PM, Alan G Isaac wrote: > Do you mean one must start out with an 'asarray'? > Or more than that? > maybe np.asanyarray() It's nice, at least in principle, for duck-typed functions to return the type they were handed. And this really is the primary issue with np.matrix -- it takes some real effort to write code that preserves your objects as the matrix type.
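For instance, the difference is easy to check (a quick illustrative snippet, nothing more):

```python
import numpy as np

m = np.matrix([[1, 2], [3, 4]])

# np.asarray drops the subclass: the result is a plain ndarray view
assert type(np.asarray(m)) is np.ndarray

# np.asanyarray passes ndarray subclasses through untouched, so a
# duck-typed function can hand back the same type it received
assert type(np.asanyarray(m)) is np.matrix
```

So starting a function with asanyarray rather than asarray is what lets a matrix come out the other end as a matrix.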
As I detailed in past discussion, the one thing > I really do not like about the `matrix` design > is that indexing always returns a matrix. > And that's the other one -- to be really nice and useful, I think we'd need a row_vector and column_vector type. i.e. if you iterate through a matrix, you get a bunch of row_vector instances -- not a bunch of 1xN matrices. But anyway -- there was a big ol' discussion about this a few years back -- my summary of that is:

1) not very many people use matrix
1a) Those that do, often end up dropping it as their experience develops
1b) It is a source of confusion -- some argue more confusion than it's worth.
2) It might be more useful if it were substantially improved
   - some of the subclassing issues
   - vector types
   - ???
3) A number of people had some great ideas how to improve it.
4) Not a single person with both the skill set and the bandwidth to actually do it has shown any interest for a long time.

Given (1) and (4) -- I can see that deprecation might seem to make sense. However, I am perfectly willing to accept Alan's assurance that it's a useful teaching tool in some cases as it is. Note that I would argue that it's NOT for "newbies", but rather, useful if you want to provide a computational environment where matrices make sense, and the point is to teach and work with those concepts, rather than to learn numpy in the broader sense. If the goal is to teach numpy for general use, I don't think you should introduce the matrix object. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From matthew.brett at gmail.com Mon Feb 10 17:17:58 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 14:17:58 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 2:11 PM, Pauli Virtanen wrote: > 10.02.2014 23:40, Alan G Isaac kirjoitti: >> On 2/10/2014 4:28 PM, Pauli Virtanen wrote: >>> Starting with asarray won't work: sparse matrices are not >>> subclasses of ndarray. >> >> I was focused on the `matrix` object. For this object, an initial >> asarray is all it takes to use array code. (Or ... not?) And it is >> a view, not a copy. >> >> I don't have the background to know how scipy ended up with a >> sparse matrix object instead of a sparse array object. In any case, >> it seems like a different question. > > I think this is very relevant question, and I believe one of the main > motivations for the continuous reappearance of this discussion. > > The existence of np.matrix messes up the general agreement on ndarray > semantics in Python. The meaning of very basic code such as > > A * B > A.sum(0) > A[0] > > where A and B are NxN matrices of some sort now depends on the types > of A and B. This makes writing duck typed code impossible when both > semantics are in play. That is a very convincing argument. What would be the problems (apart from code compatibility) in making scipy.sparse use the ndarray semantics? 
Thanks, Matthew From alan.isaac at gmail.com Mon Feb 10 17:31:52 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 10 Feb 2014 17:31:52 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> Message-ID: <52F95358.8010109@gmail.com> On 2/10/2014 5:11 PM, Pauli Virtanen wrote: > The existence of np.matrix messes up the general agreement on ndarray > semantics in Python. The meaning of very basic code such as > > A * B > A.sum(0) > A[0] > > where A and B are NxN matrices of some sort now depends on the types > of A and B. This makes writing duck typed code impossible when both > semantics are in play. I'm just missing the point here; sorry. Why isn't the right approach to require that any object that wants to work with scipy can be called by `asarray` to guarantee the core semantics? (And the matrix object passes this test.) For some objects we can agree that `asarray` will coerce them. (E.g., lists.) I just do not see why scipy should care about the semantics an object uses for interacting with other objects of the same type. Alan Isaac From pav at iki.fi Mon Feb 10 17:33:40 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 00:33:40 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> Message-ID: 11.02.2014 00:17, Matthew Brett kirjoitti: [clip] > That is a very convincing argument. > > What would be the problems (apart from code compatibility) in making > scipy.sparse use the ndarray semantics? I'd estimate the effort it would take to convert scipy.sparse to ndarray semantics is about a couple of afternoon hacks (normal, not Ipython-size), so it should be doable. 
Also, a shorthand for right-multiplication is probably necessary, as A.T.dot(B.T).T is unwieldy. As far as backward compatibility goes: change from * to .dot would break everyone's code. I suspect the rest of the changes have smaller impacts. The code breakage is such that I don't think it can be easily done by changing the behavior of "csr_matrix". I've previously proposed adding csr_array et al., and deprecating csr_matrix et al.. Not sure if the *_matrix can ever be removed, but it would be useful to point new users to use the interface with the ndarray convention. -- Pauli Virtanen From matthew.brett at gmail.com Mon Feb 10 17:54:39 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 10 Feb 2014 14:54:39 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> Message-ID: Hi, On Mon, Feb 10, 2014 at 2:33 PM, Pauli Virtanen wrote: > 11.02.2014 00:17, Matthew Brett kirjoitti: > [clip] >> That is a very convincing argument. >> >> What would be the problems (apart from code compatibility) in making >> scipy.sparse use the ndarray semantics? > > I'd estimate the effort it would take to convert scipy.sparse to ndarray > semantics is about a couple of afternoon hacks (normal, not > Ipython-size), so it should be doable. > > Also, a shorthand for right-multiplication is probably necessary, as > > A.T.dot(B.T).T > > is unwieldy. > > As far as backward compatibility goes: change from * to .dot would break > everyone's code. I suspect the rest of the changes have smaller impacts. > > The code breakage is such that I don't think it can be easily done by > changing the behavior of "csr_matrix". I've previously proposed adding > csr_array et al., and deprecating csr_matrix et al.. 
Not sure if the > *_matrix can ever be removed, but it would be useful to point new users > to use the interface with the ndarray convention. Yes, that seems very sensible. Then what about Chuck's suggestion - np.matrix stays but it is effectively an independent project that other parts of numpy or scipy are not required to support. Scipy.sparse switches to the ndarray semantics and future subclasses of ndarray should also use ndarray semantics? Matthew From pav at iki.fi Mon Feb 10 18:29:28 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 01:29:28 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F95358.8010109@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: 11.02.2014 00:31, Alan G Isaac kirjoitti: > On 2/10/2014 5:11 PM, Pauli Virtanen wrote: >> The existence of np.matrix messes up the general agreement on ndarray >> semantics in Python. The meaning of very basic code such as >> >> A * B >> A.sum(0) >> A[0] >> >> where A and B are NxN matrices of some sort now depends on the types >> of A and B. This makes writing duck typed code impossible when both >> semantics are in play. > > I'm just missing the point here; sorry. > Why isn't the right approach to require that > any object that wants to work with scipy > can be called by `asarray` to guarantee > the core semantics? (And the matrix > object passes this test.) For some objects > we can agree that `asarray` will coerce them. > (E.g., lists.) > > I just do not see why scipy should care about > the semantics an object uses for interacting > with other objects of the same type. I have a couple of points: (A) asarray() coerces the input to a dense array. This you do not want to do to sparse matrices or matrix-free linear operators, as many linear algebra algorithms don't need to know the matrix entries. 
(B) Coercing input types is something that is seldom done in Python code, since it breaks duck typing. Usually, the interface is specified by assumed semantics of the input objects. The user is then free to pass in mock objects that fulfill the necessary subsection of the assumed interface. (C) This is not only about Scipy, but also a language design question: Suppose someone, who is not a Python expert, wants to implement a linear algebra algorithm in Python. Will they write it using matrix or ndarray? (Note: np.matrix is not uncommon on stackoverflow.) Will someone who reads the code easily understand what it does (does * stand for elementwise or matrix product etc)? Can they easily make it work both with sparse and dense matrices? Matrix-free operators? Does it work both for ndarray and np.matrix inputs? (D) The presence of np.matrix invites people to write code using the np.matrix semantics. This can further lead to the code spitting out dense results as np.matrix, and then it becomes difficult to follow what sort of an object you have. (E) Some examples of the above semantics diaspora on scipy.sparse: * Implementation of GMRES et al in Scipy. The implementation reinvents yet another set of semantics that it uses internally. 
* scipy.sparse has mostly matrix semantics, but not completely, and the return values vary between matrix and ndarray -- Pauli Virtanen From josef.pktd at gmail.com Mon Feb 10 18:39:34 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 18:39:34 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 6:29 PM, Pauli Virtanen wrote: > 11.02.2014 00:31, Alan G Isaac kirjoitti: > > On 2/10/2014 5:11 PM, Pauli Virtanen wrote: > >> The existence of np.matrix messes up the general agreement on ndarray > >> semantics in Python. The meaning of very basic code such as > >> > >> A * B > >> A.sum(0) > >> A[0] > >> > >> where A and B are NxN matrices of some sort now depends on the types > >> of A and B. This makes writing duck typed code impossible when both > >> semantics are in play. > > > > I'm just missing the point here; sorry. > > Why isn't the right approach to require that > > any object that wants to work with scipy > > can be called by `asarray` to guarantee > > the core semantics? (And the matrix > > object passes this test.) For some objects > > we can agree that `asarray` will coerce them. > > (E.g., lists.) > > > > I just do not see why scipy should care about > > the semantics an object uses for interacting > > with other objects of the same type. > > I have a couple of points: > > (A) > > asarray() coerces the input to a dense array. This you do not want to do > to sparse matrices or matrix-free linear operators, as many linear > algebra algorithms don't need to know the matrix entries. > > (B) > > Coercing input types is something that is seldom done in Python code, > since it breaks duck typing. > > Usually, the interface is specified by assumed semantics of the input > objects. 
The user is then free to pass in mock objects that fulfill the > necessary subsection of the assumed interface. > Almost all the code in scipy.stats and statsmodels starts with np.asarray. The numpy doc standard has the term `array_like` to indicate things that can be converted to a usable object by ndasarray. ducktyping could be restricted to a very narrow category of ducks. What about masked arrays and structured dtypes? Because we cannot usefully convert them by asarray, we have to tell users that they don't work with a function. Our ducks that quack in the wrong way. ? How do you handle list and other array_likes in sparse? Josef > > (C) > > This is not only about Scipy, but also a language design question: > > Suppose someone, who is not a Python expert, wants to implement a > linear algebra algorithm in Python. > > Will they write it using matrix or ndarray? (Note: np.matrix is not > uncommon on stackoverflow.) > > Will someone who reads the code easily understand what it does (does * > stand for elementwise or matrix product etc)? > > Can they easily make it work both with sparse and dense matrices? > Matrix-free operators? Does it work both for ndarray and np.matrix inputs? > > (D) > > The presence of np.matrix invites people to write code using the > np.matrix semantics. This can further lead to the code spitting out > dense results as np.matrix, and then it becomes difficult to follow > what sort of an object you have. > > (E) > > Some examples of the above semantics diaspora on scipy.sparse: > > * Implementation of GMRES et al in Scipy. The implementation reinvents > yet another set of semantics that it uses internally. 
> > * scipy.sparse has mostly matrix semantics, but not completely, and the > return values vary between matrix and ndarray > > > -- > Pauli Virtanen > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Feb 10 18:45:38 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 10 Feb 2014 18:45:38 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 6:39 PM, wrote: > > > On Mon, Feb 10, 2014 at 6:29 PM, Pauli Virtanen wrote: > >> 11.02.2014 00:31, Alan G Isaac kirjoitti: >> > On 2/10/2014 5:11 PM, Pauli Virtanen wrote: >> >> The existence of np.matrix messes up the general agreement on ndarray >> >> semantics in Python. The meaning of very basic code such as >> >> >> >> A * B >> >> A.sum(0) >> >> A[0] >> >> >> >> where A and B are NxN matrices of some sort now depends on the types >> >> of A and B. This makes writing duck typed code impossible when both >> >> semantics are in play. >> > >> > I'm just missing the point here; sorry. >> > Why isn't the right approach to require that >> > any object that wants to work with scipy >> > can be called by `asarray` to guarantee >> > the core semantics? (And the matrix >> > object passes this test.) For some objects >> > we can agree that `asarray` will coerce them. >> > (E.g., lists.) >> > >> > I just do not see why scipy should care about >> > the semantics an object uses for interacting >> > with other objects of the same type. >> >> I have a couple of points: >> >> (A) >> >> asarray() coerces the input to a dense array. 
This you do not want to do >> to sparse matrices or matrix-free linear operators, as many linear >> algebra algorithms don't need to know the matrix entries. >> >> (B) >> >> Coercing input types is something that is seldom done in Python code, >> since it breaks duck typing. >> >> Usually, the interface is specified by assumed semantics of the input >> objects. The user is then free to pass in mock objects that fulfill the >> necessary subsection of the assumed interface. >> > > Almost all the code in scipy.stats and statsmodels starts with np.asarray. > The numpy doc standard has the term `array_like` to indicate things that > can be converted to a usable object by ndasarray. > > ducktyping could be restricted to a very narrow category of ducks. > I thought once it would be nice to have a flag on the classes that indicate `array_semantic` versus `matrix_semantic` so it would be easy to check the quack instead of the duck. Josef > > What about masked arrays and structured dtypes? > Because we cannot usefully convert them by asarray, we have to tell users > that they don't work with a function. > Our ducks that quack in the wrong way. ? > > How do you handle list and other array_likes in sparse? > > Josef > > >> >> (C) >> >> This is not only about Scipy, but also a language design question: >> >> Suppose someone, who is not a Python expert, wants to implement a >> linear algebra algorithm in Python. >> >> Will they write it using matrix or ndarray? (Note: np.matrix is not >> uncommon on stackoverflow.) >> >> Will someone who reads the code easily understand what it does (does * >> stand for elementwise or matrix product etc)? >> >> Can they easily make it work both with sparse and dense matrices? >> Matrix-free operators? Does it work both for ndarray and np.matrix inputs? >> >> (D) >> >> The presence of np.matrix invites people to write code using the >> np.matrix semantics. 
This can further lead to the code spitting out >> dense results as np.matrix, and then it becomes difficult to follow >> what sort of an object you have. >> >> (E) >> >> Some examples of the above semantics diaspora on scipy.sparse: >> >> * Implementation of GMRES et al in Scipy. The implementation reinvents >> yet another set of semantics that it uses internally. >> >> * scipy.sparse has mostly matrix semantics, but not completely, and the >> return values vary between matrix and ndarray >> >> >> -- >> Pauli Virtanen >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Feb 10 19:39:16 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 02:39:16 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: 11.02.2014 01:39, josef.pktd at gmail.com kirjoitti: [clip] > Almost all the code in scipy.stats and statsmodels starts with np.asarray. > The numpy doc standard has the term `array_like` to indicate things that > can be converted to a usable object by ndasarray. > > ducktyping could be restricted to a very narrow category of ducks. > > What about masked arrays and structured dtypes? > Because we cannot usefully convert them by asarray, we have to tell users > that they don't work with a function. > Our ducks that quack in the wrong way.? The issue here is semantics for basic linear algebra operations, such as matrix multiplication, that work for different matrix objects, including ndarrays. 
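Concretely, the divergence can be checked in a few lines (an illustrative snippet, not exhaustive):

```python
import numpy as np

A = np.arange(4).reshape(2, 2)   # ndarray: [[0, 1], [2, 3]]
M = np.matrix(A)                 # same data, matrix semantics

# "*" is elementwise for ndarray, matrix product for np.matrix
assert (A * A)[1, 1] == 9    # 3 * 3
assert (M * M)[1, 1] == 11   # 2*1 + 3*3

# reductions: ndarray drops to 1-D, np.matrix stays 2-D
assert A.sum(0).shape == (2,)
assert M.sum(0).shape == (1, 2)

# indexing a row behaves differently as well
assert A[0].shape == (2,)
assert M[0].shape == (1, 2)
```

Code written against one set of semantics silently does the wrong thing when handed the other type.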
What is there now in scipy.sparse is influenced by np.matrix, and this is proving to be sub-optimal, as it is incompatible with ndarrays. > How do you handle list and other array_likes in sparse? if isinstance(t, (list, tuple)): asarray(...) Sure, np.matrix can be dealt with as an input too. But as said, I'm not arguing so much about asarray'in np.matrices as input, but the fact that agreement on the meaning of "*" in linear algebra code in Python is muddled. This should be fixed, and deprecating np.matrix would point the way. (I also suspect that this argument has been raised before, but as long as there's no canonical write-up...) -- Pauli Virtanen From charlesr.harris at gmail.com Mon Feb 10 20:11:40 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Feb 2014 18:11:40 -0700 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 5:39 PM, Pauli Virtanen wrote: > 11.02.2014 01:39, josef.pktd at gmail.com kirjoitti: > [clip] > > Almost all the code in scipy.stats and statsmodels starts with > np.asarray. > > The numpy doc standard has the term `array_like` to indicate things that > > can be converted to a usable object by ndasarray. > > > > ducktyping could be restricted to a very narrow category of ducks. > > > > What about masked arrays and structured dtypes? > > Because we cannot usefully convert them by asarray, we have to tell users > > that they don't work with a function. > > Our ducks that quack in the wrong way.? > > The issue here is semantics for basic linear algebra operations, such as > matrix multiplication, that work for different matrix objects, including > ndarrays. 
> > What is there now in scipy.sparse is influenced by np.matrix, and this > is proving to be sub-optimal, as it is incompatible with ndarrays. > > > How do you handle list and other array_likes in sparse? > > if isinstance(t, (list, tuple)): asarray(...) > > Sure, np.matrix can be dealt with as an input too. > > But as said, I'm not arguing so much about asarray'in np.matrices as > input, but the fact that agreement on the meaning of "*" in linear > algebra code in Python is muddled. This should be fixed, and deprecating > np.matrix would point the way. > > (I also suspect that this argument has been raised before, but as long > as there's no canonical write-up...) > > This would require deprecating current sparse as well, no? I could be convinced to follow this route if there were a pedagogic version of a matrix type that was restricted to linear algebra available as a separate project. It could even have some improvements, row and column vectors, inv, etc, but would not be as full featured as numpy arrays. The idea is that it would serve for teaching matrices rather than numerical programming in python. Hopefully that would satisfy Alan's teaching use case. There is the danger of students getting tied to that restricted implementation, but that may not be something to worry about for the sort of students Alan is talking about. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Mon Feb 10 20:45:36 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 10 Feb 2014 18:45:36 -0700 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> Message-ID: On Mon, Feb 10, 2014 at 6:11 PM, Charles R Harris wrote: > > > > On Mon, Feb 10, 2014 at 5:39 PM, Pauli Virtanen wrote: > >> 11.02.2014 01:39, josef.pktd at gmail.com kirjoitti: >> [clip] >> > Almost all the code in scipy.stats and statsmodels starts with >> np.asarray. >> > The numpy doc standard has the term `array_like` to indicate things that >> > can be converted to a usable object by ndasarray. >> > >> > ducktyping could be restricted to a very narrow category of ducks. >> > >> > What about masked arrays and structured dtypes? >> > Because we cannot usefully convert them by asarray, we have to tell >> users >> > that they don't work with a function. >> > Our ducks that quack in the wrong way.? >> >> The issue here is semantics for basic linear algebra operations, such as >> matrix multiplication, that work for different matrix objects, including >> ndarrays. >> >> What is there now in scipy.sparse is influenced by np.matrix, and this >> is proving to be sub-optimal, as it is incompatible with ndarrays. >> >> > How do you handle list and other array_likes in sparse? >> >> if isinstance(t, (list, tuple)): asarray(...) >> >> Sure, np.matrix can be dealt with as an input too. >> >> But as said, I'm not arguing so much about asarray'in np.matrices as >> input, but the fact that agreement on the meaning of "*" in linear >> algebra code in Python is muddled. This should be fixed, and deprecating >> np.matrix would point the way. >> >> (I also suspect that this argument has been raised before, but as long >> as there's no canonical write-up...) 
>>
>>
> This would require deprecating current sparse as well, no?
>
> I could be convinced to follow this route if there were a pedagogic
> version of a matrix type that was restricted to linear algebra available as
> a separate project. It could even have some improvements, row and column
> vectors, inv, etc, but would not be as full featured as numpy arrays. The
> idea is that it would serve for teaching matrices rather than numerical
> programming in python. Hopefully that would satisfy Alan's teaching use
> case. There is the danger of students getting tied to that restricted
> implementation, but that may not be something to worry about for the sort
> of students Alan is talking about.
>

Another possibility is to provide an infix matrix multiplication operator.

from sage.misc.decorators import infix_operator

@infix_operator('multiply')
def dot(a, b):
    return a.dot_product(b)

u = vector([1, 2, 3])
v = vector([5, 4, 3])
print(u *dot* v)  # => 22

@infix_operator('or')
def plus(x, y):
    return x + y

print(2 |plus| 4)  # => 6

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From alan.isaac at gmail.com  Mon Feb 10 21:23:12 2014
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Mon, 10 Feb 2014 21:23:12 -0500
Subject: [Numpy-discussion] deprecate numpy.matrix
In-Reply-To: 
References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com>
 <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com>
 <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com>
Message-ID: <52F98990.7010500@gmail.com>

On 2/10/2014 7:39 PM, Pauli Virtanen wrote:
> The issue here is semantics for basic linear algebra operations, such as
> matrix multiplication, that work for different matrix objects, including
> ndarrays.


I'll see if I can restate my suggestion in another way,
because I do not feel you are responding to it.
(I might be wrong.)

What is a duck? If you ask it to quack, it quacks.
OK, but what is it to quack?
Here, quacking is behaving like an ndarray (in your view,
as I understand it) when asked. But how do we ask?
Your view (if I understand) is we ask via the operations
supported by ndarrays. But maybe that is the wrong way
for the library to ask this question.

If so, then scipy libraries could ask an object
to behave like an ndarray by calling, e.g.,
__asarray__ on it. It becomes the responsibility
of the object to return something appropriate
when __asarray__ is called. Objects that know how to do
this will provide __asarray__ and respond
appropriately. Other types can be coerced if
that is the documented behavior (e.g., lists).
The libraries will then be written for a single
type of behavior. What it means to "quack" is
pretty easily documented, and a matrix object
already knows how (e.g., m.A). Presumably in
this scenario __asarray__ would return an object
that behaves like an ndarray and a converter for
turning the final result into the desired object
type (e.g., into a `matrix` if necessary).

Hope that's clearer, even if it proves a terrible idea.

Alan Isaac

From ndarray at mac.com  Mon Feb 10 23:01:28 2014
From: ndarray at mac.com (Alexander Belopolsky)
Date: Mon, 10 Feb 2014 23:01:28 -0500
Subject: [Numpy-discussion] Inheriting from ndarray Was: deprecate numpy.matrix
Message-ID: 

On Mon, Feb 10, 2014 at 11:31 AM, Nathaniel Smith wrote:

> And in the long run, I
> think the goal is to move people away from inheriting from np.ndarray.
>

This is music to my ears, but what is the future of numpy.ma?  I
understand that numpy.oldnumeric.ma (the older version written without
inheritance) has been deprecated and slated to be removed in 1.9. I also
have seen some attempts to bring ma functionality into the core ndarray
object, but those have not been successful as far as I can tell.

In general, what is the future of inheriting from np.ndarray?
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From charlesr.harris at gmail.com  Tue Feb 11 01:16:30 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 10 Feb 2014 23:16:30 -0700
Subject: [Numpy-discussion] Inheriting from ndarray Was: deprecate
 numpy.matrix
In-Reply-To: 
References: 
Message-ID: 

On Mon, Feb 10, 2014 at 9:01 PM, Alexander Belopolsky wrote:

>
> On Mon, Feb 10, 2014 at 11:31 AM, Nathaniel Smith wrote:
>
>> And in the long run, I
>> think the goal is to move people away from inheriting from np.ndarray.
>>
>
> This is music to my ears, but what is the future of numpy.ma?  I
> understand that numpy.oldnumeric.ma (the older version written without
> inheritance) has been deprecated and slated to be removed in 1.9. I also
> have seen some attempts to bring ma functionality into the core ndarray
> object, but those have not been successful as far as I can tell.
>

numpy.ma is pretty much unmaintained at the moment, but it is pretty stable
and there are no plans to remove it. I'm kinda sad that moving the
functionality into numpy came to naught, but the time was short and the
disagreements were long. Hopefully we learned something in the attempt. I
don't know of any plans for masked arrays at the moment apart from waiting
to see what happens with dynd. I don't know what the chances of an overhaul
might be, or even if it could be made without disturbing current code. I
think we would have to offer something special to motivate folks to even
think of switching.

>
> In general, what is the future of inheriting from np.ndarray?
>

Well, we can't do much about it except discourage it. It is often a bad
design decision that people get sucked into because they want to borrow
some functionality. OTOH, there hasn't been an easy way to make use of
ndarray functionality for non-subclasses, and there is a *lot* to implement
to make an ndarray like object. Hopefully the new `__numpy_ufunc__`
attribute will make that easier. If you have suggestions we'd like to hear
them.
Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From cjordan1 at uw.edu  Tue Feb 11 01:43:53 2014
From: cjordan1 at uw.edu (Christopher Jordan-Squire)
Date: Mon, 10 Feb 2014 22:43:53 -0800
Subject: [Numpy-discussion] record arrays with char*?
Message-ID: 

I'm trying to wrap some C code using cython. The C code can take inputs
in two modes: dense inputs and sparse inputs. For dense inputs the array
indexing is naive. I have wrappers for that. In the sparse case the
matrix entries are typically indexed via names. So, for example, the
library documentation includes this as input you could give:

struct
{
  char* ind;
  double val, wght;
} data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20},
             {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}};

At the C level, data is passed to the function by directly giving its
address. (i.e. the C function takes as an argument (unsigned long) data,
casting the data pointer to an int)

I'd like to create something similar using record arrays, such as
np.array([("camera", 15, 2), ("necklace", 100, 20), ... ],
dtype='object,f8,f8'). Unfortunately this fails because

(1) In cython I need to determine the address of the first element and I
can't take the address of an input whose type I don't know (the exact
type will vary on the application, so more or fewer fields may be in the
C struct)

(2) I don't think a python object type is what I want--I need a char*
representation of the string. (Unfortunately I can't test this because I
haven't solved (1) -- how do you pass a record array around in cython
and/or take its address?)

Any suggestions?
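[A minimal sketch of one possible workaround, not from the thread: use a fixed-width bytes field ('S16', i.e. an inline char[16]) instead of an object field, so the record buffer has a plain C layout. Note this is not a char* pointer; the C side would need a matching struct with an inline char array. The 16-byte width and field names are illustrative assumptions.]

```python
import numpy as np

# Packed structured dtype: char ind[16]; double val; double wght;
dt = np.dtype([('ind', 'S16'), ('val', 'f8'), ('wght', 'f8')])
data = np.array([(b'camera', 15, 2),
                 (b'necklace', 100, 20),
                 (b'vase', 90, 20)], dtype=dt)

# Base address of the contiguous buffer, as a plain integer, to hand to C:
addr = data.ctypes.data
```

One record is then 16 + 8 + 8 = 32 bytes, and `addr` can be cast back to a struct pointer on the C side.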
From toddrjen at gmail.com Tue Feb 11 05:16:45 2014 From: toddrjen at gmail.com (Todd) Date: Tue, 11 Feb 2014 11:16:45 +0100 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <52F98990.7010500@gmail.com> References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> Message-ID: On Feb 11, 2014 3:23 AM, "Alan G Isaac" wrote: > > On 2/10/2014 7:39 PM, Pauli Virtanen wrote: > > The issue here is semantics for basic linear algebra operations, such as > > matrix multiplication, that work for different matrix objects, including > > ndarrays. > > > I'll see if I can restate my suggestion in another way, > because I do not feel you are responding to it. > (I might be wrong.) > > What is a duck? If you ask it to quack, it quacks. > OK, but what is it to quack? > > Here, quacking is behaving like an ndarray (in your view, > as I understand it) when asked. But how do we ask? > Your view (if I understand) is we ask via the operations > supported by ndarrays. But maybe that is the wrong way > for the library to ask this question. > > If so, then scipy libraries could ask an object > to behave like an an ndarray by calling, e.g., > __asarray__ on it. It becomes the responsibility > of the object to return something appropriate > when __asarray__ is called. Objects that know how to do > this will provide __asarray__ and respond > appropriately. Other types can be coerced if > that is the documented behavior (e.g., lists). > The libraries will then be written for a single > type of behavior. What it means to "quack" is > pretty easily documented, and a matrix object > already knows how (e.g., m.A). Presumably in > this scenario __asarray__ would return an object > that behaves like an ndarray and a converter for > turning the final result into the desired object > type (e.g., into a `matrix` if necessary). 
>
> Hope that clearer, even if it proves a terrible idea.
>
> Alan Isaac

I don't currently use the matrix class, but having taken many linear
algebra classes I can see the appeal, and if I end up teaching the subject
I think I would appreciate having it available.

On the other hand, I certainly can see the possibility for confusion, and I
don't think it is something that should be used unless someone has a really
good reason.

So I come out somewhere in the middle here. So, although this may end up
being a terrible idea, I would like to propose what I think is a
compromise: instead of just removing matrix, split it out into a scikit.
That way, it's still available for those who need it, but will be less
likely to be used accidentally, and won't be interfering with the rest of
numpy and scipy development.

Specifically, I would split matrix into a scikit, while in the same release
deprecate np.matrix. They can then exist in parallel for a few releases to
allow code to be ported away from it.

However, I would suggest that before the split, all linear algebra routines
should be available as functions or methods in numpy proper.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From toddrjen at gmail.com  Tue Feb 11 05:20:46 2014
From: toddrjen at gmail.com (Todd)
Date: Tue, 11 Feb 2014 11:20:46 +0100
Subject: [Numpy-discussion] Inheriting from ndarray Was: deprecate
 numpy.matrix
In-Reply-To: 
References: 
Message-ID: 

On Feb 11, 2014 5:01 AM, "Alexander Belopolsky" wrote:
>
>
> On Mon, Feb 10, 2014 at 11:31 AM, Nathaniel Smith wrote:
>>
>> And in the long run, I
>> think the goal is to move people away from inheriting from np.ndarray.
>
>
> This is music to my ears,

There are a lot of units (meter, foot, built, etc.) packages that use
inheriting from numpy to handle their units. I wouldn't call it a shortcut,
it is a common design decision.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From pav at iki.fi Tue Feb 11 05:25:30 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 10:25:30 +0000 (UTC) Subject: [Numpy-discussion] deprecate numpy.matrix References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> Message-ID: Alan G Isaac gmail.com> writes: [clip] > Here, quacking is behaving like an ndarray (in your view, > as I understand it) when asked. But how do we ask? > Your view (if I understand) is we ask via the operations > supported by ndarrays. But maybe that is the wrong way > for the library to ask this question. It is not a good thing that there is no well defined "domain specific language" for matrix algebra in Python. Rather, some code is written with one convention and other code with a different convention. The conventions disagree on how to express basic operations, such as matrix multiplication. Moreover, the ndarray is also lacking some useful things, as you point out. But I think the right solution would be to stuff the required additions into ndarray, rather than retaining the otherwise incompatible np.matrix as a crutch. > If so, then scipy libraries could ask an object > to behave like an an ndarray by calling, e.g., > __asarray__ on it. It becomes the responsibility > of the object to return something appropriate > when __asarray__ is called. Objects that know how to do > this will provide __asarray__ and respond > appropriately. Another way to achieve similar thing as your suggestion is to add a coercion function in the vein of scipy.sparse.aslinearoperator. It could deal with known-failure cases (np.matrix, scipy.sparse matrices) and for the rest just assume the object satisfies the ndarray API and pass them through. 
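[A hedged sketch of such a coercion helper, in the vein of scipy.sparse.aslinearoperator; the name and the set of special-cased types are illustrative only, not an existing scipy API, and sparse matrices are omitted:]

```python
import numpy as np

def as_ndarray_like(obj):
    # Known-failure cases: coerce to a plain ndarray.
    if isinstance(obj, (np.matrix, list, tuple)):
        return np.asarray(obj)
    # Everything else is assumed to already satisfy the ndarray API
    # and is passed through untouched.
    return obj
```

A duck-typed input is returned as-is, so libraries written against this helper see a single, consistent set of semantics.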
-- 
Pauli Virtanen

From hoogendoorn.eelco at gmail.com  Tue Feb 11 05:36:32 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 11 Feb 2014 11:36:32 +0100
Subject: [Numpy-discussion] deprecate numpy.matrix
In-Reply-To: 
References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com>
 <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com>
 <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com>
 <52F98990.7010500@gmail.com>
Message-ID: 

My 2pc:

I personally hardly ever use matrix, even in linear algebra dense code. It
can be nice though, to use matrix semantics within a restricted scope. When
I first came to numpy, the ability to choose linear algebra versus array
semantics seemed like a really neat thing to me; though in practice the
array semantics are so much more useful you really don't mind having to
write the occasional np.dot.

There seems to be some resistance from the developer side in having to
maintain this architecture, which I cannot comment on, but from a user
perspective, I am perfectly fine with the way things are.


On Tue, Feb 11, 2014 at 11:16 AM, Todd wrote:

>
> On Feb 11, 2014 3:23 AM, "Alan G Isaac" wrote:
> >
> > On 2/10/2014 7:39 PM, Pauli Virtanen wrote:
> > > The issue here is semantics for basic linear algebra operations, such
> as
> > > matrix multiplication, that work for different matrix objects,
> including
> > > ndarrays.
> >
> >
> > I'll see if I can restate my suggestion in another way,
> > because I do not feel you are responding to it.
> > (I might be wrong.)
> >
> > What is a duck? If you ask it to quack, it quacks.
> > OK, but what is it to quack?
> >
> > Here, quacking is behaving like an ndarray (in your view,
> > as I understand it) when asked. But how do we ask?
> > Your view (if I understand) is we ask via the operations
> > supported by ndarrays. But maybe that is the wrong way
> > for the library to ask this question.
> > > > If so, then scipy libraries could ask an object > > to behave like an an ndarray by calling, e.g., > > __asarray__ on it. It becomes the responsibility > > of the object to return something appropriate > > when __asarray__ is called. Objects that know how to do > > this will provide __asarray__ and respond > > appropriately. Other types can be coerced if > > that is the documented behavior (e.g., lists). > > The libraries will then be written for a single > > type of behavior. What it means to "quack" is > > pretty easily documented, and a matrix object > > already knows how (e.g., m.A). Presumably in > > this scenario __asarray__ would return an object > > that behaves like an ndarray and a converter for > > turning the final result into the desired object > > type (e.g., into a `matrix` if necessary). > > > > Hope that clearer, even if it proves a terrible idea. > > > > Alan Isaac > > I don't currently use the matrix class, but having taken many linear > algebra classes I can see the appeal, and if I end up teaching the subject > I think I would appreciate having it available. > > On the other hand, I certainly can see the possibility for confusion, and > I don't think it is something that should be used unless someone has a > really good reason. > > So I come out somewhere in the middle here. So, although this may end up > being a terrible idea, I would like to purpose what I think is a > compromise: instead of just removing matrix, split it out into a scikit. > That way, it it's still available for those who need it, but will be less > likely to be used accidentally, and won't be interfering with the rest of > numpy and scipy development. > > Specifically, I would split matrix into a scikit, while in the same > release deprecate np.matrix. They can then exist in parallel for a few > releases to allow code to be ported away from it. 
> > However, I would suggest that before the split, all linear algebra > routines should be available as functions or methods in numpy proper. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Feb 11 05:47:00 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 11 Feb 2014 10:47:00 +0000 (UTC) Subject: [Numpy-discussion] deprecate numpy.matrix References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> Message-ID: <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> Pauli Virtanen wrote: > It is not a good thing that there is no well defined > "domain specific language" for matrix algebra in Python. Perhaps Python should get some new operators? __dot__ __cross__ __kronecker__ Obviously, using them in real Python code would require unicode. ;-) On the serious side, I don't think there really is a good solution to this problem at all. Sturla From pav at iki.fi Tue Feb 11 06:47:05 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 11:47:05 +0000 (UTC) Subject: [Numpy-discussion] deprecate numpy.matrix References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> Message-ID: Sturla Molden gmail.com> writes: > Pauli Virtanen iki.fi> wrote: > > It is not a good thing that there is no well defined > > "domain specific language" for matrix algebra in Python. > > Perhaps Python should get some new operators? 
It might still be possible to advocate for this in core Python, even though the ship has sailed long ago. Some previous discussion: [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html [2] http://www.python.org/dev/peps/pep-0225/ [3] http://www.python.org/dev/peps/pep-0211/ (My own take would be that one extra operator is enough for most purposes, and would be easier to push for.) [clip] > On the serious side, I don't think there really is a good solution to this > problem at all. This is true. However, I'd prefer to have one solution over several conflicting ones. -- Pauli Virtanen From J.M.Hoekstra at tudelft.nl Tue Feb 11 07:16:09 2014 From: J.M.Hoekstra at tudelft.nl (Jacco Hoekstra - LR) Date: Tue, 11 Feb 2014 12:16:09 +0000 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> Message-ID: <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> For our students, the matrix class is really appealing as we use a lot of linear algebra and expressions with matrices simply look better with an operator instead of a function: x=A.I*b looks much better than x = np.dot(np.linalg.inv(A),b) And this gets worse when the expression is longer: x = R.I*A*R*b becomes: x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) Actually, not being involved in earlier discussions on this topic, I was a bit surprised by this and do not see the problem of having the matrix class as nobody is obliged to use it. I tried to find the reasons, but did not find it in the thread mentioned. Maybe someone could summarize the main problem with keeping this class for newbies on this list like me? 
Anyway, I would say that there is a clear use for the matrix class: readability of linear algebra code and hence a lower chance of errors, so higher productivity. Just my 2cts, Jacco Hoekstra P.S. Also: A = np.mat("2 3 ; -1 0") is very handy! -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Pauli Virtanen Sent: dinsdag 11 februari 2014 12:47 To: numpy-discussion at scipy.org Subject: Re: [Numpy-discussion] deprecate numpy.matrix Sturla Molden gmail.com> writes: > Pauli Virtanen iki.fi> wrote: > > It is not a good thing that there is no well defined "domain > > specific language" for matrix algebra in Python. > > Perhaps Python should get some new operators? It might still be possible to advocate for this in core Python, even though the ship has sailed long ago. Some previous discussion: [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html [2] http://www.python.org/dev/peps/pep-0225/ [3] http://www.python.org/dev/peps/pep-0211/ (My own take would be that one extra operator is enough for most purposes, and would be easier to push for.) [clip] > On the serious side, I don't think there really is a good solution to > this problem at all. This is true. However, I'd prefer to have one solution over several conflicting ones. 
-- 
Pauli Virtanen

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion at scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion

From sturla.molden at gmail.com  Tue Feb 11 07:57:40 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Tue, 11 Feb 2014 12:57:40 +0000 (UTC)
Subject: [Numpy-discussion] deprecate numpy.matrix
References: <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com>
 <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com>
 <52F98990.7010500@gmail.com>
 <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1418995911413814891.757002sturla.molden-gmail.com@news.gmane.org>

Pauli Virtanen wrote:

> [1] http://fperez.org/py4science/numpy-pep225/numpy-pep225.html

I have to agree with Robert Kern here. One operator that we can (ab)use
for matrix multiplication would suffice.

Sturla

From daniele at grinta.net  Tue Feb 11 08:08:49 2014
From: daniele at grinta.net (Daniele Nicolodi)
Date: Tue, 11 Feb 2014 14:08:49 +0100
Subject: [Numpy-discussion] Overlapping time series
Message-ID: <52FA20E1.3050704@grinta.net>

Hello,

I have two time series (2xN dimensional arrays) recorded on the same
time basis, but each with its own dead times (and start and end
recording times). I would like to obtain two time series containing
only the time overlapping segments of the data.

Does numpy or scipy offer something that may help in this?

I can imagine strategies about how to approach the problem, but none
that would be efficient. Ideas?

Thank you.
Best, Daniele From lists at hilboll.de Tue Feb 11 08:10:41 2014 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 11 Feb 2014 14:10:41 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA20E1.3050704@grinta.net> References: <52FA20E1.3050704@grinta.net> Message-ID: <52FA2151.4040806@hilboll.de> On 11.02.2014 14:08, Daniele Nicolodi wrote: > Hello, > > I have two time series (2xN dimensional arrays) recorded on the same > time basis, but each with it's own dead times (and start and end > recording times). I would like to obtain two time series containing > only the time overlapping segments of the data. > > Does numpy or scipy offer something that may help in this? > > I can imagine strategies about how to approach the problem, but none > that would be efficient. Ideas? Take a look at pandas. It has built-in time series functionality. Andreas. From daniele at grinta.net Tue Feb 11 08:22:14 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 11 Feb 2014 14:22:14 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2151.4040806@hilboll.de> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> Message-ID: <52FA2406.5030501@grinta.net> On 11/02/2014 14:10, Andreas Hilboll wrote: > On 11.02.2014 14:08, Daniele Nicolodi wrote: >> Hello, >> >> I have two time series (2xN dimensional arrays) recorded on the same >> time basis, but each with it's own dead times (and start and end >> recording times). I would like to obtain two time series containing >> only the time overlapping segments of the data. >> >> Does numpy or scipy offer something that may help in this? >> >> I can imagine strategies about how to approach the problem, but none >> that would be efficient. Ideas? > > Take a look at pandas. It has built-in time series functionality. Even using Pandas (and I would like to avoid to have to depend on it) it is not clear to me how I would achieve what I want. Am I missing something? 
Cheers, Daniele From lists at hilboll.de Tue Feb 11 08:41:38 2014 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 11 Feb 2014 14:41:38 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2406.5030501@grinta.net> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> Message-ID: <52FA2892.2060700@hilboll.de> On 11.02.2014 14:22, Daniele Nicolodi wrote: > On 11/02/2014 14:10, Andreas Hilboll wrote: >> On 11.02.2014 14:08, Daniele Nicolodi wrote: >>> Hello, >>> >>> I have two time series (2xN dimensional arrays) recorded on the same >>> time basis, but each with it's own dead times (and start and end >>> recording times). I would like to obtain two time series containing >>> only the time overlapping segments of the data. >>> >>> Does numpy or scipy offer something that may help in this? >>> >>> I can imagine strategies about how to approach the problem, but none >>> that would be efficient. Ideas? >> >> Take a look at pandas. It has built-in time series functionality. > > Even using Pandas (and I would like to avoid to have to depend on it) it > is not clear to me how I would achieve what I want. Am I missing something? If the two time series are pandas.Series objects and are called s1 and s2: new1 = s1.ix[s2.dropna().index].dropna() new2 = s2.ix[s1.dropna().index].dropna() new1 = new1.ix[s2.dropna().index].dropna() Looks hackish, so there might be a more elegant solution. For further questions about how to use pandas, please look at the pydata mailing list or stackoverflow. HTH, Andreas. 
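[A sketch of a perhaps less hackish spelling of the same idea, intersecting the dropna'd indices directly; the toy series and integer time stamps are illustrative:]

```python
import pandas as pd

s1 = pd.Series([1.0, 2.0, 3.0], index=[0, 1, 2])
s2 = pd.Series([10.0, 20.0, 30.0], index=[1, 2, 3])

# Keep only the time stamps present (and non-NaN) in both series.
common = s1.dropna().index.intersection(s2.dropna().index)
new1 = s1.loc[common]
new2 = s2.loc[common]
```

Both results are then aligned on the same index, which is the overlap of the two recordings.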
From daniele at grinta.net Tue Feb 11 08:47:19 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 11 Feb 2014 14:47:19 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2892.2060700@hilboll.de> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> Message-ID: <52FA29E7.4010501@grinta.net> On 11/02/2014 14:41, Andreas Hilboll wrote: > On 11.02.2014 14:22, Daniele Nicolodi wrote: >> On 11/02/2014 14:10, Andreas Hilboll wrote: >>> On 11.02.2014 14:08, Daniele Nicolodi wrote: >>>> Hello, >>>> >>>> I have two time series (2xN dimensional arrays) recorded on the same >>>> time basis, but each with it's own dead times (and start and end >>>> recording times). I would like to obtain two time series containing >>>> only the time overlapping segments of the data. >>>> >>>> Does numpy or scipy offer something that may help in this? >>>> >>>> I can imagine strategies about how to approach the problem, but none >>>> that would be efficient. Ideas? >>> >>> Take a look at pandas. It has built-in time series functionality. >> >> Even using Pandas (and I would like to avoid to have to depend on it) it >> is not clear to me how I would achieve what I want. Am I missing something? > > If the two time series are pandas.Series objects and are called s1 and s2: > > new1 = s1.ix[s2.dropna().index].dropna() > new2 = s2.ix[s1.dropna().index].dropna() > new1 = new1.ix[s2.dropna().index].dropna() > > Looks hackish, so there might be a more elegant solution. For further > questions about how to use pandas, please look at the pydata mailing > list or stackoverflow. Correct me if I'm wrong, but this assumes that missing data points are represented with Nan. In my case missing data points are just missing. 
Cheers, Daniele From sturla.molden at gmail.com Tue Feb 11 08:53:25 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 11 Feb 2014 13:53:25 +0000 (UTC) Subject: [Numpy-discussion] Overlapping time series References: <52FA20E1.3050704@grinta.net> Message-ID: <858301368413819234.604452sturla.molden-gmail.com@news.gmane.org> Daniele Nicolodi wrote: > I can imagine strategies about how to approach the problem, but none > that would be efficient. Ideas? I would just loop from the start and loop from the end and find out where to clip. Then slice in between. If Python loops take too much time, JIT compile them with Numba. Sturla From lists at hilboll.de Tue Feb 11 08:55:27 2014 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 11 Feb 2014 14:55:27 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA29E7.4010501@grinta.net> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> Message-ID: <52FA2BCF.3030807@hilboll.de> On 11.02.2014 14:47, Daniele Nicolodi wrote: > On 11/02/2014 14:41, Andreas Hilboll wrote: >> On 11.02.2014 14:22, Daniele Nicolodi wrote: >>> On 11/02/2014 14:10, Andreas Hilboll wrote: >>>> On 11.02.2014 14:08, Daniele Nicolodi wrote: >>>>> Hello, >>>>> >>>>> I have two time series (2xN dimensional arrays) recorded on the same >>>>> time basis, but each with it's own dead times (and start and end >>>>> recording times). I would like to obtain two time series containing >>>>> only the time overlapping segments of the data. >>>>> >>>>> Does numpy or scipy offer something that may help in this? >>>>> >>>>> I can imagine strategies about how to approach the problem, but none >>>>> that would be efficient. Ideas? >>>> >>>> Take a look at pandas. It has built-in time series functionality. 
>>> >>> Even using Pandas (and I would like to avoid to have to depend on it) it >>> is not clear to me how I would achieve what I want. Am I missing something? >> >> If the two time series are pandas.Series objects and are called s1 and s2: >> >> new1 = s1.ix[s2.dropna().index].dropna() >> new2 = s2.ix[s1.dropna().index].dropna() >> new1 = new1.ix[s2.dropna().index].dropna() >> >> Looks hackish, so there might be a more elegant solution. For further >> questions about how to use pandas, please look at the pydata mailing >> list or stackoverflow. > > Correct me if I'm wrong, but this assumes that missing data points are > represented with Nan. In my case missing data points are just missing. pandas doesn't care. Andreas. From rays at blue-cove.com Tue Feb 11 08:57:25 2014 From: rays at blue-cove.com (RayS) Date: Tue, 11 Feb 2014 05:57:25 -0800 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2406.5030501@grinta.net> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> Message-ID: <201402111357.s1BDvPdn027766@blue-cove.com> > > On 11.02.2014 14:08, Daniele Nicolodi wrote: > >> Hello, > >> > >> I have two time series (2xN dimensional arrays) recorded on the same > >> time basis, but each with it's own dead times (and start and end > >> recording times). I would like to obtain two time series containing > >> only the time overlapping segments of the data. > >> > >> Does numpy or scipy offer something that may help in this? > >> > >> I can imagine strategies about how to approach the problem, but none > >> that would be efficient. Ideas? What is the gate/tach, ie pointer to start/stop? I work with both tachometers and EKGs and do similar windowing, usually just using gates as slices so as not to make copies. I also found this interesting and bookmarked it http://www.johnvinyard.com/blog/?p=268 which you might like. 
Just to be clear, you have 2 2D arrays and want a 4x(N-m) shape, like mixing two stereo tracks? >>> a1 = np.arange(0,20).reshape((2,-1)) >>> a2 = np.arange(5,25).reshape((2,-1)) >>> np.concatenate((a1,a2)) array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9], [10, 11, 12, 13, 14, 15, 16, 17, 18, 19], [ 5, 6, 7, 8, 9, 10, 11, 12, 13, 14], [15, 16, 17, 18, 19, 20, 21, 22, 23, 24]]) >>> st1 = 2 >>> end1 = 7 >>> st2 = 1 >>> end2 = 6 >>> np.concatenate((a1[:,st1:end1],a2[:,st2:end2])) array([[ 2, 3, 4, 5, 6], [12, 13, 14, 15, 16], [ 6, 7, 8, 9, 10], [16, 17, 18, 19, 20]]) - Ray -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Feb 11 08:56:19 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 11 Feb 2014 13:56:19 +0000 (UTC) Subject: [Numpy-discussion] Overlapping time series References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> Message-ID: <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> Daniele Nicolodi wrote: > Correct me if I'm wrong, but this assumes that missing data points are > represented with Nan. In my case missing data points are just missing. Then your data cannot be stored in a 2 x N array as you indicated. Sturla From rays at blue-cove.com Tue Feb 11 09:04:12 2014 From: rays at blue-cove.com (RayS) Date: Tue, 11 Feb 2014 06:04:12 -0800 Subject: [Numpy-discussion] Overlapping time series Message-ID: <201402111404.s1BE4CIw030087@blue-cove.com> > > On 11.02.2014 14:08, Daniele Nicolodi wrote: > >> Hello, > >> > >> I have two time series (2xN dimensional arrays) recorded on the same > >> time basis, but each with it's own dead times (and start and end > >> recording times). I would like to obtain two time series containing > >> only the time overlapping segments of the data. > >> > >> Does numpy or scipy offer something that may help in this? 
> >> > >> I can imagine strategies about how to approach the problem, but none
> >> that would be efficient. Ideas?

or:

>>> st1 = 5
>>> np.concatenate((a1[:,st1:],a2[:,:a1.shape[1]-st1]))
array([[ 5, 6, 7, 8, 9],
 [15, 16, 17, 18, 19],
 [ 5, 6, 7, 8, 9],
 [15, 16, 17, 18, 19]])

- Ray
-------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Tue Feb 11 09:07:00 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 11 Feb 2014 15:07:00 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> Message-ID: <52FA2E84.3050900@grinta.net> On 11/02/2014 14:56, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> Correct me if I'm wrong, but this assumes that missing data points are >> represented with Nan. In my case missing data points are just missing. > > Then your data cannot be stored in a 2 x N array as you indicated. I was probably not that clear: I have two 2xN arrays, one for each data recording, one column for time (taken from the same clock for both measurements) and one with data values. Each array has some gaps.
Cheers, Daniele From rays at blue-cove.com Tue Feb 11 09:13:02 2014 From: rays at blue-cove.com (RayS) Date: Tue, 11 Feb 2014 06:13:02 -0800 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2E84.3050900@grinta.net> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> Message-ID: <201402111413.s1BED2nR002908@blue-cove.com> At 06:07 AM 2/11/2014, you wrote: >On 11/02/2014 14:56, Sturla Molden wrote: > > Daniele Nicolodi wrote: > > > >> Correct me if I'm wrong, but this assumes that missing data points are > >> represented with Nan. In my case missing data points are just missing. > > > > Then your data cannot be stored in a 2 x N array as you indicated. > >I was probably not that clear: I have two 2xN arrays, one for each data >recording, one column for time (taken from the same clock for both >measurements) and one with data values. Each array has some gaps. gaps at the ends I assume... use numpy.where() with the time channel as the condition - Ray From alan.isaac at gmail.com Tue Feb 11 09:32:48 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 11 Feb 2014 09:32:48 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F7FD34.1000003@gmail.com> <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> Message-ID: <52FA3490.9050604@gmail.com> On 2/11/2014 5:25 AM, Pauli Virtanen wrote: > the ndarray is also lacking some useful things, as > you point out. But I think the right solution would be to stuff > the required additions into ndarray, rather than retaining the > otherwise incompatible np.matrix as a crutch. 
Given that we now have the `dot` method, if we could get the other convenience attributes even as methods (say .I(), .H(), and .mpow()) that would greatly reduce the need for `matrix`. Just to be clear, the `matrix` object is not *really* the issue in the discussion of the scipy library, right? Since it already knows how to be seen as an ndarray, the library can always work with m.A when doing any linear algebra. From what I've read in this thread, the real issues for scipy seem to lie with the sparse matrix objects...? Alan From sturla.molden at gmail.com Tue Feb 11 09:38:23 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 11 Feb 2014 14:38:23 +0000 (UTC) Subject: [Numpy-discussion] Overlapping time series References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> Message-ID: <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> Daniele Nicolodi wrote: > I was probably not that clear: I have two 2xN arrays, one for each data > recording, one column for time (taken from the same clock for both > measurements) and one with data values. Each array has some gaps. If you want all subarrays where both timeseries are sampled, a single loop through the data can fix that. First find the smallest common timestamp. This is the first starting point. Then loop and follow the timestamps. As long as they are in sync, just continue. If one timeseries suddenly skips forward (i.e. there is a gap), you have an end point. Then slice between the start and the end point, and append the view array to a list. Follow the timeseries that did not skip until timestamps are synchronous again, and you have the next starting point. Then just continue like this until the two timeseries are exhausted. It is an O(n) strategy, so it will not be inefficient.
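A rough Python sketch of the loop described above (illustrative only: it assumes two sorted, evenly spaced integer timestamp arrays with spacing dt, and the function name is made up for the example):

```python
import numpy as np

def overlapping_segments(t1, t2, dt=1):
    """Yield (slice1, slice2) pairs covering the stretches where the two
    sorted timestamp arrays run in lockstep with spacing dt."""
    i = j = 0
    n1, n2 = len(t1), len(t2)
    while i < n1 and j < n2:
        # advance to the next common timestamp (a starting point)
        if t1[i] < t2[j]:
            i += 1
        elif t2[j] < t1[i]:
            j += 1
        else:
            # t1[i] == t2[j]: follow both series while they stay in sync
            i0, j0 = i, j
            while (i + 1 < n1 and j + 1 < n2
                   and t1[i + 1] == t1[i] + dt
                   and t2[j + 1] == t2[j] + dt):
                i += 1
                j += 1
            yield slice(i0, i + 1), slice(j0, j + 1)
            i += 1
            j += 1

# toy data: same clock, different gaps
t1 = np.array([0, 1, 2, 3, 6, 7, 8])
t2 = np.array([1, 2, 3, 4, 5, 6, 7])
segments = list(overlapping_segments(t1, t2))
# two overlapping stretches: timestamps [1, 2, 3] and [6, 7]
```

Indexing with the yielded slice objects gives views, not copies, so no data is duplicated.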
If you are worried about the loop and performance, both Numba and Cython can transform this into C speed. Numba takes less effort to use. But Python loops are actually much faster than most scientists used to Matlab and the like would expect. Sturla From jorisvandenbossche at gmail.com Tue Feb 11 09:41:00 2014 From: jorisvandenbossche at gmail.com (Joris Van den Bossche) Date: Tue, 11 Feb 2014 15:41:00 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <52FA2BCF.3030807@hilboll.de> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <52FA2BCF.3030807@hilboll.de> Message-ID: 2014-02-11 14:55 GMT+01:00 Andreas Hilboll : > On 11.02.2014 14:47, Daniele Nicolodi wrote: > > On 11/02/2014 14:41, Andreas Hilboll wrote: > >> On 11.02.2014 14:22, Daniele Nicolodi wrote: > >>> On 11/02/2014 14:10, Andreas Hilboll wrote: > >>>> On 11.02.2014 14:08, Daniele Nicolodi wrote: > >>>>> Hello, > >>>>> > >>>>> I have two time series (2xN dimensional arrays) recorded on the same > >>>>> time basis, but each with it's own dead times (and start and end > >>>>> recording times). I would like to obtain two time series containing > >>>>> only the time overlapping segments of the data. > >>>>> > >>>>> Does numpy or scipy offer something that may help in this? > >>>>> > >>>>> I can imagine strategies about how to approach the problem, but none > >>>>> that would be efficient. Ideas? > >>>> > >>>> Take a look at pandas. It has built-in time series functionality. > >>> > >>> Even using Pandas (and I would like to avoid to have to depend on it) > it > >>> is not clear to me how I would achieve what I want. Am I missing > something? 
> >> > >> If the two time series are pandas.Series objects and are called s1 and > s2: > >> > >> new1 = s1.ix[s2.dropna().index].dropna() > >> new2 = s2.ix[s1.dropna().index].dropna() > >> new1 = new1.ix[s2.dropna().index].dropna() > >> > >> Looks hackish, so there might be a more elegant solution. For further > >> questions about how to use pandas, please look at the pydata mailing > >> list or stackoverflow. > > > > Correct me if I'm wrong, but this assumes that missing data points are > > represented with Nan. In my case missing data points are just missing. > > pandas doesn't care. > > In pandas, you could simply do something like this (assuming the time is set as the index): pd.concat([s1, s2], axis=1) and then remove the nan's (where the index was not overlapping) or use `join='inner'` Joris > Andreas. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Tue Feb 11 10:52:11 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 11 Feb 2014 16:52:11 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> Message-ID: <52FA472B.1010206@grinta.net> On 11/02/2014 15:38, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> I was probably not that clear: I have two 2xN arrays, one for each data >> recording, one column for time (taken from the same clock for both >> measurements) and one with data values. Each array has some gaps. 
> > If you want all subarrays where both timeseries are sampled, a single loop
> through the data
> can fix that. First find the smallest common timestamp. This is the first
> starting point. Then loop and follow the timestamps. As long as they are in
> sync, just continue. If one timeseries suddenly skips forward (i.e.
> there is a gap), you have an end point. Then slice between the start and
> the end point, and append the view array to a list. Follow the timeseries
> that did not skip until timestamps are synchronous again, and you have the
> next starting point. Then just continue like this until the two
> timeseries are exhausted. It is an O(n) strategy, so it will not be
> inefficient. If you are worried about the loop and performance, both Numba
> and Cython can transform this into C speed. Numba takes less effort to use.
> But Python loops are actually much faster than most scientists used to
> Matlab and the like would expect.

Thanks Sturla. That's more or less my current approach (except that I use
the fact that the data is evenly sampled to use np.where(np.diff(t1) != dt)
to detect the regions of continuous data, to avoid the loop).
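For illustration, the diff-based gap detection described above can be sketched as follows (toy data, assuming unit sampling; np.split does the slicing):

```python
import numpy as np

dt = 1
t1 = np.array([0, 1, 2, 5, 6, 7, 8, 12, 13])
breaks = np.where(np.diff(t1) != dt)[0] + 1   # first index after each gap
segments = np.split(t1, breaks)               # views of the contiguous runs
# segments -> [0 1 2], [5 6 7 8], [12 13]
```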
Cheers, Daniele From matthew.brett at gmail.com Tue Feb 11 11:25:39 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Feb 2014 08:25:39 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: Hi, On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR wrote: > For our students, the matrix class is really appealing as we use a lot of linear algebra and expressions with matrices simply look better with an operator instead of a function: > > x=A.I*b > > looks much better than > > x = np.dot(np.linalg.inv(A),b) Yes, but: 1) as Alan has mentioned, the dot method helps a lot. import numpy.linalg as npl x = npl.inv(A).dot(b) 2) Overloading the * operator means that you've lost * to do element-wise operations. MATLAB has a different operator for that, '.*' - and it's very easy to forget the dot. numpy makes this more explicit - you read 'dot' as 'dot'. > And this gets worse when the expression is longer: > > x = R.I*A*R*b > > becomes: > > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) x = npl.inv(R).dot(A.dot(R.dot(b))) > Actually, not being involved in earlier discussions on this topic, I was a bit surprised by this and do not see the problem of having the matrix class as nobody is obliged to use it. I tried to find the reasons, but did not find it in the thread mentioned. Maybe someone could summarize the main problem with keeping this class for newbies on this list like me? > > Anyway, I would say that there is a clear use for the matrix class: readability of linear algebra code and hence a lower chance of errors, so higher productivity.
Yes, but it looks like there are not any developers on this list who write substantial code with the np.matrix class, so, if there is a gain in productivity, it is short-lived, soon to be replaced by a cost. Cheers, Matthew From chris.barker at noaa.gov Tue Feb 11 11:36:44 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 11 Feb 2014 08:36:44 -0800 Subject: [Numpy-discussion] record arrays with char*? In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 10:43 PM, Christopher Jordan-Squire wrote: > I'm trying to wrap some C code using cython. The C code can take > inputs in two modes: dense inputs and sparse inputs. For dense inputs > the array indexing is naive. I have wrappers for that. In the sparse > case the matrix entries are typically indexed via names. So, for > example, the library documentation includes this as input you could > give: > > struct > { > char* ind; > double val, wght; > } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, > {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; > > At the C level, data is passed to the function by directly giving its > address. (i.e. the C function takes as an argument (unsigned long) > data, casting the data pointer to an int) > wow -- that's prone to error! but I'm still not sure which pointer you're talking about -- a pointer to this struct? I'd like to create something similar using record arrays, such as > > np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], > dtype='object,f8,f8'). > > Unfortunately this fails because > (1) In cython I need to determine the address of the first element and > I can't take the address of an input whose type I don't know (the > exact type will vary on the application, so more or fewer fields may > be in the C struct) > still a bit confused, but if this is typed as an array in Cython, you should be able to do something like: &the_array[i] to get the address of the ith element.
(2) I don't think a python object type is what I want--I need a char* > representation of the string. (Unfortunately I can't test this because > I haven't solved (1) -- how do you pass a record array around in > cython and/or take its address?) > well, an object type will give you a pointer to a pyobject. If you know for sure that that pyobject is a string object (probably want a bytes object -- you don't want unicode here), then you should be able to get the address of the underlying char array. But that would require passing something different off to the C code than the address of that element. You could use an unsigned long for that first field, as you are assuming that in the C code anyway, but I don't think there is a way in numpy to set that to a pointer to a char allocated elsewhere -- where would it be allocated? So I would give up on expecting to store the struct directly in a numpy array, and rather put something reasonable (maybe what you have above) in the numpy array, and build the C struct you need from that rather than passing a pointer in directly. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From josef.pktd at gmail.com Tue Feb 11 11:55:42 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Feb 2014 11:55:42 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett wrote: > Hi, > > On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR > wrote: > > For our students, the matrix class is really appealing as we use a lot > of linear algebra and expressions with matrices simply look better with an > operator instead of a function: > > > > x=A.I*b > > > > looks much better than > > > > x = np.dot(np.linalg.inv(A),b) > > Yes, but: > > 1) as Alan has mentioned, the dot method helps a lot. > > import numpy.linalg as npl > > x = npl.inv(A).dot(b) > > 2) Overloading the * operator means that you've lost * to do > element-wise operations. MATLAB has a different operator for that, > '.*' - and it's very easy to forget the dot. numpy makes this more > explicit - you read 'dot' as 'dot'. > > > And this gets worse when the expression is longer: > > > > x = R.I*A*R*b > > > > becomes: > > > > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) > > x = npl.inv(R).dot(A.dot(R.dot(b)) > > > Actually, not being involved in earlier discussions on this topic, I was > a bit surprised by this and do not see the problem of having the matrix > class as nobody is obliged to use it. I tried to find the reasons, but did > not find it in the thread mentioned. Maybe someone could summarize the main > problem with keeping this class for newbies on this list like me? 
> > > > Anyway, I would say that there is a clear use for the matrix class: > readability of linear algebra code and hence a lower chance of errors, so > higher productivity. > > Yes, but it looks like there are not any developers on this list who > write substantial code with the np.matrix class, so, if there is a > gain in productivity, it is short-lived, soon to be replaced by a > cost. > selection bias! I have seen lots of Matlab and GAUSS code written by academic econometricians who have been programming for years but are not "developers", code that is "inefficient" and numerically not very stable but looks just like the formulas. Fast prototyping for code that is, at most, used for a few papers. (just to avoid misunderstanding: there are also econometricians who are "developers", and write code that is intended for reuse.) (but maybe numpy users are all "developers") Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From cjordan1 at uw.edu Tue Feb 11 11:57:05 2014 From: cjordan1 at uw.edu (Christopher Jordan-Squire) Date: Tue, 11 Feb 2014 08:57:05 -0800 Subject: [Numpy-discussion] record arrays with char*? In-Reply-To: References: Message-ID: Thanks for the answers! My responses are inline. On Tue, Feb 11, 2014 at 8:36 AM, Chris Barker wrote: > On Mon, Feb 10, 2014 at 10:43 PM, Christopher Jordan-Squire > wrote: >> >> I'm trying to wrap some C code using cython. The C code can take >> inputs in two modes: dense inputs and sparse inputs. For dense inputs >> the array indexing is naive. I have wrappers for that. In the sparse >> case the matrix entries are typically indexed via names.
So, for >> example, the library documentation includes this as input you could >> give: >> >> struct >> { >> char* ind; >> double val, wght; >> } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, >> {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; >> >> At the C level, data is passed to the function by directly giving its >> address. (i.e. the C function takes as an argument (unsigned long) >> data, casting the data pointer to an int) > > > wow -- that's prone to error! but I"m still not sure which pointer you're > talking about -- a pointer to this struct? > I mean data. data is an array of structs, but in C that's (essentially) the same as a pointer to a struct. So data is the pointer I'm referring to. >> I'd like to create something similar using record arrays, such as >> >> np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], >> dtype='object,f8,f8'). > > > >> >> Unfortunately this fails because >> (1) In cython I need to determine the address of the first element and >> I can't take the address of a an input whose type I don't know (the >> exact type will vary on the application, so more or fewer fields may >> be in the C struct) > > > still a bit confused, but if this is types as an array in Cython, you should > be abel to do somethign like: > > &the_array[i] > > to get the address of the ith element. > To do that I need to be able to tell cython the type of the memory view I give. There are very few examples for non-primitive arrays in cython, so I'm not sure what that would look like. Or at least I *think* I need to do that, based on the cython errors I'm getting. >> (2) I don't think a python object type is what I want--I need a char* >> representation of the string. (Unfortunately I can't test this because >> I haven't solved (1) -- how do you pass a record array around in >> cython and/or take its address?) > > > well, and object type will give you a pointer to a pyobject. 
If you know for > sure that that pyobject is a string object (probably want a bytes object -- > you son't want unicode here), then you should be abel to get the address of > the underlying char array. But that would require passing something > different off to the C code that the address of that element. > > You could use an unsigned long for that first field, as you are assuming > that in the C code anyway but I don't hink there is a way in numpy to set > that to a pointer to a char allocated elsewhere -- where would it be > allocated? > The strings are char*, the unsigned long cast was simply to cast data to a memory address. (The specific setup of the library was passing a string with the memory location in hexadecimal. This seems weird, but it's because the C functions being called are an intermediary to a simple (third-party) virtual machine running a program compiled from another language. It doesn't deal with pointers directly, so the other language is just passed the memory address, in hex and as a string, for where the data resides. Along with the number of elements in the block of memory pointed to.) > So I would give up on expecting to store the struct directly in numpy array, > and rather, put something reasonable (maybe what you have above) in the > numpy array, and build the C struct you need from that rather than passing a > pointer in directly. > That sounds reasonable. I really wanted to avoid this because, as I mentioned above, I'm just trying to generate the data in numpy and pass it to this virtual machine running a program compiled from another language. The form of the struct depends on what was done in the other language. It could easily have more or fewer fields, have the fields reordered, etc.. I wanted to avoid having to write wrappers for all such possibilities and instead use numpy record arrays as mapping exactly to C structs. But if that's really the only way to go then I guess I'll have to write struct wrappers in cython. 
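For what it's worth, if fixed-width strings are acceptable in place of true char* fields, a structured dtype can mirror a packed struct layout directly. A sketch (the field names and widths here are illustrative, not taken from the library):

```python
import numpy as np

# mirrors, byte for byte, a packed C struct { char ind[16]; double val, wght; }
item = np.dtype([('ind', 'S16'), ('val', 'f8'), ('wght', 'f8')])
data = np.array([(b'camera', 15, 2), (b'necklace', 100, 20),
                 (b'vase', 90, 20)], dtype=item)

assert data.itemsize == 16 + 8 + 8   # numpy packs structured dtypes by default
base_address = data.ctypes.data      # integer address of the first record
```

Whether this layout matches what the virtual machine expects would of course have to be checked against the library's actual struct definition.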
> -Chris > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From matthew.brett at gmail.com Tue Feb 11 12:05:06 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Feb 2014 09:05:06 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: Hi, On Tue, Feb 11, 2014 at 8:55 AM, wrote: > > > On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett > wrote: >> >> Hi, >> >> On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR >> wrote: >> > For our students, the matrix class is really appealing as we use a lot >> > of linear algebra and expressions with matrices simply look better with an >> > operator instead of a function: >> > >> > x=A.I*b >> > >> > looks much better than >> > >> > x = np.dot(np.linalg.inv(A),b) >> >> Yes, but: >> >> 1) as Alan has mentioned, the dot method helps a lot. >> >> import numpy.linalg as npl >> >> x = npl.inv(A).dot(b) >> >> 2) Overloading the * operator means that you've lost * to do >> element-wise operations. MATLAB has a different operator for that, >> '.*' - and it's very easy to forget the dot. numpy makes this more >> explicit - you read 'dot' as 'dot'. 
>> >> > And this gets worse when the expression is longer: >> > >> > x = R.I*A*R*b >> > >> > becomes: >> > >> > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) >> >> x = npl.inv(R).dot(A.dot(R.dot(b)) >> >> > Actually, not being involved in earlier discussions on this topic, I was >> > a bit surprised by this and do not see the problem of having the matrix >> > class as nobody is obliged to use it. I tried to find the reasons, but did >> > not find it in the thread mentioned. Maybe someone could summarize the main >> > problem with keeping this class for newbies on this list like me? >> > >> > Anyway, I would say that there is a clear use for the matrix class: >> > readability of linear algebra code and hence a lower chance of errors, so >> > higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost. > > > selection bias ! > > I have seen lots of Matlab and GAUSS code written by academic > econometricians that have been programming for years but are not > "developers", code that is "inefficient" and numerically not very stable but > looks just like the formulas. Yes, I used to use matlab myself. There is certainly biased sampling on this list, so it's very difficult to say whether there is a large constituency of np.matrix users out there, it's possible. I hope not, because I think they would honestly be better served with ndarray even if some of the lines in their script don't look quite as neat. But in any case, I still don't think that dropping np.matrix is an option in the short or even the medium term. 
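As a side note on the npl.inv(A).dot(b) idiom quoted above: for actually solving a linear system, np.linalg.solve is generally preferred over forming the inverse explicitly. A small sketch with toy matrices:

```python
import numpy as np
import numpy.linalg as npl

A = np.array([[3.0, 1.0], [1.0, 2.0]])
b = np.array([9.0, 8.0])

x_inv = npl.inv(A).dot(b)    # the idiom from the thread
x_solve = npl.solve(A, b)    # factorizes A instead of inverting it

# both give x = [2, 3] here, but solve() is cheaper and numerically safer
assert np.allclose(x_inv, x_solve)
assert np.allclose(A.dot(x_solve), b)
```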
The discussion - I think - is whether we should move towards standardizing to ndarray semantics for clarity and simplicity of future development (and teaching). Cheers, Matthew From chris.barker at noaa.gov Tue Feb 11 12:14:33 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 11 Feb 2014 09:14:33 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett wrote: > > Anyway, I would say that there is a clear use for the matrix class: > readability of linear algebra code and hence a lower chance of errors, so > higher productivity. > > Yes, but it looks like there are not any developers on this list who > write substantial code with the np.matrix class, so, if there is a > gain in productivity, it is short-lived, soon to be replaced by a > cost. > to re-iterate: matrix is NOT for newbies, nor is it for higher productivity or fewer errors in production code -- the truth is, the ratio of linear algebra expressions like the above to element-wise, etc. operations that ndarray is well suited to is tiny in "real" code. Did anyone that used MATLAB for significant problems not get really annoyed by all the typing of ".*"? What matrix is good for is what someone here described as a "domain specific language" -- i.e. that little bit of code that really is doing mostly linear algebra. So it's a nice tool for teaching and experimenting with linear-algebra-based concepts.
To address Alan's question about duck-typing -- one of the things folks like to do with duck-typed functions and methods is return the type that is passed in when possible: i.e. use asanyarray(), rather than asarray() -- but this is really going to be broken with matrix, as the semantics have changed. So we could say "don't expect that to work with matrix", but that requires one of: 1) folks always use asarray() and return an array, rather than a matrix, to the caller -- not too bad, folks that want matrix can use np.matrix to get it back (a bit ugly, though..) however, this means that any other array subclass will get mangled as well... 2) folks use asanyarray(), and it will break, maybe painfully, when a matrix is passed in -- folks using matrices would need to use .A when calling such functions. This really seems ripe for confusion. The truth is that the "right" way to do all this is to have different operators, rather than different objects, but that's been tried and did not fly. -Chris Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla.molden at gmail.com Tue Feb 11 12:18:31 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 11 Feb 2014 17:18:31 +0000 (UTC) Subject: [Numpy-discussion] Overlapping time series References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> <52FA472B.1010206@grinta.net> Message-ID: <1087677166413831668.247865sturla.molden-gmail.com@news.gmane.org> Daniele Nicolodi wrote: > That's more or less my current approach (except that I use the fact that > the data is evenly sampled to use np.where(np.diff(t1) != dt) to detect > the regions of continuous data, to avoid the loop. I hope you realize that np.where(np.diff(t1) != dt) generates three loops, as well as two temporary arrays and one output array. If you do what I suggested, you get one loop and no temporaries. But you will need Numba or Cython to get full speed. Sturla From chris.barker at noaa.gov Tue Feb 11 12:23:22 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Tue, 11 Feb 2014 09:23:22 -0800 Subject: [Numpy-discussion] record arrays with char*? In-Reply-To: References: Message-ID: On Tue, Feb 11, 2014 at 8:57 AM, Christopher Jordan-Squire wrote: > Thanks for the answers! My responses are inline. my C is a bit weak, so forgive my misunderstanding, but: >> { >> char* ind; >> double val, wght; >> } data[] = { {"camera", 15, 2}, {"necklace", 100, 20}, {"vase", 90, 20}, >> {"pictures", 60, 30}, {"tv", 40, 40}, {"video", 15, 30}}; Here is my C weakness -- what does this struct look like in memory? i.e. is that first element a pointer, and the actual string data is somewhere else in memory? In which case, where? And how does that memory get managed? Or are the bytes at the beginning of the struct with the string data in it?
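The two layouts in question can be compared directly in numpy. A sketch (assuming a 64-bit build for the pointer width; the field names are illustrative):

```python
import numpy as np

# fixed-width field: the string bytes are stored inline in each record
inline = np.dtype([('ind', 'S8'), ('val', 'f8'), ('wght', 'f8')])
assert inline.itemsize == 8 + 8 + 8

# object field: each record holds only a pointer; the string itself lives
# elsewhere and is managed by Python -- awkward to hand off to C directly
boxed = np.dtype([('ind', 'O'), ('val', 'f8'), ('wght', 'f8')])
ptr_size = np.dtype('O').itemsize    # pointer width on this build
assert boxed.itemsize == ptr_size + 16
```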
in the "real" case, you say the char* is a string representation of a hex value for a memory address -- so you would know a priori exactly what the length of that string is, so you could use a numpy struct like: ('S8,f8,f8') i.e. 8 bytes to store the string at the beginning of the struct. Then &the_array[i] would be the address of the beginning of the string -- which may be what you want. HTH, -Chris > >> > >> At the C level, data is passed to the function by directly giving its > >> address. (i.e. the C function takes as an argument (unsigned long) > >> data, casting the data pointer to an int) > > > > > > wow -- that's prone to error! but I'm still not sure which pointer you're > > talking about -- a pointer to this struct? > > > > I mean data. data is an array of structs, but in C that's > (essentially) the same as a pointer to a struct. So data is the > pointer I'm referring to. > > >> I'd like to create something similar using record arrays, such as > >> > >> np.array([("camera", 15, 2), ("necklace", 100, 20), ... ], > >> dtype='object,f8,f8'). > > > > > > > >> > >> Unfortunately this fails because > >> (1) In cython I need to determine the address of the first element and > >> I can't take the address of an input whose type I don't know (the > >> exact type will vary on the application, so more or fewer fields may > >> be in the C struct) > > > > > > still a bit confused, but if this is typed as an array in Cython, you should > > be able to do something like: > > > > &the_array[i] > > > > to get the address of the ith element. > > > > To do that I need to be able to tell cython the type of the memory > view I give. There are very few examples for non-primitive arrays in > cython, so I'm not sure what that would look like. Or at least I > *think* I need to do that, based on the cython errors I'm getting. > > >> (2) I don't think a python object type is what I want--I need a char* > >> representation of the string.
(Unfortunately I can't test this because > >> I haven't solved (1) -- how do you pass a record array around in > >> cython and/or take its address?) > > > > > > well, an object type will give you a pointer to a pyobject. If you know for > > sure that that pyobject is a string object (probably want a bytes object -- > > you don't want unicode here), then you should be able to get the address of > > the underlying char array. But that would require passing something > > different off to the C code than the address of that element. > > > > You could use an unsigned long for that first field, as you are assuming > > that in the C code anyway but I don't think there is a way in numpy to set > > that to a pointer to a char allocated elsewhere -- where would it be > > allocated? > > > > The strings are char*, the unsigned long cast was simply to cast data > to a memory address. (The specific setup of the library was passing a > string with the memory location in hexadecimal. This seems weird, but > it's because the C functions being called are an intermediary to a > simple (third-party) virtual machine running a program compiled from > another language. It doesn't deal with pointers directly, so the other > language is just passed the memory address, in hex and as a string, > for where the data resides. Along with the number of elements in the > block of memory pointed to.) > > > So I would give up on expecting to store the struct directly in a numpy array, > > and rather, put something reasonable (maybe what you have above) in the > > numpy array, and build the C struct you need from that rather than passing a > > pointer in directly. > > > > That sounds reasonable. I really wanted to avoid this because, as I > mentioned above, I'm just trying to generate the data in numpy and > pass it to this virtual machine running a program compiled from > another language. The form of the struct depends on what was done in > the other language.
It could easily have more or fewer fields, have > the fields reordered, etc. I wanted to avoid having to write wrappers > for all such possibilities and instead use numpy record arrays as an > exact mapping to C structs. But if that's really the only way to go > then I guess I'll have to write struct wrappers in cython. > > > -Chris > > > > > > > > -- > > > > Christopher Barker, Ph.D. > > Oceanographer > > > > Emergency Response Division > > NOAA/NOS/OR&R (206) 526-6959 voice > > 7600 Sand Point Way NE (206) 526-6329 fax > > Seattle, WA 98115 (206) 526-6317 main reception > > > > Chris.Barker at noaa.gov > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed...
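The fixed-width layout Chris suggests above can be sketched directly in numpy. This is only an illustration of the idea, not the poster's actual struct; the field names mirror the example struct quoted earlier in the thread:

```python
import numpy as np

# Illustrative structured dtype mirroring the 'S8,f8,f8' suggestion:
# the string is stored inline as 8 fixed bytes, not as a char* pointer,
# so every record is a self-contained 24-byte block.
dt = np.dtype([('ind', 'S8'), ('val', 'f8'), ('wght', 'f8')])
data = np.array([(b'camera', 15.0, 2.0),
                 (b'necklace', 100.0, 20.0),
                 (b'vase', 90.0, 20.0)], dtype=dt)

# The address of record i is base + i * itemsize, which is exactly the
# pointer arithmetic a C function receiving the buffer would perform.
base = data.ctypes.data
addr_of_second = base + 1 * data.dtype.itemsize
```

One caveat this makes visible: b'necklace' is exactly 8 bytes, so it fills the field with no NUL terminator; C code that expects NUL-terminated strings would need an 'S9' or wider field for headroom.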
URL: From josef.pktd at gmail.com Tue Feb 11 12:55:22 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 11 Feb 2014 12:55:22 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: On Tue, Feb 11, 2014 at 12:14 PM, Chris Barker wrote: > On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett wrote: > >> > Anyway, I would say that there is a clear use for the matrix class: >> readability of linear algebra code and hence a lower chance of errors, so >> higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost. >> > > to re-iterate: > > matrix is NOT for newbies, nor is it for higher productivity or fewer > errors in production code -- the truth is, the ratio of linear algebra > expressions like the above to element-wise, etc. operations that ndarray is > well suited to is tiny in "real" code. Did anyone who used MATLAB for > significant problems not get really, really annoyed by all the typing of > ".*"? > > What matrix is good for is what someone here described as "domain specific > language" -- i.e. that little bit of code that really is > doing mostly linear algebra. So it's a nice tool for teaching and > experimenting with linear-algebra-based concepts.
> http://www.mathworks.com/matlabcentral/fileexchange/27095-tsls-2sls/content/tsls.m https://github.com/statsmodels/statsmodels/blob/master/statsmodels/sandbox/regression/gmm.py#L134 If I were not a strict ndarray coder, I might prefer to wrap an ndarray and use only matrices inside a function and ndarrays outside for the code that is not linear algebra. > > To address Alan's question about duck-typing -- one of the things folks > like to do with duck-typed functions and methods is return the type that is > passed in when possible: i.e. use asanyarray(), rather than asarray() -- but > this is really going to be broken with matrix, as the semantics have > changed. So we could say "don't expect that to work with matrix", but that > requires one of: > > 1) folks always use asarray() and return an array, rather than a matrix to > the caller -- not too bad, folks that want matrix can use np.matrix to get > it back (a bit ugly, though..) however, this means that any other array > subclass will get mangled as well... > scipy.linalg has an arraywrap on input and output. (at least when I looked a few years ago) (statsmodels has a pandas wrapper that converts arguments and returns to have ndarrays internally) some packages have helper functions to make a consistent interface to ndarrays and sparse "matrices" scipy.stats still doesn't protect against masked arrays and nans. IMO: that's life. Removing matrices from numpy doesn't make the problem go away. Although the problem could be pushed to other packages. But if nobody uses matrices, then we would have at least **one** problem less. > > 2) folks use asanyarray(), and it will break, maybe painfully, when a > matrix is passed in -- folks using matrices would need to use .A when > calling such functions. This really seems ripe for confusion. > There are many ndarray subclasses out there, and I have no idea how they behave. pandas.Series was until recently an ndarray subclass that didn't quite behave like one.
We had to fix some bugs in statsmodels where we accidentally used asanyarray instead of asarray. Josef > > The truth is that the "right" way to do all this is to > have different operators, rather than different objects, but that's been > tried and did not fly. > > -Chris > > > > > > > > > > > > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Tue Feb 11 13:11:05 2014 From: argriffi at ncsu.edu (alex) Date: Tue, 11 Feb 2014 13:11:05 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: On Tue, Feb 11, 2014 at 12:14 PM, Chris Barker wrote: > On Tue, Feb 11, 2014 at 8:25 AM, Matthew Brett > wrote: >> >> > Anyway, I would say that there is a clear use for the matrix class: >> > readability of linear algebra code and hence a lower chance of errors, so >> > higher productivity. >> >> Yes, but it looks like there are not any developers on this list who >> write substantial code with the np.matrix class, so, if there is a >> gain in productivity, it is short-lived, soon to be replaced by a >> cost.
> > > to re-iterate: > > matrix is NOT for newbies, nor is it for higher productivity or fewer errors > in production code -- the truth is, the ratio of linear algebra expressions > like the above to element-wise, etc. operations that ndarray is well suited > to is tiny in "real" code. Did anyone who used MATLAB for significant > problems not get really, really annoyed by all the typing of ".*"? > > What matrix is good for is what someone here described as "domain specific > language" -- i.e. that little bit of code that really is doing mostly linear > algebra. This point would suggest that the "domain specific language" defined by the numpy.matrix semantics would be particularly useful for representations of linear operators for which elementwise modification might be less efficient (for example as in some implementations of sparse matrices) or essentially unavailable (for example as in matrix-free linear operators). Alex From argriffi at ncsu.edu Tue Feb 11 14:20:10 2014 From: argriffi at ncsu.edu (alex) Date: Tue, 11 Feb 2014 14:20:10 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F8EBB8.7070205@gmail.com> <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: On Tue, Feb 11, 2014 at 12:05 PM, Matthew Brett wrote: > Hi, > > On Tue, Feb 11, 2014 at 8:55 AM, wrote: >> >> >> On Tue, Feb 11, 2014 at 11:25 AM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Tue, Feb 11, 2014 at 4:16 AM, Jacco Hoekstra - LR >>> wrote: >>> > For our students, the matrix class is really appealing as we use a lot >>> > of linear algebra and expressions with matrices simply look better with an >>> > operator instead of a function: >>> > >>> > x=A.I*b >>> > >>> > looks much better than >>> > >>> > x =
np.dot(np.linalg.inv(A),b) >>> >>> Yes, but: >>> >>> 1) as Alan has mentioned, the dot method helps a lot. >>> >>> import numpy.linalg as npl >>> >>> x = npl.inv(A).dot(b) >>> >>> 2) Overloading the * operator means that you've lost * to do >>> element-wise operations. MATLAB has a different operator for that, >>> '.*' - and it's very easy to forget the dot. numpy makes this more >>> explicit - you read 'dot' as 'dot'. >>> >>> > And this gets worse when the expression is longer: >>> > >>> > x = R.I*A*R*b >>> > >>> > becomes: >>> > >>> > x = np.dot( np.linalg.inv(R), np.dot(A, np.dot(R, b))) >>> >>> x = npl.inv(R).dot(A.dot(R.dot(b))) >>> >>> > Actually, not being involved in earlier discussions on this topic, I was >>> > a bit surprised by this and do not see the problem of having the matrix >>> > class as nobody is obliged to use it. I tried to find the reasons, but did >>> > not find it in the thread mentioned. Maybe someone could summarize the main >>> > problem with keeping this class for newbies on this list like me? >>> > >>> > Anyway, I would say that there is a clear use for the matrix class: >>> > readability of linear algebra code and hence a lower chance of errors, so >>> > higher productivity. >>> >>> Yes, but it looks like there are not any developers on this list who >>> write substantial code with the np.matrix class, so, if there is a >>> gain in productivity, it is short-lived, soon to be replaced by a >>> cost. >> >> >> selection bias! >> >> I have seen lots of Matlab and GAUSS code written by academic >> econometricians who have been programming for years but are not >> "developers", code that is "inefficient" and numerically not very stable but >> looks just like the formulas. > > Yes, I used to use matlab myself. > > There is certainly biased sampling on this list, so it's very > difficult to say whether there is a large constituency of np.matrix > users out there, it's possible.
I hope not, because I think they > would honestly be better served with ndarray even if some of the lines > in their script don't look quite as neat. In the spirit of offsetting this bias and because this thread is lacking in examples of projects that use numpy.matrix, here's another data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious active project that supports the numpy.matrix interface, for example as in https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. This project is from a somewhat famous Stanford lab (http://www.stanford.edu/~boyd/index.html) and they are currently running a MOOC (http://en.wikipedia.org/wiki/Massive_open_online_course) for convex optimization (https://class.stanford.edu/courses/Engineering/CVX101/Winter2014/about). This course currently uses a MATLAB-based modeling system (http://cvxr.com/cvx/) but they are trying to switch to, or at least support, Python. But they have not yet been able to get cvxpy to a mature enough state to use for their course. Maybe the simple numpy.matrix syntax has accelerated their progress so that they have been able to reach cvxpy's currently somewhat usable state, or maybe the extra work to deal with numpy.matrix has slowed their progress so that we are using MATLAB instead of Python as the standard for training Stanford undergrads and random internet MOOC participants, or maybe their progress has little to do with numpy.matrix. I'm not sure which is the case. But in any case, they use numpy.matrix explicitly in their project. 
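For readers skimming the thread, the two spellings under discussion can be put side by side. This is only a sketch of the trade-off with made-up data, not an endorsement of either style:

```python
import numpy as np
import numpy.linalg as npl

A = np.array([[2.0, 1.0], [1.0, 3.0]])
b = np.array([[1.0], [2.0]])

# ndarray spelling: explicit dot calls, * stays elementwise
x_arr = npl.inv(A).dot(b)

# np.matrix spelling: * is matrix multiplication and .I is the inverse
Am, bm = np.matrix(A), np.matrix(b)
x_mat = Am.I * bm
assert np.allclose(x_arr, np.asarray(x_mat))

# The cost of the nicer syntax: * on matrices is no longer elementwise,
# which is the semantic change the asanyarray() discussion worries about.
elementwise = A * A                  # ndarray: elementwise product
matmul = np.asarray(Am * Am)         # matrix: A.dot(A)
assert not np.allclose(elementwise, matmul)
```

The same expression thus means two different things depending on the type that happens to flow in, which is exactly why duck-typed library code struggles with np.matrix.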
Alex From pav at iki.fi Tue Feb 11 14:40:35 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 21:40:35 +0200 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: 11.02.2014 21:20, alex wrote: [clip] > In the spirit of offsetting this bias and because this thread is > lacking in examples of projects that use numpy.matrix, here's another > data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious > active project that supports the numpy.matrix interface, for example > as in https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. Here's some more data: http://nullege.com/codes/search?cq=numpy.matrix http://nullege.com/codes/search?cq=numpy.array -- Pauli Virtanen From matthew.brett at gmail.com Tue Feb 11 15:35:18 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 11 Feb 2014 12:35:18 -0800 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: <52F9352C.4030808@gmail.com> <52F940E3.1020905@gmail.com> <52F9473E.8070104@gmail.com> <52F95358.8010109@gmail.com> <52F98990.7010500@gmail.com> <1445233539413807954.295871sturla.molden-gmail.com@news.gmane.org> <245AC908B39361438CFA2299B0DD50E438E2DBED@SRV361.tudelft.net> Message-ID: Hi, On Tue, Feb 11, 2014 at 11:40 AM, Pauli Virtanen wrote: > 11.02.2014 21:20, alex wrote: > [clip] >> In the spirit of offsetting this bias and because this thread is >> lacking in examples of projects that use numpy.matrix, here's another >> data point: cvxpy (https://github.com/cvxgrp/cvxpy) is a serious >> active project that supports the numpy.matrix interface, for example >> as in
https://github.com/cvxgrp/cvxpy/tree/master/cvxpy/interface/numpy_interface. > > Here's some more data: > > http://nullege.com/codes/search?cq=numpy.matrix These are quite revealing. Some of the top hits are from old nipy code. We don't have any use of np.matrix in the current nipy trunk. Another couple of hits are from the 'Pylon' package: https://github.com/rwl/pylon - of this form: scipy.io.mmwrite("./data/fDC.mtx", numpy.matrix(f_dc)) When I ran the code without the numpy.matrix call it was easy to see how that happened: In [4]: scipy.io.mmwrite('fDc.mtx', f_dc) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) ... ValueError: expected matrix Of course, what the message means is that you should pass a 2D array: scipy.io.mmwrite('fDc.mtx', f_dc[None]) works fine. So I think this is another example of confusion caused by np.matrix. https://github.com/scipy/scipy/pull/3310 Matthew From pav at iki.fi Tue Feb 11 15:41:41 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 22:41:41 +0200 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Hi, 04.02.2014 20:30, jennifer stone wrote: > 3. As stated earlier, we have spherical harmonic functions (with > much scope >>> for dev); we are yet to have elliptical and cylindrical harmonic >>> functions, which may be developed. >> >> This sounds very doable. How much work do you think would be >> involved? > > As Stefan so rightly pointed out, the spherical > harmonic function, sph_harm, at present calls lpmn, thus evaluating > all orders; this gives me a feeling that it would be very well possible to avoid > that by avoiding the dependence on lpmn. > > Further, we can introduce ellipsoidal harmonic functions of the first > kind and the second kind.
I am confident about the > implementation of ellipsoidal H function of first kind but don't > know much about the second kind. But I believe we can work it out > in due course. And cylindrical harmonics can be carried out using > Bessel functions. It's not so often someone wants to work on scipy.special, so you'd be welcome to improve it :) The general structure of work on special functions goes as follows: - - Check if there is a license-compatible implementation that someone has already written. This is usually not the case. - - Find formulas for evaluating the function in terms of more primitive operations. (I.e. power series, asymptotic series, continued fractions, expansions in terms of other special functions, ...) - - Determine the parameter region where the expansions converge in a floating point implementation, and select algorithms appropriately. Here it helps if you find a research paper where the author has already thought about what sort of an approach works best. - - Life is usually made *much* easier thanks to Fredrik Johansson's prior work on the arbitrary-precision arithmetic library mpmath http://code.google.com/p/mpmath/ It can usually be used to check the "true" values of the functions. Also it contains implementations of algorithms for evaluating special functions, but because mpmath works with arbitrary precision numbers, these algorithms are not directly usable for floating-point calculations, as in floating point you cannot adjust the precision of the calculation dynamically. Moreover, the arbitrary-precision arithmetic can be slow compared to more optimized floating point implementations. Spherical harmonics might be a reasonable part of a GSoC proposal. However, note that there also exists a *second* Legendre polynomial function `lpmv`, which doesn't store the values of the previous N functions.
There's one numerical problem in the current way of evaluation via ~Pmn(cos(theta)), which is that this approach seems to lose quite a lot of precision at large orders for certain values of theta. I don't recall now exactly how imprecise it becomes at large orders, but it may be necessary to check. Adding new special functions also sounds like a useful project. Here, it helps if they are something that you expect you will need later on :) There's also the case that several of the functions in Scipy have only implementations for real-valued inputs, although the functions would be defined on the whole complex plane. A list of the current situation is here: https://github.com/scipy/scipy/blob/master/scipy/special/generate_ufuncs.py#L85 Lowercase d corresponds to real-valued implementations, uppercase D to complex-valued. I'm not at the moment completely sure which would have the highest priority --- whether you need this or not really depends on the application. If you want additional ideas about possible things to fix in scipy.special, take a look at this file: https://github.com/scipy/scipy/blob/master/scipy/special/tests/test_mpmath.py#L648 The entries marked @knownfailure* have some undiagnosed issues in the implementation, which might be useful to look into. However: most of these have to do with corner cases in hypergeometric functions. Trying to address those is likely a risky GSoC topic, as the multi-argument hyp* functions are challenging to evaluate in floating point. (mpmath and Mathematica can evaluate them in most parameter regimes, but AFAIK both require arbitrary-precision methods for this.) So I think there would be a large number of possible things to do here, and help would be appreciated.
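The point about selecting algorithms by parameter region can be illustrated with a deliberately tiny toy example (the exponential, not one of the scipy.special functions discussed here; the function names are mine): a single expansion that is fine in one region is ruined by cancellation in another, and adding more terms does not fix it in floating point.

```python
import math

def exp_taylor(x, terms=100):
    # Naive Taylor series for exp(x). For x >= 0 every term is positive
    # and the sum is well conditioned; for large negative x the huge
    # alternating terms cancel catastrophically in double precision.
    total, term = 1.0, 1.0
    for n in range(1, terms):
        term *= x / n
        total += term
    return total

def exp_by_region(x):
    # Select the algorithm by parameter region: for x < 0, sum the
    # series at -x (all terms positive) and take the reciprocal.
    return exp_taylor(x) if x >= 0 else 1.0 / exp_taylor(-x)

x = -20.0
rel = lambda a, b: abs(a - b) / b
assert rel(exp_by_region(x), math.exp(x)) < 1e-12  # region-aware: accurate
assert rel(exp_taylor(x), math.exp(x)) > 1e-3      # naive: cancellation ruins it
```

Real scipy.special implementations do the same kind of case analysis, just with several expansions stitched together over the parameter space.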
- -- Pauli Virtanen -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iEYEARECAAYFAlL6iwAACgkQ6BQxb7O0pWBfOgCfYHAB12N4FWDmrqx8/ORTBRps pXYAoL3ufAiShe+0qTEGfEvrmDgr1X0p =kAwF -----END PGP SIGNATURE----- From pav at iki.fi Tue Feb 11 15:43:21 2014 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 11 Feb 2014 22:43:21 +0200 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: <20140131103459.GA24791@gmail.com> <20140206105912.GH4638@gmail.com> Message-ID: Hi, 08.02.2014 06:16, Stéfan van der Walt wrote: > On 8 Feb 2014 04:51, "Ralf Gommers" wrote: >> >>> Members of the dipy team would also be interested. >> >> That's specifically for the spherical harmonics topic, right? > > Right. Spherical harmonics are used as bases in many of DiPy's > reconstruction algorithms. If help is needed with a GSoC project for scipy.special, I'm in principle available to chip in with co-mentoring, or just trying to help answer questions. Best, Pauli From Wolfgang.Draxinger at physik.uni-muenchen.de Tue Feb 11 15:56:01 2014 From: Wolfgang.Draxinger at physik.uni-muenchen.de (Wolfgang Draxinger) Date: Tue, 11 Feb 2014 21:56:01 +0100 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?
Message-ID: <52FA8E61.4000800@physik.uni-muenchen.de> Hi, I implemented the following helper function to create masking arrays: def gen_mask(self, ring, sector): in_segment = None if ring == 0: radius = self.scales()[0] def in_center_circle(xy): dx = xy[0] - self.xy[0] dy = xy[1] - self.xy[1] r = math.sqrt( dx**2 + dy**2 ) return r < radius in_segment = in_center_circle else: angle = ( self.a_sector(sector, ring), self.a_sector( (sector+1) % self.n_sectors(), ring) ) radii = self.scales()[ring-1:ring+1] def in_segment_ring(xy): dx = xy[0] - self.xy[0] dy = xy[1] - self.xy[1] r = math.sqrt( dx**2 + dy**2 ) a = math.atan2( dy, dx ) return r >= radii[0] and r < radii[1] \ and a >= angle[0] and a < angle[1] in_segment = in_segment_ring width, height = self.arr.shape mask = numpy.zeros(shape=(width, height), dtype=bool) for y in range(height): for x in range(width): mask[x][y] = in_segment((x,y)) return mask self.scales() generates a list of radius scaling factors. self.a_sector() gives the dividing angle between sectors "sector" and "sector+1" on the given ring. The function works, it generates the masks as I need them. The problem is - of course - that it's quite slow due to the inner loops that perform the geometrical test if the given element of the array self.arr is within the bounds of the ring-sector for which the mask is generated. I wonder if you guys have some ideas on how I could accelerate this. This can be perfectly well constructed by boolean combination of elementary geometrical objects. For example a ring would be ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 The sector would then be ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1) I'd like to avoid doing this using some C helper routine. I'm looking for something that is the most efficient when it comes to "speedup"/"time invested to develop this".
Cheers, Wolfgang From davidmenhur at gmail.com Tue Feb 11 16:16:46 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Tue, 11 Feb 2014 22:16:46 +0100 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize? In-Reply-To: <52FA8E61.4000800@physik.uni-muenchen.de> References: <52FA8E61.4000800@physik.uni-muenchen.de> Message-ID: Here is an example: import numpy as np import pylab as plt N = 100 M = 200 x, y = np.meshgrid(np.arange(N), np.arange(M)) # Center at 40, 70, radius 20 x -= 40 y -= 70 out_circle = (x * x + y * y < 20**2) out_ring = out_circle ^ (x * x + y * y < 10**2)  # xor: outer disk minus inner disk plt.imshow(out_circle) plt.figure() plt.imshow(out_ring) plt.show() Note that I have avoided taking the costly square root of each element by just comparing against the square of the radius. It can also be generalised to ellipses or rectangles, if you need it. Also, don't use 0 as a False value, and don't force it to be a 0. Instead, use "if not ring:" /David On 11 February 2014 21:56, Wolfgang Draxinger < Wolfgang.Draxinger at physik.uni-muenchen.de> wrote: > Hi, > > I implemented the following helper function to create masking arrays: > > def gen_mask(self, ring, sector): > in_segment = None > if ring == 0: > radius = self.scales()[0] > def in_center_circle(xy): > dx = xy[0] - self.xy[0] > dy = xy[1] - self.xy[1] > r = math.sqrt( dx**2 + dy**2 ) > return r < radius > in_segment = in_center_circle > else: > angle = ( self.a_sector(sector, ring), > self.a_sector( (sector+1) % self.n_sectors(), > ring) ) > radii = self.scales()[ring-1:ring+1] > def in_segment_ring(xy): > dx = xy[0] - self.xy[0] > dy = xy[1] - self.xy[1] > r = math.sqrt( dx**2 + dy**2 ) > a = math.atan2( dy, dx ) > return r >= radii[0] and r < radii[1] \ > and a >= angle[0] and a < angle[1] > in_segment = in_segment_ring > > width, height = self.arr.shape > > mask = numpy.zeros(shape=(width, height), dtype=bool) > for y in range(height): > for x in range(width): > mask[x][y] = in_segment((x,y)) >
return mask > > self.scales() generates a list of radius scaling factors. > self.a_sector() gives the dividing angle between sectors "sector" and > "sector+1" on the given ring. > > The function works, it generates the masks as I need them. The problem is > - of course - that it's quite slow due to the inner loops that perform > the geometrical test if the given element of the array self.arr is > within the bounds of the ring-sector for which the mask is generated. > > I wonder if you guys have some ideas on how I could accelerate this. > This can be perfectly well constructed by boolean combination of > elementary geometrical objects. For example a ring would be > > ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 > > The sector would then be > > ring_sector(p, r, s): ring(p, r, r + ...) and sector(p, s, s+1) > > I'd like to avoid doing this using some C helper routine. I'm looking > for something that is the most efficient when it comes to > "speedup"/"time invested to develop this". > > > Cheers, > > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From davidmenhur at gmail.com Tue Feb 11 16:28:24 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Tue, 11 Feb 2014 22:28:24 +0100 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize?
In-Reply-To: References: <52FA8E61.4000800@physik.uni-muenchen.de> Message-ID: A small improvement: # Dimensions N = 100 M = 200 # Coordinates of the centre x0 = 40 y0 = 70 x, y = np.meshgrid(np.arange(N) - x0, np.arange(M) - y0, sparse=True) # Center at 40, 70, radius 20 out_circle = (x * x + y * y < 20**2) out_ring = out_circle ^ (x * x + y * y < 10**2)  # xor: outer disk minus inner disk plt.imshow(out_circle) plt.figure() plt.imshow(out_ring) plt.show() Using sparse you can avoid the repetition of the arrays, getting the same functionality. Also, if the image is big, you can delegate the computations of out_circle and out_ring to numexpr for speed. /David. On 11 February 2014 22:16, Daπid wrote: > Here is an example: > > import numpy as np > import pylab as plt > > N = 100 > M = 200 > x, y = np.meshgrid(np.arange(N), np.arange(M)) > > # Center at 40, 70, radius 20 > x -= 40 > y -= 70 > out_circle = (x * x + y * y < 20**2) > > out_ring = out_circle ^ (x * x + y * y < 10**2) > > plt.imshow(out_circle) > plt.figure() > plt.imshow(out_ring) > plt.show() > > Note that I have avoided taking the costly square root of each element by > just comparing against the square of the radius. It can also be generalised to > ellipses or rectangles, if you need it. > > Also, don't use 0 as a False value, and don't force it to be a 0.
Instead, > use "if not ring:" > > > /David > > > > > > On 11 February 2014 21:56, Wolfgang Draxinger < > Wolfgang.Draxinger at physik.uni-muenchen.de> wrote: > >> Hi, >> >> I implemented the following helper function to create masking arrays: >> >> def gen_mask(self, ring, sector): >> in_segment = None >> if ring == 0: >> radius = self.scales()[0] >> def in_center_circle(xy): >> dx = xy[0] - self.xy[0] >> dy = xy[1] - self.xy[1] >> r = math.sqrt( dx**2 + dy**2 ) >> return r < radius >> in_segment = in_center_circle >> else: >> angle = ( self.a_sector(sector, ring), >> self.a_sector( (sector+1) % self.n_sectors(), >> ring) ) >> radii = self.scales()[ring-1:ring+1] >> def in_segment_ring(xy): >> dx = xy[0] - self.xy[0] >> dy = xy[1] - self.xy[1] >> r = math.sqrt( dx**2 + dy**2 ) >> a = math.atan2( dy, dx ) >> return r >= radii[0] and r < radii[1] \ >> and a >= angle[0] and a < angle[1] >> in_segment = in_segment_ring >> >> width, height = self.arr.shape >> >> mask = numpy.zeros(shape=(width, height), dtype=bool) >> for y in range(height): >> for x in range(width): >> mask[x][y] = in_segment((x,y)) >> return mask >> >> self.scales() generates a list of radius scaling factors. >> self.a_sector() gives the dividing angle between sectors "sector" and >> "sector+1" on the given ring. >> >> The function works, it generates the masks as I need them. The problem is >> - of course - that it's quite slow due to the inner loops that perform >> the geometrical test if the given element of the array self.arr is >> within the bounds of the ring-sector for which the mask is generated. >> >> I wonder if you guys have some ideas on how I could accelerate this. >> This can be perfectly well constructed by boolean combination of >> elementary geometrical objects. For example a ring would be >> >> ring(p, r0, r1): disk(p, r1) xor disk(p, r0) # where r0 < r1 >> >> The sector would then be >> >> ring_sector(p, r, s): ring(p, r, r + ...)
and sector(p, s, s+1) >> >> I'd like to avoid doing this using some C helper routine. I'm looking >> for something that is the most efficient when it comes to >> "speedup"/"time invested to develop this". >> >> >> Cheers, >> >> Wolfgang >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL:
From daniele at grinta.net Tue Feb 11 17:54:28 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 11 Feb 2014 23:54:28 +0100 Subject: [Numpy-discussion] Overlapping time series In-Reply-To: <1087677166413831668.247865sturla.molden-gmail.com@news.gmane.org> References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> <52FA472B.1010206@grinta.net> <1087677166413831668.247865sturla.molden-gmail.com@news.gmane.org> Message-ID: <52FAAA24.7080008@grinta.net> On 11/02/2014 18:18, Sturla Molden wrote: > Daniele Nicolodi wrote: > >> That's more or less my current approach (except that I use the fact that >> the data is evenly sampled to use np.where(np.diff(t1) != dt) to detect >> the regions of continuous data, to avoid the loop). > > I hope you realize that np.where(np.diff(t1) != dt) generates three loops, > as well as two temporary arrays and one output array. If you do what I > suggested, you get one loop and no temporaries. But you will need Numba or > Cython to get full speed. Sure, I realize that, thanks for the clarification. The arrays are quite small, so the three loops and the temporary take negligible time and memory in the overall processing.
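For concreteness, the diff-based gap detection can be sketched like this (the time stamps below are made up for illustration; real data would come from the acquisition):

```python
import numpy as np

dt = 1.0
# Hypothetical evenly sampled time stamps with two gaps in them
t1 = np.array([0.0, 1.0, 2.0, 5.0, 6.0, 7.0, 10.0])

# Indices after which a gap starts: a step different from dt marks a break
breaks = np.where(np.diff(t1) != dt)[0]

# Split the record into runs of contiguous samples
segments = np.split(t1, breaks + 1)
# segments -> [array([0., 1., 2.]), array([5., 6., 7.]), array([10.])]
```

(With noisy floating point time stamps the exact != comparison may need to become something like np.abs(np.diff(t1) - dt) > tol.)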
I was lazy and I did not want to write in Cython; however, I didn't benchmark the exact solution you propose written in pure Python... Cheers, Daniele
From sturla.molden at gmail.com Wed Feb 12 00:44:51 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 12 Feb 2014 05:44:51 +0000 (UTC) Subject: [Numpy-discussion] Overlapping time series References: <52FA20E1.3050704@grinta.net> <52FA2151.4040806@hilboll.de> <52FA2406.5030501@grinta.net> <52FA2892.2060700@hilboll.de> <52FA29E7.4010501@grinta.net> <207196323413819665.062003sturla.molden-gmail.com@news.gmane.org> <52FA2E84.3050900@grinta.net> <2109333070413821346.530910sturla.molden-gmail.com@news.gmane.org> <52FA472B.1010206@grinta.net> <1087677166413831668.247865sturla.molden-gmail.com@news.gmane.org> <52FAAA24.7080008@grinta.net> Message-ID: <1718003453413876412.866242sturla.molden-gmail.com@news.gmane.org> Daniele Nicolodi wrote: > Sure, I realize that, thank for the clarification. The arrays are quite > small, then the three loops and the temporary take negligible time and > memory in the overall processing. If they are small, a Python loop would do the job as well. And if it doesn't, it is just a matter of adding a couple of decorators to use Numba JIT. Sturla
From Wolfgang.Draxinger at physik.uni-muenchen.de Wed Feb 12 05:43:37 2014 From: Wolfgang.Draxinger at physik.uni-muenchen.de (Wolfgang Draxinger) Date: Wed, 12 Feb 2014 11:43:37 +0100 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize? In-Reply-To: References: <52FA8E61.4000800@physik.uni-muenchen.de> Message-ID: <20140212114337.045e3513@narfi.yggdrasil.draxit.de> On Tue, 11 Feb 2014 22:16:46 +0100 David wrote: > Here it is an example: > > import numpy as np > import pylab as plt > > N = 100 > M = 200 > x, y = np.meshgrid(np.arange(N), np.arange(M)) > > # Center at 40, 70, radius 20 > x -= 40 > y -= 70 > out_circle = (x * x + y * y < 20**2) Neat.
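The same vectorized idea extends to the ring sectors this thread started with, using np.arctan2 for the angular test. A sketch — the radii and sector angles below are made up rather than taken from the real self.scales() and self.a_sector():

```python
import numpy as np

N, M = 100, 200            # image dimensions
x0, y0 = 40, 70            # centre of the rings
# sparse=True keeps x as a (1, N) row and y as an (M, 1) column
x, y = np.meshgrid(np.arange(N) - x0, np.arange(M) - y0, sparse=True)

r2 = x * x + y * y         # squared radius, broadcast to (M, N)
a = np.arctan2(y, x)       # angle in (-pi, pi], also broadcast to (M, N)

# Hypothetical ring radii and sector angle bounds
r_in, r_out = 10, 20
a0, a1 = 0.0, np.pi / 4

mask = (r2 >= r_in**2) & (r2 < r_out**2) & (a >= a0) & (a < a1)
```

A sector that straddles the -pi/pi branch cut would need the two angle tests combined with | instead of &.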
> Note that I have avoided taking the costly square root of each > element by just taking the square of the radius. It can also be > generalised to ellipses or rectangles, if you need it. Oops, why did I take the "long" tour anyway? > Also, don't use 0 as a False value, and don't force it to be a 0. > Instead, use "if not ring:" 'ring' is not a boolean but a numeric index, and the test is whether the index is nonzero. I could have also written 'if ring > 0:' and swapped the clauses. Yes, I know that a value of zero will test as False, but in this case I wanted it written down that this tests for a certain numerical value. Thanks, Wolfgang
From chris.barker at noaa.gov Wed Feb 12 14:04:08 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 12 Feb 2014 11:04:08 -0800 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize? In-Reply-To: <20140212114337.045e3513@narfi.yggdrasil.draxit.de> References: <52FA8E61.4000800@physik.uni-muenchen.de> <20140212114337.045e3513@narfi.yggdrasil.draxit.de> Message-ID: An extra note here: One of the great things about numpy (as opposed to, say, MATLAB) is array broadcasting. Thus you generally don't need meshgrid -- why carry all that extra repetitive data around. So here's a version that uses broadcasting instead radius = 5 # Center at 0, 0 x = np.linspace(-5,5,11) # y is re-shaped to be a column vector... y = np.linspace(-8,8,17).reshape((-1,1)) # when you add x and y you get the 2-d array for the results... out_circle = (x * x + y * y < radius**2) print out_circle On Wed, Feb 12, 2014 at 2:43 AM, Wolfgang Draxinger < Wolfgang.Draxinger at physik.uni-muenchen.de> wrote: > On Tue, 11 Feb 2014 22:16:46 +0100 > David wrote: > > > Here it is an example: > > > > import numpy as np > > import pylab as plt > > > > N = 100 > > M = 200 > > x, y = np.meshgrid(np.arange(N), np.arange(M)) > > > > # Center at 40, 70, radius 20 > > x -= 40 > > y -= 70 > > out_circle = (x * x + y * y < 20**2) > > Neat.
> > Note that I have avoided taking the costly square root of each > > element by just taking the square of the radius. It can also be > > generalised to ellipses or rectangles, if you need it. > > Oops, why did I take the "long" tour anyway? > > > Also, don't use 0 as a False value, and don't force it to be a 0. > > Instead, use "if not ring:" > > 'ring' is not a boolean, but a numeric index and the test is if the > index is nonzero. I could have also written 'if ring > 0:' and swapped > the clauses. Yes I know that for a value 0 zero it will test as not, > but in this case I wanted to have it written down, that this tests for > a certain numerical value. > > > Thanks, > > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL:
From jtaylor.debian at googlemail.com Wed Feb 12 14:23:18 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 12 Feb 2014 20:23:18 +0100 Subject: [Numpy-discussion] Geometrically defined masking arrays; how to optimize? In-Reply-To: References: <52FA8E61.4000800@physik.uni-muenchen.de> <20140212114337.045e3513@narfi.yggdrasil.draxit.de> Message-ID: <52FBCA26.7030106@googlemail.com> meshgrid also has the sparse keyword argument which achieves the same. On 12.02.2014 20:04, Chris Barker wrote: > An extra note here: > > One of the great things about numpy (as opposed, to say, MATLAB), is > array broadcasting Thus you generally don't need meshgrid -- why carry > all that extra repetitive data around.
So here's a version that uses > broadcasting instead > > > radius = 5 > > # Center at 0, 0 > > x = np.linspace(-5,5,11) > > # y is re-shaped to be a column vector... > y = np.linspace(-8,8,17).reshape((-1,1)) > > # when you add x and y you get the 2-d array for the results... > > out_circle = (x * x + y * y < radius**2) > > print out_circle > > > > > On Wed, Feb 12, 2014 at 2:43 AM, Wolfgang Draxinger > > wrote: > > On Tue, 11 Feb 2014 22:16:46 +0100 > Da?id > wrote: > > > Here it is an example: > > > > import numpy as np > > import pylab as plt > > > > N = 100 > > M = 200 > > x, y = np.meshgrid(np.arange(N), np.arange(M)) > > > > # Center at 40, 70, radius 20 > > x -= 40 > > y -= 70 > > out_circle = (x * x + y * y < 20**2) > > Neat. > > > Note that I have avoided taking the costly square root of each > > element by just taking the square of the radius. It can also be > > generalised to ellipses or rectangles, if you need it. > > Oops, why did I take the "long" tour anyway? > > > Also, don't use 0 as a False value, and don't force it to be a 0. > > Instead, use "if not ring:" > > 'ring' is not a boolean, but a numeric index and the test is if the > index is nonzero. I could have also written 'if ring > 0:' and swapped > the clauses. Yes I know that for a value 0 zero it will test as not, > but in this case I wanted to have it written down, that this tests for > a certain numerical value. > > > Thanks, > > Wolfgang > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > -- > > Christopher Barker, Ph.D. 
> Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion >
From ndbecker2 at gmail.com Thu Feb 13 08:54:44 2014 From: ndbecker2 at gmail.com (Neal Becker) Date: Thu, 13 Feb 2014 08:54:44 -0500 Subject: [Numpy-discussion] FYI: libflatarray Message-ID: I thought this was interesting: http://www.libgeodecomp.org/libflatarray.html
From sturla.molden at gmail.com Thu Feb 13 09:37:16 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Thu, 13 Feb 2014 14:37:16 +0000 (UTC) Subject: [Numpy-discussion] libflatarray References: Message-ID: <2041968608413994005.314792sturla.molden-gmail.com@news.gmane.org> Neal Becker wrote: > I thought this was interesting: > > http://www.libgeodecomp.org/libflatarray.html This is mostly flawed thinking. Nowadays, CPUs are much faster than memory access, and the gap is just increasing. In addition, CPUs have hierarchical memory (several layers of cache). Most algorithms therefore benefit from doing as much computation on the data as possible, before reading more data from RAM. That means that an interleaved memory layout is usually the more effective. This of course depends on the algorithm, but an array of structs is usually better than a struct of arrays.
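To make the two layouts concrete in NumPy terms, here is a rough sketch with a hypothetical two-field record; as said, which layout wins depends on the access pattern:

```python
import numpy as np

n = 1000000

# Array of structs: one record array, x and y interleaved in memory
aos = np.zeros(n, dtype=[('x', np.float64), ('y', np.float64)])

# Struct of arrays: one contiguous array per field
soa_x = np.zeros(n)
soa_y = np.zeros(n)

# Summing a single field walks strided, interleaved memory for AoS...
sx_aos = aos['x'].sum()
# ...but a fully contiguous block for SoA
sx_soa = soa_x.sum()

# The strides make the difference visible: 16 bytes per element vs 8
stride_aos = aos['x'].strides[0]
stride_soa = soa_x.strides[0]
```

Touching all fields of one element at a time favours the interleaved layout; streaming over a single field favours the separate arrays.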
Sturla
From hoogendoorn.eelco at gmail.com Thu Feb 13 09:47:34 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 13 Feb 2014 15:47:34 +0100 Subject: [Numpy-discussion] libflatarray In-Reply-To: <2041968608413994005.314792sturla.molden-gmail.com@news.gmane.org> References: <2041968608413994005.314792sturla.molden-gmail.com@news.gmane.org> Message-ID: As usual, 'it depends', but a struct of arrays layout (which is a virtual necessity on GPU's) can also be advantageous on the CPU. One rarely acts on only a single object at a time, but quite often, you only work on a subset of the objects' attributes at a time. In an array of structs layout, you are always pulling whole objects from main memory into the cache, even if you only use a single attribute. In a struct of arrays layout, we can do efficient prefetching on a single attribute when looping over all objects. On Thu, Feb 13, 2014 at 3:37 PM, Sturla Molden wrote: > Neal Becker wrote: > > I thought this was interesting: > > > > http://www.libgeodecomp.org/libflatarray.html > > This is mostly flawed thinking. Nowadays, CPUs are much faster than memory > access, and the gap is just increasing. In addition, CPUs have hierarchical > memory (several layers of cache). Most algorithms therefore benefit from > doing as much computation on the data as possible, before reading more data > from RAM. That means that an interleaved memory layout is usually the more > effective. This of course deepends on the algorithm, but an array of > structs is usually better than a struct of arrays. > > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From argriffi at ncsu.edu Thu Feb 13 10:35:29 2014 From: argriffi at ncsu.edu (alex) Date: Thu, 13 Feb 2014 10:35:29 -0500 Subject: [Numpy-discussion] deprecate numpy.matrix In-Reply-To: References: Message-ID: On Mon, Feb 10, 2014 at 11:16 AM, Alexander Belopolsky wrote: > > On Sun, Feb 9, 2014 at 4:59 PM, alex wrote: >> >> On the other hand, it really needs to be deprecated. > > > While numpy.matrix may have its problems, a NEP should list a better > rationale than the above to gain acceptance. > > Personally, I decided not to use numpy.matrix in production code about 10 > years ago and never looked back to that decision. I've heard however that > some of the worst inheritance warts have been fixed over the years. I also > resisted introducing inheritance in the implementation of masked arrays, > but I lost that argument. For better or worse, inheritance from ndarray is > here to stay and I would rather see numpy.matrix stay as a test-bed for > fixing inheritance issues rather than see it deprecated and have the same > issues pop up in ma or elsewhere. btw here's an issue https://github.com/scipy/scipy/issues/3324 reported on scipy github this morning, involving the interaction between numpy.matrix and masked arrays. Alex From charlesr.harris at gmail.com Thu Feb 13 11:48:55 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 13 Feb 2014 09:48:55 -0700 Subject: [Numpy-discussion] GSOC Message-ID: Thought I'd forward this to the lists in case we need to do something. Hi everyone, > > Just a friendly reminder that applications for mentoring organizations > close in about 24 hours. Please get your applications in soon, we will > not accept late applications for any reason! > > Thanks, > Carol Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jenny.stone125 at gmail.com Thu Feb 13 13:27:11 2014 From: jenny.stone125 at gmail.com (Jennifer stone) Date: Thu, 13 Feb 2014 23:57:11 +0530 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Thu, Feb 13, 2014 at 10:18 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Thought I'd forward this to the lists in case we need to do something. > > Hi everyone, >> >> Just a friendly reminder that applications for mentoring organizations >> close in about 24 hours. Please get your applications in soon, we will >> not accept late applications for any reason! >> >> Thanks, >> Carol > > > Chuck > > Guys, please register. People like me are eagerly looking forward to work with the organization this summer! -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu Feb 13 13:59:51 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 13 Feb 2014 13:59:51 -0500 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Thu, Feb 13, 2014 at 1:27 PM, Jennifer stone wrote: > > > > On Thu, Feb 13, 2014 at 10:18 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> Thought I'd forward this to the lists in case we need to do something. >> >> Hi everyone, >>> >>> Just a friendly reminder that applications for mentoring organizations >>> close in about 24 hours. Please get your applications in soon, we will >>> not accept late applications for any reason! >>> >>> Thanks, >>> Carol >> >> >> Chuck >> >> Guys, please register. People like me are eagerly looking forward to work > with the organization this summer! > I assume numpy/scipy will participate under the PSF umbrella. So this deadline is for the PSF. However, Terri, the organizer for the PSF, asked for links to Ideas pages to be able to show Google what interesting projects the PSF has. 
Josef > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Feb 13 14:44:31 2014 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 13 Feb 2014 21:44:31 +0200 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: -----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 13.02.2014 20:59, josef.pktd at gmail.com kirjoitti: [clip] > I assume numpy/scipy will participate under the PSF umbrella. So > this deadline is for the PSF. However, Terri, the organizer for the > PSF, asked for links to Ideas pages to be able to show Google what > interesting projects the PSF has. Here's a shot at that (stolen from roadmap etc): https://github.com/scipy/scipy/wiki/GSoC-project-ideas Please update as you see fit. Did we count the number of prospective mentors >= 3? Scipy is not yet listed on the PSF GSoC 2014 project list, so I think if we are going to participate, we should let them know. Best, Pauli -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.14 (GNU/Linux) iEYEARECAAYFAlL9IJoACgkQ6BQxb7O0pWALNwCgy0YwyTBxuaD+As3lOiAlp0/A 3ZcAnR4VCb9rjQ0WE/JDbfpWPxbAj76W =iYXB -----END PGP SIGNATURE----- From josef.pktd at gmail.com Thu Feb 13 15:32:27 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 13 Feb 2014 15:32:27 -0500 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Thu, Feb 13, 2014 at 2:44 PM, Pauli Virtanen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > 13.02.2014 20:59, josef.pktd at gmail.com kirjoitti: > [clip] >> I assume numpy/scipy will participate under the PSF umbrella. So >> this deadline is for the PSF. However, Terri, the organizer for the >> PSF, asked for links to Ideas pages to be able to show Google what >> interesting projects the PSF has. 
> > Here's a shot at that (stolen from roadmap etc): > > https://github.com/scipy/scipy/wiki/GSoC-project-ideas > > Please update as you see fit. > > Did we count the number of prospective mentors >= 3? Scipy is not yet > listed on the PSF GSoC 2014 project list, so I think if we are going > to participate, we should let them know. It should be possible to edit the python wiki https://wiki.python.org/moin/SummerOfCode/2014 Terri phrased it so that it means that projects should add the links. The decision which projects will participate under the PSF will be later, if it's the same pattern as in previous years. Josef > > Best, > Pauli > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.14 (GNU/Linux) > > iEYEARECAAYFAlL9IJoACgkQ6BQxb7O0pWALNwCgy0YwyTBxuaD+As3lOiAlp0/A > 3ZcAnR4VCb9rjQ0WE/JDbfpWPxbAj76W > =iYXB > -----END PGP SIGNATURE----- > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charles at crunch.io Fri Feb 14 16:51:55 2014 From: charles at crunch.io (Charles G. Waldman) Date: Fri, 14 Feb 2014 15:51:55 -0600 Subject: [Numpy-discussion] bool value of dtype is False? Message-ID: >>> d = numpy.dtype(int) >>> if d: print "OK" ... else: print "I'm surprised" I'm surprised From ndarray at mac.com Fri Feb 14 19:59:43 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Fri, 14 Feb 2014 19:59:43 -0500 Subject: [Numpy-discussion] bool value of dtype is False? In-Reply-To: References: Message-ID: On Fri, Feb 14, 2014 at 4:51 PM, Charles G. Waldman wrote: > >>> d = numpy.dtype(int) > >>> if d: print "OK" > ... 
else: print "I'm surprised" > > I'm surprised > _______________________________________________ > I think this is an artifact of regular dtypes having "length" of zero: >>> len(array(1.).dtype) 0 For record arrays dtypes you would get True: >>> len(numpy.dtype([('x', int)])) 1 >>> bool(numpy.dtype([('x', int)])) True -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sat Feb 15 16:37:15 2014 From: argriffi at ncsu.edu (alex) Date: Sat, 15 Feb 2014 16:37:15 -0500 Subject: [Numpy-discussion] svd error checking vs. speed Message-ID: Hello list, Here's another idea resurrection from numpy github comments that I've been advised could be posted here for re-discussion. The proposal would be to make np.linalg.svd more like scipy.linalg.svd with respect to input checking. The argument against the change is raw speed; if you know that you will never feed non-finite input to svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An argument for the change could be to avoid issues reported on github like crashes, hangs, spurious non-convergence exceptions, etc. from the undefined behavior of svd of non-finite input. """ [...] the following numpy code hangs until I `kill -9` it. ``` $ python runtests.py --shell $ python Python 2.7.5+ [GCC 4.8.1] on linux2 >>> import numpy as np >>> np.__version__ '1.9.0.dev-e3f0f53' >>> A = np.array([[1e3, 0], [0, 1]]) >>> B = np.array([[1e300, 0], [0, 1]]) >>> C = np.array([[1e3000, 0], [0, 1]]) >>> np.linalg.svd(A) (array([[ 1., 0.], [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], [ 0., 1.]])) >>> np.linalg.svd(B) (array([[ 1., 0.], [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), array([[ 1., 0.], [ 0., 1.]])) >>> np.linalg.svd(C) [hangs forever] ``` """ Alex From sebastian at sipsolutions.net Sat Feb 15 16:56:08 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sat, 15 Feb 2014 22:56:08 +0100 Subject: [Numpy-discussion] svd error checking vs. 
speed In-Reply-To: References: Message-ID: <1392501368.21195.1.camel@sebastian-t440> On Sa, 2014-02-15 at 16:37 -0500, alex wrote: > Hello list, > > Here's another idea resurrection from numpy github comments that I've > been advised could be posted here for re-discussion. > > The proposal would be to make np.linalg.svd more like scipy.linalg.svd > with respect to input checking. The argument against the change is > raw speed; if you know that you will never feed non-finite input to > svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An > argument for the change could be to avoid issues reported on github > like crashes, hangs, spurious non-convergence exceptions, etc. from > the undefined behavior of svd of non-finite input. > +1, unless this is a huge speed penalty, correctness (and decent error messages) should come first in my opinion, this is python after all. If this is a noticable speed difference, a kwarg may be an option (but would think about that some more). - Sebastian > """ > [...] the following numpy code hangs until I `kill -9` it. 
> > ``` > $ python runtests.py --shell > $ python > Python 2.7.5+ > [GCC 4.8.1] on linux2 > >>> import numpy as np > >>> np.__version__ > '1.9.0.dev-e3f0f53' > >>> A = np.array([[1e3, 0], [0, 1]]) > >>> B = np.array([[1e300, 0], [0, 1]]) > >>> C = np.array([[1e3000, 0], [0, 1]]) > >>> np.linalg.svd(A) > (array([[ 1., 0.], > [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], > [ 0., 1.]])) > >>> np.linalg.svd(B) > (array([[ 1., 0.], > [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), > array([[ 1., 0.], > [ 0., 1.]])) > >>> np.linalg.svd(C) > [hangs forever] > ``` > """ > > Alex > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sat Feb 15 17:08:31 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Feb 2014 17:08:31 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: <1392501368.21195.1.camel@sebastian-t440> References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg wrote: > On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >> Hello list, >> >> Here's another idea resurrection from numpy github comments that I've >> been advised could be posted here for re-discussion. >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >> with respect to input checking. The argument against the change is >> raw speed; if you know that you will never feed non-finite input to >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >> argument for the change could be to avoid issues reported on github >> like crashes, hangs, spurious non-convergence exceptions, etc. from >> the undefined behavior of svd of non-finite input. >> > > +1, unless this is a huge speed penalty, correctness (and decent error > messages) should come first in my opinion, this is python after all. 
If > this is a noticable speed difference, a kwarg may be an option (but > would think about that some more). maybe -1 statsmodels is using np.linalg.pinv, which uses svd. I never heard of any crash (*), and the only time I compared with scipy I didn't like the slowdown. I didn't do any serious timings, just a few examples. (*) not converged, ... pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y) numbers ? grep: we also use scipy.linalg.pinv in some cases Josef > > - Sebastian > >> """ >> [...] the following numpy code hangs until I `kill -9` it.
speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 5:08 PM, wrote: > On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg > wrote: > > On Sa, 2014-02-15 at 16:37 -0500, alex wrote: > >> Hello list, > >> > >> Here's another idea resurrection from numpy github comments that I've > >> been advised could be posted here for re-discussion. > >> > >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd > >> with respect to input checking. The argument against the change is > >> raw speed; if you know that you will never feed non-finite input to > >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An > >> argument for the change could be to avoid issues reported on github > >> like crashes, hangs, spurious non-convergence exceptions, etc. from > >> the undefined behavior of svd of non-finite input. > >> > > > > +1, unless this is a huge speed penalty, correctness (and decent error > > messages) should come first in my opinion, this is python after all. If > > this is a noticable speed difference, a kwarg may be an option (but > > would think about that some more). > > maybe -1 > > statsmodels is using np.linalg.pinv which uses svd > I never ran heard of any crash (*), and the only time I compared with > scipy I didn't like the slowdown. > I didn't do any serious timings just a few examples. > > (*) not converged, ... > > pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y) > > numbers ? > FWIW, I see this spurious SVD did not converge warning very frequently with ARMA when there is a nan that has creeped in. I usually know where to find the problem, but I think it'd be nice if this error message was a little better. Skipper > > grep: we also use scipy.linalg.pinv in some cases > > Josef > > > > > > - Sebastian > > > >> """ > >> [...] the following numpy code hangs until I `kill -9` it. 
> >> > >> ``` > >> $ python runtests.py --shell > >> $ python > >> Python 2.7.5+ > >> [GCC 4.8.1] on linux2 > >> >>> import numpy as np > >> >>> np.__version__ > >> '1.9.0.dev-e3f0f53' > >> >>> A = np.array([[1e3, 0], [0, 1]]) > >> >>> B = np.array([[1e300, 0], [0, 1]]) > >> >>> C = np.array([[1e3000, 0], [0, 1]]) > >> >>> np.linalg.svd(A) > >> (array([[ 1., 0.], > >> [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], > >> [ 0., 1.]])) > >> >>> np.linalg.svd(B) > >> (array([[ 1., 0.], > >> [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), > >> array([[ 1., 0.], > >> [ 0., 1.]])) > >> >>> np.linalg.svd(C) > >> [hangs forever] > >> ``` > >> """ > >> > >> Alex > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sat Feb 15 17:18:02 2014 From: argriffi at ncsu.edu (alex) Date: Sat, 15 Feb 2014 17:18:02 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 5:08 PM, wrote: > On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg > wrote: >> On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >>> Hello list, >>> >>> Here's another idea resurrection from numpy github comments that I've >>> been advised could be posted here for re-discussion. >>> >>> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >>> with respect to input checking. 
The argument against the change is >>> raw speed; if you know that you will never feed non-finite input to >>> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >>> argument for the change could be to avoid issues reported on github >>> like crashes, hangs, spurious non-convergence exceptions, etc. from >>> the undefined behavior of svd of non-finite input. >>> >> >> +1, unless this is a huge speed penalty, correctness (and decent error >> messages) should come first in my opinion, this is python after all. If >> this is a noticable speed difference, a kwarg may be an option (but >> would think about that some more). > > maybe -1 > > statsmodels is using np.linalg.pinv which uses svd > I never ran heard of any crash (*), and the only time I compared with > scipy I didn't like the slowdown. > I didn't do any serious timings just a few examples. According to https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/linalg.py statsmodels pinv(A) checks isfinite(A) at least twice and also checks for finiteness of the identity matrix. Or maybe this is not the pinv that you meant. Alex From josef.pktd at gmail.com Sat Feb 15 17:35:53 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Feb 2014 17:35:53 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 5:12 PM, Skipper Seabold wrote: > On Sat, Feb 15, 2014 at 5:08 PM, wrote: >> >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg >> wrote: >> > On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >> >> Hello list, >> >> >> >> Here's another idea resurrection from numpy github comments that I've >> >> been advised could be posted here for re-discussion. >> >> >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >> >> with respect to input checking. 
The argument against the change is >> >> raw speed; if you know that you will never feed non-finite input to >> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >> >> argument for the change could be to avoid issues reported on github >> >> like crashes, hangs, spurious non-convergence exceptions, etc. from >> >> the undefined behavior of svd of non-finite input. >> >> >> > >> > +1, unless this is a huge speed penalty, correctness (and decent error >> > messages) should come first in my opinion, this is python after all. If >> > this is a noticable speed difference, a kwarg may be an option (but >> > would think about that some more). >> >> maybe -1 >> >> statsmodels is using np.linalg.pinv which uses svd >> I never ran heard of any crash (*), and the only time I compared with >> scipy I didn't like the slowdown. >> I didn't do any serious timings just a few examples. >> >> (*) not converged, ... >> >> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y) >> >> numbers ? > > > FWIW, I see this spurious SVD did not converge warning very frequently with > ARMA when there is a nan that has creeped in. I usually know where to find > the problem, but I think it'd be nice if this error message was a little > better. maybe I'm +1 While we don't see crashes, when I run Alex's example I see 13% cpu usage for a hanging process which looks very familiar to me, I see it reasonably often when I'm debugging code. I never tried to track down where it hangs. Josef > > Skipper > >> >> >> grep: we also use scipy.linalg.pinv in some cases >> >> Josef >> >> >> > >> > - Sebastian >> > >> >> """ >> >> [...] the following numpy code hangs until I `kill -9` it. 
>> >> >> >> ``` >> >> $ python runtests.py --shell >> >> $ python >> >> Python 2.7.5+ >> >> [GCC 4.8.1] on linux2 >> >> >>> import numpy as np >> >> >>> np.__version__ >> >> '1.9.0.dev-e3f0f53' >> >> >>> A = np.array([[1e3, 0], [0, 1]]) >> >> >>> B = np.array([[1e300, 0], [0, 1]]) >> >> >>> C = np.array([[1e3000, 0], [0, 1]]) >> >> >>> np.linalg.svd(A) >> >> (array([[ 1., 0.], >> >> [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], >> >> [ 0., 1.]])) >> >> >>> np.linalg.svd(B) >> >> (array([[ 1., 0.], >> >> [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), >> >> array([[ 1., 0.], >> >> [ 0., 1.]])) >> >> >>> np.linalg.svd(C) >> >> [hangs forever] >> >> ``` >> >> """ >> >> >> >> Alex >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From argriffi at ncsu.edu Sat Feb 15 17:46:54 2014 From: argriffi at ncsu.edu (alex) Date: Sat, 15 Feb 2014 17:46:54 -0500 Subject: [Numpy-discussion] svd error checking vs. 
speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 5:08 PM, wrote: > On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg > wrote: >> On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >>> Hello list, >>> >>> Here's another idea resurrection from numpy github comments that I've >>> been advised could be posted here for re-discussion. >>> >>> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >>> with respect to input checking. The argument against the change is >>> raw speed; if you know that you will never feed non-finite input to >>> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >>> argument for the change could be to avoid issues reported on github >>> like crashes, hangs, spurious non-convergence exceptions, etc. from >>> the undefined behavior of svd of non-finite input. >>> >> >> +1, unless this is a huge speed penalty, correctness (and decent error >> messages) should come first in my opinion, this is python after all. If >> this is a noticable speed difference, a kwarg may be an option (but >> would think about that some more). > > maybe -1 > > statsmodels is using np.linalg.pinv which uses svd > I never ran heard of any crash (*), and the only time I compared with > scipy I didn't like the slowdown. Although numpy.linalg.pinv uses svd, scipy.linalg.pinv uses least squares with an rhs identity matrix. The scipy.linalg function that uses svd for pseudoinverse is pinv2. These connect to different LAPACK functions. Also I noticed that these scipy functions all have redundant finiteness checking which I've fixed in a PR. Alex From josef.pktd at gmail.com Sat Feb 15 18:02:22 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 15 Feb 2014 18:02:22 -0500 Subject: [Numpy-discussion] svd error checking vs. 
speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 5:18 PM, alex wrote: > On Sat, Feb 15, 2014 at 5:08 PM, wrote: >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg >> wrote: >>> On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >>>> Hello list, >>>> >>>> Here's another idea resurrection from numpy github comments that I've >>>> been advised could be posted here for re-discussion. >>>> >>>> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >>>> with respect to input checking. The argument against the change is >>>> raw speed; if you know that you will never feed non-finite input to >>>> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >>>> argument for the change could be to avoid issues reported on github >>>> like crashes, hangs, spurious non-convergence exceptions, etc. from >>>> the undefined behavior of svd of non-finite input. >>>> >>> >>> +1, unless this is a huge speed penalty, correctness (and decent error >>> messages) should come first in my opinion, this is python after all. If >>> this is a noticable speed difference, a kwarg may be an option (but >>> would think about that some more). >> >> maybe -1 >> >> statsmodels is using np.linalg.pinv which uses svd >> I never ran heard of any crash (*), and the only time I compared with >> scipy I didn't like the slowdown. >> I didn't do any serious timings just a few examples. > > According to https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/linalg.py > statsmodels pinv(A) checks isfinite(A) at least twice and also checks > for finiteness of the identity matrix. Or maybe this is not the pinv > that you meant. 
that's dead code copy of np.pinv used in linear regression https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/tools.py#L348 (it's a recent change to streamline some of the linalg in regression, and master only) outside of linear regression we still use almost only np.linalg.pinv directly Josef > > Alex > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian at sipsolutions.net Sat Feb 15 18:06:49 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 16 Feb 2014 00:06:49 +0100 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: <1392505609.21195.18.camel@sebastian-t440> On Sa, 2014-02-15 at 17:35 -0500, josef.pktd at gmail.com wrote: > On Sat, Feb 15, 2014 at 5:12 PM, Skipper Seabold wrote: > > On Sat, Feb 15, 2014 at 5:08 PM, wrote: > >> > >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg > >> wrote: > >> > On Sa, 2014-02-15 at 16:37 -0500, alex wrote: > >> >> Hello list, > >> >> > >> >> Here's another idea resurrection from numpy github comments that I've > >> >> been advised could be posted here for re-discussion. > >> >> > >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd > >> >> with respect to input checking. The argument against the change is > >> >> raw speed; if you know that you will never feed non-finite input to > >> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An > >> >> argument for the change could be to avoid issues reported on github > >> >> like crashes, hangs, spurious non-convergence exceptions, etc. from > >> >> the undefined behavior of svd of non-finite input. > >> >> > >> > > >> > +1, unless this is a huge speed penalty, correctness (and decent error > >> > messages) should come first in my opinion, this is python after all. 
If > >> > this is a noticable speed difference, a kwarg may be an option (but > >> > would think about that some more). > >> > >> maybe -1 > >> > >> statsmodels is using np.linalg.pinv which uses svd > >> I never ran heard of any crash (*), and the only time I compared with > >> scipy I didn't like the slowdown. > >> I didn't do any serious timings just a few examples. > >> > >> (*) not converged, ... > >> > >> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y) > >> > >> numbers ? > > > > > > FWIW, I see this spurious SVD did not converge warning very frequently with > > ARMA when there is a nan that has creeped in. I usually know where to find > > the problem, but I think it'd be nice if this error message was a little > > better. > > maybe I'm +1 > > While we don't see crashes, when I run Alex's example I see 13% cpu > usage for a hanging process which looks very familiar to me, I see it > reasonably often when I'm debugging code. > > I never tried to track down where it hangs. > If this should not cause big hangs/crashes (just "not converged" after a long time or so), then maybe we should just check afterwards to give the user a better idea of where to look for the error. I think I remember people running into this and being confused (but without crash/hang). - Sebsatian > Josef > > > > > Skipper > > > >> > >> > >> grep: we also use scipy.linalg.pinv in some cases > >> > >> Josef > >> > >> > >> > > >> > - Sebastian > >> > > >> >> """ > >> >> [...] the following numpy code hangs until I `kill -9` it. 
> >> >> > >> >> ``` > >> >> $ python runtests.py --shell > >> >> $ python > >> >> Python 2.7.5+ > >> >> [GCC 4.8.1] on linux2 > >> >> >>> import numpy as np > >> >> >>> np.__version__ > >> >> '1.9.0.dev-e3f0f53' > >> >> >>> A = np.array([[1e3, 0], [0, 1]]) > >> >> >>> B = np.array([[1e300, 0], [0, 1]]) > >> >> >>> C = np.array([[1e3000, 0], [0, 1]]) > >> >> >>> np.linalg.svd(A) > >> >> (array([[ 1., 0.], > >> >> [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], > >> >> [ 0., 1.]])) > >> >> >>> np.linalg.svd(B) > >> >> (array([[ 1., 0.], > >> >> [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), > >> >> array([[ 1., 0.], > >> >> [ 0., 1.]])) > >> >> >>> np.linalg.svd(C) > >> >> [hangs forever] > >> >> ``` > >> >> """ > >> >> > >> >> Alex > >> >> _______________________________________________ > >> >> NumPy-Discussion mailing list > >> >> NumPy-Discussion at scipy.org > >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> >> > >> > > >> > > >> > _______________________________________________ > >> > NumPy-Discussion mailing list > >> > NumPy-Discussion at scipy.org > >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla.molden at gmail.com Sat Feb 15 18:13:27 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 15 Feb 2014 23:13:27 +0000 (UTC) Subject: [Numpy-discussion] svd error checking vs. 
speed References: <1392501368.21195.1.camel@sebastian-t440> Message-ID: <1110129906414198541.505916sturla.molden-gmail.com@news.gmane.org> wrote: > copy of np.pinv used in linear regression > https://github.com/statsmodels/statsmodels/blob/master/statsmodels/tools/tools.py#L348 > (it's a recent change to streamline some of the linalg in regression, > and master only) Why not call lapack routine DGELSS instead? It does exactly this, only faster. (And DGELS for fitting with QR?) Sturla From argriffi at ncsu.edu Sat Feb 15 18:20:38 2014 From: argriffi at ncsu.edu (alex) Date: Sat, 15 Feb 2014 18:20:38 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: <1392505609.21195.18.camel@sebastian-t440> References: <1392501368.21195.1.camel@sebastian-t440> <1392505609.21195.18.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 6:06 PM, Sebastian Berg wrote: > On Sa, 2014-02-15 at 17:35 -0500, josef.pktd at gmail.com wrote: >> On Sat, Feb 15, 2014 at 5:12 PM, Skipper Seabold wrote: >> > On Sat, Feb 15, 2014 at 5:08 PM, wrote: >> >> >> >> On Sat, Feb 15, 2014 at 4:56 PM, Sebastian Berg >> >> wrote: >> >> > On Sa, 2014-02-15 at 16:37 -0500, alex wrote: >> >> >> Hello list, >> >> >> >> >> >> Here's another idea resurrection from numpy github comments that I've >> >> >> been advised could be posted here for re-discussion. >> >> >> >> >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >> >> >> with respect to input checking. The argument against the change is >> >> >> raw speed; if you know that you will never feed non-finite input to >> >> >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >> >> >> argument for the change could be to avoid issues reported on github >> >> >> like crashes, hangs, spurious non-convergence exceptions, etc. from >> >> >> the undefined behavior of svd of non-finite input.
>> >> >> >> >> > >> >> > +1, unless this is a huge speed penalty, correctness (and decent error >> >> > messages) should come first in my opinion, this is python after all. If >> >> > this is a noticable speed difference, a kwarg may be an option (but >> >> > would think about that some more). >> >> >> >> maybe -1 >> >> >> >> statsmodels is using np.linalg.pinv which uses svd >> >> I never ran heard of any crash (*), and the only time I compared with >> >> scipy I didn't like the slowdown. >> >> I didn't do any serious timings just a few examples. >> >> >> >> (*) not converged, ... >> >> >> >> pinv(x.T).dot(x) -> pinv(x.T, please_don_t_check=True).dot(y) >> >> >> >> numbers ? >> > >> > >> > FWIW, I see this spurious SVD did not converge warning very frequently with >> > ARMA when there is a nan that has creeped in. I usually know where to find >> > the problem, but I think it'd be nice if this error message was a little >> > better. >> >> maybe I'm +1 >> >> While we don't see crashes, when I run Alex's example I see 13% cpu >> usage for a hanging process which looks very familiar to me, I see it >> reasonably often when I'm debugging code. >> >> I never tried to track down where it hangs. >> > > If this should not cause big hangs/crashes (just "not converged" after a > long time or so), then maybe we should just check afterwards to give the > user a better idea of where to look for the error. I think I remember > people running into this and being confused (but without crash/hang). I'm not sure exactly what you mean by this. You are suggesting that if the svd fails with some kind of exception (possibly poorly or misleadingly worded) then it could be cleaned-up after the fact by checking the input, and that this would not incur the speed penalty because no check will be done if the svd succeeds? This would not work on my system because that svd call really does hang, as in some non-ctrl-c-interruptable spin lock inside fortran code or something. 
I think the behavior is undefined and it can crash although I do not personally have an example of this. These modes of failure cannot be recovered from as easily as recovering from an exception. Alex From sebastian at sipsolutions.net Sat Feb 15 18:34:18 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 16 Feb 2014 00:34:18 +0100 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: <1392501368.21195.1.camel@sebastian-t440> <1392505609.21195.18.camel@sebastian-t440> Message-ID: <1392507258.21195.21.camel@sebastian-t440> On Sa, 2014-02-15 at 18:20 -0500, alex wrote: > > I'm not sure exactly what you mean by this. You are suggesting that > if the svd fails with some kind of exception (possibly poorly or > misleadingly worded) then it could be cleaned-up after the fact by > checking the input, and that this would not incur the speed penalty > because no check will be done if the svd succeeds? This would not > work on my system because that svd call really does hang, as in some > non-ctrl-c-interruptable spin lock inside fortran code or something. > I think the behavior is undefined and it can crash although I do not > personally have an example of this. These modes of failure cannot be > recovered from as easily as recovering from an exception. > Yeah, I meant that. But it has a big "if", that the failure is basically a bug in the library you happen to be using and extremely uncommon. - Sebastian > Alex > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From argriffi at ncsu.edu Sat Feb 15 18:41:05 2014 From: argriffi at ncsu.edu (alex) Date: Sat, 15 Feb 2014 18:41:05 -0500 Subject: [Numpy-discussion] svd error checking vs. 
speed In-Reply-To: <1392507258.21195.21.camel@sebastian-t440> References: <1392501368.21195.1.camel@sebastian-t440> <1392505609.21195.18.camel@sebastian-t440> <1392507258.21195.21.camel@sebastian-t440> Message-ID: On Sat, Feb 15, 2014 at 6:34 PM, Sebastian Berg wrote: > On Sa, 2014-02-15 at 18:20 -0500, alex wrote: > >> >> I'm not sure exactly what you mean by this. You are suggesting that >> if the svd fails with some kind of exception (possibly poorly or >> misleadingly worded) then it could be cleaned-up after the fact by >> checking the input, and that this would not incur the speed penalty >> because no check will be done if the svd succeeds? This would not >> work on my system because that svd call really does hang, as in some >> non-ctrl-c-interruptable spin lock inside fortran code or something. >> I think the behavior is undefined and it can crash although I do not >> personally have an example of this. These modes of failure cannot be >> recovered from as easily as recovering from an exception. >> > > Yeah, I meant that. But it has a big "if", that the failure is basically > a bug in the library you happen to be using and extremely uncommon. On my system the lapack bundled with numpy hangs. From dfreese at stanford.edu Sun Feb 16 12:13:40 2014 From: dfreese at stanford.edu (David Freese) Date: Sun, 16 Feb 2014 09:13:40 -0800 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH Message-ID: Hi everyone, I put together a np.nanmedian function to extend np.median to handle nans. Could someone review this code and give me some feedback on it before I submit a pull request for it? https://github.com/dfreese/numpy/compare/master...feature;nanmedian Thanks, David -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoogendoorn.eelco at gmail.com Sun Feb 16 12:25:08 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 16 Feb 2014 18:25:08 +0100 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH In-Reply-To: References: Message-ID: hi david, I haven't run the code; but the _replace_nan(0) call worries me; especially considering that the unit tests seem to deal with positive numbers exclusively. Have you tested with mixed positive/negative inputs? On Sun, Feb 16, 2014 at 6:13 PM, David Freese wrote: > Hi everyone, > > I put together a np.nanmedian function to extend np.median to handle nans. > Could someone review this code and give me some feedback on it before I > submit a pull request for it? > > https://github.com/dfreese/numpy/compare/master...feature;nanmedian > > Thanks, > David > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sun Feb 16 12:52:31 2014 From: argriffi at ncsu.edu (alex) Date: Sun, 16 Feb 2014 12:52:31 -0500 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH In-Reply-To: References: Message-ID: On Sun, Feb 16, 2014 at 12:13 PM, David Freese wrote: > Hi everyone, > > I put together a np.nanmedian function to extend np.median to handle nans. > Could someone review this code and give me some feedback on it before I > submit a pull request for it? It looks good to submit as a pull request but probably will need some changes like the mixed sign thing already mentioned, and I see mean vs. median copy-paste remnants in the docstring.
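For readers following the review thread, the masking approach under discussion can be sketched roughly as follows. This is a simplified stand-in, not the code from the linked branch; the helper name `nanmedian_sketch` is invented for illustration and covers only the 1-d case. The point is that a fill value written into the NaN slots (such as the 0s from a `_replace_nan`-style helper) is harmless as long as only the mask, never the filled entries, enters the computation:

```python
import numpy as np

def nanmedian_sketch(a):
    # Simplified stand-in for the proposed nanmedian: mark the NaN
    # positions with a boolean mask and take the median of the rest.
    a = np.asarray(a, dtype=float)
    mask = np.isnan(a)
    if mask.all():
        return np.nan
    return np.median(a[~mask])
```

Mixed-sign input (the concern raised above) behaves as expected under this scheme, since the masked-out slots are never compared: `nanmedian_sketch([-5.0, np.nan, -1.0])` gives `-3.0`.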
From dfreese at stanford.edu Sun Feb 16 13:01:24 2014 From: dfreese at stanford.edu (David Freese) Date: Sun, 16 Feb 2014 10:01:24 -0800 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH In-Reply-To: References: Message-ID: the 0s put into the array copy "arr" are not used in computation. The _replace_nan call is used primarily to generate a mask of the NaNs and make sure it passes the mutation test. I updated the unit tests to reflect negative values, which works. (and the documentation should be cleaned up now) https://github.com/dfreese/numpy/compare/master...feature;nanmedian On Sun, Feb 16, 2014 at 9:52 AM, alex wrote: > On Sun, Feb 16, 2014 at 12:13 PM, David Freese > wrote: > > Hi everyone, > > > > I put together a np.nanmedian function to extend np.median to handle > nans. > > Could someone review this code and give me some feedback on it before I > > submit a pull request for it? > > It looks good to submit as a pull request but probably will need some > changes like the mixed sign thing already mentioned, and I see mean > vs. median copypaste remnants in the docstring. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From argriffi at ncsu.edu Sun Feb 16 13:34:03 2014 From: argriffi at ncsu.edu (alex) Date: Sun, 16 Feb 2014 13:34:03 -0500 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH In-Reply-To: References: Message-ID: > It doesn't deal with numpy.matrix in the same way as numpy.nanmean. never mind about this -- it looks like np.median is currently broken for np.matrix. 
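The subclass behaviour alex describes traces back to how the input is coerced: `np.asarray` always returns a base ndarray, silently dropping the `matrix` subclass, while `np.asanyarray` passes subclasses through. A toy illustration of that general pitfall (not the nanfunctions code itself):

```python
import numpy as np

m = np.matrix([[1.0, 2.0, 3.0]])
# asarray coerces to the base ndarray class, discarding the subclass;
# asanyarray preserves whatever ndarray subclass it is given.
a = np.asarray(m)
b = np.asanyarray(m)
print(type(a).__name__, type(b).__name__)  # prints: ndarray matrix
```

Which of the two a nan-aware reduction uses internally determines whether a `matrix` input comes back as a scalar, a 1x1 `matrix`, or a base array.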
From argriffi at ncsu.edu Sun Feb 16 13:11:12 2014 From: argriffi at ncsu.edu (alex) Date: Sun, 16 Feb 2014 13:11:12 -0500 Subject: [Numpy-discussion] Requesting Code Review of nanmedian ENH In-Reply-To: References: Message-ID: On Sun, Feb 16, 2014 at 1:01 PM, David Freese wrote: > the 0s put into the array copy "arr" are not used in computation. The > _replace_nan call is used primarily to generate a mask of the NaNs and make > sure it passes the mutation test. I updated the unit tests to reflect > negative values, which works. (and the documentation should be cleaned up > now) > > https://github.com/dfreese/numpy/compare/master...feature;nanmedian It doesn't deal with numpy.matrix in the same way as numpy.nanmean. For example """ >>> import numpy as np >>> m = np.matrix([[np.nan, np.nan, 1]]) >>> np.isscalar(np.nanmean(m)) True >>> np.isscalar(np.nanmedian(m)) False """ Some of the nanfunctions.py code has comments regarding carefulness in dealing with subclasses of numpy.ndarray (like numpy.matrix), and some of the nanfunctions tests include tests for this kind of behavior. From charlesr.harris at gmail.com Sun Feb 16 14:05:12 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Feb 2014 12:05:12 -0700 Subject: [Numpy-discussion] Removal of doc/cython and doc/pyrex Message-ID: Apropos #2384 , Ralph has suggested removing the doc/pyrex directory as irrelevant at this point. Along the same lines, we could remove doc/cython as cython now has good numpy support. There is also doc/swig, which contains numpy.i, and which we probably need to retain, but maybe put somewhere else. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jenny.stone125 at gmail.com Sun Feb 16 16:34:35 2014 From: jenny.stone125 at gmail.com (Jennifer stone) Date: Mon, 17 Feb 2014 03:04:35 +0530 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: On Wed, Feb 12, 2014 at 2:11 AM, Pauli Virtanen wrote: > -----BEGIN PGP SIGNED MESSAGE----- > Hash: SHA1 > > Hi > > It's not so often someone wants to work on scipy.special, so you'd be > welcome to improve it :) > > That's great! Thanks a lot for your guidance. . > > - > > Spherical harmonics might be a reasonable part of a GSoC proposal. > However, note that there exists also a *second* Legendre polynomial > function `lpmv`, which doesn't store the values of the previous N > functions. There's one numerical problem in the current way of > evaluation via ~Pmn(cos(theta)), which is that this approach seems to > lose relatively much precision at large orders for certain values of > theta. I don't recall now exactly how imprecise it becomes at large > orders, but it may be necessary to check. > > I checked lpmv and lpmn. As you rightly said, lpmv avoids the storage of values small N's by using recursion. Why can't we first check if m and n are n positive integers and pass them to lpmv itself rather than lpmn? lpmn does give us the derivatives too, but sph_harm has no need for that, and of course all the values for Adding new special functions also sounds like an useful project. Here, > it helps if they are something that you expect you will need later on :) > I am unable to think beyond ellipsoidal functions. As for their use, we as students used them in problems of thermal equilibrium in ellipsoidal bodies, and some scattering cases. Though I have no idea if its use in general is quite prominent. And cylindrical harmonic function would be useful but I feel it's quite straight forward to implement (correct me if I am wrong) . 
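As a numerical aside on the `lpmv` route discussed earlier in this thread: for small integer order and degree it can be sanity-checked against a textbook closed form. The identity used below, P_3^2(x) = 15 x (1 - x^2), is standard; this is just a quick check assuming SciPy is available, not part of any proposal:

```python
import numpy as np
from scipy.special import lpmv

# lpmv(m, v, x) evaluates the associated Legendre function P_v^m(x)
# without tabulating lower orders; compare it against the closed form
# P_3^2(x) = 15 * x * (1 - x**2) on a grid in [-1, 1].
x = np.linspace(-1.0, 1.0, 101)
assert np.allclose(lpmv(2, 3, x), 15 * x * (1 - x**2))
```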
> > > There's also the case that several of the functions in Scipy have only > implementations for real-valued inputs, although the functions would > be defined on the whole complex plane. A list of the situation is here: > > https://github.com/scipy/scipy/blob/master/scipy/special > /generate_ufuncs.py#L85 > > Lowercase d correspond to real-valued implementations, uppercase D to > complex-valued. I'm not at the moment completely sure which would have > the highest priority --- whether you need this or not really depends > on the application. > > > If you want additional ideas about possible things to fix in > scipy.special, take a look at this file: > > https://github.com/scipy/scipy/blob/master/scipy/special/tests > /test_mpmath.py#L648 > > The entries marked @knownfailure* have some undiagnosed issues in the > implementation, which might be useful to look into. However: most of > these have to do with corner cases in hypergeometric functions. Trying > to address those is likely a risky GSoC topic, as the multi-argument > hyp* functions are challenging to evaluate in floating point. (mpmath > and Mathematica can evaluate them in most parameter regimes, but AFAIK > both require arbitrary-precision methods for this.) > Yeah, many of the known failures seem to revolve around hyp2f1. An unexplained inclination towards hypergeometric functions really tempts me to plunge into this. If it's too risky, I can work on this after the summers, as I would have gained quite a lot of experience with the code here. So I think there would be a large number of possible things to do > here, and help would be appreciated. > > - -- > Pauli Virtanen > -----BEGIN PGP SIGNATURE----- > Version: GnuPG v1.4.14 (GNU/Linux) > > iEYEARECAAYFAlL6iwAACgkQ6BQxb7O0pWBfOgCfYHAB12N4FWDmrqx8/ORTBRps > pXYAoL3ufAiShe+0qTEGfEvrmDgr1X0p > =kAwF > -----END PGP SIGNATURE----- > Regard Jennifer -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ralf.gommers at gmail.com Sun Feb 16 17:08:42 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 16 Feb 2014 23:08:42 +0100 Subject: [Numpy-discussion] ANN: Scipy 0.13.3 release In-Reply-To: References: Message-ID: Binaries are now available on SourceForge. Ralf On Tue, Feb 4, 2014 at 8:16 AM, Ralf Gommers wrote: > Hi, > > I'm happy to announce the availability of the scipy 0.13.3 release. This > is a bugfix only release; it contains fixes for regressions in ndimage and > weave. > > Source tarballs can be found at > https://sourceforge.net/projects/scipy/files/scipy/0.13.3/ and on PyPi. > Release notes copied below, binaries will follow later (the regular build > machine is not available for the next two weeks). > > Cheers, > Ralf > > > > ========================== > SciPy 0.13.3 Release Notes > ========================== > > SciPy 0.13.3 is a bug-fix release with no new features compared to 0.13.2. > Both the weave and the ndimage.label bugs were severe regressions in > 0.13.0, > hence this release. > > Issues fixed > ------------ > - 3148: fix a memory leak in ``ndimage.label``. > - 3216: fix weave issue with too long file names for MSVC. > > Other changes > ------------- > - Update Sphinx theme used for html docs so ``>>>`` in examples can be > toggled. > > Checksums > ========= > 0547c1f8e8afad4009cc9b5ef17a2d4d release/installers/scipy-0.13.3.tar.gz > 20ff3a867cc5925ef1d654aed2ff7e88 release/installers/scipy-0.13.3.zip > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 16 17:43:24 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Feb 2014 17:43:24 -0500 Subject: [Numpy-discussion] argsort speed Message-ID: currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? 
I'm doing semi-systematic timing for a stats(models) algorithm. Josef From hoogendoorn.eelco at gmail.com Sun Feb 16 17:50:04 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Sun, 16 Feb 2014 23:50:04 +0100 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: Message-ID: <783F8977825F44CEA82527954067F07D@EelcoRetina> My guess; First of all, you are actually manipulating twice as much data as opposed to an inplace sort. Moreover, an inplace sort gains locality as it is being sorted, whereas the argsort is continuously making completely random memory accesses. -----Original Message----- From: josef.pktd at gmail.com Sent: Sunday, February 16, 2014 11:43 PM To: Discussion of Numerical Python Subject: [Numpy-discussion] argsort speed currently using numpy 1.6.1 What's the fastest argsort for a 1d array with around 28 Million elements, roughly uniformly distributed, random order? Is there a reason that np.argsort is almost 3 times slower than np.sort? I'm doing semi-systematic timing for a stats(models) algorithm. Josef _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From davidmenhur at gmail.com Sun Feb 16 18:12:27 2014 From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=) Date: Mon, 17 Feb 2014 00:12:27 +0100 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: Message-ID: On 16 February 2014 23:43, wrote: > What's the fastest argsort for a 1d array with around 28 Million > elements, roughly uniformly distributed, random order? 
>

On numpy latest version:

for kind in ['quicksort', 'mergesort', 'heapsort']:
    print kind
    %timeit np.sort(data, kind=kind)
    %timeit np.argsort(data, kind=kind)


quicksort
1 loops, best of 3: 3.55 s per loop
1 loops, best of 3: 10.3 s per loop
mergesort
1 loops, best of 3: 4.84 s per loop
1 loops, best of 3: 9.49 s per loop
heapsort
1 loops, best of 3: 12.1 s per loop
1 loops, best of 3: 39.3 s per loop


It looks like quicksort is quicker at sorting, but mergesort is marginally
faster at sorting args. The difference is slim, but upon repetition, it
remains significant.

Why is that? Probably part of the reason is what Eelco said, and part is
that in sort the comparisons are done accessing the array elements directly,
but in argsort you have to index the array, introducing some overhead.

I seem unable to find the code for ndarray.sort, so I can't check. I have
tried to grep it trying all possible combinations of "def ndarray",
"self.sort", etc. Where is it?


/David.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 
From davidmenhur at gmail.com Sun Feb 16 18:14:44 2014
From: davidmenhur at gmail.com (=?UTF-8?B?RGHPgGlk?=)
Date: Mon, 17 Feb 2014 00:14:44 +0100
Subject: [Numpy-discussion] argsort speed
In-Reply-To: 
References: 
Message-ID: 

On 17 February 2014 00:12, Daπid wrote:
> I seem unable to find the code for ndarray.sort, so I can't check. I have
> tried to grep it trying all possible combinations of "def ndarray",
> "self.sort", etc. Where is it?

Nevermind, it is in core/src/multiarray/methods.c
-------------- next part --------------
An HTML attachment was scrubbed...
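For anyone wanting to reproduce the comparison above outside IPython, here is a self-contained sketch using the stdlib timeit module instead of the %timeit magic. The array is deliberately much smaller than the 28-million-element case under discussion, purely to keep the runtime short:

```python
import timeit

import numpy as np

# Much smaller than the 28M-element array in the thread; sized for a quick run.
data = np.random.RandomState(0).rand(200_000)

for kind in ['quicksort', 'mergesort', 'heapsort']:
    # Time np.sort and np.argsort on identical input; take the best of 3 runs.
    t_sort = min(timeit.repeat(lambda: np.sort(data, kind=kind),
                               number=3, repeat=3))
    t_argsort = min(timeit.repeat(lambda: np.argsort(data, kind=kind),
                                  number=3, repeat=3))
    print("%-10s sort: %.4f s   argsort: %.4f s" % (kind, t_sort, t_argsort))
```

On typical hardware argsort comes out slower for every kind, consistent with the locality and indexing-overhead explanations given in the thread.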
URL: 
From josef.pktd at gmail.com Sun Feb 16 18:15:45 2014
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Sun, 16 Feb 2014 18:15:45 -0500
Subject: [Numpy-discussion] argsort speed
In-Reply-To: <783F8977825F44CEA82527954067F07D@EelcoRetina>
References: <783F8977825F44CEA82527954067F07D@EelcoRetina>
Message-ID: 

On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn
 wrote:
>
> My guess;
>
> First of all, you are actually manipulating twice as much data as opposed to
> an inplace sort.
>
> Moreover, an inplace sort gains locality as it is being sorted, whereas the
> argsort is continuously making completely random memory accesses.
>
>
> -----Original Message-----
> From: josef.pktd at gmail.com
> Sent: Sunday, February 16, 2014 11:43 PM
> To: Discussion of Numerical Python
> Subject: [Numpy-discussion] argsort speed
>
> currently using numpy 1.6.1
>
> What's the fastest argsort for a 1d array with around 28 Million
> elements, roughly uniformly distributed, random order?
>
> Is there a reason that np.argsort is almost 3 times slower than np.sort?
>
> I'm doing semi-systematic timing for a stats(models) algorithm.

I was using np.sort, inplace sort is only a little bit faster

It looks like sorting first, and then argsorting is faster than argsort alone

pvals.sort()
sortind = np.argsort(pvals)

replacing the inplace sort in the above reduces speed only a bit

--------------
import time
use_master = True
if use_master:
    import sys
    sys.path.insert(0,
        r"E:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages")

import numpy as np
print "np.__version__ =", np.__version__

n = 5300
pvals = np.random.rand(n**2)

t0 = time.time()

p = np.sort(pvals)
t1 = time.time()

sortind = np.argsort(pvals)
t2 = time.time()

pvals.sort()
sortind = np.argsort(pvals)
t3 = time.time()

print t1 - t0, t2 - t1, t3 - t2
print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1.
/ (t1 - t0) ------------ np.__version__ = 1.9.0.dev-2868dc4 3.91900014877 9.5569999218 4.92900013924 2.43863219163 1.2577187936 Josef > > Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Sun Feb 16 18:18:03 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Feb 2014 18:18:03 -0500 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <783F8977825F44CEA82527954067F07D@EelcoRetina> Message-ID: On Sun, Feb 16, 2014 at 6:15 PM, wrote: > On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn > wrote: >> >> My guess; >> >> First of all, you are actually manipulating twice as much data as opposed to >> an inplace sort. >> >> Moreover, an inplace sort gains locality as it is being sorted, whereas the >> argsort is continuously making completely random memory accesses. >> >> >> -----Original Message----- >> From: josef.pktd at gmail.com >> Sent: Sunday, February 16, 2014 11:43 PM >> To: Discussion of Numerical Python >> Subject: [Numpy-discussion] argsort speed >> >> currently using numpy 1.6.1 >> >> What's the fastest argsort for a 1d array with around 28 Million >> elements, roughly uniformly distributed, random order? >> >> Is there a reason that np.argsort is almost 3 times slower than np.sort? >> >> I'm doing semi-systematic timing for a stats(models) algorithm. > > I was using np.sort, inplace sort is only a little bit faster > > It looks like sorting first, and then argsorting is faster than argsort alon > > pvals.sort() > sortind = np.argsort(pvals) Ok, that was useless, that won't be anything I want. 
Josef > > replacing the inplace sort in the above reduces speed only a bit > > -------------- > import time > use_master = True > if use_master: > import sys > sys.path.insert(0, > r"E:\Josef\!reps\numpy\dist\Programs\Python27\Lib\site-packages") > > import numpy as np > print "np.__version__ =", np.__version__ > > n = 5300 > pvals = np.random.rand(n**2) > > t0 = time.time() > > p = np.sort(pvals) > t1 = time.time() > > sortind = np.argsort(pvals) > t2 = time.time() > > pvals.sort() > sortind = np.argsort(pvals) > t3 = time.time() > > print t1 - t0, t2 - t1, t3 - t2 > print (t2 - t1) * 1. / (t1 - t0), (t3 - t2) * 1. / (t1 - t0) > ------------ > > np.__version__ = 1.9.0.dev-2868dc4 > 3.91900014877 9.5569999218 4.92900013924 > 2.43863219163 1.2577187936 > > Josef > > >> >> Josef >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Sun Feb 16 18:25:18 2014 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 17 Feb 2014 01:25:18 +0200 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: 16.02.2014 23:34, Jennifer stone kirjoitti: [clip] > Yeah, many of the known failures seem to revolve around hyp2f1. An > unexplained inclination towards hypergeometric functions really > tempts me to plunge into this. If it's too risky, I can work on > this after the summers, as I would have gained quite a lot of > experience with the code here. 
If you are interested in the hypergeometric numerical evaluation, it's probably a good idea to take a look at this recent master's thesis written on the problem: http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf This may give some systematic overview on the range of methods available. (Note that for copyright reasons, it's not a good idea to look closely at the source codes linked from that thesis, as they are not available under a compatible license.) It may well be that the best approach for evaluating these functions, if accuracy in the whole parameter range is wanted, in the end turns out to require arbitrary-precision computations. In that case, it would be a very good idea to look at how the problem is approached in mpmath. There are existing multiprecision packages written in C, and using one of them in scipy.special could bring better evaluation performance even if the algorithm is the same. -- Pauli Virtanen From josef.pktd at gmail.com Sun Feb 16 19:08:29 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Feb 2014 19:08:29 -0500 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: Message-ID: On Sun, Feb 16, 2014 at 6:12 PM, Da?id wrote: > On 16 February 2014 23:43, wrote: >> >> What's the fastest argsort for a 1d array with around 28 Million >> elements, roughly uniformly distributed, random order? > > > On numpy latest version: > > for kind in ['quicksort', 'mergesort', 'heapsort']: > print kind > %timeit np.sort(data, kind=kind) > %timeit np.argsort(data, kind=kind) > > > quicksort > 1 loops, best of 3: 3.55 s per loop > 1 loops, best of 3: 10.3 s per loop > mergesort > 1 loops, best of 3: 4.84 s per loop > 1 loops, best of 3: 9.49 s per loop > heapsort > 1 loops, best of 3: 12.1 s per loop > 1 loops, best of 3: 39.3 s per loop > > > It looks quicksort is quicker sorting, but mergesort is marginally faster > sorting args. The diference is slim, but upon repetition, it remains > significant. 
> > Why is that? Probably part of the reason is what Eelco said, and part is > that in sort comparison are done accessing the array elements directly, but > in argsort you have to index the array, introducing some overhead. Thanks, both. I also gain a second with mergesort. matlab would be nicer in my case, it returns both. I still need to use the argsort to index into the array to also get the sorted array. Josef > > I seem unable to find the code for ndarray.sort, so I can't check. I have > tried to grep it tring all possible combinations of "def ndarray", > "self.sort", etc. Where is it? > > > /David. > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Feb 16 19:13:29 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Feb 2014 17:13:29 -0700 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <783F8977825F44CEA82527954067F07D@EelcoRetina> Message-ID: On Sun, Feb 16, 2014 at 4:18 PM, wrote: > On Sun, Feb 16, 2014 at 6:15 PM, wrote: > > On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn > > wrote: > >> > >> My guess; > >> > >> First of all, you are actually manipulating twice as much data as > opposed to > >> an inplace sort. > >> > >> Moreover, an inplace sort gains locality as it is being sorted, whereas > the > >> argsort is continuously making completely random memory accesses. > >> > >> > >> -----Original Message----- > >> From: josef.pktd at gmail.com > >> Sent: Sunday, February 16, 2014 11:43 PM > >> To: Discussion of Numerical Python > >> Subject: [Numpy-discussion] argsort speed > >> > >> currently using numpy 1.6.1 > >> > >> What's the fastest argsort for a 1d array with around 28 Million > >> elements, roughly uniformly distributed, random order? > >> > >> Is there a reason that np.argsort is almost 3 times slower than np.sort? 
> >> > >> I'm doing semi-systematic timing for a stats(models) algorithm. > > > > I was using np.sort, inplace sort is only a little bit faster > > > > It looks like sorting first, and then argsorting is faster than argsort > alon > > > > pvals.sort() > > sortind = np.argsort(pvals) > > Ok, that was useless, that won't be anything I want. > > I think locality is the most important thing. The argsort routines used to move both the indexes and the array to argsort, the new ones only move the indexes. It is a tradeoff, twice as many moves vs locality. It's probably possible to invent an algorithm that mixes the two. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Sun Feb 16 19:44:07 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 16 Feb 2014 19:44:07 -0500 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <783F8977825F44CEA82527954067F07D@EelcoRetina> Message-ID: On Sun, Feb 16, 2014 at 7:13 PM, Charles R Harris wrote: > > > > On Sun, Feb 16, 2014 at 4:18 PM, wrote: >> >> On Sun, Feb 16, 2014 at 6:15 PM, wrote: >> > On Sun, Feb 16, 2014 at 5:50 PM, Eelco Hoogendoorn >> > wrote: >> >> >> >> My guess; >> >> >> >> First of all, you are actually manipulating twice as much data as >> >> opposed to >> >> an inplace sort. >> >> >> >> Moreover, an inplace sort gains locality as it is being sorted, whereas >> >> the >> >> argsort is continuously making completely random memory accesses. >> >> >> >> >> >> -----Original Message----- >> >> From: josef.pktd at gmail.com >> >> Sent: Sunday, February 16, 2014 11:43 PM >> >> To: Discussion of Numerical Python >> >> Subject: [Numpy-discussion] argsort speed >> >> >> >> currently using numpy 1.6.1 >> >> >> >> What's the fastest argsort for a 1d array with around 28 Million >> >> elements, roughly uniformly distributed, random order? >> >> >> >> Is there a reason that np.argsort is almost 3 times slower than >> >> np.sort? 
>> >> >> >> I'm doing semi-systematic timing for a stats(models) algorithm. >> > >> > I was using np.sort, inplace sort is only a little bit faster >> > >> > It looks like sorting first, and then argsorting is faster than argsort >> > alon >> > >> > pvals.sort() >> > sortind = np.argsort(pvals) >> >> Ok, that was useless, that won't be anything I want. >> > > I think locality is the most important thing. The argsort routines used to > move both the indexes and the array to argsort, the new ones only move the > indexes. It is a tradeoff, twice as many moves vs locality. It's probably > possible to invent an algorithm that mixes the two. If that's the way it is, then that's the way it is. I just never realized that argsort can take so long, since I usually use only smaller arrays. I was surprised that argsort took almost 10 out of around 12 seconds in my function, and this after I cleaned up my code and removed duplicate and avoidable argsorts. Josef > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sun Feb 16 20:04:30 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 16 Feb 2014 18:04:30 -0700 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: Message-ID: On Sun, Feb 16, 2014 at 4:12 PM, Da?id wrote: > On 16 February 2014 23:43, wrote: > >> What's the fastest argsort for a 1d array with around 28 Million >> elements, roughly uniformly distributed, random order? 
>> > > On numpy latest version: > > for kind in ['quicksort', 'mergesort', 'heapsort']: > print kind > %timeit np.sort(data, kind=kind) > %timeit np.argsort(data, kind=kind) > > > quicksort > 1 loops, best of 3: 3.55 s per loop > 1 loops, best of 3: 10.3 s per loop > mergesort > 1 loops, best of 3: 4.84 s per loop > 1 loops, best of 3: 9.49 s per loop > heapsort > 1 loops, best of 3: 12.1 s per loop > 1 loops, best of 3: 39.3 s per loop > > Interesting. I get slightly different results quicksort 1 loops, best of 3: 3.25 s per loop 1 loops, best of 3: 6.16 s per loop mergesort 1 loops, best of 3: 3.99 s per loop 1 loops, best of 3: 6.97 s per loop heapsort 1 loops, best of 3: 10.1 s per loop 1 loops, best of 3: 29.3 s per loop Possibly faster memory here. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From francesc at continuum.io Mon Feb 17 09:18:20 2014 From: francesc at continuum.io (Francesc Alted) Date: Mon, 17 Feb 2014 15:18:20 +0100 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: Message-ID: <53021A2C.5020601@continuum.io> On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote: > On Sun, Feb 16, 2014 at 6:12 PM, Da?id wrote: >> On 16 February 2014 23:43, wrote: >>> What's the fastest argsort for a 1d array with around 28 Million >>> elements, roughly uniformly distributed, random order? >> >> On numpy latest version: >> >> for kind in ['quicksort', 'mergesort', 'heapsort']: >> print kind >> %timeit np.sort(data, kind=kind) >> %timeit np.argsort(data, kind=kind) >> >> >> quicksort >> 1 loops, best of 3: 3.55 s per loop >> 1 loops, best of 3: 10.3 s per loop >> mergesort >> 1 loops, best of 3: 4.84 s per loop >> 1 loops, best of 3: 9.49 s per loop >> heapsort >> 1 loops, best of 3: 12.1 s per loop >> 1 loops, best of 3: 39.3 s per loop >> >> >> It looks quicksort is quicker sorting, but mergesort is marginally faster >> sorting args. The diference is slim, but upon repetition, it remains >> significant. 
>> >> Why is that? Probably part of the reason is what Eelco said, and part is >> that in sort comparison are done accessing the array elements directly, but >> in argsort you have to index the array, introducing some overhead. > Thanks, both. > > I also gain a second with mergesort. > > matlab would be nicer in my case, it returns both. > I still need to use the argsort to index into the array to also get > the sorted array. Many years ago I needed something similar, so I made some functions for sorting and argsorting in one single shot. Maybe you want to reuse them. Here it is an example of the C implementation: https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 and here the Cython wrapper for all of them: https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Francesc > > Josef > > >> I seem unable to find the code for ndarray.sort, so I can't check. I have >> tried to grep it tring all possible combinations of "def ndarray", >> "self.sort", etc. Where is it? >> >> >> /David. >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From novin01 at gmail.com Mon Feb 17 04:49:34 2014 From: novin01 at gmail.com (Dave Hirschfeld) Date: Mon, 17 Feb 2014 09:49:34 +0000 (UTC) Subject: [Numpy-discussion] svd error checking vs. speed References: Message-ID: alex ncsu.edu> writes: > > Hello list, > > Here's another idea resurrection from numpy github comments that I've > been advised could be posted here for re-discussion. > > The proposal would be to make np.linalg.svd more like scipy.linalg.svd > with respect to input checking. 
The argument against the change is > raw speed; if you know that you will never feed non-finite input to > svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An > argument for the change could be to avoid issues reported on github > like crashes, hangs, spurious non-convergence exceptions, etc. from > the undefined behavior of svd of non-finite input. > > """ > [...] the following numpy code hangs until I `kill -9` it. > > ``` > $ python runtests.py --shell > $ python > Python 2.7.5+ > [GCC 4.8.1] on linux2 > >>> import numpy as np > >>> np.__version__ > '1.9.0.dev-e3f0f53' > >>> A = np.array([[1e3, 0], [0, 1]]) > >>> B = np.array([[1e300, 0], [0, 1]]) > >>> C = np.array([[1e3000, 0], [0, 1]]) > >>> np.linalg.svd(A) > (array([[ 1., 0.], > [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], > [ 0., 1.]])) > >>> np.linalg.svd(B) > (array([[ 1., 0.], > [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), > array([[ 1., 0.], > [ 0., 1.]])) > >>> np.linalg.svd(C) > [hangs forever] > ``` > """ > > Alex > I'm -1 on checking finiteness - if there's one place you usually want maximum performance it's linear algebra operations. It certainly shouldn't crash or hang though and for me at least it doesn't - it returns NaN which immediately suggests to me that I've got bad input (maybe just because I've seen it before). I'm not sure adding an extra kwarg is worth cluttering up the api when a simple call to isfinite beforehand will do the job if you think you may potentially have non-finite input. 
Python 2.7.5 |Anaconda 1.8.0 (64-bit)| (default, Jul 1 2013, 12:37:52) [MSC v.1500 64 bit (AMD64)] In [1]: import numpy as np In [2]: >>> A = np.array([[1e3, 0], [0, 1]]) ...: >>> B = np.array([[1e300, 0], [0, 1]]) ...: >>> C = np.array([[1e3000, 0], [0, 1]]) ...: >>> np.linalg.svd(A) ...: Out[2]: (array([[ 1., 0.], [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], [ 0., 1.]])) In [3]: np.linalg.svd(B) Out[3]: (array([[ 1., 0.], [ 0., 1.]]), array([ 1.0000e+300, 1.0000e+000]), array([[ 1., 0.], [ 0., 1.]])) In [4]: C Out[4]: array([[ inf, 0.], [ 0., 1.]]) In [5]: np.linalg.svd(C) Out[5]: (array([[ 0., 1.], [ 1., 0.]]), array([ nan, nan]), array([[ 0., 1.], [ 1., 0.]])) In [6]: np.__version__ Out[6]: '1.7.1' Regards, Dave From jason-sage at creativetrax.com Mon Feb 17 09:58:24 2014 From: jason-sage at creativetrax.com (Jason Grout) Date: Mon, 17 Feb 2014 08:58:24 -0600 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: Message-ID: <53022390.90503@creativetrax.com> On 2/15/14 3:37 PM, alex wrote: > The proposal would be to make np.linalg.svd more like scipy.linalg.svd > with respect to input checking. The argument against the change is > raw speed; if you know that you will never feed non-finite input to > svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An > argument for the change could be to avoid issues reported on github > like crashes, hangs, spurious non-convergence exceptions, etc. from > the undefined behavior of svd of non-finite input. For what my vote is worth, -1. I thought this was pretty much the designed difference between the scipy and numpy linalg routines. Scipy does the checking, and numpy provides the raw speed. Maybe this is better resolved as a note in the documentation for numpy about the assumptions for the input and a reference to the scipy implementation? That said, I don't extensively use the linalg.svd routine in practice, so I defer to those that use it. 
Thanks, Jason From sturla.molden at gmail.com Mon Feb 17 09:59:53 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 17 Feb 2014 14:59:53 +0000 (UTC) Subject: [Numpy-discussion] svd error checking vs. speed References: Message-ID: <1838889561414341786.087908sturla.molden-gmail.com@news.gmane.org> Dave Hirschfeld wrote: > It certainly shouldn't crash or hang though and for me at least it doesn't - > it returns NaN which immediately suggests to me that I've got bad input > (maybe just because I've seen it before). It might be dependent on the BLAS or LAPACK version. Since you are on Anaconda, I assume you are on MKL. But can we expect f2c lapack-lite and blas-lite to be equally well behaved? Sturla From sturla.molden at gmail.com Mon Feb 17 10:04:43 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 17 Feb 2014 15:04:43 +0000 (UTC) Subject: [Numpy-discussion] svd error checking vs. speed References: <53022390.90503@creativetrax.com> Message-ID: <1909954843414342097.852573sturla.molden-gmail.com@news.gmane.org> Jason Grout wrote: > For what my vote is worth, -1. I thought this was pretty much the > designed difference between the scipy and numpy linalg routines. Scipy > does the checking, and numpy provides the raw speed. Maybe this is > better resolved as a note in the documentation for numpy about the > assumptions for the input and a reference to the scipy implementation? I think if there is a stability issue, we should find out which LAPACK or BLAS versions are affected, and then decide what to do with it. No NumPy functions should arbitrarily hang forever. I would consider that a bug. 
Sturla From josef.pktd at gmail.com Mon Feb 17 10:11:32 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Feb 2014 10:11:32 -0500 Subject: [Numpy-discussion] argsort speed In-Reply-To: <53021A2C.5020601@continuum.io> References: <53021A2C.5020601@continuum.io> Message-ID: On Mon, Feb 17, 2014 at 9:18 AM, Francesc Alted wrote: > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote: >> On Sun, Feb 16, 2014 at 6:12 PM, Da?id wrote: >>> On 16 February 2014 23:43, wrote: >>>> What's the fastest argsort for a 1d array with around 28 Million >>>> elements, roughly uniformly distributed, random order? >>> >>> On numpy latest version: >>> >>> for kind in ['quicksort', 'mergesort', 'heapsort']: >>> print kind >>> %timeit np.sort(data, kind=kind) >>> %timeit np.argsort(data, kind=kind) >>> >>> >>> quicksort >>> 1 loops, best of 3: 3.55 s per loop >>> 1 loops, best of 3: 10.3 s per loop >>> mergesort >>> 1 loops, best of 3: 4.84 s per loop >>> 1 loops, best of 3: 9.49 s per loop >>> heapsort >>> 1 loops, best of 3: 12.1 s per loop >>> 1 loops, best of 3: 39.3 s per loop >>> >>> >>> It looks quicksort is quicker sorting, but mergesort is marginally faster >>> sorting args. The diference is slim, but upon repetition, it remains >>> significant. >>> >>> Why is that? Probably part of the reason is what Eelco said, and part is >>> that in sort comparison are done accessing the array elements directly, but >>> in argsort you have to index the array, introducing some overhead. >> Thanks, both. >> >> I also gain a second with mergesort. >> >> matlab would be nicer in my case, it returns both. >> I still need to use the argsort to index into the array to also get >> the sorted array. > > Many years ago I needed something similar, so I made some functions for > sorting and argsorting in one single shot. Maybe you want to reuse > them. 
Here it is an example of the C implementation: > > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 > > and here the Cython wrapper for all of them: > > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 Thanks Francesc That would be very useful to have. However, I don't speak C, and it would require too much maintenance time if I were to include that in statsmodels. I'm trying to concentrate more on stats (and my backlog there), and leave other parts to developers that know those better. Josef > > Francesc > >> >> Josef >> >> >>> I seem unable to find the code for ndarray.sort, so I can't check. I have >>> tried to grep it tring all possible combinations of "def ndarray", >>> "self.sort", etc. Where is it? >>> >>> >>> /David. >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > -- > Francesc Alted > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From argriffi at ncsu.edu Mon Feb 17 10:03:38 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 17 Feb 2014 10:03:38 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: Message-ID: On Mon, Feb 17, 2014 at 4:49 AM, Dave Hirschfeld wrote: > alex ncsu.edu> writes: > >> >> Hello list, >> >> Here's another idea resurrection from numpy github comments that I've >> been advised could be posted here for re-discussion. >> >> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >> with respect to input checking. 
The argument against the change is >> raw speed; if you know that you will never feed non-finite input to >> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >> argument for the change could be to avoid issues reported on github >> like crashes, hangs, spurious non-convergence exceptions, etc. from >> the undefined behavior of svd of non-finite input. >> >> """ >> [...] the following numpy code hangs until I `kill -9` it. >> >> ``` >> $ python runtests.py --shell >> $ python >> Python 2.7.5+ >> [GCC 4.8.1] on linux2 >> >>> import numpy as np >> >>> np.__version__ >> '1.9.0.dev-e3f0f53' >> >>> A = np.array([[1e3, 0], [0, 1]]) >> >>> B = np.array([[1e300, 0], [0, 1]]) >> >>> C = np.array([[1e3000, 0], [0, 1]]) >> >>> np.linalg.svd(A) >> (array([[ 1., 0.], >> [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], >> [ 0., 1.]])) >> >>> np.linalg.svd(B) >> (array([[ 1., 0.], >> [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), >> array([[ 1., 0.], >> [ 0., 1.]])) >> >>> np.linalg.svd(C) >> [hangs forever] >> ``` >> """ >> >> Alex >> > > I'm -1 on checking finiteness - if there's one place you usually want > maximum performance it's linear algebra operations. > > It certainly shouldn't crash or hang though and for me at least it doesn't - > it returns NaN btw when I use the python/numpy/openblas packaged for ubuntu, I also get NaN. The infinite loop appears when I build numpy letting it use its lapack lite. I don't know which LAPACK Josef uses to get the weird behavior he observes "13% cpu usage for a hanging process". This is consistent with the scipy svd docstring describing its check_finite flag, where it warns "Disabling may give a performance gain, but may result in problems (crashes, non-termination) if the inputs do contain infinities or NaNs." I think this caveat also applies to most numpy linalg functions that connect more or less directly to lapack. 
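In the meantime, a caller-side guard gives numpy's svd the scipy-style behaviour. This is only a sketch and `checked_svd` is an invented name, but `np.asarray_chkfinite` is essentially the check scipy performs when `check_finite=True`:

```python
import numpy as np

def checked_svd(a, **kwargs):
    # Hypothetical wrapper, not a numpy API: reject NaN/inf up front
    # rather than hand undefined input to the underlying LAPACK routine.
    a = np.asarray_chkfinite(a, dtype=float)  # raises ValueError on NaN/inf
    return np.linalg.svd(a, **kwargs)

U, s, Vt = checked_svd([[1e3, 0], [0, 1]])  # finite input: behaves like np.linalg.svd
```

With this guard the troublesome `C = np.array([[1e3000, 0], [0, 1]])` example raises a clear ValueError instead of hanging or returning NaNs.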
From josef.pktd at gmail.com Mon Feb 17 10:41:33 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Feb 2014 10:41:33 -0500 Subject: [Numpy-discussion] svd error checking vs. speed In-Reply-To: References: Message-ID: On Mon, Feb 17, 2014 at 10:03 AM, alex wrote: > On Mon, Feb 17, 2014 at 4:49 AM, Dave Hirschfeld wrote: >> alex ncsu.edu> writes: >> >>> >>> Hello list, >>> >>> Here's another idea resurrection from numpy github comments that I've >>> been advised could be posted here for re-discussion. >>> >>> The proposal would be to make np.linalg.svd more like scipy.linalg.svd >>> with respect to input checking. The argument against the change is >>> raw speed; if you know that you will never feed non-finite input to >>> svd, then np.linalg.svd is a bit faster than scipy.linalg.svd. An >>> argument for the change could be to avoid issues reported on github >>> like crashes, hangs, spurious non-convergence exceptions, etc. from >>> the undefined behavior of svd of non-finite input. >>> >>> """ >>> [...] the following numpy code hangs until I `kill -9` it. >>> >>> ``` >>> $ python runtests.py --shell >>> $ python >>> Python 2.7.5+ >>> [GCC 4.8.1] on linux2 >>> >>> import numpy as np >>> >>> np.__version__ >>> '1.9.0.dev-e3f0f53' >>> >>> A = np.array([[1e3, 0], [0, 1]]) >>> >>> B = np.array([[1e300, 0], [0, 1]]) >>> >>> C = np.array([[1e3000, 0], [0, 1]]) >>> >>> np.linalg.svd(A) >>> (array([[ 1., 0.], >>> [ 0., 1.]]), array([ 1000., 1.]), array([[ 1., 0.], >>> [ 0., 1.]])) >>> >>> np.linalg.svd(B) >>> (array([[ 1., 0.], >>> [ 0., 1.]]), array([ 1.00000000e+300, 1.00000000e+000]), >>> array([[ 1., 0.], >>> [ 0., 1.]])) >>> >>> np.linalg.svd(C) >>> [hangs forever] >>> ``` >>> """ >>> >>> Alex >>> >> >> I'm -1 on checking finiteness - if there's one place you usually want >> maximum performance it's linear algebra operations. 
>>
>> It certainly shouldn't crash or hang though and for me at least it doesn't -
>> it returns NaN
>
> btw when I use the python/numpy/openblas packaged for ubuntu, I also
> get NaN. The infinite loop appears when I build numpy letting it use
> its lapack lite. I don't know which LAPACK Josef uses to get the
> weird behavior he observes "13% cpu usage for a hanging process".

I use official numpy release for development, Windows, 32bit python,
i.e. MingW 3.5 and whatever old ATLAS the release includes.

a constant 13% cpu usage is 1/8 th of my 8 virtual cores.
If it were in a loop doing some work, then cpu usage fluctuates
(between 12 and 13% in a busy loop). +/- 1

Josef

>
> This is consistent with the scipy svd docstring describing its
> check_finite flag, where it warns "Disabling may give a performance
> gain, but may result in problems (crashes, non-termination) if the
> inputs do contain infinities or NaNs." I think this caveat also
> applies to most numpy linalg functions that connect more or less
> directly to lapack.
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From sturla.molden at gmail.com Mon Feb 17 11:02:15 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 16:02:15 +0000 (UTC)
Subject: [Numpy-discussion] svd error checking vs. speed
References: 
Message-ID: <1556894915414345274.194992sturla.molden-gmail.com@news.gmane.org>

 wrote:

> I use official numpy release for development, Windows, 32bit python,
> i.e. MingW 3.5 and whatever old ATLAS the release includes.
>
> a constant 13% cpu usage is 1/8 th of my 8 virtual cores.

Based on this and Alex' message it seems the offender is the f2c generated
lapack_lite library.

So what do we do with lapack_lite? Should we patch it?
Sturla

From novin01 at gmail.com Mon Feb 17 11:35:12 2014
From: novin01 at gmail.com (Dave Hirschfeld)
Date: Mon, 17 Feb 2014 16:35:12 +0000 (UTC)
Subject: [Numpy-discussion] svd error checking vs. speed
References: <1556894915414345274.194992sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

Sturla Molden  gmail.com> writes:
> 
>  gmail.com> wrote:
> 
> > I use the official numpy release for development, Windows, 32-bit python,
> > i.e. MinGW 3.5 and whatever old ATLAS the release includes.
> >
> > a constant 13% cpu usage is 1/8th of my 8 virtual cores.
> 
> Based on this and Alex' message it seems the offender is the f2c generated
> lapack_lite library.
> 
> So what do we do with lapack_lite? Should we patch it?
> 
> Sturla
> 

Even if lapack_lite always performed the isfinite check and threw a python 
error if False, it would be much better than either hanging or segfaulting and 
people who care about the isfinite cost probably would be linking to a fast 
lapack anyway.

-Dave

From sturla.molden at gmail.com Mon Feb 17 13:09:22 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 18:09:22 +0000 (UTC)
Subject: [Numpy-discussion] svd error checking vs. speed
References: <1556894915414345274.194992sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1458618056414352506.887385sturla.molden-gmail.com@news.gmane.org>

Dave Hirschfeld  wrote:

> Even if lapack_lite always performed the isfinite check and threw a python 
> error if False, it would be much better than either hanging or segfaulting and 
> people who care about the isfinite cost probably would be linking to a fast 
> lapack anyway.

+1 (if I have a vote)

Correctness is always more important than speed. Segfaulting or hanging
while burning the CPU is not something we should allow "by design". And
those who need speed should in any case use a different lapack library
instead. 
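In Python terms, the check Dave proposes amounts to something like this (a rough sketch only; `checked_svd` is a hypothetical name, and the real fix would be a few lines of C inside lapack_lite's argument checking):

```python
import numpy as np

def checked_svd(a, check_finite=True):
    # Illustrative wrapper only: reject non-finite input up front
    # instead of handing it to LAPACK, where the behavior for infs
    # and NaNs is undefined (crashes, hangs, bogus results).
    a = np.asarray(a)
    if check_finite and not np.isfinite(a).all():
        raise ValueError("array must not contain infs or NaNs")
    return np.linalg.svd(a)
```

With such a guard, the `np.linalg.svd(C)` example from earlier in the thread raises a clear ValueError instead of hanging.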
The easiest place to put a finiteness test is the check_object function here: https://github.com/numpy/numpy/blob/master/numpy/linalg/lapack_litemodule.c But in that case we should probably use a macro guard to leave it out if any other LAPACK than the builtin f2c version is used. Sturla From sturla.molden at gmail.com Mon Feb 17 13:24:24 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Mon, 17 Feb 2014 18:24:24 +0000 (UTC) Subject: [Numpy-discussion] svd error checking vs. speed References: <1556894915414345274.194992sturla.molden-gmail.com@news.gmane.org> <1458618056414352506.887385sturla.molden-gmail.com@news.gmane.org> Message-ID: <245746841414353707.772460sturla.molden-gmail.com@news.gmane.org> Sturla Molden wrote: > Dave Hirschfeld wrote: > >> Even if lapack_lite always performed the isfinite check and threw a python >> error if False, it would be much better than either hanging or segfaulting and >> people who care about the isfinite cost probably would be linking to a fast >> lapack anyway. > > +1 (if I have a vote) > > Correctness is always more important than speed. Segfaulting or hanging > while burning the CPU is not something we should allow "by design". And > those who need speed should in any case use a different lapack library > instead. The easiest place to put a finiteness test is the check_object > function here: > > https://github.com/numpy/numpy/blob/master/numpy/linalg/lapack_litemodule.c > > But in that case we should probably use a macro guard to leave it out if > any other LAPACK than the builtin f2c version is used. It seems even the more recent (3.4.x) versions of LAPACK have places where NANs can cause infinite loops. As long as this is an issue it might perhaps be worth checking everywhere. 
http://www.netlib.org/lapack/bug_list.html

The semi-official C interface LAPACKE implements NAN checking as well:

http://www.netlib.org/lapack/lapacke.html#_nan_checking

If Intel's engineers put NAN checking inside LAPACKE, it was probably for a
good reason.

Sturla

From jtaylor.debian at googlemail.com Mon Feb 17 13:32:18 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Mon, 17 Feb 2014 19:32:18 +0100
Subject: [Numpy-discussion] argsort speed
In-Reply-To: <53021A2C.5020601@continuum.io>
References: <53021A2C.5020601@continuum.io>
Message-ID: <530255B2.60509@googlemail.com>

On 17.02.2014 15:18, Francesc Alted wrote:
> On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote:
>> On Sun, Feb 16, 2014 at 6:12 PM, Daπid  wrote:
>>> On 16 February 2014 23:43,  wrote:
>>>> What's the fastest argsort for a 1d array with around 28 Million
>>>> elements, roughly uniformly distributed, random order?
>>>
>>> On numpy latest version:
>>>
>>> for kind in ['quicksort', 'mergesort', 'heapsort']:
>>>      print kind
>>>      %timeit np.sort(data, kind=kind)
>>>      %timeit np.argsort(data, kind=kind)
>>>
>>>
>>> quicksort
>>> 1 loops, best of 3: 3.55 s per loop
>>> 1 loops, best of 3: 10.3 s per loop
>>> mergesort
>>> 1 loops, best of 3: 4.84 s per loop
>>> 1 loops, best of 3: 9.49 s per loop
>>> heapsort
>>> 1 loops, best of 3: 12.1 s per loop
>>> 1 loops, best of 3: 39.3 s per loop
>>>
>>>
>>> It looks like quicksort is quicker sorting, but mergesort is marginally faster
>>> sorting args. The difference is slim, but upon repetition, it remains
>>> significant.
>>>
>>> Why is that? Probably part of the reason is what Eelco said, and part is
>>> that in sort comparisons are done accessing the array elements directly, but
>>> in argsort you have to index the array, introducing some overhead.
>> Thanks, both.
>>
>> I also gain a second with mergesort. 
>>
>> matlab would be nicer in my case, it returns both.
>> I still need to use the argsort to index into the array to also get
>> the sorted array.
> 
> Many years ago I needed something similar, so I made some functions for 
> sorting and argsorting in one single shot.  Maybe you want to reuse 
> them.  Here it is an example of the C implementation:
> 
> https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
> 
> and here the Cython wrapper for all of them:
> 
> https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
> 
> Francesc
> 

that doesn't really make a big difference if the data is randomly
distributed.
the sorting operation is normally much more expensive than later
applying the indices:

In [1]: d = np.arange(10000000)

In [2]: np.random.shuffle(d)

In [3]: %timeit np.argsort(d)
1 loops, best of 3: 1.99 s per loop

In [4]: idx = np.argsort(d)

In [5]: %timeit d[idx]
1 loops, best of 3: 213 ms per loop



But if your data is not random it can make a difference as even
quicksort can be a lot faster then.
timsort would be a nice addition to numpy, it performs very well for
partially sorted data. Unfortunately it's quite complicated to implement.

From sturla.molden at gmail.com Mon Feb 17 13:35:20 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 18:35:20 +0000 (UTC)
Subject: [Numpy-discussion] svd error checking vs. speed
References: <1392501368.21195.1.camel@sebastian-t440>
Message-ID: <191918006414354392.582160sturla.molden-gmail.com@news.gmane.org>

 wrote:

> maybe -1
>
> statsmodels is using np.linalg.pinv which uses svd
> I never ran into or heard of any crash (*), and the only time I compared with
> scipy I didn't like the slowdown.

If you did care about speed in least-squares fitting you would not call QR
or SVD directly, but use the built-in LAPACK least-squares drivers (*GELSS,
*GELS, *GGGLM), which are much faster (I have checked), as well as use an
optimized multi-core efficient LAPACK (e.g. Intel MKL). 
Any overhead from
finiteness checking will be tiny compared to the Python/NumPy overhead your
statsmodels code incurs, not to mention the overhead you get from using f2c
lapack_lite.

Sturla

From charlesr.harris at gmail.com Mon Feb 17 13:40:50 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Mon, 17 Feb 2014 11:40:50 -0700
Subject: [Numpy-discussion] argsort speed
In-Reply-To: <530255B2.60509@googlemail.com>
References: <53021A2C.5020601@continuum.io> <530255B2.60509@googlemail.com>
Message-ID: 

On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor <
jtaylor.debian at googlemail.com> wrote:

> On 17.02.2014 15:18, Francesc Alted wrote:
> > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote:
> >> On Sun, Feb 16, 2014 at 6:12 PM, Daπid  wrote:
> >>> On 16 February 2014 23:43,  wrote:
> >>>> What's the fastest argsort for a 1d array with around 28 Million
> >>>> elements, roughly uniformly distributed, random order?
> >>>
> >>> On numpy latest version:
> >>>
> >>> for kind in ['quicksort', 'mergesort', 'heapsort']:
> >>>      print kind
> >>>      %timeit np.sort(data, kind=kind)
> >>>      %timeit np.argsort(data, kind=kind)
> >>>
> >>>
> >>> quicksort
> >>> 1 loops, best of 3: 3.55 s per loop
> >>> 1 loops, best of 3: 10.3 s per loop
> >>> mergesort
> >>> 1 loops, best of 3: 4.84 s per loop
> >>> 1 loops, best of 3: 9.49 s per loop
> >>> heapsort
> >>> 1 loops, best of 3: 12.1 s per loop
> >>> 1 loops, best of 3: 39.3 s per loop
> >>>
> >>>
> >>> It looks like quicksort is quicker sorting, but mergesort is marginally
> faster
> >>> sorting args. The difference is slim, but upon repetition, it remains
> >>> significant.
> >>>
> >>> Why is that? Probably part of the reason is what Eelco said, and part
> is
> >>> that in sort comparisons are done accessing the array elements
> directly, but
> >>> in argsort you have to index the array, introducing some overhead.
> >> Thanks, both.
> >>
> >> I also gain a second with mergesort. 
> >>
> >> matlab would be nicer in my case, it returns both.
> >> I still need to use the argsort to index into the array to also get
> >> the sorted array.
> >
> > Many years ago I needed something similar, so I made some functions for
> > sorting and argsorting in one single shot.  Maybe you want to reuse
> > them.  Here it is an example of the C implementation:
> >
> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619
> >
> > and here the Cython wrapper for all of them:
> >
> >
> https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129
> >
> > Francesc
> >
>
> that doesn't really make a big difference if the data is randomly
> distributed.
> the sorting operation is normally much more expensive than later
> applying the indices:
>
> In [1]: d = np.arange(10000000)
>
> In [2]: np.random.shuffle(d)
>
> In [3]: %timeit np.argsort(d)
> 1 loops, best of 3: 1.99 s per loop
>
> In [4]: idx = np.argsort(d)
>
> In [5]: %timeit d[idx]
> 1 loops, best of 3: 213 ms per loop
>
>
>
> But if your data is not random it can make a difference as even
> quicksort can be a lot faster then.
> timsort would be a nice addition to numpy, it performs very well for
> partially sorted data. Unfortunately it's quite complicated to implement.
>

Quicksort and shellsort gain speed by having simple inner loops. I have the
impression that timsort is optimal when compares and memory access are
expensive, but I haven't seen any benchmarks for native types in contiguous
memory.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sturla.molden at gmail.com Mon Feb 17 13:41:29 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 18:41:29 +0000 (UTC)
Subject: [Numpy-discussion] svd error checking vs. 
speed
References: <1392501368.21195.1.camel@sebastian-t440> <191918006414354392.582160sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1367839796414355047.292114sturla.molden-gmail.com@news.gmane.org>

Sturla Molden  wrote:
>  wrote:
> maybe -1
>> 
>> statsmodels is using np.linalg.pinv which uses svd
>> I never ran into or heard of any crash (*), and the only time I compared with
>> scipy I didn't like the slowdown.
> 
> If you did care about speed in least-squares fitting you would not call QR
> or SVD directly, but use the built-in LAPACK least-squares drivers (*GELSS,
> *GELS, *GGGLM), which are much faster (I have checked), as well as use an
> optimized multi-core efficient LAPACK (e.g. Intel MKL). Any overhead from
> finiteness checking will be tiny compared to the Python/NumPy overhead your
> statsmodels code incurs, not to mention the overhead you get from using f2c
> lapack_lite.

By the way: I am not saying you should call these methods. Keeping most of
the QR and SVD least-squares solvers in Python has its merits as well, e.g.
for clarity. But if you do, it defeats any argument that finiteness
checking before calling LAPACK will be too slow.

Sturla

From jtaylor.debian at googlemail.com Mon Feb 17 14:31:56 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Mon, 17 Feb 2014 20:31:56 +0100
Subject: [Numpy-discussion] allocated memory cache for numpy
Message-ID: <530263AC.4030304@googlemail.com>

hi,
I noticed that during some simplistic benchmarks (e.g.
https://github.com/numpy/numpy/issues/4310) a lot of time is spent in
the kernel zeroing pages.
This is because under linux glibc will always allocate large memory
blocks with mmap. As these pages can come from other processes the
kernel must zero them for security reasons.
For memory within the numpy process this is unnecessary and possibly a
large overhead for the many temporaries numpy creates. 
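A rough way to see the effect from Python (a sketch, not a rigorous benchmark; the exact numbers and the crossover point depend on the machine and on glibc's mmap threshold):

```python
import timeit

import numpy as np

# Each evaluation of a + b + c allocates fresh temporaries. For large
# arrays glibc serves those buffers via mmap, so the kernel has to zero
# the pages on every iteration; small arrays stay on the sbrk heap and
# get recycled without kernel involvement.
big = {name: np.ones(4 * 1024 * 1024) for name in "abc"}
small = {name: np.ones(1024) for name in "abc"}

t_big = timeit.timeit("a + b + c", globals=big, number=50)
t_small = timeit.timeit("a + b + c", globals=small, number=50)
print(t_big, t_small)
```

Running the large case under time(1) shows most of the extra cost as system time rather than user time.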
The behavior of glibc can be tuned to change the threshold at which it
starts using mmap but that would be a platform specific fix.

I was thinking about adding a thread local cache of pointers to
allocated memory.
When an array is created it tries to get its memory from the cache and
when it's deallocated it returns it to the cache.
The threshold and cached memory block sizes could be adaptive depending
on the application workload.

For simplistic temporary-heavy benchmarks this eliminates the time spent
in the kernel (the system time reported by time(1)).
But I don't know how relevant this is for real world applications.
Have you noticed large amounts of time spent in the kernel in your apps?

I also found this paper which describes pretty much exactly what I'm
proposing:
pyhpc.org/workshop/papers/Doubling.pdf
Does anyone know why their changes were never incorporated in numpy? I
couldn't find a reference in the list archive.

From sturla.molden at gmail.com Mon Feb 17 15:16:45 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 20:16:45 +0000 (UTC)
Subject: [Numpy-discussion] allocated memory cache for numpy
References: <530263AC.4030304@googlemail.com>
Message-ID: <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>

Julian Taylor  wrote:

> When an array is created it tries to get its memory from the cache and
> when it's deallocated it returns it to the cache.

Good idea, however there is already a C function that does this. It uses a
heap to keep the cached memory blocks sorted according to size. You know it
as malloc, which is why we call this allocation from the heap. Which by the
way is what NumPy already does. 
;-)

Sturla

From jtaylor.debian at googlemail.com Mon Feb 17 15:20:31 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Mon, 17 Feb 2014 21:20:31 +0100
Subject: [Numpy-discussion] allocated memory cache for numpy
In-Reply-To: <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>
References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>
Message-ID: <53026F0F.5020608@googlemail.com>

On 17.02.2014 21:16, Sturla Molden wrote:
> Julian Taylor  wrote:
> 
>> When an array is created it tries to get its memory from the cache and
>> when it's deallocated it returns it to the cache.
> 
> Good idea, however there is already a C function that does this. It uses a
> heap to keep the cached memory blocks sorted according to size. You know it
> as malloc, which is why we call this allocation from the heap. Which by the
> way is what NumPy already does. ;-)
> 

not with glibc, glibc has no cache for mmap allocated memory.
It does cache sbrk allocated memory and uses a dynamic threshold for
using it but it's tuned for generic applications so the maximum
threshold is very low, I think it's 32 MB.
Far too low for many numerical applications.

From njs at pobox.com Mon Feb 17 15:42:13 2014
From: njs at pobox.com (Nathaniel Smith)
Date: Mon, 17 Feb 2014 15:42:13 -0500
Subject: [Numpy-discussion] allocated memory cache for numpy
In-Reply-To: <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>
References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>
Message-ID: 

On 17 Feb 2014 15:17, "Sturla Molden"  wrote:
>
> Julian Taylor  wrote:
>
> > When an array is created it tries to get its memory from the cache and
> > when it's deallocated it returns it to the cache.
>
> Good idea, however there is already a C function that does this. It uses a
> heap to keep the cached memory blocks sorted according to size. You know it
> as malloc, 
and is why we call this allocation from the heap. Which by the
> way is what NumPy already does. ;-)

Common malloc implementations are not well optimized for programs that have
frequent, short-lived, large-sized allocations. Usually they optimize for
short-lived allocations of small sizes. It's totally plausible that we
could do a better job in the common case of array operations like 'a + b +
c + d' that allocate and free a bunch of same-sized temporary arrays as
they go. (Right now, if those arrays are large, that expression will always
generate multiple mmap/munmap calls.) The question is to what extent numpy
programs are bottlenecked by such allocations.

Also, I'd be pretty wary of caching large chunks of unused memory. People
already have a lot of trouble understanding their program's memory usage,
and getting rid of 'greedy free' will make this even worse.

Another optimization we should consider that might help a lot in the same
situations where this would help: for code called from the cpython eval
loop, it's afaict possible to determine which inputs are temporaries by
checking their refcnt. In the second call to __add__ in '(a + b) + c', the
temporary will have refcnt 1, while the other arrays will all have refcnt
>1. In such cases (subject to various sanity checks on shape, dtype, etc)
we could elide temporaries by reusing the input array for the output. The
risk is that there may be some code out there that calls these operations
directly from C with non-temp arrays that nonetheless have refcnt 1, but we
should at least investigate the feasibility. E.g. maybe we can do the
optimization for tp_add but not PyArray_Add.

-n
-------------- next part --------------
An HTML attachment was scrubbed... 
URL: From stefan at seefeld.name Mon Feb 17 15:55:53 2014 From: stefan at seefeld.name (Stefan Seefeld) Date: Mon, 17 Feb 2014 15:55:53 -0500 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> Message-ID: <53027759.5040604@seefeld.name> On 02/17/2014 03:42 PM, Nathaniel Smith wrote: > Another optimization we should consider that might help a lot in the > same situations where this would help: for code called from the > cpython eval loop, it's afaict possible to determine which inputs are > temporaries by checking their refcnt. In the second call to __add__ in > '(a + b) + c', the temporary will have refcnt 1, while the other > arrays will all have refcnt >1. In such cases (subject to various > sanity checks on shape, dtype, etc) we could elide temporaries by > reusing the input array for the output. The risk is that there may be > some code out there that calls these operations directly from C with > non-temp arrays that nonetheless have refcnt 1, but we should at least > investigate the feasibility. E.g. maybe we can do the optimization for > tp_add but not PyArray_Add. For element-wise operations such as the above, wouldn't it be even better to use loop fusion, by evaluating the entire compound expression per element, instead of each individual operation ? That would require methods such as __add__ to return an operation object, rather than the result value. I believe a technique like that is used in the numexpr package (https://github.com/pydata/numexpr), which I saw announced here recently... FWIW, Stefan PS: Such a loop-fusion technique would also open the door to other optimizations, such as vectorization (simd)... -- ...ich hab' noch einen Koffer in Berlin... 
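Nathaniel's refcount observation can be sketched at the Python level (an illustration only; `add_maybe_inplace` is a hypothetical helper, and only a C implementation inside numpy could apply the test reliably):

```python
import sys

import numpy as np

def add_maybe_inplace(x, y):
    # Sketch of the refcount heuristic. Inside this function a temporary
    # passed as x is typically referenced by the caller's expression
    # stack, by this frame's local, and by sys.getrefcount's argument,
    # so a count of 3 suggests nothing else holds it. A real version
    # would live in C (e.g. in tp_add) with the same sanity checks on
    # shape and dtype before reusing the buffer.
    if (isinstance(x, np.ndarray)
            and isinstance(y, np.ndarray)
            and sys.getrefcount(x) == 3
            and x.shape == y.shape
            and x.dtype == np.result_type(x.dtype, y.dtype)):
        x += y          # reuse the temporary's buffer for the result
        return x
    return x + y
```

Either branch returns the same values; the in-place branch just skips one allocation. Whether the refcount test actually fires depends on interpreter internals, which is exactly the fragility being discussed.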
From sturla.molden at gmail.com Mon Feb 17 16:27:31 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Mon, 17 Feb 2014 21:27:31 +0000 (UTC)
Subject: [Numpy-discussion] allocated memory cache for numpy
References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org>
Message-ID: <1932399482414364977.236518sturla.molden-gmail.com@news.gmane.org>

Nathaniel Smith  wrote:

> Also, I'd be pretty wary of caching large chunks of unused memory. People
> already have a lot of trouble understanding their program's memory usage,
> and getting rid of 'greedy free' will make this even worse.

A cache would only be needed when there is a lot of computing going on, so
one could set an "early expire date" on cached chunks. Anything older than
a second or two could be returned. A cache would thus require a garbage
collector thread.

Sturla

From stefan.otte at gmail.com Mon Feb 17 16:39:46 2014
From: stefan.otte at gmail.com (Stefan Otte)
Date: Mon, 17 Feb 2014 22:39:46 +0100
Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
Message-ID: 

Hey guys,

I wrote myself a little helper function `mdot` which chains np.dot for
multiple arrays. So I can write

    mdot(A, B, C, D, E)

instead of these

    A.dot(B).dot(C).dot(D).dot(E)
    np.dot(np.dot(np.dot(np.dot(A, B), C), D), E)

I know you can use `numpy.matrix` to get nicer formulas. However, most
numpy/scipy function return arrays instead of numpy.matrix. Therefore,
sometimes you actually use array multiplication when you think you use
matrix multiplication. `mdot` is a simple way to avoid using
numpy.matrix but to improve the readability.

What do you think? Is this useful and worthy to integrate in numpy? 
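A minimal implementation is just a reduction with np.dot (a sketch without any clever ordering):

```python
from functools import reduce

import numpy as np

def mdot(*arrays):
    """Chain np.dot over all arguments: mdot(A, B, C) == A.dot(B).dot(C)."""
    return reduce(np.dot, arrays)
```

mdot(A, B, C) then gives the same result as A.dot(B).dot(C), evaluated left to right.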
I already created an issue for this:
https://github.com/numpy/numpy/issues/4311

jaimefrio also suggested to do some reordering of the arrays to
minimize computation:
https://github.com/numpy/numpy/issues/4311#issuecomment-35295857


Best,
 Stefan

From josef.pktd at gmail.com Mon Feb 17 16:57:13 2014
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 17 Feb 2014 16:57:13 -0500
Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
In-Reply-To: 
References: 
Message-ID: 

On Mon, Feb 17, 2014 at 4:39 PM, Stefan Otte  wrote:
> Hey guys,
>
> I wrote myself a little helper function `mdot` which chains np.dot for
> multiple arrays. So I can write
>
>     mdot(A, B, C, D, E)
>
> instead of these
>
>     A.dot(B).dot(C).dot(D).dot(E)
>     np.dot(np.dot(np.dot(np.dot(A, B), C), D), E)
>
> I know you can use `numpy.matrix` to get nicer formulas. However, most
> numpy/scipy function return arrays instead of numpy.matrix. Therefore,
> sometimes you actually use array multiplication when you think you use
> matrix multiplication. `mdot` is a simple way to avoid using
> numpy.matrix but to improve the readability.
>
> What do you think? Is this useful and worthy to integrate in numpy?
>
>
> I already created an issue for this:
> https://github.com/numpy/numpy/issues/4311
>
> jaimefrio also suggested to do some reordering of the arrays to
> minimize computation:
> https://github.com/numpy/numpy/issues/4311#issuecomment-35295857

statsmodels has a convenience chaindot, but most of the time I don't
like its usage, because of the missing brackets.

say, you have a (10000, 10) array and you use an intermediate (10000,
10000) array instead of a (10, 10) array

IIRC, for reordering I looked at this
http://www.mathworks.com/matlabcentral/fileexchange/27950-mmtimes-matrix-chain-product

Josef
(don't make it too easy for people to shoot themselves in ...) 
> 
>
> Best,
>  Stefan
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From jaime.frio at gmail.com Mon Feb 17 17:27:37 2014
From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=)
Date: Mon, 17 Feb 2014 14:27:37 -0800
Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
In-Reply-To: 
References: 
Message-ID: 

Perhaps you could reuse np.dot, by giving its second argument a default
None value, and passing a tuple as first argument, i.e. np.dot((a, b, c))
would compute a.dot(b).dot(c), possibly not in that order.

As is suggested in the matlab thread linked by Josef, if you do implement
an optimal ordering algorithm, then precalculating the ordering and passing
it in as an argument should be an option.

If I get a vote, I am definitely +1 on this, especially the more
sophisticated version.

On Feb 17, 2014 1:40 PM, "Stefan Otte"  wrote:
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hoogendoorn.eelco at gmail.com Mon Feb 17 18:23:00 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Tue, 18 Feb 2014 00:23:00 +0100
Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
In-Reply-To: 
References: 
Message-ID: 

Considering np.dot takes only its binary positional args and a single
defaulted kwarg, passing in a variable number of positional args as a list
makes sense. Then just call the builtin reduce on the list, and there you
go.

I also generally approve of such semantics for binary associative
operations.

On Mon, Feb 17, 2014 at 11:27 PM, Jaime Fernández del Río <
jaime.frio at gmail.com> wrote:

> Perhaps you could reuse np.dot, by giving its second argument a default
> None value, and passing a tuple as first argument, i.e. np.dot((a, b, c))
> would compute a.dot(b).dot(c), possibly not in that order. 
> > As is suggested in the matlab thread linked by Josef, if you do implement > an optimal ordering algorithm, then precalculating the ordering and passing > it in as an argument should be an option. > > If I get a vote, I am definitely +1 on this, especially the more > sophisticated version. > > On Feb 17, 2014 1:40 PM, "Stefan Otte" wrote: > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Feb 17 18:52:06 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 17 Feb 2014 18:52:06 -0500 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: On Mon, Feb 17, 2014 at 4:57 PM, wrote: > On Mon, Feb 17, 2014 at 4:39 PM, Stefan Otte wrote: >> Hey guys, >> >> I wrote myself a little helper function `mdot` which chains np.dot for >> multiple arrays. So I can write >> >> mdot(A, B, C, D, E) >> >> instead of these >> >> A.dot(B).dot(C).dot(D).dot(E) >> np.dot(np.dot(np.dot(np.dot(A, B), C), D), E) >> >> I know you can use `numpy.matrix` to get nicer formulas. However, most >> numpy/scipy function return arrays instead of numpy.matrix. Therefore, >> sometimes you actually use array multiplication when you think you use >> matrix multiplication. `mdot` is a simple way to avoid using >> numpy.matrix but to improve the readability. >> >> What do you think? Is this useful and worthy to integrate in numpy? 
>>
>>
>> I already created an issue for this:
>> https://github.com/numpy/numpy/issues/4311
>>
>> jaimefrio also suggested to do some reordering of the arrays to
>> minimize computation:
>> https://github.com/numpy/numpy/issues/4311#issuecomment-35295857
>
> statsmodels has a convenience chaindot, but most of the time I don't
> like its usage, because of the missing brackets.
>
> say, you have a (10000, 10) array and you use an intermediate (10000,
> 10000) array instead of a (10, 10) array

>>> nobs = 10000
>>> v = np.diag(np.ones(4))
>>> x = np.random.randn(nobs, 4)
>>> y = np.random.randn(nobs, 3)

>>> reduce(np.dot, [x, v, x.T, y]).shape

>>> def dotp(x, y):
    xy = np.dot(x,y)
    print xy.shape
    return xy

>>> reduce(dotp, [x, v, x.T, y]).shape
(10000, 4)
(10000, 10000)
(10000, 3)
(10000, 3)

>>> def dotTp(x, y):
    xy = np.dot(x.T,y.T)
    print xy.shape
    return xy.T

>>> reduce(dotTp, [x, v, x.T, y][::-1]).shape
(3, 4)
(3, 4)
(3, 10000)
(10000, 3)

Josef

>
> IIRC, for reordering I looked at this
> http://www.mathworks.com/matlabcentral/fileexchange/27950-mmtimes-matrix-chain-product
>
> Josef
> (don't make it too easy for people to shoot themselves in ...) 
> >> >> >> Best, >> Stefan >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From njs at pobox.com Mon Feb 17 18:56:31 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 17 Feb 2014 18:56:31 -0500 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: <53027759.5040604@seefeld.name> References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> <53027759.5040604@seefeld.name> Message-ID: On Mon, Feb 17, 2014 at 3:55 PM, Stefan Seefeld wrote: > On 02/17/2014 03:42 PM, Nathaniel Smith wrote: >> Another optimization we should consider that might help a lot in the >> same situations where this would help: for code called from the >> cpython eval loop, it's afaict possible to determine which inputs are >> temporaries by checking their refcnt. In the second call to __add__ in >> '(a + b) + c', the temporary will have refcnt 1, while the other >> arrays will all have refcnt >1. In such cases (subject to various >> sanity checks on shape, dtype, etc) we could elide temporaries by >> reusing the input array for the output. The risk is that there may be >> some code out there that calls these operations directly from C with >> non-temp arrays that nonetheless have refcnt 1, but we should at least >> investigate the feasibility. E.g. maybe we can do the optimization for >> tp_add but not PyArray_Add. > > For element-wise operations such as the above, wouldn't it be even > better to use loop fusion, by evaluating the entire compound expression > per element, instead of each individual operation ? That would require > methods such as __add__ to return an operation object, rather than the > result value. I believe a technique like that is used in the numexpr > package (https://github.com/pydata/numexpr), which I saw announced here > recently... 
Hi Stefan (long time no see!),

Sure, that would be an excellent thing, but adding a delayed
evaluation engine to numpy is a big piece of new code, and we'd want
to make it something you opt-in to explicitly. (There are too many
weird potential issues with e.g. errors showing up at some far away
place from the actual broken code, due to evaluation being delayed to
there.) By contrast, the optimization suggested here is a tiny change
we could do now, and would still be useful even in the hypothetical
future where we do have lazy evaluation, for anyone who doesn't use
it.

-n

From jtaylor.debian at googlemail.com Mon Feb 17 19:23:41 2014
From: jtaylor.debian at googlemail.com (Julian Taylor)
Date: Tue, 18 Feb 2014 01:23:41 +0100
Subject: [Numpy-discussion] allocated memory cache for numpy
In-Reply-To: <1932399482414364977.236518sturla.molden-gmail.com@news.gmane.org>
References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> <1932399482414364977.236518sturla.molden-gmail.com@news.gmane.org>
Message-ID: <5302A80D.4030304@googlemail.com>

On 17.02.2014 22:27, Sturla Molden wrote:
> Nathaniel Smith  wrote:
>> Also, I'd be pretty wary of caching large chunks of unused memory. People
>> already have a lot of trouble understanding their program's memory usage,
>> and getting rid of 'greedy free' will make this even worse.
> 
> A cache would only be needed when there is a lot of computing going on, so
> one could set an "early expire date" on cached chunks. Anything older than
> a second or two could be returned. A cache would thus require a garbage
> collector thread.
> 

I was thinking of something much simpler, just a layer of pointer stacks
for different allocation sizes, the larger the size the smaller the
cache with pessimistic defaults.
e.g. the largest default cache layer is 128MB with one or two entries
so we can cache temporally close operations like a + (c * b). 
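In Python pseudocode the scheme could look roughly like this (a toy model for illustration; the real version would sit in C behind numpy's array allocator and hold raw pointers instead of bytearrays):

```python
from collections import defaultdict

class AllocationCache:
    """Toy model of a size-bucketed free-list cache for array buffers."""

    def __init__(self, max_per_bucket=2, max_total=512 * 2**20):
        self._buckets = defaultdict(list)   # size -> free blocks of that size
        self._max_per_bucket = max_per_bucket
        self._max_total = max_total
        self._cached_bytes = 0

    def allocate(self, size):
        bucket = self._buckets[size]
        if bucket:
            # Reuse a cached block: no malloc/mmap, no kernel page zeroing.
            self._cached_bytes -= size
            return bucket.pop()
        return bytearray(size)              # stands in for a real allocation

    def free(self, block):
        size = len(block)
        bucket = self._buckets[size]
        if (len(bucket) < self._max_per_bucket
                and self._cached_bytes + size <= self._max_total):
            bucket.append(block)
            self._cached_bytes += size
        # else: drop the reference, i.e. actually release the memory
```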
Maybe an age counter can be included that clears the largest old entry if it hasn't been used for X allocations. We can also monitor the allocations and make the maximum cache layer a fraction of the biggest allocation that occurred during the runtime. Devising a good scheme is probably tricky, but we can start out with something simple like limiting the cache to 512MB, which will already benefit many cases while still being an acceptable amount of memory to waste on a modern machine. From stefan at seefeld.name Mon Feb 17 19:45:05 2014 From: stefan at seefeld.name (Stefan Seefeld) Date: Mon, 17 Feb 2014 19:45:05 -0500 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> <53027759.5040604@seefeld.name> Message-ID: <5302AD11.9050702@seefeld.name> On 02/17/2014 06:56 PM, Nathaniel Smith wrote: > On Mon, Feb 17, 2014 at 3:55 PM, Stefan Seefeld wrote: >> On 02/17/2014 03:42 PM, Nathaniel Smith wrote: >>> Another optimization we should consider that might help a lot in the >>> same situations where this would help: for code called from the >>> cpython eval loop, it's afaict possible to determine which inputs are >>> temporaries by checking their refcnt. In the second call to __add__ in >>> '(a + b) + c', the temporary will have refcnt 1, while the other >>> arrays will all have refcnt >1. In such cases (subject to various >>> sanity checks on shape, dtype, etc) we could elide temporaries by >>> reusing the input array for the output. The risk is that there may be >>> some code out there that calls these operations directly from C with >>> non-temp arrays that nonetheless have refcnt 1, but we should at least >>> investigate the feasibility. E.g. maybe we can do the optimization for >>> tp_add but not PyArray_Add.
>> For element-wise operations such as the above, wouldn't it be even >> better to use loop fusion, by evaluating the entire compound expression >> per element, instead of each individual operation ? That would require >> methods such as __add__ to return an operation object, rather than the >> result value. I believe a technique like that is used in the numexpr >> package (https://github.com/pydata/numexpr), which I saw announced here >> recently... > Hi Stefan (long time no see!), Indeed ! :-) > Sure, that would be an excellent thing, but adding a delayed > evaluation engine to numpy is a big piece of new code, and we'd want > to make it something you opt-in to explicitly. (There are too many > weird potential issues with e.g. errors showing up at some far away > place from the actual broken code, due to evaluation being delayed to > there.) By contrast, the optimization suggested here is a tiny change > we could do now, and would still be useful even in the hypothetical > future where we do have lazy evaluation, for anyone who doesn't use > it. Sure, I fully agree. I didn't mean to suggest this as an alternative to a focused memory management optimization. Still, it seems this would be a nice project (perhaps even under the GSoC umbrella). It could be controlled by a metaclass (substituting appropriate ndarray methods), and thus could be enabled separately and explicitly. Anyhow, just an idea for someone else to pick up. :-) Stefan -- ...ich hab' noch einen Koffer in Berlin... From cournape at gmail.com Mon Feb 17 19:47:30 2014 From: cournape at gmail.com (David Cournapeau) Date: Tue, 18 Feb 2014 00:47:30 +0000 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: <530263AC.4030304@googlemail.com> References: <530263AC.4030304@googlemail.com> Message-ID: On Mon, Feb 17, 2014 at 7:31 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > hi, > I noticed that during some simplistic benchmarks (e.g. 
> https://github.com/numpy/numpy/issues/4310) a lot of time is spent in > the kernel zeroing pages. > This is because under linux glibc will always allocate large memory > blocks with mmap. As these pages can come from other processes the > kernel must zero them for security reasons. > Do you have numbers for 'a lot of time' ? Is the above script the exact one you used for benchmarking this issue ? > For memory within the numpy process this is unnecessary and possibly a > large overhead for the many temporaries numpy creates. > > The behavior of glibc can be tuned to change the threshold at which it > starts using mmap but that would be a platform specific fix. > > I was thinking about adding a thread local cache of pointers to > allocated memory. > When an array is created it tries to get its memory from the cache and > when it's deallocated it returns it to the cache. > The threshold and cached memory block sizes could be adaptive depending > on the application workload. > > For simplistic temporary-heavy benchmarks this eliminates the time spent > in the kernel (system time with time). > For this kind of setup, I would advise to look into perf on linux. It should be much more precise than time. If nobody beats me to it, I can try to look at this this WE, > But I don't know how relevant this is for real world applications. > Have you noticed large amounts of time spent in the kernel in your apps? > In my experience, more time is spent on figuring out how to spare memory than on speeding up this kind of operation for 'real life applications' (TM). What happens to your benchmark if you tune malloc to not use mmap at all ? David > > I also found this paper which describes pretty much exactly what I'm > proposing: > pyhpc.org/workshop/papers/Doubling.pdf > > Does someone know why their changes were never incorporated in numpy? I > couldn't find a reference in the list archive.
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Feb 17 20:13:19 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 17 Feb 2014 18:13:19 -0700 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. Message-ID: This is apropos issue #899 , where it is suggested that power promote integers to float. That sounds reasonable to me, but such a change in behavior makes it a bit iffy. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From alan.isaac at gmail.com Mon Feb 17 20:48:21 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 17 Feb 2014 20:48:21 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: Message-ID: <5302BBE5.7010509@gmail.com> On 2/17/2014 8:13 PM, Charles R Harris wrote: > This is apropos issue #899 , where it is suggested that power promote integers to float. Even when base and exponent are both positive integers? Alan Isaac From argriffi at ncsu.edu Mon Feb 17 20:59:42 2014 From: argriffi at ncsu.edu (alex) Date: Mon, 17 Feb 2014 20:59:42 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: Message-ID: On Mon, Feb 17, 2014 at 8:13 PM, Charles R Harris wrote: > This is apropos issue #899, where it is suggested that power promote > integers to float. That sounds reasonable to me, but such a change in > behavior makes it a bit iffy. > > Thoughts? After this change, what would be the recommended way to get entrywise positive integer powers of an array with integer entries? From charles at crunch.io Mon Feb 17 22:18:08 2014 From: charles at crunch.io (Charles G. 
Waldman) Date: Mon, 17 Feb 2014 21:18:08 -0600 Subject: [Numpy-discussion] bug with mmap'ed datetime64 arrays Message-ID: test case: #!/usr/bin/env python import numpy as np a=np.array(['2014', '2015', '2016'], dtype='datetime64') x=np.datetime64('2015') print a>x np.save('test.npy', a) b = np.load('test.npy', mmap_mode='c') print b>x result: >>> [False False True] Traceback (most recent call last): File "", line 1, in File "/tmp/t.py", line 12, in print b>x File "/usr/lib64/python2.7/site-packages/numpy/core/memmap.py", line 279, in __array_finalize__ if hasattr(obj, '_mmap') and np.may_share_memory(self, obj): File "/usr/lib64/python2.7/site-packages/numpy/lib/utils.py", line 298, in may_share_memory a_low, a_high = byte_bounds(a) File "/usr/lib64/python2.7/site-packages/numpy/lib/utils.py", line 258, in byte_bounds bytes_a = int(ai['typestr'][2:]) ValueError: invalid literal for int() with base 10: '8[Y]' fix: diff --git a/numpy/lib/utils.py b/numpy/lib/utils.py index 1f1cdfc..c73f2f1 100644 --- a/numpy/lib/utils.py +++ b/numpy/lib/utils.py @@ -210,7 +210,7 @@ def byte_bounds(a): a_data = ai['data'][0] astrides = ai['strides'] ashape = ai['shape'] - bytes_a = int(ai['typestr'][2:]) + bytes_a = a.dtype.itemsize a_low = a_high = a_data if astrides is None: # contiguous case will submit pull request via github From francesc at continuum.io Tue Feb 18 03:08:10 2014 From: francesc at continuum.io (Francesc Alted) Date: Tue, 18 Feb 2014 09:08:10 +0100 Subject: [Numpy-discussion] ANN: numexpr 2.3.1 released Message-ID: <530314EA.8060106@continuum.io> ========================== Announcing Numexpr 2.3.1 ========================== Numexpr is a fast numerical expression evaluator for NumPy. With it, expressions that operate on arrays (like "3*a+4*b") are accelerated and use less memory than doing the same calculation in Python. 
It offers multi-threaded capabilities, as well as support for Intel's MKL (Math Kernel Library), which allows an extremely fast evaluation of transcendental functions (sin, cos, tan, exp, log...) while squeezing the last drop of performance out of your multi-core processors. Look here for some benchmarks of numexpr using MKL: https://github.com/pydata/numexpr/wiki/NumexprMKL Its only dependency is NumPy (MKL is optional), so it works well as an easy-to-deploy, easy-to-use, computational engine for projects that don't want to adopt other solutions requiring heavier dependencies. What's new ========== * Added support for shift-left (<<) and shift-right (>>) binary operators. See PR #131. Thanks to fish2000! * Removed the rpath flag for the GCC linker, because it is probably not necessary and it chokes clang. In case you want to know in more detail what has changed in this version, see: https://github.com/pydata/numexpr/wiki/Release-Notes or have a look at RELEASE_NOTES.txt in the tarball. Where can I find Numexpr? ========================= The project is hosted at GitHub: https://github.com/pydata/numexpr You can get the packages from PyPI as well (but not for RC releases): http://pypi.python.org/pypi/numexpr Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. Enjoy data! -- Francesc Alted From jtaylor.debian at googlemail.com Tue Feb 18 04:05:31 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 18 Feb 2014 10:05:31 +0100 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> Message-ID: On Tue, Feb 18, 2014 at 1:47 AM, David Cournapeau wrote: > > On Mon, Feb 17, 2014 at 7:31 PM, Julian Taylor > wrote: >> >> hi, >> I noticed that during some simplistic benchmarks (e.g. >> https://github.com/numpy/numpy/issues/4310) a lot of time is spent in >> the kernel zeroing pages.
>> This is because under linux glibc will always allocate large memory >> blocks with mmap. As these pages can come from other processes the >> kernel must zero them for security reasons. > > Do you have numbers for 'a lot of time' ? Is the above script the exact one > you used for benchmarking this issue ? I saw it in many benchmarks I did over time for the numerous little improvements I added. But I'm aware these are overly simplistic, that's why I'm asking for numbers from real applications. The paper I found ran somewhat more thorough benchmarks and seems to indicate that more applications profit from it. > >> >> For memory within the numpy process this is unnecessary and possibly a >> large overhead for the many temporaries numpy creates. >> >> The behavior of glibc can be tuned to change the threshold at which it >> starts using mmap but that would be a platform specific fix. >> >> I was thinking about adding a thread local cache of pointers to >> allocated memory. >> When an array is created it tries to get its memory from the cache and >> when it's deallocated it returns it to the cache. >> The threshold and cached memory block sizes could be adaptive depending >> on the application workload. >> >> For simplistic temporary-heavy benchmarks this eliminates the time spent >> in the kernel (system time with time). > > For this kind of setup, I would advise to look into perf on linux. It should > be much more precise than time. > > If nobody beats me to it, I can try to look at this this WE, I'm using perf for most of my benchmarks. But in this case time is sufficient as the system time is all you need to know. perf confirms this time is almost all spent zeroing pages.
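The size-layered cache Julian describes can be sketched in a few lines of Python. This is a toy model only: the `CachingAllocator` name, the per-size cap, and the hit/miss counters are illustrative assumptions, not numpy internals, and a real implementation would live at the C level with thread-local storage and the aging/eviction policy discussed in the thread.

```python
import collections

class CachingAllocator:
    """Toy model of the proposed allocation cache: freed buffers are
    kept on per-size stacks and handed back out on the next request,
    so temporary-heavy expressions like a + (c * b) reuse memory
    instead of paying for fresh, kernel-zeroed mmap'ed pages."""

    def __init__(self, max_cached_per_size=2):
        self._stacks = collections.defaultdict(list)  # size -> free buffers
        self._max = max_cached_per_size               # pessimistic per-size cap
        self.hits = 0
        self.misses = 0

    def allocate(self, nbytes):
        stack = self._stacks[nbytes]
        if stack:
            self.hits += 1
            return stack.pop()        # reuse a cached buffer
        self.misses += 1
        return bytearray(nbytes)      # fresh (zeroed) allocation

    def free(self, buf):
        stack = self._stacks[len(buf)]
        if len(stack) < self._max:    # cache instead of releasing
            stack.append(buf)

alloc = CachingAllocator()
for _ in range(1000):                 # simulate 1000 temporaries
    tmp = alloc.allocate(1 << 20)     # 1 MiB scratch buffer
    alloc.free(tmp)
print(alloc.hits, alloc.misses)       # -> 999 1
```

Only the very first allocation misses; every later temporary of the same size is served from the cache, which is the effect that would eliminate the page-zeroing system time seen in the benchmarks.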
From stefan.otte at gmail.com Tue Feb 18 04:17:56 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Tue, 18 Feb 2014 10:17:56 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: Just to give an idea about the performance implications, I timed the operations on my machine %timeit reduce(dotp, [x, v, x.T, y]).shape 1 loops, best of 3: 1.32 s per loop %timeit reduce(dotTp, [x, v, x.T, y][::-1]).shape 1000 loops, best of 3: 394 µs per loop I was just interested in nicer formulas, but if the side effect is a performance improvement I can live with that. Pauli Virtanen posted in the issue an older discussion on the mailing list: http://thread.gmane.org/gmane.comp.python.numeric.general/14288/ Best regards, Stefan On Tue, Feb 18, 2014 at 12:52 AM, wrote: > On Mon, Feb 17, 2014 at 4:57 PM, wrote: >> On Mon, Feb 17, 2014 at 4:39 PM, Stefan Otte wrote: >>> Hey guys, >>> >>> I wrote myself a little helper function `mdot` which chains np.dot for >>> multiple arrays. So I can write >>> >>> mdot(A, B, C, D, E) >>> >>> instead of these >>> >>> A.dot(B).dot(C).dot(D).dot(E) >>> np.dot(np.dot(np.dot(np.dot(A, B), C), D), E) >>> >>> I know you can use `numpy.matrix` to get nicer formulas. However, most >>> numpy/scipy functions return arrays instead of numpy.matrix. Therefore, >>> sometimes you actually use array multiplication when you think you use >>> matrix multiplication. `mdot` is a simple way to avoid using >>> numpy.matrix while improving readability. >>> >>> What do you think? Is this useful and worth integrating into numpy?
>>> >>> I already created an issue for this: >>> https://github.com/numpy/numpy/issues/4311 >>> >>> jaimefrio also suggested doing some reordering of the arrays to >>> minimize computation: >>> https://github.com/numpy/numpy/issues/4311#issuecomment-35295857 >> >> statsmodels has a convenience chaindot, but most of the time I don't >> like its usage, because of the missing brackets. >> >> say, you have a (10000, 10) array and you use an intermediate (10000, >> 10000) array instead of a (10, 10) array > >>>> nobs = 10000 >>>> v = np.diag(np.ones(4)) >>>> x = np.random.randn(nobs, 4) >>>> y = np.random.randn(nobs, 3) >>>> reduce(np.dot, [x, v, x.T, y]).shape > > >>>> def dotp(x, y): > xy = np.dot(x,y) > print xy.shape > return xy > >>>> reduce(dotp, [x, v, x.T, y]).shape > (10000, 4) > (10000, 10000) > (10000, 3) > (10000, 3) > >>>> def dotTp(x, y): > xy = np.dot(x.T,y.T) > print xy.shape > return xy.T > >>>> reduce(dotTp, [x, v, x.T, y][::-1]).shape > (3, 4) > (3, 4) > (3, 10000) > (10000, 3) > > Josef > >> >> IIRC, for reordering I looked at this >> http://www.mathworks.com/matlabcentral/fileexchange/27950-mmtimes-matrix-chain-product >> >> Josef >> (don't make it too easy for people to shoot themselves in ...)
>> >>> >>> Best, >>> Stefan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Tue Feb 18 04:39:06 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 09:39:06 +0000 (UTC) Subject: [Numpy-discussion] allocated memory cache for numpy References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> <1932399482414364977.236518sturla.molden-gmail.com@news.gmane.org> <5302A80D.4030304@googlemail.com> Message-ID: <1905671494414407688.407658sturla.molden-gmail.com@news.gmane.org> Julian Taylor wrote: > I was thinking of something much simpler, just a layer of pointer stacks > for different allocation sizes, the larger the size the smaller the > cache with pessimistic defaults. > e.g. the largest default cache layer is 128MB with one or two > entries so we can cache temporally close operations like a + (c * b). > Maybe an age counter can be included that clears the largest old entry > if it hasn't been used for X allocations. It would not be difficult if we e.g. used two heaps instead of one. One would sort the cached blocks according to size (similar to malloc), the other would be a priority queue for age. Every now and then the expired entries would be cleared off the cache. Sturla From sturla.molden at gmail.com Tue Feb 18 04:51:16 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 09:51:16 +0000 (UTC) Subject: [Numpy-discussion] Proposal to make power return float, and other such things.
References: Message-ID: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> Charles R Harris wrote: > This is apropos issue #899 <https://github.com/numpy/numpy/issues/899>, > where it is suggested that power promote integers to float. That sounds > reasonable to me, but such a change in behavior makes it a bit iffy. > > Thoughts? Numpy should do the same as Python does. Sturla From robert.kern at gmail.com Tue Feb 18 05:55:20 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 10:55:20 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 9:51 AM, Sturla Molden wrote: > Charles R Harris wrote: >> This is apropos issue #899 <https://github.com/numpy/numpy/issues/899>, >> where it is suggested that power promote integers to float. That sounds >> reasonable to me, but such a change in behavior makes it a bit iffy. >> >> Thoughts? > > Numpy should do the same as Python does. That's problematic because Python's behavior is value dependent. Python 3.3.1 (default, May 16 2013, 17:20:13) [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> 2 ** 2 4 >>> 2 ** -2 0.25 That's fine if you only have one value for each operand. When you have multiple values for each operand, say an exponent array containing both positive and negative integers, that becomes a problem. Generally, we try to make ufuncs return types that are predictable from the types of the operands, not the values of the operands. I am -1 on the proposal to make power(x:int, y:int) always return a float.
It is usually trivial to just make the exponent a float if one wants a float returned. Or we could introduce an fpow() that always coerces the inputs to the best inexact type. -- Robert Kern From toddrjen at gmail.com Tue Feb 18 05:59:53 2014 From: toddrjen at gmail.com (Todd) Date: Tue, 18 Feb 2014 11:59:53 +0100 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> Message-ID: On Feb 18, 2014 11:55 AM, "Robert Kern" wrote: > > On Tue, Feb 18, 2014 at 9:51 AM, Sturla Molden wrote: > > Charles R Harris wrote: > >> This is apropos issue #899 <https://github.com/numpy/numpy/issues/899>, > >> where it is suggested that power promote integers to float. That sounds > >> reasonable to me, but such a change in behavior makes it a bit iffy. > >> > >> Thoughts? > > > > Numpy should do the same as Python does. > > That's problematic because Python's behavior is value dependent. > > Python 3.3.1 (default, May 16 2013, 17:20:13) > [GCC 4.2.1 Compatible Apple Clang 4.1 ((tags/Apple/clang-421.11.66))] on darwin > Type "help", "copyright", "credits" or "license" for more information. > >>> 2 ** 2 > 4 > >>> 2 ** -2 > 0.25 > > That's fine if you only have one value for each operand. When you have > multiple values for each operand, say an exponent array containing > both positive and negative integers, that becomes a problem. > Generally, we try to make ufuncs return types that are predictable > from the types of the operands, not the values of the operands. > > I am -1 on the proposal to make power(x:int, y:int) always return a > float. It is usually trivial to just make the exponent a float if one > wants a float returned. Or we could introduce an fpow() that always > coerces the inputs to the best inexact type. What about making it floats for int types but int for uint types?
-------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Feb 18 06:10:07 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 11:10:07 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 10:59 AM, Todd wrote: > > On Feb 18, 2014 11:55 AM, "Robert Kern" wrote: >> I am -1 on the proposal to make power(x:int, y:int) always return a >> float. It is usually trivial to just make the exponent a float if one >> wants a float returned. Or we could introduce an fpow() that always >> coerces the inputs to the best inexact type. > > What about making it floats for int types but int for uint types? Doesn't really do it for me. int_array**2 would now return a float for no good reason. -- Robert Kern From sturla.molden at gmail.com Tue Feb 18 06:19:54 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 11:19:54 +0000 (UTC) Subject: [Numpy-discussion] Proposal to make power return float, and other such things. References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> Message-ID: <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > That's fine if you only have one value for each operand. When you have > multiple values for each operand, say an exponent array containing > both positive and negative integers, that becomes a problem. I don't really see why. If you have a negative integer in there you get a float array returned, otherwise it stays integer. > Generally, we try to make ufuncs return types that are predictable > from the types of the operands, not the values of the operands. Isn't that unpractical in this case? Who cares if the power operator behaves as an ufunc? 
Sturla From robert.kern at gmail.com Tue Feb 18 06:25:18 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 11:25:18 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 11:19 AM, Sturla Molden wrote: > Robert Kern wrote: > >> That's fine if you only have one value for each operand. When you have >> multiple values for each operand, say an exponent array containing >> both positive and negative integers, that becomes a problem. > > I don't really see why. If you have a negative integer in there you get a > float array returned, otherwise it stays integer. We don't do this for any other ufunc. >> Generally, we try to make ufuncs return types that are predictable >> from the types of the operands, not the values of the operands. > > Isn't that unpractical in this case? Who cares if the power operator > behaves as an ufunc? We're talking about numpy.power(), not just ndarray.__pow__(). The equivalence of the two is indeed an implementation detail, but I do think that it is useful to maintain the equivalence. If we didn't, it would be the only exception, to my knowledge. -- Robert Kern From sturla.molden at gmail.com Tue Feb 18 06:44:22 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 11:44:22 +0000 (UTC) Subject: [Numpy-discussion] Proposal to make power return float, and other such things. References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> Message-ID: <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Robert Kern wrote: > We're talking about numpy.power(), not just ndarray.__pow__(). 
The > equivalence of the two is indeed an implementation detail, but I do > think that it is useful to maintain the equivalence. If we didn't, it > would be the only exception, to my knowledge. But in this case it makes sense. math.pow(2,2) and 2**2 do not do the same. That is how Python behaves. Sturla From robert.kern at gmail.com Tue Feb 18 06:57:21 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 11:57:21 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 11:44 AM, Sturla Molden wrote: > Robert Kern wrote: > >> We're talking about numpy.power(), not just ndarray.__pow__(). The >> equivalence of the two is indeed an implementation detail, but I do >> think that it is useful to maintain the equivalence. If we didn't, it >> would be the only exception, to my knowledge. > > But in this case it makes sense. Every proposed special case we come up with "makes sense" in some way. That doesn't mean that they are special enough to break the rules. In my opinion, this is not special enough to break the rules. In your opinion, it is. > math.pow(2,2) and 2**2 do not do the same. That is how Python behaves. Yes, because the functions in the math module are explicitly thin wrappers around floating-point C library functions and don't have any particular relationship to the special methods on int objects. numpy does have a largely 1:1 relationship between its ndarray operator special methods and the ufuncs that implement them. I believe this is a useful relationship for learning the API and predicting what a given expression is going to do.
-- Robert Kern From njs at pobox.com Tue Feb 18 07:00:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 07:00:49 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: Perhaps integer power should raise an error on negative powers? That way people will at least be directed to use arr ** -1.0 instead of silently getting nonsense from arr ** -1. On 18 Feb 2014 06:57, "Robert Kern" wrote: > On Tue, Feb 18, 2014 at 11:44 AM, Sturla Molden > wrote: > > Robert Kern wrote: > > > >> We're talking about numpy.power(), not just ndarray.__pow__(). The > >> equivalence of the two is indeed an implementation detail, but I do > >> think that it is useful to maintain the equivalence. If we didn't, it > >> would be the only exception, to my knowledge. > > > > But in this case it makes sence. > > Every proposed special case we come up with "makes sense" in some way. > That doesn't mean that they are special enough to break the rules. In > my opinion, this is not special enough to break the rules. In your > opinion, it is. > > > math.pow(2,2) and 2**2 does not do the same. That is how Python behaves. > > Yes, because the functions in the math module are explicitly thin > wrappers around floating-point C library functions and don't have any > particular relationship to the special methods on int objects. numpy > does have largely 1:1 relationship between its ndarray operator > special methods and the ufuncs that implement them. I believe this is > a useful relationship for learning the API and predicting what a given > expression is going to do. 
> > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Feb 18 07:06:55 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 12:06:55 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: > Perhaps integer power should raise an error on negative powers? That way > people will at least be directed to use arr ** -1.0 instead of silently > getting nonsense from arr ** -1. Controllable by np.seterr(invalid=...)? I could get behind that. -- Robert Kern From sebastian at sipsolutions.net Tue Feb 18 07:06:32 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2014 13:06:32 +0100 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: <1392725192.5100.5.camel@sebastian-t440> On Di, 2014-02-18 at 11:44 +0000, Sturla Molden wrote: > Robert Kern wrote: > > > We're talking about numpy.power(), not just ndarray.__pow__(). The > > equivalence of the two is indeed an implementation detail, but I do > > think that it is useful to maintain the equivalence. If we didn't, it > > would be the only exception, to my knowledge. 
> > But in this case it makes sense. > > math.pow(2,2) and 2**2 do not do the same. That is how Python behaves. > To be honest, that comparison only makes half sense to me. The functions in `math` are all float (double precision) functions; basically you could just as well call the library `fmath`... I am -0.5 right now. `__pow__` already has special behaviour as opposed to `np.power`, but these are for speed and don't change behaviour. The `uint` idea seems to just make things more complicated. If I am aware of uint vs int, I am already aware I can just cast to float. - Sebastian > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla.molden at gmail.com Tue Feb 18 07:24:26 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 12:24:26 +0000 (UTC) Subject: [Numpy-discussion] Proposal to make power return float, and other such things. References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: <99187718414419010.318284sturla.molden-gmail.com@news.gmane.org> Nathaniel Smith wrote: > Perhaps integer power should raise an error on negative powers? That way > people will at least be directed to use arr ** -1.0 instead of silently > getting nonsense from arr ** -1. That sounds far better than silently returning erroneous results. Sturla From njs at pobox.com Tue Feb 18 08:46:10 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 08:46:10 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things.
In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On 18 Feb 2014 07:07, "Robert Kern" wrote: > > On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: > > Perhaps integer power should raise an error on negative powers? That way > > people will at least be directed to use arr ** -1.0 instead of silently > > getting nonsense from arr ** -1. > > Controllable by np.seterr(invalid=...)? I could get behind that. I'm not sure the np.seterr part would work or be a good idea, given that we have no way to return or propagate NaN... I vote for just an unconditional error. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Feb 18 08:53:40 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 13:53:40 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 1:46 PM, Nathaniel Smith wrote: > On 18 Feb 2014 07:07, "Robert Kern" wrote: >> >> On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: >> > Perhaps integer power should raise an error on negative powers? That way >> > people will at least be directed to use arr ** -1.0 instead of silently >> > getting nonsense from arr ** -1. >> >> Controllable by np.seterr(invalid=...)? I could get behind that. > > I'm not sure the np.seterr part would work or be a good idea, given that we > have no way to return or propagate NaN... I vote for just an unconditional > error. 
We issue configurable warning/error/ignore behavior for integer 0/0 through this mechanism too without any NaNs. However, that's `divide` and not `invalid`. Your point is taken that `invalid` usually implies that a `NaN` is generated, though I don't think this is ever stated anywhere. I just suggested `invalid` as that is usually what we use for function domain violations. -- Robert Kern From josef.pktd at gmail.com Tue Feb 18 09:37:29 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 18 Feb 2014 09:37:29 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 8:53 AM, Robert Kern wrote: > On Tue, Feb 18, 2014 at 1:46 PM, Nathaniel Smith wrote: >> On 18 Feb 2014 07:07, "Robert Kern" wrote: >>> >>> On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: >>> > Perhaps integer power should raise an error on negative powers? That way >>> > people will at least be directed to use arr ** -1.0 instead of silently >>> > getting nonsense from arr ** -1. >>> >>> Controllable by np.seterr(invalid=...)? I could get behind that. >> >> I'm not sure the np.seterr part would work or be a good idea, given that we >> have no way to return or propagate NaN... I vote for just an unconditional >> error. > > We issue configurable warning/error/ignore behavior for > integer 0/0 through this mechanism too without any NaNs. However, > that's `divide` and not `invalid`. Your point is taken that `invalid` > usually implies that a `NaN` is generated, though I don't think this > is ever stated anywhere. I just suggested `invalid` as that is usually > what we use for function domain violations. I thought 0/0 = 0 has been removed a few versions ago. 
Does numpy still have silent casting of nan to 0 in ints? I thought invalid and divide error/warnings are for floating point when we want to signal that the outcome is nan or inf, not that we are casting and returning a result that is just wrong. Josef > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Tue Feb 18 09:42:48 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 18 Feb 2014 14:42:48 +0000 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 2:37 PM, wrote: > On Tue, Feb 18, 2014 at 8:53 AM, Robert Kern wrote: >> On Tue, Feb 18, 2014 at 1:46 PM, Nathaniel Smith wrote: >>> On 18 Feb 2014 07:07, "Robert Kern" wrote: >>>> >>>> On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: >>>> > Perhaps integer power should raise an error on negative powers? That way >>>> > people will at least be directed to use arr ** -1.0 instead of silently >>>> > getting nonsense from arr ** -1. >>>> >>>> Controllable by np.seterr(invalid=...)? I could get behind that. >>> >>> I'm not sure the np.seterr part would work or be a good idea, given that we >>> have no way to return or propagate NaN... I vote for just an unconditional >>> error. >> >> We issue configurable warning/error/ignore behavior for >> integer 0/0 through this mechanism too without any NaNs. However, >> that's `divide` and not `invalid`. Your point is taken that `invalid` >> usually implies that a `NaN` is generated, though I don't think this >> is ever stated anywhere.
I just suggested `invalid` as that is usually >> what we use for function domain violations. > > I thought 0/0 = 0 has been removed a few versions ago. Does numpy > still have silent casting of nan to 0 in ints. There is no casting involved. > >> I thought invalid and divide error/warnings are for floating point >> when we want to signal that the outcome is nan or inf, not that we are >> casting and return a result that is just wrong. No, they are also for some integer operations without going through a floating point intermediate. -- Robert Kern From charlesr.harris at gmail.com Tue Feb 18 09:51:37 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2014 07:51:37 -0700 Subject: [Numpy-discussion] Numpy type inheritance Message-ID: Hi All, I'm looking for some expert explication of the problems in issues #1398 and #1397. The segfault in the second looks nasty. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue Feb 18 09:54:13 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 18 Feb 2014 09:54:13 -0500 Subject: [Numpy-discussion] Proposal to make power return float, and other such things. In-Reply-To: References: <363066132414409842.296834sturla.molden-gmail.com@news.gmane.org> <1664835161414414945.367849sturla.molden-gmail.com@news.gmane.org> <1314443430414416459.112438sturla.molden-gmail.com@news.gmane.org> Message-ID: On Tue, Feb 18, 2014 at 9:42 AM, Robert Kern wrote: > On Tue, Feb 18, 2014 at 2:37 PM, wrote: >> On Tue, Feb 18, 2014 at 8:53 AM, Robert Kern wrote: >>> On Tue, Feb 18, 2014 at 1:46 PM, Nathaniel Smith wrote: >>>> On 18 Feb 2014 07:07, "Robert Kern" wrote: >>>>> >>>>> On Tue, Feb 18, 2014 at 12:00 PM, Nathaniel Smith wrote: >>>>> > Perhaps integer power should raise an error on negative powers? That way >>>>> > people will at least be directed to use arr ** -1.0 instead of silently >>>>> > getting nonsense from arr ** -1.
>>>>> Controllable by np.seterr(invalid=...)? I could get behind that. >>>> >>>> I'm not sure the np.seterr part would work or be a good idea, given that we >>>> have no way to return or propagate NaN... I vote for just an unconditional >>>> error. >>> >>> We issue configurable warning/error/ignore behavior for >>> integer 0/0 through this mechanism too without any NaNs. However, >>> that's `divide` and not `invalid`. Your point is taken that `invalid` >>> usually implies that a `NaN` is generated, though I don't think this >>> is ever stated anywhere. I just suggested `invalid` as that is usually >>> what we use for function domain violations. >> >> I thought 0/0 = 0 has been removed a few versions ago. Does numpy >> still have silent casting of nan to 0 in ints. > > There is no casting involved. numpy still creates a return dtype that holds the wrong result. > >> I thought invalid and divide error/warnings are for floating point >> when we want to signal that the outcome is nan or inf, not that we are >> casting and return a result that is just wrong. > > No, they are also for some integer operations without going through a > floating point intermediate. Good thing I don't like integers; I haven't seen a bug because of this in years.
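For readers trying to reproduce the behaviour being debated: the silent wrap-around only affects integer dtypes, and the exact outcome depends on the NumPy version, since the proposal in this thread (raise instead of returning nonsense) was later adopted. A quick sketch:

```python
import numpy as np

# Float exponents are well defined: the elementwise inverse of [1., 2., 3.]
print(np.arange(1, 4, dtype=float) ** -1)

# Integer arrays raised to negative integer powers used to wrap around
# silently (the thread shows 0 ** -1 == -2147483648); NumPy releases
# since 1.12 raise a ValueError instead, roughly as proposed here.
try:
    np.arange(1, 4) ** -1
except ValueError as exc:
    print("refused:", exc)
```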
at least for division, we can go into the __future__ with Python 3.3: >>> np.array(0) ** np.array(-1) -2147483648 >>> np.divide(np.array(0), np.array(0)) nan >>> np.array(0) / np.array(0) nan Josef > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Tue Feb 18 09:56:07 2014 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 18 Feb 2014 09:56:07 -0500 Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy Message-ID: Hi, In a ticket I made a comment and Charles suggested that I post it here: In Theano we have a C implementation of a faster RNG: MRG31k3p. It is faster on CPU, and we have a GPU implementation. It would be relatively easy to parallelize on the CPU with OpenMP. If someone is interested in porting this to numpy, there wouldn't be any dependency problem. No license problem either, as Theano has the same license as NumPy. The speed difference is significant, but I don't recall numbers. Fred From jtaylor.debian at googlemail.com Tue Feb 18 10:21:44 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 18 Feb 2014 16:21:44 +0100 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> Message-ID: On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith wrote: > On 17 Feb 2014 15:17, "Sturla Molden" wrote: >> >> Julian Taylor wrote: >> >> > When an array is created it tries to get its memory from the cache and >> > when its deallocated it returns it to the cache. >> ... > > Another optimization we should consider that might help a lot in the same > situations where this would help: for code called from the cpython eval > loop, it's afaict possible to determine which inputs are temporaries by > checking their refcnt.
> In the second call to __add__ in '(a + b) + c', the > temporary will have refcnt 1, while the other arrays will all have refcnt >>1. In such cases (subject to various sanity checks on shape, dtype, etc) we > could elide temporaries by reusing the input array for the output. The risk > is that there may be some code out there that calls these operations > directly from C with non-temp arrays that nonetheless have refcnt 1, but we > should at least investigate the feasibility. E.g. maybe we can do the > optimization for tp_add but not PyArray_Add. > This seems to be a really good idea. I experimented a bit and it solves the temporary problem for these types of arithmetic nicely. It's simple to implement: just switch to the in-place operation in the array_{add,sub,mul,div} handlers for the Python slots. Doing so does not fail the numpy, scipy and pandas test suites, so it seems safe. Performance-wise, besides the simple page-zeroing-limited benchmarks (a+b+c), it also brings the laplace out-of-place benchmark to the same speed as the inplace benchmark [0]. This is very nice as the inplace variant is significantly harder to read. Does anyone see any issue we might be overlooking in this refcount == 1 optimization for the Python API? I'll post a PR with the change shortly. Regardless of this change, caching memory blocks might still be worthwhile for fancy indexing and other operations which require allocations.
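The refcnt heuristic under discussion can be poked at from pure Python with a stand-in class (`Probe` is illustrative only; the real check would be on `Py_REFCNT` at the C level, where the eval-loop stack holds the temporary's sole reference):

```python
import sys

counts = []

class Probe:
    """Stand-in for ndarray that records the refcount of `self` in __add__."""
    def __add__(self, other):
        counts.append(sys.getrefcount(self))
        return Probe()

a, b, c = Probe(), Probe(), Probe()
(a + b) + c  # the second __add__ receives the unnamed temporary (a + b)

# The temporary holds strictly fewer references than the named object `a`
# (no module-level binding), which is exactly the property the proposed
# optimization keys on. The absolute numbers vary by interpreter version.
print(counts)
assert counts[1] < counts[0]
```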
[0] http://yarikoptic.github.io/numpy-vbench/vb_vb_app.html#laplace-normal From njs at pobox.com Tue Feb 18 10:35:09 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 10:35:09 -0500 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> Message-ID: On 18 Feb 2014 10:21, "Julian Taylor" wrote: > > On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith wrote: > > On 17 Feb 2014 15:17, "Sturla Molden" wrote: > >> > >> Julian Taylor wrote: > >> > >> > When an array is created it tries to get its memory from the cache and > >> > when its deallocated it returns it to the cache. > >> > ... > > > > Another optimization we should consider that might help a lot in the same > > situations where this would help: for code called from the cpython eval > > loop, it's afaict possible to determine which inputs are temporaries by > > checking their refcnt. In the second call to __add__ in '(a + b) + c', the > > temporary will have refcnt 1, while the other arrays will all have refcnt > >>1. In such cases (subject to various sanity checks on shape, dtype, etc) we > > could elide temporaries by reusing the input array for the output. The risk > > is that there may be some code out there that calls these operations > > directly from C with non-temp arrays that nonetheless have refcnt 1, but we > > should at least investigate the feasibility. E.g. maybe we can do the > > optimization for tp_add but not PyArray_Add. > > > > this seems to be a really good idea, I experimented a bit and it > solves the temporary problem for this types of arithmetic nicely. > Its simple to implement, just change to inplace in > array_{add,sub,mul,div} handlers for the python slots. Doing so does > not fail numpy, scipy and pandas testsuite so it seems save. 
> Performance wise, besides the simple page zeroing limited benchmarks > (a+b+c), it also it brings the laplace out of place benchmark to the > same speed as the inplace benchmark [0]. This is very nice as the > inplace variant is significantly harder to read. Sweet. > Does anyone see any issue we might be overlooking in this refcount == > 1 optimization for the python api? > I'll post a PR with the change shortly. It belatedly occurs to me that Cython code like a = np.arange(10) b = np.arange(10) c = a + b might end up calling tp_add with refcnt 1 arrays. Ditto for same with cdef np.ndarray or cdef object added. We should check... -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From sturla.molden at gmail.com Tue Feb 18 10:50:04 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 15:50:04 +0000 (UTC) Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy References: Message-ID: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> AFAIK, CMRG (MRG31k3p) is more equidistributed than Mersenne Twister, but the period is much shorter. However, MT is getting acceptance as the PRNG of choice for numerical work. And when we are doing stochastic simulations in Python, the speed of the PRNG is unlikely to be the bottleneck. Sturla Frédéric Bastien wrote: > Hi, > > In a ticket I did a coment and Charles suggested that I post it here: > > In Theano we have an C implementation of a faster RNG: MRG31k3p. It is > faster on CPU, and we have a GPU implementation. It would be > relatively easy to parallize on the CPU with OpenMP. > > If someone is interested to port this to numpy, their wouldn't be any > dependency problem. No license problem as Theano license have the same > license as NumPy. > > The speed difference is significant, but I don't recall numbers.
> > Fred From matthieu.brucher at gmail.com Tue Feb 18 10:56:40 2014 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 18 Feb 2014 15:56:40 +0000 Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy In-Reply-To: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi, The main issue with PRNG and MT is that you don't know how to initialize all MT generators properly. A hash-based PRNG is much more efficient in that regard (see Random123 for a more detailed explanation). From what I heard, while MT is indeed the RNG of choice in the numerical world, in the parallel world it is not as obvious because of this pitfall. Cheers, Matthieu 2014-02-18 15:50 GMT+00:00 Sturla Molden : > AFAIK, CMRG (MRG31k3p) is more equidistributed than Mersenne Twister, but > the period is much shorter. However, MT is getting acceptance as the PRNG > of choice for numerical work. And when we are doing stochastic simulations > in Python, the speed of the PRNG is unlikely to be the bottleneck. > > Sturla > > > Frédéric Bastien wrote: >> Hi, >> >> In a ticket I did a coment and Charles suggested that I post it here: >> >> In Theano we have an C implementation of a faster RNG: MRG31k3p. It is >> faster on CPU, and we have a GPU implementation. It would be >> relatively easy to parallize on the CPU with OpenMP. >> >> If someone is interested to port this to numpy, their wouldn't be any >> dependency problem. No license problem as Theano license have the same >> license as NumPy. >> >> The speed difference is significant, but I don't recall numbers. >> >> Fred > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ From sturla.molden at gmail.com Tue Feb 18 10:58:45 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 15:58:45 +0000 (UTC) Subject: [Numpy-discussion] allocated memory cache for numpy References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> Message-ID: <662189258414431804.957982sturla.molden-gmail.com@news.gmane.org> I am cross-posting this to Cython user group to make sure they see this. Sturla Nathaniel Smith wrote: > On 18 Feb 2014 10:21, "Julian Taylor" wrote: > > On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith wrote: > On 17 Feb 2014 15:17, "Sturla Molden" wrote: > > Julian Taylor wrote: > > When an array is created it tries to get its memory from the cache > > and > > when its deallocated it returns it to the cache. > > ... > > Another optimization we should consider that might help a lot in the > > same > > situations where this would help: for code called from the cpython eval > loop, it's afaict possible to determine which inputs are temporaries by > checking their refcnt. In the second call to __add__ in '(a + b) + c', > > the > > temporary will have refcnt 1, while the other arrays will all have > > refcnt > > 1. In such cases (subject to various sanity checks on shape, dtype, > > etc) we > > could elide temporaries by reusing the input array for the output. The > > risk > > is that there may be some code out there that calls these operations > directly from C with non-temp arrays that nonetheless have refcnt 1, > > but we > > should at least investigate the feasibility. E.g. maybe we can do the > optimization for tp_add but not PyArray_Add. > > this seems to be a really good idea, I experimented a bit and it solves > the temporary problem for this types of arithmetic nicely. 
Its simple to > implement, just change to inplace in array_{add,sub,mul,div} handlers for > the python slots. Doing so does not fail numpy, scipy and pandas > testsuite so it seems save. Performance wise, besides the simple page > zeroing limited benchmarks (a+b+c), it also it brings the laplace out of > place benchmark to the same speed as the inplace benchmark [0]. This is > very nice as the inplace variant is significantly harder to read. > > Sweet. > > Does anyone see any issue we might be overlooking in this refcount == 1 > optimization for the python api? I'll post a PR with the change shortly. > > It occurs belatedly that Cython code like a = np.arange(10) > b = np.arange(10) > c = a + b might end up calling tp_add with refcnt 1 arrays. Ditto for > same with cdef np.ndarray or cdef object added. We should check... > > -n > > _______________________________________________ NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From nouiz at nouiz.org Tue Feb 18 11:00:56 2014 From: nouiz at nouiz.org (=?ISO-8859-1?Q?Fr=E9d=E9ric_Bastien?=) Date: Tue, 18 Feb 2014 11:00:56 -0500 Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy In-Reply-To: References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> Message-ID: I won't go into the discussion of which RNG is better for some problems. I'll just explain why we picked this one. We needed a parallel RNG and we wanted to use the same RNG on CPU and on GPU. We discussed with a professor in our department who is well known in that field (Pierre L'Ecuyer), and he recommended this one for our problem. For the GPU, we also don't want an RNG that uses too many registers. Robert K. commented that this would need refactoring of numpy.random, and then it would be easy to have many RNGs.
Fred On Tue, Feb 18, 2014 at 10:56 AM, Matthieu Brucher wrote: > Hi, > > The main issue with PRNG and MT is that you don't know how to > initialize all MT generators properly. A hash-based PRNG is much more > efficient in that regard (see Random123 for a more detailed > explanation). > From what I heard, if MT is indeed chosen for RNG in numerical world, > in parallel world, it is not as obvious because of this pitfall. > > Cheers, > > Matthieu > > 2014-02-18 15:50 GMT+00:00 Sturla Molden : >> AFAIK, CMRG (MRG31k3p) is more equidistributed than Mersenne Twister, but >> the period is much shorter. However, MT is getting acceptance as the PRNG >> of choice for numerical work. And when we are doing stochastic simulations >> in Python, the speed of the PRNG is unlikely to be the bottleneck. >> >> Sturla >> >> >> Frédéric Bastien wrote: >>> Hi, >>> >>> In a ticket I did a coment and Charles suggested that I post it here: >>> >>> In Theano we have an C implementation of a faster RNG: MRG31k3p. It is >>> faster on CPU, and we have a GPU implementation. It would be >>> relatively easy to parallize on the CPU with OpenMP. >>> >>> If someone is interested to port this to numpy, their wouldn't be any >>> dependency problem. No license problem as Theano license have the same >>> license as NumPy. >>> >>> The speed difference is significant, but I don't recall numbers. >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Information System Engineer, Ph.D.
> Blog: http://matt.eifelle.com > LinkedIn: http://www.linkedin.com/in/matthieubrucher > Music band: http://liliejay.com/ > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Tue Feb 18 11:05:30 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2014 09:05:30 -0700 Subject: [Numpy-discussion] New (old) function proposal. Message-ID: Hi All, There is an old ticket, #1499, that suggests adding a segment_axis function. def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): """Generate a new array that chops the given array along the given axis into overlapping frames. Parameters ---------- a : array-like The array to segment length : int The length of each frame overlap : int, optional The number of array elements by which the frames should overlap axis : int, optional The axis to operate on; if None, act on the flattened array end : {'cut', 'wrap', 'pad'}, optional What to do with the last frame, if the array is not evenly divisible into pieces. - 'cut' Simply discard the extra values - 'wrap' Copy values from the beginning of the array - 'pad' Pad with a constant value endvalue : object The value to use for end='pad' Examples -------- >>> segment_axis(arange(10), 4, 2) array([[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]) Is there any interest in having this function available? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Tue Feb 18 11:09:33 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2014 17:09:33 +0100 Subject: [Numpy-discussion] Rethinking multiple dimensional indexing with sequences?
Message-ID: <1392739773.11102.10.camel@sebastian-t440> Hey all, currently in numpy this is possible: a = np.zeros((5, 5)) a[[0, slice(None, None)]] # this behaviour has its quirks, since the "correct" way is: a[(0, slice(None, None))] # or identically a[0, :] The problem with using an arbitrary sequence is that an arbitrary sequence is also typically an "array-like", so there is a lot of guessing involved: a[[0, slice(None, None)]] == a[(0, slice(None, None))] # but: a[[0, 1]] == a[np.array([0, 1])] Now NumPy itself also commonly uses lists here to build up indexing tuples (since they are mutable); however, would it really be so bad if we had to do `arr[tuple(slice_list)]` in the end to resolve this issue? So the proposal would be to deprecate anything but (base class) tuples, or maybe at least only allow this weird logic for lists and not all sequences. I do not believe we can find a logic to decide what to do which will not be broken in some way... PS: The code implementing the "advanced index or nd-index" logic is here: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L196 - Sebastian Another confusing example: In [9]: a = np.arange(10) In [10]: a[[(0, 1), (2, 3)] * 17] # a[np.array([(0, 1), (2, 3)] * 17)] Out[10]: array([[0, 1], [2, 3]]) In [11]: a[[(0, 1), (2, 3)]] # a[np.array([0, 1]), np.array([2, 3])] --------------------------------------------------------------------------- IndexError Traceback (most recent call last) in () ----> 1 a[[(0, 1), (2, 3)]] IndexError: too many indices for array From ben.root at ou.edu Tue Feb 18 11:12:30 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 18 Feb 2014 11:12:30 -0500 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: Yes, but I don't like the name too much. Unfortunately, I can't think of a better one.
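Whatever name it ends up with, the 1-D core of the proposal is easy to sketch with stride tricks. This is a rough illustration of the end='cut' case only, not the actual patch attached to #1499:

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def segment_axis_1d(a, length, overlap=0):
    """Rough 1-D sketch of the proposed segment_axis, end='cut' only."""
    a = np.ascontiguousarray(a)
    step = length - overlap
    # Number of frames that fit entirely inside the array.
    n_frames = (a.shape[0] - overlap) // step
    return as_strided(a,
                      shape=(n_frames, length),
                      strides=(step * a.strides[0], a.strides[0]))

# Reproduces the docstring example: rows [0..3], [2..5], [4..7], [6..9]
print(segment_axis_1d(np.arange(10), 4, 2))
```

Note that the result is a view into `a`, so writing to it writes through the overlapping frames; a real implementation would presumably copy for the 'wrap' and 'pad' modes.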
Ben Root On Tue, Feb 18, 2014 at 11:05 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > Hi All, > > There is an old ticket, #1499 , > that suggest adding a segment_axis function. > > def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): > """Generate a new array that chops the given array along the given axis > into overlapping frames. > > Parameters > ---------- > a : array-like > The array to segment > length : int > The length of each frame > overlap : int, optional > The number of array elements by which the frames should overlap > axis : int, optional > The axis to operate on; if None, act on the flattened array > end : {'cut', 'wrap', 'end'}, optional > What to do with the last frame, if the array is not evenly > divisible into pieces. > > - 'cut' Simply discard the extra values > - 'wrap' Copy values from the beginning of the array > - 'pad' Pad with a constant value > > endvalue : object > The value to use for end='pad' > > > Examples > -------- > >>> segment_axis(arange(10), 4, 2) > array([[0, 1, 2, 3], > [2, 3, 4, 5], > [4, 5, 6, 7], > [6, 7, 8, 9]]) > > > Is there and interest in having this function available? > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Tue Feb 18 11:23:18 2014 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 18 Feb 2014 16:23:18 +0000 Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy In-Reply-To: References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> Message-ID: I won't dive into the discussion as well, except to say that parallel RNGs have to have specific characteristics, mainly to initialize many RNGs at the same time. 
I don't know how MRG31k3p handles this, as the publication was not very clear on this aspect. I guess it falls down like the others from that "time" (http://dl.acm.org/citation.cfm?id=1276928). BTW, Random123 also works on GPU and can use intrinsics to be even faster than the usual congruential RNGs (see http://www.thesalmons.org/john/random123/papers/random123sc11.pdf table 2). Matthieu 2014-02-18 16:00 GMT+00:00 Frédéric Bastien : > I won't go in the discussion of which RNG is better for some problems. > I'll just tell why we pick this one. > > We needed a parallel RNG and we wanted to use the same RNG on CPU and > on GPU. We discussed with a professor in our department that is well > know in that field(Pierre L'Ecuyer) and he recommanded this one for > our problem. For the GPU, we don't want an rng that have too much > register too. > > Robert K. commented that this would need refactoring of numpy.random > and then it would be easy to have many rng. > > Fred > > On Tue, Feb 18, 2014 at 10:56 AM, Matthieu Brucher > wrote: >> Hi, >> >> The main issue with PRNG and MT is that you don't know how to >> initialize all MT generators properly. A hash-based PRNG is much more >> efficient in that regard (see Random123 for a more detailed >> explanation). >> From what I heard, if MT is indeed chosen for RNG in numerical world, >> in parallel world, it is not as obvious because of this pitfall. >> >> Cheers, >> >> Matthieu >> >> 2014-02-18 15:50 GMT+00:00 Sturla Molden : >>> AFAIK, CMRG (MRG31k3p) is more equidistributed than Mersenne Twister, but >>> the period is much shorter. However, MT is getting acceptance as the PRNG >>> of choice for numerical work. And when we are doing stochastic simulations >>> in Python, the speed of the PRNG is unlikely to be the bottleneck. >>> >>> Sturla >>> >>> >>> Frédéric Bastien wrote: >>>> Hi, >>>> >>>> In a ticket I did a coment and Charles suggested that I post it here: >>>> >>>> In Theano we have an C implementation of a faster RNG: MRG31k3p.
It is >>>> faster on CPU, and we have a GPU implementation. It would be >>>> relatively easy to parallize on the CPU with OpenMP. >>>> >>>> If someone is interested to port this to numpy, their wouldn't be any >>>> dependency problem. No license problem as Theano license have the same >>>> license as NumPy. >>>> >>>> The speed difference is significant, but I don't recall numbers. >>>> >>>> Fred >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> -- >> Information System Engineer, Ph.D. >> Blog: http://matt.eifelle.com >> LinkedIn: http://www.linkedin.com/in/matthieubrucher >> Music band: http://liliejay.com/ >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher Music band: http://liliejay.com/ From sebastian at sipsolutions.net Tue Feb 18 11:24:47 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2014 17:24:47 +0100 Subject: [Numpy-discussion] Rethinking multiple dimensional indexing with sequences? In-Reply-To: <1392739773.11102.10.camel@sebastian-t440> References: <1392739773.11102.10.camel@sebastian-t440> Message-ID: <1392740687.11102.12.camel@sebastian-t440> On Di, 2014-02-18 at 17:09 +0100, Sebastian Berg wrote: > Hey all, > > Now also NumPy commonly uses lists here to build up indexing tuples > (since they are mutable), however would it really be so bad if we had to > do `arr[tuple(slice_list)]` in the end to resolve this issue? 
So the > proposal would be to deprecate anything but (base class) tuples, or > maybe at least only allow this weird logic for lists and not all > sequences. I do not believe we can find a logic to decide what to do > which will not be broken in some way... > You may wonder why I would suddenly care. The reason is that array-likes such as pandas types should behave like arrays in indexing. With this logic in place for arbitrary sequences, these might (even suddenly!) switch to the nd-index behaviour. > PS: The code implementing the "advanced index or nd-index" logic is > here: > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L196 > > - Sebastian > > > Another confusing example: > > In [9]: a = np.arange(10) > > In [10]: a[[(0, 1), (2, 3)] * 17] # a[np.array([(0, 1), (2, 3)] * 17)] > Out[10]: > array([[0, 1], > > [2, 3]]) > > In [11]: a[[(0, 1), (2, 3)]] # a[np.array([0, 1]), np.array([2, 3])] > --------------------------------------------------------------------------- > IndexError Traceback (most recent call > last) > in () > ----> 1 a[[(0, 1), (2, 3)]] > > IndexError: too many indices for array > From sebastian at sipsolutions.net Tue Feb 18 11:27:26 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Tue, 18 Feb 2014 17:27:26 +0100 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: <1392740846.11102.14.camel@sebastian-t440> On Di, 2014-02-18 at 09:05 -0700, Charles R Harris wrote: > Hi All, > > > There is an old ticket, #1499, that suggests adding a segment_axis > function. > > def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): > """Generate a new array that chops the given array along the given axis > into overlapping frames.
> > Parameters > ---------- > a : array-like > The array to segment > length : int > The length of each frame > overlap : int, optional > The number of array elements by which the frames should overlap > axis : int, optional > The axis to operate on; if None, act on the flattened array > end : {'cut', 'wrap', 'pad'}, optional > What to do with the last frame, if the array is not evenly > divisible into pieces. > > - 'cut' Simply discard the extra values > - 'wrap' Copy values from the beginning of the array > - 'pad' Pad with a constant value > > endvalue : object > The value to use for end='pad' > > > Examples > -------- > >>> segment_axis(arange(10), 4, 2) > array([[0, 1, 2, 3], > [2, 3, 4, 5], > [4, 5, 6, 7], > [6, 7, 8, 9]]) > > > Is there any interest in having this function available? > Just to note, there have been similar proposals with a rolling_window function. It could be made ND aware, too (though maybe this one is also). > > Chuck From njs at pobox.com Tue Feb 18 11:40:49 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 11:40:49 -0500 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: On 18 Feb 2014 11:05, "Charles R Harris" wrote: > > Hi All, > > There is an old ticket, #1499, that suggests adding a segment_axis function. > > def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): > """Generate a new array that chops the given array along the given axis > into overlapping frames.
> > Parameters > > ---------- > > a : array-like > > The array to segment > > length : int > > The length of each frame > > overlap : int, optional > > The number of array elements by which the frames should overlap > > axis : int, optional > > The axis to operate on; if None, act on the flattened array > > end : {'cut', 'wrap', 'pad'}, optional > > What to do with the last frame, if the array is not evenly > > divisible into pieces. > > > > - 'cut' Simply discard the extra values > > - 'wrap' Copy values from the beginning of the array > > - 'pad' Pad with a constant value > > > > endvalue : object > > The value to use for end='pad' > > > > > > Examples > > -------- > > >>> segment_axis(arange(10), 4, 2) > > array([[0, 1, 2, 3], > > [2, 3, 4, 5], > > [4, 5, 6, 7], > > [6, 7, 8, 9]]) > > > > > > Is there any interest in having this function available? I'd use it, though haven't looked at the details of this api per se yet. rolling_window or shingle are better names. It should probably be documented and implemented to return a view when possible (using stride tricks). Along with a note that whether this is possible depends heavily on 32- vs. 64-bitness. -n From sturla.molden at gmail.com Tue Feb 18 11:47:36 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 16:47:36 +0000 (UTC) Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> Message-ID: <668317748414434647.878759sturla.molden-gmail.com@news.gmane.org> Matthieu Brucher wrote: > Hi, > > The main issue with PRNG and MT is that you don't know how to > initialize all MT generators properly. A hash-based PRNG is much more > efficient in that regard (see Random123 for a more detailed > explanation). > From what I heard, even if MT is indeed the RNG of choice in the numerical world, > in the parallel world it is not as obvious because of this pitfall.
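(For readers unfamiliar with the counter-based design Matthieu refers to, here is a toy illustration: each draw is a pure function of a stream key and a counter, so parallel streams need no shared state and no carefully spaced seeds. This is a sketch only; a cryptographic hash is far too slow for real use, and it is not Random123's actual Philox or Threefry.)

```python
import hashlib
import struct

def counter_rng(key, counter):
    # Toy counter-based generator: draw number `counter` of stream `key`
    # is a pure function of its arguments, so parallel streams need no
    # coordination between threads or devices.
    digest = hashlib.sha256(struct.pack('<QQ', key, counter)).digest()
    return struct.unpack('<Q', digest[:8])[0] / 2.0**64  # uniform in [0, 1)

stream_a = [counter_rng(0, i) for i in range(4)]
stream_b = [counter_rng(1, i) for i in range(4)]
```

Seeding a new independent stream is just picking a new key, which is the property that is hard to get with a large-state generator like MT.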
> It is possible to solve this by using a set of independent MT generators, one per thread. Independence in this case means that the characteristic polynomials are relatively prime to each other: http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf Unfortunately the DCMT code was LGPL, not BSD; I don't know if this has changed. Sturla From lists at hilboll.de Tue Feb 18 11:54:14 2014 From: lists at hilboll.de (Andreas Hilboll) Date: Tue, 18 Feb 2014 17:54:14 +0100 Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy In-Reply-To: <668317748414434647.878759sturla.molden-gmail.com@news.gmane.org> References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> <668317748414434647.878759sturla.molden-gmail.com@news.gmane.org> Message-ID: <53039036.9040800@hilboll.de> On 18.02.2014 17:47, Sturla Molden wrote: > Matthieu Brucher wrote: >> Hi, >> >> The main issue with PRNG and MT is that you don't know how to >> initialize all MT generators properly. A hash-based PRNG is much more >> efficient in that regard (see Random123 for a more detailed >> explanation). >> From what I heard, even if MT is indeed the RNG of choice in the numerical world, >> in the parallel world it is not as obvious because of this pitfall. >> > > It is possible to solve this by using a set of independent MT generators, > one per thread. Independence in this case means that the characteristic > polynomials are relatively prime to each other: > > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dgene.pdf > > Unfortunately the DCMT code was LGPL, not BSD; I don't know if this has > changed. I just checked. The file dcmt0.6.1b.tgz, available from http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html, contains this license: ---8<------- Copyright (C) 2001-2009 Makoto Matsumoto and Takuji Nishimura. Copyright (C) 2009 Mutsuo Saito All rights reserved.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Cheers, Andreas. From charlesr.harris at gmail.com Tue Feb 18 12:03:56 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2014 10:03:56 -0700 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: On Tue, Feb 18, 2014 at 9:40 AM, Nathaniel Smith wrote: > On 18 Feb 2014 11:05, "Charles R Harris" > wrote: > > > > Hi All, > > > > There is an old ticket, #1499, that suggest adding a segment_axis > function. > > > > def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): > > """Generate a new array that chops the given array along the given > axis > > into overlapping frames. 
> > Parameters > > ---------- > > a : array-like > > The array to segment > > length : int > > The length of each frame > > overlap : int, optional > > The number of array elements by which the frames should overlap > > axis : int, optional > > The axis to operate on; if None, act on the flattened array > > end : {'cut', 'wrap', 'pad'}, optional > > What to do with the last frame, if the array is not evenly > > divisible into pieces. > > > > - 'cut' Simply discard the extra values > > - 'wrap' Copy values from the beginning of the array > > - 'pad' Pad with a constant value > > > > endvalue : object > > The value to use for end='pad' > > > > > > Examples > > -------- > > >>> segment_axis(arange(10), 4, 2) > > array([[0, 1, 2, 3], > > [2, 3, 4, 5], > > [4, 5, 6, 7], > > [6, 7, 8, 9]]) > > > > > > Is there any interest in having this function available? > > I'd use it, though haven't looked at the details of this api per se yet. > > rolling_window or shingle are better names. > > It should probably be documented and implemented to return a view when > possible (using stride tricks). Along with a note that whether this is > possible depends heavily on 32- vs. 64-bitness. > I believe it does return views when possible. There are two patches attached to the issue, one for the function and another for tests. So here is an easy commit for someone ;) The original author seems to be Anne Archibald, who should be mentioned if this is put in. Where does 'shingle' come from? I can see the analogy but haven't seen that as a technical term. Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From sturla.molden at gmail.com Tue Feb 18 12:07:32 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Tue, 18 Feb 2014 17:07:32 +0000 (UTC) Subject: [Numpy-discussion] Suggestion: Port Theano RNG implementation to NumPy References: <2059992641414429782.340690sturla.molden-gmail.com@news.gmane.org> <668317748414434647.878759sturla.molden-gmail.com@news.gmane.org> <53039036.9040800@hilboll.de> Message-ID: <262219091414435588.384986sturla.molden-gmail.com@news.gmane.org> Andreas Hilboll wrote: > On 18.02.2014 17:47, Sturla Molden wrote: >> Unfortunately the DCMT code was LGPL, not BSD; I don't know if this has >> changed. > > I just checked. The file dcmt0.6.1b.tgz, available from > http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/DC/dc.html, contains > this license: Fantastic :-) For speed we could just pre-compute parameters for a set of N generators, up to some reasonable number. (By the way, this generator is also used by NVidia, so it can be used on a GPU.) Sturla From jaime.frio at gmail.com Tue Feb 18 12:11:23 2014 From: jaime.frio at gmail.com (Jaime Fernández del Río) Date: Tue, 18 Feb 2014 09:11:23 -0800 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: On Tue, Feb 18, 2014 at 9:03 AM, Charles R Harris wrote: > > > > On Tue, Feb 18, 2014 at 9:40 AM, Nathaniel Smith wrote: > >> On 18 Feb 2014 11:05, "Charles R Harris" >> wrote: >> > >> > Hi All, >> > >> > There is an old ticket, #1499, that suggests adding a segment_axis >> function. >> > >> > def segment_axis(a, length, overlap=0, axis=None, end='cut', >> endvalue=0): >> > """Generate a new array that chops the given array along the given >> axis >> > into overlapping frames.
>> > >> > Parameters >> > ---------- >> > a : array-like >> > The array to segment >> > length : int >> > The length of each frame >> > overlap : int, optional >> > The number of array elements by which the frames should overlap >> > axis : int, optional >> > The axis to operate on; if None, act on the flattened array >> > end : {'cut', 'wrap', 'end'}, optional >> > What to do with the last frame, if the array is not evenly >> > divisible into pieces. >> > >> > - 'cut' Simply discard the extra values >> > - 'wrap' Copy values from the beginning of the array >> > - 'pad' Pad with a constant value >> > >> > endvalue : object >> > The value to use for end='pad' >> > >> > >> > Examples >> > -------- >> > >>> segment_axis(arange(10), 4, 2) >> > array([[0, 1, 2, 3], >> > [2, 3, 4, 5], >> > [4, 5, 6, 7], >> > [6, 7, 8, 9]]) >> > >> > >> > Is there and interest in having this function available? >> >> I'd use it, though haven't looked at the details of this api per set yet. >> >> rolling_window or shingle are better names. >> >> It should probably be documented and implemented to return a view when >> possible (using stride tricks). Along with a note that whether this is >> possible depends heavily on 32- vs. 64-bitness. >> > > I believe it does return views when possible. There are two patches > attached to the issue, one for the function and another for tests. So here > is an easy commit for someone ;) The original author seems to be Anne > Archibald, who should be mentioned if this is put in. > > Where does 'shingle' come from. I can see the analogy but haven't seen > that as a technical term. > In an inkjet printing pipeline, one of the last steps is to split the image into the several passes that will be needed to physically print it. This is often done with a tiled, non-overlapping mask, known as a "shingling mask." -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From njs at pobox.com Tue Feb 18 13:20:10 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 13:20:10 -0500 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: On 18 Feb 2014 12:04, "Charles R Harris" wrote: > Where does 'shingle' come from. I can see the analogy but haven't seen that as a technical term. It just seems like a good name :-). -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Tue Feb 18 14:19:04 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 18 Feb 2014 20:19:04 +0100 Subject: [Numpy-discussion] allocated memory cache for numpy In-Reply-To: References: <530263AC.4030304@googlemail.com> <1454055383414360591.251341sturla.molden-gmail.com@news.gmane.org> Message-ID: <5303B228.9010403@googlemail.com> On 18.02.2014 16:21, Julian Taylor wrote: > On Mon, Feb 17, 2014 at 9:42 PM, Nathaniel Smith wrote: >> On 17 Feb 2014 15:17, "Sturla Molden" wrote: >>> >>> Julian Taylor wrote: >>> >>>> When an array is created it tries to get its memory from the cache and >>>> when its deallocated it returns it to the cache. >>> > ... >> >> Another optimization we should consider that might help a lot in the same >> situations where this would help: for code called from the cpython eval >> loop, it's afaict possible to determine which inputs are temporaries by >> checking their refcnt. In the second call to __add__ in '(a + b) + c', the >> temporary will have refcnt 1, while the other arrays will all have refcnt >>> 1. In such cases (subject to various sanity checks on shape, dtype, etc) we >> could elide temporaries by reusing the input array for the output. The risk >> is that there may be some code out there that calls these operations >> directly from C with non-temp arrays that nonetheless have refcnt 1, but we >> should at least investigate the feasibility. E.g. 
maybe we can do the >> optimization for tp_add but not PyArray_Add. >> > > this seems to be a really good idea, I experimented a bit and it > solves the temporary problem for this types of arithmetic nicely. > Its simple to implement, just change to inplace in > array_{add,sub,mul,div} handlers for the python slots. Doing so does > not fail numpy, scipy and pandas testsuite so it seems save. > Performance wise, besides the simple page zeroing limited benchmarks > (a+b+c), it also it brings the laplace out of place benchmark to the > same speed as the inplace benchmark [0]. This is very nice as the > inplace variant is significantly harder to read. > > Does anyone see any issue we might be overlooking in this refcount == > 1 optimization for the python api? > I'll post a PR with the change shortly. here is the PR: https://github.com/numpy/numpy/pull/4322 probably still lacking some checks, but I think it can be tested From tsyu80 at gmail.com Tue Feb 18 14:55:35 2014 From: tsyu80 at gmail.com (Tony Yu) Date: Tue, 18 Feb 2014 13:55:35 -0600 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: References: Message-ID: On Tue, Feb 18, 2014 at 11:11 AM, Jaime Fern?ndez del R?o < jaime.frio at gmail.com> wrote: > > > > On Tue, Feb 18, 2014 at 9:03 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> >> On Tue, Feb 18, 2014 at 9:40 AM, Nathaniel Smith wrote: >> >>> On 18 Feb 2014 11:05, "Charles R Harris" >>> wrote: >>> > >>> > Hi All, >>> > >>> > There is an old ticket, #1499, that suggest adding a segment_axis >>> function. >>> > >>> > def segment_axis(a, length, overlap=0, axis=None, end='cut', >>> endvalue=0): >>> > """Generate a new array that chops the given array along the given >>> axis >>> > into overlapping frames. 
>>> > >>> > Parameters >>> > ---------- >>> > a : array-like >>> > The array to segment >>> > length : int >>> > The length of each frame >>> > overlap : int, optional >>> > The number of array elements by which the frames should overlap >>> > axis : int, optional >>> > The axis to operate on; if None, act on the flattened array >>> > end : {'cut', 'wrap', 'end'}, optional >>> > What to do with the last frame, if the array is not evenly >>> > divisible into pieces. >>> > >>> > - 'cut' Simply discard the extra values >>> > - 'wrap' Copy values from the beginning of the array >>> > - 'pad' Pad with a constant value >>> > >>> > endvalue : object >>> > The value to use for end='pad' >>> > >>> > >>> > Examples >>> > -------- >>> > >>> segment_axis(arange(10), 4, 2) >>> > array([[0, 1, 2, 3], >>> > [2, 3, 4, 5], >>> > [4, 5, 6, 7], >>> > [6, 7, 8, 9]]) >>> > >>> > >>> > Is there and interest in having this function available? >>> >>> I'd use it, though haven't looked at the details of this api per set yet. >>> >>> rolling_window or shingle are better names. >>> >>> It should probably be documented and implemented to return a view when >>> possible (using stride tricks). Along with a note that whether this is >>> possible depends heavily on 32- vs. 64-bitness. >>> >> >> I believe it does return views when possible. There are two patches >> attached to the issue, one for the function and another for tests. So here >> is an easy commit for someone ;) The original author seems to be Anne >> Archibald, who should be mentioned if this is put in. >> >> Where does 'shingle' come from. I can see the analogy but haven't seen >> that as a technical term. >> > > In an inkjet printing pipeline, one of the last steps is to split the > image into the several passes that will be needed to physically print it. > This is often done with a tiled, non-overlapping mask, known as a > "shingling mask." 
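(The view-returning behaviour Nathaniel describes can be sketched with NumPy's stride tricks. A minimal 1-D illustration covering only the end='cut' case, not the patch attached to the ticket:)

```python
import numpy as np
from numpy.lib.stride_tricks import as_strided

def segment_1d(a, length, overlap=0):
    # View-based framing of a contiguous 1-D array; trailing values that
    # do not fill a complete frame are discarded, as with end='cut'.
    a = np.ascontiguousarray(a)
    step = length - overlap
    nframes = 1 + (a.size - length) // step
    return as_strided(a, shape=(nframes, length),
                      strides=(step * a.strides[0], a.strides[0]))

frames = segment_1d(np.arange(10), 4, overlap=2)
# Same result as the segment_axis(arange(10), 4, 2) example above.
```

No data is copied: every frame is a strided view into the original buffer, which is the property being asked for in the ticket.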
> > Just for reference, scikit-image has a similar function (w/o padding) called `view_as_blocks`: http://scikit-image.org/docs/0.9.x/api/skimage.util.html#view-as-blocks (and a rolling-window version called `view_as_windows`). Cheers, -Tony -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Tue Feb 18 15:59:13 2014 From: njs at pobox.com (Nathaniel Smith) Date: Tue, 18 Feb 2014 15:59:13 -0500 Subject: [Numpy-discussion] Rethinking multiple dimensional indexing with sequences? In-Reply-To: <1392739773.11102.10.camel@sebastian-t440> References: <1392739773.11102.10.camel@sebastian-t440> Message-ID: So to be clear - what's being suggested is that code like this will be deprecated in 1.9, and then in some future release break: slices = [] for i in ...: slices.append(make_slice(...)) subarray = arr[slices] Instead, you will have to do: subarray = arr[tuple(slices)] And the reason is that when we allow multi-dimensional indexes to be passed as lists instead of a tuple, numpy has no reliable way to tell what to do with something like arr[[0, 1]] Maybe it means arr[0, 1] Or maybe it means arr[np.asarray([0, 1])] Who knows? Right now we have some heuristics to guess based on what exact index objects are in there, but really making a guess at all is a pretty broken approach, and will be getting more broken as more non-ndarray array-like types come into common use -- in particular, the way things are right now, arr[pandas_series] will soon be (or is already) triggering this same guessing logic. So, any objections to requiring tuples here? 
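(For concreteness, the competing interpretations as runnable code, a minimal sketch against current NumPy:)

```python
import numpy as np

arr = np.arange(12).reshape(3, 4)

# A tuple is always an unambiguous multi-dimensional index:
assert arr[(0, 1)] == arr[0, 1]

# A plain list of integers is instead a fancy index along the first axis:
assert arr[[0, 1]].shape == (2, 4)

# The pattern that would remain supported: build the index as a mutable
# list, then convert to a tuple before indexing.
index = [slice(None)] * arr.ndim
index[0] = 0
assert arr[tuple(index)].shape == (4,)
```

Only the list form is ambiguous; the tuple form already has a single fixed meaning.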
-n On Tue, Feb 18, 2014 at 11:09 AM, Sebastian Berg wrote: > Hey all, > > currently in numpy this is possible: > > a = np.zeros((5, 5)) > a[[0, slice(None, None)]] > #this behaviour has its quirks, since the "correct" way is: > a[(0, slice(None, None))] # or identically a[0, :] > > The problem with using an arbitrary sequence is, that an arbitrary > sequence is also typically an "array like" so there is a lot of guessing > involved: > > a[[0, slice(None, None)]] == a[(0, slice(None, None))] > # but: > a[[0, 1]] == a[np.array([0, 1])] > > Now also NumPy commonly uses lists here to build up indexing tuples > (since they are mutable), however would it really be so bad if we had to > do `arr[tuple(slice_list)]` in the end to resolve this issue? So the > proposal would be to deprecate anything but (base class) tuples, or > maybe at least only allow this weird logic for lists and not all > sequences. I do not believe we can find a logic to decide what to do > which will not be broken in some way... > > PS: The code implementing the "advanced index or nd-index" logic is > here: > https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/mapping.c#L196 > > - Sebastian > > > Another confusing example: > > In [9]: a = np.arange(10) > > In [10]: a[[(0, 1), (2, 3)] * 17] # a[np.array([(0, 1), (2, 3)] * 17)] > Out[10]: > array([[0, 1], > > [2, 3]]) > > In [11]: a[[(0, 1), (2, 3)]] # a[np.array([0, 1]), np.array([2, 3])] > --------------------------------------------------------------------------- > IndexError Traceback (most recent call > last) > in () > ----> 1 a[[(0, 1), (2, 3)]] > > IndexError: too many indices for array > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From charlesr.harris at gmail.com Tue Feb 18 18:03:32 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2014 16:03:32 -0700 Subject: [Numpy-discussion] Default dtype of genfromtxt Message-ID: This is apropos issue #1860, where it is proposed that the default dtype of genfromtxt should be None rather than float. A decision is needed. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Tue Feb 18 18:29:59 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 18 Feb 2014 18:29:59 -0500 Subject: [Numpy-discussion] Default dtype of genfromtxt In-Reply-To: References: Message-ID: I can certainly see the advantage of switching over to None. It makes a lot of sense. One thing that concerns me, though, is a file full of whole-number values. Would those get interpreted as integers? If so, then that effect needs to be aligned with some other proposed changes with respect to operations with integers such as "**". We might have some unintended consequences if we are not careful. Cheers! Ben Root On Tue, Feb 18, 2014 at 6:03 PM, Charles R Harris wrote: > This is apropos issue #1860, > where it is proposed that the default dtype of genfromtxt should be None > rather than float. A decision is needed. > > Thoughts? > > Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From ndarray at mac.com Tue Feb 18 22:18:04 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 18 Feb 2014 22:18:04 -0500 Subject: [Numpy-discussion] except expression discussion on python-ideas Message-ID: I would like to invite the numpy community to weigh in on the idea that is getting momentum at https://mail.python.org/pipermail/python-ideas/2014-February/025437.html The main motivation is to provide a syntactic alternative to the proliferation of default-value options, so that x = getattr(u, 'answer', 42) can be written as x = u.answer except ... 42 For a dictionary d, x = d.get('answer', 42) can be written as x = d['answer'] except ... 42 For a list L, try: x = L[i] except IndexError: x = 42 can be written as x = L[i] except ... 42 The ellipsis in the above stands for syntax being debated. Effectively, Python is about to gain support for a new operator, and operators are very precious for numpy. So, I think the numpy community has a horse in that race. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Feb 19 01:42:50 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 18 Feb 2014 23:42:50 -0700 Subject: [Numpy-discussion] Document server error. Message-ID: From issue #1951: The following URL shows a 500 internal server error: > http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html > Can someone with access to the server take a look? TIA, Chuck -------------- next part -------------- An HTML attachment was scrubbed...
URL: From robert.kern at gmail.com Wed Feb 19 08:50:56 2014 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 19 Feb 2014 13:50:56 +0000 Subject: [Numpy-discussion] except expression discussion on python-ideas In-Reply-To: References: Message-ID: On Wed, Feb 19, 2014 at 3:18 AM, Alexander Belopolsky wrote: > I would like to invite numpy community to weigh in on the idea that is > getting momentum at > > https://mail.python.org/pipermail/python-ideas/2014-February/025437.html > > The main motivation is to provide syntactic alternative to proliferation of > default value options, so that > > x = getattr(u, 'answer', 42) > > can be written as > > x = y.answer except ... 42 > Effectively, Python is about to gain support for a new operator and > operators are very precious for numpy. So, I think numpy community has a > horse in that race. It's control flow, not an operator. I haven't seen a proposal that would use any precious syntactic possibilities that numpy might want to use for an operator. -- Robert Kern From ben.root at ou.edu Wed Feb 19 09:25:18 2014 From: ben.root at ou.edu (Benjamin Root) Date: Wed, 19 Feb 2014 09:25:18 -0500 Subject: [Numpy-discussion] except expression discussion on python-ideas In-Reply-To: References: Message-ID: So, this is kind of like Perl's "unless"? On Tue, Feb 18, 2014 at 10:18 PM, Alexander Belopolsky wrote: > I would like to invite numpy community to weigh in on the idea that is > getting momentum at > > https://mail.python.org/pipermail/python-ideas/2014-February/025437.html > > The main motivation is to provide syntactic alternative to proliferation > of default value options, so that > > x = getattr(u, 'answer', 42) > > can be written as > > x = y.answer except ... 42 > > For a dictionary d, > > x = d.get('answer', 42) > > can be written as > > x = d['answer'] except ... 42 > > For a list L, > > try: > x = L[i] > except IndexError: > x= 42 > > can be written as > > x = L[i] except ... 
42 > > > The ellipsis in the above stands for syntax being debated. > > Effectively, Python is about to gain support for a new operator and > operators are very precious for numpy. So, I think numpy community has a > horse in that race. > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From warren.weckesser at gmail.com Wed Feb 19 12:48:07 2014 From: warren.weckesser at gmail.com (Warren Weckesser) Date: Wed, 19 Feb 2014 12:48:07 -0500 Subject: [Numpy-discussion] New (old) function proposal. In-Reply-To: <1392740846.11102.14.camel@sebastian-t440> References: <1392740846.11102.14.camel@sebastian-t440> Message-ID: On Tue, Feb 18, 2014 at 11:27 AM, Sebastian Berg wrote: > On Di, 2014-02-18 at 09:05 -0700, Charles R Harris wrote: > > Hi All, > > > > > > There is an old ticket, #1499, that suggest adding a segment_axis > > function. > > > > def segment_axis(a, length, overlap=0, axis=None, end='cut', endvalue=0): > > """Generate a new array that chops the given array along the given > axis > > into overlapping frames. > > > > Parameters > > ---------- > > a : array-like > > The array to segment > > length : int > > The length of each frame > > overlap : int, optional > > The number of array elements by which the frames should overlap > > axis : int, optional > > The axis to operate on; if None, act on the flattened array > > end : {'cut', 'wrap', 'end'}, optional > > What to do with the last frame, if the array is not evenly > > divisible into pieces. 
> > > > - 'cut' Simply discard the extra values > > - 'wrap' Copy values from the beginning of the array > > - 'pad' Pad with a constant value > > > > endvalue : object > > The value to use for end='pad' > > > > > > Examples > > -------- > > >>> segment_axis(arange(10), 4, 2) > > array([[0, 1, 2, 3], > > [2, 3, 4, 5], > > [4, 5, 6, 7], > > [6, 7, 8, 9]]) > > > > > > Is there any interest in having this function available? > > > > Just to note, there have been similar proposals with a rolling_window > function. It could be made ND aware, too (though maybe this one is > also). > For example: https://github.com/numpy/numpy/pull/31 Warren > > > > Chuck From jenny.stone125 at gmail.com Wed Feb 19 14:46:32 2014 From: jenny.stone125 at gmail.com (Jennifer stone) Date: Thu, 20 Feb 2014 01:16:32 +0530 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: If you are interested in the hypergeometric numerical evaluation, it's > probably a good idea to take a look at this recent master's thesis > written on the problem: > > http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf > > The thesis is really comprehensive and detailed, with quite convincing conclusions on the methods to be used for varying a, b, x (though I have yet to read the thesis carefully enough to understand and validate each of the many cases for the parameter boundaries). It seems to be a reassuring and reliable walkthrough for the project.
> This may give some systematic overview on the range of methods > available. (Note that for copyright reasons, it's not a good idea to > look closely at the source codes linked from that thesis, as they are > not available under a compatible license.) > > It may well be that the best approach for evaluating these functions, > if accuracy in the whole parameter range is wanted, in the end turns > out to require arbitrary-precision computations. In that case, it > would be a very good idea to look at how the problem is approached in > mpmath. There are existing multiprecision packages written in C, and > using one of them in scipy.special could bring better evaluation > performance even if the algorithm is the same. > Yeah, this seems to be a brilliant idea. mpmath too, I assume, must have used some of the methods mentioned in the thesis. I'll look through the code and get back. I am still unaware of the complexity of the project expected at GSoC. This project looks engaging to me. Will an attempt to improve both spherical harmonic functions (improving the present algorithm to avoid the calculation for lower n's and m's) and hypergeometric functions be too ambitious, or is it doable? Regards Jennifer > -- > Pauli Virtanen > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jaime.frio at gmail.com Wed Feb 19 17:18:13 2014 From: jaime.frio at gmail.com (Jaime Fernández del Río) Date: Wed, 19 Feb 2014 14:18:13 -0800 Subject: [Numpy-discussion] PR adding an axis argument to np.bincount Message-ID: I have just submitted a PR (https://github.com/numpy/numpy/pull/4330) adding an axis argument to bincount. It lets you do things that would have been hard before, but the UI when broadcasting arrays together and having an axis argument can get tricky, and there is no obvious example already in place to follow, so I'd like to get some feedback on my choices. 
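[Editorial note: for anyone wanting to experiment before the PR lands, the per-slice counting it proposes can be approximated with current NumPy. This is only a rough sketch: `bincount_last_axis` is a hypothetical helper name, not the PR's implementation, and it handles just the simplest case of counting over the last axis.]

```python
import numpy as np

def bincount_last_axis(a, minlength=0):
    # Count occurrences of each non-negative integer along the last axis.
    # The counted axis is replaced by a new trailing axis of length n,
    # roughly matching the convention of appending the counts dimension
    # at the end. n is the larger of minlength and the global max + 1,
    # so every slice produces counts of the same length.
    n = max(minlength, a.max() + 1)
    return np.apply_along_axis(np.bincount, -1, a, minlength=n)

a = np.array([[0, 1, 1, 2],
              [2, 2, 0, 0]])
print(bincount_last_axis(a))
# [[1 2 1]
#  [2 0 2]]
```

Applied to this (2, 4) array with values below 3, the result has shape (2, 3): one row of counts per slice along the counted axis.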
*With no weights* When not using the 'weights' parameter, the counting is done over the axes passed in 'axis'. This defaults to all axes, i.e. the flattened array. The output will have the shape of the original array, with those axes removed, and an extra dimension of size 'n' added at the end, where 'n' is the larger of 'minlength' and the maximum value in the array plus one. This is pretty straightforward. I think the only design choices that warrant some discussion are: 1. Should the default value for 'axis' be all axes, just the last, or just the first? 2. Having the extra dimension added at the end. It may seem more natural to have the new dimension replace a dimension that has been removed. But because 'axis' can hold multiple axes, this would require some guessing (the first? the last? first or last based on position in the array, or on position in the 'axis' tuple?), which is avoided by having a fixed position. The other option would be at the beginning, not the end of the shape. For counting I think the last dimension is the right choice, but... As an example of how it works: >>> a = np.random.randint(5, size=(3, 400, 500)) >>> np.bincount(a, axis=(-1, -2)) array([[39763, 40086, 39832, 39970, 40349], [40006, 39892, 40226, 39938, 39938], [39990, 40082, 40184, 39818, 39926]]) So there were 40184 occurrences of 2 in a[2, :, :]. 
This is probably best illustrated with an example: >>> w = np.random.rand(100, 3, 2) >>> a = np.random.randint(4, size=(100,)) >>> np.bincount(a, w.T).T array([[[ 8.29654919, 9.65794721], [ 12.01620609, 10.06676672], [ 11.73217521, 10.42220345]], [[ 10.67034693, 11.7945728 ], [ 13.47044072, 11.45176676], [ 10.83104283, 12.00869285]], [[ 14.30506753, 8.18840995], [ 13.44466573, 13.18924624], [ 11.95200531, 12.92169698]], [[ 16.78580192, 16.96104034], [ 12.80863984, 15.04778831], [ 16.35114845, 14.63648771]]]) Here 'w' has shape '(100, 3, 2)', interpreted as a list of 100 arrays of shape '(3, 2)'. We want to add together the arrays into several groups, as indicated by another array 'a' of shape '(100,)', which is what is achieved above. Other options to consider are: >>> np.bincount(a[:, None, None], w).shape (4,) <-- WRONG: the axes of dimension 1 have not been added by broadcasting, so they get removed >>> np.bincount(a[:, None, None], w, axis=0).shape (3, 2, 4) <-- RIGHT, but this doesn't seem like the ordering of the dimensions one would want. It seems to me that what anyone trying to do this would like to get back is an array of shape '(4, 3, 2)', so I think the construct bincount(x, w.T).T will be used often enough that it warrants some less convoluted way of getting that back. But unless someone can figure out a smart way of handling this, I'd rather wait to see how it gets used, and modify it later, rather than making up an uninformed UI which turns out to be useless. The obvious questions for bincount with weights are: 1. Should axis refer to the axes **after** broadcasting? I don't think it makes sense to add over a dimension of size 1 in the input array; you can get the same result by summing over that dimension in `weights` before calling bincount, but I am open to other opinions. 2. Any ideas on how to better handle multidimensional weights? Thanks, Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. 
Copia a Conejo en tu firma y ayúdale en sus planes de dominación mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Wed Feb 19 19:25:40 2014 From: njs at pobox.com (Nathaniel Smith) Date: Wed, 19 Feb 2014 19:25:40 -0500 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: Message-ID: Hey all, Just a heads up: thanks to the tireless work of Olivier Grisel, the OpenBLAS development branch is now fork-safe when built with its default threading support. (It is still not thread-safe when built using OMP for threading and gcc, but this is not the default.) Gory details: https://github.com/xianyi/OpenBLAS/issues/294 Check it out - if it works you might want to consider lobbying your favorite distro to backport it. -n -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu Feb 20 05:32:01 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 20 Feb 2014 11:32:01 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: Message-ID: On Thu, Feb 20, 2014 at 1:25 AM, Nathaniel Smith wrote: > Hey all, > > Just a heads up: thanks to the tireless work of Olivier Grisel, the OpenBLAS > development branch is now fork-safe when built with its default threading > support. (It is still not thread-safe when built using OMP for threading and > gcc, but this is not the default.) > > Gory details: https://github.com/xianyi/OpenBLAS/issues/294 > > Check it out - if it works you might want to consider lobbying your favorite > distro to backport it. > debian unstable and the upcoming ubuntu 14.04 are already fixed. 
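[Editorial note: the fork-safety problem discussed above can be made concrete with a minimal sketch of the classic failure mode. It assumes a POSIX system (os.fork is unavailable on Windows); with a fork-unsafe threaded BLAS the child's np.dot() may simply deadlock rather than raise, while a fork-safe build completes normally.]

```python
import os
import numpy as np

# Warm up the BLAS in the parent so its worker threads (if any) exist
# before the fork.
a = np.random.rand(200, 200)
np.dot(a, a)

pid = os.fork()
if pid == 0:
    # Only the forking thread survives in the child. A fork-unsafe
    # threaded BLAS can hang here, waiting on worker threads that no
    # longer exist; a fork-safe build completes normally.
    np.dot(a, a)
    os._exit(0)
else:
    os.waitpid(pid, 0)
    print("child dot() completed")
```

The matrices are kept small here on purpose; when the deadlock occurs it depends not on size but on the parent's BLAS worker threads having been started before the fork.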
From stefan.otte at gmail.com Thu Feb 20 05:34:52 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Thu, 20 Feb 2014 11:34:52 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: Hey, so I propose the following. I'll implement a new function `mdot`. Incorporating the changes in `dot` is unlikely. Later, one can still include the features in `dot` if desired. `mdot` will have a default parameter `optimize`. If `optimize==True` the reordering of the multiplication is done. Otherwise it simply chains the multiplications. I'll test and benchmark my implementation and create a pull request. Cheers, Stefan From hoogendoorn.eelco at gmail.com Thu Feb 20 06:41:42 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 20 Feb 2014 12:41:42 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: If the standard semantics are not affected, and the most common two-argument scenario does not take more than a single if-statement overhead, I don't see why it couldn't be a replacement for the existing np.dot; but others' mileage may vary. On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte wrote: > Hey, > > so I propose the following. I'll implement a new function `mdot`. > Incorporating the changes in `dot` is unlikely. Later, one can still > include > the features in `dot` if desired. > > `mdot` will have a default parameter `optimize`. If `optimize==True` the > reordering of the multiplication is done. Otherwise it simply chains the > multiplications. > > I'll test and benchmark my implementation and create a pull request. > > Cheers, > Stefan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From olivier.grisel at ensta.org Thu Feb 20 07:09:06 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 20 Feb 2014 13:09:06 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: Message-ID: 2014-02-20 11:32 GMT+01:00 Julian Taylor : > On Thu, Feb 20, 2014 at 1:25 AM, Nathaniel Smith wrote: >> Hey all, >> >> Just a heads up: thanks to the tireless work of Olivier Grisel, the OpenBLAS >> development branch is now fork-safe when built with its default threading >> support. (It is still not thread-safe when built using OMP for threading and >> gcc, but this is not the default.) >> >> Gory details: https://github.com/xianyi/OpenBLAS/issues/294 >> >> Check it out - if it works you might want to consider lobbying your favorite >> distro to backport it. >> > > debian unstable and the upcoming ubuntu 14.04 are already fixed. Nice! -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From sergio_r at mail.com Thu Feb 20 07:14:02 2014 From: sergio_r at mail.com (Sergio Rojas) Date: Thu, 20 Feb 2014 07:14:02 -0500 Subject: [Numpy-discussion] NumPy 1.8.0 ERRORS under INTEL MKL Message-ID: <20140220121402.285740@gmx.com> Below is the output of running the numpy tests, with numpy installed using the Intel MKL library. All 5001 NumPy tests pass if installed using either ATLAS or OpenBLAS. Is there any known way to fix them? Sergio $ python Python 2.7.3 (default, Feb 11 2014, 16:24:48) [GCC Intel(R) C++ gcc 4.6 mode] on linux2 Type "help", "copyright", "credits" or "license" for more information. 
>>> import numpy >>> numpy.show_config() lapack_opt_info: libraries = ['mkl_rt', 'pthread'] library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] blas_opt_info: libraries = ['mkl_rt', 'pthread'] library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] openblas_info: NOT AVAILABLE lapack_mkl_info: libraries = ['mkl_rt', 'pthread'] library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] blas_mkl_info: libraries = ['mkl_rt', 'pthread'] library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] mkl_info: libraries = ['mkl_rt', 'pthread'] library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] define_macros = [('SCIPY_MKL_H', None)] include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] >>> numpy.test('full', verbose=2) Running unit tests for numpy NumPy version 1.8.0 NumPy is installed in /opt/myProg/Python273intel/Linux64b/lib/python2.7/site-pac kages/numpy Python version 2.7.3 (default, Feb 11 2014, 16:24:48) [GCC Intel(R) C++ gcc 4.6 mode] nose version 1.3.0 test_api.test_array_array ... ok ... ... test_matlib.test_empty ... ok test_matlib.test_ones ... ok test_matlib.test_zeros ... ok test_matlib.test_identity ... ok test_matlib.test_eye ... ok test_matlib.test_rand ... ok test_matlib.test_randn ... ok test_matlib.test_repmat ... 
ok ====================================================================== ERROR: test_assumed_shape.TestAssumedShapeSumExample.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 353, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_callback.TestF77Callback.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", 
line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_callback.TestF77Callback.test_docstring ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 80, in wrapper raise ret ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_kind.TestKind.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 353, in setUp module_name=self.module_name) File 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_mixed.TestMixed.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 353, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_mixed.TestMixed.test_docstring ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 353, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 80, in wrapper raise ret ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_character.TestF77ReturnCharacter.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_character.TestF90ReturnCharacter.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_complex.TestF77ReturnComplex.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_complex.TestF90ReturnComplex.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: 
test_return_integer.TestF77ReturnInteger.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_integer.TestF90ReturnInteger.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function (init_test_ext_module_ 5403) ====================================================================== ERROR: test_return_logical.TestF77ReturnLogical.test_all ---------------------------------------------------------------------- Traceback (most recent call last): File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/case.py", line 381, in setUp try_run(self.inst, ('setup', 'setUp')) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 .0-py2.7.egg/nose/util.py", line 469, in try_run return func() File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 348, in setUp module_name=self.module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 163, in build_code module_name=module_name) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 74, in wrapper memo[key] = func(*a, **kw) File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 py/tests/util.py", line 144, in build_module __import__(module_name) ImportError: dynamic module does not define init function 
From sergio_r at mail.com  Thu Feb 20 07:14:02 2014
From: sergio_r at mail.com (Sergio Rojas)
Date: Thu, 20 Feb 2014 07:14:02 -0500
Subject: [Numpy-discussion] NumPy 1.8.0 ERRORS under INTEL MKL
Message-ID: <20140220121534.115020@gmx.com>

Below is the output of running the NumPy test suite with NumPy installed
against the Intel MKL library. All 5001 NumPy tests pass if NumPy is
installed using either ATLAS or OpenBLAS. Is there any known way to fix
these errors?

Sergio

$ python
Python 2.7.3 (default, Feb 11 2014, 16:24:48) [GCC Intel(R) C++ gcc 4.6 mode] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> numpy.show_config()
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include']
openblas_info:
  NOT AVAILABLE
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include']
blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include']
mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None)]
    include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include']
>>> numpy.test('full', verbose=2)
Running unit tests for numpy
NumPy version 1.8.0
NumPy is installed in /opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy
Python version 2.7.3 (default, Feb 11 2014, 16:24:48) [GCC Intel(R) C++ gcc 4.6 mode]
nose version 1.3.0
test_api.test_array_array ... ok
...
...
test_matlib.test_empty ... ok
test_matlib.test_ones ... ok
test_matlib.test_zeros ... ok
test_matlib.test_identity ... ok
test_matlib.test_eye ... ok
test_matlib.test_rand ... ok
test_matlib.test_randn ... ok
test_matlib.test_repmat ... ok

======================================================================
ERROR: test_assumed_shape.TestAssumedShapeSumExample.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_callback.TestF77Callback.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_callback.TestF77Callback.test_docstring
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 80, in wrapper
    raise ret
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_kind.TestKind.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_mixed.TestMixed.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_mixed.TestMixed.test_docstring
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 80, in wrapper
    raise ret
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_character.TestF77ReturnCharacter.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_character.TestF90ReturnCharacter.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_complex.TestF77ReturnComplex.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_complex.TestF90ReturnComplex.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_integer.TestF77ReturnInteger.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_integer.TestF90ReturnInteger.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_logical.TestF77ReturnLogical.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_logical.TestF90ReturnLogical.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_real.TestF77ReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_return_real.TestF90ReturnReal.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 348, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 163, in build_code
    module_name=module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_size.TestSizeSumExample.test_all
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 74, in wrapper
    memo[key] = func(*a, **kw)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 144, in build_module
    __import__(module_name)
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_size.TestSizeSumExample.test_flatten
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 80, in wrapper
    raise ret
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

======================================================================
ERROR: test_size.TestSizeSumExample.test_transpose
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/case.py", line 381, in setUp
    try_run(self.inst, ('setup', 'setUp'))
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3.0-py2.7.egg/nose/util.py", line 469, in try_run
    return func()
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 353, in setUp
    module_name=self.module_name)
  File "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2py/tests/util.py", line 80, in wrapper
    raise ret
ImportError: dynamic module does not define init function (init_test_ext_module_5403)

----------------------------------------------------------------------
Ran 5001 tests in 161.433s

FAILED (KNOWNFAIL=5, SKIP=7, errors=19)
>>> quit()
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pyviennacl at tsmithe.net  Thu Feb 20 07:46:07 2014
From: pyviennacl at tsmithe.net (Toby St Clere Smithe)
Date: Thu, 20 Feb 2014 12:46:07 +0000
Subject: [Numpy-discussion] PyViennaCL
Message-ID: <8761oa3qb4.fsf@tsmithe.net>

Hi all,

Apologies for posting across lists; I thought that this might be of
interest to both groups.

I have just released PyViennaCL 1.0.0, which is a set of largely
NumPy-compatible Python bindings to the ViennaCL linear algebra and
numerical computation library for GPGPU and heterogeneous systems.
PyViennaCL aims to make powerful GPGPU computing really transparently
easy, especially for users already using NumPy for representing
matrices.

Please see my announcement below for links to source and packages and
documentation, a list of features, and a list of missing pieces. I hope
to iron out all those missing bits over the coming months, and work on
closer integration, especially with PyOpenCL / PyCUDA, over the summer.

Best wishes,

Toby St Clere Smithe
-------------- next part --------------
An embedded message was scrubbed...
From: Toby St Clere Smithe
Subject: PyViennaCL 1.0.0 is released!
Date: Thu, 20 Feb 2014 11:54:44 +0000
Size: 2401
URL:

From sturla.molden at gmail.com  Thu Feb 20 08:28:52 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Thu, 20 Feb 2014 13:28:52 +0000 (UTC)
Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
References:
Message-ID: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org>

Will this mean NumPy, SciPy et al. can start using OpenBLAS in the
"official" binary packages, e.g. on Windows and Mac OS X? ATLAS is slow
and Accelerate conflicts with fork as well.

Will dotblas be built against OpenBLAS? AFAIK, it is only built against
ATLAS or MKL, not any other BLAS, but it should just be a matter of
changing the build/link process.

Sturla

Nathaniel Smith wrote:
> Hey all,
>
> Just a heads up: thanks to the tireless work of Olivier Grisel, the
> OpenBLAS development branch is now fork-safe when built with its default
> threading support. (It is still not thread-safe when built using OMP for
> threading and gcc, but this is not the default.)
>
> Gory details: https://github.com/xianyi/OpenBLAS/issues/294
>
> Check it out - if it works you might want to consider lobbying your
> favorite distro to backport it.
>
> -n
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From ewm at redtetrahedron.org  Thu Feb 20 09:27:38 2014
From: ewm at redtetrahedron.org (Eric Moore)
Date: Thu, 20 Feb 2014 09:27:38 -0500
Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function
In-Reply-To:
References:
Message-ID:

On Thursday, February 20, 2014, Eelco Hoogendoorn
<hoogendoorn.eelco at gmail.com> wrote:

> If the standard semantics are not affected, and the most common
> two-argument scenario does not take more than a single if-statement
> overhead, I don't see why it couldn't be a replacement for the existing
> np.dot; but others mileage may vary.
>
> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte wrote:
>
>> Hey,
>>
>> so I propose the following. I'll implement a new function `mdot`.
>> Incorporating the changes in `dot` are unlikely. Later, one can still
>> include the features in `dot` if desired.
>>
>> `mdot` will have a default parameter `optimize`. If `optimize==True` the
>> reordering of the multiplication is done. Otherwise it simply chains the
>> multiplications.
>>
>> I'll test and benchmark my implementation and create a pull request.
>>
>> Cheers,
>> Stefan
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Another consideration here is that we need a better way to work with
stacked matrices such as np.linalg handles now. I.e., I want to compute
the matrix product of two (k, n, n) arrays, producing a (k, n, n)
result. Near as I can tell there isn't a way to do this right now that
doesn't involve an explicit loop, since dot will return a (k, n, k, n)
result.
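For reference, the stacked product described here can be written with np.einsum without materialising the full (k, n, k, n) dot result; a minimal sketch (the array shapes are chosen purely for illustration):

```python
import numpy as np

# Hypothetical inputs: k stacked n-by-n matrices in each operand.
k, n = 4, 3
rng = np.random.RandomState(0)
a = rng.rand(k, n, n)
b = rng.rand(k, n, n)

# np.dot contracts every slice of `a` against every slice of `b`,
# producing a (k, n, k, n) array -- far more work than needed.
full = np.dot(a, b)
assert full.shape == (k, n, k, n)

# einsum multiplies only the matching k-slices, giving (k, n, n) directly.
stacked = np.einsum('kij,kjl->kil', a, b)
assert stacked.shape == (k, n, n)

# Same result as the explicit loop mentioned above.
looped = np.array([np.dot(a[i], b[i]) for i in range(k)])
assert np.allclose(stacked, looped)
```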
Yes, this output contains what I want, but it also computes a lot of things that I don't want. It would also be nice to be able to do a matrix product reduction, (k, n, n) -> (n, n), in a single line too. Eric -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Feb 20 09:40:12 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 20 Feb 2014 15:40:12 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: 2014-02-20 14:28 GMT+01:00 Sturla Molden : > Will this mean NumPy, SciPy et al. can start using OpenBLAS in the > "official" binary packages, e.g. on Windows and Mac OS X? ATLAS is slow and > Accelerate conflicts with fork as well. This is what I would like to do personally, ideally as a distribution of wheel packages. To do so I built the current develop branch of OpenBLAS with: make USE_OPENMP=0 NUM_THREADS=32 NO_AFFINITY=1 make PREFIX=/opt/OpenBLAS-noomp install Then I added a site.cfg file in the numpy source folder with the lines: [openblas] libraries = openblas library_dirs = /opt/OpenBLAS-noomp/lib include_dirs = /opt/OpenBLAS-noomp/include > Will dotblas be built against OpenBLAS? 
Yes: $ ldd numpy/core/_dotblas.so linux-vdso.so.1 => (0x00007fff24d04000) libopenblas.so.0 => /opt/OpenBLAS-noomp/lib/libopenblas.so.0 (0x00007f432882f000) libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f4328449000) libm.so.6 => /lib/x86_64-linux-gnu/libm.so.6 (0x00007f432814c000) libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x00007f4327f2f000) libgfortran.so.3 => /usr/lib/x86_64-linux-gnu/libgfortran.so.3 (0x00007f4327c18000) /lib64/ld-linux-x86-64.so.2 (0x00007f43298d3000) libquadmath.so.0 => /usr/lib/x86_64-linux-gnu/libquadmath.so.0 (0x00007f43279e1000) However, when testing this I noticed the following strange slow import and memory usage in an IPython session: >>> import os, psutil >>> psutil.Process(os.getpid()).get_memory_info().rss / 1e6 20.324352 >>> %time import numpy CPU times: user 1.95 s, sys: 1.3 s, total: 3.25 s Wall time: 530 ms >>> psutil.Process(os.getpid()).get_memory_info().rss / 1e6 349.507584 The libopenblas.so file is just 14MB so I don't understand where those 330MB come from. It's even worse when using static linking (libopenblas.a instead of libopenblas.so under linux). With Atlas or MKL I get import times under 50 ms and the memory overhead of the numpy import is just ~15MB. I would be very interested in any help on this: - can you reproduce this behavior? - do you have an idea of a possible cause? - how to investigate? 
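One way to make the import-time part of this measurement repeatable is to time the import in a fresh interpreter, so the result is not skewed by modules already loaded in the current session. This is only a sketch of the approach described above, not the exact procedure used (it times any module by name; "json" is a cheap stdlib stand-in for "numpy"):

```python
import subprocess
import sys

# Time "import <module>" in a freshly started interpreter, so no caching
# or previously loaded modules affect the measurement.
def timed_import(module):
    out = subprocess.check_output([
        sys.executable, "-c",
        "import importlib, sys, time\n"
        "t0 = time.time()\n"
        "importlib.import_module(sys.argv[1])\n"
        "print(time.time() - t0)",
        module,
    ])
    return float(out.decode())

print(timed_import("json"))
```

Running this with "numpy" on the builds above would show whether the slow import is reproducible outside IPython.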
-- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From hoogendoorn.eelco at gmail.com Thu Feb 20 09:41:36 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 20 Feb 2014 15:41:36 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: Eric: take a look at np.einsum. The only argument against such dot semantics is that there isn't much elegance to be gained beyond what np.einsum already provides. For a plain chaining, multiple arguments to dot would be an improvement; but if you want to go for more complex products, the elegance of np.einsum will be hard to beat. On Thu, Feb 20, 2014 at 3:27 PM, Eric Moore wrote: > > > On Thursday, February 20, 2014, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> If the standard semantics are not affected, and the most common >> two-argument scenario does not take more than a single if-statement >> overhead, I don't see why it couldn't be a replacement for the existing >> np.dot; but others mileage may vary. >> >> >> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte wrote: >> >>> Hey, >>> >>> so I propose the following. I'll implement a new function `mdot`. >>> Incorporating the changes in `dot` are unlikely. Later, one can still >>> include >>> the features in `dot` if desired. >>> >>> `mdot` will have a default parameter `optimize`. If `optimize==True` the >>> reordering of the multiplication is done. Otherwise it simply chains the >>> multiplications. >>> >>> I'll test and benchmark my implementation and create a pull request. >>> >>> Cheers, >>> Stefan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> Another consideration here is that we need a better way to work with > stacked matrices such as np.linalg handles now. 
Ie I want to compute the > matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near > as I can tell there isn't a way to do this right now that doesn't involve > an explicit loop. Since dot will return a (k, n, k, n) result. Yes this > output contains what I want but it also computes a lot of things that I > don't want too. > > It would also be nice to be able to do a matrix product reduction, (k, n, > n) -> (n, n) in a single line too. > > Eric > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Feb 20 09:42:36 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 20 Feb 2014 15:42:36 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: FYI: to build scipy against OpenBLAS I used the following site.cfg at the root of my scipy source folder: [DEFAULT] library_dirs = /opt/OpenBLAS-noomp/lib:/usr/local/lib include_dirs = /opt/OpenBLAS-noomp/include:/usr/local/include [blas_opt] libraries = openblas [lapack_opt] libraries = openblas But this is unrelated to the previous numpy memory pattern, as it occurs independently of scipy. 
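The stacked product and reduction Eric asks for, and that Eelco points to np.einsum for, can be sketched as follows (shapes are illustrative; only NumPy is assumed):

```python
import numpy as np

rng = np.random.RandomState(0)
k, n = 4, 3
a = rng.rand(k, n, n)
b = rng.rand(k, n, n)

# Stacked matrix product: (k, n, n) x (k, n, n) -> (k, n, n), without the
# (k, n, k, n) intermediate that np.dot(a, b) would produce.
stacked = np.einsum('kij,kjl->kil', a, b)

# Matrix product reduction: sum the k products down to a single (n, n).
reduced = np.einsum('kij,kjl->il', a, b)

# Reference: the explicit loop that the einsum calls replace.
loop = np.array([np.dot(a[i], b[i]) for i in range(k)])
assert np.allclose(stacked, loop)
assert np.allclose(reduced, loop.sum(axis=0))
```

Both results come from a single einsum call each, so no (k, n, k, n) array is ever materialized.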
-- Olivier From cmkleffner at gmail.com Thu Feb 20 09:43:06 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 20 Feb 2014 15:43:06 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi, some days ago I put some preliminary mingw-w64 binaries and code based on python2.7 on my google drive to discuss it with Matthew Brett. Maybe its time for a broader discussion. IMHO it is ready for testing but not for consumption. url: https://drive.google.com/folderview?id=0B4DmELLTwYmldUVpSjdpZlpNM1k&usp=sharing contains: (1) patches used numpy.patch scipy.patch (2) 64 bit GCC toolchain amd64/ mingw-w64-toolchain-static_amd64-gcc-4.8.2_vc90_rev-20140131.7z libpython27.a (3) numpy-1.8.0 linked against OpenBLAS amd64/numpy-1.8.0/ numpy-1.8.0.win-amd64-py2.7.exe numpy-1.8.0-cp27-none-win_amd64.whl numpy_amd64_fcompiler.log numpy_amd64_build.log numpy_amd64_test.log _numpyconfig.h config.h (4) scipy-0.13.3 linked against OpenBLAS amd64/scipy-0.13.3/ scipy-0.13.3.win-amd64-py2.7.exe scipy-0.13.3-cp27-none-win_amd64.whl scipy_amd64_fcompiler.log scipy_amd64_build.log scipy_amd64_build_cont.log scipy_amd64_test._segfault.log scipy_amd64_test.log (5) 32 bit GCC toolchain win32/ mingw-w64-toolchain-static_win32-gcc-4.8.2_vc90_rev-20140131.7z libpython27.a (6) numpy-1.8.0 linked against OpenBLAS win32/numpy-1.8.0/ numpy-1.8.0.win32-py2.7.exe numpy-1.8.0-cp27-none-win32.whl numpy_win32_fcompiler.log numpy_win32_build.log numpy_win32_test.log _numpyconfig.h config.h (7) scipy-0.13.3 linked against OpenBLAS win32/scipy-0.13.3/ scipy-0.13.3.win32-py2.7.exe scipy-0.13.3-cp27-none-win32.whl scipy_win32_fcompiler.log scipy_win32_build.log scipy_win32_build_cont.log scipy_win32_test.log Summary to compile numpy: (1) \bin and python should be in the PATH. 
Choose 32 bit or 64 bit architecture. (2) copy libpython27.a to \libs check that \libs does not contain libmsvcr90.a (3) apply numpy.patch (4) copy libopenblas.dll from \bin to numpy\core of course don't ever mix 32bit and 64 bit code (5) create a site.cfg in the numpy folder with the absolute path to the mingw import files/header files. I copied the openblas header files, importlibs into the GCC toolchain. (6) create a mingw distutils.cfg file (7) test the configuration python setup.py config_fc --verbose and python setup.py build --help-fcompiler (8) build python setup.py build --fcompiler=gnu95 (9) make a distro python setup.py bdist --format=wininst (10) make a wheel wininst2wheel numpy-1.8.0.win32-py2.7.exe (for 32 bit) (11) install wheel install numpy-1.8.0-cp27-none-win32.whl (12) import numpy; numpy.test() Summary to compile scipy: (1) apply scipy.patch (2) python setup.py build --fcompiler=gnu95 and a second time python setup.py build --fcompiler=gnu95 (3) python setup.py bdist --format=wininst (4) install (5) import scipy; scipy.test() Hints: (1) libpython import file: The libpython27.a import file has been generated with gendef and dlltool according to the recommendations on the mingw-w64 faq site. It is essential to not use import libraries from anywhere else, but to create them with the tools in the GCC toolchain. The GCC toolchain contains correctly generated msvcrXX import files by default. (2) OpenBLAS: the openblas DLL must be copied to numpy/core before building numpy. All BLAS and LAPACK code will be linked dynamically to this DLL. Because of this the overall distro size gets much smaller compared to numpy-MKL or scipy-MKL. It is not necessary to add numpy/core to the path! (at least on my machine). To load libopenblas.dll to the process space it is only necessary to import numpy - nothing else. libopenblas.dll is linked against the msvcr90.dll, just like python. The DLL itself is a fat binary containing all optimized kernels for all supported platforms. 
DLL, headers and import files have been included into the toolchain. (3) mingw-w64 toolchain: In short it is an extended version of the 'recommended' mingw-builds toolchain with some minor patches and customizations. I used https://github.com/niXman/mingw-builds for my build. It is a 'static' build, thus all gcc-related runtimes are linked statically into the resulting binaries. (4) Results: Some FAILS - see corresp. log-files. I got a segfault with scipy.test() (64 bit) with multithreaded OpenBLAS (test_ssygv_1) but not in single threaded mode. Due to time constraints I didn't make further tests right now. Regards Carl 2014-02-20 14:28 GMT+01:00 Sturla Molden : > Will this mean NumPy, SciPy et al. can start using OpenBLAS in the > "official" binary packages, e.g. on Windows and Mac OS X? ATLAS is slow and > Accelerate conflicts with fork as well. > > Will dotblas be built against OpenBLAS? AFAIK, it is only built against > ATLAS or MKL, not any other BLAS, but it should just be a matter of > changing the build/link process. > > Sturla > > > > Nathaniel Smith wrote: > > Hey all, > > > > Just a heads up: thanks to the tireless work of Olivier Grisel, the > > OpenBLAS development branch is now fork-safe when built with its default > > threading support. (It is still not thread-safe when built using OMP for > > threading and gcc, but this is not the default.) > > > > Gory details: > > https://github.com/xianyi/OpenBLAS/issues/294 > > > > Check it out - if it works you might want to consider lobbying your > > favorite distro to backport it. 
> > > > -n > > > > _______________________________________________ NumPy-Discussion mailing > list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From olivier.grisel at ensta.org Thu Feb 20 09:50:14 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Thu, 20 Feb 2014 15:50:14 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: Thanks for sharing, this is all very interesting. Have you tried to have a look at the memory usage and import time of numpy when linked against libopenblas.dll? -- Olivier From jtaylor.debian at googlemail.com Thu Feb 20 10:01:12 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 20 Feb 2014 16:01:12 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: On Thu, Feb 20, 2014 at 3:50 PM, Olivier Grisel wrote: > Thanks for sharing, this is all very interesting. > > Have you tried to have a look at the memory usage and import time of > numpy when linked against libopenblas.dll? > > -- This is probably caused by the memory warmup; it can be disabled with NO_WARMUP=1 in some configuration file. 
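Putting Olivier's earlier build recipe together with Julian's suggestion, a build without the warmup might look like this (a sketch: the NUM_THREADS, NO_AFFINITY, and NO_WARMUP spellings follow OpenBLAS's Makefile.rule, and the install prefix is just the one used earlier in the thread):

```shell
# Build the OpenBLAS develop branch without OpenMP threading and without
# the memory warmup suspected of inflating RSS at import time.
make USE_OPENMP=0 NUM_THREADS=32 NO_AFFINITY=1 NO_WARMUP=1
make PREFIX=/opt/OpenBLAS-noomp install
```

Re-running the import-time and memory measurement against this build would confirm whether the warmup is the cause.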
From cmkleffner at gmail.com Thu Feb 20 10:01:25 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 20 Feb 2014 16:01:25 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: I looked at the task manager; there is not much difference to numpy-MKL. I didn't make any qualified measurements, however. Carl 2014-02-20 15:50 GMT+01:00 Olivier Grisel : > Thanks for sharing, this is all very interesting. > > Have you tried to have a look at the memory usage and import time of > numpy when linked against libopenblas.dll? > > -- > Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Thu Feb 20 10:02:09 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 20 Feb 2014 10:02:09 -0500 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: If you send a patch that deprecates dot's current behaviour for ndim>2, we'll probably merge it. (We'd like it to function like you suggest, for consistency with other gufuncs. But to get there we have to deprecate the current behaviour first.) While I'm wishing for things I'll also mention that it would be really neat if binary gufuncs would have a .outer method like regular ufuncs do, so anyone currently using ndim>2 dot could just switch to that. But that's a lot more work than just deprecating something :-). 
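For readers following the mdot proposal, a minimal sketch of what Stefan describes might look like the following. The name, the optimize flag, and the restriction to chained 2-D products come from his description; everything else (the matrix-chain dynamic program, the left-to-right fallback) is an illustrative guess, not the actual pull request:

```python
from functools import reduce
import numpy as np

def mdot(*arrays, optimize=True):
    """Sketch of the proposed mdot: chain np.dot over several 2-D arrays.

    With optimize=True, pick the cheapest parenthesization via the
    classic matrix-chain dynamic program; otherwise multiply from the
    left, as nested np.dot calls would.
    """
    arrays = [np.asarray(a) for a in arrays]
    if len(arrays) < 2:
        raise ValueError("mdot needs at least two arrays")
    if not optimize or len(arrays) == 2:
        return reduce(np.dot, arrays)
    # Matrix i has shape dims[i] x dims[i + 1].
    dims = [a.shape[0] for a in arrays] + [arrays[-1].shape[1]]
    n = len(arrays)
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cost[i][j] = float('inf')
            for k in range(i, j):
                q = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if q < cost[i][j]:
                    cost[i][j] = q
                    split[i][j] = k

    def multiply(i, j):
        # Recursively multiply the sub-chain using the recorded splits.
        if i == j:
            return arrays[i]
        k = split[i][j]
        return np.dot(multiply(i, k), multiply(k + 1, j))

    return multiply(0, n - 1)
```

The optimized and unoptimized paths give the same result; only the number of scalar multiplications differs, which matters when the chain mixes very different matrix sizes.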
-n On 20 Feb 2014 09:27, "Eric Moore" wrote: > > > On Thursday, February 20, 2014, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> If the standard semantics are not affected, and the most common >> two-argument scenario does not take more than a single if-statement >> overhead, I don't see why it couldn't be a replacement for the existing >> np.dot; but others mileage may vary. >> >> >> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte wrote: >> >>> Hey, >>> >>> so I propose the following. I'll implement a new function `mdot`. >>> Incorporating the changes in `dot` are unlikely. Later, one can still >>> include >>> the features in `dot` if desired. >>> >>> `mdot` will have a default parameter `optimize`. If `optimize==True` the >>> reordering of the multiplication is done. Otherwise it simply chains the >>> multiplications. >>> >>> I'll test and benchmark my implementation and create a pull request. >>> >>> Cheers, >>> Stefan >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> Another consideration here is that we need a better way to work with > stacked matrices such as np.linalg handles now. Ie I want to compute the > matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near > as I can tell there isn't a way to do this right now that doesn't involve > an explicit loop. Since dot will return a (k, n, k, n) result. Yes this > output contains what I want but it also computes a lot of things that I > don't want too. > > It would also be nice to be able to do a matrix product reduction, (k, n, > n) -> (n, n) in a single line too. > > Eric > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From cmkleffner at gmail.com Thu Feb 20 10:02:30 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 20 Feb 2014 16:02:30 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: Good point, I didn't use this option. Carl 2014-02-20 16:01 GMT+01:00 Julian Taylor : > On Thu, Feb 20, 2014 at 3:50 PM, Olivier Grisel > wrote: > > Thanks for sharing, this is all very interesting. > > > > Have you tried to have a look at the memory usage and import time of > > numpy when linked against libopenblas.dll? > > > > -- > > this is probably caused by the memory warmup > it can be disabled with NO_WARMUP=1 in some configuration file. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Thu Feb 20 10:04:46 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 20 Feb 2014 08:04:46 -0700 Subject: [Numpy-discussion] NumPy 1.8.0 ERRORS under INTEL MKL In-Reply-To: <20140220121402.285740@gmx.com> References: <20140220121402.285740@gmx.com> Message-ID: Those are all f2py-related tests, so I suspect something is wrong with your fortran/configuration. Might make sure to remove both the install and build directories and do a clean build also. On Thu, Feb 20, 2014 at 5:14 AM, Sergio Rojas wrote: > > Below is the output of running the numpy tests installed using Intel MKL > library. > All 5001 Numpy tests pass if installed using either Atlas or OpenBLAS > > Is there any known way to fix them? > > Sergio > > > $ python > Python 2.7.3 (default, Feb 11 2014, 16:24:48) > [GCC Intel(R) C++ gcc 4.6 mode] on linux2 > Type "help", "copyright", "credits" or "license" for more information. 
> >>> import numpy > >>> numpy.show_config() > lapack_opt_info: > libraries = ['mkl_rt', 'pthread'] > library_dirs = > ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] > blas_opt_info: > libraries = ['mkl_rt', 'pthread'] > library_dirs = > ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] > openblas_info: > NOT AVAILABLE > lapack_mkl_info: > libraries = ['mkl_rt', 'pthread'] > library_dirs = > ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] > blas_mkl_info: > libraries = ['mkl_rt', 'pthread'] > library_dirs = > ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] > mkl_info: > libraries = ['mkl_rt', 'pthread'] > library_dirs = > ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/lib/intel64'] > define_macros = [('SCIPY_MKL_H', None)] > include_dirs = ['/opt/intel/composer_xe_2013_sp1.1.106/mkl/include'] > >>> numpy.test('full', verbose=2) > Running unit tests for numpy > NumPy version 1.8.0 > NumPy is installed in > /opt/myProg/Python273intel/Linux64b/lib/python2.7/site-pac > kages/numpy > Python version 2.7.3 (default, Feb 11 2014, 16:24:48) [GCC Intel(R) C++ > gcc 4.6 > mode] > nose version 1.3.0 > test_api.test_array_array ... ok > ... > ... > test_matlib.test_empty ... ok > test_matlib.test_ones ... ok > test_matlib.test_zeros ... ok > test_matlib.test_identity ... ok > test_matlib.test_eye ... ok > test_matlib.test_rand ... ok > test_matlib.test_randn ... ok > test_matlib.test_repmat ... 
ok > > ====================================================================== > ERROR: test_assumed_shape.TestAssumedShapeSumExample.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_callback.TestF77Callback.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > 
File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_callback.TestF77Callback.test_docstring > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 80, in wrapper > raise ret > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_kind.TestKind.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > 
.0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_mixed.TestMixed.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_mixed.TestMixed.test_docstring > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 80, in wrapper > raise ret > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_character.TestF77ReturnCharacter.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, 
in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_character.TestF90ReturnCharacter.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_complex.TestF77ReturnComplex.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_complex.TestF90ReturnComplex.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in 
build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_integer.TestF77ReturnInteger.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: 
test_return_integer.TestF90ReturnInteger.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_logical.TestF77ReturnLogical.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > 
py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_logical.TestF90ReturnLogical.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_real.TestF77ReturnReal.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_return_real.TestF90ReturnReal.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > 
.0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 348, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 163, in build_code > module_name=module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_size.TestSizeSumExample.test_all > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 74, in wrapper > memo[key] = func(*a, **kw) > File > 
"/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 144, in build_module > __import__(module_name) > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_size.TestSizeSumExample.test_flatten > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 80, in wrapper > raise ret > ImportError: dynamic module does not define init function > (init_test_ext_module_ > 5403) > > ====================================================================== > ERROR: test_size.TestSizeSumExample.test_transpose > ---------------------------------------------------------------------- > Traceback (most recent call last): > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/case.py", line 381, in setUp > try_run(self.inst, ('setup', 'setUp')) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/nose-1.3 > .0-py2.7.egg/nose/util.py", line 469, in try_run > return func() > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > py/tests/util.py", line 353, in setUp > module_name=self.module_name) > File > "/opt/myProg/Python273intel/Linux64b/lib/python2.7/site-packages/numpy/f2 > 
py/tests/util.py", line 80, in wrapper
>     raise ret
> ImportError: dynamic module does not define init function
> (init_test_ext_module_5403)
>
> ----------------------------------------------------------------------
> Ran 5001 tests in 161.433s
>
> FAILED (KNOWNFAIL=5, SKIP=7, errors=19)
>
> >>> quit()
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From olivier.grisel at ensta.org  Thu Feb 20 10:30:55 2014
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 20 Feb 2014 16:30:55 +0100
Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org>
Message-ID:

2014-02-20 16:01 GMT+01:00 Julian Taylor :
>
> this is probably caused by the memory warmup.
> It can be disabled with NO_WARMUP=1 in some configuration file.

This was it, I now get:

>>> import os, psutil
>>> psutil.Process(os.getpid()).get_memory_info().rss / 1e6
20.324352
>>> %time import numpy
CPU times: user 84 ms, sys: 464 ms, total: 548 ms
Wall time: 59.3 ms
>>> psutil.Process(os.getpid()).get_memory_info().rss / 1e6
27.906048

Thanks for the tip.

-- 
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

From jurgen.vangael at gmail.com  Thu Feb 20 11:57:14 2014
From: jurgen.vangael at gmail.com (Jurgen Van Gael)
Date: Thu, 20 Feb 2014 16:57:14 +0000
Subject: [Numpy-discussion] OpenBLAS on Mac
Message-ID:

Hi All,

I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. I've
downloaded the OpenBLAS source and compiled it (thanks to Olivier Grisel). I
installed everything to /usr/local/lib (I believe): e.g.
"ll /usr/local/lib/ | grep openblas"
lrwxr-xr-x  1  37B 10 Feb 14:51 libopenblas.a@ -> libopenblas_sandybridgep-r0.2.9.rc1.a
lrwxr-xr-x  1  56B 10 Feb 14:51 libopenblas.dylib@ -> /usr/local/lib/libopenblas_sandybridgep-r0.2.9.rc1.dylib
-rw-r--r--  1  18M  7 Feb 16:02 libopenblas_sandybridgep-r0.2.9.rc1.a
-rwxr-xr-x  1  12M 10 Feb 14:51 libopenblas_sandybridgep-r0.2.9.rc1.dylib*

Then I download the numpy sources and add a site.cfg with the only three
lines uncommented being:

[openblas]
libraries = openblas
library_dirs = /usr/local/lib
include_dirs = /usr/local/include

When I run python setup.py config I get messages that say that openblas_info
has been found. I then run setup build and setup install (into a virtualenv).
In the virtualenv, when I then check what _dotblas.so is linked to, I keep
getting that it is linked to Accelerate. E.g.

otool -L .../numpy/core/_dotblas.so
=> /System/Library/Frameworks/Accelerate.framework/Versions/A/Accelerate

Any suggestions on getting my numpy working with OpenBLAS?

Thanks, Jurgen

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From sole at esrf.fr  Thu Feb 20 12:21:31 2014
From: sole at esrf.fr (V. Armando Sole)
Date: Thu, 20 Feb 2014 18:21:31 +0100
Subject: [Numpy-discussion] [JOB ANNOUNCEMENT] Software Developer permanent position available at ESRF, France
Message-ID:

Dear colleagues,

The ESRF is looking for a Software Developer:

http://esrf.profilsearch.com/recrute/fo_annonce_voir.php?id=300

Our ideal candidate would be experienced on OpenGL, OpenCL and Python.

Best regards,

Armando

From newellm at blur.com  Thu Feb 20 12:33:35 2014
From: newellm at blur.com (Matt Newell)
Date: Thu, 20 Feb 2014 09:33:35 -0800
Subject: [Numpy-discussion] Header files in windows installers
Message-ID: <14688223.zKv6Hc4UNA@obsidian>

I have a small c++ extension used to feed a 1d numpy array into a
QPainterPath. Very simple just using PyArray_[Check|FLAGS|SIZE|DATA].
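[Editor's note: the C-API calls named above have Python-level counterparts, which make it easy to check from the interpreter what such an extension will see. A minimal sketch, with an illustrative array rather than anything from Matt's actual extension:]

```python
import numpy as np

# An illustrative 1-D array of the kind the extension would receive.
a = np.arange(5, dtype=np.float64)

# Python-level counterparts of the C-API calls mentioned above:
assert isinstance(a, np.ndarray)   # PyArray_Check
assert a.flags['C_CONTIGUOUS']     # PyArray_FLAGS contiguity bit
assert a.size == 5                 # PyArray_SIZE
assert a.ctypes.data != 0          # PyArray_DATA (raw buffer address)
print("array looks safe to hand to a C extension")
```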
I developed it on debian which was of course very straightforward, but now I
need to deploy on Windows, which is of course where the fun always begins.

I was unpleasantly surprised to discover that the numpy installers for Windows
do not have an option to install the include files, which appear to be all that
is needed to compile extensions that use numpy's C-api.

I have actually managed to copy my set of include files from linux and with a
few modifications to _numpyconfig.h got my extension compiled and working.

Is there any chance that the includes could be included in future releases?

Thanks,
Matt Newell

From olivier.grisel at ensta.org  Thu Feb 20 13:07:33 2014
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 20 Feb 2014 19:07:33 +0100
Subject: [Numpy-discussion] OpenBLAS on Mac
In-Reply-To: References: Message-ID:

I have exactly the same setup as yours and it links to OpenBLAS correctly
(in a venv as well, installed with python setup.py install). The only
difference is that I installed OpenBLAS in the default folder: /opt/OpenBLAS
(and I reflected that in site.cfg).

When you run otool -L, is it in your source tree or do you point to the
numpy/core/_dotblas.so of the site-packages folder of your venv?

If you activate your venv, go to a different folder (e.g. /tmp) and type:

python -c "import numpy as np; np.show_config()"

what do you get? I get:

$ python -c "import numpy as np; np.show_config()"
lapack_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
blas_opt_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
openblas_info:
    libraries = ['openblas', 'openblas']
    library_dirs = ['/opt/OpenBLAS/lib']
    language = f77
blas_mkl_info:
    NOT AVAILABLE

-- 
Olivier

From argriffi at ncsu.edu  Thu Feb 20 13:24:36 2014
From: argriffi at ncsu.edu (alex)
Date: Thu, 20 Feb 2014 13:24:36 -0500
Subject: [Numpy-discussion] svd error checking vs.
speed
In-Reply-To: <245746841414353707.772460sturla.molden-gmail.com@news.gmane.org>
References: <1556894915414345274.194992sturla.molden-gmail.com@news.gmane.org> <1458618056414352506.887385sturla.molden-gmail.com@news.gmane.org> <245746841414353707.772460sturla.molden-gmail.com@news.gmane.org>
Message-ID:

On Mon, Feb 17, 2014 at 1:24 PM, Sturla Molden wrote:
> Sturla Molden wrote:
>> Dave Hirschfeld wrote:
>>
>>> Even if lapack_lite always performed the isfinite check and threw a python
>>> error if False, it would be much better than either hanging or segfaulting and
>>> people who care about the isfinite cost probably would be linking to a fast
>>> lapack anyway.
>>
>> +1 (if I have a vote)
>>
>> Correctness is always more important than speed. Segfaulting or hanging
>> while burning the CPU is not something we should allow "by design". And
>> those who need speed should in any case use a different lapack library
>> instead. The easiest place to put a finiteness test is the check_object
>> function here:
>>
>> https://github.com/numpy/numpy/blob/master/numpy/linalg/lapack_litemodule.c
>>
>> But in that case we should probably use a macro guard to leave it out if
>> any other LAPACK than the builtin f2c version is used.
>
>
> It seems even the more recent (3.4.x) versions of LAPACK have places where
> NANs can cause infinite loops. As long as this is an issue it might perhaps
> be worth checking everywhere.
>
> http://www.netlib.org/lapack/bug_list.html
>
> The semi-official C interface LAPACKE implements NAN checking as well:
>
> http://www.netlib.org/lapack/lapacke.html#_nan_checking
>
> If Intel's engineers put NAN checking inside LAPACKE it was probably for a
> good reason.

As more evidence that checking isfinite could be important for stability even
for non-lapack-lite LAPACKs, MKL docs currently include the following warning:

WARNING
LAPACK routines assume that input matrices do not contain IEEE 754 special
values such as INF or NaN values.
Using these special values may cause LAPACK to return unexpected results or become unstable. From stefan.otte at gmail.com Thu Feb 20 13:35:24 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Thu, 20 Feb 2014 19:35:24 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: Hey guys, I quickly hacked together a prototype of the optimization step: https://github.com/sotte/numpy_mdot I think there is still room for improvements so feedback is welcome :) I'll probably have some time to code on the weekend. @Nathaniel, I'm still not sure about integrating it in dot. Don't a lot of people use the optional out parameter of dot? Best, Stefan On Thu, Feb 20, 2014 at 4:02 PM, Nathaniel Smith wrote: > If you send a patch that deprecates dot's current behaviour for ndim>2, > we'll probably merge it. (We'd like it to function like you suggest, for > consistency with other gufuncs. But to get there we have to deprecate the > current behaviour first.) > > While I'm wishing for things I'll also mention that it would be really neat > if binary gufuncs would have a .outer method like regular ufuncs do, so > anyone currently using ndim>2 dot could just switch to that. But that's a > lot more work than just deprecating something :-). > > -n > > On 20 Feb 2014 09:27, "Eric Moore" wrote: >> >> >> >> On Thursday, February 20, 2014, Eelco Hoogendoorn >> wrote: >>> >>> If the standard semantics are not affected, and the most common >>> two-argument scenario does not take more than a single if-statement >>> overhead, I don't see why it couldn't be a replacement for the existing >>> np.dot; but others mileage may vary. >>> >>> >>> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte >>> wrote: >>>> >>>> Hey, >>>> >>>> so I propose the following. I'll implement a new function `mdot`. >>>> Incorporating the changes in `dot` are unlikely. Later, one can still >>>> include >>>> the features in `dot` if desired. 
>>>> >>>> `mdot` will have a default parameter `optimize`. If `optimize==True` >>>> the >>>> reordering of the multiplication is done. Otherwise it simply chains >>>> the >>>> multiplications. >>>> >>>> I'll test and benchmark my implementation and create a pull request. >>>> >>>> Cheers, >>>> Stefan >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> Another consideration here is that we need a better way to work with >> stacked matrices such as np.linalg handles now. Ie I want to compute the >> matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near as >> I can tell there isn't a way to do this right now that doesn't involve an >> explicit loop. Since dot will return a (k, n, k, n) result. Yes this output >> contains what I want but it also computes a lot of things that I don't want >> too. >> >> It would also be nice to be able to do a matrix product reduction, (k, n, >> n) -> (n, n) in a single line too. >> >> Eric >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sole at esrf.fr Thu Feb 20 14:13:04 2014 From: sole at esrf.fr (V. Armando Sole) Date: Thu, 20 Feb 2014 20:13:04 +0100 Subject: [Numpy-discussion] [JOB ANNOUNCEMENT] Software Developer permanent position available at ESRF, France In-Reply-To: References: Message-ID: <8bccf9f157ac3369280b818a8eb542c7@esrf.fr> Sorry, the link was in French ... The English version: http://esrf.profilsearch.com/recrute/fo_form_cand.php?_lang=en&id=300 Best regards, Armando On 20.02.2014 18:21, V. 
Armando Sole wrote: > Dear colleagues, > > The ESRF is looking for a Software Developer: > > http://esrf.profilsearch.com/recrute/fo_annonce_voir.php?id=300 > > Our ideal candidate would be experienced on OpenGL, OpenCL and > Python. > > Best regards, > > Armando From njs at pobox.com Thu Feb 20 14:39:29 2014 From: njs at pobox.com (Nathaniel Smith) Date: Thu, 20 Feb 2014 14:39:29 -0500 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: On Thu, Feb 20, 2014 at 1:35 PM, Stefan Otte wrote: > Hey guys, > > I quickly hacked together a prototype of the optimization step: > https://github.com/sotte/numpy_mdot > > I think there is still room for improvements so feedback is welcome :) > I'll probably have some time to code on the weekend. > > @Nathaniel, I'm still not sure about integrating it in dot. Don't a > lot of people use the optional out parameter of dot? The email you're replying to below about deprecating stuff in 'dot' was in reply to Eric's email about using dot on arrays with shape (k, n, n), so those comments are unrelated to the mdot stuff. I wouldn't mind seeing out= arguments become kw-only in general, but even if we decided to do that it would take a long deprecation period, so yeah, let's give up on 'dot(A, B, C, D)' as syntax for mdot. However, the suggestion of supporting np.dot([A, B, C, D]) still seems like it might be a good idea...? I have mixed feelings about it -- one less item cluttering up the namespace, but it is weird and magical to have two totally different calling conventions for the same function. -n > On Thu, Feb 20, 2014 at 4:02 PM, Nathaniel Smith wrote: >> If you send a patch that deprecates dot's current behaviour for ndim>2, >> we'll probably merge it. (We'd like it to function like you suggest, for >> consistency with other gufuncs. But to get there we have to deprecate the >> current behaviour first.) 
>> >> While I'm wishing for things I'll also mention that it would be really neat >> if binary gufuncs would have a .outer method like regular ufuncs do, so >> anyone currently using ndim>2 dot could just switch to that. But that's a >> lot more work than just deprecating something :-). >> >> -n >> >> On 20 Feb 2014 09:27, "Eric Moore" wrote: >>> >>> >>> >>> On Thursday, February 20, 2014, Eelco Hoogendoorn >>> wrote: >>>> >>>> If the standard semantics are not affected, and the most common >>>> two-argument scenario does not take more than a single if-statement >>>> overhead, I don't see why it couldn't be a replacement for the existing >>>> np.dot; but others mileage may vary. >>>> >>>> >>>> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte >>>> wrote: >>>>> >>>>> Hey, >>>>> >>>>> so I propose the following. I'll implement a new function `mdot`. >>>>> Incorporating the changes in `dot` are unlikely. Later, one can still >>>>> include >>>>> the features in `dot` if desired. >>>>> >>>>> `mdot` will have a default parameter `optimize`. If `optimize==True` >>>>> the >>>>> reordering of the multiplication is done. Otherwise it simply chains >>>>> the >>>>> multiplications. >>>>> >>>>> I'll test and benchmark my implementation and create a pull request. >>>>> >>>>> Cheers, >>>>> Stefan >>>>> _______________________________________________ >>>>> NumPy-Discussion mailing list >>>>> NumPy-Discussion at scipy.org >>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>>> >>> Another consideration here is that we need a better way to work with >>> stacked matrices such as np.linalg handles now. Ie I want to compute the >>> matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near as >>> I can tell there isn't a way to do this right now that doesn't involve an >>> explicit loop. Since dot will return a (k, n, k, n) result. Yes this output >>> contains what I want but it also computes a lot of things that I don't want >>> too. 
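[Editor's note: the stacked (k, n, n) product Eric describes below can already be written without the (k, n, k, n) blow-up of dot, e.g. with einsum. A sketch with illustrative shapes:]

```python
import numpy as np

rng = np.random.RandomState(0)
k, n = 4, 3
A = rng.rand(k, n, n)
B = rng.rand(k, n, n)

# One (n, n) x (n, n) matrix product per leading index -> a (k, n, n) result.
C = np.einsum('kij,kjl->kil', A, B)

# The explicit loop it replaces, for comparison.
C_loop = np.array([A[i].dot(B[i]) for i in range(k)])
assert np.allclose(C, C_loop)
assert C.shape == (k, n, n)
```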
>>> It would also be nice to be able to do a matrix product reduction, (k, n,
>>> n) -> (n, n) in a single line too.
>>>
>>> Eric
>>>
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From olivier.grisel at ensta.org  Thu Feb 20 17:17:10 2014
From: olivier.grisel at ensta.org (Olivier Grisel)
Date: Thu, 20 Feb 2014 23:17:10 +0100
Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe
In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org>
Message-ID:

I had a quick look (without running the procedure) but I don't understand
some elements:

- apparently you never tell in numpy's site.cfg nor scipy's site.cfg to use
the openblas lib nor set the library_dirs: how does numpy.distutils know
that it should dynamically link against numpy/core/libopenblas.dll

- how do you deal with the link to the following libraries:

libgfortran.3.dll
libgcc_s.1.dll
libquadmath.0.dll

If MinGW is installed on the system I assume that the linker will find
them. But would it work when the wheel packages are installed on a
system that does not have MinGW installed?
Best,

-- 
Olivier

From jtaylor.debian at googlemail.com  Thu Feb 20 17:54:30 2014
From: jtaylor.debian at googlemail.com (jtaylor.debian at googlemail.com)
Date: Thu, 20 Feb 2014 14:54:30 -0800 (PST)
Subject: [Numpy-discussion] [cython-users] Re: avoiding numpy temporaries via refcount
In-Reply-To: <530466DB.9010302@behnel.de>
References: <662189258414431804.957982sturla.molden-gmail.com@news.gmane.org> <1071465104414450610.332803sturla.molden-gmail.com@news.gmane.org> <530466DB.9010302@behnel.de>
Message-ID:

On Wednesday, February 19, 2014 9:10:03 AM UTC+1, Stefan Behnel wrote:
>
> >> Nathaniel Smith wrote:
> >>> Does anyone see any issue we might be overlooking in this refcount == 1
> >>> optimization for the python api? I'll post a PR with the change shortly.
> >>>
> >>> It occurs belatedly that Cython code like a = np.arange(10)
> >>> b = np.arange(10)
> >>> c = a + b might end up calling tp_add with refcnt 1 arrays. Ditto for
> >>> same with cdef np.ndarray or cdef object added. We should check...
>
> That can happen, yes. Cython only guarantees that the object it passes is
> safely owned so that the reference cannot go away while it's being
> processed by a function. If it's in a local (non-closure) variable (or
> Cython temporary variable), that guarantee holds, so it's safe to pass
> objects with only a single reference into a C function, and Cython will do
> that.
>
> Stefan
>

That's unfortunate; it would be a quite significant improvement to numpy
(~30% improvement to all operations involving temporaries).
Is increasing the reference count before going into PyNumber functions
really that expensive that it's worth avoiding?

-------------- next part --------------
An HTML attachment was scrubbed...
URL: From cmkleffner at gmail.com Thu Feb 20 17:56:17 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Thu, 20 Feb 2014 23:56:17 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: Hi, 2014-02-20 23:17 GMT+01:00 Olivier Grisel : > I had a quick look (without running the procedure) but I don't > understand some elements: > > - apparently you never tell in the numpy's site.cfg nor the scipy.cfg > to use the openblas lib nor set the > library_dirs: how does numpy.distutils know that it should dynlink > against numpy/core/libopenblas.dll > numpy's site.cfg is something like: (64 bit) [openblas] libraries = openblas library_dirs = D:/devel/mingw64static/x86_64-w64-mingw32/lib include_dirs = D:/devel/mingw64static/x86_64-w64-mingw32/include or (32 bit) [openblas] libraries = openblas library_dirs = D:/devel32/mingw32static/i686-w64-mingw32/lib include_dirs = D:/devel32/mingw32static/i686-w64-mingw32/include Please adapt the paths of course and apply the patches to numpy. > > - how to you deal with the link to the following libraries: > libgfortran.3.dll > libgcc_s.1.dll > libquadmath.0.dll > > You won't need them. I build the toolchain statically. Thus you don't have to mess up with GCC runtime libs. You can check the dependencies with MS depends or with ntldd (included in the toolchain) > If MinGW is installed on the system I assume that the linker will find > them. But would it work when the wheel packages are installed on a > system that does not have MinGW installed? > The wheels should be sufficient regardless if you have mingw installed or not. with best Regards Carl -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Feb 21 01:05:10 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 20 Feb 2014 23:05:10 -0700 Subject: [Numpy-discussion] Automatic issue triage. Message-ID: After 6 days of trudging through the numpy issues and finally passing the half way point, I'm wondering if we can set up so that new defects get a small test that can be parsed out and run periodically to mark issues that might be fixed. I expect it can be done, but might be more trouble than it is worth to keep working. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at gmail.com Fri Feb 21 01:09:34 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Fri, 21 Feb 2014 07:09:34 +0100 Subject: [Numpy-discussion] Header files in windows installers In-Reply-To: <14688223.zKv6Hc4UNA@obsidian> References: <14688223.zKv6Hc4UNA@obsidian> Message-ID: On Thu, Feb 20, 2014 at 6:33 PM, Matt Newell wrote: > > I have a small c++ extension used to feed a 1d numpy array into a > QPainterPath. Very simple just using PyArray_[Check|FLAGS|SIZE|DATA]. > I developed it on debian which was of course very straightforward, but now > I > need to deploy on windows, which is of course where the fun always begins. > > I was unpleasantly surprised to discover that the numpy installers for > window > do not have an option to install the include files, which appear to be all > that > is needed to compile extensions that use numpy's C-api. > That would be a bug. They can't all be missing though, because I'm able to compile scipy against numpy installed with those installers without problems. Could you open an issue on Github and give details on which headers are missing where? Ralf > I have actually managed to copy my set of include files from linux and > with a > few modifications to _numpyconfig.h got my extension compiled and working. 
> > Is there any chance that the includes could be included in future releases? > > Thanks, > Matt Newell > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sole at esrf.fr Fri Feb 21 01:43:17 2014 From: sole at esrf.fr ("V. Armando Solé") Date: Fri, 21 Feb 2014 07:43:17 +0100 Subject: [Numpy-discussion] [JOB ANNOUNCEMENT] Software Developer permanent position available at ESRF, France In-Reply-To: References: Message-ID: <5306F585.6050409@esrf.fr> Sorry, the link I sent you is in French. This is the English version. EUROPEAN SYNCHROTRON RADIATION FACILITY INSTALLATION EUROPEENNE DE RAYONNEMENT SYNCHROTRON The ESRF is a multinational research institute, situated in Grenoble, France and financed by 20 countries, mostly European. It operates a powerful synchrotron X-ray source with some 30 beamlines (instruments) covering a wide range of scientific research in fields such as biology and medicine, chemistry, earth and environmental sciences, materials and surface science, and physics. The ESRF employs about 600 staff and is organized as a French /société civile/. /Within the Instrumentation Services and Development Division, the Software Group is now seeking to recruit a:/ Software Developer (m/f) permanent contract THE FUNCTION The ESRF is in the process of a major upgrade of the accelerator source and of several beamlines. In particular, the Upgrade Programme has created a heavy demand for data visualisation and analysis due to the massive data flow coming from the new detectors. The next generation of experiments will rely on both advanced parallelised algorithms for data analysis and high performance tools for data visualization.
You will join the /Data Analysis Unit/ in the Software Group of the ISDD and will develop software for data analysis and visualization. You will be expected to: * develop and maintain software and graphical user interfaces for visualizing scientific data * help develop a long term strategy for the visualization and analysis of data (online and offline) * contribute to the general effort of adapting existing software and developing new solutions for data analysis You will need to be able to understand data analysis requirements and propose working solutions. QUALIFICATIONS AND EXPERIENCE The candidate should have a higher university degree (Master, MSc, DESS, Diplom, Diploma, Ingeniera Superior, Licenciatura, Laurea or equivalent) in Computer Science, Mathematics, Physics, Chemistry, Bioinformatics, Engineering or related areas. Applicants must have at least 3 years of experience in scientific programming in the fields of data analysis and visualisation. The candidate must have good knowledge of OpenGL and the OpenGL Shading Language or similar visualisation libraries. Experience in data analysis, especially of large datasets, is highly desirable, particularly using e.g. OpenCL, CUDA. Knowledge of one high level programming language (Python, Matlab, ...), a high-level graphics library (VTK, ...) and one low level language (C, C++, ...) will be considered assets in addition to competence in using development tools for compilation, distribution and code management. Proven contributions to open source projects will also be appreciated. The successful candidate should be able to work independently as well as in multidisciplinary teams. Good English communication and presentation skills are required. Further information on the post can be obtained from Andy Götz (andy.gotz at esrf.fr) and/or Claudio Ferrero (ferrero at esrf.fr). *Ref.
8173* *- Deadline for returning application forms: * *01/04/2014* -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: img_charte1.jpg Type: image/jpeg Size: 18405 bytes Desc: not available URL: From jenny.stone125 at gmail.com Fri Feb 21 03:18:54 2014 From: jenny.stone125 at gmail.com (Jennifer stone) Date: Fri, 21 Feb 2014 13:48:54 +0530 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: https://wiki.python.org/moin/SummerOfCode/2014 The link provided by Josef is yet to list SciPy/NumPy under it. Somebody please contact Terri. That page acts as a major guiding factor for Python-GSoC prospective students. Please have SciPy listed there. -------------- next part -------------- An HTML attachment was scrubbed... URL: From fancyerii at gmail.com Fri Feb 21 03:44:56 2014 From: fancyerii at gmail.com (Li Li) Date: Fri, 21 Feb 2014 16:44:56 +0800 Subject: [Numpy-discussion] dot product of a 1*n matrix with its transpose not equal to raw python dot product? Message-ID: Hi all, I am porting some Python code to Java but got different results. After a long time debugging, I found the reason: numpy's result is not the same as Java's (or even that of raw Python code). One case is the vector [0.446141, 0.10414999999999996] (that's not exact); its IEEE 754 binary representation is: 0.446141: 0011111111011100100011011001001011111011000110011110011100110010 0.10419..: 0011111110111010101010011001001100001011111000001101111011010000 The dot product computed by Java (or raw Python) as x1*x1+x2*x2 is: 0.20988901438.. (0011111111001010110111011010010010101010010001110010110101000011) but the result of numpy's x*x.T is: 0011111111001010110111011010010010101010010001110010110101000010 The last bit of the mantissa differs by one.
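The one-ulp discrepancy described here does not need numpy at all: floating-point addition is not associative, so any change in evaluation order (a vectorized dot-product loop versus the scalar x1*x1 + x2*x2, for example) can legitimately change the last bit of the result. A plain-Python illustration:

```python
# Floating-point addition is not associative: regrouping the same three
# terms changes the intermediate rounding and hence the final bits.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)   # False
print(left.hex())      # 0x1.3333333333334p-1
print(right.hex())     # 0x1.3333333333333p-1  (one ulp apart)
```

The two results differ in exactly the last bit of the mantissa, just like the numpy-versus-Java comparison above.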
I know floating point always makes it hard to guarantee the same result in different environments (not only hardware but also software, such as compiler optimization and reordering), but I still want to know why. Thanks. My environment: Python 2.7.3 (default, Sep 26 2013, 20:03:06) [GCC 4.6.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> numpy.version.version '1.6.1' From d.l.goldsmith at gmail.com Fri Feb 21 04:55:45 2014 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Fri, 21 Feb 2014 01:55:45 -0800 Subject: [Numpy-discussion] [JOB ANNOUNCEMENT] Software Developer permanent position Message-ID: On Thu, Feb 20, 2014 at 10:37 PM, wrote: > Date: Fri, 21 Feb 2014 07:43:17 +0100 > From: "V. Armando Solé" > *Ref. 8173* *- Deadline for returning application forms: * *01/04/2014* > I assume that's the European date format, i.e., the due date is April 1, 2014, not Jan. 4, 2014, oui? DG -------------- next part -------------- An HTML attachment was scrubbed... URL: From jurgen.vangael at gmail.com Fri Feb 21 04:57:24 2014 From: jurgen.vangael at gmail.com (Jurgen Van Gael) Date: Fri, 21 Feb 2014 09:57:24 +0000 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: First thing I noticed when installing into /opt/OpenBLAS was that the LAPACK header files were not being copied properly. This was because the OpenBLAS makefile uses the "-D" option in the install command, which the default Mac install doesn't support. A quick "brew install coreutils" solved that problem. I rebuilt a new virtualenv and rebuilt numpy into it using the OpenBLAS in /opt/OpenBLAS and things seem to be absolutely fine now. I can run the OpenBLAS version on my Mac. Thanks for the suggestions!
I ran the test: https://gist.githubusercontent.com/osdf/3842524/raw/df01f7fa9d849bec353d6ab03eae0c1ee68f1538/test_numpy.py On my Macbook Pro (Intel(R) Core(TM) i7-2720QM CPU @ 2.20GHz, 8GB Ram) disappointingly: the Atlas version gives consistent 0.02, the OpenBLAS version runs in 0.1 with OMP_NUM_THREADS=1 and 0.04 with OMP_NUM_THREADS=8. Happy to run a more extensive test suite if anyone is interested. Jurgen On Thu, Feb 20, 2014 at 6:07 PM, Olivier Grisel wrote: > I have exactly the same setup as yours and it links to OpenBLAS > correctly (in a venv as well, installed with python setup.py install). > The only difference is that I installed OpenBLAS in the default > folder: /opt/OpenBLAS (and I reflected that in site.cfg). > > When you run otool -L, is it in your source tree or do you point to > the numpy/core/_dotblas.so of the site-packages folder of your venv? > > If you activate your venv, go to a different folder (e.g. /tmp) and type: > > python -c "import numpy as np; np.show_config()" > > what do you get? I get: > > $ python -c "import numpy as np; np.show_config()" > lapack_opt_info: > libraries = ['openblas', 'openblas'] > library_dirs = ['/opt/OpenBLAS/lib'] > language = f77 > blas_opt_info: > libraries = ['openblas', 'openblas'] > library_dirs = ['/opt/OpenBLAS/lib'] > language = f77 > openblas_info: > libraries = ['openblas', 'openblas'] > library_dirs = ['/opt/OpenBLAS/lib'] > language = f77 > blas_mkl_info: > NOT AVAILABLE > > -- > Olivier > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Feb 21 05:07:34 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Feb 2014 10:07:34 +0000 (UTC) Subject: [Numpy-discussion] Automatic issue triage. 
References: Message-ID: Charles R Harris gmail.com> writes: > After 6 days of trudging through the numpy issues and > finally passing the half way point, I'm wondering if we > can set up so that new defects get a small test that can > be parsed out and run periodically to mark issues that might > be fixed. I expect it can be done, but might be more trouble > than it is worth to keep working. Github has an API for accessing issue contents. curl -i "https://api.github.com/repos/numpy/numpy/issues?state=open" If some markup for test cases is devised, a tool can be written that detects them. Alternatively, one could just add a separate git repository numpy/bugs.git for bug test cases, containing e.g. files `gh-1234.py`. Such scripts need to be written anyway at some point (or copypasted to Python shell). It would also be better from security POV to use a separate repo for bug test cases. This would also solve the issue of how to add attachments to bug reports in one way. -- Pauli Virtanen From sole at esrf.fr Fri Feb 21 05:12:42 2014 From: sole at esrf.fr (V. Armando Sole) Date: Fri, 21 Feb 2014 11:12:42 +0100 Subject: [Numpy-discussion] [JOB ANNOUNCEMENT] Software Developer permanent position In-Reply-To: References: Message-ID: <11e198a7ca9566796647d46dfe19e70b@esrf.fr> On 21.02.2014 10:55, David Goldsmith wrote: > On Thu, Feb 20, 2014 at 10:37 PM, wrote: > >> Date: Fri, 21 Feb 2014 07:43:17 +0100 >> From: "V. Armando Sol?" >> *Ref. 8173* *- Deadline for returning application forms: * >> *01/04/2014* > > I assume thats the European date format, i.e., the due date is April > 1, 2014, not Jan. 4 2014, oui? > Yes, it is European date format and it is *not* an April 1st joke. Armando From robert.kern at gmail.com Fri Feb 21 06:07:20 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 21 Feb 2014 11:07:20 +0000 Subject: [Numpy-discussion] Automatic issue triage. 
In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 10:07 AM, Pauli Virtanen wrote: > Charles R Harris gmail.com> writes: >> After 6 days of trudging through the numpy issues and >> finally passing the half way point, I'm wondering if we >> can set up so that new defects get a small test that can >> be parsed out and run periodically to mark issues that might >> be fixed. I expect it can be done, but might be more trouble >> than it is worth to keep working. > > Github has an API for accessing issue contents. > > curl -i "https://api.github.com/repos/numpy/numpy/issues?state=open" > > If some markup for test cases is devised, a tool can be written > that detects them. > > Alternatively, one could just add a separate git repository > numpy/bugs.git for bug test cases, containing e.g. files > `gh-1234.py`. Such scripts need to be written anyway at some > point (or copypasted to Python shell). It would also be better > from security POV to use a separate repo for bug test cases. > > This would also solve the issue of how to add attachments > to bug reports in one way. Seems like more trouble than it's worth to automate. We don't want just anyone with a Github account to add arbitrary code to our test suites, do we? The idea of an "expected failure" test suite is a good one, but it seems to me that it could be maintained by normal PR processes just fine. -- Robert Kern From pav at iki.fi Fri Feb 21 06:16:41 2014 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 21 Feb 2014 11:16:41 +0000 (UTC) Subject: [Numpy-discussion] Automatic issue triage. References: Message-ID: Robert Kern gmail.com> writes: [clip] > Seems like more trouble than it's worth to automate. We don't want > just anyone with a Github account to add arbitrary code to our test > suites, do we? The idea of an "expected failure" test suite is a good > one, but it seems to me that it could be maintained by normal PR > processes just fine. Yes. 
However, using a separate repository might make this more easy to deal with. This also does not have the "running arbitrary code" problem. -- Pauli Virtanen From olivier.grisel at ensta.org Fri Feb 21 06:26:32 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 21 Feb 2014 12:26:32 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: Indeed I just ran the bench on my Mac and OSX Veclib is more than 2x faster than OpenBLAS on such squared matrix multiplication (I just have 2 physical cores on this box). MKL from Canopy Express is slightly slower OpenBLAS for this GEMM bench on that box. I really wonder why Veclib is faster in this case. Maybe OSX 10.9 did improve its perf... From robert.kern at gmail.com Fri Feb 21 06:37:19 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 21 Feb 2014 11:37:19 +0000 Subject: [Numpy-discussion] Automatic issue triage. In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 11:16 AM, Pauli Virtanen wrote: > Robert Kern gmail.com> writes: > [clip] >> Seems like more trouble than it's worth to automate. We don't want >> just anyone with a Github account to add arbitrary code to our test >> suites, do we? The idea of an "expected failure" test suite is a good >> one, but it seems to me that it could be maintained by normal PR >> processes just fine. > > Yes. However, using a separate repository might make this more > easy to deal with. This also does not have the "running arbitrary > code" problem. Yes, I agree with keeping it in a separate repo. Just add to it via the normal PR processes. 
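A sketch of what such an expected-failure test case could look like with nothing but the standard library (the issue number and the "bug" below are invented for illustration):

```python
import unittest

class TestGH9999(unittest.TestCase):
    """Hypothetical reproduction script for an imaginary open bug report."""

    @unittest.expectedFailure
    def test_known_bug(self):
        # A deliberately wrong expectation stands in for the bug: while
        # the bug is open, the failure is "expected" and the run stays
        # green; once fixed, it flips to an unexpected success, flagging
        # the issue for closure.
        self.assertEqual(sorted([3, 1, 2]), [3, 1, 2])

# Run the case programmatically rather than via unittest.main()
result = unittest.TestResult()
unittest.defaultTestLoader.loadTestsFromTestCase(TestGH9999).run(result)
print(len(result.expectedFailures), result.wasSuccessful())   # 1 True
```

Files like this could live one-per-issue in the separate repository suggested above and be swept periodically by a runner that reports any unexpected successes.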
-- Robert Kern From olivier.grisel at ensta.org Fri Feb 21 06:41:39 2014 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Fri, 21 Feb 2014 12:41:39 +0100 Subject: [Numpy-discussion] Default builds of OpenBLAS development branch are now fork safe In-Reply-To: References: <1225660970414595360.835902sturla.molden-gmail.com@news.gmane.org> Message-ID: 2014-02-20 23:56 GMT+01:00 Carl Kleffner : > Hi, > > 2014-02-20 23:17 GMT+01:00 Olivier Grisel : > >> I had a quick look (without running the procedure) but I don't >> understand some elements: >> >> - apparently you never tell in the numpy's site.cfg nor the scipy.cfg >> to use the openblas lib nor set the >> library_dirs: how does numpy.distutils know that it should dynlink >> against numpy/core/libopenblas.dll > > > numpy's site.cfg is something like: (64 bit) > > [openblas] > libraries = openblas > library_dirs = D:/devel/mingw64static/x86_64-w64-mingw32/lib > include_dirs = D:/devel/mingw64static/x86_64-w64-mingw32/include > > or (32 bit) > > [openblas] > libraries = openblas > library_dirs = D:/devel32/mingw32static/i686-w64-mingw32/lib > include_dirs = D:/devel32/mingw32static/i686-w64-mingw32/include Thanks, what I don't understand is how the libopenblas.dll will be found at runtime. Is it a specific "feature" of windows? For instance how would the scipy/linalg/_*.so file know that the libopenblas.dll can be found in $PYTHONPATH/numpy/core? > Please adapt the paths of course and apply the patches to numpy. > >> >> - how to you deal with the link to the following libraries: >> libgfortran.3.dll >> libgcc_s.1.dll >> libquadmath.0.dll >> > You won't need them. I build the toolchain statically. Thus you don't have > to mess up with GCC runtime libs. You can check the dependencies with MS > depends or with ntldd (included in the toolchain) Great! I did not know it was possible. I guess that if we want to replicate that for Linux and Mac we will have to also build custom static GCC toolchains as well. 
Is there a good reference doc somewhere on how to do so? When googling I only find posts by people who cannot make their toolchain build statically correctly. -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From dpinte at enthought.com Fri Feb 21 09:26:35 2014 From: dpinte at enthought.com (Didrik Pinte) Date: Fri, 21 Feb 2014 15:26:35 +0100 Subject: [Numpy-discussion] Document server error. In-Reply-To: References: Message-ID: On 19 February 2014 07:42, Charles R Harris wrote: > From issue #1951 : > > The following URL shows a 500 internal server error: >> http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html >> > > Can someone with access to the server take a look? > Issue has been isolated. This machine needs cleanup and a fix to the http config. We're trying to do that asap. -- Didrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpinte at enthought.com Fri Feb 21 09:57:36 2014 From: dpinte at enthought.com (Didrik Pinte) Date: Fri, 21 Feb 2014 15:57:36 +0100 Subject: [Numpy-discussion] Document server error. In-Reply-To: References: Message-ID: That specific issue is fixed. -- Didrik On 21 February 2014 15:26, Didrik Pinte wrote: > > > > On 19 February 2014 07:42, Charles R Harris wrote: > >> From issue #1951 : >> >> The following URL shows a 500 internal server error: >>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html >>> >> >> Can someone with access to the server take a look? >> > > Issue has been isolated. This machine needs cleanup and a fix to the http > config. We're trying to do that asap. > > -- Didrik > -- Didrik Pinte +32 475 665 668 +44 1223 969515 Enthought Europe dpinte at enthought.com Scientific Computing Solutions http://www.enthought.com The information contained in this message is Enthought confidential & not to be disseminated to outside parties without explicit prior approval from sender.
This message is intended solely for the addressee(s), If you are not the intended recipient, please contact the sender by return e-mail and destroy all copies of the original message. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 21 10:38:44 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Feb 2014 08:38:44 -0700 Subject: [Numpy-discussion] Document server error. In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 7:57 AM, Didrik Pinte wrote: > That specific issue is fixed. > > -- Didrik > > > On 21 February 2014 15:26, Didrik Pinte wrote: > >> >> >> >> On 19 February 2014 07:42, Charles R Harris wrote: >> >>> From issue #1951 : >>> >>> The following URL shows an 500 internal server error: >>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html >>>> >>> >>> Can someone with access to the server take a look? >>> >> >> Issue has been isolated. This machine needs cleanup and a fix to the http >> config. We're trying to do that asap. >> >> -- Didrik >> > > > > -- > Didrik Pinte +32 475 665 668 > +44 1223 969515 > Enthought Europe dpinte at enthought.com > Scientific Computing Solutions http://www.enthought.com > > The information contained in this message is Enthought confidential & not > to be dissiminated to outside parties without explicit prior approval from > sender. This message is intended solely for the addressee(s), If you are > not the intended recipient, please contact the sender by return e-mail and > destroy all copies of the original message. > > Great, thanks. Any useful info I should tack onto the numpy ticket before closing it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From dpinte at enthought.com Fri Feb 21 10:48:15 2014 From: dpinte at enthought.com (Didrik Pinte) Date: Fri, 21 Feb 2014 16:48:15 +0100 Subject: [Numpy-discussion] Document server error. 
In-Reply-To: References: Message-ID: On 21 February 2014 16:38, Charles R Harris wrote: > > > > On Fri, Feb 21, 2014 at 7:57 AM, Didrik Pinte wrote: > >> That specific issue is fixed. >> >> -- Didrik >> >> >> On 21 February 2014 15:26, Didrik Pinte wrote: >> >>> >>> >>> >>> On 19 February 2014 07:42, Charles R Harris wrote: >>> >>>> From issue #1951 : >>>> >>>> The following URL shows an 500 internal server error: >>>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html >>>>> >>>> >>>> Can someone with access to the server take a look? >>>> >>> >>> Issue has been isolated. This machine needs cleanup and a fix to the >>> http config. We're trying to do that asap >>> >> I added the info with the root cause of the problem. I think it is already closed. -- Didrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 21 11:04:07 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Feb 2014 09:04:07 -0700 Subject: [Numpy-discussion] Document server error. In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 8:48 AM, Didrik Pinte wrote: > > > > On 21 February 2014 16:38, Charles R Harris wrote: > >> >> >> >> On Fri, Feb 21, 2014 at 7:57 AM, Didrik Pinte wrote: >> >>> That specific issue is fixed. >>> >>> -- Didrik >>> >>> >>> On 21 February 2014 15:26, Didrik Pinte wrote: >>> >>>> >>>> >>>> >>>> On 19 February 2014 07:42, Charles R Harris wrote: >>>> >>>>> From issue #1951 : >>>>> >>>>> The following URL shows an 500 internal server error: >>>>>> http://docs.scipy.org/doc/numpy/reference/generated/numpy.var.html >>>>>> >>>>> >>>>> Can someone with access to the server take a look? >>>>> >>>> >>>> Issue has been isolated. This machine needs cleanup and a fix to the >>>> http config. We're trying to do that asap >>>> >>> > I added the info with the root cause of the problem. I think it is already > closed. > > Yeah, I closed it after I saw you had commented on it. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri Feb 21 11:17:16 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Feb 2014 09:17:16 -0700 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 1:18 AM, Jennifer stone wrote: > https://wiki.python.org/moin/SummerOfCode/2014 > The link provided by Josef is yet to list SciPy/NumPy under it. Somebody > please contact Terri. > That page acts as major guiding factor for Python-GSoC prospective > students. Please have SciPy listed there. > > Ralf, do you know if there is someone representing Scipy/Numpy who is officially supposed to handle this, or should I take a shot at it? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Feb 21 11:24:15 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 21 Feb 2014 08:24:15 -0800 Subject: [Numpy-discussion] Header files in windows installers In-Reply-To: References: <14688223.zKv6Hc4UNA@obsidian> Message-ID: > > I was unpleasantly surprised to discover that the numpy installers for >> window >> do not have an option to install the include files, which appear to be >> all that >> is needed to compile extensions that use numpy's C-api. >> > > That would be a bug. They can't all be missing though, because I'm able to > compile scipy against numpy installed with those installers without > problems. > Agreed -- I have been compiling extensions against numpy on Windows with numpy from various binary installers for years -- never had a problem (with include files anyway...) Are you using: numpy.get_include() to get the location of the headers for that install? It should be in your setup.py file, something like: my_ext = Extension(..., include_dirs=[np.get_include()], ...) -Chris -- Christopher Barker, Ph.D.
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From newellm at blur.com Fri Feb 21 12:28:57 2014 From: newellm at blur.com (Matt Newell) Date: Fri, 21 Feb 2014 09:28:57 -0800 Subject: [Numpy-discussion] Header files in windows installers In-Reply-To: References: <14688223.zKv6Hc4UNA@obsidian> Message-ID: <3831954.6MS7gNjnQa@obsidian> On Friday, February 21, 2014 08:24:15 AM Chris Barker wrote: > > I was unpleasantly surprised to discover that the numpy installers for > > > >> window > >> do not have an option to install the include files, which appear to be > >> all that > >> is needed to compile extensions that use numpy's C-api. > > > > That would be a bug. They can't all be missing though, because I'm able to > > compile scipy against numpy installed with those installers without > > problems. > > agreed -- i have been compiling extensions against numpy on Windows with > numpy from various binary installers for eyars -- never had a proble (with > include file,s snaywa...) > > ARe you using: > > numpy.get_include() > Thank you, problem solved! Header files are indeed installed and there is no bug to report. I only assumed that the header files weren't installed because I couldn't find them in the include dir inside the python installation, and the installer gave no options. I'm not well versed in the normal system for compiling and distributing python extensions because my extensions are using SIP as they are heavily integrated with Qt/PyQt. 
Thanks, Matt Newell From ondrej.certik at gmail.com Sat Feb 22 00:35:50 2014 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Fri, 21 Feb 2014 22:35:50 -0700 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <53021A2C.5020601@continuum.io> <530255B2.60509@googlemail.com> Message-ID: On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris wrote: > > > > On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor > wrote: >> >> On 17.02.2014 15:18, Francesc Alted wrote: >> > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote: >> >> On Sun, Feb 16, 2014 at 6:12 PM, Da?id wrote: >> >>> On 16 February 2014 23:43, wrote: >> >>>> What's the fastest argsort for a 1d array with around 28 Million >> >>>> elements, roughly uniformly distributed, random order? >> >>> >> >>> On numpy latest version: >> >>> >> >>> for kind in ['quicksort', 'mergesort', 'heapsort']: >> >>> print kind >> >>> %timeit np.sort(data, kind=kind) >> >>> %timeit np.argsort(data, kind=kind) >> >>> >> >>> >> >>> quicksort >> >>> 1 loops, best of 3: 3.55 s per loop >> >>> 1 loops, best of 3: 10.3 s per loop >> >>> mergesort >> >>> 1 loops, best of 3: 4.84 s per loop >> >>> 1 loops, best of 3: 9.49 s per loop >> >>> heapsort >> >>> 1 loops, best of 3: 12.1 s per loop >> >>> 1 loops, best of 3: 39.3 s per loop >> >>> >> >>> >> >>> It looks quicksort is quicker sorting, but mergesort is marginally >> >>> faster >> >>> sorting args. The diference is slim, but upon repetition, it remains >> >>> significant. >> >>> >> >>> Why is that? Probably part of the reason is what Eelco said, and part >> >>> is >> >>> that in sort comparison are done accessing the array elements >> >>> directly, but >> >>> in argsort you have to index the array, introducing some overhead. >> >> Thanks, both. >> >> >> >> I also gain a second with mergesort. >> >> >> >> matlab would be nicer in my case, it returns both. 
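As a side note for readers, the missing argsort half can be layered on top of any plain sort; a pure-Python sketch of the pattern (numpy's argsort implements the same idea in C):

```python
data = [0.3, 0.1, 0.2, 0.1]

# argsort: the indices that would sort `data` (stable, like mergesort)
idx = sorted(range(len(data)), key=data.__getitem__)
print(idx)       # [1, 3, 2, 0]

# Applying the permutation yields the sorted values, so one index sort
# produces both the order and the sorted array in a single shot.
ordered = [data[i] for i in idx]
print(ordered)   # [0.1, 0.1, 0.2, 0.3]
```

This is the two-step idiom discussed earlier in the thread (sort the indices, then index the array); wrapping it around a new underlying sort such as timsort would only change the comparison engine, not the pattern.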
>> >> I still need to use the argsort to index into the array to also get >> >> the sorted array. >> > >> > Many years ago I needed something similar, so I made some functions for >> > sorting and argsorting in one single shot. Maybe you want to reuse >> > them. Here it is an example of the C implementation: >> > >> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 >> > >> > and here the Cython wrapper for all of them: >> > >> > >> > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 >> > >> > Francesc >> > >> >> that doesn't really make a big difference if the data is randomly >> distributed. >> the sorting operation is normally much more expensive than latter >> applying the indices: >> >> In [1]: d = np.arange(10000000) >> >> In [2]: np.random.shuffle(d) >> >> In [3]: %timeit np.argsort(d) >> 1 loops, best of 3: 1.99 s per loop >> >> In [4]: idx = np.argsort(d) >> >> In [5]: %timeit d[idx] >> 1 loops, best of 3: 213 ms per loop >> >> >> >> But if your data is not random it can make a difference as even >> quicksort can be a lot faster then. >> timsort would be a nice addition to numpy, it performs very well for >> partially sorted data. Unfortunately its quite complicated to implement. > > > Quicksort and shellsort gain speed by having simple inner loops. I have the > impression that timsort is optimal when compares and memory access are > expensive, but I haven't seen any benchmarks for native types in contiguous > memory. I found some benchmarks for continuous memory here: https://github.com/swenson/sort/ https://github.com/gfx/cpp-TimSort The first one seems the best, it probably can be directly reused in numpy. The only issue is that it only sorts the array, but does not provide argsort. 
Ondrej From charlesr.harris at gmail.com Sat Feb 22 01:09:21 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 21 Feb 2014 23:09:21 -0700 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <53021A2C.5020601@continuum.io> <530255B2.60509@googlemail.com> Message-ID: On Fri, Feb 21, 2014 at 10:35 PM, Ond?ej ?ert?k wrote: > On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris > wrote: > > > > > > > > On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor > > wrote: > >> > >> On 17.02.2014 15:18, Francesc Alted wrote: > >> > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote: > >> >> On Sun, Feb 16, 2014 at 6:12 PM, Da?id > wrote: > >> >>> On 16 February 2014 23:43, wrote: > >> >>>> What's the fastest argsort for a 1d array with around 28 Million > >> >>>> elements, roughly uniformly distributed, random order? > >> >>> > >> >>> On numpy latest version: > >> >>> > >> >>> for kind in ['quicksort', 'mergesort', 'heapsort']: > >> >>> print kind > >> >>> %timeit np.sort(data, kind=kind) > >> >>> %timeit np.argsort(data, kind=kind) > >> >>> > >> >>> > >> >>> quicksort > >> >>> 1 loops, best of 3: 3.55 s per loop > >> >>> 1 loops, best of 3: 10.3 s per loop > >> >>> mergesort > >> >>> 1 loops, best of 3: 4.84 s per loop > >> >>> 1 loops, best of 3: 9.49 s per loop > >> >>> heapsort > >> >>> 1 loops, best of 3: 12.1 s per loop > >> >>> 1 loops, best of 3: 39.3 s per loop > >> >>> > >> >>> > >> >>> It looks quicksort is quicker sorting, but mergesort is marginally > >> >>> faster > >> >>> sorting args. The diference is slim, but upon repetition, it remains > >> >>> significant. > >> >>> > >> >>> Why is that? Probably part of the reason is what Eelco said, and > part > >> >>> is > >> >>> that in sort comparison are done accessing the array elements > >> >>> directly, but > >> >>> in argsort you have to index the array, introducing some overhead. > >> >> Thanks, both. > >> >> > >> >> I also gain a second with mergesort. 
> >> >> > >> >> matlab would be nicer in my case, it returns both. > >> >> I still need to use the argsort to index into the array to also get > >> >> the sorted array. > >> > > >> > Many years ago I needed something similar, so I made some functions > for > >> > sorting and argsorting in one single shot. Maybe you want to reuse > >> > them. Here it is an example of the C implementation: > >> > > >> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 > >> > > >> > and here the Cython wrapper for all of them: > >> > > >> > > >> > > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 > >> > > >> > Francesc > >> > > >> > >> that doesn't really make a big difference if the data is randomly > >> distributed. > >> the sorting operation is normally much more expensive than latter > >> applying the indices: > >> > >> In [1]: d = np.arange(10000000) > >> > >> In [2]: np.random.shuffle(d) > >> > >> In [3]: %timeit np.argsort(d) > >> 1 loops, best of 3: 1.99 s per loop > >> > >> In [4]: idx = np.argsort(d) > >> > >> In [5]: %timeit d[idx] > >> 1 loops, best of 3: 213 ms per loop > >> > >> > >> > >> But if your data is not random it can make a difference as even > >> quicksort can be a lot faster then. > >> timsort would be a nice addition to numpy, it performs very well for > >> partially sorted data. Unfortunately its quite complicated to implement. > > > > > > Quicksort and shellsort gain speed by having simple inner loops. I have > the > > impression that timsort is optimal when compares and memory access are > > expensive, but I haven't seen any benchmarks for native types in > contiguous > > memory. > > I found some benchmarks for continuous memory here: > > https://github.com/swenson/sort/ > https://github.com/gfx/cpp-TimSort > > The first one seems the best, it probably can be directly reused in numpy. > The only issue is that it only sorts the array, but does not provide > argsort. 
> >

I'm impressed by the heapsort time. Heapsort is the slowest of the
numpy sorts. So either the heapsort implementation is better than ours
or the other sorts are worse ;) Partially sorted sequences are pretty
common, so timsort might be a worthy addition. Last time I looked, JDK
was using timsort for sorting objects, and quicksort for native types.
Another sort is dual pivot quicksort that I've heard some good things
about. Adding indirect sorts isn't so difficult once the basic sort is
available. Since memory access tends to be more expensive when it is
random, as it is for indirect sorts, timsort might be a good choice for
them.

Chuck
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From sturla.molden at gmail.com  Sat Feb 22 15:55:31 2014
From: sturla.molden at gmail.com (Sturla Molden)
Date: Sat, 22 Feb 2014 21:55:31 +0100
Subject: [Numpy-discussion] OpenBLAS on Mac
In-Reply-To:
References:
Message-ID:

On 20/02/14 17:57, Jurgen Van Gael wrote:
> Hi All,
>
> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
> Grisel).

How?

$ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
make: *** No targets specified and no makefile found. Stop.

(staying with MKL...)

Sturla

From robert.kern at gmail.com  Sat Feb 22 16:00:35 2014
From: robert.kern at gmail.com (Robert Kern)
Date: Sat, 22 Feb 2014 21:00:35 +0000
Subject: [Numpy-discussion] OpenBLAS on Mac
In-Reply-To:
References:
Message-ID:

On Sat, Feb 22, 2014 at 8:55 PM, Sturla Molden wrote:
> On 20/02/14 17:57, Jurgen Van Gael wrote:
>> Hi All,
>>
>> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy.
>> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier
>> Grisel).
>
> How?
>
> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1
> make: *** No targets specified and no makefile found. Stop.
>
> (staying with MKL...)
Without any further details about what you downloaded and where you executed this command, one can only assume PEBCAK. There is certainly a Makefile in the root directory of the OpenBLAS source: https://github.com/xianyi/OpenBLAS If you actually want some help, you will have to provide a *little* more detail. -- Robert Kern From sturla.molden at gmail.com Sat Feb 22 16:07:27 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 22 Feb 2014 22:07:27 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On 22/02/14 22:00, Robert Kern wrote: > If you actually want some help, you will have to provide a *little* more detail. $ git clone https://github.com/xianyi/OpenBLAS Oops... $ cd OpenBLAS did the trick. I need some coffee :) Sturla From njs at pobox.com Sat Feb 22 16:15:31 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 22 Feb 2014 16:15:31 -0500 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On Sat, Feb 22, 2014 at 3:55 PM, Sturla Molden wrote: > On 20/02/14 17:57, Jurgen Van Gael wrote: >> Hi All, >> >> I run Mac OS X 10.9.1 and was trying to get OpenBLAS working for numpy. >> I've downloaded the OpenBLAS source and compiled it (thanks to Olivier >> Grisel). > > How? > > $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 You'll definitely want to disable the affinity support too, and probably memory warmup. And possibly increase the maximum thread count, unless you'll only use the library on the computer it was built on. And maybe other things. The OpenBLAS build process has so many ways to accidentally impale yourself, it's an object lesson in why building regulations are a good thing. -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sat Feb 22 17:00:47 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 22 Feb 2014 17:00:47 -0500 Subject: [Numpy-discussion] A PEP for adding infix matrix multiply to Python Message-ID: [Apologies for wide distribution -- please direct followups to either the github PR linked below, or else numpy-discussion at scipy.org] After the numpy-discussion thread about np.matrix a week or so back, I got curious and read the old PEPs that attempted to add better matrix/elementwise operators to Python. http://legacy.python.org/dev/peps/pep-0211/ http://legacy.python.org/dev/peps/pep-0225/ And I was a bit surprised -- if I were BDFL I probably would have rejected these PEPs too. One is actually a proposal to make itertools.product into an infix operator, which no-one would consider seriously on its own merits. And the other adds a whole pile of weirdly spelled new operators with no clear idea about what they should do. But it seems to me that at this point, with the benefit of multiple years more experience, we know much better what we want -- basically, just a nice clean infix op for matrix multiplication. And that just asking for this directly, and explaining clearly why we want it, is something that hasn't been tried. So maybe we should try and see what happens. As a starting point for discussion, I wrote a draft. It can be read and commented on here: https://github.com/numpy/numpy/pull/4351 It's important that if we're going to do this at all, we do it right, and that means being able to honestly say that this document represents our consensus when going to python-dev. So if you think you might object please do so now :-) -n -- Nathaniel J. 
Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From njs at pobox.com Sat Feb 22 17:03:03 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 22 Feb 2014 17:03:03 -0500 Subject: [Numpy-discussion] How exactly ought 'dot' to work? Message-ID: Hi all, Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In practice this doesn't usually matter much, because these are very rarely used. But, I would like to nail down the behaviour so we can say something precise in the matrix multiplication PEP. So here's one proposal. # CURRENT: dot(0d, any) -> scalar multiplication dot(any, 0d) -> scalar multiplication dot(1d, 1d) -> inner product dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then discard added axis dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis dot(2d, 2d) -> matrix multiply dot(2-or-more d, 2-or-more d) -> a complicated outer product thing: Specifically, if the inputs have shapes (r, n, m), (s, m, k), then numpy returns an array with shape (r, s, n, k), created like: for i in range(r): for j in range(s): output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) # PROPOSED: General rule: given dot on shape1, shape2, we try to match these shapes against two templates like (..., n?, m) and (..., m, k?) where ... indicates zero or more dimensions, and ? indicates an optional axis. ? axes are always matched before ... axes, so for an input with ndim>=2, the ? axis is always matched. An unmatched ? axis is treated as having size 1. Next, the ... axes are broadcast against each other in the usual way (prepending 1s to make lengths the same, requiring corresponding entries to either match or have the value 1). And then the actual computations are performed using the usual broadcasting rules. Finally, we return an output with shape (..., n?, k?). Here "..." indicates the result of broadcasting the input ...'s against each other. And, n? and k? 
mean: "either the value taken from the input shape, if the corresponding
entry was matched -- but if no match was made, then we leave this entry
out." The idea is that just as a column vector on the right is "m x 1",
a 1d vector on the right is treated as "m x <nothing>". For purposes of
actually computing the product, <nothing> acts like 1, as mentioned
above. But it makes a difference in what we return: in each of these
cases we copy the input shape into the output, so we can get an output
with shape (n, <nothing>), or (<nothing>, k), or (<nothing>, <nothing>),
which work out to be (n,), (k,) and (), respectively. This gives a
(somewhat) intuitive principle for why dot(1d, 1d), dot(1d, 2d),
dot(2d, 1d) are handled the way they are, and a general template for
extending this behaviour to other operations like gufunc 'solve'.

Anyway, the end result of this is that the PROPOSED behaviour differs
from the current behaviour in the following ways:
- passing 0d arrays to 'dot' becomes an error. (This in particular is
an important thing to know, because if core Python adds an operator
for 'dot', then we must decide what it should do for Python scalars,
which are logically 0d.)
- ndim>2 arrays are now handled by aligning and broadcasting the extra
axes, instead of taking an outer product. So dot((r, m, n), (r, n, k))
returns (r, m, k), not (r, r, m, k).

Comments?

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From matthew.brett at gmail.com  Sat Feb 22 17:17:05 2014
From: matthew.brett at gmail.com (Matthew Brett)
Date: Sat, 22 Feb 2014 14:17:05 -0800
Subject: [Numpy-discussion] How exactly ought 'dot' to work?
In-Reply-To:
References:
Message-ID:

Hi,

On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith wrote:
> Hi all,
>
> Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In
> practice this doesn't usually matter much, because these are very
> rarely used.
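The template-matching rule described above can be written out in a few lines of plain Python (a sketch only; `matmul_shape` is an invented name, not a NumPy function):

```python
def matmul_shape(shape1, shape2):
    """Output shape under the proposed rule: (..., n?, m) x (..., m, k?)."""
    s1, s2 = list(shape1), list(shape2)
    if not s1 or not s2:
        raise ValueError("0d operands are an error under the proposal")
    m1 = s1.pop()                                # m: always shape1's last axis
    n = s1.pop() if len(shape1) >= 2 else None   # n? matched only for ndim >= 2
    k = s2.pop() if len(shape2) >= 2 else None   # k? likewise
    m2 = s2.pop()                                # m: shape2's last remaining axis
    if m1 != m2:
        raise ValueError("core dimensions do not match")
    # Broadcast the leading '...' axes against each other, prepending 1s.
    nd = max(len(s1), len(s2))
    s1 = [1] * (nd - len(s1)) + s1
    s2 = [1] * (nd - len(s2)) + s2
    out = []
    for a, b in zip(s1, s2):
        if a != b and 1 not in (a, b):
            raise ValueError("leading dimensions do not broadcast")
        out.append(max(a, b))
    # Unmatched '?' axes act like size 1 during the computation but are
    # dropped from the output shape.
    if n is not None:
        out.append(n)
    if k is not None:
        out.append(k)
    return tuple(out)

print(matmul_shape((5,), (5,)))         # ()
print(matmul_shape((4, 5), (5,)))       # (4,)
print(matmul_shape((3, 4, 5), (5, 6)))  # (3, 4, 6)
```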
But, I would like to nail down the behaviour so we can > say something precise in the matrix multiplication PEP. So here's one > proposal. > > # CURRENT: > > dot(0d, any) -> scalar multiplication > dot(any, 0d) -> scalar multiplication > dot(1d, 1d) -> inner product > dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then > discard added axis > dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis > dot(2d, 2d) -> matrix multiply > dot(2-or-more d, 2-or-more d) -> a complicated outer product thing: > Specifically, if the inputs have shapes (r, n, m), (s, m, k), then > numpy returns an array with shape (r, s, n, k), created like: > for i in range(r): > for j in range(s): > output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) > > # PROPOSED: > > General rule: given dot on shape1, shape2, we try to match these > shapes against two templates like > (..., n?, m) and (..., m, k?) > where ... indicates zero or more dimensions, and ? indicates an > optional axis. ? axes are always matched before ... axes, so for an > input with ndim>=2, the ? axis is always matched. An unmatched ? axis > is treated as having size 1. > > Next, the ... axes are broadcast against each other in the usual way > (prepending 1s to make lengths the same, requiring corresponding > entries to either match or have the value 1). And then the actual > computations are performed using the usual broadcasting rules. > > Finally, we return an output with shape (..., n?, k?). Here "..." > indicates the result of broadcasting the input ...'s against each > other. And, n? and k? mean: "either the value taken from the input > shape, if the corresponding entry was matched -- but if no match was > made, then we leave this entry out." The idea is that just as a column > vector on the right is "m x 1", a 1d vector on the right is treated as > "m x ". For purposes of actually computing the product, > acts like 1, as mentioned above. 
But it makes a difference > in what we return: in each of these cases we copy the input shape into > the output, so we can get an output with shape (n, ), or > (, k), or (, ), which work out to be (n,), > (k,) and (), respectively. This gives a (somewhat) intuitive principle > for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they > are, and a general template for extending this behaviour to other > operations like gufunc 'solve'. > > Anyway, the end result of this is that the PROPOSED behaviour differs > from the current behaviour in the following ways: > - passing 0d arrays to 'dot' becomes an error. (This in particular is > an important thing to know, because if core Python adds an operator > for 'dot', then we must decide what it should do for Python scalars, > which are logically 0d.) > - ndim>2 arrays are now handled by aligning and broadcasting the extra > axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) > returns (r, m, k), not (r, r, m, k). > > Comments? The discussion might become confusing in the conflation of: * backward incompatible changes to dot * coherent behavior to propose in a PEP Maybe we could concentrate on the second, on the basis it's likely to be a few years until it is available, and the work out compatibility if the PEP gets accepted. If A @ B means 'matrix multiply A, B' - then it would be a shame to raise an error if A or B is a scalar. Sympy, for example, will allow matrix multiplication by a scalar, MATLAB / Octave too. I have used > 2D dot calls in the past, maybe still do, I'm not sure. Cheers, Matthew From jaime.frio at gmail.com Sat Feb 22 17:37:03 2014 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Sat, 22 Feb 2014 14:37:03 -0800 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: On Feb 22, 2014 2:03 PM, "Nathaniel Smith" wrote: > > Hi all, > > Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. 
In > practice this doesn't usually matter much, because these are very > rarely used. But, I would like to nail down the behaviour so we can > say something precise in the matrix multiplication PEP. So here's one > proposal. > > # CURRENT: > > dot(0d, any) -> scalar multiplication > dot(any, 0d) -> scalar multiplication > dot(1d, 1d) -> inner product > dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then > discard added axis > dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis > dot(2d, 2d) -> matrix multiply > dot(2-or-more d, 2-or-more d) -> a complicated outer product thing: > Specifically, if the inputs have shapes (r, n, m), (s, m, k), then > numpy returns an array with shape (r, s, n, k), created like: > for i in range(r): > for j in range(s): > output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) > > # PROPOSED: > > General rule: given dot on shape1, shape2, we try to match these > shapes against two templates like > (..., n?, m) and (..., m, k?) > where ... indicates zero or more dimensions, and ? indicates an > optional axis. ? axes are always matched before ... axes, so for an > input with ndim>=2, the ? axis is always matched. An unmatched ? axis > is treated as having size 1. > > Next, the ... axes are broadcast against each other in the usual way > (prepending 1s to make lengths the same, requiring corresponding > entries to either match or have the value 1). And then the actual > computations are performed using the usual broadcasting rules. > > Finally, we return an output with shape (..., n?, k?). Here "..." > indicates the result of broadcasting the input ...'s against each > other. And, n? and k? mean: "either the value taken from the input > shape, if the corresponding entry was matched -- but if no match was > made, then we leave this entry out." The idea is that just as a column > vector on the right is "m x 1", a 1d vector on the right is treated as > "m x ". 
For purposes of actually computing the product, > acts like 1, as mentioned above. But it makes a difference > in what we return: in each of these cases we copy the input shape into > the output, so we can get an output with shape (n, ), or > (, k), or (, ), which work out to be (n,), > (k,) and (), respectively. This gives a (somewhat) intuitive principle > for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they > are, and a general template for extending this behaviour to other > operations like gufunc 'solve'. > > Anyway, the end result of this is that the PROPOSED behaviour differs > from the current behaviour in the following ways: > - passing 0d arrays to 'dot' becomes an error. (This in particular is > an important thing to know, because if core Python adds an operator > for 'dot', then we must decide what it should do for Python scalars, > which are logically 0d.) > - ndim>2 arrays are now handled by aligning and broadcasting the extra > axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) > returns (r, m, k), not (r, r, m, k). > > Comments? The proposed behavior for ndim > 2 is what matrix_multiply (is it still in umath_tests?) does. The nice thing of the proposed new behavior is that the old behavior is easy to reproduce by fooling a little around with the shape of the first argument, while the opposite is not true. Jaime > > -- > Nathaniel J. Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Sat Feb 22 17:37:40 2014 From: njs at pobox.com (Nathaniel Smith) Date: Sat, 22 Feb 2014 17:37:40 -0500 Subject: [Numpy-discussion] How exactly ought 'dot' to work? 
In-Reply-To: References: Message-ID: On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett wrote: > The discussion might become confusing in the conflation of: > > * backward incompatible changes to dot > * coherent behavior to propose in a PEP Right, I definitely am asking about how we think the "ideal" dot operator should work. The migration strategy to get to there from here is a separate question, that will raise a bunch of details that are a distraction from the fundamental question of what we *want*. > If A @ B means 'matrix multiply A, B' - then it would be a shame to > raise an error if A or B is a scalar. > > Sympy, for example, will allow matrix multiplication by a scalar, > MATLAB / Octave too. Interesting! We do disagree on this then. I feel strongly that given separate matrix product and elementwise product operators @ and *, then 'scalar @ matrix' should be an error, and if you want scalar (elementwise) product then you should write 'scalar * matrix'. Sympy, MATLAB, Octave are not really good guides, because either they have only a single operator available (Sympy with *, np.matrix with *), or they have an alternative operator available but it's annoying to type and rarely used (MATLAB/Octave with .*). For us, the two operators are both first-class, and we've all decided that the scalar/elementwise operator is actually the more important and commonly used one, and matrix multiply is the unusual case (regardless of whether we end up spelling it 'dot' or '@'). So why would we want a special case for scalars in dot/@? And anyway, TOOWTDI. Notation like 'L * X @ Y' really makes it immediately clear what sort of operations we're dealing with, too. > I have used > 2D dot calls in the past, maybe still do, I'm not sure. Have you used dot(>2D, >2D)? That's the case that this proposal would change -- dot(>2D, <=2D) is the same under both the outer product and broadcasting proposals. 
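The dot(>2D, >2D) divergence discussed here is easy to see side by side (a sketch; np.matmul, which implements the broadcasting semantics, was only added to NumPy later, in 1.10):

```python
import numpy as np

a = np.ones((3, 4, 5))   # a stack of three 4x5 matrices
b = np.ones((3, 5, 6))   # a stack of three 5x6 matrices

# CURRENT behaviour: np.dot takes an outer product over the stack axes.
print(np.dot(a, b).shape)     # (3, 4, 3, 6)

# PROPOSED behaviour: broadcast the stack axes and multiply pairwise.
print(np.matmul(a, b).shape)  # (3, 4, 6)
```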
I feel pretty strongly that the broadcasting proposal is the right thing, for consistency with other operators -- e.g., all ufuncs and all functions in np.linalg already do broadcasting, so 'dot' is currently really the odd one out. The outer product functionality is potentially useful, but it should be called np.dot.outer, because logically it has the same relationship to np.dot that np.add.outer has to np.add, etc. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org From sturla.molden at gmail.com Sat Feb 22 17:39:11 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sat, 22 Feb 2014 23:39:11 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On 22/02/14 22:15, Nathaniel Smith wrote: >> $ make TARGET=SANDYBRIDGE USE_OPENMP=0 BINARY=64 NOFORTRAN=1 > > You'll definitely want to disable the affinity support too, and > probably memory warmup. And possibly increase the maximum thread > count, unless you'll only use the library on the computer it was built > on. And maybe other things. The OpenBLAS build process has so many > ways to accidentally impale yourself, it's an object lesson in why > building regulations are a good thing. Thanks for the advice. Right now I am just testing on my own computer. cblas_dgemm is running roughly 50 % faster with OpenBLAS than MKL 11.1 update 2, sometimes OpenBLAS is twice as fast as MKL. WTF??? :-D Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS and MKL on Mavericks. 
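The 512x512 dgemm timing can also be run from Python, against whichever BLAS the local NumPy happens to be linked with (a rough sketch, not the C harness attached in this thread; numbers will of course differ per machine):

```python
import time
import numpy as np

n = 512
a = np.random.rand(n, n)
b = np.random.rand(n, n)

np.dot(a, b)                  # warm-up call so thread/cache setup is not timed
reps = 10
t0 = time.perf_counter()
for _ in range(reps):
    c = np.dot(a, b)          # dispatches to dgemm in NumPy's linked BLAS
elapsed = (time.perf_counter() - t0) / reps
print("dgemm %dx%d: %.2f ms per call" % (n, n, elapsed * 1e3))
```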
Sturla -------------- next part -------------- #include #include #include #include "mkl.h" double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(&tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { const int BOUNDARY = 64; long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); double *B = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); double *C = (double*)mkl_malloc(n*n*sizeof(double), BOUNDARY); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf("elapsed time: %g ns\n", (double)nanosec); mkl_free(A); mkl_free(B); mkl_free(C); } -------------- next part -------------- #include #include #include #include double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(&tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)malloc(n*n*sizeof(double)); double *B = (double*)malloc(n*n*sizeof(double)); double *C = (double*)malloc(n*n*sizeof(double)); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf("elapsed time: %g ns\n", (double)nanosec); free(A); free(B); free(C); } From josef.pktd at gmail.com Sat 
Feb 22 17:48:33 2014 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 22 Feb 2014 17:48:33 -0500 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: On Sat, Feb 22, 2014 at 5:17 PM, Matthew Brett wrote: > Hi, > > On Sat, Feb 22, 2014 at 2:03 PM, Nathaniel Smith wrote: >> Hi all, >> >> Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In >> practice this doesn't usually matter much, because these are very >> rarely used. But, I would like to nail down the behaviour so we can >> say something precise in the matrix multiplication PEP. So here's one >> proposal. >> >> # CURRENT: >> >> dot(0d, any) -> scalar multiplication >> dot(any, 0d) -> scalar multiplication >> dot(1d, 1d) -> inner product >> dot(2d, 1d) -> treat 1d as column matrix, matrix-multiply, then >> discard added axis >> dot(1d, 2d) -> treat 1d as row matrix, matrix-multiply, then discard added axis >> dot(2d, 2d) -> matrix multiply >> dot(2-or-more d, 2-or-more d) -> a complicated outer product thing: >> Specifically, if the inputs have shapes (r, n, m), (s, m, k), then >> numpy returns an array with shape (r, s, n, k), created like: >> for i in range(r): >> for j in range(s): >> output[i, j, :, :] = np.dot(input1[i, :, :], input2[j, :, :]) >> >> # PROPOSED: >> >> General rule: given dot on shape1, shape2, we try to match these >> shapes against two templates like >> (..., n?, m) and (..., m, k?) >> where ... indicates zero or more dimensions, and ? indicates an >> optional axis. ? axes are always matched before ... axes, so for an >> input with ndim>=2, the ? axis is always matched. An unmatched ? axis >> is treated as having size 1. >> >> Next, the ... axes are broadcast against each other in the usual way >> (prepending 1s to make lengths the same, requiring corresponding >> entries to either match or have the value 1). And then the actual >> computations are performed using the usual broadcasting rules. 
>> >> Finally, we return an output with shape (..., n?, k?). Here "..." >> indicates the result of broadcasting the input ...'s against each >> other. And, n? and k? mean: "either the value taken from the input >> shape, if the corresponding entry was matched -- but if no match was >> made, then we leave this entry out." The idea is that just as a column >> vector on the right is "m x 1", a 1d vector on the right is treated as >> "m x ". For purposes of actually computing the product, >> acts like 1, as mentioned above. But it makes a difference >> in what we return: in each of these cases we copy the input shape into >> the output, so we can get an output with shape (n, ), or >> (, k), or (, ), which work out to be (n,), >> (k,) and (), respectively. This gives a (somewhat) intuitive principle >> for why dot(1d, 1d), dot(1d, 2d), dot(2d, 1d) are handled the way they >> are, and a general template for extending this behaviour to other >> operations like gufunc 'solve'. >> >> Anyway, the end result of this is that the PROPOSED behaviour differs >> from the current behaviour in the following ways: >> - passing 0d arrays to 'dot' becomes an error. (This in particular is >> an important thing to know, because if core Python adds an operator >> for 'dot', then we must decide what it should do for Python scalars, >> which are logically 0d.) >> - ndim>2 arrays are now handled by aligning and broadcasting the extra >> axes, instead of taking an outer product. So dot((r, m, n), (r, n, k)) >> returns (r, m, k), not (r, r, m, k). I don't quite manage to follow that description for nd behavior. How do you figure out which axes to use for the cross-product dot((m,m,m), (m, m)) ? Doesn't numpy have a new gufunc that does this kind of vectorized/broadcasted dot product? I cannot find it right now. >> >> Comments? 
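On the dot((m, m, m), (m, m)) question just above: under the proposed template the 2d operand supplies the (m, k) core axes and has no leading "..." axes, so it is broadcast across the 3d operand's single leading axis. The gufunc being half-remembered here is presumably `matrix_multiply` in numpy.core.umath_tests; np.matmul, added to NumPy later (1.10), behaves the same way, and for a 2d right operand it coincides with np.dot (a sketch, not from the thread):

```python
import numpy as np

m = 3
a = np.arange(m * m * m, dtype=float).reshape(m, m, m)
b = np.arange(m * m, dtype=float).reshape(m, m)

# b is broadcast across a's leading axis; the result keeps shape (m, m, m).
print(np.matmul(a, b).shape)  # (3, 3, 3)

# With a 2d right operand, the current np.dot already agrees with the
# broadcasting rule, value for value:
print(np.allclose(np.dot(a, b), np.matmul(a, b)))  # True
```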
> > The discussion might become confusing in the conflation of: > > * backward incompatible changes to dot > * coherent behavior to propose in a PEP > > Maybe we could concentrate on the second, on the basis it's likely to > be a few years until it is available, and the work out compatibility > if the PEP gets accepted. > > If A @ B means 'matrix multiply A, B' - then it would be a shame to > raise an error if A or B is a scalar. > > Sympy, for example, will allow matrix multiplication by a scalar, > MATLAB / Octave too. I also don't see a reason to disallow multiplication with a scalar (it's just the math), but I doubt we use it in statsmodels. > > I have used > 2D dot calls in the past, maybe still do, I'm not sure. I've never found a use for dot with ndim > 2 (nor tensordot), it never did what I needed. Josef > > Cheers, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sturla.molden at gmail.com Sat Feb 22 18:11:46 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 23 Feb 2014 00:11:46 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On 22/02/14 23:39, Sturla Molden wrote: > Ok, next runner up is Accelerate. Let's see how it compares to OpenBLAS > and MKL on Mavericks. It seems Accelerate has roughly the same performance as MKL now. Did the upgrade to Mavericks do this? 
These are the compile lines, in case you wonder: $ CC -O2 -o perftest_openblas -I/opt/OpenBLAS/include -L/opt/OpenBLAS/lib perftest_openblas.c -lopenblas $ CC -O2 -o perftest_accelerate perftest_accelerate.c -framework Accelerate $ source /opt/intel/composer_xe_2013/mkl/bin/mklvars.sh intel64 $ icc -O2 -o perftest_mkl -mkl -static-intel perftest_mkl.c Sturla -------------- next part -------------- #include #include #include #include double nanodiff(const uint64_t _t0, const uint64_t _t1) { long double t0, t1, numer, denom, nanosec; mach_timebase_info_data_t tb_info; mach_timebase_info(&tb_info); numer = (long double)(tb_info.numer); denom = (long double)(tb_info.denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } int main(int argc, char **argv) { long double nanosec; int n = 512; int m = n, k = n; double *A = (double*)malloc(n*n*sizeof(double)); double *B = (double*)malloc(n*n*sizeof(double)); double *C = (double*)malloc(n*n*sizeof(double)); uint64_t t0, t1; t0 = mach_absolute_time(); cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans, m, n, k, 1.0, A, k, B, n, 1.0, C, n); t1 = mach_absolute_time(); nanosec = nanodiff(t0, t1); printf("elapsed time: %g ns\n", (double)nanosec); free(A); free(B); free(C); } From alan.isaac at gmail.com Sat Feb 22 18:30:26 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Sat, 22 Feb 2014 18:30:26 -0500 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: <53093312.5060500@gmail.com> It would help me follow this discussion if it were broken up into pieces: - what is being asserted as first principles for `dot` (e.g., what mathematical understanding)? - to what extent do other important implementations (e.g., Mathematica and Julia) deviate from the proposed mathematical understanding? - were the motivations for any deviations adequate (e.g., supported by strong use cases)? 
An example is the discussion of whether scalar multiplication of a matrix should be represented by * or by a new operator (e.g., @). I am personally most comfortable with the idea that a new matrix multiplication operator would not handle scalar multiplication or violate tensor product rules (perhaps I am influenced by Mathematica), but I am not prepared to argue the principles of such choices, and would appreciate hearing from those who are. Thanks, Alan Isaac From pav at iki.fi Sat Feb 22 19:09:48 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 23 Feb 2014 02:09:48 +0200 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: 23.02.2014 00:03, Nathaniel Smith kirjoitti: > Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In > practice this doesn't usually matter much, because these are very > rarely used. But, I would like to nail down the behaviour so we can > say something precise in the matrix multiplication PEP. I'm not sure it's necessary to say much about this in the PEP. It should in my view concentrate on arguing why the new binop is needed in the Python language, and for that, restricting to 2D is good enough IMHO. How exactly Numpy makes use of the capability for > 2-dim arrays is something that should definitely be discussed. But I think this is a problem mainly interesting for Numpy devs, and not for CPython devs. -- Pauli Virtanen From sturla.molden at gmail.com Sat Feb 22 20:43:00 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 23 Feb 2014 02:43:00 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On 23/02/14 00:11, Sturla Molden wrote: > Did the upgrade to Mavericks do this? > Testing different matrix sizes and averaging 30 trials, they are quite similar, actually. Accelerate is perhaps the winner, but it really depends on the matrix size. See for yourself. 
:-) Sturla List of attachments: Plots of the average runtime: dgemm_test.png dgemm_test2.png C codes: perftest_openblas.c perftest_accelerate.c perftest_mkl.c Timings from my MacBook Pro (2.4 GHz i7) accelerate.txt openblas.txt mkl.txt -------------- next part -------------- A non-text attachment was scrubbed... Name: dgemm_test.png Type: image/png Size: 59517 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: dgemm_test2.png Type: image/png Size: 49065 bytes Desc: not available URL: -------------- next part -------------- #include #include #include #include #include #include "mkl.h" const int matrix_size[] = { 10, 13, 16, 21, 26, 34, 43, 55, 70, 89, 113, 144, 183, 234, 298, 379, 483, 616, 785, 1000 }; const int matrix_size_pow2[] = { 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 }; const int NREPEATS = 30; const int MAX_N = 2048; double nanodiff(const uint64_t _t0, const uint64_t _t1, const mach_timebase_info_data_t *tb_info) { long double t0, t1, numer, denom, nanosec; numer = (long double)(tb_info->numer); denom = (long double)(tb_info->denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } void fill_with_random(const int n, double *x) { static unsigned int m_w = 123456; static unsigned int m_z = 5635273; int i; for (i=0; i> 16); m_w = 18000 * (m_w & 65535) + (m_w >> 16); *x++ = ((m_z << 16) + m_w) * 2.3283064365386963e-10; } } void statistics(const int n, const double *x, double *m, double *s, double *min, double *max) { double sum_x=0.0, cx=0.0, sum_cxcx=0.0, _m; double minval, maxval, v; int i; for (i=0; i v ? 
v : minval); } *max = maxval; *min = minval; } int main(int argc, char **argv) { double nanosec[NREPEATS]; uint64_t t0, t1; mach_timebase_info_data_t tb_info; double *A = (double*)mkl_malloc(MAX_N*MAX_N*sizeof(double),64); double *B = (double*)mkl_malloc(MAX_N*MAX_N*sizeof(double),64); double *C = (double*)mkl_malloc(MAX_N*MAX_N*sizeof(double),64); double mean, std, min, max; int i, j, k, m, n; mach_timebase_info(&tb_info); fill_with_random(MAX_N*MAX_N, A); fill_with_random(MAX_N*MAX_N, B); fill_with_random(MAX_N*MAX_N, C); for (i=0; i<20; i++) { n = matrix_size[i]; m = n; k = n; for (j=0; j #include #include #include #include #include const int matrix_size[] = { 10, 13, 16, 21, 26, 34, 43, 55, 70, 89, 113, 144, 183, 234, 298, 379, 483, 616, 785, 1000 }; const int matrix_size_pow2[] = { 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048 }; const int NREPEATS = 30; const int MAX_N = 2048; double nanodiff(const uint64_t _t0, const uint64_t _t1, const mach_timebase_info_data_t *tb_info) { long double t0, t1, numer, denom, nanosec; numer = (long double)(tb_info->numer); denom = (long double)(tb_info->denom); t0 = (long double)(_t0); t1 = (long double)(_t1); nanosec = (t1 - t0) * numer / denom; return (double)nanosec; } void fill_with_random(const int n, double *x) { static unsigned int m_w = 123456; static unsigned int m_z = 5635273; int i; for (i=0; i> 16); m_w = 18000 * (m_w & 65535) + (m_w >> 16); *x++ = ((m_z << 16) + m_w) * 2.3283064365386963e-10; } } void statistics(const int n, const double *x, double *m, double *s, double *min, double *max) { double sum_x=0.0, cx=0.0, sum_cxcx=0.0, _m; double minval, maxval, v; int i; for (i=0; i v ? 
v : minval); } *max = maxval; *min = minval; } int main(int argc, char **argv) { double nanosec[NREPEATS]; uint64_t t0, t1; mach_timebase_info_data_t tb_info; double *A = (double*)malloc(MAX_N*MAX_N*sizeof(double)); double *B = (double*)malloc(MAX_N*MAX_N*sizeof(double)); double *C = (double*)malloc(MAX_N*MAX_N*sizeof(double)); double mean, std, min, max; int i, j, k, m, n; mach_timebase_info(&tb_info); fill_with_random(MAX_N*MAX_N, A); fill_with_random(MAX_N*MAX_N, B); fill_with_random(MAX_N*MAX_N, C); for (i=0; i<20; i++) { n = matrix_size[i]; m = n; k = n; for (j=0; j /* #include */ #include #include #include #include typedef struct { int status; uint64_t t0; uint64_t t1; mach_timebase_info_data_t tb_info; } perf_timer_t; const int PERF_TIMER_CLEAR = 0; const int PERF_TIMER_RUNNING = 1; const int PERF_TIMER_STOPPED = 2; perf_timer_t *create_perf_timer(void) { perf_timer_t *timer = (perf_timer_t *)malloc(sizeof(perf_timer_t)); if (timer == NULL) goto error; timer->status = PERF_TIMER_CLEAR; mach_timebase_info(&timer->tb_info); return timer; error: if (timer != NULL) free(timer); return NULL; } void destroy_perf_timer(perf_timer_t *timer) { if (timer != NULL) free(timer); } int start_perf_timer(perf_timer_t *timer) { if (timer == NULL) goto error; if (timer->status == PERF_TIMER_RUNNING) goto error; timer->t0 = mach_absolute_time(); timer->status = PERF_TIMER_RUNNING; return 0; error: return -1; } int stop_perf_timer(perf_timer_t *timer) { if (timer == NULL) goto error; if (timer->status != PERF_TIMER_RUNNING) goto error; timer->t1 = mach_absolute_time(); timer->status = PERF_TIMER_STOPPED; return 0; error: return -1; } int clear_perf_timer(perf_timer_t *timer) { if (timer == NULL) return -1; if (timer->status != PERF_TIMER_STOPPED) return -1; timer->status = PERF_TIMER_CLEAR; return 0; } int nanodiff_perf_timer(const perf_timer_t *timer, long double *nanosec) { long double t0, t1, numer, denom; if (timer == NULL) return -1; if (timer->status != 
PERF_TIMER_STOPPED) return -1; numer = (long double)(timer->tb_info.numer); denom = (long double)(timer->tb_info.denom); t0 = (long double)(timer->t0); t1 = (long double)(timer->t1); if (nanosec != NULL) { *nanosec = (t1 - t0) * numer / denom; return 0; } else return -1; } int nanores_perf_timer(const perf_timer_t *timer, long double *nanosec_resolution) { long double numer, denom; if (timer == NULL) return -1; numer = (long double)(timer->tb_info.numer); denom = (long double)(timer->tb_info.denom); if (nanosec_resolution != NULL) { *nanosec_resolution = numer / denom; return 0; } else return -1; } -------------- next part -------------- [ 10, 1.8249e+03, 7.5052e+03, 4.3600e+02, 4.1562e+04], [ 13, 9.4907e+02, 2.7929e+02, 8.9100e+02, 2.4260e+03], [ 16, 2.3644e+03, 6.0826e+03, 1.0810e+03, 3.4278e+04], [ 21, 2.7637e+03, 2.5322e+03, 2.0580e+03, 1.3706e+04], [ 26, 3.6573e+03, 1.8597e+03, 2.9700e+03, 9.9230e+03], [ 34, 5.7156e+03, 8.1641e+02, 5.4040e+03, 8.9120e+03], [ 43, 1.0956e+04, 2.3088e+03, 1.0122e+04, 1.9766e+04], [ 55, 2.1788e+04, 1.1596e+04, 1.8955e+04, 8.2381e+04], [ 70, 3.8077e+04, 4.2441e+03, 3.6798e+04, 5.4417e+04], [ 89, 7.0032e+04, 5.1650e+03, 6.8568e+04, 9.6636e+04], [ 113, 1.3922e+05, 1.3897e+04, 1.3513e+05, 2.0781e+05], [ 144, 1.2979e+05, 7.2822e+04, 9.9868e+04, 4.4901e+05], [ 183, 2.0268e+05, 1.0931e+04, 1.9760e+05, 2.4071e+05], [ 234, 4.0132e+05, 9.1694e+04, 3.5535e+05, 7.1502e+05], [ 298, 8.6309e+05, 2.6523e+05, 6.9578e+05, 1.4252e+06], [ 379, 1.5985e+06, 4.0820e+05, 1.4171e+06, 2.8624e+06], [ 483, 3.0864e+06, 6.0088e+05, 2.8326e+06, 5.4301e+06], [ 616, 6.6489e+06, 1.5387e+06, 5.7692e+06, 1.0613e+07], [ 785, 1.4754e+07, 3.4156e+06, 1.2107e+07, 2.1649e+07], [1000, 3.1154e+07, 5.5740e+06, 2.4195e+07, 4.0360e+07], [ 4, 1.9477e+02, 3.2552e+02, 1.2800e+02, 1.9150e+03], [ 8, 3.8077e+02, 7.9966e+01, 3.5500e+02, 8.0200e+02], [ 16, 2.7588e+03, 6.3393e+03, 1.4950e+03, 3.6304e+04], [ 32, 6.6004e+03, 2.1850e+03, 5.5530e+03, 1.5607e+04], [ 64, 3.5622e+04, 
1.2820e+03, 3.4828e+04, 4.1095e+04], [ 128, 9.6848e+04, 3.6344e+04, 7.9238e+04, 2.3208e+05], [ 256, 4.5615e+05, 9.2865e+04, 4.2092e+05, 8.2640e+05], [ 512, 3.3334e+06, 2.6552e+05, 3.2502e+06, 4.7093e+06], [1024, 3.5191e+07, 5.5548e+06, 2.5881e+07, 4.5181e+07], [2048, 2.7725e+08, 1.5691e+07, 2.3737e+08, 2.9737e+08], -------------- next part -------------- [ 10, 2.4680e+03, 1.0001e+04, 5.2700e+02, 5.5399e+04], [ 13, 8.8493e+02, 3.0951e+02, 8.0500e+02, 2.5100e+03], [ 16, 3.3365e+03, 6.1387e+03, 2.0660e+03, 3.5831e+04], [ 21, 3.5532e+03, 6.4100e+02, 3.1980e+03, 6.2850e+03], [ 26, 4.5164e+03, 1.6061e+02, 4.3510e+03, 5.0860e+03], [ 34, 5.4646e+03, 8.0315e+02, 5.0180e+03, 9.6500e+03], [ 43, 9.5117e+03, 7.1500e+02, 9.1560e+03, 1.3219e+04], [ 55, 1.5673e+04, 9.1927e+02, 1.5228e+04, 2.0464e+04], [ 70, 2.3438e+04, 2.3378e+03, 2.2688e+04, 3.5773e+04], [ 89, 4.8458e+04, 9.8972e+02, 4.7769e+04, 5.3483e+04], [ 113, 6.3075e+04, 1.8880e+04, 4.7137e+04, 1.0836e+05], [ 144, 9.1631e+04, 1.5589e+04, 8.3371e+04, 1.3086e+05], [ 183, 1.7476e+05, 6.5323e+03, 1.7318e+05, 2.0928e+05], [ 234, 4.6118e+05, 1.5100e+05, 3.2846e+05, 6.8653e+05], [ 298, 1.3137e+06, 2.2717e+04, 1.2116e+06, 1.3345e+06], [ 379, 2.6917e+06, 1.7294e+05, 1.8249e+06, 2.8141e+06], [ 483, 4.5088e+06, 1.1748e+06, 2.5604e+06, 5.5026e+06], [ 616, 9.0293e+06, 2.1052e+06, 5.3924e+06, 1.1081e+07], [ 785, 2.2010e+07, 1.7049e+06, 1.7738e+07, 2.6944e+07], [1000, 3.4778e+07, 8.2156e+06, 2.2372e+07, 4.5916e+07], [ 4, 4.3840e+02, 4.3370e+02, 2.5200e+02, 2.6970e+03], [ 8, 4.4577e+02, 1.7493e+02, 3.8900e+02, 1.3540e+03], [ 16, 1.9848e+03, 4.6771e+02, 1.6710e+03, 4.4150e+03], [ 32, 3.0532e+03, 5.4374e+02, 2.7920e+03, 5.2560e+03], [ 64, 9.3250e+03, 6.6518e+02, 8.9210e+03, 1.2746e+04], [ 128, 5.4620e+04, 2.0069e+03, 5.3877e+04, 6.5200e+04], [ 256, 3.7464e+05, 7.6448e+03, 3.7055e+05, 4.0864e+05], [ 512, 3.7553e+06, 1.3328e+06, 2.9490e+06, 6.1633e+06], [1024, 2.9947e+07, 5.5860e+06, 2.3243e+07, 4.3105e+07], [2048, 2.4223e+08, 2.3419e+07, 
1.9434e+08, 2.8706e+08], -------------- next part -------------- [ 10, 1.3607e+03, 5.1284e+03, 3.4300e+02, 2.8490e+04], [ 13, 6.4280e+02, 1.9452e+02, 5.9600e+02, 1.6710e+03], [ 16, 7.2043e+02, 6.6691e+01, 6.9700e+02, 1.0690e+03], [ 21, 1.5310e+04, 7.3601e+04, 1.8240e+03, 4.0500e+05], [ 26, 3.1701e+03, 1.6513e+03, 2.7050e+03, 1.0508e+04], [ 34, 1.2321e+04, 3.9041e+04, 4.9730e+03, 2.1902e+05], [ 43, 9.0754e+03, 1.8779e+03, 8.5860e+03, 1.8982e+04], [ 55, 1.4239e+04, 3.5869e+03, 1.3076e+04, 3.0638e+04], [ 70, 2.2525e+04, 5.4555e+03, 2.1133e+04, 5.1306e+04], [ 89, 4.4427e+04, 6.8388e+03, 4.2697e+04, 8.0575e+04], [ 113, 8.3632e+04, 1.0850e+04, 8.0893e+04, 1.4103e+05], [ 144, 1.5535e+05, 1.6323e+04, 1.5149e+05, 2.4163e+05], [ 183, 3.1204e+05, 2.8211e+04, 2.5645e+05, 4.4332e+05], [ 234, 6.2586e+05, 3.5000e+04, 6.1664e+05, 8.1070e+05], [ 298, 1.1201e+06, 1.3749e+05, 9.2511e+05, 1.3264e+06], [ 379, 2.5531e+06, 6.4311e+05, 1.6633e+06, 3.3693e+06], [ 483, 5.3612e+06, 7.7730e+05, 2.9665e+06, 5.8146e+06], [ 616, 8.0512e+06, 2.3215e+06, 5.5969e+06, 1.2458e+07], [ 785, 1.8979e+07, 4.2977e+06, 1.1792e+07, 2.3341e+07], [1000, 3.6060e+07, 5.8910e+06, 2.2430e+07, 4.7225e+07], [ 4, 3.7207e+02, 6.6756e+02, 1.9800e+02, 3.9020e+03], [ 8, 3.7760e+02, 1.2692e+02, 3.2600e+02, 1.0430e+03], [ 16, 1.2662e+03, 1.9406e+02, 1.1360e+03, 2.2460e+03], [ 32, 6.7047e+03, 2.3775e+03, 5.6690e+03, 1.9156e+04], [ 64, 2.1552e+04, 1.6013e+04, 1.4132e+04, 6.6789e+04], [ 128, 1.0602e+05, 4.5159e+03, 1.0420e+05, 1.2973e+05], [ 256, 7.7634e+05, 8.7983e+04, 6.3991e+05, 9.0461e+05], [ 512, 6.4976e+06, 2.7818e+05, 5.3517e+06, 6.6782e+06], [1024, 3.7353e+07, 6.4473e+06, 2.4760e+07, 4.7515e+07], [2048, 2.8886e+08, 2.5273e+07, 2.4149e+08, 3.3363e+08], From sturla.molden at gmail.com Sat Feb 22 20:58:59 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Sun, 23 Feb 2014 02:58:59 +0100 Subject: [Numpy-discussion] OpenBLAS on Mac In-Reply-To: References: Message-ID: On 23/02/14 02:43, Sturla Molden wrote:> > 
Testing different matrix sizes and averaging 30 trials, they are quite > similar, actually. Accelerate is perhaps the winner, but it really > depends on the matrix size. > > See for yourself. Here is a plot of the relative runtime (using Accelerate as reference). Sturla -------------- next part -------------- A non-text attachment was scrubbed... Name: dgemm_relative_perf.png Type: image/png Size: 52464 bytes Desc: not available URL: From matthew.brett at gmail.com Sat Feb 22 21:30:05 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 22 Feb 2014 18:30:05 -0800 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: Hi, On Sat, Feb 22, 2014 at 4:09 PM, Pauli Virtanen wrote: > 23.02.2014 00:03, Nathaniel Smith kirjoitti: >> Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In >> practice this doesn't usually matter much, because these are very >> rarely used. But, I would like to nail down the behaviour so we can >> say something precise in the matrix multiplication PEP. > > I'm not sure it's necessary to say much about this in the PEP. It should > in my view concentrate on arguing why the new binop is needed in the > Python language, and for that, restricting to 2D is good enough IMHO. Yes, I was thinking the same. Best, Matthew From ralf.gommers at gmail.com Sun Feb 23 04:30:37 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 23 Feb 2014 10:30:37 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: On Fri, Feb 21, 2014 at 5:17 PM, Charles R Harris wrote: > > > > On Fri, Feb 21, 2014 at 1:18 AM, Jennifer stone wrote: > >> https://wiki.python.org/moin/SummerOfCode/2014 >> The link provided by Josef is yet to list SciPy/NumPy under it. Somebody >> please contact Terri. >> That page acts as major guiding factor for Python-GSoC prospective >> students. Please have SciPy listed there. 
>> >> > Ralph, do you know if there is there someone representing Scipy/Numpy who > is officially supposed to handle this, or should I take a shot at it? > Last year I did that, but we don't really have an official role within the project for that. I had intended to do it within the deadline, but that happened to coincide with me traveling without network connection. Sorry about the delay. Should be fixable though, last year we were way later. Todo is now: 1. fix up ideas page with scipy/numpy descriptions, idea difficulty levels and preferably some more ideas. 2. add scipy/numpy to PSF page 3. contact PSF admins I can get around to doing this today. (1) is probably the most work, if you could help out that would be great. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian at sipsolutions.net Sun Feb 23 04:44:09 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Sun, 23 Feb 2014 10:44:09 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: <1393148649.3910.5.camel@sebastian-t440> On So, 2014-02-23 at 10:30 +0100, Ralf Gommers wrote: > > > > On Fri, Feb 21, 2014 at 5:17 PM, Charles R Harris > wrote: > > > > On Fri, Feb 21, 2014 at 1:18 AM, Jennifer stone > wrote: > https://wiki.python.org/moin/SummerOfCode/2014 > The link provided by Josef is yet to list SciPy/NumPy > under it. Somebody please contact Terri. > That page acts as major guiding factor for Python-GSoC > prospective students. Please have SciPy listed there. > > > > > Ralph, do you know if there is there someone representing > Scipy/Numpy who is officially supposed to handle this, or > should I take a shot at it? > > > > Last year I did that, but we don't really have an official role within > the project for that. I had intended to do it within the deadline, but > that happened to coincide with me traveling without network > connection. Sorry about the delay. Should be fixable though, last year > we were way later. 
> > > Todo is now: > > 1. fix up ideas page with scipy/numpy descriptions, idea difficulty > levels and preferably some more ideas. > Considering that little has been moving there, the datetime cleanup might be a decent project[1]? But to be honest, I have no idea if the work load fits at all or how much work fixing this should be anyway. - Sebastian [1] I know, everyone wants to see it fixed in 1.9. and that would be better... > 2. add scipy/numpy to PSF page > > 3. contact PSF admins > > > I can get around to doing this today. (1) is probably the most work, > if you could help out that would be great. > > > Ralf > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From ralf.gommers at gmail.com Sun Feb 23 04:54:04 2014 From: ralf.gommers at gmail.com (Ralf Gommers) Date: Sun, 23 Feb 2014 10:54:04 +0100 Subject: [Numpy-discussion] GSOC In-Reply-To: <1393148649.3910.5.camel@sebastian-t440> References: <1393148649.3910.5.camel@sebastian-t440> Message-ID: On Sun, Feb 23, 2014 at 10:44 AM, Sebastian Berg wrote: > On So, 2014-02-23 at 10:30 +0100, Ralf Gommers wrote: > > > > > > > > On Fri, Feb 21, 2014 at 5:17 PM, Charles R Harris > > wrote: > > > > > > > > On Fri, Feb 21, 2014 at 1:18 AM, Jennifer stone > > wrote: > > https://wiki.python.org/moin/SummerOfCode/2014 > > The link provided by Josef is yet to list SciPy/NumPy > > under it. Somebody please contact Terri. > > That page acts as major guiding factor for Python-GSoC > > prospective students. Please have SciPy listed there. > > > > > > > > > > Ralph, do you know if there is there someone representing > > Scipy/Numpy who is officially supposed to handle this, or > > should I take a shot at it? > > > > > > > > Last year I did that, but we don't really have an official role within > > the project for that. 
I had intended to do it within the deadline, but > > that happened to coincide with me traveling without network > > connection. Sorry about the delay. Should be fixable though, last year > > we were way later. > > > > > > Todo is now: > > > > 1. fix up ideas page with scipy/numpy descriptions, idea difficulty > > levels and preferably some more ideas. > > > > Considering that little has been moving there, the datetime cleanup > might be a decent project[1]? But to be honest, I have no idea if the > work load fits at all or how much work fixing this should be anyway. > +1 would be very good to get that moving. Should fit in a GSoC, but the difficulty should be set to high. That's true for most numpy ideas though; scipy can offer a wider range of difficulty levels. Ralf > > - Sebastian > > [1] I know, everyone wants to see it fixed in 1.9. and that would be > better... > > > 2. add scipy/numpy to PSF page > > > > 3. contact PSF admins > > > > > > I can get around to doing this today. (1) is probably the most work, > > if you could help out that would be great. > > > > > > Ralf > > > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Feb 23 07:49:20 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 23 Feb 2014 14:49:20 +0200 Subject: [Numpy-discussion] GSOC In-Reply-To: References: Message-ID: 23.02.2014 11:30, Ralf Gommers kirjoitti: [clip] > 1. fix up ideas page with scipy/numpy descriptions, idea difficulty levels > and preferably some more ideas. 
Here's a start:

https://github.com/scipy/scipy/wiki/GSoC-project-ideas

From jenny.stone125 at gmail.com  Sun Feb 23 08:18:11 2014
From: jenny.stone125 at gmail.com (Jennifer stone)
Date: Sun, 23 Feb 2014 18:48:11 +0530
Subject: [Numpy-discussion] Suggestions for GSoC Projects
In-Reply-To: 
References: 
Message-ID: 

In an attempt to analyze the accuracy of hyp2f1, the different cases
mentioned in Abramowitz
(http://people.math.sfu.ca/~cbm/aands/page_561.htm) and also in the
thesis on 'Computation of Hypergeometric Functions'
(http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf,
pp. 65-66) were tried out, and the function fails without warning when
c < 0, c is not integral, and |c| >> |a| and |b|.

For example:

    sp.hyp2f1(10, 5, -300.5, 0.5)
    >> -6.5184949735e+156

while the answer is -3.8520770815e+32. This case appears to filter down
to hys2f1 in the source code (scipy.special.cephes.hyp2f1).

I tried the same input in mpmath to check if it works there:

    hyp2f1(10, 5, -300.5, 0.5)
    >> mpf('0.9211827166328477893913199888')

which is the solution when we apply power series expansion. However,
MATLAB succeeds in giving the required solution.

Another interesting fact is that of the methods mentioned in the thesis
(Taylor series expansion, fraction method with double precision,
Gauss-Jacobi method and RK4), none succeeds in the given case.

I don't have any idea how the function itself is evaluated in the given
case. Any leads on how it is done and how MATLAB executes it?
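The failure described here is characteristic of summing a hypergeometric power series in fixed precision: for such parameters the series has enormous terms of alternating sign, and double precision loses the comparatively tiny true value to cancellation long before the series converges. The same mechanism in miniature, using exp(-x) instead of 2F1 (a toy illustration, not the cephes algorithm):

```python
import math

def exp_series(x, terms=200):
    """Sum the Taylor series of exp(x) term by term in double precision."""
    total, term = 0.0, 1.0
    for k in range(1, terms + 1):
        total += term
        term *= x / k
    return total

exact = math.exp(-30.0)          # ~9.36e-14
naive = exp_series(-30.0)        # huge alternating terms: catastrophic cancellation
stable = 1.0 / exp_series(30.0)  # same series, rearranged to avoid cancellation

print("exact  %.6e" % exact)
print("naive  %.6e" % naive)
print("stable %.6e" % stable)
```

The intermediate terms peak near 30^30/30! (about 8e11), so the best a double-precision sum can do is an absolute error around 1e-4 — many orders of magnitude larger than the true value. Raising the working precision (as mpmath does via mp.dps) or rearranging the computation are the standard escapes, which is why hyp2f1 with large |c| is hard.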
On Thu, Feb 20, 2014 at 1:16 AM, Jennifer stone wrote: > > If you are interested in the hypergeometric numerical evaluation, it's > >> probably a good idea to take a look at this recent master's thesis >> written on the problem: >> >> http://people.maths.ox.ac.uk/porterm/research/pearson_final.pdf >> >> The thesis is really comprehensive and detailed with quite convincing > conclusions on the methods to be used with varying a,b,x (though I am > yet to read the thesis properly enough understand and validate each > of the multitude of the cases for the boundaries for the parameters). > It seems to be an assuring and reliable walk through for the project. > > >> This may give some systematic overview on the range of methods >> available. (Note that for copyright reasons, it's not a good idea to >> look closely at the source codes linked from that thesis, as they are >> not available under a compatible license.) >> >> It may well be that the best approach for evaluating these functions, >> if accuracy in the whole parameter range is wanted, in the end turns >> out to require arbitrary-precision computations. In that case, it >> would be a very good idea to look at how the problem is approached in >> mpmath. There are existing multiprecision packages written in C, and >> using one of them in scipy.special could bring better evaluation >> performance even if the algorithm is the same. >> > > Yeah, this seems to be brilliant idea. mpmath too, I assume, must have > used some of the methods mentioned in the thesis. I ll look through the > code and get back. > > I am still unaware of the complexity of project expected at GSoC. This > project > looks engaging to me. Will an attempt to improve both Spherical harmonic > functions ( improving the present algorithm to avoid the calculation for > lower n's and m's) and hypergeometric functions be too ambitious or > is it doable? 
> > Regards > Jennifer > >> -- >> Pauli Virtanen >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun Feb 23 08:58:03 2014 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 23 Feb 2014 15:58:03 +0200 Subject: [Numpy-discussion] Suggestions for GSoC Projects In-Reply-To: References: Message-ID: 23.02.2014 15:18, Jennifer stone kirjoitti: [clip] > I tried the same input in mpmath to check if it works there: > hyp2f1(10,5,-300.5,0.5) > >>> mpf('0.9211827166328477893913199888') > > which is the solution when we apply power series expansion. Typically, you need to tell mpmath to use an appropriate precision for the evaluation: >>> mpmath.mp.dps = 300 >>> float(mpmath.hyp2f1('10','5','-300.5','0.5')) -3.8520270815239185e+32 > however MATLAB succeeds in giving the required solution. > Another interesting fact is that of the methods mentioned in the thesis: > Taylor series expansion, fraction method with double precision, > Gauss-Jacobi method and RK4), none succeeds in the given case. > > I don't have any idea how the function itself is evaluated in the given > case. Any leads on how it is done and how MATLAB executes it? You can look in their documentation what references the implementation is based on. However, it's not a good idea to go beyond that in wondering how it does things --- from legal POV and from fair play. *** What can help in hyp2f1 for large values of a,b,c is use of recurrence relations. These are typically stable in one direction only. [1] This seems to be still a partially open research question... Our current hyp2f1 implementation does use recurrences (hyp2f1ra), but perhaps they are not invoked for this case. The problem here can be the accurate determination of the convergence region for each parameter value. 
[1] http://www.ams.org/journals/mcom/2007-76-259/S0025-5718-07-01918-7/

-- 
Pauli Virtanen

From jenny.stone125 at gmail.com  Sun Feb 23 11:05:07 2014
From: jenny.stone125 at gmail.com (Jennifer stone)
Date: Sun, 23 Feb 2014 21:35:07 +0530
Subject: [Numpy-discussion] Suggestions for GSoC Projects
In-Reply-To: 
References: 
Message-ID: 

> Typically, you need to tell mpmath to use an appropriate precision for
> the evaluation:
>
> >>> mpmath.mp.dps = 300
> >>> float(mpmath.hyp2f1('10','5','-300.5','0.5'))
> -3.8520270815239185e+32
>

Oh k! I didn't pay that much heed to it and set mp.dps to 100. Wonder
why the difference was so drastic. Anyways thanks a lot.

> ***
>
> What can help in hyp2f1 for large values of a,b,c is use of recurrence
> relations. These are typically stable in one direction only. [1] This
> seems to be still a partially open research question...
>
> Our current hyp2f1 implementation does use recurrences (hyp2f1ra), but
> perhaps they are not invoked for this case. The problem here can be the
> accurate determination of the convergence region for each parameter value.
>
> [1] http://www.ams.org/journals/mcom/2007-76-259/S0025-5718-07-01918-7/
>
> --
> Pauli Virtanen
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From charlesr.harris at gmail.com  Sun Feb 23 13:26:43 2014
From: charlesr.harris at gmail.com (Charles R Harris)
Date: Sun, 23 Feb 2014 11:26:43 -0700
Subject: [Numpy-discussion] 1.8.1 release
Message-ID: 

Hi All,

A lot of fixes have gone into the 1.8.x branch and it looks about time
to do a bugfix release. There are a couple of important bugfixes still
to backport, but if all goes well next weekend, March 1, looks like a
good target date.
So give the current 1.8.x branch a try so as to check that it covers your most urgent bugfix needs. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From njs at pobox.com Mon Feb 24 00:21:29 2014 From: njs at pobox.com (Nathaniel Smith) Date: Mon, 24 Feb 2014 00:21:29 -0500 Subject: [Numpy-discussion] How exactly ought 'dot' to work? In-Reply-To: References: Message-ID: On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen wrote: > 23.02.2014 00:03, Nathaniel Smith kirjoitti: >> Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In >> practice this doesn't usually matter much, because these are very >> rarely used. But, I would like to nail down the behaviour so we can >> say something precise in the matrix multiplication PEP. > > I'm not sure it's necessary to say much about this in the PEP. It should > in my view concentrate on arguing why the new binop is needed in the > Python language, and for that, restricting to 2D is good enough IMHO. > > How exactly Numpy makes use of the capability for > 2-dim arrays is > something that should definitely be discussed. > > But I think this is a problem mainly interesting for Numpy devs, and not > for CPython devs. I actually disagree strongly. I think it's very important to make clear that we have a clear, well thought through, and cross-project approach to what @ is supposed to mean, so that this doesn't come across as numpy asking python-dev for a blank check to go and define the de facto semantics of a new operator just for ourselves and without any adult supervision. I just don't think they trust us that much. (Honestly I probably wouldn't either in their place). It's true that the higher-dim cases aren't the most central ones, but it can't hurt to document all the details. 
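Concretely, the two rules under discussion for ndim > 2 can be written out without numpy at all. The following is an illustrative restatement on nested lists, not numpy's implementation: current np.dot sums over the last axis of the first argument and the second-to-last axis of the second (an "outer product over the stack dimensions"), whereas the gufunc/broadcasting proposal pairs the stacks elementwise:

```python
def matmul2d(a, b):
    """Plain 2-d matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def broadcast_dot(A, B):
    """Gufunc/broadcasting proposal: pair the stacks, result shape (s, n, n)."""
    return [matmul2d(a, b) for a, b in zip(A, B)]

def outer_dot(A, B):
    """Current np.dot rule for 3-d inputs:
    out[i][j][k][m] = sum_r A[i][j][r] * B[k][r][m], result shape (s, n, s, n)."""
    return [[[[sum(A[i][j][r] * B[k][r][m] for r in range(len(B[k])))
               for m in range(len(B[k][0]))]
              for k in range(len(B))]
             for j in range(len(A[i]))]
            for i in range(len(A))]

eye = [[1.0, 0.0], [0.0, 1.0]]
A = [eye, eye]                               # stack of two 2x2 identities
B = [[[1.0, 2.0], [3.0, 4.0]],
     [[5.0, 6.0], [7.0, 8.0]]]

bc = broadcast_dot(A, B)
od = outer_dot(A, B)
print(len(bc), len(bc[0]), len(bc[0][0]))                     # 2 2 2
print(len(od), len(od[0]), len(od[0][0]), len(od[0][0][0]))   # 2 2 2 2
```

For a stack of s matrices, the broadcasting rule keeps the result at shape (s, n, n), while the current np.dot rule inflates it to (s, n, s, n) — one reason matrix power over a stack is awkward under the present behaviour.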
I've tentatively rewritten the first section of the PEP to try and
accomplish this framing:
https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst
Comments welcome etc.

Also BTW in the process I discovered another reason why broadcasting is
better than the outer product semantics -- with broadcasting, writing
down matrix power for >2d is trivial and natural, but with the outer
product semantics, it's practically impossible!

-n

-- 
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org

From oscar.j.benjamin at gmail.com  Mon Feb 24 07:35:23 2014
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 24 Feb 2014 12:35:23 +0000
Subject: [Numpy-discussion] How exactly ought 'dot' to work?
In-Reply-To: 
References: 
Message-ID: 

On 24 February 2014 05:21, Nathaniel Smith wrote:
>
> I've tentatively rewritten the first section of the PEP to try and
> accomplish this framing:
> https://github.com/njsmith/numpy/blob/matmul-pep/doc/neps/return-of-revenge-of-matmul-pep.rst
> Comments welcome etc.

I've not been following the discussion about this but just out of
interest is there no interest in Matlab style left- and right- matrix
division? Personally I always found the idea of matrix division
confusing and dislike teaching it to students but it does seem to be
popular among Matlab users.

Oscar

From daniele at grinta.net  Mon Feb 24 08:26:35 2014
From: daniele at grinta.net (Daniele Nicolodi)
Date: Mon, 24 Feb 2014 14:26:35 +0100
Subject: [Numpy-discussion] np.savetxt() default format
Message-ID: <530B488B.6070904@grinta.net>

Hello,

I've noticed that numpy default format for saving data in ascii
representation with np.savetxt() is "%.18e". Given that the default
data type for numpy is double and that the resolution of doubles is 15
decimal digits, what's the reason for the additional three digits?
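A quick interpreter check shows why 15 significant digits are not always enough to reproduce the same double on read-back, which is the usual rationale for defaults of 17 or more digits (illustration only):

```python
x = 0.1 + 0.2             # 0.30000000000000004, not the same double as 0.3
s16 = "%.15e" % x         # 16 significant digits
s18 = "%.17e" % x         # 18 significant digits

print(s16, float(s16) == x)   # 3.000000000000000e-01 False
print(s18, float(s18) == x)   # 3.00000000000000004e-01 True
```

Seventeen significant digits always round-trip an IEEE-754 double, so "%.18e" (19 significant digits) has a small margin on top of that.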
The three additional digits are definitely not an issue, but I would
like to understand the reason why they are there.

Thanks. Best,
Daniele

From oscar.j.benjamin at gmail.com  Mon Feb 24 09:09:57 2014
From: oscar.j.benjamin at gmail.com (Oscar Benjamin)
Date: Mon, 24 Feb 2014 14:09:57 +0000
Subject: [Numpy-discussion] np.savetxt() default format
In-Reply-To: <530B488B.6070904@grinta.net>
References: <530B488B.6070904@grinta.net>
Message-ID: 

On 24 February 2014 13:26, Daniele Nicolodi wrote:
> Hello,
>
> I've noticed that numpy default format for saving data in ascii
> representation with np.savetxt() is "%.18e". Given that the default
> data type for numpy is double and that the resolution of doubles is 15
> decimal digits, what's the reason for the additional three digits?
>
> The three additional digits are definitely not an issue, but I would
> like to understand the reason why they are there.

It is reasonable to think of doubles as being limited to 15 decimal
digits when reasoning about rounding errors but that's a conservative
estimate. If you want to be able to round trip from IEEE-754 double
precision to decimal and back you need more than 15 digits. The
difference between 1.0 and the next representable double above it
appears only at the 16th decimal place:

>>> b = 1 + 1.0000000001*2**-53
>>> b
1.0000000000000002
>>> len(repr(b)) - 2
16

I think the convention to use 18 digits generally stems from not fully
trusting the IEEE-754-ness of other systems that might read your data.

Oscar

From alan.isaac at gmail.com  Mon Feb 24 09:31:00 2014
From: alan.isaac at gmail.com (Alan G Isaac)
Date: Mon, 24 Feb 2014 09:31:00 -0500
Subject: [Numpy-discussion] How exactly ought 'dot' to work?
In-Reply-To: 
References: 
Message-ID: <530B57A4.7010005@gmail.com>

>> 23.02.2014 00:03, Nathaniel Smith kirjoitti:
>>> Currently numpy's 'dot' acts a bit weird for ndim>2 or ndim<1. In
>>> practice this doesn't usually matter much, because these are very
>>> rarely used.
>>> But, I would like to nail down the behaviour so we can
>>> say something precise in the matrix multiplication PEP.

> On Sat, Feb 22, 2014 at 7:09 PM, Pauli Virtanen wrote:
>> I'm not sure it's necessary to say much about this in the PEP. It should
>> in my view concentrate on arguing why the new binop is needed in the
>> Python language, and for that, restricting to 2D is good enough IMHO.
>> How exactly Numpy makes use of the capability for > 2-dim arrays is
>> something that should definitely be discussed.
>> But I think this is a problem mainly interesting for Numpy devs, and not
>> for CPython devs.

On 2/24/2014 12:21 AM, Nathaniel Smith wrote:
> I actually disagree strongly. I think it's very important to make
> clear that we have a clear, well thought through, and
> cross-project approach to what @ is supposed to mean

I think Pauli is right. We know `@` is supposed to mean "matrix
multiply" when dealing with conformable 2d arrays. That is the real
motivation of the PEP. I cannot see why the PEP itself would need to go
beyond that. The behavior of `@` in other cases seems a discussion that
should go *much* slower than that of the core of the PEP, which is to
get an operator for matrix multiplication.

Furthermore, I am not able to understand the principles behind the
discussion of how `@` should behave in other cases. I do not think they
are being clearly stated. (I have added a comment to the PEP asking for
clarification.)

To be concrete, if `@` is proposed to behave unlike Mathematica's `Dot`
command, I would hope to hear a very clear mathematical motivation for
this. (Specifically, I do not understand why `@` would do scalar
product.) Otoh, if the proposal is just that `@` should behave just
like NumPy's `dot` does, that should be simply stated.
Cheers, Alan Isaac From stefan.otte at gmail.com Mon Feb 24 09:52:49 2014 From: stefan.otte at gmail.com (Stefan Otte) Date: Mon, 24 Feb 2014 15:52:49 +0100 Subject: [Numpy-discussion] Proposal: Chaining np.dot with mdot helper function In-Reply-To: References: Message-ID: Hey guys, I just pushed an updated version to github: https://github.com/sotte/numpy_mdot Here is an ipython notebook with some experiments: http://nbviewer.ipython.org/urls/raw2.github.com/sotte/numpy_mdot/master/2014-02_numpy_mdot.ipynb - I added (almost numpy compliant) documentation. - I use a function for len(args) == 3 to improve the speed. - Some general cleanup. Before I create a pull request I have a few questions: - Should there be an "optimize" argument or should we always optimize the parentheses? There is an overhead, but maybe we could neglect it? I think we should keep the flag, but set it to True by default. - I currently use a recursive algorithm to do the multiplication. Any objections? - In which file should `mdot` live? - I wrote a function `print_optimal_chain_order(D, A, B, C, names=list("DABC"))` which determines the optimal parentheses and prints out a numpy expression. It's kinda handy but do we actually need it? Best regards, Stefan On Thu, Feb 20, 2014 at 8:39 PM, Nathaniel Smith wrote: > On Thu, Feb 20, 2014 at 1:35 PM, Stefan Otte wrote: >> Hey guys, >> >> I quickly hacked together a prototype of the optimization step: >> https://github.com/sotte/numpy_mdot >> >> I think there is still room for improvements so feedback is welcome :) >> I'll probably have some time to code on the weekend. >> >> @Nathaniel, I'm still not sure about integrating it in dot. Don't a >> lot of people use the optional out parameter of dot? > > The email you're replying to below about deprecating stuff in 'dot' > was in reply to Eric's email about using dot on arrays with shape (k, > n, n), so those comments are unrelated to the mdot stuff. 
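[For readers following along: the parenthesization step Stefan describes is the textbook matrix-chain-ordering problem. The sketch below is illustrative only — it is not Stefan's implementation from the linked repository, and `chain_order` is a name made up here — but it shows the classic dynamic program plus the recursive multiply he mentions:]

```python
import numpy as np

def chain_order(dims):
    # Classic O(n^3) dynamic program: matrix i has shape
    # (dims[i], dims[i + 1]); split[i][j] records where to cut the
    # product of matrices i..j for the fewest scalar multiplications.
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]
    split = [[0] * n for _ in range(n)]
    for length in range(1, n):
        for i in range(n - length):
            j = i + length
            best = None
            for k in range(i, j):
                c = (cost[i][k] + cost[k + 1][j]
                     + dims[i] * dims[k + 1] * dims[j + 1])
                if best is None or c < best:
                    best, split[i][j] = c, k
            cost[i][j] = best
    return split

def mdot(*mats):
    # Multiply a chain of matrices in the cost-optimal order by
    # recursing on the split table.
    dims = [m.shape[0] for m in mats] + [mats[-1].shape[1]]
    split = chain_order(dims)
    def mult(i, j):
        if i == j:
            return mats[i]
        k = split[i][j]
        return np.dot(mult(i, k), mult(k + 1, j))
    return mult(0, len(mats) - 1)
```

[The result is always numerically the same as chaining np.dot left to right; only the number of scalar operations changes, which is where the "optimize" flag pays off.]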
> > I wouldn't mind seeing out= arguments become kw-only in general, but > even if we decided to do that it would take a long deprecation period, > so yeah, let's give up on 'dot(A, B, C, D)' as syntax for mdot. > > However, the suggestion of supporting np.dot([A, B, C, D]) still seems > like it might be a good idea...? I have mixed feelings about it -- one > less item cluttering up the namespace, but it is weird and magical to > have two totally different calling conventions for the same function. > > -n > >> On Thu, Feb 20, 2014 at 4:02 PM, Nathaniel Smith wrote: >>> If you send a patch that deprecates dot's current behaviour for ndim>2, >>> we'll probably merge it. (We'd like it to function like you suggest, for >>> consistency with other gufuncs. But to get there we have to deprecate the >>> current behaviour first.) >>> >>> While I'm wishing for things I'll also mention that it would be really neat >>> if binary gufuncs would have a .outer method like regular ufuncs do, so >>> anyone currently using ndim>2 dot could just switch to that. But that's a >>> lot more work than just deprecating something :-). >>> >>> -n >>> >>> On 20 Feb 2014 09:27, "Eric Moore" wrote: >>>> >>>> >>>> >>>> On Thursday, February 20, 2014, Eelco Hoogendoorn >>>> wrote: >>>>> >>>>> If the standard semantics are not affected, and the most common >>>>> two-argument scenario does not take more than a single if-statement >>>>> overhead, I don't see why it couldn't be a replacement for the existing >>>>> np.dot; but others mileage may vary. >>>>> >>>>> >>>>> On Thu, Feb 20, 2014 at 11:34 AM, Stefan Otte >>>>> wrote: >>>>>> >>>>>> Hey, >>>>>> >>>>>> so I propose the following. I'll implement a new function `mdot`. >>>>>> Incorporating the changes in `dot` are unlikely. Later, one can still >>>>>> include >>>>>> the features in `dot` if desired. >>>>>> >>>>>> `mdot` will have a default parameter `optimize`. If `optimize==True` >>>>>> the >>>>>> reordering of the multiplication is done. 
Otherwise it simply chains >>>>>> the >>>>>> multiplications. >>>>>> >>>>>> I'll test and benchmark my implementation and create a pull request. >>>>>> >>>>>> Cheers, >>>>>> Stefan >>>>>> _______________________________________________ >>>>>> NumPy-Discussion mailing list >>>>>> NumPy-Discussion at scipy.org >>>>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>>> >>>>> >>>> Another consideration here is that we need a better way to work with >>>> stacked matrices such as np.linalg handles now. Ie I want to compute the >>>> matrix product of two (k, n, n) arrays producing a (k,n,n) result. Near as >>>> I can tell there isn't a way to do this right now that doesn't involve an >>>> explicit loop. Since dot will return a (k, n, k, n) result. Yes this output >>>> contains what I want but it also computes a lot of things that I don't want >>>> too. >>>> >>>> It would also be nice to be able to do a matrix product reduction, (k, n, >>>> n) -> (n, n) in a single line too. >>>> >>>> Eric >>>> >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Nathaniel J. 
Smith > Postdoctoral researcher - Informatics - University of Edinburgh > http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From mhughes at cs.brown.edu Mon Feb 24 14:58:23 2014 From: mhughes at cs.brown.edu (Michael Hughes) Date: Mon, 24 Feb 2014 14:58:23 -0500 Subject: [Numpy-discussion] problems building numpy with ACML blas/lapack Message-ID: Hello, I'm trying to build numpy from source to use AMD's ACML for matrix multiplication (specifically the multi-threaded versions gfortran64_mp). I'm able to successfully compile and use a working version of np.dot, but my resulting installation doesn't pass numpy's test suite, instead, I get a segfault. I'm hoping for some advice on what might be wrong... I'm on Debian, with a fresh install of Python-2.7.6. To install numpy, I've followed exactly the instructions previously posted to this list by Thomas Unterthiner. See http://numpy-discussion.10968.n7.nabble.com/numpy-ACML-support-is-kind-of-broken-td35454.html. The only thing I've adjusted is to try to use the gfortran64_mp version of ACML instead of just gfortran64. Using those instructions, I can compile numpy-1.8.0 so that it successfully uses the desired ACML libraries. I can confirm this by `ldd site-packages/numpy/core/_dotblas.so`, which shows that I'm linked to libacml_mp.so as desired. Furthermore, some quick timing tests indicate that for a 1000x1000 matrix X, calls to np.dot(X,X) have similar speeds as using custom C code that directly calls the ACML libraries. So, dot seems to work as desired. However, when I run numpy.test(verbose=4), I find that I get a seg fault ``` test_einsum_sums_cfloat128 (test_einsum.TestEinSum) ... Segmentation fault ``` Any ideas what might be wrong? 
From my benchmark tests, ACML is way faster than MKL or other options on my system, so I'd really like to use it, but I don't trust this current install. Thanks! - Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: From ondrej.certik at gmail.com Mon Feb 24 15:13:01 2014 From: ondrej.certik at gmail.com (=?UTF-8?B?T25kxZllaiDEjGVydMOtaw==?=) Date: Mon, 24 Feb 2014 13:13:01 -0700 Subject: [Numpy-discussion] argsort speed In-Reply-To: References: <53021A2C.5020601@continuum.io> <530255B2.60509@googlemail.com> Message-ID: On Fri, Feb 21, 2014 at 11:09 PM, Charles R Harris wrote: > > > > On Fri, Feb 21, 2014 at 10:35 PM, Ondřej Čertík > wrote: >> >> On Mon, Feb 17, 2014 at 11:40 AM, Charles R Harris >> wrote: >> > >> > >> > >> > On Mon, Feb 17, 2014 at 11:32 AM, Julian Taylor >> > wrote: >> >> >> >> On 17.02.2014 15:18, Francesc Alted wrote: >> >> > On 2/17/14, 1:08 AM, josef.pktd at gmail.com wrote: >> >> >> On Sun, Feb 16, 2014 at 6:12 PM, Daπid >> >> >> wrote: >> >> >>> On 16 February 2014 23:43, wrote: >> >> >>>> What's the fastest argsort for a 1d array with around 28 Million >> >> >>>> elements, roughly uniformly distributed, random order? >> >> >>> >> >> >>> On numpy latest version: >> >> >>> >> >> >>> for kind in ['quicksort', 'mergesort', 'heapsort']: >> >> >>> print kind >> >> >>> %timeit np.sort(data, kind=kind) >> >> >>> %timeit np.argsort(data, kind=kind) >> >> >>> >> >> >>> >> >> >>> quicksort >> >> >>> 1 loops, best of 3: 3.55 s per loop >> >> >>> 1 loops, best of 3: 10.3 s per loop >> >> >>> mergesort >> >> >>> 1 loops, best of 3: 4.84 s per loop >> >> >>> 1 loops, best of 3: 9.49 s per loop >> >> >>> heapsort >> >> >>> 1 loops, best of 3: 12.1 s per loop >> >> >>> 1 loops, best of 3: 39.3 s per loop >> >> >>> >> >> >>> >> >> >>> It looks quicksort is quicker sorting, but mergesort is marginally >> >> >>> faster >> >> >>> sorting args. 
The diference is slim, but upon repetition, it >> >> >>> remains >> >> >>> significant. >> >> >>> >> >> >>> Why is that? Probably part of the reason is what Eelco said, and >> >> >>> part >> >> >>> is >> >> >>> that in sort comparison are done accessing the array elements >> >> >>> directly, but >> >> >>> in argsort you have to index the array, introducing some overhead. >> >> >> Thanks, both. >> >> >> >> >> >> I also gain a second with mergesort. >> >> >> >> >> >> matlab would be nicer in my case, it returns both. >> >> >> I still need to use the argsort to index into the array to also get >> >> >> the sorted array. >> >> > >> >> > Many years ago I needed something similar, so I made some functions >> >> > for >> >> > sorting and argsorting in one single shot. Maybe you want to reuse >> >> > them. Here it is an example of the C implementation: >> >> > >> >> > https://github.com/PyTables/PyTables/blob/develop/src/idx-opt.c#L619 >> >> > >> >> > and here the Cython wrapper for all of them: >> >> > >> >> > >> >> > >> >> > https://github.com/PyTables/PyTables/blob/develop/tables/indexesextension.pyx#L129 >> >> > >> >> > Francesc >> >> > >> >> >> >> that doesn't really make a big difference if the data is randomly >> >> distributed. >> >> the sorting operation is normally much more expensive than latter >> >> applying the indices: >> >> >> >> In [1]: d = np.arange(10000000) >> >> >> >> In [2]: np.random.shuffle(d) >> >> >> >> In [3]: %timeit np.argsort(d) >> >> 1 loops, best of 3: 1.99 s per loop >> >> >> >> In [4]: idx = np.argsort(d) >> >> >> >> In [5]: %timeit d[idx] >> >> 1 loops, best of 3: 213 ms per loop >> >> >> >> >> >> >> >> But if your data is not random it can make a difference as even >> >> quicksort can be a lot faster then. >> >> timsort would be a nice addition to numpy, it performs very well for >> >> partially sorted data. Unfortunately its quite complicated to >> >> implement. 
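[As the quoted timings illustrate, the argsort pass dominates and applying the resulting indices is comparatively cheap, so the Matlab-style "both results at once" falls out of a single argsort. A minimal sketch:]

```python
import numpy as np

rng = np.random.RandomState(42)
d = rng.permutation(1000)              # the values 0..999 in random order

idx = np.argsort(d, kind='mergesort')  # the one expensive sorting pass
sorted_d = d[idx]                      # the cheap fancy-indexing pass

# Same result as sorting directly, plus we keep the permutation around.
assert np.array_equal(sorted_d, np.sort(d))
```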
>> > >> > >> > Quicksort and shellsort gain speed by having simple inner loops. I have >> > the >> > impression that timsort is optimal when compares and memory access are >> > expensive, but I haven't seen any benchmarks for native types in >> > contiguous >> > memory. >> >> I found some benchmarks for continuous memory here: >> >> https://github.com/swenson/sort/ >> https://github.com/gfx/cpp-TimSort >> >> The first one seems the best, it probably can be directly reused in numpy. >> The only issue is that it only sorts the array, but does not provide >> argsort. >> > > I'm impressed by the heapsort time. Heapsort is the slowest of the numpy > sorts. So either the heapsort implementation is better than ours or the > other sort are worse ;) > > Partially sorted sequence are pretty common, so timsort might be a worthy > addition. Last time I looked, JDK was using timsort for sorting objects, and > quicksort for native types. Another sort is dual pivot quicksort that I've > heard some good things about. > > Adding indirect sorts isn't so difficult once the basic sort is available. > Since the memory access tends to be larger as it gets randomly accessed, > timsort might be a good choice for that. Indeed, I think one has to be very careful about these benchmarks, since it highly depends on the structure of the arrays being sorted. I've been looking into this a bit, since I need some fast algorithm in Fortran, that returns indices that sort the array. So far I use quicksort, but this Timsort might perform better for partially sorted arrays, which is typically my use case. Ondrej From matthew.brett at gmail.com Mon Feb 24 15:40:27 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Feb 2014 12:40:27 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: Hi, On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris wrote: > Hi All, > > A lot of fixes have gone into the 1.8.x branch and it looks about time to do > a bugfix release. 
There are a couple of important bugfixes still to > backport, but if all goes well next weekend, March 1, looks like a good > target date. So give the current 1.8.x branch a try so as to check that it > covers your most urgent bugfix needs. I'd like to volunteer to make a .whl build for Mac. Is there anything special I should do to coordinate with y'all? It would be very good to put it up on pypi for seamless pip install... Thanks a lot, Matthew From rays at blue-cove.com Mon Feb 24 15:54:45 2014 From: rays at blue-cove.com (RayS) Date: Mon, 24 Feb 2014 12:54:45 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: <201402242054.s1OKsg08027826@blue-cove.com> Has anyone alerted C Gohlke? http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy - Ray From charlesr.harris at gmail.com Mon Feb 24 16:42:20 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 Feb 2014 14:42:20 -0700 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: <201402242054.s1OKsg08027826@blue-cove.com> References: <201402242054.s1OKsg08027826@blue-cove.com> Message-ID: On Mon, Feb 24, 2014 at 1:54 PM, RayS wrote: > Has anyone alerted C Gohlke? > http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy > > - Ray > Christolph seems to keep a pretty good eye on numpy and we rely on him to test it on windows. In anycase, there are enough fixes backported, that I think we better start with a 1.8.1rc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jeffreback at gmail.com Mon Feb 24 16:53:05 2014 From: jeffreback at gmail.com (Jeff Reback) Date: Mon, 24 Feb 2014 16:53:05 -0500 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <201402242054.s1OKsg08027826@blue-cove.com> Message-ID: I am pretty sure that you guys test pandas master but 1.8.1 looks good to me > On Feb 24, 2014, at 4:42 PM, Charles R Harris wrote: > > > > >> On Mon, Feb 24, 2014 at 1:54 PM, RayS wrote: >> Has anyone alerted C Gohlke? 
>> http://www.lfd.uci.edu/~gohlke/pythonlibs/#numpy >> >> - Ray > > Christolph seems to keep a pretty good eye on numpy and we rely on him to test it on windows. In anycase, there are enough fixes backported, that I think we better start with a 1.8.1rc. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Mon Feb 24 18:48:09 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 24 Feb 2014 15:48:09 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: Hi, On Mon, Feb 24, 2014 at 12:40 PM, Matthew Brett wrote: > Hi, > > On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris > wrote: >> Hi All, >> >> A lot of fixes have gone into the 1.8.x branch and it looks about time to do >> a bugfix release. There are a couple of important bugfixes still to >> backport, but if all goes well next weekend, March 1, looks like a good >> target date. So give the current 1.8.x branch a try so as to check that it >> covers your most urgent bugfix needs. > > I'd like to volunteer to make a .whl build for Mac. Is there > anything special I should do to coordinate with y'all? It would be > very good to put it up on pypi for seamless pip install... Current trunk wheel at http://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_intel.whl Testing welcome. You'll need OSX and latest setuptools and latest pip and : curl -O http://nipy.bic.berkeley.edu/scipy_installers/numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_intel.whl pip install --pre --no-index --find-links . numpy I built the wheel on a 10.9 laptop and tested it on a bare-metal no-compiler 10.6 laptop, so I think it will work. I'll set up daily wheel build / tests on our buildbots as well. 
Cheers, Matthew From charlesr.harris at gmail.com Mon Feb 24 19:26:39 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 Feb 2014 17:26:39 -0700 Subject: [Numpy-discussion] recfunctions Message-ID: Hi All, Does anyone recall if it was a deliberate choice to not expose recfunctions in the lib package? This is apropos issue #4242 . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Mon Feb 24 19:39:34 2014 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 25 Feb 2014 01:39:34 +0100 Subject: [Numpy-discussion] recfunctions In-Reply-To: References: Message-ID: On February 25, 2014 at 01:26:51 , Charles R Harris (charlesr.harris at gmail.com) wrote: Hi All, Does anyone recall if it was a deliberate choice to not expose recfunctions in the lib package? This is apropos issue?#4242. Yes. This job was a rip-off of John Hunter?s similar functions on matplotlib (with some rewriting and extensions) that sounded like a good idea at the time. However, I never really used recarrays, so wasn?t sure whether the implementation could be useful and? oh, look: http://mail.scipy.org/pipermail/numpy-discussion/2011-July/057477.html -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Mon Feb 24 19:53:42 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Mon, 24 Feb 2014 16:53:42 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: What's up with the OpenBLAS work? Any chance that might make it into official binaries? Or is is just too fresh? Also -- from an off-hand comment in the thread is looked like OpenBLAS could provide a library that selects for optimized code at run-time depending on hardware -- this would solve the "superpack" problem with wheels, which would be really nice... Or did I dream that? 
-Chris On Mon, Feb 24, 2014 at 12:40 PM, Matthew Brett wrote: > Hi, > > On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris > wrote: > > Hi All, > > > > A lot of fixes have gone into the 1.8.x branch and it looks about time > to do > > a bugfix release. There are a couple of important bugfixes still to > > backport, but if all goes well next weekend, March 1, looks like a good > > target date. So give the current 1.8.x branch a try so as to check that > it > > covers your most urgent bugfix needs. > > I'd like to volunteer to make a .whl build for Mac. Is there > anything special I should do to coordinate with y'all? It would be > very good to put it up on pypi for seamless pip install... > > Thanks a lot, > > Matthew > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From rays at blue-cove.com Mon Feb 24 23:04:44 2014 From: rays at blue-cove.com (RayS) Date: Mon, 24 Feb 2014 20:04:44 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: <201402250404.s1P44h2V001222@blue-cove.com> When will we see a http://sourceforge.net/projects/numpy/files/NumPy/1.8.1/Changelog/download changelog? I'd like to get this into our organization's SRS, and a list of fixes (related or not) would be great. 
- Ray From alan.isaac at gmail.com Tue Feb 25 00:37:29 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 25 Feb 2014 00:37:29 -0500 Subject: [Numpy-discussion] flatnonzero fails with lists Message-ID: <530C2C19.9080707@gmail.com> I was surprised that `flatnonzero` fails with lists because it calls the `ravel` method, which they do not have, instead of using the `ravel` function. I do not know that there is any policy that NumPy functions should always work on lists, but I have gotten used to them doing so. So I'm just asking, is this intentional? (version 1.7.1) Thanks, Alan Isaac From charlesr.harris at gmail.com Tue Feb 25 01:15:17 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 24 Feb 2014 23:15:17 -0700 Subject: [Numpy-discussion] flatnonzero fails with lists In-Reply-To: <530C2C19.9080707@gmail.com> References: <530C2C19.9080707@gmail.com> Message-ID: On Mon, Feb 24, 2014 at 10:37 PM, Alan G Isaac wrote: > I was surprised that `flatnonzero` fails with lists > because it calls the `ravel` method, which they do not have, > instead of using the `ravel` function. > > I do not know that there is any policy that NumPy > functions should always work on lists, > but I have gotten used to them doing so. So I'm just > asking, is this intentional? (version 1.7.1) > > It's documented to take an ndarray, but I see no reason that it shouldn't be modified to take array_like. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.madan at gmail.com Tue Feb 25 03:19:45 2014 From: chris.madan at gmail.com (Chris) Date: Tue, 25 Feb 2014 08:19:45 +0000 (UTC) Subject: [Numpy-discussion] cPickle.loads and Numeric Message-ID: I have some old code that uses cPickle.loads which used to work, but now reports an error in loading the module Numeric. Since Numeric has been replaced by numpy, this makes sense, but, how can I get cPickle.loads to work? 
I tested the code again on an older machine and it works fine there, but, I'd like to get it working again on a modern set-up as well. Thanks! From pierre.haessig at crans.org Tue Feb 25 03:39:15 2014 From: pierre.haessig at crans.org (Pierre Haessig) Date: Tue, 25 Feb 2014 09:39:15 +0100 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: Message-ID: <530C56B3.2020909@crans.org> Hi, Le 25/02/2014 09:19, Chris a ?crit : > I have some old code that uses cPickle.loads which used to work, but now > reports an error in loading the module Numeric. Since Numeric has been > replaced by numpy, this makes sense, but, how can I get cPickle.loads to > work? I tested the code again on an older machine and it works fine > there, but, I'd like to get it working again on a modern set-up as well. > > Thanks! > Do you have big archives of pickled arrays ? I have the feeling that your question is related to this SO question: http://stackoverflow.com/questions/2121874/python-pickling-after-changing-a-modules-directory From the accepted SO answer, I'm getting that it is not easy to manually edit the pickled files (except in the case of the ASCII pickle protocol) So if you still have an old setup that can open the pickled arrays, I would suggest to use it to convert it to a format that is more appropriate to long term archiving. Maybe a simple text format (CSV ?) or HDF5 depending on the volume and the complexity (but I'm not a specialist data archiving) best, Pierre -------------- next part -------------- A non-text attachment was scrubbed... Name: pierre_haessig.vcf Type: text/x-vcard Size: 329 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: signature.asc Type: application/pgp-signature Size: 880 bytes Desc: OpenPGP digital signature URL: From daniele at grinta.net Tue Feb 25 06:08:15 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 25 Feb 2014 12:08:15 +0100 Subject: [Numpy-discussion] Custom floating point representation to IEEE 754 double Message-ID: <530C799F.40205@grinta.net> Hello, I'm dealing with an instrument that transfers numerical values through an RS232 port in a custom (?) floating point representation (56 bits, 4 bits exponent and 52 bits significand). Of course I need to convert this format to a standard IEEE 754 double to be able to do anything useful with it. I came up with this simple code: def tofloat(data): # 56 bits floating point representation # 4 bits exponent # 52 bits significand d = frombytes(data) l = 56 p = l - 4 e = int(d >> p) + 17 v = 0 for i in xrange(p): b = (d >> i) & 0x01 v += b * pow(2, i - p + e) return v where frombytes() is a simple function that assembles 7 bytes read from the serial port into an integer for easing the manipulation: def frombytes(bytes): # convert from bytes string value = 0 for i, b in enumerate(reversed(bytes)): value += b * (1 << (i * 8)) return value I believe that tofloat() can be simplified a bit, but except optimizations (and cythonization) of this code, there is any simpler way of achieving this? Thanks. Cheers, Daniele From robert.kern at gmail.com Tue Feb 25 06:28:40 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 Feb 2014 11:28:40 +0000 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: Message-ID: On Tue, Feb 25, 2014 at 8:19 AM, Chris wrote: > I have some old code that uses cPickle.loads which used to work, but now > reports an error in loading the module Numeric. Since Numeric has been > replaced by numpy, this makes sense, but, how can I get cPickle.loads to > work? 
I tested the code again on an older machine and it works fine > there, but, I'd like to get it working again on a modern set-up as well. It's relatively straightforward to subclass Unpickler to redirect it when it goes to look for the array constructor that it expects from the Numeric module. from cStringIO import StringIO import pickle import numpy as np TEST_NUMERIC_PICKLE = ('\x80\x02cNumeric\narray_constructor\nq\x01(K\x05\x85U' '\x01lU(\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00' '\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03' '\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00' '\x00\x00K\x01tRq\x02.') # Constant from Numeric. LittleEndian = 1 def array_constructor(shape, typecode, thestr, Endian=LittleEndian): """ The old Numeric array constructor for pickle, recast for numpy. """ if typecode == "O": x = np.array(thestr, "O") else: x = np.fromstring(thestr, typecode) x.shape = shape if LittleEndian != Endian: return x.byteswapped() else: return x class NumericUnpickler(pickle.Unpickler): """ Allow loading of pickles containing Numeric arrays and converting them to numpy arrays. """ def find_class(self, module, name): """ Return the constructor callable for a given "class". Overridden to handle Numeric.array_constructor specially. """ if module == 'Numeric' and name == 'array_constructor': return array_constructor else: return pickle.Unpickler.find_class(self, module, name) def load(fp): return NumericUnpickler(fp).load() def loads(pickle_string): fp = StringIO(pickle_string) return NumericUnpickler(fp).load() if __name__ == '__main__': import sys print loads(TEST_NUMERIC_PICKLE) # Look, Ma! No Numeric! assert 'Numeric' not in sys.modules -- Robert Kern From cimrman3 at ntc.zcu.cz Tue Feb 25 08:57:11 2014 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Tue, 25 Feb 2014 14:57:11 +0100 Subject: [Numpy-discussion] ANN: SfePy 2014.1 Message-ID: <530CA137.8090606@ntc.zcu.cz> I am pleased to announce release 2014.1 of SfePy. 
Description ----------- SfePy (simple finite elements in Python) is a software for solving systems of coupled partial differential equations by the finite element method. The code is based on NumPy and SciPy packages. It is distributed under the new BSD license. Home page: http://sfepy.org Mailing list: http://groups.google.com/group/sfepy-devel Git (source) repository, issue tracker, wiki: http://github.com/sfepy Highlights of this release -------------------------- - sfepy.fem was split to separate FEM-specific and general modules - lower memory usage by creating active DOF connectivities directly from field connectivities - new handling of field and variable shapes - clean up: many obsolete modules were removed, all module names follow naming conventions For full release notes see http://docs.sfepy.org/doc/release_notes.html#id1 (rather long and technical). Best regards, Robert Cimrman and Contributors (*) (*) Contributors to this release (alphabetical order): Vladim?r Luke?, Maty?? Nov?k, Jaroslav Vond?ejc From alan.isaac at gmail.com Tue Feb 25 11:06:17 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 25 Feb 2014 11:06:17 -0500 Subject: [Numpy-discussion] numpy.random.geometric is shifted Message-ID: <530CBF79.3020506@gmail.com> Just got momentarily snagged by not checking the excellent documentation, which clearly says that numpy provides the shifted geometric. I'm wondering why? Who else does? (Not Mathematica, Matlab, Maple, or Octave.) Thanks, Alan Isaac From robert.kern at gmail.com Tue Feb 25 11:28:27 2014 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 25 Feb 2014 16:28:27 +0000 Subject: [Numpy-discussion] numpy.random.geometric is shifted In-Reply-To: <530CBF79.3020506@gmail.com> References: <530CBF79.3020506@gmail.com> Message-ID: On Tue, Feb 25, 2014 at 4:06 PM, Alan G Isaac wrote: > Just got momentarily snagged by not checking the > excellent documentation, which clearly says that > numpy provides the shifted geometric. 
I'm wondering > why? As with most such questions, because the reference I was working from defined it that way and gave the algorithms with that convention. http://luc.devroye.org/rnbookindex.html http://luc.devroye.org/chapter_ten.pdf Page 498. -- Robert Kern From ben.root at ou.edu Tue Feb 25 11:29:29 2014 From: ben.root at ou.edu (Benjamin Root) Date: Tue, 25 Feb 2014 11:29:29 -0500 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: <530C56B3.2020909@crans.org> References: <530C56B3.2020909@crans.org> Message-ID: Just to echo this sentiment a bit. I seem to recall reading somewhere that pickles are not intended to be long-term archives as there is no guarantee that a pickle made in one version of python would work in another version, much less between different versions of the same (or similar) packages. Ben Root On Tue, Feb 25, 2014 at 3:39 AM, Pierre Haessig wrote: > Hi, > > Le 25/02/2014 09:19, Chris a ?crit : > > I have some old code that uses cPickle.loads which used to work, but now > > reports an error in loading the module Numeric. Since Numeric has been > > replaced by numpy, this makes sense, but, how can I get cPickle.loads to > > work? I tested the code again on an older machine and it works fine > > there, but, I'd like to get it working again on a modern set-up as well. > > > > Thanks! > > > Do you have big archives of pickled arrays ? > > I have the feeling that your question is related to this SO question: > > http://stackoverflow.com/questions/2121874/python-pickling-after-changing-a-modules-directory > From the accepted SO answer, I'm getting that it is not easy to manually > edit the pickled files (except in the case of the ASCII pickle protocol) > > So if you still have an old setup that can open the pickled arrays, I > would suggest to use it to convert it to a format that is more > appropriate to long term archiving. Maybe a simple text format (CSV ?) 
> or HDF5 depending on the volume and the complexity (but I'm not a > specialist in data archiving) > > best, > Pierre > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndarray at mac.com Tue Feb 25 11:41:42 2014 From: ndarray at mac.com (Alexander Belopolsky) Date: Tue, 25 Feb 2014 11:41:42 -0500 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: <530C56B3.2020909@crans.org> Message-ID: On Tue, Feb 25, 2014 at 11:29 AM, Benjamin Root wrote: > I seem to recall reading somewhere that pickles are not intended to be > long-term archives as there is no guarantee that a pickle made in one > version of python would work in another version, much less between > different versions of the same (or similar) packages. That's not true about Python core and stdlib. Python developers strive to maintain backward compatibility and any instance of newer python failing to read older pickles would be considered a bug. This is even true across the 2.x / 3.x line. Your mileage with 3rd-party packages, especially 10+ year old ones, may vary. -------------- next part -------------- An HTML attachment was scrubbed...
URL: From jtaylor.debian at googlemail.com Tue Feb 25 11:57:29 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Tue, 25 Feb 2014 17:57:29 +0100 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: <530C56B3.2020909@crans.org> Message-ID: On Tue, Feb 25, 2014 at 5:41 PM, Alexander Belopolsky wrote: > > On Tue, Feb 25, 2014 at 11:29 AM, Benjamin Root wrote: >> >> I seem to recall reading somewhere that pickles are not intended to be >> long-term archives as there is no guarantee that a pickle made in one >> version of python would work in another version, much less between different >> versions of the same (or similar) packages. > > > That's not true about Python core and stdlib. Python developers strive to > maintain backward compatibility and any instance of newer python failing to > read older pickles would be considered a bug. This is even true across 2.x > / 3.x line. > > You mileage with 3rd party packages, especially 10+ years old ones may vary. The promise to keep compatibility does still not make it a good format for long term storage. pickles are a processing format bound to one specific tool and it is not trivial to read it with any other. The same applies to HDF5, it may work well now but there is no guarantee anyone will be able to read it in 50 years when we have moved on to the next generation of data storage formats. For long term storage simpler formats like FITS [0] are much more suitable. Writing a basic FITS parser in any language is easy. But in return it is not the best format for data processing. 
[0] http://fits.gsfc.nasa.gov/fits_standard.html From p.j.a.cock at googlemail.com Tue Feb 25 11:59:19 2014 From: p.j.a.cock at googlemail.com (Peter Cock) Date: Tue, 25 Feb 2014 16:59:19 +0000 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: <530C56B3.2020909@crans.org> Message-ID: On Tue, Feb 25, 2014 at 4:41 PM, Alexander Belopolsky wrote: > > On Tue, Feb 25, 2014 at 11:29 AM, Benjamin Root wrote: >> >> I seem to recall reading somewhere that pickles are not intended to be >> long-term archives as there is no guarantee that a pickle made in one >> version of python would work in another version, much less between different >> versions of the same (or similar) packages. > > That's not true about Python core and stdlib. Python developers strive to > maintain backward compatibility and any instance of newer python failing to > read older pickles would be considered a bug. This is even true across 2.x > / 3.x line. > > You mileage with 3rd party packages, especially 10+ years old ones may vary. As an example of a 10+ year old project, Biopython has accidentally broken some pickled objects from older versions of Biopython. Accidental breakages aside, I personally would not use pickle for long term storage. Domain specific data formats or something simple like tabular data, or JSON seems safer. Peter From jniehof at lanl.gov Tue Feb 25 12:27:27 2014 From: jniehof at lanl.gov (Jonathan T. Niehof) Date: Tue, 25 Feb 2014 10:27:27 -0700 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: <530C56B3.2020909@crans.org> Message-ID: <530CD27F.7000604@lanl.gov> On 02/25/2014 09:41 AM, Alexander Belopolsky wrote: > That's not true about Python core and stdlib. Python developers strive > to maintain backward compatibility and any instance of newer python > failing to read older pickles would be considered a bug. This is even > true across 2.x / 3.x line. 
Note that this doesn't extend to forward compatibility--the default pickling format in Python 3 isn't readable in Python 2, and for numpy in particular, even version 0 pickles of numpy arrays from Python 3 aren't readable in Python 2. -- Jonathan Niehof ISR-3 Space Data Systems Los Alamos National Laboratory MS-D466 Los Alamos, NM 87545 Phone: 505-667-9595 email: jniehof at lanl.gov Correspondence / Technical data or Software Publicly Available From raul at virtualmaterials.com Tue Feb 25 12:36:37 2014 From: raul at virtualmaterials.com (Raul Cota) Date: Tue, 25 Feb 2014 10:36:37 -0700 Subject: [Numpy-discussion] cPickle.loads and Numeric In-Reply-To: References: Message-ID: <530CD4A5.2020802@virtualmaterials.com> Robert is right, you can always implement your own function. What version of numpy and Python are you using ? There may be something you can add to your numpy installation related to the old Numeric support which I believe is now deprecated. Raul On 25/02/2014 4:28 AM, Robert Kern wrote: > On Tue, Feb 25, 2014 at 8:19 AM, Chris wrote: >> I have some old code that uses cPickle.loads which used to work, but now >> reports an error in loading the module Numeric. Since Numeric has been >> replaced by numpy, this makes sense, but, how can I get cPickle.loads to >> work? I tested the code again on an older machine and it works fine >> there, but, I'd like to get it working again on a modern set-up as well. > It's relatively straightforward to subclass Unpickler to redirect it > when it goes to look for the array constructor that it expects from > the Numeric module. > > > from cStringIO import StringIO > import pickle > > import numpy as np > > > TEST_NUMERIC_PICKLE = ('\x80\x02cNumeric\narray_constructor\nq\x01(K\x05\x85U' > '\x01lU(\x00\x00\x00\x00\x00\x00\x00\x00\x01\x00\x00\x00' > '\x00\x00\x00\x00\x02\x00\x00\x00\x00\x00\x00\x00\x03' > '\x00\x00\x00\x00\x00\x00\x00\x04\x00\x00\x00\x00\x00' > '\x00\x00K\x01tRq\x02.') > > > # Constant from Numeric. 
> LittleEndian = 1 > > def array_constructor(shape, typecode, thestr, Endian=LittleEndian): > """ The old Numeric array constructor for pickle, recast for numpy. > """ > if typecode == "O": > x = np.array(thestr, "O") > else: > x = np.fromstring(thestr, typecode) > x.shape = shape > if LittleEndian != Endian: > return x.byteswapped() > else: > return x > > > class NumericUnpickler(pickle.Unpickler): > """ Allow loading of pickles containing Numeric arrays and > converting them to numpy arrays. > """ > > def find_class(self, module, name): > """ Return the constructor callable for a given "class". > > Overridden to handle Numeric.array_constructor specially. > """ > if module == 'Numeric' and name == 'array_constructor': > return array_constructor > else: > return pickle.Unpickler.find_class(self, module, name) > > > def load(fp): > return NumericUnpickler(fp).load() > > > def loads(pickle_string): > fp = StringIO(pickle_string) > return NumericUnpickler(fp).load() > > > if __name__ == '__main__': > import sys > print loads(TEST_NUMERIC_PICKLE) > # Look, Ma! No Numeric! > assert 'Numeric' not in sys.modules > From alan.isaac at gmail.com Tue Feb 25 14:20:22 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 25 Feb 2014 14:20:22 -0500 Subject: [Numpy-discussion] shortcut nonzero? Message-ID: <530CECF6.1070200@gmail.com> Is there a shortcut version for finding the first (k) instance(s) of nonzero entries? I'm thinking of Matlab's `find(X,k)`: http://www.mathworks.com/help/matlab/ref/find.html Easy enough to write of course. I thought `flatnonzero` would be the obvious place for this, but it does not have a `first=k` option. Is such an option worth suggesting? Thanks, Alan Isaac From yw5aj at virginia.edu Tue Feb 25 14:28:11 2014 From: yw5aj at virginia.edu (Yuxiang Wang) Date: Tue, 25 Feb 2014 14:28:11 -0500 Subject: [Numpy-discussion] shortcut nonzero? 
In-Reply-To: <530CECF6.1070200@gmail.com> References: <530CECF6.1070200@gmail.com> Message-ID: Hi Alan, If you are only dealing with 1d array, What about: np.nonzero(your_array)[0][:k] ? -Shawn On Tue, Feb 25, 2014 at 2:20 PM, Alan G Isaac wrote: > Is there a shortcut version for finding the first (k) instance(s) of > nonzero entries? > I'm thinking of Matlab's `find(X,k)`: > http://www.mathworks.com/help/matlab/ref/find.html > Easy enough to write of course. > > I thought `flatnonzero` would be the obvious place for this, > but it does not have a `first=k` option. > Is such an option worth suggesting? > > Thanks, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Yuxiang "Shawn" Wang Gerling Research Lab University of Virginia yw5aj at virginia.edu +1 (434) 284-0836 https://sites.google.com/a/virginia.edu/yw5aj/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From daniele at grinta.net Tue Feb 25 14:33:13 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Tue, 25 Feb 2014 20:33:13 +0100 Subject: [Numpy-discussion] shortcut nonzero? In-Reply-To: References: <530CECF6.1070200@gmail.com> Message-ID: <530CEFF9.60606@grinta.net> > On Tue, Feb 25, 2014 at 2:20 PM, Alan G Isaac > wrote: > > Is there a shortcut version for finding the first (k) instance(s) of > nonzero entries? > I'm thinking of Matlab's `find(X,k)`: > http://www.mathworks.com/help/matlab/ref/find.html > Easy enough to write of course. > > I thought `flatnonzero` would be the obvious place for this, > but it does not have a `first=k` option. > Is such an option worth suggesting? 
On 25/02/2014 20:28, Yuxiang Wang wrote:> Hi Alan, > If you are only dealing with 1d array, What about: > > np.nonzero(your_array)[0][:k] I believe that Alan is looking for a solution that does not need to iterate all the array to extract only the firs k occurrences. PS: avoid top posting, please. Cheers, Daniele From cmkleffner at gmail.com Tue Feb 25 17:52:15 2014 From: cmkleffner at gmail.com (Carl Kleffner) Date: Tue, 25 Feb 2014 23:52:15 +0100 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: I build wheels for 32bit and 64bit (Windows, OpenBLAS) and put them here: https://drive.google.com/folderview?id=0B4DmELLTwYmlX05WSWpYVWJfRjg&usp=sharing Due to shortage of time I give not much more detailed informations before 1st of March. Carl 2014-02-25 1:53 GMT+01:00 Chris Barker : > What's up with the OpenBLAS work? > > Any chance that might make it into official binaries? Or is is just too > fresh? > > Also -- from an off-hand comment in the thread is looked like OpenBLAS > could provide a library that selects for optimized code at run-time > depending on hardware -- this would solve the "superpack" problem with > wheels, which would be really nice... > > Or did I dream that? > > -Chris > > > > On Mon, Feb 24, 2014 at 12:40 PM, Matthew Brett wrote: > >> Hi, >> >> On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris >> wrote: >> > Hi All, >> > >> > A lot of fixes have gone into the 1.8.x branch and it looks about time >> to do >> > a bugfix release. There are a couple of important bugfixes still to >> > backport, but if all goes well next weekend, March 1, looks like a good >> > target date. So give the current 1.8.x branch a try so as to check that >> it >> > covers your most urgent bugfix needs. >> >> I'd like to volunteer to make a .whl build for Mac. Is there >> anything special I should do to coordinate with y'all? It would be >> very good to put it up on pypi for seamless pip install... 
>> >> Thanks a lot, >> >> Matthew >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > > Christopher Barker, Ph.D. > Oceanographer > > Emergency Response Division > NOAA/NOS/OR&R (206) 526-6959 voice > 7600 Sand Point Way NE (206) 526-6329 fax > Seattle, WA 98115 (206) 526-6317 main reception > > Chris.Barker at noaa.gov > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sransom at nrao.edu Tue Feb 25 17:52:59 2014 From: sransom at nrao.edu (Scott Ransom) Date: Tue, 25 Feb 2014 17:52:59 -0500 Subject: [Numpy-discussion] assigning full precision values to longdouble scalars Message-ID: <530D1ECB.9030307@nrao.edu> Hi All, So I have a need to use longdouble numpy scalars in an application, and I need to be able to reliably set long-double precision values in them. Currently I don't see an easy way to do that. For example: In [19]: numpy.longdouble("1.12345678901234567890") Out[19]: 1.1234567890123456912 Note the loss of those last couple digits. In [20]: numpy.float("1.12345678901234567890") Out[20]: 1.1234567890123457 In [21]: numpy.longdouble("1.12345678901234567890") - numpy.float("1.12345678901234567890") Out[21]: 0.0 And so internally they are identical. In this case, the string appears to be converted to a C double (i.e. numpy float) before being assigned to the numpy scalar. And therefore it loses precision. Is there a good way of setting longdouble values? Is this a numpy bug? I was considering using a tiny cython wrapper of strtold() to do a conversion from a string to a long double, but it seems like this is basically what should be happening internally in numpy in the above example! Thanks, Scott -- Scott M. 
Ransom Address: NRAO Phone: (434) 296-0320 520 Edgemont Rd. email: sransom at nrao.edu Charlottesville, VA 22903 USA GPG Fingerprint: 06A9 9553 78BE 16DB 407B FFCA 9BFA B6FF FFD3 2989 From jonathan.j.buck at gmail.com Tue Feb 25 18:04:26 2014 From: jonathan.j.buck at gmail.com (JB) Date: Tue, 25 Feb 2014 23:04:26 +0000 (UTC) Subject: [Numpy-discussion] Help Understanding Indexing Behavior Message-ID: At the risk of igniting a flame war...can someone please help me understand the indexing behavior of NumPy? I will readily admit I come from a Matlab background, but I appreciate the power of Python and am trying to learn more. From a Matlab user's perspective, the behavior of indexing in NumPy seems very bizarre. For example, if I define an array: x = np.array([1,2,3,4,5,6,7,8,9,10]) If I want the first 5 elements, what do I do? Well, I say to myself, Python is zero-based, whereas Matlab is one-based, so if I want the values 1 - 5, then I want to index 0 - 4. So I type: x[0:4] And get in return: array([1, 2, 3, 4]). So I got the first value of my array, but I did not get the 5th value of the array. So the "start" index needs to be zero-based, but the "end" index needs to be one-based. Or to put it another way, if I type x[4] and x[0:4], the 4 means different things depending on which set of brackets you're looking at! It's hard for me to see this as anything but extremely confusing. Can someone explain this more clearly? Feel free to post links if you'd like. I know this has been discussed ad nauseam online; I just haven't found any of the explanations satisfactory (or sufficiently clear, at any rate).
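The half-open convention JB has run into can be condensed into a short runnable sketch. It uses a plain Python list to stay self-contained; basic 1-d slicing of a NumPy array follows exactly the same rule: x[a:b] takes indices a through b-1, so the slice has length b - a.

```python
x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

# Half-open slices: x[a:b] selects indices a, a+1, ..., b-1,
# so len(x[a:b]) == b - a.
first_five = x[0:5]       # [1, 2, 3, 4, 5]  (x[0:4] stops one short)
four_only = x[0:4]        # [1, 2, 3, 4]
n = 3
window = x[2:2 + n]       # n elements starting at index 2 -> [3, 4, 5]
head = x[:5]              # the start index defaults to 0

print(first_five, four_only, window, head)
```

One payoff of this convention is that x[:k] + x[k:] == x for any k, with no off-by-one bookkeeping.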
From oscar.j.benjamin at gmail.com Tue Feb 25 18:12:46 2014 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Tue, 25 Feb 2014 23:12:46 +0000 Subject: [Numpy-discussion] Custom floating point representation to IEEE 754 double In-Reply-To: <530C799F.40205@grinta.net> References: <530C799F.40205@grinta.net> Message-ID: On 25 February 2014 11:08, Daniele Nicolodi wrote: > Hello, > > I'm dealing with an instrument that transfers numerical values through > an RS232 port in a custom (?) floating point representation (56 bits, 4 > bits exponent and 52 bits significand). > > Of course I need to convert this format to a standard IEEE 754 double to > be able to do anything useful with it. I came up with this simple code: > > def tofloat(data): > # 56 bits floating point representation > # 4 bits exponent > # 52 bits significand > d = frombytes(data) > l = 56 > p = l - 4 > e = int(d >> p) + 17 > v = 0 > for i in xrange(p): > b = (d >> i) & 0x01 > v += b * pow(2, i - p + e) > return v > > where frombytes() is a simple function that assembles 7 bytes read from > the serial port into an integer for easing the manipulation: > > def frombytes(bytes): > # convert from bytes string > value = 0 > for i, b in enumerate(reversed(bytes)): > value += b * (1 << (i * 8)) > return value > > I believe that tofloat() can be simplified a bit, but except > optimizations (and cythonization) of this code, there is any simpler way > of achieving this? My first approach would be that if you have an int and you want it as bits then you can use the bin() function e.g.: >>> bin(1234) '0b10011010010' You can then slice and reconstruct as ints with >>> int('0b101', 2) 5 Similarly my first port of call for simplicity would be to just do float(Fraction(mantissa, 2**exponent)). It doesn't really lend itself to cythonisation but it should be accurate and easy enough to understand. 
Oscar From jtaylor.debian at googlemail.com Tue Feb 25 18:15:55 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Wed, 26 Feb 2014 00:15:55 +0100 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: References: Message-ID: <530D242B.1060907@googlemail.com> On 26.02.2014 00:04, JB wrote: > At the risk of igniting a flame war...can someone please help me understand > the indexing behavior of NumPy? I will readily I admit I come from a Matlab > background, but I appreciate the power of Python and am trying to learn more. > >>From a Matlab user's perspective, the behavior of indexing in NumPy seems > very bizarre. For example, if I define an array: > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > If I want the first 5 elements, what do I do? Well, I say to myself, Python > is zero-based, whereas Matlab is one-based, so if I want the values 1 - 5, > then I want to index 0 - 4. So I type: x[0:4] > > And get in return: array([1, 2, 3, 4]). So I got the first value of my > array, but I did not get the 5th value of the array. So the "start" index > needs to be zero-based, but the "end" index needs to be one-based. Or to put > it another way, if I type x[4] and x[0:4], the 4 means different things > depending on which set of brackets you're looking at! > > It's hard for me to see this as anything by extremely confusing. Can someone > explain this more clearly. Feel free to post links if you'd like. I know > this has been discussed ad nauseam online; I just haven't found any of the > explanations satisfactory (or sufficiently clear, at any rate). > > numpy indexing is like conventional C indexing beginning from inclusive 0 to exclusive upper bound: [0, 5[. So the selection length is upper bound - lower bound. as a for loop: for (i = 0; i < 5; i++) select(i); This is the same way Python treats slices. in comparison one based indexing is usually inclusive 1 to inclusive upper bound: [1, 4]. 
So the selection length is upper bound - lower bound + 1. for (i = 1; i <= 4; i++) select(i); From aaron.oleary at gmail.com Tue Feb 25 18:18:49 2014 From: aaron.oleary at gmail.com (Aaron O'Leary) Date: Tue, 25 Feb 2014 23:18:49 +0000 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: References: Message-ID: Think of the python indices as the edges of the boxes, whereas the matlab indices are the boxes themselves. matlab: [1][2][3][4] python: 0[ ]1[ ]2[ ]3[ ]4[ ]5 you need to do 0:5 in python or you won't contain all the boxes! On 25 February 2014 23:04, JB wrote: > At the risk of igniting a flame war...can someone please help me understand > the indexing behavior of NumPy? I will readily I admit I come from a Matlab > background, but I appreciate the power of Python and am trying to learn more. > > >From a Matlab user's perspective, the behavior of indexing in NumPy seems > very bizarre. For example, if I define an array: > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > If I want the first 5 elements, what do I do? Well, I say to myself, Python > is zero-based, whereas Matlab is one-based, so if I want the values 1 - 5, > then I want to index 0 - 4. So I type: x[0:4] > > And get in return: array([1, 2, 3, 4]). So I got the first value of my > array, but I did not get the 5th value of the array. So the "start" index > needs to be zero-based, but the "end" index needs to be one-based. Or to put > it another way, if I type x[4] and x[0:4], the 4 means different things > depending on which set of brackets you're looking at! > > It's hard for me to see this as anything by extremely confusing. Can someone > explain this more clearly. Feel free to post links if you'd like. I know > this has been discussed ad nauseam online; I just haven't found any of the > explanations satisfactory (or sufficiently clear, at any rate). 
> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From daniele at grinta.net Tue Feb 25 19:35:23 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Wed, 26 Feb 2014 01:35:23 +0100 Subject: [Numpy-discussion] Custom floating point representation to IEEE 754 double In-Reply-To: References: <530C799F.40205@grinta.net> Message-ID: <530D36CB.8080103@grinta.net> On 26/02/2014 00:12, Oscar Benjamin wrote: > On 25 February 2014 11:08, Daniele Nicolodi wrote: >> Hello, >> >> I'm dealing with an instrument that transfers numerical values through >> an RS232 port in a custom (?) floating point representation (56 bits, 4 >> bits exponent and 52 bits significand). >> >> Of course I need to convert this format to a standard IEEE 754 double to >> be able to do anything useful with it. I came up with this simple code: >> > > My first approach would be that if you have an int and you want it as > bits then you can use the bin() function e.g.: >>>> bin(1234) > '0b10011010010' > > You can then slice and reconstruct as ints with >>>> int('0b101', 2) > 5 How would that be helpful? I believe it is much more computationally expensive than relying on simple integer math, especially with a view to cythonization. > Similarly my first port of call for simplicity would be to just do > float(Fraction(mantissa, 2**exponent)). It doesn't really lend itself > to cythonisation but it should be accurate and easy enough to > understand. "simpler" in my original email has to be read as involving fewer operations and thus more efficient, not simpler to understand; indeed, it is already a simple implementation of the definition. What I would like to know is if there are some smart shortcuts to make the computation more efficient.
Cheers, Daniele From hoogendoorn.eelco at gmail.com Tue Feb 25 19:41:36 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Wed, 26 Feb 2014 01:41:36 +0100 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: <530D242B.1060907@googlemail.com> References: <530D242B.1060907@googlemail.com> Message-ID: To elaborate on what Julian wrote: it is indeed simply a convention; slices/ranges in python are from the start to one-past-the-end. The reason for the emergence of this convention is that C code using iterators looks most natural this way. This manifests in a simple for (i = 0; i < 5; i++), but also when specifying a slice of a linked list, for instance. We don't want to terminate the loop when we are just arriving at the last item; we want to terminate a loop when we have gone past the last item. Also, the length of a range is simply end-start under this convention; no breaking your head over -1 or +1. Such little nudges of elegance pop up all over C code; and that's where the convention comes from. Same as zero-based indexing; just a convention, and if you are going to pick a convention you might as well pick one that minimizes the number of required operations. Anything but zero-based indexing will require additional integer math to find an array element, given its base pointer. On Wed, Feb 26, 2014 at 12:15 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 26.02.2014 00:04, JB wrote: > > At the risk of igniting a flame war...can someone please help me > understand > > the indexing behavior of NumPy? I will readily I admit I come from a > Matlab > > background, but I appreciate the power of Python and am trying to learn > more. > > > >>From a Matlab user's perspective, the behavior of indexing in NumPy seems > > very bizarre. For example, if I define an array: > > > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > > > If I want the first 5 elements, what do I do? 
Well, I say to myself, > Python > > is zero-based, whereas Matlab is one-based, so if I want the values 1 - > 5, > > then I want to index 0 - 4. So I type: x[0:4] > > > > And get in return: array([1, 2, 3, 4]). So I got the first value of my > > array, but I did not get the 5th value of the array. So the "start" index > > needs to be zero-based, but the "end" index needs to be one-based. Or to > put > > it another way, if I type x[4] and x[0:4], the 4 means different things > > depending on which set of brackets you're looking at! > > > > It's hard for me to see this as anything by extremely confusing. Can > someone > > explain this more clearly. Feel free to post links if you'd like. I know > > this has been discussed ad nauseam online; I just haven't found any of > the > > explanations satisfactory (or sufficiently clear, at any rate). > > > > > > numpy indexing is like conventional C indexing beginning from inclusive > 0 to exclusive upper bound: [0, 5[. So the selection length is upper > bound - lower bound. > as a for loop: > for (i = 0; i < 5; i++) > select(i); > > This is the same way Python treats slices. > > in comparison one based indexing is usually inclusive 1 to inclusive > upper bound: [1, 4]. So the selection length is upper bound - lower > bound + 1. > for (i = 1; i <= 4; i++) > select(i); > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From sebastian at sipsolutions.net Tue Feb 25 19:54:52 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Wed, 26 Feb 2014 01:54:52 +0100 Subject: [Numpy-discussion] assigning full precision values to longdouble scalars In-Reply-To: <530D1ECB.9030307@nrao.edu> References: <530D1ECB.9030307@nrao.edu> Message-ID: <1393376092.14326.3.camel@sebastian-t440> On Di, 2014-02-25 at 17:52 -0500, Scott Ransom wrote: > Hi All, > > So I have a need to use longdouble numpy scalars in an application, and > I need to be able to reliably set long-double precision values in them. > Currently I don't see an easy way to do that. For example: > > In [19]: numpy.longdouble("1.12345678901234567890") > Out[19]: 1.1234567890123456912 > > Note the loss of those last couple digits. > > In [20]: numpy.float("1.12345678901234567890") > Out[20]: 1.1234567890123457 > > In [21]: numpy.longdouble("1.12345678901234567890") - > numpy.float("1.12345678901234567890") > Out[21]: 0.0 > > And so internally they are identical. > > In this case, the string appears to be converted to a C double (i.e. > numpy float) before being assigned to the numpy scalar. And therefore > it loses precision. > > Is there a good way of setting longdouble values? Is this a numpy bug? > Yes, this is a bug I think (never checked), we use the python parsing functions where possible. But for longdouble python float (double) is obviously not enough. A hack would be to split it into two: np.float128(1.1234567890) + np.float128(1234567890e-something) Though it would be better for the numpy parser to parse the full precision when given a string. - Sebastian > I was considering using a tiny cython wrapper of strtold() to do a > conversion from a string to a long double, but it seems like this is > basically what should be happening internally in numpy in the above example! 
> > Thanks, > > Scott > From daniele at grinta.net Tue Feb 25 20:01:16 2014 From: daniele at grinta.net (Daniele Nicolodi) Date: Wed, 26 Feb 2014 02:01:16 +0100 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: References: Message-ID: <530D3CDC.9030201@grinta.net> On 26/02/2014 00:04, JB wrote: > At the risk of igniting a flame war...can someone please help me understand > the indexing behavior of NumPy? I will readily I admit I come from a Matlab > background, but I appreciate the power of Python and am trying to learn more. > >>From a Matlab user's perspective, the behavior of indexing in NumPy seems > very bizarre. For example, if I define an array: > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > If I want the first 5 elements, what do I do? Well, I say to myself, Python > is zero-based, whereas Matlab is one-based, so if I want the values 1 - 5, > then I want to index 0 - 4. So I type: x[0:4] The Python slicing syntax a:b defines the interval [a, b), while the Matlab syntax defines the interval [a:b]. This post from Guido van Rossum (the creator of Python) explains the choice of zero indexing and of this particular slice notation: https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi I actually find how Python works more straight forward: obtaining the first n elements of array x is simply x[:n], and obtaining n elements starting at index i is x[i:i+n]. Cheers, Daniele From charlesr.harris at gmail.com Tue Feb 25 23:39:44 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 25 Feb 2014 21:39:44 -0700 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: <530D3CDC.9030201@grinta.net> References: <530D3CDC.9030201@grinta.net> Message-ID: On Tue, Feb 25, 2014 at 6:01 PM, Daniele Nicolodi wrote: > On 26/02/2014 00:04, JB wrote: > > At the risk of igniting a flame war...can someone please help me > understand > > the indexing behavior of NumPy? 
I will readily I admit I come from a > Matlab > > background, but I appreciate the power of Python and am trying to learn > more. > > > >>From a Matlab user's perspective, the behavior of indexing in NumPy seems > > very bizarre. For example, if I define an array: > > > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > > > If I want the first 5 elements, what do I do? Well, I say to myself, > Python > > is zero-based, whereas Matlab is one-based, so if I want the values 1 - > 5, > > then I want to index 0 - 4. So I type: x[0:4] > > The Python slicing syntax a:b defines the interval [a, b), while the > Matlab syntax defines the interval [a:b]. > > This post from Guido van Rossum (the creator of Python) explains the > choice of zero indexing and of this particular slice notation: > > https://plus.google.com/115212051037621986145/posts/YTUxbXYZyfi > > I actually find how Python works more straight forward: obtaining the > first n elements of array x is simply x[:n], and obtaining n elements > starting at index i is x[i:i+n]. > > To enlarge just a bit, as said, python indexing comes from C, Matlab indexing comes from Fortran/Matrix conventions. If you look at how Fortran compiles, it translates to zero based under the hood, starting with a pointer to memory one location before the actual array data, so C just got rid of that little wart. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gregor.thalhammer at gmail.com Wed Feb 26 04:16:05 2014 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Wed, 26 Feb 2014 10:16:05 +0100 Subject: [Numpy-discussion] Custom floating point representation to IEEE 754 double In-Reply-To: <530C799F.40205@grinta.net> References: <530C799F.40205@grinta.net> Message-ID: <89CB3F7C-E920-4FC5-8DD2-7B2EF537B0CD@gmail.com> Am 25.02.2014 um 12:08 schrieb Daniele Nicolodi : > Hello, > > I'm dealing with an instrument that transfers numerical values through > an RS232 port in a custom (?) 
floating point representation (56 bits, 4 > bits exponent and 52 bits significand). > > Of course I need to convert this format to a standard IEEE 754 double to > be able to do anything useful with it. I came up with this simple code: > > def tofloat(data): > # 56 bits floating point representation > # 4 bits exponent > # 52 bits significand > d = frombytes(data) > l = 56 > p = l - 4 > e = int(d >> p) + 17 > v = 0 > for i in xrange(p): > b = (d >> i) & 0x01 > v += b * pow(2, i - p + e) > return v > > where frombytes() is a simple function that assembles 7 bytes read from > the serial port into an integer for easing the manipulation: > > def frombytes(bytes): > # convert from bytes string > value = 0 > for i, b in enumerate(reversed(bytes)): > value += b * (1 << (i * 8)) > return value > > I believe that tofloat() can be simplified a bit, but except > optimizations (and cythonization) of this code, there is any simpler way > of achieving this? > I have no ready-made code at hand, but an alternative approach would be to use the struct module and assemble a standard double from your data by fiddling on the byte level. Most of the data you can just copy, but your 4 bit exponent needs to be expanded (and rebiased) to the 11 bit exponent field of an IEEE 754 double. But since your data comes from a serial port, efficiency might not be important at all. Gregor From oscar.j.benjamin at gmail.com Wed Feb 26 06:37:06 2014 From: oscar.j.benjamin at gmail.com (Oscar Benjamin) Date: Wed, 26 Feb 2014 11:37:06 +0000 Subject: [Numpy-discussion] Custom floating point representation to IEEE 754 double In-Reply-To: <530D36CB.8080103@grinta.net> References: <530C799F.40205@grinta.net> <530D36CB.8080103@grinta.net> Message-ID: On 26 February 2014 00:35, Daniele Nicolodi wrote: > > "simpler" in my original email has to be read as involving less > operations and thus more efficient, not simpler to understand, indeed it > is already a simple implementation of the definition. 
What I would like > to know is if there are some smart shortcuts to make the computation > more efficient. Sorry I misunderstood your question. How about this: def tofloat2(data): data = frombytes(data) return float(data & 0x0fffffffffffff) * 2 ** ((data >> 52) - 35) That's roughly how you'd do it in C except for the 2**exponent part. I'm not sure what Cython would do with that. If the exponent is positive then you can replace that with a left shift (1 << exponent); when it is negative it becomes a right shift by -exponent. Oscar From jslavin at cfa.harvard.edu Wed Feb 26 08:32:17 2014 From: jslavin at cfa.harvard.edu (Slavin, Jonathan) Date: Wed, 26 Feb 2014 08:32:17 -0500 Subject: [Numpy-discussion] Help Understanding Indexing Behavior Message-ID: JB, This behavior is a property of Python slicing. It takes some getting used to, but has its advantages. In general in a slice [i:j] the indices go from i to j-1. In the case that i is 0 it's easy to think of it as j giving the number of elements (by the way you can also do a[:j] -- i.e. leaving out the 0 -- and get the same result). Maybe someone else could provide more background on why slicing is defined the way it is in Python, but in the end you just have to get used to it. Jon On Tue, Feb 25, 2014 at 6:07 PM, wrote: > From: JB > To: numpy-discussion at scipy.org > Cc: > Date: Tue, 25 Feb 2014 23:04:26 +0000 (UTC) > Subject: [Numpy-discussion] Help Understanding Indexing Behavior > At the risk of igniting a flame war...can someone please help me understand > the indexing behavior of NumPy? I will readily admit I come from a Matlab > background, but I appreciate the power of Python and am trying to learn > more. > > > From a Matlab user's perspective, the behavior of indexing in NumPy seems > very bizarre. For example, if I define an array: > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > If I want the first 5 elements, what do I do? Well, I say to myself, Python > is zero-based, whereas Matlab is one-based, so if I want the values 1 - 5, > then I want to index 0 - 4. 
So I type: x[0:4] > > And get in return: array([1, 2, 3, 4]). So I got the first value of my > array, but I did not get the 5th value of the array. So the "start" index > needs to be zero-based, but the "end" index needs to be one-based. Or to > put it another way, if I type x[4] and x[0:4], the 4 means different things > depending on which set of brackets you're looking at! > > It's hard for me to see this as anything but extremely confusing. Can > someone explain this more clearly? Feel free to post links if you'd like. I know > this has been discussed ad nauseam online; I just haven't found any of the > explanations satisfactory (or sufficiently clear, at any rate). > -- ________________________________________________________ Jonathan D. Slavin Harvard-Smithsonian CfA jslavin at cfa.harvard.edu 60 Garden Street, MS 83 phone: (617) 496-7981 Cambridge, MA 02138-1516 fax: (617) 496-7577 USA ________________________________________________________ -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed Feb 26 10:07:08 2014 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 26 Feb 2014 10:07:08 -0500 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: Hi, I have a PR that fixes the excessive printing to stdout when finding the blas linking information: https://github.com/numpy/numpy/pull/4081 This was caused by a change in NumPy. I was asked in a review comment to put the removed information in the dict that we return to the user. I won't have the time to do this for the 1.8.1rc as I'm on vacation next week and I need to prepare for that. I'll try to find someone else to finish this, but it is not certain. I'll keep you updated on this. 
thanks Frédéric On Tue, Feb 25, 2014 at 5:52 PM, Carl Kleffner wrote: > I build wheels for 32bit and 64bit (Windows, OpenBLAS) and put them here: > https://drive.google.com/folderview?id=0B4DmELLTwYmlX05WSWpYVWJfRjg&usp=sharing > Due to shortage of time I give not much more detailed information before > 1st of March. > > Carl > > > 2014-02-25 1:53 GMT+01:00 Chris Barker : > >> What's up with the OpenBLAS work? >> >> Any chance that might make it into official binaries? Or is it just too >> fresh? >> >> Also -- from an off-hand comment in the thread it looked like OpenBLAS >> could provide a library that selects for optimized code at run-time >> depending on hardware -- this would solve the "superpack" problem with >> wheels, which would be really nice... >> >> Or did I dream that? >> >> -Chris >> >> >> >> On Mon, Feb 24, 2014 at 12:40 PM, Matthew Brett >> wrote: >>> >>> Hi, >>> >>> On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris >>> wrote: >>> > Hi All, >>> > >>> > A lot of fixes have gone into the 1.8.x branch and it looks about time >>> > to do >>> > a bugfix release. There are a couple of important bugfixes still to >>> > backport, but if all goes well next weekend, March 1, looks like a good >>> > target date. So give the current 1.8.x branch a try so as to check that >>> > it >>> > covers your most urgent bugfix needs. >>> >>> I'd like to volunteer to make a .whl build for Mac. Is there >>> anything special I should do to coordinate with y'all? It would be >>> very good to put it up on pypi for seamless pip install... >>> >>> Thanks a lot, >>> >>> Matthew >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> -- >> >> Christopher Barker, Ph.D. 
>> Oceanographer >> >> Emergency Response Division >> NOAA/NOS/OR&R (206) 526-6959 voice >> 7600 Sand Point Way NE (206) 526-6329 fax >> Seattle, WA 98115 (206) 526-6317 main reception >> >> Chris.Barker at noaa.gov >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla.molden at gmail.com Wed Feb 26 10:45:54 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Wed, 26 Feb 2014 15:45:54 +0000 (UTC) Subject: [Numpy-discussion] Help Understanding Indexing Behavior References: Message-ID: <244582093415118370.332318sturla.molden-gmail.com@news.gmane.org> JB wrote: > > x = np.array([1,2,3,4,5,6,7,8,9,10]) > > If I want the first 5 elements, what do I do? x[:5] From tom.augspurger88 at gmail.com Wed Feb 26 11:27:15 2014 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Wed, 26 Feb 2014 08:27:15 -0800 (PST) Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: <1393432035193-36655.post@n7.nabble.com> Thanks for posting those wheels, Matthew. I'm on a Mac (10.9.2) and I had trouble installing numpy from your wheel in a fresh virtualenv with the latest pip, setuptools, and wheel. ``` $pip install ~/Downloads/numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl is not a supported wheel on this platform. Storing debug log for failure in /Users/admin/.pip/pip.log ``` When I build a wheel from source, my platform is `x86_64`. 
Changing `intel` to `x86_64`: ``` $mv numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_intel.whl numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_x86_64.whl ``` and then running `pip install` on that wheel successfully installed numpy (and the test suite passed). I've been searching for where the platform tag is defined, but haven't had luck yet. I'll post if I find anything. -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/1-8-1-release-tp36603p36655.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From chris.barker at noaa.gov Wed Feb 26 13:50:16 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Feb 2014 10:50:16 -0800 Subject: [Numpy-discussion] Help Understanding Indexing Behavior In-Reply-To: References: Message-ID: On Wed, Feb 26, 2014 at 5:32 AM, Slavin, Jonathan wrote: > This behavior is a property of Python slicing. It takes some getting used > to, but has its advantages. > quite a few, actually! The key with slicing is to think of the index as pointing to the space between the elements:

 0   1   2   3   4   5
 |   |   |   |   |   |

but the reason (and beauty) of this is that it results in a number of nifty properties:

len(seq[i:j]) == j - i
seq[i:j] + seq[j:k] == seq[i:k]
len(seq[:i]) == i
len(seq[-i:]) == i

and if you have an array representing an axis, for instance, and want to know the value of a given index: x = x_0 + i*dx or the index of a given value: i = (x - x_0) / dx Notice that I don't have a single +1 or -1 in all of that -- this makes it easier to understand and a lot less error-prone. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From chris.barker at noaa.gov Wed Feb 26 14:02:15 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Feb 2014 11:02:15 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: <1393432035193-36655.post@n7.nabble.com> References: <1393432035193-36655.post@n7.nabble.com> Message-ID: On Wed, Feb 26, 2014 at 8:27 AM, Tom Augspurger wrote: > Thanks for posting those wheels, Matthew. > > I'm on a Mac (10.9.2) and I had trouble installing numpy from your wheel in > a fresh virtualenv with the latest pip, setuptools, and wheel. > > ``` > $pip install > ~/Downloads/numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl > numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl is not a supported > wheel on this platform. > Storing debug log for failure in /Users/admin/.pip/pip.log > ``` > IIRC, those wheels are built for the python.org builds of python: macosx_10_9 means it was built on 10.9 (though that should be 10.6, I think, as it is built FOR 10.6+, not only 10.9) _intel means it's an Intel build, which in the nomenclature used in the python.org builds, means it's a universal 32 and 64 bit Intel > When I build a wheel from source, my platform is `x86_64`. What python are you using? Apparently not a Universal 32+64 bit build. The one Apple delivers? > ``` > $mv numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_intel.whl > numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_x86_64.whl > ``` > > and then running `pip install` on that wheel successfully installed numpy > (and the test suite passed). > I'm not entirely sure we can count on that working, as I _think_ the SDK that the wheel was built against is different than the one your python was built against. But in theory, one should be able to install a universal wheel into a one-of-the-architectures build, the other one will get ignored (as it seems to be for you), but I think it's fragile in general. 
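(A quick way to see which side of this divide a given interpreter is on -- just a sketch, since pip's full tag matching involves more than this one string -- is to ask Python itself for its platform string, which is where the `intel` vs `x86_64` part of a wheel's name comes from:)

```python
# Print the platform string of the running interpreter.  This sysconfig
# value is, roughly, what bdist_wheel turns into the wheel's platform
# tag, e.g. 'macosx-10.6-intel' -> 'macosx_10_6_intel'.
import sysconfig

print(sysconfig.get_platform())
# e.g. 'macosx-10.6-intel' on a python.org universal build,
#      'macosx-10.9-x86_64' on a single-arch (homebrew) build
```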
This is a serious issue with binary wheels -- there are so many variations to a "platform" -- the naming scheme covers OS, OS version, and bit depth, but not system library versions, and who the heck knows what else. This has been discussed a lot on the distutils-sig list, with no real solution in sight:
- there are other alternatives: for instance, I think conda packages have some sort of hash or something to make sure that binary packages all match.
- convention is the other option:
  - use binary wheels for in-house deployment to similar systems
  - use binary wheels for a well-defined python build:
    - for PyPI, that's the python.org builds for Windows and OS-
-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From nouiz at nouiz.org Wed Feb 26 14:07:55 2014 From: nouiz at nouiz.org (Frédéric Bastien) Date: Wed, 26 Feb 2014 14:07:55 -0500 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: Message-ID: Hi, Arnaud finished that in a different way than we had discussed in the PR. https://github.com/numpy/numpy/pull/4081 Fred On Wed, Feb 26, 2014 at 10:07 AM, Frédéric Bastien wrote: > Hi, > > I have a PR that fixes the excessive printing to stdout when finding the > blas linking information: > > https://github.com/numpy/numpy/pull/4081 > > > This was caused by a change in NumPy. I was asked in a review comment to > put the removed information in the dict that we return to the user. I > won't have the time to do this for the 1.8.1rc as I'm on vacation next > week and I need to prepare for that. > > I'll try to find someone else to finish this, but it is not certain. I'll keep you updated on this. 
> > thanks > > Frédéric > > On Tue, Feb 25, 2014 at 5:52 PM, Carl Kleffner wrote: >> I build wheels for 32bit and 64bit (Windows, OpenBLAS) and put them here: >> https://drive.google.com/folderview?id=0B4DmELLTwYmlX05WSWpYVWJfRjg&usp=sharing >> Due to shortage of time I give not much more detailed information before >> 1st of March. >> >> Carl >> >> >> 2014-02-25 1:53 GMT+01:00 Chris Barker : >> >>> What's up with the OpenBLAS work? >>> >>> Any chance that might make it into official binaries? Or is it just too >>> fresh? >>> >>> Also -- from an off-hand comment in the thread it looked like OpenBLAS >>> could provide a library that selects for optimized code at run-time >>> depending on hardware -- this would solve the "superpack" problem with >>> wheels, which would be really nice... >>> >>> Or did I dream that? >>> >>> -Chris >>> >>> >>> >>> On Mon, Feb 24, 2014 at 12:40 PM, Matthew Brett >>> wrote: >>>> >>>> Hi, >>>> >>>> On Sun, Feb 23, 2014 at 10:26 AM, Charles R Harris >>>> wrote: >>>> > Hi All, >>>> > >>>> > A lot of fixes have gone into the 1.8.x branch and it looks about time >>>> > to do >>>> > a bugfix release. There are a couple of important bugfixes still to >>>> > backport, but if all goes well next weekend, March 1, looks like a good >>>> > target date. So give the current 1.8.x branch a try so as to check that >>>> > it >>>> > covers your most urgent bugfix needs. >>>> >>>> I'd like to volunteer to make a .whl build for Mac. Is there >>>> anything special I should do to coordinate with y'all? It would be >>>> very good to put it up on pypi for seamless pip install... >>>> >>>> Thanks a lot, >>>> >>>> Matthew >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >>> >>> >>> -- >>> >>> Christopher Barker, Ph.D. 
>>> Oceanographer >>> >>> Emergency Response Division >>> NOAA/NOS/OR&R (206) 526-6959 voice >>> 7600 Sand Point Way NE (206) 526-6329 fax >>> Seattle, WA 98115 (206) 526-6317 main reception >>> >>> Chris.Barker at noaa.gov >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> From matthew.brett at gmail.com Wed Feb 26 14:34:28 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 26 Feb 2014 11:34:28 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <1393432035193-36655.post@n7.nabble.com> Message-ID: Hi, On Wed, Feb 26, 2014 at 11:02 AM, Chris Barker wrote: > On Wed, Feb 26, 2014 at 8:27 AM, Tom Augspurger > wrote: >> >> Thanks for posting those wheels Matthew. >> >> I'm on a Mac (10.9.2) and I had trouble installing numpy from your wheel >> in >> a fresh virtualenv with the latests pip, setuptools, and wheel. >> >> ``` >> $pip install >> ~/Downloads/numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl >> numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_9_intel.whl is not a supported >> wheel on this platform. >> Storing debug log for failure in /Users/admin/.pip/pip.log >> ``` > > > IIRC, those wheels are built for the pyton.org builds of python: > > macosx_10_9 means it was built on 10.9 (though that should be 10.6, I > think, as it is built FOR 106+, not only 10.9..) > > _intel means it's an Intel build, which in the nomenclature used in the > pyton.org builds, means it's a universal 32 and 64 bit Intel > > >> >> When I build a wheel from source, my platform is `x86_64`. > > > What python are you using? apparently not a Universal 32+64 bit build. The > one Apple delivers? 
> >> ``` >> $mv numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_intel.whl >> numpy-1.8.0.dev_a89a36e-cp27-none-macosx_10_6_x86_64.whl >> ``` >> >> and then running `pip install` on that wheel successfully installed numpy >> (and the test suite passed). > > > I'm not entirely sure we can count on that working, as I _think_ the SDK > that the wheel was built against is different than the one your python was > built against. > > But it theory, one should be able to install a universal wheel into a > one-of-the-architectures build, the other one will get ignored (as it seems > to be for you), but I think it's fragile in general. > > This is a serious issue with binary wheels -- there are so many variations > to a "platform" -- the naming scheme covers OS, OS version, and bit depth, > but not system library versions, and who the heck knows what else. > > This has been discussed a lot on the dist_utils list, with no real solution > in sight: > - there are other alternative, for instance, I think conda packages have > some sort of hash or something to make sure that binary packages all match. > > - convention is the other option: > - use binary wheel for in-house deplyment to similar systems > - use binary wheels for a well-defined python build: > - for PyPi, that's the python.org builds for Windows and OS- Thanks - that is a very useful summary. It would make sense I think to provide numpy wheels like mine via pypi - as pyzmq does for example. In this case, I believe (Chris correct me if I'm wrong) that someone running via system python would get the usual compile / install, but someone running python.org python would get a near instant numpy, so that seems like a clear win. 
Cheers, Matthew From chris.barker at noaa.gov Wed Feb 26 17:33:32 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Wed, 26 Feb 2014 14:33:32 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <1393432035193-36655.post@n7.nabble.com> Message-ID: On Wed, Feb 26, 2014 at 11:34 AM, Matthew Brett wrote: > > - convention is the other option: > > - use binary wheel for in-house deplyment to similar systems > > - use binary wheels for a well-defined python build: > > - for PyPi, that's the python.org builds for Windows and OS- > > Thanks - that is a very useful summary. > > It would make sense I think to provide numpy wheels like mine via pypi > - as pyzmq does for example. > Indeed -- and I really appreciate your efforts on this -- I think we should be able to get the whole "stack" up there pretty soon (though there is an issue with iPython and readline...) Ralf had put together a test set of these, too a little while ago. > In this case, I believe (Chris correct me if I'm wrong) that someone > running via system python would get the usual compile / install, but > someone running python.org python would get a near instant numpy, That's the idea -- though not entirely sure how that would go without testing. Also, I think with pip, you need to tell it to look for binary wheels -- it won't do that by default. pip install --use-wheel numpy so that seems like a clear win. > Agreed. The trick is that it's reasonable for users of Apple's python build to want this too -- but I don't know how we can hope to provide that. (and macports, and homebrew... but those I feel better about requiring to build your own -- really, that's what those systems are designed to do) -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthew.brett at gmail.com Wed Feb 26 17:48:31 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 26 Feb 2014 14:48:31 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <1393432035193-36655.post@n7.nabble.com> Message-ID: Hi, On Wed, Feb 26, 2014 at 2:33 PM, Chris Barker wrote: > On Wed, Feb 26, 2014 at 11:34 AM, Matthew Brett > wrote: > >> >> > - convention is the other option: >> > - use binary wheels for in-house deployment to similar systems >> > - use binary wheels for a well-defined python build: >> > - for PyPI, that's the python.org builds for Windows and OS- >> >> Thanks - that is a very useful summary. >> >> It would make sense I think to provide numpy wheels like mine via pypi >> - as pyzmq does for example. > > > Indeed -- and I really appreciate your efforts on this -- I think we should > be able to get the whole "stack" up there pretty soon (though there is an > issue with iPython and readline...) Ralf had put together a test set of > these, too a little while ago. > > >> >> In this case, I believe (Chris correct me if I'm wrong) that someone >> running via system python would get the usual compile / install, but >> someone running python.org python would get a near instant numpy, > > > That's the idea -- though not entirely sure how that would go without > testing. It's currently working for pyzmq - so I suspect it would work. > Also, I think with pip, you need to tell it to look for binary wheels -- it > won't do that by default. > > pip install --use-wheel numpy I think --use-wheel is the default for the latest pip ... >> so that seems like a clear win. > > Agreed. 
The trick is that it's reasonable for users of Apple's python build > to want this too -- but I don't know how we can hope to provide that. We don't support system python for the mpkg, so I think it's reasonable to leave this little gift for our fellow python.org friends. In that case, the OSX instructions could (within the next few months) be as simple as: Install python from binary installer at python.org curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py python get-pip.py pip install scipy-stack or similar. > (and macports, and homebrew... but those I feel better about requiring to > build your own -- really, that's what those systems are designed to do) Yes, that seems right to me. Cheers, Matthew From tom.augspurger88 at gmail.com Wed Feb 26 18:10:07 2014 From: tom.augspurger88 at gmail.com (Tom Augspurger) Date: Wed, 26 Feb 2014 15:10:07 -0800 (PST) Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <1393432035193-36655.post@n7.nabble.com> Message-ID: <1393456207055-36662.post@n7.nabble.com> Thanks Chris, Chris Barker - NOAA Federal wrote > What python are you using? apparently not a Universal 32+64 bit build. The > one Apple delivers? I'm using homebrew python, so the platform difference seems to have come from there. I agree about renaming the file as not being a real solution. I'd say the official python.org python is the one to worry about for PyPI (which is what pyzmq seems to do). -Tom -- View this message in context: http://numpy-discussion.10968.n7.nabble.com/1-8-1-release-tp36603p36662.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
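For reference, the wheel filenames being discussed in this thread follow the PEP 427 convention {name}-{version}-{python tag}-{abi tag}-{platform tag}.whl, and the tag triple that pip checks against its supported-tag list can be split out mechanically. A minimal sketch (wheel_tags below is a throwaway helper written for illustration, not a pip or wheel API):

```python
# Minimal sketch: split the tag triple out of a wheel filename per the
# PEP 427 naming convention.  wheel_tags() is a hypothetical helper for
# illustration only.
def wheel_tags(filename):
    stem = filename[:-len('.whl')]
    # {name}-{version}(-{build})?-{python tag}-{abi tag}-{platform tag}
    pyver, abi, plat = stem.split('-')[-3:]
    return pyver, abi, plat

print(wheel_tags('numpy-1.8.0-cp27-none-macosx_10_6_intel.whl'))
# -> ('cp27', 'none', 'macosx_10_6_intel')
```

pip only installs the wheel if that triple matches one of the tag combinations it computes for the running interpreter, which is why renaming the platform part of the file was enough to get past the "not a supported wheel" error above.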
From matthew.brett at gmail.com Wed Feb 26 18:17:02 2014 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 26 Feb 2014 15:17:02 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: <1393456207055-36662.post@n7.nabble.com> References: <1393432035193-36655.post@n7.nabble.com> <1393456207055-36662.post@n7.nabble.com> Message-ID: Hi, On Wed, Feb 26, 2014 at 3:10 PM, Tom Augspurger wrote: > Thanks Chris, > > > Chris Barker - NOAA Federal wrote >> What python are you using? apparently not a Universal 32+64 bit build. The >> one Apple delivers? > > I'm using homebrew python, so the platform difference seems to have come > from there. I agree about renaming the file as not being a real solution. > I'd say the official python.org python is the one to worry about for PyPI > (which is what pyzmq seems to do). I use homebrew myself, but only for big libraries. I've stuck with python.org python on the basis that that is what we ask new users to install. Of course that has meant problems with installing the scipy stack - but I think we're now close to a standard libre solution to that with wheels. Cheers, Matthew From scopatz at gmail.com Wed Feb 26 19:24:13 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Wed, 26 Feb 2014 18:24:13 -0600 Subject: [Numpy-discussion] ANN: XDress v0.4 Message-ID: Hello All, I am *extremely *pleased to be able to announce the version 0.4 release of xdress. This version contains much anticipated full support for Clang as a parser! This is almost entirely due to the efforts of Geoffrey Irving. Please thank him the next time you get a chance :) This release also contains a lot of other goodies that you can read about in the release notes below. Happy Generating! Anthony XDress 0.4 Release Notes XDress is a numpy-aware automatic wrapper generator for C/C++ written in pure Python. Currently, xdress may generate Python bindings (via Cython) for C++ classes, functions, and certain variable types. 
It also contains idiomatic wrappers for C++ standard library containers (sets, vectors, maps). In the future, other tools and bindings will be supported.

The main enabling feature of xdress is a dynamic type system that was designed with the purpose of API generation in mind.

Release highlights:

- Clang support! All kudos to Geoffrey Irving!
- NumPy dtypes may be created independently of C++ STL vectors
- A complete test suite refactor
- Arbitrary source code locations
- Global run control files
- A plethora of useful bug fixes

This version of xdress is *not* 100% backwards compatible with previous versions of xdress. We apologize in the name of progress. It represents an impressive 245 files changed, 44917 aggregate line insertions (+), and 7893 deletions (-).

Please visit the website for more information: http://xdress.org/

Ask questions on the mailing list: https://groups.google.com/forum/#!forum/xdress

Download the code from GitHub: http://github.com/xdress/xdress

XDress is free & open source (BSD 2-clause license) and requires Python 2.7+, NumPy 1.5+, Cython 0.19+, and optionally Clang, GCC-XML, pycparser, Doxygen, or lxml.

New Features

Clang Support

Through the herculean efforts of Geoffrey Irving xdress finally has full, first-class Clang/LLVM support! This is a major advancement, as it allows xdress to wrap more modern versions of C++ than GCC-XML can handle. Because of deficiencies in the existing libclang and its Python bindings it was necessary for us to fork libclang for xdress in the short term. We hope to integrate these changes upstream. Clang versions 3.2 - 3.4 are supported.

Independent NumPy Dtypes

In previous versions of xdress, to create a dtype of type T the user needed to declare the desire for a wrapper of an STL vector of type T. These two desires have now been separated. It is now possible to create a dtype via the dtypes run control parameter. STL vectors are still wrapped via dtypes. See the dtypes module for more information. 
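For illustration, a run control file using the new parameter might look something like this (a sketch only: the package and type names are invented, and the exact parameter spellings should be checked against the xdress documentation):

```python
# xdressrc.py -- hypothetical sketch of the new `dtypes` run control
# parameter described above.  `package`, `packagedir`, and the type
# names are illustrative placeholders, not from the announcement.
package = 'mypack'        # output package name
packagedir = 'mypack'

# request NumPy dtypes directly, without asking for an STL vector wrapper
dtypes = ['float32', 'MyCustomType']

# STL vector wrappers (which still produce dtypes too) remain available
stlcontainers = [('vector', 'float64')]
```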
Shiny New Test Suite

The xdress test suite has been completely revamped to include both unit and integration tests which are run for all available parsers. The integration tests are accomplished through two fake projects - cproj and cppproj - on which the xdress CLI is run. These tests are now fully platform independent, unlike the previous BASH-based test suite.

Source Paths

Source file paths are now given by either their absolute or relative path. This allows source code to be located anywhere on the user's file system and enables the wrapping of dependencies or externally supplied libraries as needed. The run control parameter sourcedir has been deprecated.

Global Run Control Files

It is sometimes useful to be able to set system-wide run control parameters. XDress will now search the following files in order of increasing precedence:

- $HOME/.xdressrc
- $HOME/.xdressrc.py
- $HOME/.config/xdressrc
- $HOME/.config/xdressrc.py

$HOME is the user's home directory. Settings in the project run control file take precedence over the values here.

Major Bug Fixes

- Debug file now always written when in debug mode.
- STL sets of custom types now allowed.
- Template parameters now allowed to be enum values.
- Allow classes with no default constructor.

Join in the Fun!

If you are interested in using xdress on your project (and need help), contributing back to xdress, starting up a development team, or writing your own code generation plugin tool, please let us know. Participation is very welcome!

Authors

- Anthony Scopatz
- Geoffrey Irving *
- James Casbon *
- Kevin Tew *
- Spencer Lyon
- John Wiggins
- Matt McCormick
- Brad Buran
- Chris Harris *
- Gerald Dalley *
- Micky Latowicki *
- Mike C. Fletcher *
- Robert Schwarz *

An * indicates a first-time contributor.

Links

1. Homepage - http://xdress.org/
2. Mailing List - https://groups.google.com/forum/#!forum/xdress
3.
GitHub Organization - https://github.com/xdress -------------- next part -------------- An HTML attachment was scrubbed... URL: From hoogendoorn.eelco at gmail.com Thu Feb 27 02:51:08 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 27 Feb 2014 08:51:08 +0100 Subject: [Numpy-discussion] ANN: XDress v0.4 In-Reply-To: References: Message-ID: Thanks for the heads up, I wasn't aware of this project. While boost.python is a very nice package, its distributability is nothing short of nonexistent, so it's great to have a pure python binding generator. One thing which I have often found frustrating is natural ndarray interop between python and C++. Is there a (planned) mechanism for mapping arbitrary strided python ndarrays to boost arrays? On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz wrote: > Hello All, > > I am *extremely *pleased to be able to announce the version 0.4 release > of xdress. This version contains much anticipated full support for Clang > as a parser! This is almost entirely due to the efforts of Geoffrey > Irving. Please thank him the next time you get a chance :) > > This release also contains a lot of other goodies that you can read about > in the release notes below. > > Happy Generating! > Anthony > > XDress 0.4 Release Notes > > XDress is a numpy-aware automatic wrapper generator for C/C++ written in > pure Python. Currently, xdress may generate Python bindings (via Cython) > for C++ classes, functions, and certain variable types. It also contains > idiomatic wrappers for C++ standard library containers (sets, vectors, > maps). In the future, other tools and bindings will be supported. > > The main enabling feature of xdress is a dynamic type system that was > designed with the purpose of API generation in mind. > > Release highlights: > > > - Clang support! All kudos to Geoffrey Irving! 
> - NumPy dtypes may be created independently of C++ STL vectors > - A complete test suite refactor > - Arbitrary source code locations > - Global run control files > - A plethora of useful bug fixes > > This version of xdress is *not* 100% backwards compatible with previous > versions of xdress. We apologize in the name of progress. It represents ans > impressive 245 files changed, 44917 aggregate line insertions (+), and 7893 > deletions (-). > > Please visit the website for more information: http://xdress.org/ > > Ask questions on the mailing list: > https://groups.google.com/forum/#!forum/xdress > > Download the code from GitHub: http://github.com/xdress/xdress > > XDress is free & open source (BSD 2-clause license) and requires Python > 2.7+, NumPy 1.5+, Cython 0.19+, and optionally Clang, GCC-XML, pycparser, > dOxygen, or lxml. > New Features > Clang Support > > Through the herculean efforts of Geoffrey Irving xdress finally has full, > first-class Clang/LLVM support! This is major advancement as it allows > xdress to wrap more modern versions of C++ than GCC-XML can handle. Because > of deficiencies in the existing libclang and Python bindings it was > necessary for us to fork libclang for xdress in the short term. We hope to > integrate these changes upstream. Clang versions 3.2 - 3.4 are supported. > Independent NumPy Dtypes > > In previous versions of xdress, to create a dtype of type T the user > needed to declare the desire for a wrapper of an STL vector of type T. > These two desires have now been separated. It is now possible to create a > dtype via the dtypes run control parameter. STL vectors are still wrapped > via dtypes. See the dtypes module for more information. > Shiny New Test Suite > > The xdress test suite has been completely revamped to include both unit > and integration tests which are run for all available parsers. The > integration tests are accomplished though two fake projects - cproj and > cppproj - on which the xdress CLI is run. 
These tests are now fully > platform independent, unlike the previous BASH-based test suite. > Source Paths > > Source file paths are now given by either their absolute or relative path. > This allows source code to be located anywhere on the user's file system > and enable the wrapping of dependencies or externally supplied libraries as > needed. The run control parametersourcedir has been deprecated. > Global Run Control Files > > It is sometimes useful to be able to set system-wide run control > parameters. XDress will now search the following files in order of > increasing precedence. > > - $HOME/.xdressrc > - $HOME/.xdressrc.py > - $HOME/.config/xdressrc > - $HOME/.config/xdressrc.py > > $HOME is the user's home directory. Settings in the project run control > file take precedence over the values here. > Major Bug Fixes > > - Debug file now always written when in debug mode. > - STL sets of custom types now allowed. > - Template parameters now allowed to be enum values. > - Allow classes with no default constructor. > > Join in the Fun! > > If you are interested in using xdress on your project (and need help), > contributing back to xdress, starting up a development team, or writing > your own code generation plugin tool, please let us know. Participation is > very welcome! > Authors > > - Anthony Scopatz > - Geoffrey Irving * > - James Casbon * > - Kevin Tew * > - Spencer Lyon > - John Wiggins > - Matt McCormick > - Brad Buran > - Chris Harris * > - Gerald Dalley * > - Micky Latowicki * > - Mike C. Fletcher * > - Robert Schwarz * > > An * indicates a first time contributor. > Links > > 1. Homepage - http://xdress.org/ > 2. Mailing List - https://groups.google.com/forum/#!forum/xdress > 3. 
GitHub Organization - https://github.com/xdress > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pyviennacl at tsmithe.net Thu Feb 27 04:51:32 2014 From: pyviennacl at tsmithe.net (Toby St Clere Smithe) Date: Thu, 27 Feb 2014 09:51:32 +0000 Subject: [Numpy-discussion] ANN: XDress v0.4 References: Message-ID: <87vbw0hoij.fsf@tsmithe.net> Hi, Eelco Hoogendoorn writes: > Thanks for the heads up, I wasn't aware of this project. While boost.python > is a very nice package, its distributability is nothing short of > nonexistent, so its great to have a pure python binding generator. > > One thing which I have often found frustrating is natural ndarray interop > between python and C++. Is there a (planned) mechanism for mapping > arbitrary strided python ndarrays to boost arrays? Have you tried boost.numpy? https://github.com/ndarray/Boost.NumPy I have a fork which builds against Python 3, as well -- though it's mainly used for PyViennaCL, and might need a bit of cleaning. Cheers, Toby > > On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz wrote: > >> Hello All, >> >> I am *extremely *pleased to be able to announce the version 0.4 release >> of xdress. This version contains much anticipated full support for Clang >> as a parser! This is almost entirely due to the efforts of Geoffrey >> Irving. Please thank him the next time you get a chance :) >> >> This release also contains a lot of other goodies that you can read about >> in the release notes below. >> >> Happy Generating! >> Anthony >> >> XDress 0.4 Release Notes >> >> XDress is a numpy-aware automatic wrapper generator for C/C++ written in >> pure Python. Currently, xdress may generate Python bindings (via Cython) >> for C++ classes, functions, and certain variable types. 
>> [remainder of quoted release notes snipped]
>> Authors >> >> - Anthony Scopatz >> - Geoffrey Irving * >> - James Casbon * >> - Kevin Tew * >> - Spencer Lyon >> - John Wiggins >> - Matt McCormick >> - Brad Buran >> - Chris Harris * >> - Gerald Dalley * >> - Micky Latowicki * >> - Mike C. Fletcher * >> - Robert Schwarz * >> >> An * indicates a first time contributor. >> Links >> >> 1. Homepage - http://xdress.org/ >> 2. Mailing List - https://groups.google.com/forum/#!forum/xdress >> 3. GitHub Organization - https://github.com/xdress >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From hoogendoorn.eelco at gmail.com Thu Feb 27 05:39:34 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 27 Feb 2014 11:39:34 +0100 Subject: [Numpy-discussion] ANN: XDress v0.4 In-Reply-To: <87vbw0hoij.fsf@tsmithe.net> References: <87vbw0hoij.fsf@tsmithe.net> Message-ID: I have; but if I recall correctly, it does not solve the problem of distributing code that uses it, or does it? On Thu, Feb 27, 2014 at 10:51 AM, Toby St Clere Smithe < pyviennacl at tsmithe.net> wrote: > Hi, > > Eelco Hoogendoorn writes: > > Thanks for the heads up, I wasn't aware of this project. While > boost.python > > is a very nice package, its distributability is nothing short of > > nonexistent, so its great to have a pure python binding generator. > > > > One thing which I have often found frustrating is natural ndarray interop > > between python and C++. Is there a (planned) mechanism for mapping > > arbitrary strided python ndarrays to boost arrays? > > Have you tried boost.numpy? 
> > https://github.com/ndarray/Boost.NumPy > > I have a fork which builds against Python 3, as well -- though it's > mainly used for PyViennaCL, and might need a bit of cleaning. > > Cheers, > > Toby > > > > > > > > On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz > wrote: > > > >> Hello All, > >> > >> I am *extremely *pleased to be able to announce the version 0.4 release > >> of xdress. This version contains much anticipated full support for > Clang > >> as a parser! This is almost entirely due to the efforts of Geoffrey > >> Irving. Please thank him the next time you get a chance :) > >> > >> This release also contains a lot of other goodies that you can read > about > >> in the release notes below. > >> > >> Happy Generating! > >> Anthony > >> > >> XDress 0.4 Release Notes< > http://xdress.org/previous/0.4_release_notes.html#xdress-0-4-release-notes > > > >> > >> XDress is a numpy-aware automatic wrapper generator for C/C++ written in > >> pure Python. Currently, xdress may generate Python bindings (via Cython) > >> for C++ classes, functions, and certain variable types. It also contains > >> idiomatic wrappers for C++ standard library containers (sets, vectors, > >> maps). In the future, other tools and bindings will be supported. > >> > >> The main enabling feature of xdress is a dynamic type system that was > >> designed with the purpose of API generation in mind. > >> > >> Release highlights: > >> > >> > >> - Clang support! All kudos to Geoffrey Irving! > >> - NumPy dtypes may be created independently of C++ STL vectors > >> - A complete test suite refactor > >> - Arbitrary source code locations > >> - Global run control files > >> - A plethora of useful bug fixes > >> > >> This version of xdress is *not* 100% backwards compatible with previous > >> versions of xdress. We apologize in the name of progress. It represents > ans > >> impressive 245 files changed, 44917 aggregate line insertions (+), and > 7893 > >> deletions (-). 
> >> [remainder of quoted release notes snipped]
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From pyviennacl at tsmithe.net  Thu Feb 27 05:59:48 2014
From: pyviennacl at tsmithe.net (Toby St Clere Smithe)
Date: Thu, 27 Feb 2014 10:59:48 +0000
Subject: [Numpy-discussion] ANN: XDress v0.4
References: <87vbw0hoij.fsf@tsmithe.net>
Message-ID: <87mwhchlcr.fsf@tsmithe.net>

Eelco Hoogendoorn writes:
> I have; but if I recall correctly, it does not solve the problem of
> distributing code that uses it, or does it?

Indeed not. But the Boost licence is very liberal, so I just link it statically; for source distributions, I just ship a minimal copy of the Boost sources I need.

Toby

> On Thu, Feb 27, 2014 at 10:51 AM, Toby St Clere Smithe <
> pyviennacl at tsmithe.net> wrote:
>
>> Hi,
>>
>> Eelco Hoogendoorn writes:
>> > Thanks for the heads up, I wasn't aware of this project. While boost.python
>> > is a very nice package, its distributability is nothing short of
>> > nonexistent, so its great to have a pure python binding generator.
>> >
>> > One thing which I have often found frustrating is natural ndarray interop
>> > between python and C++.
Is there a (planned) mechanism for mapping >> > arbitrary strided python ndarrays to boost arrays? >> >> Have you tried boost.numpy? >> >> https://github.com/ndarray/Boost.NumPy >> >> I have a fork which builds against Python 3, as well -- though it's >> mainly used for PyViennaCL, and might need a bit of cleaning. >> >> Cheers, >> >> Toby >> >> >> >> >> > >> > On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz >> wrote: >> > >> >> Hello All, >> >> >> >> I am *extremely *pleased to be able to announce the version 0.4 release >> >> of xdress. This version contains much anticipated full support for >> Clang >> >> as a parser! This is almost entirely due to the efforts of Geoffrey >> >> Irving. Please thank him the next time you get a chance :) >> >> >> >> This release also contains a lot of other goodies that you can read >> about >> >> in the release notes below. >> >> >> >> Happy Generating! >> >> Anthony >> >> >> >> XDress 0.4 Release Notes< >> http://xdress.org/previous/0.4_release_notes.html#xdress-0-4-release-notes >> > >> >> >> >> XDress is a numpy-aware automatic wrapper generator for C/C++ written in >> >> pure Python. Currently, xdress may generate Python bindings (via Cython) >> >> for C++ classes, functions, and certain variable types. It also contains >> >> idiomatic wrappers for C++ standard library containers (sets, vectors, >> >> maps). In the future, other tools and bindings will be supported. >> >> >> >> The main enabling feature of xdress is a dynamic type system that was >> >> designed with the purpose of API generation in mind. >> >> >> >> Release highlights: >> >> >> >> >> >> - Clang support! All kudos to Geoffrey Irving! >> >> - NumPy dtypes may be created independently of C++ STL vectors >> >> - A complete test suite refactor >> >> - Arbitrary source code locations >> >> - Global run control files >> >> - A plethora of useful bug fixes >> >> >> >> This version of xdress is *not* 100% backwards compatible with previous >> >> versions of xdress. 
>> >> [remainder of quoted release notes snipped]
>> >> >> >> Join in the Fun!< >> http://xdress.org/previous/0.4_release_notes.html#join-in-the-fun> >> >> >> >> If you are interested in using xdress on your project (and need help), >> >> contributing back to xdress, starting up a development team, or writing >> >> your own code generation plugin tool, please let us know. Participation >> is >> >> very welcome! >> >> Authors >> >> >> >> - Anthony Scopatz >> >> - Geoffrey Irving * >> >> - James Casbon * >> >> - Kevin Tew * >> >> - Spencer Lyon >> >> - John Wiggins >> >> - Matt McCormick >> >> - Brad Buran >> >> - Chris Harris * >> >> - Gerald Dalley * >> >> - Micky Latowicki * >> >> - Mike C. Fletcher * >> >> - Robert Schwarz * >> >> >> >> An * indicates a first time contributor. >> >> Links >> >> >> >> 1. Homepage - http://xdress.org/ >> >> 2. Mailing List - https://groups.google.com/forum/#!forum/xdress >> >> 3. GitHub Organization - https://github.com/xdress >> >> >> >> >> >> _______________________________________________ >> >> NumPy-Discussion mailing list >> >> NumPy-Discussion at scipy.org >> >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From thomas_unterthiner at web.de Thu Feb 27 06:40:27 2014 From: thomas_unterthiner at web.de (Thomas Unterthiner) Date: Thu, 27 Feb 2014 12:40:27 +0100 Subject: [Numpy-discussion] problems building numpy with ACML blas/lapack In-Reply-To: References: Message-ID: <530F242B.3090007@web.de> Hi! 
(I was contacted about this off-list, but thought it might be worth it to document this on-list) I don't remember ever running into problems while using ACML in my numpy installs. However I never ran the whole test-suite, and I never used np.einsum (or complex numbers) in my own code. All I can tell you is that while using numpy with ACML, I never ran into any weird segfaults. For what it's worth: OpenBLAS has added a lot of AMD-specific (Bulldozer- and Piledriver-specific, to be exact) code after its 0.2.8 version, which made it slightly faster than ACML on my own machines (at least for dgemm, I remember some LAPACK calls were a tiny bit slower, but not so much that it bothered me). Since numpy-integration of OpenBLAS is significantly better than ACML, I don't bother with the latter anymore. You can download the OpenBLAS 0.2.9rc1 -- which includes the Bulldozer/Piledriver code -- from https://github.com/xianyi/OpenBLAS/releases/tag/v0.2.9.rc1 Cheers Thomas On 2014-02-24 20:58, Michael Hughes wrote: > Hello, > > I'm trying to build numpy from source to use AMD's ACML for matrix > multiplication (specifically the multi-threaded versions > gfortran64_mp). I'm able to successfully compile and use a working > version of np.dot, but my resulting installation doesn't pass numpy's > test suite, instead, I get a segfault. I'm hoping for some advice on > what might be wrong... > > I'm on Debian, with a fresh install of Python-2.7.6. To install > numpy, I've followed exactly the instructions previously posted to > this list by Thomas Unterthiner. See > http://numpy-discussion.10968.n7.nabble.com/numpy-ACML-support-is-kind-of-broken-td35454.html. > The only thing I've adjusted is to try to use the gfortran64_mp > version of ACML instead of just gfortran64. > > Using those instructions, I can compile numpy-1.8.0 so that it > successfully uses the desired ACML libraries. 
I can confirm this by > `ldd site-packages/numpy/core/_dotblas.so`, which shows that I'm > linked to libacml_mp.so as desired. Furthermore, some quick timing > tests indicate that for a 1000x1000 matrix X, calls to np.dot(X,X) > have similar speeds as using custom C code that directly calls the > ACML libraries. So, dot seems to work as desired. > > However, when I run numpy.test(verbose=4), I find that I get a seg fault > ``` > test_einsum_sums_cfloat128 (test_einsum.TestEinSum) ... Segmentation fault > ``` > > Any ideas what might be wrong? From my benchmark tests, ACML is way > faster than MKL or other options on my system, so I'd really like to > use it, but I don't trust this current install. > > Thanks! > - Mike > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Thu Feb 27 07:36:14 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Thu, 27 Feb 2014 06:36:14 -0600 Subject: [Numpy-discussion] ANN: XDress v0.4 In-Reply-To: References: Message-ID: On Thu, Feb 27, 2014 at 1:51 AM, Eelco Hoogendoorn < hoogendoorn.eelco at gmail.com> wrote: > Thanks for the heads up, I wasn't aware of this project. While > boost.python is a very nice package, its distributability is nothing short > of nonexistent, so its great to have a pure python binding generator. > Thanks! > One thing which I have often found frustrating is natural ndarray interop > between python and C++. Is there a (planned) mechanism for mapping > arbitrary strided python ndarrays to boost arrays? > Not yet! The architecture is very modular (it is just a series of plugins) so I would welcome anyone who wants to tackle this to take a look into it. I don't think that it would be *that* hard. You'd just need to write the Py-to-C++ and C++-to-Py converters for the boost array type. 
This shouldn't be too hard since std::vector goes through pretty much exactly the same mechanism for exposing to numpy. So there are already a couple of examples of this workflow. Please feel free to jump on the xdress mailing list if you want to discuss this in more depth! Also I didn't know about ndarray/Boost.NumPy. This seems like it could be useful! Be Well Anthony > > > On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz wrote: > >> Hello All, >> >> I am *extremely *pleased to be able to announce the version 0.4 release >> of xdress. This version contains much anticipated full support for Clang >> as a parser! This is almost entirely due to the efforts of Geoffrey >> Irving. Please thank him the next time you get a chance :) >> >> This release also contains a lot of other goodies that you can read about >> in the release notes below. >> >> Happy Generating! >> Anthony >> >> XDress 0.4 Release Notes >> >> XDress is a numpy-aware automatic wrapper generator for C/C++ written in >> pure Python. Currently, xdress may generate Python bindings (via Cython) >> for C++ classes, functions, and certain variable types. It also contains >> idiomatic wrappers for C++ standard library containers (sets, vectors, >> maps). In the future, other tools and bindings will be supported. >> >> The main enabling feature of xdress is a dynamic type system that was >> designed with the purpose of API generation in mind. >> >> Release highlights: >> >> >> - Clang support! All kudos to Geoffrey Irving! >> - NumPy dtypes may be created independently of C++ STL vectors >> - A complete test suite refactor >> - Arbitrary source code locations >> - Global run control files >> - A plethora of useful bug fixes >> >> This version of xdress is *not* 100% backwards compatible with previous >> versions of xdress. We apologize in the name of progress. It represents ans >> impressive 245 files changed, 44917 aggregate line insertions (+), and 7893 >> deletions (-). 
>> >> Please visit the website for more information: http://xdress.org/ >> >> Ask questions on the mailing list: >> https://groups.google.com/forum/#!forum/xdress >> >> Download the code from GitHub: http://github.com/xdress/xdress >> >> XDress is free & open source (BSD 2-clause license) and requires Python >> 2.7+, NumPy 1.5+, Cython 0.19+, and optionally Clang, GCC-XML, pycparser, >> dOxygen, or lxml. >> New Features >> Clang Support >> >> Through the herculean efforts of Geoffrey Irving xdress finally has full, >> first-class Clang/LLVM support! This is a major advancement as it allows >> xdress to wrap more modern versions of C++ than GCC-XML can handle. Because >> of deficiencies in the existing libclang and Python bindings it was >> necessary for us to fork libclang for xdress in the short term. We hope to >> integrate these changes upstream. Clang versions 3.2 - 3.4 are supported. >> Independent NumPy Dtypes >> >> In previous versions of xdress, to create a dtype of type T the user >> needed to declare the desire for a wrapper of an STL vector of type T. >> These two desires have now been separated. It is now possible to create a >> dtype via the dtypes run control parameter. STL vectors are still >> wrapped via dtypes. See the dtypes module for more information. >> Shiny New Test Suite >> >> The xdress test suite has been completely revamped to include both unit >> and integration tests which are run for all available parsers. The >> integration tests are accomplished through two fake projects - cproj and >> cppproj - on which the xdress CLI is run. These tests are now fully >> platform independent, unlike the previous BASH-based test suite. >> Source Paths >> >> Source file paths are now given by either their absolute or relative >> path. This allows source code to be located anywhere on the user's file >> system and enables the wrapping of dependencies or externally supplied >> libraries as needed. The run control parameter sourcedir has been >> deprecated. 
>> Global Run Control Files >> >> It is sometimes useful to be able to set system-wide run control >> parameters. XDress will now search the following files in order of >> increasing precedence. >> >> - $HOME/.xdressrc >> - $HOME/.xdressrc.py >> - $HOME/.config/xdressrc >> - $HOME/.config/xdressrc.py >> >> $HOME is the user's home directory. Settings in the project run control >> file take precedence over the values here. >> Major Bug Fixes >> >> - Debug file now always written when in debug mode. >> - STL sets of custom types now allowed. >> - Template parameters now allowed to be enum values. >> - Allow classes with no default constructor. >> >> Join in the Fun! >> >> If you are interested in using xdress on your project (and need help), >> contributing back to xdress, starting up a development team, or writing >> your own code generation plugin tool, please let us know. Participation is >> very welcome! >> Authors >> >> - Anthony Scopatz >> - Geoffrey Irving * >> - James Casbon * >> - Kevin Tew * >> - Spencer Lyon >> - John Wiggins >> - Matt McCormick >> - Brad Buran >> - Chris Harris * >> - Gerald Dalley * >> - Micky Latowicki * >> - Mike C. Fletcher * >> - Robert Schwarz * >> >> An * indicates a first time contributor. >> Links >> >> 1. Homepage - http://xdress.org/ >> 2. Mailing List - https://groups.google.com/forum/#!forum/xdress >> 3. GitHub Organization - https://github.com/xdress >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hoogendoorn.eelco at gmail.com Thu Feb 27 07:47:25 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 27 Feb 2014 13:47:25 +0100 Subject: [Numpy-discussion] ANN: XDress v0.4 In-Reply-To: References: Message-ID: I have a file numpy_boost_python.hpp in one of my projects by Michael Droettboom (can't seem to find an online source anymore!), which adds mappings between numpy.ndarray and boost.ndarray, which is very neat and seamless. But like boost.python, it tightly couples with the clusterfuck that is bjam. However, something conceptually like that but integrated with XDress would be great. Indeed it does not sound too complicated; though I don't think I will get around to it anytime soon, unfortunately... On Thu, Feb 27, 2014 at 1:36 PM, Anthony Scopatz wrote: > > On Thu, Feb 27, 2014 at 1:51 AM, Eelco Hoogendoorn < > hoogendoorn.eelco at gmail.com> wrote: > >> Thanks for the heads up, I wasn't aware of this project. While >> boost.python is a very nice package, its distributability is nothing short >> of nonexistent, so its great to have a pure python binding generator. >> > > Thanks! > > >> One thing which I have often found frustrating is natural ndarray interop >> between python and C++. Is there a (planned) mechanism for mapping >> arbitrary strided python ndarrays to boost arrays? >> > > Not yet! The architecture is very modular (it is just a series of > plugins) so I would welcome anyone who wants to tackle this to take a look > into it. I don't think that it would be *that* hard. You'd just need to > write the Py-to-C++ and C++-to-Py converters for the boost array type. > This shouldn't be too hard since std::vector goes through pretty much > exactly the same mechanism for exposing to numpy. So there are already a > couple of examples of this workflow. Please feel free to jump on the > xdress mailing list if you want to discuss this in more depth! > > Also I didn't know about ndarray/Boost.NumPy. 
This seems like it could be > useful! > > Be Well > Anthony > > >> >> >> On Thu, Feb 27, 2014 at 1:24 AM, Anthony Scopatz wrote: >> >>> Hello All, >>> >>> I am *extremely *pleased to be able to announce the version 0.4 release >>> of xdress. This version contains much anticipated full support for Clang >>> as a parser! This is almost entirely due to the efforts of Geoffrey >>> Irving. Please thank him the next time you get a chance :) >>> >>> This release also contains a lot of other goodies that you can read >>> about in the release notes below. >>> >>> Happy Generating! >>> Anthony >>> >>> XDress 0.4 Release Notes >>> >>> XDress is a numpy-aware automatic wrapper generator for C/C++ written in >>> pure Python. Currently, xdress may generate Python bindings (via Cython) >>> for C++ classes, functions, and certain variable types. It also contains >>> idiomatic wrappers for C++ standard library containers (sets, vectors, >>> maps). In the future, other tools and bindings will be supported. >>> >>> The main enabling feature of xdress is a dynamic type system that was >>> designed with the purpose of API generation in mind. >>> >>> Release highlights: >>> >>> >>> - Clang support! All kudos to Geoffrey Irving! >>> - NumPy dtypes may be created independently of C++ STL vectors >>> - A complete test suite refactor >>> - Arbitrary source code locations >>> - Global run control files >>> - A plethora of useful bug fixes >>> >>> This version of xdress is *not* 100% backwards compatible with previous >>> versions of xdress. We apologize in the name of progress. It represents ans >>> impressive 245 files changed, 44917 aggregate line insertions (+), and 7893 >>> deletions (-). 
>>> >>> Please visit the website for more information: http://xdress.org/ >>> >>> Ask questions on the mailing list: >>> https://groups.google.com/forum/#!forum/xdress >>> >>> Download the code from GitHub: http://github.com/xdress/xdress >>> >>> XDress is free & open source (BSD 2-clause license) and requires Python >>> 2.7+, NumPy 1.5+, Cython 0.19+, and optionally Clang, GCC-XML, pycparser, >>> dOxygen, or lxml. >>> New Features >>> Clang Support >>> >>> Through the herculean efforts of Geoffrey Irving xdress finally has >>> full, first-class Clang/LLVM support! This is major advancement as it >>> allows xdress to wrap more modern versions of C++ than GCC-XML can handle. >>> Because of deficiencies in the existing libclang and Python bindings it was >>> necessary for us to fork libclang for xdress in the short term. We hope to >>> integrate these changes upstream. Clang versions 3.2 - 3.4 are supported. >>> Independent NumPy Dtypes >>> >>> In previous versions of xdress, to create a dtype of type T the user >>> needed to declare the desire for a wrapper of an STL vector of type T. >>> These two desires have now been separated. It is now possible to create a >>> dtype via the dtypes run control parameter. STL vectors are still >>> wrapped via dtypes. See the dtypes module for more information. >>> Shiny New Test Suite >>> >>> The xdress test suite has been completely revamped to include both unit >>> and integration tests which are run for all available parsers. The >>> integration tests are accomplished though two fake projects - cproj and >>> cppproj - on which the xdress CLI is run. These tests are now fully >>> platform independent, unlike the previous BASH-based test suite. >>> Source Paths >>> >>> Source file paths are now given by either their absolute or relative >>> path. This allows source code to be located anywhere on the user's file >>> system and enable the wrapping of dependencies or externally supplied >>> libraries as needed. 
The run control parametersourcedir has been >>> deprecated. >>> Global Run Control Files >>> >>> It is sometimes useful to be able to set system-wide run control >>> parameters. XDress will now search the following files in order of >>> increasing precedence. >>> >>> - $HOME/.xdressrc >>> - $HOME/.xdressrc.py >>> - $HOME/.config/xdressrc >>> - $HOME/.config/xdressrc.py >>> >>> $HOME is the user's home directory. Settings in the project run control >>> file take precedence over the values here. >>> Major Bug Fixes >>> >>> - Debug file now always written when in debug mode. >>> - STL sets of custom types now allowed. >>> - Template parameters now allowed to be enum values. >>> - Allow classes with no default constructor. >>> >>> Join in the Fun! >>> >>> If you are interested in using xdress on your project (and need help), >>> contributing back to xdress, starting up a development team, or writing >>> your own code generation plugin tool, please let us know. Participation is >>> very welcome! >>> Authors >>> >>> - Anthony Scopatz >>> - Geoffrey Irving * >>> - James Casbon * >>> - Kevin Tew * >>> - Spencer Lyon >>> - John Wiggins >>> - Matt McCormick >>> - Brad Buran >>> - Chris Harris * >>> - Gerald Dalley * >>> - Micky Latowicki * >>> - Mike C. Fletcher * >>> - Robert Schwarz * >>> >>> An * indicates a first time contributor. >>> Links >>> >>> 1. Homepage - http://xdress.org/ >>> 2. Mailing List - https://groups.google.com/forum/#!forum/xdress >>> 3. 
GitHub Organization - https://github.com/xdress >>> >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pyviennacl at tsmithe.net Thu Feb 27 08:19:36 2014 From: pyviennacl at tsmithe.net (Toby St Clere Smithe) Date: Thu, 27 Feb 2014 13:19:36 +0000 Subject: [Numpy-discussion] ANN: XDress v0.4 References: Message-ID: <87eh2ohevr.fsf@tsmithe.net> Eelco Hoogendoorn writes: > I have a file numpy_boost_python.hpp in one of my projects by Michael > Droettboom (can seem to find an online source anymore!), which adds > mappings between numpy.ndarray and boost.ndarray, which is very neat > and seemless. But like boost.python, it tightly couples with the > clusterfuck that is bjam. However, something conceptually like that but > integrated with XDress would be great. Indeed it does not sound too > complicated; though I don't think I will get around to it anytime soon, > unfortunately... You don't have to use bjam! I have built my projects with distutils and CMake, and never once touched bjam; CMake provides find_package scripts for Boost, Python and NumPy, and for distutils, I just include the relevant files and flags in my project. See [1] for a CMake example, and [2] for a distutils example. 
[1] https://github.com/tsmithe/viennacl-dev/blob/pyviennacl/CMakeLists.txt [2] https://github.com/viennacl/pyviennacl-dev/blob/master/setup.py Cheers, Toby From hoogendoorn.eelco at gmail.com Thu Feb 27 08:58:22 2014 From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn) Date: Thu, 27 Feb 2014 14:58:22 +0100 Subject: [Numpy-discussion] ANN: XDress v0.4 In-Reply-To: <87eh2ohevr.fsf@tsmithe.net> References: <87eh2ohevr.fsf@tsmithe.net> Message-ID: That is good to know. The boost documentation makes it appear as if bjam is the only way to build boost.python, but good to see examples to the contrary! On Thu, Feb 27, 2014 at 2:19 PM, Toby St Clere Smithe < pyviennacl at tsmithe.net> wrote: > Eelco Hoogendoorn writes: > > I have a file numpy_boost_python.hpp in one of my projects by Michael > > Droettboom (can seem to find an online source anymore!), which adds > > mappings between numpy.ndarray and boost.ndarray, which is very neat > > and seemless. But like boost.python, it tightly couples with the > > clusterfuck that is bjam. However, something conceptually like that but > > integrated with XDress would be great. Indeed it does not sound too > > complicated; though I don't think I will get around to it anytime soon, > > unfortunately... > > You don't have to use bjam! I have built my projects with distutils and > CMake, and never once touched bjam; CMake provides find_package scripts > for Boost, Python and NumPy, and for distutils, I just include the > relevant files and flags in my project. > > See [1] for a CMake example, and [2] for a distutils example. 
> > [1] https://github.com/tsmithe/viennacl-dev/blob/pyviennacl/CMakeLists.txt > [2] https://github.com/viennacl/pyviennacl-dev/blob/master/setup.py > > > Cheers, > > Toby > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mail at tsmithe.net Thu Feb 27 09:01:08 2014 From: mail at tsmithe.net (Toby St Clere Smithe) Date: Thu, 27 Feb 2014 14:01:08 +0000 Subject: [Numpy-discussion] ANN: XDress v0.4 References: <87eh2ohevr.fsf@tsmithe.net> Message-ID: <87a9dchcyj.fsf@tsmithe.net> Eelco Hoogendoorn writes: > That is good to know. The boost documentation makes it appear as if bjam is > the only way to build boost.python, but good to see examples to the > contrary! Quite. I really wish they would officially adopt something more sensible, but in the meantime, using alternatives isn't too hard :) Toby From abergeron at gmail.com Thu Feb 27 14:11:24 2014 From: abergeron at gmail.com (Arnaud Bergeron) Date: Thu, 27 Feb 2014 14:11:24 -0500 Subject: [Numpy-discussion] Silencing NumPy output In-Reply-To: References: Message-ID: Since there was opposition to just removing the output all the time, I've added a new parameter, 'verbosity' that can be set to 0 to hide the output. This unfortunately requires a bit of code churn and changes the interface to get_info() (but in a backwards-compatible way). I came up with another option, which is to reuse system_info.verbosity to control whether that output is shown/hidden. It is already documented as doing so even if that is mostly ignored in the code. I've made another PR https://github.com/numpy/numpy/pull/4387 to show what I mean. This would supersede #4081 if accepted. 
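In the meantime there is also a purely user-side stopgap, independent of either PR: redirect stdout around the call and keep whatever it printed. A minimal sketch — here `noisy_probe` is a hypothetical stand-in for `numpy.distutils.system_info.get_info`, which prints its discovery log to stdout as a side effect:

```python
import io
from contextlib import redirect_stdout

def noisy_probe():
    # Hypothetical stand-in for system_info.get_info: prints a discovery
    # log to stdout as a side effect, then returns the actual result.
    print("FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas']")
    return {"libraries": ["ptf77blas", "ptcblas", "atlas"]}

def quiet_call(func, *args, **kwargs):
    # Run func with stdout captured; return its result plus the captured
    # log, so the console stays quiet but nothing is lost.
    buf = io.StringIO()
    with redirect_stdout(buf):
        result = func(*args, **kwargs)
    return result, buf.getvalue()

info, log = quiet_call(noisy_probe)
```

This silences the console without discarding the log, which remains available for debugging failed builds.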
2013-11-27 13:44 GMT-05:00 Fr?d?ric Bastien : > Hi, > > After more investigation, I found that there already exist a way to > suppress those message on posix system. So I reused it in the PR. That > way, it was faster, but prevent change in that area. So there is less > change of breaking other syste: > > https://github.com/numpy/numpy/pull/4081 > > > But it remove the stdout when we run this command: > > numpy.distutils.system_info.get_info("blas_opt") > > But during compilation, we still have the info about what is found: > > atlas_blas_threads_info: > Setting PTATLAS=ATLAS > Setting PTATLAS=ATLAS > customize Gnu95FCompiler > Found executable /usr/bin/gfortran > customize Gnu95FCompiler > customize Gnu95FCompiler using config > compiling '_configtest.c': > > /* This file is generated from numpy/distutils/system_info.py */ > void ATL_buildinfo(void); > int main(void) { > ATL_buildinfo(); > return 0; > } > > C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O2 -fPIC > > compile options: '-c' > gcc: _configtest.c > gcc -pthread _configtest.o -L/usr/lib64/atlas -lptf77blas -lptcblas > -latlas -o _configtest > success! 
> removing: _configtest.c _configtest.o _configtest > Setting PTATLAS=ATLAS > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include'] > > FOUND: > libraries = ['ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = c > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include'] > > non-existing path in 'numpy/lib': 'benchmarks' > lapack_opt_info: > lapack_mkl_info: > mkl_info: > libraries mkl,vml,guide not found in > ['/opt/lisa/os_v2/common/Canopy_64bit/User/lib', '/usr/local/lib64', > '/usr/local/lib', '/usr/lib64', '/usr/lib'] > NOT AVAILABLE > > NOT AVAILABLE > > atlas_threads_info: > Setting PTATLAS=ATLAS > libraries ptf77blas,ptcblas,atlas not found in > /opt/lisa/os_v2/common/Canopy_64bit/User/lib > libraries lapack_atlas not found in > /opt/lisa/os_v2/common/Canopy_64bit/User/lib > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib64 > libraries lapack_atlas not found in /usr/local/lib64 > libraries ptf77blas,ptcblas,atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/local/lib > libraries lapack_atlas not found in /usr/lib64/atlas > numpy.distutils.system_info.atlas_threads_info > Setting PTATLAS=ATLAS > Setting PTATLAS=ATLAS > FOUND: > libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = f77 > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include'] > > FOUND: > libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] > library_dirs = ['/usr/lib64/atlas'] > language = f77 > define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] > include_dirs = ['/usr/include'] > > Fr?d?ric > > On Fri, Nov 22, 2013 at 4:26 PM, Fr?d?ric Bastien wrote: > > I didn't forgot this, but I got side tracked. 
Here is the Theano code > > I would like to try to use to replace os.system: > > > > https://github.com/Theano/Theano/blob/master/theano/misc/windows.py > > > > But I won't be able to try this before next week. > > > > Fred > > > > On Fri, Nov 15, 2013 at 5:49 PM, David Cournapeau > wrote: > >> > >> > >> > >> On Fri, Nov 15, 2013 at 7:41 PM, Robert Kern > wrote: > >>> > >>> On Fri, Nov 15, 2013 at 7:28 PM, David Cournapeau > >>> wrote: > >>> > > >>> > On Fri, Nov 15, 2013 at 6:21 PM, Charles R Harris > >>> > wrote: > >>> > >>> >> Sure, give it a shot. Looks like subprocess.Popen was intended to > >>> >> replace os.system in any case. > >>> > > >>> > Except that output is not 'real time' with straight Popen, and doing > so > >>> > reliably on every platform (cough - windows - cough) is not > completely > >>> > trivial. You also have to handle buffered output, etc... That code > is very > >>> > fragile, so this would be quite a lot of testing to change, and I am > not > >>> > sure it worths it. > >>> > >>> It doesn't have to be "real time". Just use .communicate() and print > out > >>> the stdout and stderr to their appropriate streams after the subprocess > >>> finishes. > >> > >> > >> Indeed, it does not have to be, but that's useful for debugging > compilation > >> issues (not so much for numpy itself, but for some packages which have > files > >> that takes a very long time to build, like scipy.sparsetools or > bottleneck). > >> > >> That's a minor point compared to the potential issues when building on > >> windows, though. 
> >> > >> David > >> > >> _______________________________________________ > >> NumPy-Discussion mailing list > >> NumPy-Discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Thu Feb 27 15:05:11 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Thu, 27 Feb 2014 21:05:11 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 Message-ID: <530F9A77.1090004@googlemail.com> hi, We want to start preparing the release candidate for the bugfix release 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow. So if you want a certain issue fixed please scream now or better create a pull request/patch on the maintenance/1.8.x branch. Please only consider bugfixes, no enhancements (unless they are really really simple), new features or invasive changes. I just finished my list of issues I want backported to numpy 1.8 (gh-4390, gh-4388). Please check if its already included in these PRs. I'm probably still going to add gh-4284 after some though tomorrow. Cheers, Julian From alan.isaac at gmail.com Thu Feb 27 21:11:33 2014 From: alan.isaac at gmail.com (Alan G Isaac) Date: Thu, 27 Feb 2014 21:11:33 -0500 Subject: [Numpy-discussion] invert bincounts? Message-ID: <530FF055.8050202@gmail.com> I have a bincount array `cts`. I'd like to produce any one array `a` such that `cts==np.bincounts(a)`. Easy to do in a loop, but does NumPy offer a better (i.e., faster) way? Thanks, Alan Isaac From jaime.frio at gmail.com Thu Feb 27 22:01:41 2014 From: jaime.frio at gmail.com (=?ISO-8859-1?Q?Jaime_Fern=E1ndez_del_R=EDo?=) Date: Thu, 27 Feb 2014 19:01:41 -0800 Subject: [Numpy-discussion] invert bincounts? 
In-Reply-To: <530FF055.8050202@gmail.com> References: <530FF055.8050202@gmail.com> Message-ID: On Thu, Feb 27, 2014 at 6:11 PM, Alan G Isaac wrote: > I have a bincount array `cts`. > I'd like to produce any one array `a` such that `cts==np.bincounts(a)`. > Easy to do in a loop, but does NumPy offer a better (i.e., faster) way? > >>> cts = np.bincount([1,1,2,3,4,4,6]) >>> np.repeat(np.arange(len(cts)), cts) array([1, 1, 2, 3, 4, 4, 6]) Jaime -- (\__/) ( O.o) ( > <) Este es Conejo. Copia a Conejo en tu firma y ay?dale en sus planes de dominaci?n mundial. -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Fri Feb 28 00:49:59 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Thu, 27 Feb 2014 23:49:59 -0600 Subject: [Numpy-discussion] ndarray is not a sequence Message-ID: Hello All, The semantics of this seem quite insane to me: In [1]: import numpy as np In [2]: import collections In [4]: isinstance(np.arange(5), collections.Sequence) Out[4]: False In [6]: np.version.full_version Out[6]: '1.9.0.dev-eb40f65' Is there any possibility that ndarray could inherit (in the last place) from collections.Sequence? It seems like this would only be a 1 - 5 line fix somewhere. I just spent a few hours tracking down a bug related to this. Thanks for considering! Be Well Anthony -------------- next part -------------- An HTML attachment was scrubbed... 
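A quick sketch of the check Anthony describes, together with the registration route the replies that follow discuss: abstract base classes support "virtual" registration, so the `isinstance` test can be made to pass without `ndarray` inheriting from anything. This is done here as a user-side experiment, not a change to NumPy itself (note that `Sequence` lives in `collections.abc` on modern Pythons):

```python
import collections.abc
import numpy as np

a = np.arange(5)

# Out of the box the check fails, just as reported.
before = isinstance(a, collections.abc.Sequence)

# Virtual registration: no subclassing, no per-array overhead --
# isinstance()/issubclass() simply start answering True.
collections.abc.Sequence.register(np.ndarray)
after = isinstance(a, collections.abc.Sequence)

# Caveat: registration only satisfies the isinstance check; ndarray does
# not thereby grow Sequence's index() and count() methods.
```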
URL: From sturla.molden at gmail.com Fri Feb 28 03:47:26 2014 From: sturla.molden at gmail.com (Sturla Molden) Date: Fri, 28 Feb 2014 08:47:26 +0000 (UTC) Subject: [Numpy-discussion] ndarray is not a sequence References: Message-ID: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> Anthony Scopatz wrote: > Hello All, > > The semantics of this seem quite insane to me: > > In [1]: import numpy as np > > In [2]: import collections > > In [4]: isinstance(np.arange(5), collections.Sequence) Out[4]: False > > In [6]: np.version.full_version > Out[6]: '1.9.0.dev-eb40f65' > > Is there any possibility that ndarray could inherit (in the last place) > from collections.Sequence? It seems like this would only be a 1 - 5 line > fix somewhere. I just spent a few hours tracking down a bug related to > this. Thanks for considering! > This should be very easy to do. But what would this give us, and what would the extra overhead be? collections.Sequence is basically an abstract base class. If this just slows down ndarray it would be highly undesirable. Note that ndarray has a very specific use (numerical computing). If inheriting collections.Sequence has no benefit for numerical computing it is just wasteful overhead. In this respect ndarray is very different from other Python containers in that they have no specific use and computational performance is not a big issue. 
Sturla From sebastian at sipsolutions.net Fri Feb 28 04:03:42 2014 From: sebastian at sipsolutions.net (Sebastian Berg) Date: Fri, 28 Feb 2014 10:03:42 +0100 Subject: [Numpy-discussion] ndarray is not a sequence In-Reply-To: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> References: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> Message-ID: <1393578222.6392.2.camel@sebastian-t440> On Fr, 2014-02-28 at 08:47 +0000, Sturla Molden wrote: > Anthony Scopatz wrote: > > Hello All, > > > > The semantics of this seem quite insane to me: > > > > In [1]: import numpy as np > > > > In [2]: import collections > > > > In [4]: isinstance(np.arange(5), collections.Sequence) Out[4]: False > > > > In [6]: np.version.full_version > > Out[6]: '1.9.0.dev-eb40f65' > > > > Is there any possibility that ndarray could inherit (in the last place) > > from collections.Sequence? It seems like this would only be a 1 - 5 line > > fix somewhere. I just spent a few hours tracking down a bug related to > > this. Thanks for considering! > > > > This should be very easy to do. But what would this give us, and what would > the extra overhead be? collections.Sequence is basically an abstract base > class. If this just slows down ndarray it would be highly undesirable. Note > that ndarray has a very specific use (numerical computing). If inheriting > collections.Sequence has no benefit for numerical computing it is just > wasteful overhead. In this resepect ndarray is very different for other > Python containers in that they have no specific use and computational > performance is not a big issue. > There is no overhead for the array itself. The biggest concern is about corner cases like 0-d arrays. That said we probably need to do it anyway because the sequence check like that seems standard in python 3. There is an issue about it open on github with some discussion about this issue. 
- Sebastian > Sturla > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From robert.kern at gmail.com Fri Feb 28 05:59:31 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 28 Feb 2014 10:59:31 +0000 Subject: [Numpy-discussion] ndarray is not a sequence In-Reply-To: <1393578222.6392.2.camel@sebastian-t440> References: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> <1393578222.6392.2.camel@sebastian-t440> Message-ID: On Fri, Feb 28, 2014 at 9:03 AM, Sebastian Berg wrote: > On Fr, 2014-02-28 at 08:47 +0000, Sturla Molden wrote: >> Anthony Scopatz wrote: >> > Hello All, >> > >> > The semantics of this seem quite insane to me: >> > >> > In [1]: import numpy as np >> > >> > In [2]: import collections >> > >> > In [4]: isinstance(np.arange(5), collections.Sequence) Out[4]: False >> > >> > In [6]: np.version.full_version >> > Out[6]: '1.9.0.dev-eb40f65' >> > >> > Is there any possibility that ndarray could inherit (in the last place) >> > from collections.Sequence? It seems like this would only be a 1 - 5 line >> > fix somewhere. I just spent a few hours tracking down a bug related to >> > this. Thanks for considering! >> >> This should be very easy to do. But what would this give us, and what would >> the extra overhead be? collections.Sequence is basically an abstract base >> class. If this just slows down ndarray it would be highly undesirable. Note >> that ndarray has a very specific use (numerical computing). If inheriting >> collections.Sequence has no benefit for numerical computing it is just >> wasteful overhead. In this resepect ndarray is very different for other >> Python containers in that they have no specific use and computational >> performance is not a big issue. > > There is no overhead for the array itself. 
Right, since it's an abstract base class, we don't need to subclass from Sequence, just register ndarray with it. > The biggest concern is about > corner cases like 0-d arrays. I think it's reasonable to allow it. The pre-ABC way to check this kind of thing also gives a false positive on 0-d arrays, so we're not regressing. [~] |1> import operator [~] |2> operator.isSequenceType(np.array(5)) True > That said we probably need to do it anyway > because the sequence check like that seems standard in python 3. There > is an issue about it open on github with some discussion about this > issue. https://github.com/numpy/numpy/issues/2776 Also, while we're doing this, we should also register the scalar types with their appropriate ABCs: numbers.Real.register(np.floating) numbers.Integral.register(np.integer) numbers.Complex.register(np.complexfloating) -- Robert Kern From francesc at continuum.io Fri Feb 28 06:41:00 2014 From: francesc at continuum.io (Francesc Alted) Date: Fri, 28 Feb 2014 12:41:00 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: <530F9A77.1090004@googlemail.com> References: <530F9A77.1090004@googlemail.com> Message-ID: <531075CC.6060006@continuum.io> Hi Julian, Any chance that NPY_MAXARGS could be increased to something more than the current value of 32? There is a discussion about this in: https://github.com/numpy/numpy/pull/226 but I think that, as Charles was suggesting, just increasing NPY_MAXARGS to something more reasonable (say 256) should be enough for a long while. This issue limits quite a bit the number of operands in numexpr expressions, and hence in other projects that depend on it, like PyTables or pandas. See for example this bug report: https://github.com/PyTables/PyTables/issues/286 Thanks, Francesc On 2/27/14, 9:05 PM, Julian Taylor wrote: > hi, > > We want to start preparing the release candidate for the bugfix release > 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow. 
> > So if you want a certain issue fixed please scream now or better create > a pull request/patch on the maintenance/1.8.x branch. > Please only consider bugfixes, no enhancements (unless they are really > really simple), new features or invasive changes. > > I just finished my list of issues I want backported to numpy 1.8 > (gh-4390, gh-4388). Please check if it's already included in these PRs. > I'm probably still going to add gh-4284 after some thought tomorrow. > > Cheers, > Julian > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From jtaylor.debian at googlemail.com Fri Feb 28 07:09:49 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 28 Feb 2014 13:09:49 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: <531075CC.6060006@continuum.io> References: <530F9A77.1090004@googlemail.com> <531075CC.6060006@continuum.io> Message-ID: <53107C8D.2040803@googlemail.com> hm, increasing it for PyArrayMapIterObject would break the public ABI. While nobody should be using this part of the ABI it's not appropriate for a bugfix release. Note that as it currently stands in numpy 1.9.dev we will break this ABI for the indexing improvements. Though for nditer and some other functions we could change it if that's enough. It would bump some temporary arrays of nditer from 32kb to 128kb, I think that would still be fine, but getting to the point where we should move them onto the heap. On 28.02.2014 12:41, Francesc Alted wrote: > Hi Julian, > > Any chance that NPY_MAXARGS could be increased to something more than > the current value of 32? There is a discussion about this in: > > https://github.com/numpy/numpy/pull/226 > > but I think that, as Charles was suggesting, just increasing NPY_MAXARGS > to something more reasonable (say 256) should be enough for a long while. 
> > This issue limits quite a bit the number of operands in numexpr > expressions, and hence, to other projects that depend on it, like > PyTables or pandas. See for example this bug report: > > https://github.com/PyTables/PyTables/issues/286 > > Thanks, > Francesc > > On 2/27/14, 9:05 PM, Julian Taylor wrote: >> hi, >> >> We want to start preparing the release candidate for the bugfix release >> 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow. >> >> So if you want a certain issue fixed please scream now or better create >> a pull request/patch on the maintenance/1.8.x branch. >> Please only consider bugfixes, no enhancements (unless they are really >> really simple), new features or invasive changes. >> >> I just finished my list of issues I want backported to numpy 1.8 >> (gh-4390, gh-4388). Please check if it's already included in these PRs. >> I'm probably still going to add gh-4284 after some thought tomorrow. >> >> Cheers, >> Julian >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From francesc at continuum.io Fri Feb 28 07:32:23 2014 From: francesc at continuum.io (Francesc Alted) Date: Fri, 28 Feb 2014 13:32:23 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: <53107C8D.2040803@googlemail.com> References: <530F9A77.1090004@googlemail.com> <531075CC.6060006@continuum.io> <53107C8D.2040803@googlemail.com> Message-ID: <531081D7.2010809@continuum.io> Well, what numexpr is using is basically NpyIter_AdvancedNew: https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178 and nothing else. If NPY_MAXARGS could be increased just for that, and without ABI breaking, then fine. If not, we will have to wait until 1.9, I am afraid. 
On the other hand, increasing the temporary arrays in nditer from 32kb to 128kb is a bit worrying, but probably we should do some benchmarks and see how much performance would be compromised (if any). Francesc On 2/28/14, 1:09 PM, Julian Taylor wrote: > hm increasing it for PyArrayMapIterObject would break the public ABI. > While nobody should be using this part of the ABI its not appropriate > for a bugfix release. > Note that as it currently stands in numpy 1.9.dev we will break this ABI > for the indexing improvements. > > Though for nditer and some other functions we could change it if thats > enough. > It would bump some temporary arrays of nditer from 32kb to 128kb, I > think that would still be fine, but getting to the point where we should > move them onto the heap. > > On 28.02.2014 12:41, Francesc Alted wrote: >> Hi Julian, >> >> Any chance that NPY_MAXARGS could be increased to something more than >> the current value of 32? There is a discussion about this in: >> >> https://github.com/numpy/numpy/pull/226 >> >> but I think that, as Charles was suggesting, just increasing NPY_MAXARGS >> to something more reasonable (say 256) should be enough for a long while. >> >> This issue limits quite a bit the number of operands in numexpr >> expressions, and hence, to other projects that depends on it, like >> PyTables or pandas. See for example this bug report: >> >> https://github.com/PyTables/PyTables/issues/286 >> >> Thanks, >> Francesc >> >> On 2/27/14, 9:05 PM, Julian Taylor wrote: >>> hi, >>> >>> We want to start preparing the release candidate for the bugfix release >>> 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow. >>> >>> So if you want a certain issue fixed please scream now or better create >>> a pull request/patch on the maintenance/1.8.x branch. >>> Please only consider bugfixes, no enhancements (unless they are really >>> really simple), new features or invasive changes. 
>>> >>> I just finished my list of issues I want backported to numpy 1.8 >>> (gh-4390, gh-4388). Please check if its already included in these PRs. >>> I'm probably still going to add gh-4284 after some though tomorrow. >>> >>> Cheers, >>> Julian >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -- Francesc Alted From jtaylor.debian at googlemail.com Fri Feb 28 07:52:18 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Fri, 28 Feb 2014 13:52:18 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: <531081D7.2010809@continuum.io> References: <530F9A77.1090004@googlemail.com> <531075CC.6060006@continuum.io> <53107C8D.2040803@googlemail.com> <531081D7.2010809@continuum.io> Message-ID: <53108682.1060906@googlemail.com> performance should not be impacted as long as we stay on the stack, it just increases offset of a stack pointer a bit more. E.g. nditer and einsum use temporary stack arrays of this type for its initialization: op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS]; // both 32 currently The resulting nditer structure is then in heap space and dependent on the real amount of arguments it got. So I'm more worried about running out of stack space, though the limit is usually 8mb so taking 128kb for a short while should be ok. On 28.02.2014 13:32, Francesc Alted wrote: > Well, what numexpr is using is basically NpyIter_AdvancedNew: > > https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178 > > and nothing else. If NPY_MAXARGS could be increased just for that, and > without ABI breaking, then fine. If not, we should have to wait until > 1.9 I am afraid. 
> > On the other hand, increasing the temporary arrays in nditer from 32kb > to 128kb is a bit worrying, but probably we should do some benchmarks > and see how much performance would be compromised (if any). > > Francesc > > On 2/28/14, 1:09 PM, Julian Taylor wrote: >> hm increasing it for PyArrayMapIterObject would break the public ABI. >> While nobody should be using this part of the ABI its not appropriate >> for a bugfix release. >> Note that as it currently stands in numpy 1.9.dev we will break this ABI >> for the indexing improvements. >> >> Though for nditer and some other functions we could change it if thats >> enough. >> It would bump some temporary arrays of nditer from 32kb to 128kb, I >> think that would still be fine, but getting to the point where we should >> move them onto the heap. >> >> On 28.02.2014 12:41, Francesc Alted wrote: >>> Hi Julian, >>> >>> Any chance that NPY_MAXARGS could be increased to something more than >>> the current value of 32? There is a discussion about this in: >>> >>> https://github.com/numpy/numpy/pull/226 >>> >>> but I think that, as Charles was suggesting, just increasing NPY_MAXARGS >>> to something more reasonable (say 256) should be enough for a long while. >>> >>> This issue limits quite a bit the number of operands in numexpr >>> expressions, and hence, to other projects that depends on it, like >>> PyTables or pandas. See for example this bug report: >>> >>> https://github.com/PyTables/PyTables/issues/286 >>> >>> Thanks, >>> Francesc >>> >>> On 2/27/14, 9:05 PM, Julian Taylor wrote: >>>> hi, >>>> >>>> We want to start preparing the release candidate for the bugfix release >>>> 1.8.1rc1 this weekend, I'll start preparing the changelog tomorrow. >>>> >>>> So if you want a certain issue fixed please scream now or better create >>>> a pull request/patch on the maintenance/1.8.x branch. >>>> Please only consider bugfixes, no enhancements (unless they are really >>>> really simple), new features or invasive changes. 
>>>> >>>> I just finished my list of issues I want backported to numpy 1.8 >>>> (gh-4390, gh-4388). Please check if its already included in these PRs. >>>> I'm probably still going to add gh-4284 after some though tomorrow. >>>> >>>> Cheers, >>>> Julian >>>> _______________________________________________ >>>> NumPy-Discussion mailing list >>>> NumPy-Discussion at scipy.org >>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From charlesr.harris at gmail.com Fri Feb 28 09:00:57 2014 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 28 Feb 2014 07:00:57 -0700 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: <53108682.1060906@googlemail.com> References: <530F9A77.1090004@googlemail.com> <531075CC.6060006@continuum.io> <53107C8D.2040803@googlemail.com> <531081D7.2010809@continuum.io> <53108682.1060906@googlemail.com> Message-ID: On Fri, Feb 28, 2014 at 5:52 AM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > performance should not be impacted as long as we stay on the stack, it > just increases offset of a stack pointer a bit more. > E.g. nditer and einsum use temporary stack arrays of this type for its > initialization: > op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS]; // both 32 currently > The resulting nditer structure is then in heap space and dependent on > the real amount of arguments it got. > So I'm more worried about running out of stack space, though the limit > is usually 8mb so taking 128kb for a short while should be ok. > > On 28.02.2014 13:32, Francesc Alted wrote: > > Well, what numexpr is using is basically NpyIter_AdvancedNew: > > > > > https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178 > > > > and nothing else. 
If NPY_MAXARGS could be increased just for that, and > > without ABI breaking, then fine. If not, we should have to wait until > > 1.9 I am afraid. > > > > On the other hand, increasing the temporary arrays in nditer from 32kb > > to 128kb is a bit worrying, but probably we should do some benchmarks > > and see how much performance would be compromised (if any). > > > > Francesc > > > > On 2/28/14, 1:09 PM, Julian Taylor wrote: > >> hm increasing it for PyArrayMapIterObject would break the public ABI. > >> While nobody should be using this part of the ABI its not appropriate > >> for a bugfix release. > >> Note that as it currently stands in numpy 1.9.dev we will break this ABI > >> for the indexing improvements. > >> > >> Though for nditer and some other functions we could change it if thats > >> enough. > >> It would bump some temporary arrays of nditer from 32kb to 128kb, I > >> think that would still be fine, but getting to the point where we should > >> move them onto the heap. > These sort of changes can have subtle side effects and need lots of testing in a release cycle. Bugfix release cycles are kept short by restricting changes to those that look simple and safe. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ghisvail at gmail.com Fri Feb 28 09:31:11 2014 From: ghisvail at gmail.com (Ghislain Vaillant) Date: Fri, 28 Feb 2014 14:31:11 +0000 Subject: [Numpy-discussion] proper way to test Numpy version in C/C++ Message-ID: Hi everyone, I have got code for some python wrappers of a scientific library which needs to support both Numpy 1.6 and later versions. The build of the wrapper (using swig) stopped working because of the deprecated API introduced in v1.7. The error only concerns the renaming of some macros from NPY_XXX to NPY_ARRAY_XXX. I was thinking to just check for the Numpy version at build time and add corresponding #define to provide the necessary renaming in case the build is done with Numpy v1.6. 
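Something along those lines in the setup script is what I have in mind (just a sketch; the MYPKG_NPY_ARRAY_COMPAT macro name is invented here, and the C side would #define the NPY_ARRAY_* names in terms of the old NPY_* ones whenever it is set):

```python
# Sketch only: decide at build time whether the renaming shim is needed.
# MYPKG_NPY_ARRAY_COMPAT is a made-up macro name for illustration.

def npy_compat_macros(numpy_version):
    """Extra define_macros entries for building against the given
    NumPy version string (pre-1.7 needs the NPY_ARRAY_* renaming shim)."""
    major, minor = (int(part) for part in numpy_version.split(".")[:2])
    if (major, minor) < (1, 7):
        return [("MYPKG_NPY_ARRAY_COMPAT", "1")]
    return []

# e.g. Extension(..., define_macros=npy_compat_macros(numpy.__version__))
print(npy_compat_macros("1.6.2"))  # [('MYPKG_NPY_ARRAY_COMPAT', '1')]
print(npy_compat_macros("1.8.0"))  # []
```

I believe the headers also expose an NPY_API_VERSION macro in numpy/numpyconfig.h, which might allow doing the same check purely in the C preprocessor instead.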
How can I robustly test for Numpy's version API in C ? Ghis -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Fri Feb 28 10:10:41 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Fri, 28 Feb 2014 09:10:41 -0600 Subject: [Numpy-discussion] ndarray is not a sequence In-Reply-To: References: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> <1393578222.6392.2.camel@sebastian-t440> Message-ID: Thanks All, I am sorry I missed the issue. (I still can't seem to find it, actually.) I agree that there would be minimal overhead here and I bet that would be easy to show. I really look forward to seeing this get in! Be Well Anthony On Fri, Feb 28, 2014 at 4:59 AM, Robert Kern wrote: > On Fri, Feb 28, 2014 at 9:03 AM, Sebastian Berg > wrote: > > On Fr, 2014-02-28 at 08:47 +0000, Sturla Molden wrote: > >> Anthony Scopatz wrote: > >> > Hello All, > >> > > >> > The semantics of this seem quite insane to me: > >> > > >> > In [1]: import numpy as np > >> > > >> > In [2]: import collections > >> > > >> > In [4]: isinstance(np.arange(5), collections.Sequence) Out[4]: False > >> > > >> > In [6]: np.version.full_version > >> > Out[6]: '1.9.0.dev-eb40f65' > >> > > >> > Is there any possibility that ndarray could inherit (in the last > place) > >> > from collections.Sequence? It seems like this would only be a 1 - 5 > line > >> > fix somewhere. I just spent a few hours tracking down a bug related > to > >> > this. Thanks for considering! > >> > >> This should be very easy to do. But what would this give us, and what > would > >> the extra overhead be? collections.Sequence is basically an abstract > base > >> class. If this just slows down ndarray it would be highly undesirable. > Note > >> that ndarray has a very specific use (numerical computing). If > inheriting > >> collections.Sequence has no benefit for numerical computing it is just > >> wasteful overhead. 
In this respect ndarray is very different from other > >> Python containers in that they have no specific use and computational > >> performance is not a big issue. > > > > There is no overhead for the array itself. > > Right, since it's an abstract base class, we don't need to subclass > from Sequence, just register ndarray with it. > > > The biggest concern is about > > corner cases like 0-d arrays. > > I think it's reasonable to allow it. The pre-ABC way to check this > kind of thing also gives a false positive on 0-d arrays, so we're not > regressing. > > [~] > |1> import operator > > [~] > |2> operator.isSequenceType(np.array(5)) > True > > > That said we probably need to do it anyway > > because the sequence check like that seems standard in python 3. There > > is an issue about it open on github with some discussion about this > > issue. > > https://github.com/numpy/numpy/issues/2776 > > Also, while we're doing this, we should also register the scalar types > with their appropriate ABCs: > > numbers.Real.register(np.floating) > numbers.Integral.register(np.integer) > numbers.Complex.register(np.complexfloating) > > -- > Robert Kern > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From robert.kern at gmail.com Fri Feb 28 10:23:11 2014 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 28 Feb 2014 15:23:11 +0000 Subject: [Numpy-discussion] ndarray is not a sequence In-Reply-To: References: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> <1393578222.6392.2.camel@sebastian-t440> Message-ID: On Fri, Feb 28, 2014 at 3:10 PM, Anthony Scopatz wrote: > Thanks All, > > I am sorry I missed the issue. (I still can't seem to find it, actually.) https://github.com/numpy/numpy/issues/2776 > I agree that there would be minimal overhead here and I bet that would be > easy to show. I really look forward to seeing this get in! There is *no* overhead to ndarray because you don't actually subclass. -- Robert Kern From chris.barker at noaa.gov Fri Feb 28 10:34:21 2014 From: chris.barker at noaa.gov (Chris Barker - NOAA Federal) Date: Fri, 28 Feb 2014 07:34:21 -0800 Subject: [Numpy-discussion] ndarray is not a sequence In-Reply-To: <1393578222.6392.2.camel@sebastian-t440> References: <780064138415269619.294722sturla.molden-gmail.com@news.gmane.org> <1393578222.6392.2.camel@sebastian-t440> Message-ID: <-5567695652207982944@unknownmsgid> On Feb 28, 2014, at 1:04 AM, Sebastian Berg wrote: > > because the sequence check like that seems standard in python 3. Whatever happened to duck typing? Sigh. 
-Chris From francesc at continuum.io Fri Feb 28 11:17:10 2014 From: francesc at continuum.io (Francesc Alted) Date: Fri, 28 Feb 2014 17:17:10 +0100 Subject: [Numpy-discussion] last call for fixes for numpy 1.8.1rc1 In-Reply-To: References: <530F9A77.1090004@googlemail.com> <531075CC.6060006@continuum.io> <53107C8D.2040803@googlemail.com> <531081D7.2010809@continuum.io> <53108682.1060906@googlemail.com> Message-ID: <5310B686.1050802@continuum.io> On 2/28/14, 3:00 PM, Charles R Harris wrote: > > > > On Fri, Feb 28, 2014 at 5:52 AM, Julian Taylor > > > wrote: > > performance should not be impacted as long as we stay on the stack, it > just increases offset of a stack pointer a bit more. > E.g. nditer and einsum use temporary stack arrays of this type for its > initialization: > op_axes_arrays[NPY_MAXARGS][NPY_MAXDIMS]; // both 32 currently > The resulting nditer structure is then in heap space and dependent on > the real amount of arguments it got. > So I'm more worried about running out of stack space, though the limit > is usually 8mb so taking 128kb for a short while should be ok. > > On 28.02.2014 13:32, Francesc Alted wrote: > > Well, what numexpr is using is basically NpyIter_AdvancedNew: > > > > > https://github.com/pydata/numexpr/blob/master/numexpr/interpreter.cpp#L1178 > > > > and nothing else. If NPY_MAXARGS could be increased just for > that, and > > without ABI breaking, then fine. If not, we should have to wait > until > > 1.9 I am afraid. > > > > On the other hand, increasing the temporary arrays in nditer > from 32kb > > to 128kb is a bit worrying, but probably we should do some > benchmarks > > and see how much performance would be compromised (if any). > > > > Francesc > > > > On 2/28/14, 1:09 PM, Julian Taylor wrote: > >> hm increasing it for PyArrayMapIterObject would break the > public ABI. > >> While nobody should be using this part of the ABI its not > appropriate > >> for a bugfix release. 
> >> Note that as it currently stands in numpy 1.9.dev we will break > this ABI > >> for the indexing improvements. > >> > >> Though for nditer and some other functions we could change it > if that's > >> enough. > >> It would bump some temporary arrays of nditer from 32kb to 128kb, I > >> think that would still be fine, but getting to the point where > we should > >> move them onto the heap. > > > These sorts of changes can have subtle side effects and need lots of > testing in a release cycle. Bugfix release cycles are kept short by > restricting changes to those that look simple and safe. Agreed. I have just opened a ticket for having this in mind for NumPy 1.9: https://github.com/numpy/numpy/issues/4398 -- Francesc Alted From chris.barker at noaa.gov Fri Feb 28 11:16:30 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 28 Feb 2014 08:16:30 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: References: <1393432035193-36655.post@n7.nabble.com> Message-ID: On Wed, Feb 26, 2014 at 2:48 PM, Matthew Brett wrote: > > Agreed. The trick is that it's reasonable for users of Apple's python > build > > to want this too -- but I don't know how we can hope to provide that. > > We don't support system python for the mpkg, so I think it's > reasonable to leave this little gift for our fellow python.org > friends. > I agree. We decided long ago on the pythonmac list that supporting Apple's python builds was a dead-end. > In that case, the OSX instructions could (within the next few months) > be as simple as: > > Install python from binary installer at python.org > curl -O https://raw.github.com/pypa/pip/master/contrib/get-pip.py > python get-pip.py > pip install scipy-stack > > or similar. yup -- that's the goal, and I think we really are close. Do pip and PyPI "officially" support meta-packages like that? i.e. I take it that scipy-stack would be a package that had nothing except dependencies. I do like the idea. Getting a bit OT for the numpy list, but... 
As I understand it, there was a stumbling block with wheels for the SciPy stack around IPython and readline -- some systems require the readline package, some don't. As I write this, I can't remember why it would be the least bit hard to simply require readline in the Mac wheels, but there was some debate on the IPython list, and it seemed not-so-easy to resolve... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... URL: From chris.barker at noaa.gov Fri Feb 28 11:20:31 2014 From: chris.barker at noaa.gov (Chris Barker) Date: Fri, 28 Feb 2014 08:20:31 -0800 Subject: [Numpy-discussion] 1.8.1 release In-Reply-To: <1393456207055-36662.post@n7.nabble.com> References: <1393432035193-36655.post@n7.nabble.com> <1393456207055-36662.post@n7.nabble.com> Message-ID: On Wed, Feb 26, 2014 at 3:10 PM, Tom Augspurger wrote: > Chris Barker - NOAA Federal wrote > > What python are you using? apparently not a Universal 32+64 bit build. The > > one Apple delivers? > > I'm using homebrew python, so the platform difference seems to have come > from there. and it _should_ -- in theory, and often in practice, you should be able to get packages for homebrew with either: brew install the_package_name or pip install the_package_name and let it compile away. But we wouldn't want a brew python to find a binary wheel built for the python.org python. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From gokhansever at gmail.com Fri Feb 28 18:32:27 2014 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 28 Feb 2014 18:32:27 -0500 Subject: [Numpy-discussion] 2D array indexing Message-ID: Hello, Given this simple 2D array: In [1]: np.arange(9).reshape((3,3)) Out[1]: array([[0, 1, 2], [3, 4, 5], [6, 7, 8]]) In [2]: a = np.arange(9).reshape((3,3)) In [3]: a[:1:] Out[3]: array([[0, 1, 2]]) In [4]: a[:1,:] Out[4]: array([[0, 1, 2]]) Could you tell me why the last two indexing expressions (note the comma!) result in the same array? Thanks. -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From jtaylor.debian at googlemail.com Fri Feb 28 18:48:28 2014 From: jtaylor.debian at googlemail.com (Julian Taylor) Date: Sat, 01 Mar 2014 00:48:28 +0100 Subject: [Numpy-discussion] 2D array indexing In-Reply-To: References: Message-ID: <5311204C.4070307@googlemail.com> On 01.03.2014 00:32, Gökhan Sever wrote: > > Hello, > > Given this simple 2D array: > > In [1]: np.arange(9).reshape((3,3)) > Out[1]: > array([[0, 1, 2], > [3, 4, 5], > [6, 7, 8]]) > > In [2]: a = np.arange(9).reshape((3,3)) > > In [3]: a[:1:] > Out[3]: array([[0, 1, 2]]) > > In [4]: a[:1,:] > Out[4]: array([[0, 1, 2]]) > > Could you tell me why the last two indexing expressions (note the comma!) result in > the same array? Thanks. > if you specify fewer indices than dimensions the latter dimensions are implicitly all selected. so these are identical for three dimensional arrays: d = np.ones((3,3,3)) d[1] d[1,:] d[1,:,:] d[1,...] (... 
or Ellipsis selects all remaining dimensions) this only applies to latter dimensions in the shape, if you want to select all earlier dimensions they have to be explicitly selected: d[:,1] == d[:,1,:] d[..., 1] == d[:,:,1] as for :1: vs 1:, it's standard Python rules: start:stop:step, with all three having defaults of 0:len(sequence):1 From gokhansever at gmail.com Fri Feb 28 19:54:40 2014 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_Sever?=) Date: Fri, 28 Feb 2014 19:54:40 -0500 Subject: [Numpy-discussion] 2D array indexing In-Reply-To: <5311204C.4070307@googlemail.com> References: <5311204C.4070307@googlemail.com> Message-ID: Thanks Julian. Mistakenly, I have (a[:1:] + a[:1,:])/2 type of construct somewhere in my code. It works fine; however, I wasn't sure if this is something leading to a wrong calculation. Now your explanation makes it clearer. On Fri, Feb 28, 2014 at 6:48 PM, Julian Taylor < jtaylor.debian at googlemail.com> wrote: > On 01.03.2014 00:32, Gökhan Sever wrote: > > > > Hello, > > > > Given this simple 2D array: > > > > In [1]: np.arange(9).reshape((3,3)) > > Out[1]: > > array([[0, 1, 2], > > [3, 4, 5], > > [6, 7, 8]]) > > > > In [2]: a = np.arange(9).reshape((3,3)) > > > > In [3]: a[:1:] > > Out[3]: array([[0, 1, 2]]) > > > > In [4]: a[:1,:] > > Out[4]: array([[0, 1, 2]]) > > > > Could you tell me why the last two indexing expressions (note the comma!) result in > > the same array? Thanks. > > > > > if you specify fewer indices than dimensions the latter dimensions are > implicitly all selected. > so these are identical for three dimensional arrays: > d = np.ones((3,3,3)) > d[1] > d[1,:] > d[1,:,:] > d[1,...] (... 
or Ellipsis selects all remaining dimensions) > > this only applies to latter dimensions in the shape, if you want to > select all earlier dimensions they have to be explicitly selected: > d[:,1] == d[:,1,:] > d[..., 1] == d[:,:,1] > > > as for :1: vs 1:, it's standard Python rules: start:stop:step, with all > three having defaults of 0:len(sequence):1 > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Gökhan -------------- next part -------------- An HTML attachment was scrubbed... URL: From scopatz at gmail.com Sun Feb 16 10:05:32 2014 From: scopatz at gmail.com (Anthony Scopatz) Date: Sun, 16 Feb 2014 15:05:32 -0000 Subject: [Numpy-discussion] ANN: SciPy 2014 Conference, July 6th - 12th, Austin, TX! Message-ID: Hello All! I am pleased to announce that *SciPy 2014*, the thirteenth annual *Scientific Computing with Python conference*, will be held this July 6th-12th in Austin, Texas. SciPy is a community dedicated to the advancement of scientific computing through open source Python software for mathematics, science, and engineering. The annual SciPy Conference allows participants from all types of organizations to showcase their latest projects, learn from skilled users and developers, and collaborate on code development. For more information please visit our website: https://conference.scipy.org/scipy2014/ This year the conference has been extended to include an additional day of presentations. During the presentation days SciPy is proud to host the following event and talk types: - Keynotes - Expert Panels - Short Talks - Poster Presentations - Birds of a Feather Sessions The full program will consist of two days of tutorials followed by three days of presentations, and concludes with two days of developer sprints on projects of interest to attendees. This year, we are excited to present a job fair for the first time! 
Specialized Tracks This year we are happy to announce two specialized tracks that run in parallel to the general conference: *Scientific Computing in Education* Thanks to efforts such as Software Carpentry, the Hacker Within and grassroots Python Bootcamps, teaching scientific computing as a discipline is becoming more widely accepted and recognized as a crucial task in developing scientific literacy. This special track will focus on efforts to promote and develop scientific computing education, as well as related topics such as reproducibility and best practices for scientific computing. *Geospatial Data in Science* Python has become a core component of organizing, understanding, and visualizing geospatial data. This track will focus on libraries, tools and techniques for processing geospatial data of all types and for all purposes -- from low-volume to high-volume, local and global. Domain-specific Mini-symposia Introduced in 2012, mini-symposia are held to discuss scientific computing applied to a specific scientific domain/industry during a half afternoon after the general conference. Their goal is to promote industry-specific libraries and tools, and gather people with similar interests for discussions. Mini-symposia on the following topics will take place this year: - Astronomy and astrophysics - Bioinformatics - Geophysics - Vision, Visualization, and Imaging - Computational Social Science and Digital Humanities - Engineering Tutorials Multiple interactive half-day tutorials will be taught by community experts. The tutorials provide conceptual and practical coverage of tools that have broad interest at both introductory and advanced levels. This year, a third track will be added, which specifically targets programmers with no prior knowledge of scientific Python. Developer Sprints A hackathon environment is set up for attendees to work on the core SciPy packages or their own personal projects. 
The conference is an opportunity for developers who are usually physically
separated to come together and engage in highly productive sessions. It is
also an occasion for new community members to introduce themselves and
receive tips from community experts. This year, some of the sprints will be
scheduled and announced ahead of the conference.

Birds-of-a-Feather (BOF) Sessions

Birds-of-a-Feather sessions are self-organized discussions that run in
parallel with the main conference. The BOF sessions cover primary,
tangential, or unrelated topics in an interactive, discussion setting. This
year, some of the BOF sessions will be scheduled and announced ahead of the
conference.

Important Dates

- March 14th: Presentation abstract, poster, and tutorial submission
  deadline. Application for sponsorship deadline.
- April 17th: Speakers selected
- April 22nd: Sponsorship acceptance deadline
- May 1st: Speaker schedule announced
- May 6th, or 150 registrants: Early-bird registration ends
- July 6-12th: 2 days of tutorials, 3 days of conference, 2 days of sprints

We look forward to a very exciting conference and hope to see you all in
Austin this summer!

The SciPy2014 Organizers
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From hoogendoorn.eelco at gmail.com  Fri Feb 28 07:00:19 2014
From: hoogendoorn.eelco at gmail.com (Eelco Hoogendoorn)
Date: Fri, 28 Feb 2014 04:00:19 -0800 (PST)
Subject: [Numpy-discussion] Pickling of memory aliasing patterns
Message-ID: <4c16d2ae-7af5-46bf-ad43-20c21aa6e592@googlegroups.com>

I have been working on a general function caching mechanism, and in doing
so I stumbled upon the following quirk:

@cached
def foo(a, b):
    b[0] = 1
    return a[0]

a = np.zeros(1)
b = a[:]
print foo(a, b)    # computes and returns 1
print foo(a, b)    # gets 1 from cache, as it should

a = np.zeros(1)    # no aliasing between inputs
b = np.zeros(1)
print foo(a, b)    # should compute and return 0, but instead gets 1 from cache

Fundamentally, this is because it turns out that the memory aliasing
patterns that arrays may have are lost during pickling. This leads me to
two questions:

1. Is this desirable behavior?
2. Is this preventable behavior?

It seems to me the answer to the first question is no, and the answer to
the second question is yes. Here is what I am using at the moment to
generate a correct hash under such circumstances; but unpickling along
these lines should be possible too, methinks. Or am I missing some
subtlety as to why something along these lines couldn't be the default
pickling behavior for numpy arrays?

class ndarray_own(object):
    def __init__(self, arr):
        self.buffer  = np.getbuffer(arr)
        self.dtype   = arr.dtype
        self.shape   = arr.shape
        self.strides = arr.strides

class ndarray_view(object):
    def __init__(self, arr):
        self.base    = arr.base
        # so we have a view; but where into the base is it?
        # (arr minus base, so the offset into the base is non-negative)
        self.offset  = arr.ctypes.data - self.base.ctypes.data
        self.dtype   = arr.dtype
        self.shape   = arr.shape
        self.strides = arr.strides

class NumpyDeterministicPickler(DeterministicPickler):
    """
    Special case for numpy.
    In general, external C objects may include internal state which does
    not serialize in a way we want it to; ndarray memory aliasing is one
    of those things.
    """
    def save(self, obj):
        """
        Remap a numpy array to a representation which conserves all
        semantically relevant information concerning memory aliasing.

        Note that this mapping is 'destructive': we will not get our
        original numpy arrays back after unpickling, not without custom
        deserialization code at least. But we don't care, since this is
        only meant to be used to obtain correct keying behavior; keys
        don't need to be deserialized.
        """
        if isinstance(obj, np.ndarray):
            if obj.flags.owndata:
                obj = ndarray_own(obj)
            else:
                obj = ndarray_view(obj)
        DeterministicPickler.save(self, obj)
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
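[Editor's note: the aliasing loss described in the message above can be reproduced with a short self-contained sketch, independent of any caching decorator. It uses only numpy and the standard pickle module; `np.shares_memory` is available in numpy 1.11 and later.]

```python
import pickle

import numpy as np

a = np.zeros(4)
b = a[:2]  # b is a view into a: the two arrays alias memory
assert np.shares_memory(a, b)

# Round-trip both arrays through pickle inside a single container.
a2, b2 = pickle.loads(pickle.dumps((a, b)))

# The aliasing pattern is lost: each unpickled array owns its own buffer,
# so writes to one no longer show up in the other.
assert not np.shares_memory(a2, b2)
b2[0] = 1.0
assert a2[0] == 0.0  # would have been 1.0 if the view relationship survived
```

Any key derived from the raw pickle bytes therefore treats the aliased pair and the independent pair identically, which is exactly why the cache in the first example returns a stale value.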