From valene at nag.co.uk Thu Jul 1 04:48:03 2010 From: valene at nag.co.uk (Valene) Date: Thu, 1 Jul 2010 08:48:03 +0000 (UTC) Subject: [Numpy-discussion] building numpy against Cray xt-libsci References: Message-ID:

Charles R Harris gmail.com> writes:
>
> On Mon, May 24, 2010 at 1:57 AM, Amir gmail.com> wrote:
> I am trying to build numpy against Cray's xt-libsci library on a Cray XT5. I am getting an error I am hoping for hints on how to resolve:
>
> In [1]: import numpy
>
>      20            isfinite, size
>      21 from numpy.lib import triu
> ---> 22 from numpy.linalg import lapack_lite
>      23 from numpy.matrixlib.defmatrix import matrix_power
>      24
>
> ImportError: /opt/xt-libsci/10.4.0/gnu/lib/libsci.so: undefined symbol: fftw_version
>
> These are the symbols in libsci:
>
> % nm /opt/xt-libsci/10.4.0/gnu/lib/libsci.so | grep fftw_version
> 00000000010f9a30 B __crafft_internal__crafft_fftw_version_num
>                  U fftw_version
> 00000000005aa8a4 T get_fftw_version
>
> I first built numpy with no custom site.cfg file. It built correctly and all tests ran. But it was too slow.
>
> Then I tried building numpy against libsci, which has BLAS, LAPACK, FFTW3 among other things. I had to build a libcblas.a from the netlib src as libsci does not have cblas (using gcc, gfortran 4.3.3). Here is my site.cfg, accumulated from several nice tutorials on how to build numpy on these machines, which for some reason don't work for me. The instructions were based on numpy 1.2.
>
> [blas]
> blas_libs = cblas
> library_dirs = /global/homes/amir/local/lib
>
> [lapack]
> lapack_libs = sci
> library_dirs = /opt/xt-libsci/10.4.0/gnu/lib
>
> [blas_opt]
> blas_libs = cblas, sci
> libraries = cblas, sci
>
> [lapack_opt]
> libraries = sci
>
> [fftw]
> libraries = fftw3
>
> Here is what is linked to lapack_lite.so:
>
> % ldd ./numpy/linalg/lapack_lite.so
> libsci.so => /opt/xt-libsci/10.4.0/gnu/lib/libsci.so (0x00002b4493325000)
> libgfortran.so.3 => /opt/gcc/4.3.3/snos/lib64/libgfortran.so.3 (0x00002b44a4579000)
> libm.so.6 => /lib64/libm.so.6 (0x00002b44a4770000)
> libgcc_s.so.1 => /opt/gcc/4.3.3/snos/lib64/libgcc_s.so.1 (0x00002b44a48c6000)
> libc.so.6 => /lib64/libc.so.6 (0x00002b44a49dd000)
> /lib64/ld-linux-x86-64.so.2 (0x0000555555554000)
>
> Curious, fftw shows up in numpy/distutils/system_info.py and f2py, but I think numpy/scipy no longer support fftw. Maybe we should get rid of the references? In any case, you can probably modify numpy/distutils/system_info.py to fix this problem, as it doesn't seem to show up on other systems. Chuck
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

Hi, I am trying to install Numpy-1.4.1 on a Cray XT4 with xt-libsci/10.4.1 and Python 2.6 with shared libraries. The Python installation went smoothly, but Numpy is a little harder. I simply followed the instructions: python setup.py fgfortran, build, install to install-dir.
No error, but when I test it by submitting a job I obtained:

import numpy.linalg.linalg # precompiled from /work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/linalg/linalg.pyc
dlopen("/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/linalg/lapack_lite.so", 2);
Traceback (most recent call last):
  File "python_script.py", line 3, in <module>
    import numpy
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/__init__.py", line 132, in <module>
    import add_newdocs
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/add_newdocs.py", line 9, in <module>
    from lib import add_newdoc
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/lib/__init__.py", line 13, in <module>
    from polynomial import *
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/lib/polynomial.py", line 17, in <module>
    from numpy.linalg import eigvals, lstsq
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/linalg/__init__.py", line 47, in <module>
    from linalg import *
  File "/work/z03/z03/valene/install-python/lib/python2.6/site-packages/numpy/linalg/linalg.py", line 22, in <module>
    from numpy.linalg import lapack_lite
ImportError: /opt/xt-libsci/10.4.1/gnu/lib/libsci.so: undefined symbol: fftw_version

I thought it was the same error but I don't see how to get rid of fftw in the system_info.py. Thanks

From renato.fabbri at gmail.com Thu Jul 1 05:17:50 2010 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Thu, 1 Jul 2010 06:17:50 -0300 Subject: [Numpy-discussion] sum up to a specific value Message-ID:

hi, i need to find which elements of an array sum up to a specific value.

any idea of how to do this?

best, rf

-- GNU/Linux User #479299 skype: fabbri.renato

From pav at iki.fi Thu Jul 1 05:22:27 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 1 Jul 2010 09:22:27 +0000 (UTC) Subject: [Numpy-discussion] sum up to a specific value References: Message-ID:

Thu, 01 Jul 2010 06:17:50 -0300, Renato Fabbri wrote:
> i need to find which elements of an array sum up to a specific value
>
> any idea of how to do this?

Sounds like the knapsack problem http://en.wikipedia.org/wiki/Knapsack_problem

From rsalvador.wk at gmail.com Thu Jul 1 07:05:41 2010 From: rsalvador.wk at gmail.com (Ruben Salvador) Date: Thu, 1 Jul 2010 13:05:41 +0200 Subject: [Numpy-discussion] numpy.load raising IOError but EOFError expected In-Reply-To: References: Message-ID:

Great! Thanks for all your answers!

I actually have the files created as .npy (appending a new array each time). I know it's weird, and it's not its intended use. But, for whatsoever reasons, I came to use that. No turning back now. Fortunately, I am able to read the files correctly, so being weird also, at least, it works. Repeating the tests would be very time consuming. I'll just try the different options mentioned for the following tests.

Anyway, I think this is a quite common situation. Tests running for a loooooong time, producing results at very different times (not necessarily huge amounts of data of results, it could be just a single float, or array), and repeating these tests a lot of times, makes it absolutely necessary to have numpyish functions/filetypes to APPEND freshly produced data each time they become available. Having to load a .npz file, adding the new data and saving again is wasting unnecessary resources.
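For concreteness, the append/read-back pattern I mean looks roughly like this (a minimal, untested sketch; stacking several arrays into one .npy file is not a documented use, so treat it as the hack it is, and the file name is made up):

import numpy as np

# appending: each np.save() call writes one complete array at the file's end
fh = open('results.npy', 'ab')
np.save(fh, np.array([1.0, 2.0]))  # stand-in for one freshly produced result
fh.close()

# reading back: each np.load() call consumes one array, until the file ends
arrays = []
fh = open('results.npy', 'rb')
while True:
    try:
        arrays.append(np.load(fh))
    except IOError:  # as this thread's subject says, numpy currently raises
        break        # IOError rather than EOFError at end-of-file
fh.close()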
Having a single file for each run of the test, though possible, for me, complicates the post-processing section, while increasing the time to copy these files (many small files tend to take longer to copy than one single bigger file). Why not just a modified .npy filetype/function with a header indicating it's hosting more than one array?? Cheers!

On Tue, Jun 29, 2010 at 12:43 AM, Friedrich Romstedt <friedrichromstedt at gmail.com> wrote:
> 2010/6/28 Keith Goodman :
> > How about using h5py? It's not part of numpy but it gives you a
> > dictionary-like interface to your archive:
>
> Yeaa, or PyTables (is that equivalent)? It's also a hdf (or whatever,
> I don't recall precisely) interface.
>
> There were [ANN]s on the list about PyTables.
>
> Friedrich
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- Rubén Salvador PhD student @ Centro de Electrónica Industrial (CEI) http://www.cei.upm.es Blog: http://aesatcei.wordpress.com -------------- next part -------------- An HTML attachment was scrubbed... URL:

From sole at esrf.fr Thu Jul 1 08:26:35 2010 From: sole at esrf.fr ("V. Armando Solé") Date: Thu, 01 Jul 2010 14:26:35 +0200 Subject: [Numpy-discussion] numpy.load raising IOError but EOFError expected In-Reply-To: References: Message-ID: <4C2C897B.1090703@esrf.fr>

Ruben Salvador wrote:
> Great! Thanks for all your answers!
>
> I actually have the files created as .npy (appending a new array each
> time). I know it's weird, and it's not its intended use. But, for
> whatsoever reasons, I came to use that. No turning back now.
>
> Fortunately, I am able to read the files correctly, so being weird
> also, at least, it works. Repeating the tests would be very time
> consuming. I'll just try the different options mentioned for the
> following tests.
>
> Anyway, I think this is a quite common situation. Tests running for a
> loooooong time, producing results at very different times (not
> necessarily huge amounts of data of results, it could be just a single
> float, or array), and repeating these tests a lot of times, makes it
> absolutely necessary to have numpyish functions/filetypes to APPEND
> freshly produced data each time they become available. Having to
> load a .npz file, adding the new data and saving again is wasting
> unnecessary resources. Having a single file for each run of the test,
> though possible, for me, complicates the post-processing section,
> while increasing the time to copy these files (many small files tend
> to take longer to copy than one single bigger file). Why not just a
> modified .npy filetype/function with a header indicating it's hosting
> more than one array??
>

Well, at our lab we are collecting images and saving them into HDF5 files. Since the files are self-describing it is quite convenient. You can decide if you want the images as individual arrays or stacked into a bigger one because you know it when you open the file. You can keep adding items at any time because HDF5 does not force you to specify the final size of the array, and you can access it like any numpy array without needing to load the whole array into memory nor being limited in memory on 32-bit machines. I am currently working on a 100 Gbyte array on a 32-bit machine without problems. Really, I would give HDF5 a try. In our case we are using h5py, but the latest release candidate of PyTables seems to have the same "numpy like" functionality.
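To give a feel for it, the appending pattern with h5py looks roughly like this (a minimal, untested sketch; the file and dataset names are invented):

import numpy as np
import h5py

f = h5py.File('results.h5', 'a')
if 'images' not in f:
    # unlimited first axis, so the dataset can grow with every new image
    dset = f.create_dataset('images', shape=(0, 64, 64),
                            maxshape=(None, 64, 64), dtype='f8')
else:
    dset = f['images']

img = np.random.rand(64, 64)             # stand-in for one newly acquired image
dset.resize((dset.shape[0] + 1, 64, 64))
dset[-1] = img                           # only this slice touches memory
f.close()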
Armando

From vincent at vincentdavis.net Thu Jul 1 09:52:23 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 1 Jul 2010 07:52:23 -0600 Subject: [Numpy-discussion] sum up to a specific value In-Reply-To: References: Message-ID:

On Thu, Jul 1, 2010 at 3:17 AM, Renato Fabbri wrote:
> hi,
> i need to find which elements of an array sum up to a specific value
>
> any idea of how to do this?

Not sure if there is a better way but a brute force way would be to

>>> a
array([[ 7.,  5.,  9.,  3.],
       [ 7.,  2.,  7.,  8.],
       [ 6.,  8.,  3.,  2.]])
>>> alist = a.flatten().tolist()
>>> alist
[7.0, 5.0, 9.0, 3.0, 7.0, 2.0, 7.0, 8.0, 6.0, 8.0, 3.0, 2.0]

import itertools

asolution = []
for r in range(1, len(alist) + 1):
    for comb in itertools.combinations(alist, r):
        if sum(comb) == TheValue:
            asolution.append(comb)

Now just find the comb values in the array.

Like I said kinda brute force. Also depends if you want all solutions or a solution.

Vincent

> best,
> rf
>
> --
> GNU/Linux User #479299
> skype: fabbri.renato
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From bsouthey at gmail.com Thu Jul 1 10:40:25 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 01 Jul 2010 09:40:25 -0500 Subject: [Numpy-discussion] Ticket #1223... In-Reply-To: References: Message-ID: <4C2CA8D9.1060005@gmail.com>

On 06/29/2010 11:38 PM, David Goldsmith wrote:
> On Tue, Jun 29, 2010 at 8:16 PM, Bruce Southey wrote:
>
> On Tue, Jun 29, 2010 at 6:03 PM, David Goldsmith
> wrote:
> > On Tue, Jun 29, 2010 at 3:56 PM, wrote:
> >>
> >> On Tue, Jun 29, 2010 at 6:37 PM, David Goldsmith
> >> wrote:
> >> > ...concerns the behavior of numpy.random.multivariate_normal; if that's
> >> > of
> >> > interest to you, I urge you to take a look at the comments (esp. mine
> >> > :-) );
> >> > otherwise, please ignore the noise. Thanks!
> >>
> >> You should add the link to the ticket, so it's faster for everyone to
> >> check what you are talking about.
> >>
> >> Josef
> >
> > Ooops! Yes I should; here it is:
> >
> > http://projects.scipy.org/numpy/ticket/1223
> > Sorry, and thanks, Josef.
> >
> > DG
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion at scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
>
> As I recall, there is no requirement for the variance/covariance of
> the normal distribution to be positive definite.
>
> No, not positive definite, positive *semi*-definite: yes, the variance
> may be zero (the cov may have zero-valued eigenvalues), but the claim
> (and I actually am "neutral" about it, in that I wanted to reference
> the claim in the docstring and was told that doing so was unnecessary,
> the implication being that this is a "well-known" fact), is that, in
> essence (in 1-D) the variance can't be negative, which seems clear
> enough. I don't see you disputing that, and so I'm uncertain as to
> how you feel about the proposal to "weakly" enforce symmetry and
> positive *semi*-definiteness. (Now, if you dispute that even
> requiring positive *semi*-definiteness is desirable, you'll have to
> debate that w/ some of the others, because I'm taking their word for
> it that indefiniteness is "unphysical.")
>
> DG
>
> >From http://en.wikipedia.org/wiki/Multivariate_normal_distribution
> "The covariance matrix is allowed to be singular (in which case the
> corresponding distribution has no density)."
> So you must be able to draw random numbers from such a distribution.
> Obviously what those numbers really mean is another matter (I presume
> the dependent variables should be a linear function of the independent
> variables) but the user *must* know since they entered it. Since the
> function works, the docstring Notes comment must be wrong.
>
> Imposing any restriction means that this is no longer a multivariate
> normal random number generator. If anything, you can only raise a
> warning about possible non-positive definiteness but even that will
> vary depending how it is measured and on the precision being used.
>
> Bruce
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
> --
> Mathematician: noun, someone who disavows certainty when their
> uncertainty set is non-empty, even if that set has measure zero.
>
> Hope: noun, that delusive spirit which escaped Pandora's jar and, with
> her lies, prevents mankind from committing a general suicide. (As
> interpreted by Robert Graves)
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

As you (and the theory) say, a variance should not be negative - yeah right :-) In practice that is not exactly true because estimation procedures like equating observed with expected sums of squares do lead to negative estimates. However, that is really a failure of the model, data and algorithm.

I think the issue is really how numpy should handle input when that input is theoretically invalid. I (and apparently the bug submitter) do not know what to expect if the input is not positive definite. If the svd approach is correct for such cases and numpy 'trusts' the user, as is the usual case, then there is no issue. If the svd approach is incorrect for such cases then that is obviously a bug. If numpy cannot trust the user, then numpy has to check that the input variances are greater than or equal to zero and that the cov argument is symmetric, and raise either a warning or an error when they are not.

Replacing the SVD with Cholesky would also address these issues, as both of these are checked by numpy's cholesky function. However, cholesky() does not support positive semi-definite covariance/variance input (which is possible: http://en.wikipedia.org/wiki/Cholesky_decomposition#Proof_for_positive_semi-definite_matrices). Also, as Robert said in the thread, 'Cholesky decomposition gave an error "too soon" in my estimation'.

Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL:
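For concreteness, the kind of svd-based draw plus decomposition check discussed in this thread might look something like this (a rough, untested sketch -- not what numpy actually ships):

import numpy as np

def mvn_sample(mean, cov, size):
    # draw 'size' samples from N(mean, cov), checking along the way that
    # cov is (numerically) symmetric positive semi-definite
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    u, d, v = np.linalg.svd(cov)
    # for a symmetric PSD matrix the svd coincides with the
    # eigendecomposition, so dot(u, v) should be ~ the identity
    if not np.allclose(np.dot(u, v), np.eye(len(d)), atol=1e-8):
        raise ValueError("cov does not look symmetric positive semi-definite")
    z = np.random.standard_normal((size, len(d)))
    # u * sqrt(d) is a matrix square root of cov, so these samples
    # have covariance u * diag(d) * v = cov
    return mean + np.dot(z * np.sqrt(d), v)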
From renato.fabbri at gmail.com Thu Jul 1 10:46:11 2010 From: renato.fabbri at gmail.com (Renato Fabbri) Date: Thu, 1 Jul 2010 11:46:11 -0300 Subject: [Numpy-discussion] sum up to a specific value In-Reply-To: References: Message-ID:

just a solution (not all of them)

and the application happen to come up with something like 10k values in the array. don care waiting, but...

2010/7/1 Vincent Davis :
> On Thu, Jul 1, 2010 at 3:17 AM, Renato Fabbri wrote:
>> hi,
>> i need to find which elements of an array sum up to a specific value
>>
>> any idea of how to do this?
>
> Not sure if there is a better way but a brute force way would be to
>
>>>> a
> array([[ 7.,  5.,  9.,  3.],
>        [ 7.,  2.,  7.,  8.],
>        [ 6.,  8.,  3.,  2.]])
>>>> alist = a.flatten().tolist()
>>>> alist
> [7.0, 5.0, 9.0, 3.0, 7.0, 2.0, 7.0, 8.0, 6.0, 8.0, 3.0, 2.0]
>
> import itertools
>
> asolution = []
> for r in range(1, len(alist) + 1):
>     for comb in itertools.combinations(alist, r):
>         if sum(comb) == TheValue:
>             asolution.append(comb)
>
> Now just find the comb values in the array.
>
> Like I said kinda brute force. Also depends if you want all solutions
> or a solution.
>
> Vincent
>
>> best,
>> rf
>>
>> --
>> GNU/Linux User #479299
>> skype: fabbri.renato
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- GNU/Linux User #479299 skype: fabbri.renato

From vincent at vincentdavis.net Thu Jul 1 10:53:17 2010 From: vincent at vincentdavis.net (Vincent Davis) Date: Thu, 1 Jul 2010 08:53:17 -0600 Subject: [Numpy-discussion] sum up to a specific value In-Reply-To: References: Message-ID:

On Thu, Jul 1, 2010 at 8:46 AM, Renato Fabbri wrote:
> just a solution (not all of them)
>
> and the application happen to come up with something like 10k values
> in the array. don care waiting, but...

then something like (I am not testing this so you might need to "fix" it and it is ugly)

solution = None
for r in range(1, len(alist) + 1):
    for comb in itertools.combinations(alist, r):
        if sum(comb) == TheValue:
            solution = comb
            break
    if solution is not None:
        break

> 2010/7/1 Vincent Davis :
>> On Thu, Jul 1, 2010 at 3:17 AM, Renato Fabbri wrote:
>>> hi,
>>> i need to find which elements of an array sum up to a specific value
>>>
>>> any idea of how to do this?
>>
>> Not sure if there is a better way but a brute force way would be to
>>
>>>>> a
>> array([[ 7.,  5.,  9.,  3.],
>>        [ 7.,  2.,  7.,  8.],
>>        [ 6.,  8.,  3.,  2.]])
>>>>> alist = a.flatten().tolist()
>>>>> alist
>> [7.0, 5.0, 9.0, 3.0, 7.0, 2.0, 7.0, 8.0, 6.0, 8.0, 3.0, 2.0]
>>
>> import itertools
>>
>> asolution = []
>> for r in range(1, len(alist) + 1):
>>     for comb in itertools.combinations(alist, r):
>>         if sum(comb) == TheValue:
>>             asolution.append(comb)
>>
>> Now just find the comb values in the array.
>>
>> Like I said kinda brute force. Also depends if you want all solutions
>> or a solution.
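(Spelled out properly, with the early exit made explicit -- still brute force and still untested -- that idea would be something like:)

import itertools

def find_subset(values, target):
    # try subsets of increasing size; return the first one that hits target
    for r in range(1, len(values) + 1):
        for comb in itertools.combinations(values, r):
            if sum(comb) == target:
                return comb
    return None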
>>
>> Vincent
>>
>>> best,
>>> rf
>>>
>>> --
>>> GNU/Linux User #479299
>>> skype: fabbri.renato
>>> _______________________________________________
>>> NumPy-Discussion mailing list
>>> NumPy-Discussion at scipy.org
>>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>
> --
> GNU/Linux User #479299
> skype: fabbri.renato
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

From meine at informatik.uni-hamburg.de Thu Jul 1 11:02:01 2010 From: meine at informatik.uni-hamburg.de (Hans Meine) Date: Thu, 1 Jul 2010 17:02:01 +0200 Subject: [Numpy-discussion] Ufunc memory access optimization In-Reply-To: <1276601850.2218.30.camel@talisman> References: <201006111052.52965.meine@informatik.uni-hamburg.de> <1276601850.2218.30.camel@talisman> Message-ID: <201007011702.01448.meine@informatik.uni-hamburg.de>

Hi Pauli and Anne,

On Tuesday 15 June 2010 13:37:30 Pauli Virtanen wrote:
> pe, 2010-06-11 kello 10:52 +0200, Hans Meine kirjoitti:
> > At the bottom you can see that he basically wraps all numpy.ufuncs he can
> > find in the numpy top-level namespace automatically.
>
> Ok, here's the branch:
>
> http://github.com/pv/numpy-work/compare/master...feature;ufunc-memory-access-speedup

Great! (I did not keep up with the NumPy list lately, sorry.) Oh, and that was not even so much code, that's good news, too! :-)

> As expected, some improvement can be seen. There also appears to be
> an additional 5 µs (~ 700 inner loop operations it seems) overhead
> coming from somewhere; perhaps this can still be reduced.

How much time would you expect the new optimized reordering to take? 5 µs is not exactly much, right? Anyhow, thanks a lot for this contribution!

On Tuesday 15 June 2010 20:15:39 Pauli Virtanen wrote:
> Another optimization could be flipping negative strides. This sounds a
> bit dangerous, though, so one would need to think if it could e.g. break
>
> a += a[::-1]
>
> etc.

I agree that this is more dangerous, yet doing "a += a[::-1]" has always been considered dangerous already, right?

> > I'm more worried this may violate some users' assumptions. If a user
> > knows they need an array to be in C order, really they should use
> > ascontiguousarray. But as it stands it's enough to make sure that it's
> > newly-allocated as the result of an arithmetic expression. Incautious
> > users could suddenly start seeing copies happening, or even transposed
> > arrays being passed to, C and Fortran code.
>
> Yes, it's a semantic change, and we have to make it consciously (and
> conscientiously, to boot :). I am all ready for the change though.
> Personally, I believe this change is worth making, with suitable mention
> in the release notes.

+1 Also, recall that new users will be bitten by the current behaviour, too. Imagine how much work Ulli put into the reordering and how he swore when he found out that a simple addition broke all the fortran-orderedness of our images.. ;-)

> > [...] We want as many of those operations as possible to operate
> > on contiguous arrays, but it's possible that an out-of-order array
> > could propagate indefinitely, forcing all loops to be done with one
> > array having large strides, and resulting in output that is still
> > out-of-order.
> > I think, at present, non-C-contiguous arrays will propagate > indefinitely. Right, but that's a good thing! Anne wrote: > > Some preference for C contiguous output is worth adding. When you talk about the preference for C contiguous arrays, it sounds as if Fortran-ordered arrays were somehow evil. With the new behaviour, that would be much less so, because at least ufuncs would not be any slower anymore for Fortran arrays. And numpy would not get into the way of users who are actually trying to use Fortran order so much anymore, which is a good thing. I really don't know how much C-order preference is justified. Have a nice day, Hans From pav at iki.fi Thu Jul 1 11:38:02 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 01 Jul 2010 17:38:02 +0200 Subject: [Numpy-discussion] sum up to a specific value In-Reply-To: References: Message-ID: <1277998682.2138.287.camel@talisman> to, 2010-07-01 kello 11:46 -0300, Renato Fabbri kirjoitti: > just a solution (not all of them) > > and the application happen to come up with something like 10k values > in the array. don care waiting, but... As said, the problem is a well-known one, and it's not really Python or Numpy-specific, so slightly off-topic for this list. Numpy and Scipy don't ship pre-made algorithms for solving these. But anyway, you'll probably find that the brute force algorithm (e.g. the one from Vincent) takes exponential time (and exp(10000) is a big number). So you need to do something more clever. First stop, Wikipedia, http://en.wikipedia.org/wiki/Knapsack_problem http://en.wikipedia.org/wiki/Subset_sum_problem and if you are looking for pre-cooked solutions, second stop stackoverflow, http://stackoverflow.com/search?q=subset+sum+problem Some search words you might want to try on Google: http://www.google.com/search?q=subset%20sum%20dynamic%20programming Generic advice only this time, sorry; I don't have pre-made code for solving this at hand, but hopefully the above links give some pointers for what to do. -- Pauli Virtanen From charlesr.harris at gmail.com Thu Jul 1 12:11:06 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 1 Jul 2010 10:11:06 -0600 Subject: [Numpy-discussion] Ticket #1223... In-Reply-To: <4C2CA8D9.1060005@gmail.com> References: <4C2CA8D9.1060005@gmail.com> Message-ID: On Thu, Jul 1, 2010 at 8:40 AM, Bruce Southey wrote: > On 06/29/2010 11:38 PM, David Goldsmith wrote: > > On Tue, Jun 29, 2010 at 8:16 PM, Bruce Southey wrote: > >> On Tue, Jun 29, 2010 at 6:03 PM, David Goldsmith >> wrote: >> > On Tue, Jun 29, 2010 at 3:56 PM, wrote: >> >> >> >> On Tue, Jun 29, 2010 at 6:37 PM, David Goldsmith >> >> wrote: >> >> > ...concerns the behavior of numpy.random.multivariate_normal; if >> that's >> >> > of >> >> > interest to you, I urge you to take a look at the comments (esp. mine >> >> > :-) ); >> >> > otherwise, please ignore the noise. Thanks! >> >> >> >> You should add the link to the ticket, so it's faster for everyone to >> >> check what you are talking about. >> >> >> >> Josef >> > >> > Ooops! Yes I should; here it is: >> > >> > http://projects.scipy.org/numpy/ticket/1223 >> > Sorry, and thanks, Josef. >> > >> > DG >> > >> > >> > _______________________________________________ >> > NumPy-Discussion mailing list >> > NumPy-Discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> As I recall, there is no requirement for the variance/covariance of >> the normal distribution to be positive definite. 
>> > > No, not positive definite, positive *semi*-definite: yes, the variance may > be zero (the cov may have zero-valued eigenvalues), but the claim (and I > actually am "neutral" about it, in that I wanted to reference the claim in > the docstring and was told that doing so was unnecessary, the implication > being that this is a "well-known" fact), is that, in essence (in 1-D) the > variance can't be negative, which seems clear enough. I don't see you > disputing that, and so I'm uncertain as to how you feel about the proposal > to "weakly" enforce symmetry and positive *semi*-definiteness. (Now, if you > dispute that even requiring positive *semi*-definiteness is desirable, > you'll have to debate that w/ some of the others, because I'm taking their > word for it that indefiniteness is "unphysical.") > > DG > > >From http://en.wikipedia.org/wiki/Multivariate_normal_distribution > "The covariance matrix is allowed to be singular (in which case the > corresponding distribution has no density)." > > So you must be able to draw random numbers from such a distribution. > Obviously what those numbers really mean is another matter (I presume > the dependent variables should be a linear function of the independent > variables) but the user *must* know since they entered it. Since the > function works the docstring Notes comment must be wrong. > > Imposing any restriction means that this is no longer a multivariate > normal random number generator. If anything, you can only raise a > warning about possible non-positive definiteness but even that will > vary depending how it is measured and on the precision being used. > > > Bruce > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > > > _______________________________________________ > NumPy-Discussion mailing listNumPy-Discussion at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion > > As you (and the theory) say, a variance should not be negative - yeah > right :-) In practice that is not exactly true because estimation procedures > like equating observed with expected sum of squares do lead to negative > estimates. However, that is really a failure of the model, data and > algorithm. > > I think the issue is really how numpy should handle input when that input > is theoretically invalid. > > I think the svd version could be used if a check is added for the decomposition. That is, if cov = u*d*v, then dot(u,v) ~= identity. The Cholesky decomposition will be faster than the svd for large arrays, but that might not matter much for the common case. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From d.l.goldsmith at gmail.com Thu Jul 1 13:43:50 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Thu, 1 Jul 2010 10:43:50 -0700 Subject: [Numpy-discussion] Ticket #1223... 
In-Reply-To: References: <4C2CA8D9.1060005@gmail.com> Message-ID: On Thu, Jul 1, 2010 at 9:11 AM, Charles R Harris wrote: > > On Thu, Jul 1, 2010 at 8:40 AM, Bruce Southey wrote: > >> On 06/29/2010 11:38 PM, David Goldsmith wrote: >> >> On Tue, Jun 29, 2010 at 8:16 PM, Bruce Southey wrote: >> >>> On Tue, Jun 29, 2010 at 6:03 PM, David Goldsmith >>> wrote: >>> > On Tue, Jun 29, 2010 at 3:56 PM, wrote: >>> >> >>> >> On Tue, Jun 29, 2010 at 6:37 PM, David Goldsmith >>> >> wrote: >>> >> > ...concerns the behavior of numpy.random.multivariate_normal; if >>> that's >>> >> > of >>> >> > interest to you, I urge you to take a look at the comments (esp. >>> mine >>> >> > :-) ); >>> >> > otherwise, please ignore the noise. Thanks! >>> >> >>> >> You should add the link to the ticket, so it's faster for everyone to >>> >> check what you are talking about. >>> >> >>> >> Josef >>> > >>> > Ooops! Yes I should; here it is: >>> > >>> > http://projects.scipy.org/numpy/ticket/1223 >>> > Sorry, and thanks, Josef. >>> > >>> > DG >>> > >>> > _______________________________________________ >>> > NumPy-Discussion mailing list >>> > NumPy-Discussion at scipy.org >>> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> > >>> > >>> As I recall, there is no requirement for the variance/covariance of >>> the normal distribution to be positive definite. >>> >> >> No, not positive definite, positive *semi*-definite: yes, the variance may >> be zero (the cov may have zero-valued eigenvalues), but the claim (and I >> actually am "neutral" about it, in that I wanted to reference the claim in >> the docstring and was told that doing so was unnecessary, the implication >> being that this is a "well-known" fact), is that, in essence (in 1-D) the >> variance can't be negative, which seems clear enough. I don't see you >> disputing that, and so I'm uncertain as to how you feel about the proposal >> to "weakly" enforce symmetry and positive *semi*-definiteness. (Now, if you >> dispute that even requiring positive *semi*-definiteness is desirable, >> you'll have to debate that w/ some of the others, because I'm taking their >> word for it that indefiniteness is "unphysical.") >> >> DG >> >> >From http://en.wikipedia.org/wiki/Multivariate_normal_distribution >> "The covariance matrix is allowed to be singular (in which case the >> corresponding distribution has no density)." >> >> So you must be able to draw random numbers from such a distribution. >> Obviously what those numbers really mean is another matter (I presume >> the dependent variables should be a linear function of the independent >> variables) but the user *must* know since they entered it. Since the >> function works the docstring Notes comment must be wrong. >> >> Imposing any restriction means that this is no longer a multivariate >> normal random number generator. If anything, you can only raise a >> warning about possible non-positive definiteness but even that will >> vary depending how it is measured and on the precision being used. >> >> >> Bruce >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> >> -- >> Mathematician: noun, someone who disavows certainty when their uncertainty >> set is non-empty, even if that set has measure zero. >> >> Hope: noun, that delusive spirit which escaped Pandora's jar and, with her >> lies, prevents mankind from committing a general suicide. 
(As interpreted
>> by Robert Graves)
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
>> As you (and the theory) say, a variance should not be negative - yeah
>> right :-) In practice that is not exactly true because estimation procedures
>> like equating observed with expected sums of squares do lead to negative
>> estimates. However, that is really a failure of the model, data and
>> algorithm.
>>
>> I think the issue is really how numpy should handle input when that input
>> is theoretically invalid.
>>
> I think the svd version could be used if a check is added for the
> decomposition. That is, if cov = u*d*v, then dot(u,v) ~= identity. The
> Cholesky decomposition will be faster than the svd for large arrays, but
> that might not matter much for the common case.
>
> Chuck

Well, I'm not sure if what we have so far implies that consensus will possibly be impossible to reach, so I'll just rest on my laurels (i.e., my proposed compromise solution); just let me know if the docstring needs to be changed (and how).

DG -------------- next part -------------- An HTML attachment was scrubbed... URL:

From faltet at pytables.org Thu Jul 1 15:10:42 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 1 Jul 2010 21:10:42 +0200 Subject: [Numpy-discussion] [ANN] PyTables 2.2 released: enter the multi-core age Message-ID: <201007012110.42441.faltet@pytables.org>

=================================
Announcing PyTables 2.2 (final)
=================================

I'm happy to announce PyTables 2.2 (final). After 18 months of continuous development and testing, this is, by far, the most powerful and well-tested release ever. I hope you like it too.

What's new
==========

The main new features in the 2.2 series are:

* A new compressor called Blosc, designed to read/write data to/from memory at speeds that can be faster than a system `memcpy()` call. With it, many internal PyTables operations that are currently bounded by CPU or I/O bandwidth are sped up. Some benchmarks: http://blosc.pytables.org/trac/wiki/SyntheticBenchmarks And a demonstration of how Blosc can improve PyTables performance: http://www.pytables.org/docs/manual/ch05.html#chunksizeFineTune

* Support for HDF5 hard links, soft links and external links (kind of mounting external filesystems). A new tutorial about its usage has been added to the 'Tutorials' chapter of the User's Manual. See: http://www.pytables.org/docs/manual/ch03.html#LinksTutorial

* A new `tables.Expr` module (based on Numexpr) that allows persistent, on-disk computations for many algebraic operations. For a brief look at its performance, see: http://pytables.org/moin/ComputingKernel

* Support for 'fancy' indexing (i.e., à la NumPy) in all the data containers in PyTables. Backported from the implementation in the h5py project. Thanks to Andrew Collette for his fine work on this!

* Binaries for both Windows 32-bit and 64-bit are provided now.

As always, a large number of bugs have been addressed and squashed too. In case you want to know more in detail what has changed in this version, have a look at: http://www.pytables.org/moin/ReleaseNotes/Release_2.2

You can download a source package with generated PDF and HTML docs, as well as binaries for Windows, from: http://www.pytables.org/download/preliminary

For an on-line version of the manual, visit: http://www.pytables.org/docs/manual-2.2
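To give a quick taste of Blosc from Python, a minimal sketch (untested here, assuming the 2.2 API; the file and node names are made up):

import numpy as np
import tables

# write a chunked array compressed with the new Blosc compressor
f = tables.openFile('demo.h5', mode='w')
blosc = tables.Filters(complevel=5, complib='blosc')
carray = f.createCArray(f.root, 'data', tables.Float64Atom(),
                        shape=(1000, 1000), filters=blosc)
carray[:] = np.random.rand(1000, 1000)
f.close()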
What it is?
===========

PyTables is a library for managing hierarchical datasets, designed to efficiently cope with extremely large amounts of data, with support for full 64-bit file addressing. PyTables runs on top of the HDF5 library and the NumPy package for achieving maximum throughput and convenient use.

Resources
=========

About PyTables: http://www.pytables.org
About the HDF5 library: http://hdfgroup.org/HDF5/
About NumPy: http://numpy.scipy.org/

Acknowledgments
===============

Thanks to many users who provided feature improvements, patches, bug reports, support and suggestions. See the ``THANKS`` file in the distribution package for an (incomplete) list of contributors. Most specially, a lot of kudos go to the HDF5 and NumPy (and numarray!) makers. Without them, PyTables simply would not exist.

Share your experience
=====================

Let us know of any bugs, suggestions, gripes, kudos, etc. you may have.

----

**Enjoy data!**

-- The PyTables Team

-- Francesc Alted

From faltet at pytables.org Thu Jul 1 15:28:34 2010 From: faltet at pytables.org (Francesc Alted) Date: Thu, 1 Jul 2010 21:28:34 +0200 Subject: [Numpy-discussion] [ANN] PyTables 2.2 released: enter the multi-core age In-Reply-To: <201007012110.42441.faltet@pytables.org> References: <201007012110.42441.faltet@pytables.org> Message-ID: <201007012128.34937.faltet@pytables.org>

A Thursday 01 July 2010 21:10:42 Francesc Alted escrigué:
> http://www.pytables.org/download/preliminary

Mmh, that should read: http://www.pytables.org/download/stable

Sorry for the typo! -- Francesc Alted

From gandalf at shopzeus.com Thu Jul 1 16:13:50 2010 From: gandalf at shopzeus.com (Laszlo Nagy) Date: Thu, 01 Jul 2010 22:13:50 +0200 Subject: [Numpy-discussion] Determine slices in a sorted array Message-ID: <4C2CF6FE.2040204@shopzeus.com>

Given an array with two axes, sorted by a column 'SLICE_BY', how can I extract slice indexes for rows with the same 'SLICE_BY' value? Here is an example program, demonstrating the problem:

from numpy import *

a = random.randint(0, 100, (20, 4))
SLICE_BY = 0
# Make slices of array 'a' by column SLICE_BY
a.sort(SLICE_BY)
slices = []
prev_val = None
sidx = -1
for rowidx, row in enumerate(a):
    val = row[SLICE_BY]
    if val != prev_val:
        if prev_val is None:
            prev_val = val
            sidx = rowidx
        else:
            slices.append((prev_val, sidx, rowidx))
            sidx = rowidx
            prev_val = val
if sidx >= 0:
    slices.append((prev_val, sidx, len(a)))

Hi. The docstring (in the wiki) for where states:

x, y : array_like, optional
    Values from which to choose. *x* and *y* need to have the same
    shape as *condition*.

But:

>>> x = np.eye(2)
>>> np.where(x, 2, 3)
array([[2, 3],
       [3, 2]])

So apparently where supports broadcasting of scalars at least; does it provide full broadcasting support? Thanks!

DG -------------- next part -------------- An HTML attachment was scrubbed... URL:
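For what it's worth, a non-scalar x seems to broadcast too, which suggests the usual ufunc-style broadcasting rules apply throughout (a quick sketch -- if broadcasting really is full, this should give the output shown):

>>> np.where(np.eye(2, dtype=bool), np.array([10, 20]), 99)
array([[10, 99],
       [99, 20]])

Here condition has shape (2, 2), x has shape (2,) and y is a scalar, and all three are broadcast against each other.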
- One file distribution: no need for your users to install any new packages, just include one single file into your package to build with bento - Improved documentation - 2.4 -> 2.7 support, tested on linux/windows/mac os x You can download bento on github: http://github.com/cournape/Bento cheers, David From sturla at molden.no Thu Jul 1 23:52:10 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 02 Jul 2010 05:52:10 +0200 Subject: [Numpy-discussion] __array__struct__: about using PyCapsule instead of PyCObject for Python 2.7 In-Reply-To: References: Message-ID: <4C2D626A.6070101@molden.no> Lisandro Dalcin skrev: > No, no sarcasm at all! I just realized that PyCObject were > (pending)deprecated in 2.7 ... Anyway. let me say I'm so annoyed and > upset as you. > > PyCapsule should be used instead. It has two main advantages over PyCObject: First, it associates a 'name' with the void pointer, to provide some sort of type safety. (The 'name' could have been named 'password' to make its intention clear.) Second, the PyCapsule API makes it easier to implement destructors. PyCObject is a severe security hole and stability problem. It can crash the interpreter or run exploit code, as no checks are made before destructors are executed. PyCObject will never be missed. And personally I am glad it was deprecated because it should be avoided. It is better to include a backport of PyCapsule than continue to use PyCObject for Python 2.6, 2.5 and 2.4. Sturla From ben.root at ou.edu Fri Jul 2 00:51:21 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 1 Jul 2010 23:51:21 -0500 Subject: [Numpy-discussion] numpy.all docstring reality check In-Reply-To: References: Message-ID: This behavior is quite curious. While it is consistent and it behaves exactly as documented (after clarification), I am curious about the rational. Is it merely an unavoidable consequence of passing in the output array? Certainly a few examples from the above emails would make this extremely clear. I particularly liked the 'a3' dtype example. Ben Root On Tue, Jun 29, 2010 at 9:38 PM, David Goldsmith wrote: > OK, now I understand: dtype(out) is preserved, whatever that happens to be, > not dtype(a) (which is what I thought it meant) - I better clarify. Thanks! > > DG > > > On Tue, Jun 29, 2010 at 7:28 PM, Skipper Seabold wrote: > >> On Tue, Jun 29, 2010 at 8:50 PM, David Goldsmith >> wrote: >> > Hi, folks. Under Parameters, the docstring for >> numpy.core.fromnumeric.all >> > says: >> > >> > "out : ndarray, optionalAlternative output array in which to place the >> > result. It must have the same shape as the expected output and the type >> is >> > preserved." [emphasis added].I assume this is a >> > copy-and-paste-from-another-docstring "typo" (shouldn't it be (possibly >> > ndarray of) bool), but I just wanted to double check. 
>> > >> >> Looks right to me though there is no >> >> In [255]: a = np.ones(10) >> >> In [256]: b = np.empty(1,dtype=int) >> >> In [257]: np.core.fromnumeric.all(a,out=b) >> Out[257]: array([1]) >> >> In [258]: b.dtype >> Out[258]: dtype('int64') >> >> In [259]: b = np.empty(1,dtype=bool) >> >> In [260]: np.core.fromnumeric.all(a,out=b) >> Out[260]: array([ True], dtype=bool) >> >> In [261]: b.dtype >> Out[261]: dtype('bool') >> >> In [262]: b = np.empty(1) >> >> In [263]: np.core.fromnumeric.all(a,out=b) >> Out[263]: array([ 1.]) >> >> In [264]: b.dtype >> Out[264]: dtype('float64') >> >> In [265]: a2 = >> np.column_stack((np.ones(10),np.ones(10),np.random.randint(0,2,10))) >> >> In [266]: b = np.empty(3,dtype=int) >> >> In [267]: np.core.fromnumeric.all(a2,axis=0,out=b) >> Out[267]: array([1, 1, 0]) >> >> In [268]: b.dtype >> Out[268]: dtype('int64') >> >> This is interesting >> >> In [300]: b = np.ones(3,dtype='a3') >> >> In [301]: np.core.fromnumeric.all(a2,axis=0,out=b) >> Out[301]: >> array(['Tru', 'Tru', 'Fal'], >> dtype='|S3') >> >> Skipper >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > > -- > Mathematician: noun, someone who disavows certainty when their uncertainty > set is non-empty, even if that set has measure zero. > > Hope: noun, that delusive spirit which escaped Pandora's jar and, with her > lies, prevents mankind from committing a general suicide. (As interpreted > by Robert Graves) > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rpyle at post.harvard.edu Fri Jul 2 00:56:36 2010 From: rpyle at post.harvard.edu (Robert Pyle) Date: Fri, 02 Jul 2010 00:56:36 -0400 Subject: [Numpy-discussion] [ANN] Bento (ex-toydist) 0.0.3 In-Reply-To: <4C2D502C.2040909@silveregg.co.jp> References: <4C2D502C.2040909@silveregg.co.jp> Message-ID: <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> Hi, While I agree that toydist needs a new name, Bento might not be a good choice. It's already the name of a database system for Macintosh from Filemaker, an Apple subsidiary. I'd be *very* surprised if the name Bento is not copyrighted. Have a look at http://www.filemaker.com/products/bento/ Too bad, because the lunchbox metaphor seems like a good one. Bob On Jul 1, 2010, at 10:34 PM, David wrote: > Hi, > > I am pleased to announce the release 0.0.3 for Bento, the pythonic > packaging solution. > > Wherease the 0.0.2 release was mostly about getting the > simplest-still-useful subset of distutils features, this new release > adds quite a few significant features: > > - Add hooks to customize arbitrary stages in bento (there is a > hackish example which shows how to use waf to build a simple C > extension). The API for this is still in flux, though > - Parallel and reliable build of C extensions through yaku build > library. 
> - One file distribution: no need for your users to install any new
> packages, just include one single file into your package to
> build with bento
> - Improved documentation
> - 2.4 -> 2.7 support, tested on linux/windows/mac os x
>
> You can download bento on github: http://github.com/cournape/Bento
>
> cheers,
>
> David
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From cournape at gmail.com Fri Jul 2 01:11:47 2010 From: cournape at gmail.com (David Cournapeau) Date: Fri, 2 Jul 2010 14:11:47 +0900 Subject: [Numpy-discussion] [ANN] Bento (ex-toydist) 0.0.3 In-Reply-To: <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> References: <4C2D502C.2040909@silveregg.co.jp> <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> Message-ID:

On Fri, Jul 2, 2010 at 1:56 PM, Robert Pyle wrote:
> Hi,
>
> While I agree that toydist needs a new name, Bento might not be a good
> choice. It's already the name of a database system for Macintosh from
> Filemaker, an Apple subsidiary. I'd be *very* surprised if the name
> Bento is not copyrighted.

Can you copyright a word? I thought this was the trademark part of the law. For example, "linux" is a trademark owned by Linus Torvalds. Also, well known packages use words which are at least as common as bento in English (sphinx, twisted, etc...), and as likely to be trademarked. But IANAL...

cheers,

David

From sole at esrf.fr Fri Jul 2 03:40:04 2010 From: sole at esrf.fr ("V. Armando Solé") Date: Fri, 02 Jul 2010 09:40:04 +0200 Subject: [Numpy-discussion] Ternary plots anywhere? Message-ID: <4C2D97D4.6090908@esrf.fr>

Dear all,

Perhaps this is a bit off topic for the mailing list, but this is probably the only mailing list that is common to users of all python plotting packages.

I am trying to find a python implementation of ternary/triangular plots: http://en.wikipedia.org/wiki/Ternary_plot but I have been unsuccessful. Is there any on-going project around?

Thanks for your time.

Best regards,

Armando
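P.S. In case it helps anyone roll their own in the meantime: a ternary point (a, b, c) is just a barycentric combination of the three triangle corners, so the projection onto the plane is only a couple of lines of numpy. A rough, untested sketch (matplotlib for the drawing):

import numpy as np
import matplotlib.pyplot as plt

def ternary_to_xy(abc):
    # normalise each row so a + b + c == 1, then map onto a unit triangle
    # with corners A=(0, 0), B=(1, 0), C=(0.5, sqrt(3)/2)
    abc = np.asarray(abc, dtype=float)
    abc = abc / abc.sum(axis=1)[:, np.newaxis]
    x = abc[:, 1] + 0.5 * abc[:, 2]
    y = (np.sqrt(3) / 2.0) * abc[:, 2]
    return x, y

x, y = ternary_to_xy([[0.2, 0.3, 0.5], [0.6, 0.2, 0.2]])
plt.plot(x, y, 'o')
plt.show()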
called Sage -- that sort of thing. Thanks for your work David, I'll make sure to check it out soon! Dag Sverre From david at silveregg.co.jp Fri Jul 2 04:44:03 2010 From: david at silveregg.co.jp (David) Date: Fri, 02 Jul 2010 17:44:03 +0900 Subject: [Numpy-discussion] [ANN] Bento (ex-toydist) 0.0.3 In-Reply-To: <4C2D9DE3.4010905@student.matnat.uio.no> References: <4C2D502C.2040909@silveregg.co.jp> <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> <4C2D9DE3.4010905@student.matnat.uio.no> Message-ID: <4C2DA6D3.5080309@silveregg.co.jp> On 07/02/2010 05:05 PM, Dag Sverre Seljebotn wrote: > David Cournapeau wrote: >> On Fri, Jul 2, 2010 at 1:56 PM, Robert Pyle wrote: >> >>> Hi, >>> >>> While I agree that toydist needs a new name, Bento might not be a good >>> choice. It's already the name of a database system for Macintosh from >>> Filemaker, an Apple subsidiary. I'd be *very* surprised if the name >>> Bento is not copyrighted. >>> >> >> Can you copyright a word ? I thought this was the trademark part of >> the law. For example, "linux" is a trademark owned by Linus Torvald. >> Also, well known packages use words which are at least as common as >> bento in English (sphinx, twisted, etc...), and as likely to be >> trademarked. But IANAL... >> > There's been lots of discussions about this on the Sage list, since > there's lots of software called Sage. It seems that the consensus of > IANAL advice on that list is that as long as they're not competing in > the same market they're OK. For instance, there's been some talk about > whether it's OK to include economics utilities in Sage since there's an > accounting software (?) called Sage -- that sort of thing. Thanks. that's useful to know. > Thanks for your work David, I'll make sure to check it out soon! Note that cython setup.py can be automatically converted - there is a small issue with the setup docstring which contains rest syntax incompatible with bento.info format (when empty lines has a different amount of space than the current indentation). But once you manually edit those, you can build egg, windows installer and install cython. In particular, the cython script is "exucutablified" like setuptools does, so cython is a bit more practical to use on windows. cheers, David From tillmann.falck at gmail.com Fri Jul 2 08:56:47 2010 From: tillmann.falck at gmail.com (Tillmann Falck) Date: Fri, 2 Jul 2010 14:56:47 +0200 Subject: [Numpy-discussion] memory leak using numpy and cvxopt Message-ID: <201007021456.47519.tillmann.falck@gmail.com> Hi all, I am hitting a memory leak with the combination of numpy and cvxopt.matrix. As I am not where it occurs, I am cross posting. On my machine (Fedora 13, x86_64) this example quickly eats up all my memory. ----------- from cvxopt import matrix import numpy as np N = 2000 X = np.ones((N, N)) Y = matrix(0.0, (N, N)) while True: Y[:N, :N] = X ----------- I don't hit the leak if copy blocks of 1-d arrays. Regards, Tillmann From rpyle at post.harvard.edu Fri Jul 2 09:52:56 2010 From: rpyle at post.harvard.edu (Robert Pyle) Date: Fri, 02 Jul 2010 09:52:56 -0400 Subject: [Numpy-discussion] [ANN] Bento (ex-toydist) 0.0.3 In-Reply-To: References: <4C2D502C.2040909@silveregg.co.jp> <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> Message-ID: <33E184BE-CF4B-4D01-B739-FDBA259EB189@post.harvard.edu> On Jul 2, 2010, at 1:11 AM, David Cournapeau wrote: > On Fri, Jul 2, 2010 at 1:56 PM, Robert Pyle > wrote: >> Hi, >> >> While I agree that toydist needs a new name, Bento might not be a >> good >> choice. 
>> It's already the name of a database system for Macintosh from
>> Filemaker, an Apple subsidiary. I'd be *very* surprised if the name
>> Bento is not copyrighted.
>
> Can you copyright a word? I thought this was the trademark part of
> the law. For example, "linux" is a trademark owned by Linus Torvalds.
> Also, well known packages use words which are at least as common as
> bento in English (sphinx, twisted, etc...), and as likely to be
> trademarked. But IANAL...
>
> cheers,
>
> David

It was very late last night when I wrote. I meant to say 'trademark' rather than 'copyright'. But IANAL, also.

Bob

From matthew.brett at gmail.com Fri Jul 2 10:08:09 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Fri, 2 Jul 2010 10:08:09 -0400 Subject: [Numpy-discussion] [ANN] Bento (ex-toydist) 0.0.3 In-Reply-To: References: <4C2D502C.2040909@silveregg.co.jp> <950AD178-EBFF-45D8-A4DB-323BC1DBEC00@post.harvard.edu> Message-ID:

Hi,

> Can you copyright a word? I thought this was the trademark part of
> the law. For example, "linux" is a trademark owned by Linus Torvalds.
> Also, well known packages use words which are at least as common as
> bento in English (sphinx, twisted, etc...), and as likely to be
> trademarked.

I got ripely panned for doing this before, but... If you have a look at - to reduce controversy - : http://cyber.law.harvard.edu/metaschool/fisher/domain/tm.htm#7 you'll see a summary of the criteria used. I read this stuff as meaning that, if you're doing something that has a low 'likelihood of confusion' with the other guy / gal doing 'Bento', and the other 'Bento' trademark is not 'famous', you're probably, but not certainly, safe from successful prosecution.

See you, Matthew

From pav at iki.fi Fri Jul 2 10:54:05 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 2 Jul 2010 14:54:05 +0000 (UTC) Subject: [Numpy-discussion] memory leak using numpy and cvxopt References: <201007021456.47519.tillmann.falck@gmail.com> Message-ID:

Fri, 02 Jul 2010 14:56:47 +0200, Tillmann Falck wrote:
> I am hitting a memory leak with the combination of numpy and
> cvxopt.matrix. As I am not sure where it occurs, I am cross posting.

Probably a bug in cvxopt, as also the following leaks memory:

--------------------------------
from cvxopt import matrix

N = 2000
X = [0]*N
Y = matrix(0.0, (N, N))

while True:
    Y[:N, :1] = X
--------------------------------

-- Pauli Virtanen

From robince at gmail.com Fri Jul 2 12:37:41 2010 From: robince at gmail.com (Robin) Date: Fri, 2 Jul 2010 17:37:41 +0100 Subject: [Numpy-discussion] OT: request help building pymex win64 Message-ID:

Hi,

Sorry for the offtopic post but I wondered if any Windows experts who are familiar with topics like linking python on windows and visual studio runtimes etc. might be able to help.

I'm on a bit of a mission to get pymex built for 64 bit windows. Pymex ( http://github.com/kw/pymex ) is a matlab package that embeds the Python interpreter in a mex file and provides a very elegant interface for manipulating python objects from matlab, as well as converting between data types when necessary. It builds easily on mac, linux and win32 with mingw, but I really need it also for 64 bit windows. (It works very well with numpy as well so not completely OT).

I have looked at trying to get a 64bit mingw working to build mex files, but that seemed quite difficult, so instead I am trying to build with VS 2008 Express Edition + Windows 7 SDK (for 64 bit support). As far as I can tell this is installed OK as I can build the example mex64 files OK.
I have made some modifications to pymex to get it to build under vs 2008 ( http://github.com/robince/pymex/tree/win64 ).

And I can get it to build and link (I believe using the implicit dll method of linking against C:\Python26\libs\python26.lib of the amd64 python.org python) without errors, but when I run it, it seems to segfault whenever a pointer is passed between the mex side and python26.dll.

I asked this stackoverflow question which has some more details (build log): http://stackoverflow.com/questions/3167134/trying-to-embed-python-into-matlab-mex-win64

Anyway I'm completely in the dark but wondered if some of the experts on here would be able to spot something (perhaps to do with incompatible C runtimes - I am not sure what runtime Python is built with but I thought it was VS 2008).

Cheers

Robin

From cournape at gmail.com Fri Jul 2 12:47:50 2010 From: cournape at gmail.com (David Cournapeau) Date: Sat, 3 Jul 2010 01:47:50 +0900 Subject: [Numpy-discussion] OT: request help building pymex win64 In-Reply-To: References: Message-ID:

On Sat, Jul 3, 2010 at 1:37 AM, Robin wrote:
> Hi,
>
> Sorry for the offtopic post but I wondered if any Windows experts who
> are familiar with topics like linking python on windows and visual
> studio runtimes etc. might be able to help.
>
> I'm on a bit of a mission to get pymex built for 64 bit windows. Pymex
> ( http://github.com/kw/pymex ) is a matlab package that embeds the
> Python interpreter in a mex file and provides a very elegant interface
> for manipulating python objects from matlab, as well as converting
> between data types when necessary. It builds easily on mac, linux and
> win32 with mingw, but I really need it also for 64 bit windows. (It
> works very well with numpy as well so not completely OT).
>
> I have looked at trying to get a 64bit mingw working to build mex
> files, but that seemed quite difficult, so instead I am trying to
> build with VS 2008 Express Edition + Windows 7 SDK (for 64 bit
> support). As far as I can tell this is installed OK as I can build the
> example mex64 files OK.
>
> I have made some modifications to pymex to get it to build under vs
> 2008 ( http://github.com/robince/pymex/tree/win64 ).
>
> And I can get it to build and link (I believe using the implicit dll
> method of linking against C:\Python26\libs\python26.lib of the amd64
> python.org python) without errors, but when I run it, it seems to
> segfault whenever a pointer is passed between the mex side and
> python26.dll.
>
> I asked this stackoverflow question which has some more details (build log):
> http://stackoverflow.com/questions/3167134/trying-to-embed-python-into-matlab-mex-win64
>
> Anyway I'm completely in the dark but wondered if some of the experts
> on here would be able to spot something (perhaps to do with
> incompatible C runtimes - I am not sure what runtime Python is built
> with but I thought it was VS 2008).

The problem may be that matlab is built with one runtime, and Python with another.... Unless your matlab is very recent, it is actually quite likely to be compiled with VS 2005, which means you should use python 2.5 instead (or build python2.6 with VS 2005, but I am not sure it is even possible without herculean efforts).
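(A quick way to check which MSVC a given Python was built with, by the way: the compiler tag shows up in sys.version. A trivial check:)

import sys
# python.org's 2.6 win64 build reports something like:
#   2.6.5 (...) [MSC v.1500 64 bit (AMD64)]
# MSC v.1500 corresponds to VS 2008, v.1400 to VS 2005.
print(sys.version)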
David

From robince at gmail.com Fri Jul 2 12:58:49 2010
From: robince at gmail.com (Robin)
Date: Fri, 2 Jul 2010 17:58:49 +0100
Subject: [Numpy-discussion] OT: request help building pymex win64
In-Reply-To:
References:
Message-ID:

On Fri, Jul 2, 2010 at 5:47 PM, David Cournapeau wrote:
>
> The problem may be that matlab is built with one runtime, and Python with another.... Unless your matlab is very recent, it is actually quite likely to be compiled with VS 2005, which means you should use python 2.5 instead (or build python 2.6 with VS 2005, but I am not sure it is even possible without herculean efforts).

Thanks for your help!

I thought of that, but then VS 2008 is an officially supported compiler for the version of matlab I am using (2009a).
http://www.mathworks.com/support/compilers/release2009a/win64.html

So I thought on the matlab/mex side 2008 should be fine, and I thought since Python is built with 2008 that should also be OK. But obviously something isn't!

Cheers

Robin

From kwatford+scipy at gmail.com Fri Jul 2 13:00:53 2010
From: kwatford+scipy at gmail.com (Ken Watford)
Date: Fri, 2 Jul 2010 13:00:53 -0400
Subject: [Numpy-discussion] OT: request help building pymex win64
In-Reply-To:
References:
Message-ID:

That's an excellent point. I've noticed on my (Linux) workstation that pymex works fine, but PyCUDA fails to import properly, because PyCUDA is a Boost::Python project and expects a different libstdc++ than the one that MATLAB jams into its LD_LIBRARY_PATH. (I got around this using an evil LD_PRELOAD, but that's another story)

So yeah. Robin has been converting my C99-isms to C++-isms to get the Visual Studio compiler to accept it - and in the process, I suppose, adding a libstdc++ dependency that wasn't there to begin with that MATLAB doesn't like.

Anyone know if there's a switch somewhere to get VS 2008 to accept some semblance of C99 source? Otherwise you might need to convert those bits into valid C89.

I really need to convert this thing into Cython some day.

On Fri, Jul 2, 2010 at 12:47 PM, David Cournapeau wrote:
> On Sat, Jul 3, 2010 at 1:37 AM, Robin wrote:
[clip]
>> Anyway I'm completely in the dark but wondered if some of the experts on here would be able to spot something (perhaps to do with incompatible C runtimes - I am not sure what runtime Python is built with but I thought it was VS 2008).
>
> The problem may be that matlab is built with one runtime, and Python with another.... Unless your matlab is very recent, it is actually quite likely to be compiled with VS 2005, which means you should use python 2.5 instead (or build python 2.6 with VS 2005, but I am not sure it is even possible without herculean efforts).
>
> David

From cournape at gmail.com Fri Jul 2 13:10:31 2010
From: cournape at gmail.com (David Cournapeau)
Date: Sat, 3 Jul 2010 02:10:31 +0900
Subject: [Numpy-discussion] OT: request help building pymex win64
In-Reply-To:
References:
Message-ID:

On Sat, Jul 3, 2010 at 1:58 AM, Robin wrote:
> On Fri, Jul 2, 2010 at 5:47 PM, David Cournapeau wrote:
>> The problem may be that matlab is built with one runtime, and Python with another.... Unless your matlab is very recent, it is actually quite likely to be compiled with VS 2005, which means you should use python 2.5 instead (or build python 2.6 with VS 2005, but I am not sure it is even possible without herculean efforts).
>
> Thanks for your help!
>
> I thought of that, but then VS 2008 is an officially supported compiler for the version of matlab I am using (2009a).
> http://www.mathworks.com/support/compilers/release2009a/win64.html

What mathworks means by supported may not include what you are doing, though. Generally, on windows, people design APIs to be independent of runtimes (because you more or less have to), and do not use much of the standard C library anyway. This is not true for python. IOW, supporting VS 2008 does not mean built with 2008; that's a limitation of python (and also caused by the desire of MS to completely screw up the C library, but that's another story).

Also, matlab could be built with 2008, and not use the same runtime as python. Even if the same version is used in both python and matlab, but the process uses two copies, the issues remain the same. You should use depends.exe to check for this (and maybe the MS debugger as well).

Also, I would double check the issue is not something else altogether.

David

From ben.root at ou.edu Fri Jul 2 14:33:28 2010
From: ben.root at ou.edu (Benjamin Root)
Date: Fri, 2 Jul 2010 13:33:28 -0500
Subject: [Numpy-discussion] [Matplotlib-users] Vectorization
In-Reply-To:
References:
Message-ID:

I am moving this over to the numpy-discussion mailing list...

I don't have a firm answer for you, but I did notice one issue in your code. You call arange(len(dx) - 1) for your loops, but you probably really need arange(1, len(dx) - 1) because you are accessing elements both after *and* before the current index. An index of -1 is actually valid because that means the last element of the array, and may not be what you intended.

Ben Root

On Fri, Jul 2, 2010 at 1:15 PM, Nicolas Bigaouette wrote:
> Hi all,
>
> I don't really know where to ask, so here it is.
>
> I was able to vectorize the normalization calculation in quantum mechanics: <psi|psi>.
> Basically it's a volume integral of a scalar field. Using:
>
>> norm = 0.0
>> for i in numpy.arange(len(dx)-1):
>>     for j in numpy.arange(len(dy)-1):
>>         for k in numpy.arange(len(dz)-1):
>>             norm += psi[k,j,i]**2 * dx[i] * dy[j] * dz[k]
>
> is dead slow. I replaced that with:
>
>> norm = (psi**2 * dx*dy[:,numpy.newaxis]*dz[:,numpy.newaxis,numpy.newaxis]).sum()
>
> which is almost instantaneous.
>
> I want to do the same for the calculation of the kinetic energy: <phi|p^2|phi>/2m. There is a laplacian in the volume integral which complicates things:
>
>> K = 0.0
>> for i in numpy.arange(len(dx)-1):
>>     for j in numpy.arange(len(dy)-1):
>>         for k in numpy.arange(len(dz)-1):
>>             K += -0.5 * m * phi[k,j,i] * (
>>                   (phi[k,j,i-1] - 2.0*phi[k,j,i] + phi[k,j,i+1]) / dx[i]**2
>>                 + (phi[k,j-1,i] - 2.0*phi[k,j,i] + phi[k,j+1,i]) / dy[j]**2
>>                 + (phi[k-1,j,i] - 2.0*phi[k,j,i] + phi[k+1,j,i]) / dz[k]**2
>>             )
>
> My question is, how would I vectorize such loops? I don't know how I would manage the "numpy.newaxis" code-foo with neighbours dependency... Any idea?
>
> Thanx!

From kwgoodman at gmail.com Fri Jul 2 14:45:34 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 2 Jul 2010 11:45:34 -0700
Subject: [Numpy-discussion] [Matplotlib-users] Vectorization
In-Reply-To:
References:
Message-ID:

On Fri, Jul 2, 2010 at 11:33 AM, Benjamin Root wrote:
> I am moving this over to the numpy-discussion mailing list...
[clip]
>> My question is, how would I vectorize such loops? I don't know how I would manage the "numpy.newaxis" code-foo with neighbours dependency... Any idea?
If no one knows how to vectorize it then one way to go is cython. If you convert your arrays to lists then it is very easy to convert the loop to cython. Fast too.

From kwgoodman at gmail.com Fri Jul 2 15:15:51 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 2 Jul 2010 12:15:51 -0700
Subject: [Numpy-discussion] [Matplotlib-users] Vectorization
In-Reply-To:
References:
Message-ID:

On Fri, Jul 2, 2010 at 11:45 AM, Keith Goodman wrote:
[clip]
> If no one knows how to vectorize it then one way to go is cython. If you convert your arrays to lists then it is very easy to convert the loop to cython. Fast too.

Some more thoughts: you pull phi[k,j,i] four times per loop. Setting phikji = phi[k,j,i] might give a little more speed. Might also help to do stuff like phi_i = phi[:,:,i], phi_ip1 = phi[:,:,i+1], etc right after "for i in range()". Also everything gets multiplied by "-0.5 * m * phi" so you should do that outside the loop. You can square the d's (dx, dy, and dz) outside the loop. Doing all these and then using cython should make things very fast.
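For instance, the hoisting described above might look like this (a rough, untested sketch; it assumes numpy is imported and that phi, dx, dy, dz and m are defined as in the original post, and it uses arange(1, len-1) per Ben's earlier note about the index range):

dx2 = dx**2                         # square the spacings once, outside the loops
dy2 = dy**2
dz2 = dz**2
K = 0.0
for i in numpy.arange(1, len(dx)-1):
    for j in numpy.arange(1, len(dy)-1):
        for k in numpy.arange(1, len(dz)-1):
            p = phi[k,j,i]          # fetch the center value once per iteration
            K += p * (
                  (phi[k,j,i-1] - 2.0*p + phi[k,j,i+1]) / dx2[i]
                + (phi[k,j-1,i] - 2.0*p + phi[k,j+1,i]) / dy2[j]
                + (phi[k-1,j,i] - 2.0*p + phi[k+1,j,i]) / dz2[k]
            )
K *= -0.5 * m                       # constant factor pulled out of the loop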
From bsouthey at gmail.com Fri Jul 2 15:45:19 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Fri, 02 Jul 2010 14:45:19 -0500
Subject: [Numpy-discussion] [Matplotlib-users] Vectorization
In-Reply-To:
References:
Message-ID: <4C2E41CF.80504@gmail.com>

On 07/02/2010 01:45 PM, Keith Goodman wrote:
[clip]
> If no one knows how to vectorize it then one way to go is cython. If you convert your arrays to lists then it is very easy to convert the loop to cython. Fast too.

Since things do not depend on previous results, can you, without thinking much, just replace all your phi[] references with phi[].sum()? Probably wrong but hopefully you can figure out what I mean.

My reasoning is that, over the three loops,

K = 0.0
for i in numpy.arange(len(dx)-1):
    for j in numpy.arange(len(dy)-1):
        for k in numpy.arange(len(dz)-1):
            K += -0.5 * m * phi[k,j,i]

is just the sum of all the entries of phi except the last one along each axis, i.e.

K = -0.5 * m * phi[0:len(dz)-1, 0:len(dy)-1, 0:len(dx)-1].sum()

Similarly, for the remaining places you have to sum the appropriate sections of the phi array. So phi[k,j,i-1] becomes something like:

phi[:,:,0:i-1].sum()

except you have to address the division. Therefore you have to sum over the appropriate axis:

(phi[:,:,0:i-1].sum(axis=2)/dx[0:i-1]**2).sum()
# or
numpy.dot(phi[:,:,0:i-1].sum(axis=2), 1/(dx[0:i-1]**2))

Then I got confused when, say, i=0, because the reference phi[k,j,i-1] becomes phi[k,j,-1] - which is the last index of that axis. Is that what you meant to happen?
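(A quick interpreter check of that wrap-around:)

>>> import numpy
>>> x = numpy.arange(5)
>>> x[-1]   # index -1 silently wraps around to the last element
4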
Bruce

From gely at usc.edu Fri Jul 2 15:47:22 2010
From: gely at usc.edu (Geoffrey Ely)
Date: Fri, 2 Jul 2010 12:47:22 -0700
Subject: [Numpy-discussion] [Matplotlib-users] Vectorization
In-Reply-To:
References:
Message-ID:

On Jul 2, 2010, at 11:33 AM, Benjamin Root wrote:
> I want to do the same for the calculation of the kinetic energy: <phi|p^2|phi>/2m. There is a laplacian in the volume integral which complicates things:
>
> K = 0.0
> for i in numpy.arange(len(dx)-1):
>     for j in numpy.arange(len(dy)-1):
>         for k in numpy.arange(len(dz)-1):
>             K += -0.5 * m * phi[k,j,i] * (
>                   (phi[k,j,i-1] - 2.0*phi[k,j,i] + phi[k,j,i+1]) / dx[i]**2
>                 + (phi[k,j-1,i] - 2.0*phi[k,j,i] + phi[k,j+1,i]) / dy[j]**2
>                 + (phi[k-1,j,i] - 2.0*phi[k,j,i] + phi[k+1,j,i]) / dz[k]**2
>             )
>
> My question is, how would I vectorize such loops? I don't know how I would manage the "numpy.newaxis" code-foo with neighbours dependency... Any idea?

How about:

K = -0.5 * m * (
    phi[1:-1,1:-1,1:-1] * (
          np.diff(phi[1:-1,1:-1,:], 2, 2) / dx[None,None,1:-1]**2
        + np.diff(phi[1:-1,:,1:-1], 2, 1) / dy[None,1:-1,None]**2
        + np.diff(phi[:,1:-1,1:-1], 2, 0) / dz[1:-1,None,None]**2
    )
).sum()

(Not tested)

From gely at usc.edu Fri Jul 2 15:53:35 2010
From: gely at usc.edu (Geoffrey Ely)
Date: Fri, 2 Jul 2010 12:53:35 -0700
Subject: [Numpy-discussion] cython and f2py
Message-ID:

Hi All,

Sorry if this has been documented or discussed already, but my searches have come up short. Can someone please recommend a way to setup both Cython and Fortran extensions in a single package with numpy.distutils (or something else)? E.g.:

from numpy.distutils.core import setup, Extension
ext_modules = [
    Extension( 'cext', ['cext.pyx'] ),
    Extension( 'fext', ['fext.f90'] ),
]
setup(
    name = 'test',
    ext_modules = ext_modules,
)

Can numpy.distutils be directed to process *.pyx with Cython rather than Pyrex?

Is it kosher to call setup twice in the same script, once for Fortran, and once for Cython using Cython.Distutils.build_ext, or would that do bad things?

I guess I could just pre-process the Cython stuff, and have distutils work with the generated C, but I don't like that as much.

I am using numpy version 1.4.0, but can update to the development version if that helps.

Thanks,
Geoff

From kwgoodman at gmail.com Fri Jul 2 16:00:51 2010
From: kwgoodman at gmail.com (Keith Goodman)
Date: Fri, 2 Jul 2010 13:00:51 -0700
Subject: [Numpy-discussion] cython and f2py
In-Reply-To:
References:
Message-ID:

On Fri, Jul 2, 2010 at 12:53 PM, Geoffrey Ely wrote:
[clip]
> I guess I could just pre-process the Cython stuff, and have distutils work with the generated C, but I don't like that as much.

That's what I do. I think it is easier for users (some might not have cython). If the C extension doesn't compile I fall back to the python versions of the functions.
http://github.com/kwgoodman/la/blob/master/setup.py

From matthew.brett at gmail.com Fri Jul 2 16:07:44 2010
From: matthew.brett at gmail.com (Matthew Brett)
Date: Fri, 2 Jul 2010 16:07:44 -0400
Subject: [Numpy-discussion] cython and f2py
In-Reply-To:
References:
Message-ID:

Hi,

> Can numpy.distutils be directed to process *.pyx with Cython rather than Pyrex?

Yes, but at the moment I believe you have to monkey-patch numpy distutils: see the top of

http://github.com/matthew-brett/nipy/blob/master/setup.py

and "generate_a_pyrex_source" around line 289 of:

http://github.com/matthew-brett/nipy/blob/master/build_helpers.py

for how we've done it - there may be a better way - please post if you find it!

Best,

Matthew

From dalcinl at gmail.com Sat Jul 3 00:11:20 2010
From: dalcinl at gmail.com (Lisandro Dalcin)
Date: Sat, 3 Jul 2010 01:11:20 -0300
Subject: [Numpy-discussion] PATCH: reference leaks for 'numpy.core._internal' module object
Message-ID:

The simple test below shows the issue.

import sys
import numpy as np
from numpy.core import _internal

def f(a = np.zeros(4)):
    a = np.zeros(4)
    b = memoryview(a)
    c = np.asarray(b)
    print sys.getrefcount(_internal)

while 1:
    f()

The patch is trivial (I've added a little extra, unrelated NULL check):

Index: numpy/core/src/multiarray/buffer.c
===================================================================
--- numpy/core/src/multiarray/buffer.c  (revision 8468)
+++ numpy/core/src/multiarray/buffer.c  (working copy)
@@ -747,9 +747,13 @@
     }
     str = PyUString_FromStringAndSize(buf, strlen(buf));
     free(buf);
+    if (str == NULL) {
+        return NULL;
+    }
     descr = (PyArray_Descr*)PyObject_CallMethod(
         _numpy_internal, "_dtype_from_pep3118", "O", str);
     Py_DECREF(str);
+    Py_DECREF(_numpy_internal);
     if (descr == NULL) {
         PyErr_Format(PyExc_ValueError,
                      "'%s' is not a valid PEP 3118 buffer format string", buf);

PS: I think that such implementation should at least handle the very simple one/two character formats (eg, 'i', 'f', 'd', 'Zf', 'Zd', etc.) without calling Python code... Of course, complaints without patches should not be taken too seriously ;-)

-- 
Lisandro Dalcin
---------------
CIMEC (INTEC/CONICET-UNL)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
Tel: +54-342-4511594 (ext 1011)
Tel/Fax: +54-342-4511169

From robince at gmail.com Sat Jul 3 05:02:15 2010
From: robince at gmail.com (Robin)
Date: Sat, 3 Jul 2010 10:02:15 +0100
Subject: [Numpy-discussion] OT: request help building pymex win64
In-Reply-To:
References:
Message-ID:

On Fri, Jul 2, 2010 at 6:10 PM, David Cournapeau wrote:
> Also, I would double check the issue is not something else altogether.

Hi,

I got it working eventually - it was something else altogether! I had made some mistakes in the changes I had made to get it to compile with visual studio that were causing the segfaults.

Makes me think of the old phrase - problem is between keyboard and chair.

Cheers

Robin

From ralf.gommers at googlemail.com Sat Jul 3 08:36:32 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sat, 3 Jul 2010 20:36:32 +0800
Subject: [Numpy-discussion] ? FAIL: test_print.test_complex_types(, )
In-Reply-To:
References:
Message-ID:

On Wed, Jun 30, 2010 at 1:08 PM, Vincent Davis wrote:
> I didn't find these documented anywhere, I have numpy (couple-day-old
> snapshot) installed on python 2.7 OSX 64bit.

I see those too. Can you open a ticket?

Cheers,
Ralf
From vincent at vincentdavis.net Sat Jul 3 09:51:08 2010
From: vincent at vincentdavis.net (Vincent Davis)
Date: Sat, 3 Jul 2010 07:51:08 -0600
Subject: [Numpy-discussion] ? FAIL: test_print.test_complex_types(, )
In-Reply-To:
References:
Message-ID:

On Sat, Jul 3, 2010 at 6:36 AM, Ralf Gommers wrote:
> On Wed, Jun 30, 2010 at 1:08 PM, Vincent Davis wrote:
>> I didn't find these documented anywhere, I have numpy (couple-day-old
>> snapshot) installed on python 2.7 OSX 64bit.
>
> I see those too. Can you open a ticket?

Opened Ticket #1534
http://projects.scipy.org/numpy/ticket/1534

Vincent

From ralf.gommers at googlemail.com Sat Jul 3 12:14:16 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Sun, 4 Jul 2010 00:14:16 +0800
Subject: [Numpy-discussion] Numpy 1.5 for Python 2.7 and 3.1
In-Reply-To: <4C250418.20009@uci.edu>
References: <4C250418.20009@uci.edu>
Message-ID:

Hi Christoph,

Sorry for the slow reply. This looks quite good!

On Sat, Jun 26, 2010 at 3:31 AM, Christoph Gohlke wrote:
> Dear NumPy developers,
>
> at I have posted a patch against numpy svn trunk r8464 that:
>
> 1) removes the datetime functionality
> 2) restores ABI compatibility with numpy 1.4.1
> 3) enables numpy to build and run on Python 2.7 and 3.1 (at least on Windows)

Tested on OS X, there with python 2.6 and 2.7 all is well. With 3.1 it compiles fine but doesn't import, not sure why (I'm of course not actually in the source tree with the interpreter):

Python 3.1.2 (r312:79360M, Mar 24 2010, 01:33:18)
[GCC 4.0.1 (Apple Inc. build 5493)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
Traceback (most recent call last):
  File "/Users/rgommers/Code/numpy/numpy/__init__.py", line 122, in <module>
    from numpy.__config__ import show as show_config
ImportError: No module named __config__

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/rgommers/Code/numpy/numpy/__init__.py", line 127, in <module>
    raise ImportError(msg)
ImportError: Error importing numpy: you should not try to import numpy from
its source directory; please exit the numpy source tree, and relaunch your
python intepreter from there.

> I hope this work can be used to get a numpy 1.5.x branch started.

I hope so too. Probably someone familiar with the datetime changes should confirm that no trace of it is left.

Cheers,
Ralf

> Regarding Python 2.7, which is scheduled to be released within 2 weeks:
> numpy 1.4.1 does build and work with minor changes
> (http://projects.scipy.org/numpy/changeset/7926 and followups). However,
> for Python 2.7, the proposed patch will not be binary compatible with
> numpy 1.4.1 because it uses PyCapsule instead of PyCObject.
>
> Best,
>
> Christoph
From pav at iki.fi Sat Jul 3 13:09:21 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 3 Jul 2010 17:09:21 +0000 (UTC)
Subject: [Numpy-discussion] Numpy 1.5 for Python 2.7 and 3.1
References: <4C250418.20009@uci.edu>
Message-ID:

Fri, 25 Jun 2010 12:31:36 -0700, Christoph Gohlke wrote:
> at I have posted a patch against numpy svn trunk r8464 that:
>
> 1) removes the datetime functionality
> 2) restores ABI compatibility with numpy 1.4.1
> 3) enables numpy to build and run on Python 2.7 and 3.1 (at least on Windows)
[clip]

This should make it easier to keep track of it:

http://github.com/pv/numpy-work/tree/1.5.x

I note that the patch contains some additional 3K porting. We/I'll need to split the patch into two parts:

- Py3K or other miscellaneous fixes that should also go to the trunk
- removing datetime and restoring the ABI

I can probably take a look on this during EuroScipy. I'd also like to manually check the diff to svn/1.4.x

So far, I think the following changes look like they should also go to the trunk:

- numpy/numarray/_capi.c
- numpy/lib/tests/test_io.py
- numpy/core/include/numpy/npy_math.h

-- 
Pauli Virtanen

From pav at iki.fi Sat Jul 3 14:48:51 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Sat, 3 Jul 2010 18:48:51 +0000 (UTC)
Subject: [Numpy-discussion] PATCH: reference leaks for 'numpy.core._internal' module object
References:
Message-ID:

Sat, 03 Jul 2010 01:11:20 -0300, Lisandro Dalcin wrote:
> The simple test below shows the issue.
[clip: patch]

Thanks, applied in r8469.

[clip]
> PS: I think that such implementation should at least handle the very
> simple one/two character formats (eg, 'i', 'f', 'd', 'Zf', 'Zd', etc.)
> without calling Python code... Of course, complaints without patches
> should not be taken too seriously ;-)

We'll optimize once someone complains this makes their code slow ;)

-- 
Pauli Virtanen

From sturla at molden.no Sat Jul 3 22:32:14 2010
From: sturla at molden.no (Sturla Molden)
Date: Sun, 04 Jul 2010 04:32:14 +0200
Subject: [Numpy-discussion] debian benchmarks
Message-ID: <4C2FF2AE.2090300@molden.no>

I was just looking at Debian's benchmark. LuaJIT is now (on median) beating Intel Fortran! Consider that Lua is a dynamic language very similar to Python. I know it's "just a benchmark" but this has to count as insanely impressive. Beating Intel Fortran with a dynamic scripting language... How is that even possible?

If this keeps up we'll need a Python to Lua compiler very soon. And LuaJIT 2 is rumoured to be much faster than the current...

Looking at median runtimes, here is what I got:

   gcc              1.10
   LuaJIT           1.96
   Java 6 -server   2.13
   Intel Fortran    2.18
   OCaml            3.41
   SBCL             3.66
   JavaScript V8    7.57

   PyPy            31.5
   CPython         64.6
   Perl            67.2
   Ruby 1.9        71.1

This means that LuaJIT can do in less than a day what CPython can do in a month. The only comfort for CPython is that Ruby and Perl did even worse.

I wonder how much better CPython would do with NumPy on this benchmark?

Sturla

From seb.haase at gmail.com Sun Jul 4 16:18:01 2010
From: seb.haase at gmail.com (Sebastian Haase)
Date: Sun, 4 Jul 2010 22:18:01 +0200
Subject: [Numpy-discussion] debian benchmarks
In-Reply-To: <4C2FF2AE.2090300@molden.no>
References: <4C2FF2AE.2090300@molden.no>
Message-ID:

On Sun, Jul 4, 2010 at 4:32 AM, Sturla Molden wrote:
> I was just looking at Debian's benchmark. LuaJIT is now (on median) beating Intel Fortran! Consider that Lua is a dynamic language very similar to Python. I know it's "just a benchmark" but this has to count as insanely impressive.
> Beating Intel Fortran with a dynamic scripting language... How is that even possible?
>
> If this keeps up we'll need a Python to Lua compiler very soon. And LuaJIT 2 is rumoured to be much faster than the current...
>
> Looking at median runtimes, here is what I got:
>
>    gcc              1.10
>    LuaJIT           1.96
>    Java 6 -server   2.13
>    Intel Fortran    2.18
>    OCaml            3.41
>    SBCL             3.66
>    JavaScript V8    7.57
>
>    PyPy            31.5
>    CPython         64.6
>    Perl            67.2
>    Ruby 1.9        71.1
>
> This means that LuaJIT can do in less than a day what CPython can do in a month. The only comfort for CPython is that Ruby and Perl did even worse.
>
> I wonder how much better CPython would do with NumPy on this benchmark?
>
> Sturla

Hi Sturla,
what is this even about ... ? Do you have some references ? It does indeed sound interesting ... but what kind of code / problem are they actually testing here ?

Thanks,
Sebastian Haase

From pav at iki.fi Sun Jul 4 17:29:24 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Sun, 4 Jul 2010 21:29:24 +0000 (UTC)
Subject: [Numpy-discussion] [OT] Re: debian benchmarks
References: <4C2FF2AE.2090300@molden.no>
Message-ID:

Sun, 04 Jul 2010 04:32:14 +0200, Sturla Molden wrote:
> I was just looking at Debian's benchmark. LuaJIT is now (on median)
> beating Intel Fortran! Consider that Lua is a dynamic language very
> similar to Python. I know it's "just a benchmark" but this has to count
> as insanely impressive.

I guess you're talking about the shootout.alioth.debian.org tests?

> Beating Intel Fortran with a dynamic scripting
> language... How is that even possible?

It's possible that in the cases where Lua wins, the Lua code is not completely equivalent to the Fortran code, or uses stuff such as strings for which Lua's default implementation may be efficient.

At least in the mandelbrot example some things differ. I wonder if Lua there takes advantage of SIMD instructions because the author of the code has manually changed the inmost loop to process two elements at once?

-- 
Pauli Virtanen

From sturla at molden.no Sun Jul 4 17:34:49 2010
From: sturla at molden.no (Sturla Molden)
Date: Sun, 04 Jul 2010 23:34:49 +0200
Subject: [Numpy-discussion] debian benchmarks
In-Reply-To:
References: <4C2FF2AE.2090300@molden.no>
Message-ID: <4C30FE79.7010809@molden.no>

Sebastian Haase skrev:
> Hi Sturla,
> what is this even about ... ? Do you have some references ? It does
> indeed sound interesting ... but what kind of code / problem are they
> actually testing here ?

http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastest.php

They are benchmarking with tasks that burn the CPU, like computing and bitmapping Mandelbrot sets and processing DNA data.

Sturla

From sturla at molden.no Sun Jul 4 17:46:26 2010
From: sturla at molden.no (Sturla Molden)
Date: Sun, 04 Jul 2010 23:46:26 +0200
Subject: [Numpy-discussion] debian benchmarks
In-Reply-To: <4C30FE79.7010809@molden.no>
References: <4C2FF2AE.2090300@molden.no> <4C30FE79.7010809@molden.no>
Message-ID: <4C310132.8040106@molden.no>

Sturla Molden skrev:
> http://shootout.alioth.debian.org/u32/which-programming-languages-are-fastest.php
>
> They are benchmarking with tasks that burn the CPU, like computing and
> bitmapping Mandelbrot sets and processing DNA data.

It is also the kind of tasks where NumPy would help. It would be nice to get NumPy into the shootout.
At least for the sake of advertising :-)

From igouy2 at yahoo.com Sun Jul 4 18:51:21 2010
From: igouy2 at yahoo.com (Isaac Gouy)
Date: Sun, 4 Jul 2010 22:51:21 +0000 (UTC)
Subject: [Numpy-discussion] [OT] Re: debian benchmarks
References: <4C2FF2AE.2090300@molden.no>
Message-ID:

Pauli Virtanen iki.fi> writes:

-snip-
> It's possible that in the cases where Lua wins, the Lua code is not
> completely equivalent to the Fortran code, or uses stuff such as strings
> for which Lua's default implementation may be efficient.

Note - not "Lua's default implementation" but LuaJIT.

> At least in the mandelbrot example some things differ. I wonder if Lua
> there takes advantage of SIMD instructions because the author of the code
> has manually changed the inmost loop to process two elements at once?

Note - the fastest Fortran mandelbrot program is written to use OpenMP, but those u32 measurements are when the programs are forced onto one core.

From ralf.gommers at googlemail.com Sun Jul 4 19:34:01 2010
From: ralf.gommers at googlemail.com (Ralf Gommers)
Date: Mon, 5 Jul 2010 07:34:01 +0800
Subject: [Numpy-discussion] ANN: scipy 0.8.0 release candidate 1
Message-ID:

I'm pleased to announce the availability of the first release candidate of SciPy 0.8.0. Please try it out and report any problems on the scipy-dev mailing list.

SciPy is a package of tools for science and engineering for Python. It includes modules for statistics, optimization, integration, linear algebra, Fourier transforms, signal and image processing, ODE solvers, and more.

This release candidate comes almost one and a half years after the 0.7.0 release and contains many new features, numerous bug-fixes, improved test coverage, and better documentation. Please note that SciPy 0.8.0rc1 requires Python 2.4-2.6 and NumPy 1.4.1 or greater.

For more information, please see the release notes:
http://sourceforge.net/projects/scipy/files/scipy/0.8.0rc1/NOTES.txt/view

You can download the release from here:
https://sourceforge.net/projects/scipy/

Python 2.5/2.6 binaries for Windows and OS X are available, as well as source tarballs for other platforms and the documentation in pdf form.

Thank you to everybody who contributed to this release.

Enjoy,
Ralf

From robince at gmail.com Mon Jul 5 06:24:13 2010
From: robince at gmail.com (Robin)
Date: Mon, 5 Jul 2010 11:24:13 +0100
Subject: [Numpy-discussion] numpy on windows 64 bit
Message-ID:

Hi,

I am having some problems with win64 with all my tests failing.

I installed amd64 Python from Python.org and numpy and scipy from
http://www.lfd.uci.edu/~gohlke/pythonlibs/

I noticed that on windows sys.maxint is the 32bit value (2147483647

From cournape at gmail.com Mon Jul 5 07:09:13 2010
From: cournape at gmail.com (David Cournapeau)
Date: Mon, 5 Jul 2010 18:09:13 +0700
Subject: [Numpy-discussion] numpy on windows 64 bit
In-Reply-To:
References:
Message-ID:

Hi Robin,

On Mon, Jul 5, 2010 at 5:24 PM, Robin wrote:
> Hi,
>
> I am having some problems with win64 with all my tests failing.

Short of saying what those failures are, we can't help you.

> I installed amd64 Python from Python.org and numpy and scipy from
> http://www.lfd.uci.edu/~gohlke/pythonlibs/
>
> I noticed that on windows sys.maxint is the 32bit value (2147483647

This is not surprising: sys.maxint gives you the max value of a long, which is 32 bits even on 64 bits on windows.
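A quick way to see the two sizes side by side (a minimal check; the output below is what one would expect from a 64-bit Windows build):

>>> import sys, numpy as np
>>> sys.maxint            # max of a C long: 32-bit even on 64-bit Windows
2147483647
>>> np.dtype(np.intp)     # numpy indexes with the pointer-sized type instead
dtype('int64')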
David

From robince at gmail.com Mon Jul 5 07:19:20 2010
From: robince at gmail.com (Robin)
Date: Mon, 5 Jul 2010 12:19:20 +0100
Subject: [Numpy-discussion] numpy on windows 64 bit
In-Reply-To:
References:
Message-ID:

On Mon, Jul 5, 2010 at 12:09 PM, David Cournapeau wrote:
>
> Short of saying what those failures are, we can't help you.

Thanks for reply... Somehow my message got truncated - I had written more detail about the errors!

>> I noticed that on windows sys.maxint is the 32bit value (2147483647
>
> This is not surprising: sys.maxint gives you the max value of a long,
> which is 32 bits even on 64 bits on windows.

I just got to figuring this out... But it makes some problems. The main one I'm having is that I assume because of this, array shapes are longs instead of ints (i.e. x.shape[0] is a long). This breaks np.random.permutation(x.shape[1]), which I use all over the place (I opened a ticket for this, #1535).

Something I asked in the previous mail that got lost is: what is the best cross platform way of doing this? np.random.permutation(int(x.shape[1]))?

Actually that and the problems with scipy.sparse (spsolve doesn't work) cover all of the errors I'm seeing... (I detailed those in a separate mail to the scipy list).

Cheers

Robin

From igouy2 at yahoo.com Mon Jul 5 09:32:51 2010
From: igouy2 at yahoo.com (Isaac Gouy)
Date: Mon, 5 Jul 2010 13:32:51 +0000 (UTC)
Subject: [Numpy-discussion] debian benchmarks
References: <4C2FF2AE.2090300@molden.no> <4C30FE79.7010809@molden.no> <4C310132.8040106@molden.no>
Message-ID:

Sturla Molden molden.no> writes:

> It is also the kind of tasks where NumPy would help. It would be nice to
> get NumPy into the shootout. At least for the sake of advertising

http://shootout.alioth.debian.org/u32/program.php?test=spectralnorm&lang=python&id=2

From elcortogm at googlemail.com Mon Jul 5 10:03:56 2010
From: elcortogm at googlemail.com (Steve Schmerler)
Date: Mon, 5 Jul 2010 16:03:56 +0200
Subject: [Numpy-discussion] subtle behavior when subtracting sub-arrays
Message-ID: <20100705140356.GC12219@cartman.physik.tu-freiberg.de>

Hi

I stumbled upon some numpy behavior which I was not aware of. Say I have an array of shape (2,2,3) and want to subtract the sub-array a[...,0] of shape (2,2) from each a[...,i], i=0,1,2 .

########## ok ##########

In [1]: a=arange(2*2*3).reshape(2,2,3)

# Copy the array to be subtracted.
In [2]: a0=a[...,0].copy()

# Trivial approach. That works.
In [3]: for k in range(a.shape[-1]):
   ...:     a[...,k] -= a0
   ...:

# OK
In [4]: a
Out[4]:
array([[[0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2]]])

In [5]: a=arange(2*2*3).reshape(2,2,3)

# The same, with broadcasting.
In [6]: a=a-a[...,0][...,None]

# OK
In [7]: a
Out[7]:
array([[[0, 1, 2],
        [0, 1, 2]],

       [[0, 1, 2],
        [0, 1, 2]]])

########## not ok ##########

In [8]: a=arange(2*2*3).reshape(2,2,3)

In [9]: a-=a[...,0][...,None]

# NOT OK
In [10]: a
Out[10]:
array([[[ 0,  1,  2],
        [ 0,  4,  5]],

       [[ 0,  7,  8],
        [ 0, 10, 11]]])

In [11]: a=arange(2*2*3).reshape(2,2,3)

# NOT OK, same as above
In [12]: for k in range(a.shape[-1]):
   ...:     a[...,k] -= a[...,0]
   ...:

In [14]: a
Out[14]:
array([[[ 0,  1,  2],
        [ 0,  4,  5]],

       [[ 0,  7,  8],
        [ 0, 10, 11]]])

To sum up, I find it a bit subtle that

    a = a - a[...,0][...,None]

works as expected, while

    a -= a[...,0][...,None]

does not. I guess the reason is that in the latter case (and the corresponding loop), a[...,0] itself is changed during the loop, while in the former case, numpy makes a copy of a[...,0]? Is this intended? This is with numpy 1.3.0.
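One way to confirm that guess is np.may_share_memory (a quick check, assuming your numpy provides it): the right-hand side of the in-place version is a view into a, while the plain subtraction allocates a fresh array first:

>>> import numpy as np
>>> a = np.arange(2*2*3).reshape(2,2,3)
>>> np.may_share_memory(a, a[...,0][...,None])      # a view: aliases a
True
>>> np.may_share_memory(a, a - a[...,0][...,None])  # a new array
False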
best,
Steve

From pav at iki.fi Mon Jul 5 10:23:58 2010
From: pav at iki.fi (Pauli Virtanen)
Date: Mon, 5 Jul 2010 14:23:58 +0000 (UTC)
Subject: [Numpy-discussion] subtle behavior when subtracting sub-arrays
References: <20100705140356.GC12219@cartman.physik.tu-freiberg.de>
Message-ID:

Mon, 05 Jul 2010 16:03:56 +0200, Steve Schmerler wrote:
[clip]
> To sum up, I find it a bit subtle that
>     a = a - a[...,0][...,None]
> works as expected, while
>     a -= a[...,0][...,None]
> does not.
> I guess the reason is that in the latter case (and the corresponding
> loop), a[...,0] itself is changed during the loop, while in the former
> case, numpy makes a copy of a[...,0]?

Correct.

> Is this intended?

Not really. It's a "feature" we're planning to get rid of eventually, once a way to do it without sacrificing performance in "safe" cases is implemented.

-- 
Pauli Virtanen

From aisaac at american.edu Mon Jul 5 10:56:59 2010
From: aisaac at american.edu (Alan G Isaac)
Date: Mon, 05 Jul 2010 10:56:59 -0400
Subject: [Numpy-discussion] ANN: scipy 0.8.0 release candidate 1
In-Reply-To:
References:
Message-ID: <4C31F2BB.8070802@american.edu>

One odd error (the directory was empty upon inspection) and one failure.

Alan Isaac

Python 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.__version__
'1.4.1'
>>> import scipy as sp
>>> sp.__version__
'0.8.0rc1'
>>> sp.test()
Running unit tests for scipy
NumPy version 1.4.1
NumPy is installed in C:\Python26\lib\site-packages\numpy
SciPy version 0.8.0rc1
SciPy is installed in C:\Python26\lib\site-packages\scipy
Python version 2.6.5 (r265:79096, Mar 19 2010, 21:48:26) [MSC v.1500 32 bit (Intel)]
nose version 0.11.0
C:\Python26\lib\site-packages\scipy\io\matlab\mio5.py:90: RuntimeWarning: __builtin__.file size changed, may indicate binary incompatibility
  from mio5_utils import VarReader5
[snip]
.............................................................................error removing c:\users\private\appdata\local\temp\tmp0orogecat_test: c:\users\private\appdata\local\temp\tmp0orogecat_test: The directory is not empty
................................................................................
..................
======================================================================
FAIL: test_data.test_boost(,)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "C:\Python26\lib\site-packages\nose\case.py", line 183, in runTest
    self.test(*self.arg)
  File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", line 205, in _test_factory
    test.check(dtype=dtype)
  File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", line 223, in check
    assert False, "\n".join(msg)
AssertionError:
Max |adiff|: 1.77636e-15
Max |rdiff|: 2.44233e-14
Bad results for the following points (in output 0):
    1.0000014305114746 => 0.0016914556651292853 != 0.0016914556651292944  (rdiff 5.3842961637318929e-15)
    1.000007152557373  => 0.0037822080446613874 != 0.0037822080446612951  (rdiff 2.4423306175913249e-14)
    1.0000138282775879 => 0.0052589439468011612 != 0.0052589439468011014  (rdiff 1.1380223962570286e-14)
    1.0000600814819336 => 0.010961831992188913  != 0.010961831992188852   (rdiff 5.5387933059412495e-15)
    1.0001168251037598 => 0.015285472131830449  != 0.015285472131830425   (rdiff 1.5888373256788015e-15)
    1.0003981590270996 => 0.028218171738655283  != 0.028218171738655373   (rdiff 3.1967209494023856e-15)

----------------------------------------------------------------------
Ran 4415 tests in 54.916s

FAILED (KNOWNFAIL=13, SKIP=39, failures=1)
> ====================================================================== > FAIL: test_data.test_boost(,) > ---------------------------------------------------------------------- > Traceback (most recent call last): > File "C:\Python26\lib\site-packages\nose\case.py", line 183, in runTest > self.test(*self.arg) > File "C:\Python26\lib\site-packages\scipy\special\tests\test_data.py", > line 20 > 5, in _test_factory > test.check(dtype=dtype) > File "C:\Python26\lib\site-packages\scipy\special\tests\testutils.py", > line 22 > 3, in check > assert False, "\n".join(msg) > AssertionError: > Max |adiff|: 1.77636e-15 > Max |rdiff|: 2.44233e-14 > Bad results for the following points (in output 0): > 1.0000014305114746 => 0.0016914556651292853 != > 0.0016914556651292944 (rdiff 5.3842961637318929e-15) > 1.000007152557373 => 0.0037822080446613874 != > 0.0037822080446612951 (rdiff 2.4423306175913249e-14) > 1.0000138282775879 => 0.0052589439468011612 != > 0.0052589439468011014 (rdiff 1.1380223962570286e-14) > 1.0000600814819336 => 0.010961831992188913 != > 0.010961831992188852 (rdiff 5.5387933059412495e-15) > 1.0001168251037598 => 0.015285472131830449 != > 0.015285472131830425 (rdiff 1.5888373256788015e-15) > 1.0003981590270996 => 0.028218171738655283 != > 0.028218171738655373 (rdiff 3.1967209494023856e-15) > > ---------------------------------------------------------------------- > Ran 4415 tests in 54.916s > > FAILED (KNOWNFAIL=13, SKIP=39, failures=1) > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Mon Jul 5 12:11:54 2010 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 05 Jul 2010 12:11:54 -0400 Subject: [Numpy-discussion] ANN: scipy 0.8.0 release candidate 1 In-Reply-To: References: <4C31F2BB.8070802@american.edu> Message-ID: <4C32044A.5000500@american.edu> On 7/5/2010 11:13 AM, Ralf Gommers wrote: > The failure is yet another case of test precision set slightly too high. > Thought we had got them all... Not sure about the matlab thing. Which > version of Windows are you running? Vista 64bit (with 32 bit Python 2.6). Alan From gely at usc.edu Mon Jul 5 17:31:10 2010 From: gely at usc.edu (Geoffrey Ely) Date: Mon, 5 Jul 2010 14:31:10 -0700 Subject: [Numpy-discussion] cython and f2py In-Reply-To: References: Message-ID: On Jul 2, 2010, at 1:07 PM, Matthew Brett wrote: >> Can numpy.distutils be directed to process *.pyx with Cython rather than Pyrex? > > Yes, but at the moment I believe you have to monkey-patch numpy > distutils : see the top of > > http://github.com/matthew-brett/nipy/blob/master/setup.py > > and "generate_a_pyrex_source" around line 289 of: > > http://github.com/matthew-brett/nipy/blob/master/build_helpers.py > > for how we've done it - there may be a better way - please post if you find it! Thanks Matthew. I don't know enough about distutils to quite follow everything that you have done. So I think I will get by for now by calling setup separately as in the code below. It seems to be working for installing from source, which is my main goal. Anyone know if this is a bad idea? 
-Geoff

# Cython extension
from distutils.core import setup
from distutils.extension import Extension
from Cython.Distutils import build_ext
setup(
    ext_modules = [Extension( 'cext', ['cext.pyx'] )],
    cmdclass = {'build_ext': build_ext},
    script_args = ['build_ext', '--inplace'],
)

# Fortran extension
from numpy.distutils.core import setup, Extension
setup(
    ext_modules = [Extension( 'fext', ['fext.f90'] )],
)

From JDM at MarchRay.net Tue Jul 6 00:31:02 2010
From: JDM at MarchRay.net (Jonathan March)
Date: Mon, 5 Jul 2010 23:31:02 -0500
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
Message-ID:

Fernando Perez proposed a NumPy enhancement, an ndarray with named axes, prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew Brett, Kilian Koepsell and Stefan van der Walt.

At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather) discussion of this proposal.

The notes from this BOF can be found at:
http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
(linked from the Plans section of http://projects.scipy.org/numpy )

HELP NEEDED: Fernando does not have the resources to drive the project beyond this prototype, which already does what he needs. If this is to go anywhere, it needs people to do the work. Please step forward.

From arokem at berkeley.edu Tue Jul 6 00:52:23 2010
From: arokem at berkeley.edu (Ariel Rokem)
Date: Mon, 5 Jul 2010 21:52:23 -0700
Subject: [Numpy-discussion] Ternary plots anywhere?
In-Reply-To: <4C2D97D4.6090908@esrf.fr>
References: <4C2D97D4.6090908@esrf.fr>
Message-ID:

Hi Armando,

Here's something in that direction:

http://nature.berkeley.edu/~chlewis/Sourcecode.html

Hope that helps - Ariel

On Fri, Jul 2, 2010 at 12:40 AM, "V. Armando Solé" wrote:
> Dear all,
>
> Perhaps this is a bit off topic for the mailing list, but this is
> probably the only mailing list that is common to users of all python
> plotting packages.
>
> I am trying to find a python implementation of ternary/triangular plots:
>
> http://en.wikipedia.org/wiki/Ternary_plot
>
> but I have been unsuccessful. Is there any on-going project around?
>
> Thanks for your time.
>
> Best regards,
>
> Armando

-- 
Ariel Rokem
Helen Wills Neuroscience Institute
University of California, Berkeley
http://argentum.ucbso.berkeley.edu/ariel

From faltet at pytables.org Tue Jul 6 03:01:08 2010
From: faltet at pytables.org (Francesc Alted)
Date: Tue, 6 Jul 2010 09:01:08 +0200
Subject: [Numpy-discussion] debian benchmarks
In-Reply-To:
References: <4C2FF2AE.2090300@molden.no> <4C310132.8040106@molden.no>
Message-ID: <201007060901.08125.faltet@pytables.org>

On Monday 05 July 2010 15:32:51 Isaac Gouy wrote:
> Sturla Molden molden.no> writes:
> > It is also the kind of tasks where NumPy would help. It would be nice to
> > get NumPy into the shootout.
> > At least for the sake of advertising
>
> http://shootout.alioth.debian.org/u32/program.php?test=spectralnorm&lang=python&id=2

Let's join the game :-)

If I run the above version on my desktop computer (Intel E8600 Duo @ 3 GHz, DDR2 @ 800 MHz memory) I get:

$ time python -OO spectralnorm-numpy.py 5500
1.274224153

real    0m9.724s
user    0m9.295s
sys     0m0.269s

which should correspond to the 12.86s in shootout (so my machine is around 30% faster). But, if I use ATLAS (3.9.25) so as to accelerate linear algebra:

$ python -OO spectralnorm-numpy.py 5500
1.274224153

real    0m5.862s
user    0m5.566s
sys     0m0.225s

Then, my profile said that building the M matrix took a lot of time. After using numexpr to improve this (see attached script), I get:

$ python -OO spectralnorm-numpy-numexpr.py 5500
1.274224153

real    0m3.333s
user    0m3.071s
sys     0m0.163s

Interestingly, memory consumption also dropped from 480 MB to 255 MB.

Finally, if using Intel's MKL to take advantage of my 2 cores:

$ python -OO spectralnorm-numpy-numexpr.py 5500
1.274224153

real    0m2.785s
user    0m4.117s
sys     0m0.139s

which is a 3.5x improvement over the initial version. Also, this seems faster (around ~25%), and consumes similar memory to the fastest version written in pure C in the "interesting alternatives" section:

http://shootout.alioth.debian.org/u32/performance.php?test=spectralnorm#about

I suppose that, provided that Matlab also has a JIT and supports Intel's MKL, it could beat this mark too. Would any Matlab user accept the challenge?

-- 
Francesc Alted

-------------- next part --------------
A non-text attachment was scrubbed...
Name: spectralnorm-numpy-numexpr.py
Type: text/x-python
Size: 778 bytes
Desc: not available

From sole at esrf.fr Tue Jul 6 08:41:44 2010
From: sole at esrf.fr (V. Armando Solé)
Date: Tue, 06 Jul 2010 14:41:44 +0200
Subject: [Numpy-discussion] Ternary plots anywhere?
In-Reply-To:
References: <4C2D97D4.6090908@esrf.fr>
Message-ID: <4C332488.2020706@esrf.fr>

Hi Ariel,

Ariel Rokem wrote:
> Hi Armando,
>
> Here's something in that direction:
>
> http://nature.berkeley.edu/~chlewis/Sourcecode.html
>
> Hope that helps - Ariel

It really helps. It looks more complete than the only thing I had found
(http://focacciaman.blogspot.com/2008/05/ternary-plotting-in-python-take-2.html)

Thanks a lot,

Armando

From xscript at gmx.net Tue Jul 6 09:03:28 2010
From: xscript at gmx.net (Lluís)
Date: Tue, 06 Jul 2010 15:03:28 +0200
Subject: [Numpy-discussion] Yet another axes naming package [Was: Re: BOF notes: Fernando's proposal: NumPy ndarray with named axes]
In-Reply-To:
References:
Message-ID: <86630svk2n.wl%lluis@ginnungagap.pc.ac.upc.edu>

Jonathan March writes:
> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
> prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
> Brett, Kilian Koepsell and Stefan van der Walt.

I haven't had a thorough look into it, but this work as well as others listed in the 'NdarrayWithNamedAxes' wiki page are similar in spirit to some numpy extensions I've been developing.
You can find the code and some initial documentation at:

https://people.gso.ac.upc.edu/vilanova/doc/sciexp2

I was not planning to announce it until around 1.0, as the numpy structures are still crude and lack some operations for dynamically extending the structure both in shape and the number of fields on each record (I have some fixes that still need to be committed), but after seeing some related announcements lately, I think we all might benefit from trying to join ideas and efforts.

I'll try to shortly explain with an example the part that is related to numpy (that is, the third frontend that appears on the "User Guide": 'plotter', which currently has documentation that is worse than poor).

Suppose you have a set of benchmarks that have been simulated with different simulator parameters, such that you have one result file for each executed combination of the "variables":

* benchmark
* parameter1
* parameter2

Of course, for each execution you'll also have multiple results (what I call "valuenames"; simply fields in a record array, in fact).

NOTE: scripts for such executions can be generated with the first frontend ('launchgen').

Then you can find and extract those results (package 'sciexp2.gather') and organize them into an N-dimensional 'Data' object (package 'sciexp2.data'), where the first dimension has (for example) the combinations of "parameter1-parameter2" values, and the 2nd dimension contains one element for each benchmark (method 'sciexp2.data.Data.reshape').

Now, you can index/slice the structure with integers (as always) _as well as_ with:

* strings: simple indexing as well as slicing
* "filters": slicing with a stepping

These are translated into integers through the "metadata" (benchmark name and/or values of the 2 parameters), stored in 'sciexp2.data.Dimension' objects. For example, to get the numbers of tests where parameter1 is between 10 and 100 and just for benchmarks named 'bench1' and 'bench2':

data[::"10 < parameter1 && parameter1 < 100", ["bench1", "bench2"]]

There is a third package extending matplotlib that I have not uploaded (nor fully developed) that is meant to use the dimension and record metadata in the Data object, such that data can be easily plotted. It extracts labels for axis and legends from metadata, and can "expand" operations. For example:

* Plot one figure for each benchmark simply declaring the figure as to be "expanded" through the 'benchmark' variable.
* Plot multiple lines/bars/whatever with a single plot command, like "plot such and such for each benchmark", or "plot such and such for each configuration and cluster by benchmark name".

More extensive examples can be seen on the following URL, which is from a much older version that wasn't using numpy nor matplotlib, and provided a somewhat functional API (SIZE, CPREFETCH, RPREFETCH and SIMULATOR are execution parameters in these examples; fun starts at line 78):

https://projects.gso.ac.upc.edu/projects/sciexp2/repository/revisions/200/entry/progs/sciexp2/tags/0.5/plotter/examples/01-spec-figures.cfg

Finally, some things that have been bugging me about numpy are:

* My 'Data' object is similar to a 'recarray', such that record elements (what I call "valuenames") can be accessed as attributes. But to avoid the cost of a recarray, I use an ndarray with records. This has the unfortunate effect that "valuenames" cannot be accessed as attributes on a record, but only when it really is a 'Data' object.
I tried to add some methods to numpy.void from my python code to access record fields as attributes, but of course that's not possible.

* I'd like to associate extra information with a dtype, instead of manually carrying it around on every operation accessing a record field. Namely:
  * a description; such that it can be automatically used as axis/legend labels in matplotlib.
  * unit information; such that units of results can be automatically computed when operating with numpy, and later extracted when plotted with matplotlib. For this, existing packages like 'units' on PyPI could be used.
* The ability to operate on records instead of separate record fields, such that I can do:

b = a[0] + a[1]

instead of:

b_f1 = a[0]["f1"] + a[1]["f1"]
b_f2 = a[0]["f2"] + a[1]["f2"]

whenever possible.

Comments are welcome.

apa!

-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth

From silva at lma.cnrs-mrs.fr Tue Jul 6 09:39:27 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Tue, 06 Jul 2010 10:39:27 -0300 Subject: [Numpy-discussion] OT? Distutils extension with shared libs Message-ID: <1278423567.2570.8.camel@Portable-s2m.cnrs-mrs.fr>

I know it is not directly related to numpy (even if it uses numpy.distutils), but I ask you folks how you deal with code depending on other libs. In the libIM7 project ( https://launchpad.net/libim7 ), I wrap code from a device manufacturer with ctypes in order to read Particle Image Velocimetry (PIV) files stored by their software (formats im7 and vc7). There is a dependency on zlib which is easy to solve on linux (installing the zlib-dev package in debian). But as I want to use it also on windows (sharing the commercial dongle amongst various colleagues is an uncomfortable solution), I am trying to configure the setup.py both for win and linux. But I am new to dev in windows... My questions are then:
- how do you deal with dependencies in distutils?
- what do you need to build against zlib (or another lib) on windows using distutils?
Thanks
Fabricio

From kbasye1 at jhu.edu Tue Jul 6 09:56:35 2010 From: kbasye1 at jhu.edu (Ken Basye) Date: Tue, 06 Jul 2010 09:56:35 -0400 Subject: [Numpy-discussion] reverse cumsum? Message-ID: <4C333613.9070106@jhu.edu>

Hi, Is there a simple way to get a cumsum in reverse order? So far, the best I've come up with is to use fancy indexing twice to reverse things:

>>> x = np.arange(10)
>>> np.cumsum(x[np.arange(9, -1, -1)])[np.arange(9, -1, -1)]
array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])

If it matters, I only care about the 1-d case at this point. Thanks, Ken

From aisaac at american.edu Tue Jul 6 10:02:57 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 06 Jul 2010 10:02:57 -0400 Subject: [Numpy-discussion] reverse cumsum? In-Reply-To: <4C333613.9070106@jhu.edu> References: <4C333613.9070106@jhu.edu> Message-ID: <4C333791.5010505@american.edu>

On 7/6/2010 9:56 AM, Ken Basye wrote:
> Is there a simple way to get a cumsum in reverse order?

>>> x = np.arange(10)
>>> x[::-1].cumsum()[::-1]
array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])

Is that what you want? Alan Isaac

From josh.holbrook at gmail.com Tue Jul 6 10:47:59 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 06:47:59 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

I really really really want to work on this.
I already forked datarray on github and did some research on What Other People Have Done ( http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any luck I'll contribute something actually useful. :)

Anyways!

--Josh

On Mon, Jul 5, 2010 at 8:31 PM, Jonathan March wrote:
> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
> prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
> Brett, Kilian Koepsell and Stefan van der Walt.
>
> At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
> discussion of this proposal.
>
> The notes from this BOF can be found at:
> http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
> (linked from the Plans section of http://projects.scipy.org/numpy )
>
> HELP NEEDED: Fernando does not have the resources to drive the project
> beyond this prototype, which already does what he needs. If this is to go
> anywhere, it needs people to do the work. Please step forward.
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

From jsseabold at gmail.com Tue Jul 6 11:25:11 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Jul 2010 11:25:11 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 10:47 AM, Joshua Holbrook wrote:
> I really really really want to work on this. I already forked datarray
> on github and did some research on What Other People Have Done (
> http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any
> luck I'll contribute something actually useful. :)
>
> Anyways!
>
> --Josh

Thanks, Josh. Also note this page here that I think I already mentioned to you. The pandas and larry guys have spent a good deal of time discussing this already, especially with respect to speed and timings (i.e., DataArray will most likely need to be optimized). I think Keith already has a benchmark script somewhere(?). We have already had discussions on the pystatsmodels mailing list over here, so you might want to search a bit, though I think it's time to move the discussion to the numpy list here. And of course, please make use of the mailing list for design choices, questions, and soliciting feedback, as I think this project is of interest to many people.

Cheers, Skipper

From sturla at molden.no Tue Jul 6 11:37:14 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 06 Jul 2010 17:37:14 +0200 Subject: [Numpy-discussion] Download Microsoft C/C++ compiler for use with Python 2.6/2.7 ASAP Message-ID: <4C334DAA.5000909@molden.no>

Microsoft has withdrawn VS2008 in favor of VS2010. The express version is also unavailable for download. We can still get a VC++ 2008 compiler required to build extensions for the official Python 2.6 and 2.7 binary installers here (Windows 7 SDK for .NET 3.5 SP1): http://www.microsoft.com/downloads/details.aspx?familyid=71DEB800-C591-4F97-A900-BEA146E4FAE1&displaylang=en

Download today, before it goes away!

It is possible to build C and Fortran extensions for Python 2.6/2.7 on x86 using mingw. Microsoft's compiler is required for C++ or amd64 though. (Intel's C/C++ compiler requires VS2008, which has now perished.) Microsoft has now published a download for Windows 7 SDK for .NET 4. It has the VC++ 2010 compiler. It can be a matter of days before the VC++ 2008 compiler is totally unavailable.
Sturla Molden From kwgoodman at gmail.com Tue Jul 6 11:40:56 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 6 Jul 2010 08:40:56 -0700 Subject: [Numpy-discussion] [ANN] la 0.4, the labeled array Message-ID: The main class of the la package is a labeled array, larry. A larry consists of data and labels. The data is stored as a NumPy array and the labels as a list of lists (one list per dimension). Alignment by label is automatic when you add (or subtract, multiply, divide) two larrys. The focus of this release was binary operations between unaligned larrys with user control of the join method (five available) and the fill method. A general binary function, la.binaryop(), was added as were the convenience functions add, subtract, multiply, divide. Supporting functions such as la.align(), which aligns two larrys, were also added. download http://pypi.python.org/pypi/la doc http://larry.sourceforge.net code http://github.com/kwgoodman/la list1 http://groups.google.ca/group/pystatsmodels list2 http://groups.google.com/group/labeled-array RELEASE NOTES New larry methods - ismissing: A bool larry with element-wise marking of missing values - take: A copy of the specified elements of a larry along an axis New functions - rand: Random samples from a uniform distribution - randn: Random samples from a Gaussian distribution - missing_marker: Return missing value marker for the given larry - ismissing: A bool Numpy array with element-wise marking of missing values - correlation: Correlation of two Numpy arrays along the specified axis - split: Split into train and test data along given axis - listmap_fill: Index map a list onto another and index of unmappable elements - listmap_fill: Cython version of listmap_fill - align: Align two larrys using one of five join methods - info: la package information such as version number and HDF5 availability - binaryop: Binary operation on two larrys with given function and join method - add: Sum of two larrys using given join and fill methods - subtract: Difference of two larrys using given join and fill methods - multiply: Multiply two larrys element-wise using given join and fill methods - divide: Divide two larrys element-wise using given join and fill methods Enhancements - listmap now has option to ignore unmappable elements instead of KeyError - listmap.pyx now has option to ignore unmappable elements instead of KeyError - larry.morph() is much faster as are methods, such as merge, that use it Breakage from la 0.3 - Development moved from launchpad to github - func.py and afunc.py renamed flarry.py and farray.py to match new flabel.py. Broke: "from la.func import stack"; Did not break: "from la import stack" - Default binary operators (+, -, ...) no longer raise an error when no labels overlap Bug fixes - #590270 Index with 1d array bug: lar[1darray,:] worked; lar[1darray] crashed From kwgoodman at gmail.com Tue Jul 6 11:55:41 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 6 Jul 2010 08:55:41 -0700 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: On Tue, Jul 6, 2010 at 7:47 AM, Joshua Holbrook wrote: > I really really really want to work on this. I already forked datarray > on github and did some research on What Other People Have Done ( > http://jesusabdullah.github.com/2010/07/02/datarray.html ). With any > luck I'll contribute something actually useful. :) I like the figure! To do label indexing on a larry you need to use lix, so lar.lix[...] 
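For readers who don't know la, a minimal sketch of the lar.lix indexing Keith mentions, assuming la 0.4 is installed (the data and labels here are invented; check the la documentation for the exact lix semantics):

import la

# a 2d larry: data plus one label list per axis
lar = la.larry([[1.0, 2.0], [3.0, 4.0]],
               [['2010-01-04', '2010-01-05'], ['aapl', 'ibm']])
# label-based indexing goes through the lix attribute; labels are
# wrapped in lists to distinguish them from ordinary integer indexing
print lar.lix[['2010-01-04']]           # one row, selected by label
print lar.lix[['2010-01-04'], ['ibm']]  # a single labeled element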
From jsseabold at gmail.com Tue Jul 6 12:13:31 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Jul 2010 12:13:31 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 11:55 AM, Keith Goodman wrote:
> [...]
> I like the figure!
>
> To do label indexing on a larry you need to use lix, so lar.lix[...]

FYI, if you didn't see it, there are also usage docs in dataarray/doc that you can build with sphinx that show a lot of the thinking and examples (they spent time looking at pandas and larry).

One question that was asked of Wes, that I'd propose to you as well Keith, is: if DataArray became part of NumPy, do you think you could build larry on top of it?

Skipper

From kwgoodman at gmail.com Tue Jul 6 12:23:36 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 6 Jul 2010 09:23:36 -0700 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 9:13 AM, Skipper Seabold wrote:
> [...]
> One question that was asked of Wes, that I'd propose to you as well
> Keith, is: if DataArray became part of NumPy, do you think you could
> build larry on top of it?

This is all very exciting. I did not know that DataArray had ticks so I never took a close look at it.

After reading the sphinx doc, one question I had was how firm is the decision to not allow integer ticks? I use int ticks a lot.

From josh.holbrook at gmail.com Tue Jul 6 12:36:16 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 08:36:16 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

I'm kinda-sorta still getting around to building/reading the sphinx docs for datarray. <_< Like, I've gone through them before, but it was more cursory than I'd like. Honestly, I kinda let myself get caught up in trying to automate the process of getting them onto github pages.

I have to admit that I didn't 100% understand the reasoning behind not allowing integer ticks (I blame jet lag--it's a nice scapegoat). I believe it originally had to do with what you meant if you typed, say, A[3:"london"]; Did you mean the underlying ndarray index 3, or the outer level "tick" 3?
I think if you didn't allow integers, then you could simply wrap your "3" in a string: A["3":"London"] so it's probably not a deal-breaker, but I would imagine that using (a) separate method(s) for label-based indexing may make allowing integer-datatyped labels workable.

Thoughts?

--Josh

On Tue, Jul 6, 2010 at 8:23 AM, Keith Goodman wrote:
> [...]
> This is all very exciting. I did not know that DataArray had ticks so
> I never took a close look at it.
>
> After reading the sphinx doc, one question I had was how firm is the
> decision to not allow integer ticks? I use int ticks a lot.

From jsseabold at gmail.com Tue Jul 6 12:42:19 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 6 Jul 2010 12:42:19 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 12:36 PM, Joshua Holbrook wrote:
> [...]
> Thoughts?

Would you mind bottom-posting/ posting in-line to make the thread easier to follow?

> [...]

I think what Josh said is right. However, we proposed having all of the new labeled axis access pushed to a .aix (or whatever) method, so as to avoid any confusion, as the original object can be accessed just as an ndarray. I'm not sure where this leaves us vis-a-vis ints as ticks.

Skipper

From josh.holbrook at gmail.com Tue Jul 6 12:52:31 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 08:52:31 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 8:42 AM, Skipper Seabold wrote:
> [...]
> Would you mind bottom-posting/ posting in-line to make the thread
> easier to follow?

Sorry re: posting at-top. I guess habit surpassed observation of community norms for a second there. Whups!

My opinion on the matter is that, as a matter of "purity," labels should all have the string datatype. That said, I'd imagine that passing an int as an argument would be fine, due to python's loosey-goosey attitude towards datatypes. :) That, or, y'know, str(myint).

--Josh

From kwgoodman at gmail.com Tue Jul 6 12:56:47 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 6 Jul 2010 09:56:47 -0700 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 9:52 AM, Joshua Holbrook wrote:
> [...]
> My opinion on the matter is that, as a matter of "purity," labels
> should all have the string datatype.

Ideally (for me), the only requirement for ticks would be hashable and unique along any one axis. So, for example, datetime.date() could be a tick but a list could not be a tick (not hashable).

From wesmckinn at gmail.com Tue Jul 6 13:43:38 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Tue, 6 Jul 2010 13:43:38 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 12:56 PM, Keith Goodman wrote:
> [...]
> Ideally (for me), the only requirement for ticks would be hashable and
> unique along any one axis.

Gmail really needs to get its act together and enable bottom-posting by default. Definitely an annoyance.

There are many issues at play here so I wanted to give some of my thoughts re: building pandas, larry, etc. on top of DataArray (or whatever it is that makes its way into NumPy); I can put this on the wiki, too:

1. Giving semantic information to axes (not ticks, though)

I think this is very useful but wouldn't be immediately useful in pandas except perhaps moving axis names elsewhere (which are currently a part of the data structures and always have the same name). I wouldn't be immediately comfortable, say, making a pandas DataFrame a subclass of DataArray and making them implicitly interoperable. Going back and forth e.g. from DataArray and DataFrame *should* be an easy operation-- you could imagine using DataArray to serialize both pandas and larry objects for example!

2. Container for axis metadata (Axis object in datarray, Index in pandas, ...)

I would be more than happy to offload the "ordered set" data structure onto NumPy. In pandas, Index is that container-- it's an ndarray subclass with a handful of methods and a reverse index (e.g. if you have ['d', 'b', 'a', 'c'] you have a dict somewhere with {'d' : 0, 'b' : 1, ...} for O(1) lookups). I'm producing the reverse index in Cython at object creation time-- Keith recently added the same thing (Cython) to larry to get a speed boost, but he does it only when needed. It's also nice to have some other convenience methods in this object, like set operations. In pandas, there is also the DateRange class (subclass of Index, so recognized as valid by the data structures) which has a sequence of Python datetime objects and frequency information. IMHO this should all go inside NumPy and leverage the datetime64 dtype. With date ranges you can also special-case set operations (e.g. union or intersection) when the ranges overlap (in practice this can yield a huge performance boost)! I like using ndarray for the ticks because slicing produces views, etc. (but in the current implementation in pandas slicing requires constructing a new reverse index from scratch). As for the acceptable type for ticks-- I am with Keith in requiring only hashability. So to support integer ticks for completeness DataArray probably needs a separate "access by tick" interface (already mentioned above I believe). I saw criticism on the datarray docs about pandas having ambiguous behavior for integer ticks-- my view is that you have ticks so you don't have to think about "where" things are in the data structure ;) But again datarray is a different story-- ticks not required!

3. Data alignment routines

I think the fundamental data alignment routines in larry and pandas belong in NumPy. We're both creating an integer vector in Cython and passing that to ndarray.take. There is also the issue of missing data handling. We should spend a little time and decide on the API for these functions that will work for both libraries and probably write C implementations. Here's the Cython code I'm referring to (which isn't all that pretty, and makes assumptions guaranteed by other parts of pandas): http://code.google.com/p/pandas/source/browse/trunk/pandas/lib/src/reindex.pyx

4. Group-by routines

Not necessarily related to DataArray but highly relevant to statistical data structures (Skipper made a comment about this at the BoF). Having core group-by routines (see Travis's NEP: http://projects.scipy.org/numpy/browser/trunk/doc/neps/groupby_additions.rst which is not rendering correctly for me, download the RST) makes a lot of sense rather than have all of us implement our own things. Group-by basically comes down to solving two problems: assigning chunks of data to groups (using some kind of mapping or function), and doing something with those group assignments (like aggregating or transforming-- think group means or standardizing / zscoring within group). Using Python dicts to store the group assignments computed by arbitrary functions (the way pandas does it now) is often suboptimal if you want to, say, group one ndarray by another-- I think in most cases we can do a lot better, but it will be important to have a very "general" group-by where performance might be a little slower.

----

In any case-- if we can trim down the amount of duplicated logic between the various libraries, I think that would be a big win overall. I'm not sure if having "one data object to rule them all" is something we can achieve for the moment. pandas has been developed decidedly for statistics, econometrics, and finance, which has led to some slightly domain-specific design choices. I am fairly certain there are a large number of users out there for whom these sorts of tools could be hugely useful in making the switch to Python from R, Matlab, Java, C++, etc.

- Wes
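To make point 3 concrete, here is a toy version of the reverse-index-plus-take idiom Wes describes. This is illustrative only -- not pandas' or larry's actual code, and every name in it is invented:

import numpy as np

def reindex(values, old_labels, new_labels, fill=np.nan):
    # reverse index: label -> integer position, for O(1) lookups
    lookup = dict((lab, i) for i, lab in enumerate(old_labels))
    # integer vector mapping the new label order onto the old one;
    # a single ndarray.take then does all the data movement
    indexer = np.array([lookup.get(lab, -1) for lab in new_labels])
    out = values.take(indexer).astype(float)
    out[indexer == -1] = fill  # labels missing from the source
    return out

vals = np.array([1.0, 2.0, 3.0])
print reindex(vals, ['d', 'b', 'a'], ['a', 'b', 'c'])  # [ 3.  2.  nan]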
From cgohlke at uci.edu Tue Jul 6 13:57:04 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Tue, 06 Jul 2010 10:57:04 -0700 Subject: [Numpy-discussion] numpy on windows 64 bit In-Reply-To: References: Message-ID: <4C336E70.1060105@uci.edu>

On 7/5/2010 4:19 AM, Robin wrote:
> On Mon, Jul 5, 2010 at 12:09 PM, David Cournapeau wrote:
>>
>> Short of saying what those failures are, we can't help you,
>
> Thanks for reply... Somehow my message got truncated - I had written
> more detail about the errors!
>
>>> I noticed that on windows sys.maxint is the 32bit value (2147483647)
>>
>> This is not surprising: sys.maxint gives you the max value of a long,
>> which is 32 bits even on 64 bits on windows.
>
> I just got to figuring this out... But it makes some problems. The
> main one I'm having is that I assume because of this problem array
> shapes are longs instead of ints (ie x.shape[0] is a long).
>
> This breaks np.random.permutation(x.shape[1]) which I use all over the
> place (I opened a ticket for this, #1535). Something I asked in the
> previous mail that got lost is what is the best cross platform way of
> doing this?
> np.random.permutation(int(x.shape[1]))?

I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it work for you?

>
> Actually that and the problems with scipy.sparse (spsolve doesn't
> work) cover all of the errors I'm seeing... (I detailed those in a
> separate mail to the scipy list).
>

-- Christoph

From gael.varoquaux at normalesup.org Tue Jul 6 14:09:33 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 6 Jul 2010 20:09:33 +0200 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: <20100706180933.GB8523@phare.normalesup.org>

Just to give a data point, my research group and I would be very excited at the idea of having Fernando's data arrays in Numpy. We can't offer to maintain it, because we are already fairly involved in machine learning and neuroimaging specific code, but we would be able to rely on it more in our packages, and we love it!

Gaël

On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
> Fernando Perez proposed a NumPy enhancement, an ndarray with named axes,
> prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew
> Brett, Kilian Koepsell and Stefan van der Walt.
> At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather)
> discussion of this proposal.
> The notes from this BOF can be found at:
> [1]http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
> (linked from the Plans section of [2]http://projects.scipy.org/numpy )
> HELP NEEDED: Fernando does not have the resources to drive the project
> beyond this prototype, which already does what he needs. If this is to go
> anywhere, it needs people to do the work. Please step forward.
> References
> Visible links
> 1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes
> 2.
http://projects.scipy.org/numpy
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion

-- Gael Varoquaux Research Fellow, INRIA Laboratoire de Neuro-Imagerie Assistee par Ordinateur NeuroSpin/CEA Saclay, Bat 145, 91191 Gif-sur-Yvette France Phone: ++ 33-1-69-08-78-35 Mobile: ++ 33-6-28-25-64-62 http://gael-varoquaux.info

From robince at gmail.com Tue Jul 6 14:41:28 2010 From: robince at gmail.com (Robin) Date: Tue, 6 Jul 2010 19:41:28 +0100 Subject: [Numpy-discussion] numpy on windows 64 bit In-Reply-To: <4C336E70.1060105@uci.edu> References: <4C336E70.1060105@uci.edu> Message-ID:

On Tue, Jul 6, 2010 at 6:57 PM, Christoph Gohlke wrote:
>
> I proposed a fix at http://projects.scipy.org/numpy/ticket/1535. Does it
> work for you?

Thanks very much... that looks great. Since it works with longs it fixes my problems (I think it will also fix a couple of the failing scipy tests)

Cheers
Robin

From xscript at gmx.net Tue Jul 6 14:51:20 2010 From: xscript at gmx.net (Lluís) Date: Tue, 06 Jul 2010 20:51:20 +0200 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: <864ogcv3yv.wl%lluis@ginnungagap.pc.ac.upc.edu>

> My opinion on the matter is that, as a matter of "purity," labels
> should all have the string datatype. That said, I'd imagine that
> passing an int as an argument would be fine, due to python's
> loosey-goosey attitude towards datatypes. :) That, or, y'know,
> str(myint).

That's kind of what I went for in sciexp2. Integers are maintained to index the structure, and strings are internally translated into the real integers or lists of them (e.g., a filter, see below). All translation into the real integers happens in the Dimension object [1] (an Axis in datarray), which supports all the indexing methods in numpy (slices, iterables, etc), plus what I call filters (i.e., slicing by tick values) [2]

If you download the code, you can see the documentation for the user API in a nicer way with './sciexp2/trunk/plotter -d'. After looking into [3], sciexp2 seems conceptually equivalent to datarray. The main difference I see is that sciexp2 supports "compound" ticks, in the sense that, for me, ticks are formed by a sequence of variables meaningful to the user, which are merged into a single unique string following a user-provided expression:

Dimension.expression <- "@PARAM1@-@PARAM2@"
Dimension.contents <- ["1-z1", "1-z2", "2-z1", "2-z5", ...]

So that the user is able not only to index through tick strings (e.g., data["v1-z1"]), but also to arbitrarily slice the structure according to each of the separate values of each variable (e.g., data[::"PARAM1 <= 3 && PARAM2 == 'z6'"] or any other boolean expression involving either or both of PARAM1 and PARAM2).

The other difference is that the Data object in sciexp2 also uses record arrays (but not recarrays, as the documentation talked about extra costs). The idea is that record fields contain the results of a single experiment, and experiment parameters (one "variable" for each experiment parameter) are arbitrarily mapped into axis/dimensions (thus, the "values" of experiment parameters form the ticks/indexes of that dimension). This allows the user to store heterogeneous results on a single 'Data' object (e.g., mix integers, floats, strings, dates, etc).
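A plain numpy structured array illustrates this kind of heterogeneous per-experiment record (the field names below are invented for the example; this is not sciexp2's actual API):

import numpy as np

# one record per experiment; fields of different types live side by side
results = np.zeros(3, dtype=[('benchmark', 'S8'),
                             ('param1', 'i4'),
                             ('runtime', 'f8')])
results[0] = ('bench1', 1, 9.72)
results[1] = ('bench1', 2, 5.86)
results[2] = ('bench2', 1, 3.33)
# fields play the role of the "valuenames" described above
print results['runtime'].mean()
print results[results['param1'] == 1]['benchmark']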
As a final note, and as there is no formal documentation for the plotter part (only the API documentation), you can quickly test it with './sciexp2/plotter -i' (opens an IPython shell with everything imported). Then, suppose you have various csv files, with a header line describing each column, and path names are 'foo/bar-baz.results':

find_files("@FOO@/@BAR@-@BAZ@.results")
extract(default_source, "csv", count="LINE")
# build a Data with 1 dimension
data = from_rawdata(default_rawdata)
print data.ndim, data.dim().expression
print list(data.dim())
# reshape to multiple dimensions
rdata = data.reshape(["FOO"], ["BAR", "BAZ"], ["LINE"])
print rdata.ndim, rdata.dim(0).expression, rdata.dim(1).expression
print list(rdata.dim(0))
print list(rdata.dim(1))
# now you can start playing with accesses to ticks (as returned by previous
# prints), lists of those, slices or filters (e.g., rdata[::"FOO ==
# 'foo1'"])
# you can also access record fields by means of 'data.name'
# if you put this in a file, simply execute './sciexp2/plotter -f file',
# and at the end: shell()

apa!

Footnotes:
[1] https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L762
[2] https://projects.gso.ac.upc.edu/projects/sciexp2/repository/entry/trunk/sciexp2/data/__init__.py#L561
[3] http://jesusabdullah.github.com/2010/07/02/datarray.html

-- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth

From jjstickel at vcn.com Tue Jul 6 15:25:12 2010 From: jjstickel at vcn.com (Jonathan Stickel) Date: Tue, 06 Jul 2010 13:25:12 -0600 Subject: [Numpy-discussion] reverse cumsum? In-Reply-To: References: Message-ID: <4C338318.7070106@vcn.com>

On 7/6/10 10:42, numpy-discussion-request at scipy.org wrote:
> Date: Tue, 06 Jul 2010 10:02:57 -0400
> From: Alan G Isaac
> Subject: Re: [Numpy-discussion] reverse cumsum?
> To: Discussion of Numerical Python
> Message-ID: <4C333791.5010505 at american.edu>
> Content-Type: text/plain; charset=ISO-8859-1; format=flowed
>
> On 7/6/2010 9:56 AM, Ken Basye wrote:
>> Is there a simple way to get a cumsum in reverse order?
>
>>> x = np.arange(10)
>>> x[::-1].cumsum()[::-1]
> array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])
>
> Is that what you want?
>
> Alan Isaac

Or, you can do:

In [1]: a = np.arange(10)
In [5]: np.sum(a) - np.cumsum(a)
Out[5]: array([45, 44, 42, 39, 35, 30, 24, 17, 9, 0])

Jonathan
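For reference, Jonathan's variant is the reversed cumulative sum shifted by one element; adding x back recovers exactly the output Ken asked for (a small identity, easy to verify by hand):

import numpy as np

x = np.arange(10)
# the sum over x[i:] equals total - cumsum(x)[i] + x[i]
print np.sum(x) - np.cumsum(x) + x
# -> [45 45 44 42 39 35 30 24 17  9], same as x[::-1].cumsum()[::-1]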
From josh.holbrook at gmail.com Tue Jul 6 15:37:02 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 11:37:02 -0800 Subject: [Numpy-discussion] reverse cumsum? In-Reply-To: <4C338318.7070106@vcn.com> References: <4C338318.7070106@vcn.com> Message-ID:

On Tue, Jul 6, 2010 at 11:25 AM, Jonathan Stickel wrote:
> On 7/6/10 10:42, numpy-discussion-request at scipy.org wrote:
>> [...]
>> On 7/6/2010 9:56 AM, Ken Basye wrote:
>>> Is there a simple way to get a cumsum in reverse order?
>>> x = np.arange(10)
>>> x[::-1].cumsum()[::-1]
>> array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])
>>
>> Is that what you want?
>>
>> Alan Isaac
>
> Or, you can do:
>
> In [1]: a = np.arange(10)
> In [5]: np.sum(a) - np.cumsum(a)
> Out[5]: array([45, 44, 42, 39, 35, 30, 24, 17, 9, 0])
>
> Jonathan

Alternately:

In [11]: reversed(np.arange(10).cumsum())
Out[11]:
In [12]: [i for i in _]
Out[12]: [45, 36, 28, 21, 15, 10, 6, 3, 1, 0]

reversed(x) is, as you can see, not an array but an iterator which will return the cumulative sums in reverse. If you need an array specifically, you can convert it fairly easily, though it does need two type conversions:

In [10]: np.array(list(reversed(np.arange(10).cumsum())))
Out[10]: array([45, 36, 28, 21, 15, 10, 6, 3, 1, 0])

or, if you like list comps:

In [14]: np.array([i for i in reversed(np.arange(10).cumsum())])
Out[14]: array([45, 36, 28, 21, 15, 10, 6, 3, 1, 0])

I think Ken's suggestion may be the best so far, but as perl users say, timtowtdi.

--Josh

From aisaac at american.edu Tue Jul 6 18:23:50 2010 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 06 Jul 2010 18:23:50 -0400 Subject: [Numpy-discussion] reverse cumsum? In-Reply-To: References: <4C338318.7070106@vcn.com> Message-ID: <4C33ACF6.8030709@american.edu>

On 7/6/2010 3:37 PM, Joshua Holbrook wrote:
> In [10]: np.array(list(reversed(np.arange(10).cumsum())))
> Out[10]: array([45, 36, 28, 21, 15, 10, 6, 3, 1, 0])

That might appear to match the subject line but does not match the OP's example output, which was [45, 45, 44, 42, 39, 35, 30, 24, 17, 9].

You are giving the equivalent of x.cumsum()[::-1], while the OP asked for the equivalent of x[::-1].cumsum()[::-1].

fwiw, Alan Isaac

From silva at lma.cnrs-mrs.fr Tue Jul 6 18:34:04 2010 From: silva at lma.cnrs-mrs.fr (silva at lma.cnrs-mrs.fr) Date: Wed, 07 Jul 2010 00:34:04 +0200 Subject: [Numpy-discussion] OT? Distutils extension with shared libs In-Reply-To: <1278423567.2570.8.camel@Portable-s2m.cnrs-mrs.fr> References: <1278423567.2570.8.camel@Portable-s2m.cnrs-mrs.fr> Message-ID: <20100707003404.13404t0uozf9jgjk@www.lma.cnrs-mrs.fr>

More precisely, the manufacturer provides C source code to access data and metadata with files ReadIM{7,x}.{c,h}. I wrote a tiny ctypes wrapper in order to have an object-oriented class in python that handles reading the data files written by the manufacturer's software.

One issue is that ReadIM7.h includes zlib.h. On linux, it is easy to install the zlib-dev package. All is simple, as it is installed in a standard directory. On windows, of course, there are no standards. How would you then proceed? Do I have to distribute zlib.h, and also zconf.h, zlib.lib, libz.a and libz.dll.a needed to get it to work? Or is a simpler solution to only distribute binaries on this platform?

Other issue: with all these files in ./src, I have the following configuration:

ext = Extension('_im7',
    sources=['src/ReadIM7.cpp', 'src/ReadIMX.cpp'],
    include_dirs=['src'],
    libraries=['zlib'],
    library_dirs=['src'],
    define_macros=[('_WIN32', None), ('BUILD_DLL', None)],
    extra_compile_args=['-ansi', '-pedantic', '-g', '-v'])

it builds a _im7.pyd file that ctypes is not able to load, as it expects a _im7.dll file with ctypes.cdll.LoadLibrary('_im7')... Is there a way to make distutils build a .dll file that could be loaded by ctypes, or is using distutils in such a way not the right way to go? On linux, there is no trouble: distutils makes a _im7.so file that ctypes can easily load...

Thanks
---------------------------------------------------------------- This message was sent using IMP, the Internet Messaging Program.
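One workaround -- an untested sketch, and whether it works depends on how the extension was linked, since a .pyd is essentially a renamed DLL -- is to hand ctypes the full file name instead of relying on the by-name lookup in ctypes.cdll (load_im7 below is a made-up helper):

import ctypes
import os
import sys

def load_im7(directory):
    # try the platform-specific names the build may have produced;
    # ctypes.CDLL accepts a full path, so the extension doesn't matter
    if sys.platform == 'win32':
        candidates = ['_im7.dll', '_im7.pyd']
    else:
        candidates = ['_im7.so']
    for name in candidates:
        path = os.path.join(directory, name)
        if os.path.exists(path):
            return ctypes.CDLL(path)
    raise OSError("no _im7 shared library found in %r" % directory)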
From josh.holbrook at gmail.com Tue Jul 6 18:54:19 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Tue, 6 Jul 2010 14:54:19 -0800 Subject: [Numpy-discussion] reverse cumsum? In-Reply-To: <4C33ACF6.8030709@american.edu> References: <4C338318.7070106@vcn.com> <4C33ACF6.8030709@american.edu> Message-ID:

On Tue, Jul 6, 2010 at 2:23 PM, Alan G Isaac wrote:
> On 7/6/2010 3:37 PM, Joshua Holbrook wrote:
>> In [10]: np.array(list(reversed(np.arange(10).cumsum())))
>> Out[10]: array([45, 36, 28, 21, 15, 10, 6, 3, 1, 0])
>
> That might appear to match the subject line
> but does not match the OP's example output,
> which was [45, 45, 44, 42, 39, 35, 30, 24, 17, 9].
>
> You are giving the equivalent of x.cumsum()[::-1],
> while the OP asked for the equivalent of x[::-1].cumsum()[::-1].
>
> fwiw,
> Alan Isaac

Oh snap. Good call--idk what I was thinking. Tired, I guess. :) In that case, if you were going to use reversed() things would get a bit nastier:

In [13]: np.array(list(reversed(np.array([9-i for i in xrange(10)]).cumsum())))
Out[13]: array([45, 45, 44, 42, 39, 35, 30, 24, 17, 9])

...which is gross enough that this approach is probably worth abandoning.

> I think Ken's suggestion may be the best so far...

I meant to say Alan's suggestion, i.e. x[::-1].cumsum()[::-1].

From cournape at gmail.com Tue Jul 6 20:44:12 2010 From: cournape at gmail.com (David Cournapeau) Date: Wed, 7 Jul 2010 02:44:12 +0200 Subject: [Numpy-discussion] OT? Distutils extension with shared libs In-Reply-To: <20100707003404.13404t0uozf9jgjk@www.lma.cnrs-mrs.fr> References: <1278423567.2570.8.camel@Portable-s2m.cnrs-mrs.fr> <20100707003404.13404t0uozf9jgjk@www.lma.cnrs-mrs.fr> Message-ID:

On Wed, Jul 7, 2010 at 12:34 AM, wrote:
> More precisely, the manufacturer provides C source code to access data
> and metadata with files ReadIM{7,x}.{c,h}.
> [...]
> One issue is that ReadIM7.h includes zlib.h. On linux, it is easy to
> install the zlib-dev package. All is simple, as it is installed in a
> standard directory. On windows, of course, there are no standards. How
> would you then proceed? Do I have to distribute zlib.h, and also zconf.h,
> zlib.lib, libz.a and libz.dll.a needed to get it to work?

Three solutions:
- ask your users to build the software and install zlib by themselves. On windows, I am afraid it means you concretely limit your userbase to practically 0.
- build zlib as part of the build process, and keep zlib internally.
- include a copy of the zlib library (the binary) in the tarball.

>
> Other issue: with all these files in ./src, I have the following
> configuration:
> [...]
> it builds a _im7.pyd file that ctypes is not able to load, as it
> expects a _im7.dll file with
> ctypes.cdll.LoadLibrary('_im7')...
You cannot build a library loadable with ctypes with distutils nor numpy.distutils. You need to implement it in distutils, or copy the code from one of the projects which implemented it.

cheers, David

From d.l.goldsmith at gmail.com Tue Jul 6 22:03:37 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 6 Jul 2010 19:03:37 -0700 Subject: [Numpy-discussion] effect of shape=None (the default) in format.open_memmap Message-ID:

Hi, I'm trying to wrap my brain around the effect of leaving shape=None (the default) in format.open_memmap. First, I get that it's only even seen if the file is opened in write mode. Then, write_array_header_1_0 is called with dict d as second parameter, w/, as near as I can see, d['shape'] still = None. write_array_header_1_0 is a little opaque to me, but as near as I can tell, shape = None is then written as-is to the file's header. Here's where things get a little worrisome/confusing. Looking ahead, the next function in the source is read_array_header_1_0, in which we see the following comment: "...The keys are strings 'shape' : tuple of int..." Then later in the code we see:

# Sanity-check the values.
if (not isinstance(d['shape'], tuple) or
    not numpy.all([isinstance(x, (int, long)) for x in d['shape']])):
    msg = "shape is not valid: %r"
    raise ValueError(msg % (d['shape'],))

Unless I'm missing something, if shape=None, this ValueError will be raised, correct? So it appears as if the default value for shape in the original function, open_memmap, will produce a header that would ultimately result in a "defective" file, at least as far as read_array_header_1_0 is concerned. A) Am I missing something (e.g., a numpy-wide default substitution for shape if it happens to equal None) that results in this conclusion being incorrect? B) If I am correct, "feature" or "bug"? DG -------------- next part -------------- An HTML attachment was scrubbed... URL:
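For reference, open_memmap behaves as expected when shape is passed explicitly while creating a file -- the question above concerns only the shape=None default in write mode. A quick sketch using the actual numpy.lib.format.open_memmap signature (the file name is arbitrary):

import numpy as np
from numpy.lib import format

# creating a new file: shape is an explicit tuple of ints, so the
# header that gets written passes read_array_header_1_0's sanity check
mm = format.open_memmap('data.npy', mode='w+',
                        dtype=np.float64, shape=(3, 4))
mm[:] = 1.0
del mm  # flush to disk
print np.load('data.npy').shape  # (3, 4)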
From d.l.goldsmith at gmail.com Wed Jul 7 01:33:40 2010 From: d.l.goldsmith at gmail.com (David Goldsmith) Date: Tue, 6 Jul 2010 22:33:40 -0700 Subject: [Numpy-discussion] finfo.eps v. finfo.epsneg Message-ID:

>>> np.finfo('float64').eps # returns a scalar
2.2204460492503131e-16
>>> np.finfo('float64').epsneg # returns an array
array(1.1102230246251565e-16)

Bug or feature? DG -------------- next part -------------- An HTML attachment was scrubbed... URL:

From charlesr.harris at gmail.com Wed Jul 7 09:25:28 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Jul 2010 07:25:28 -0600 Subject: [Numpy-discussion] finfo.eps v. finfo.epsneg In-Reply-To: References: Message-ID:

On Tue, Jul 6, 2010 at 11:33 PM, David Goldsmith wrote:
> >>> np.finfo('float64').eps # returns a scalar
> 2.2204460492503131e-16
> >>> np.finfo('float64').epsneg # returns an array
> array(1.1102230246251565e-16)
>
> Bug or feature?

Looks like a bug.

Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL:

From silva at lma.cnrs-mrs.fr Wed Jul 7 09:23:24 2010 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 07 Jul 2010 10:23:24 -0300 Subject: [Numpy-discussion] OT? Distutils extension with shared libs In-Reply-To: References: <1278423567.2570.8.camel@Portable-s2m.cnrs-mrs.fr> <20100707003404.13404t0uozf9jgjk@www.lma.cnrs-mrs.fr> Message-ID: <1278509004.12170.6.camel@Portable-s2m.cnrs-mrs.fr>

Thanks for your answers.

> Three solutions:
> - ask your users to build the software and install zlib by themselves.
> On windows, I am afraid it means you concretely limit your userbase to
> practically 0.
> - build zlib as part of the build process, and keep zlib internally.
> - include a copy of the zlib library (the binary) in the tarball.

> You cannot build a library loadable with ctypes with distutils nor
> numpy.distutils. You need to implement it in distutils, or copy the
> code from one of the projects which implemented it.

Ok, the simplest may then be to build _im7.dll with make or scons and include it as install_data in the python package... I was astonished that the process (building the shared object called by ctypes) that works on linux does not work on windows! By the way, do you have any example of a project implementing it in distutils?

From bsouthey at gmail.com Wed Jul 7 09:52:37 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 07 Jul 2010 08:52:37 -0500 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: <20100706180933.GB8523@phare.normalesup.org> References: <20100706180933.GB8523@phare.normalesup.org> Message-ID: <4C3486A5.30800@gmail.com>

On 07/06/2010 01:09 PM, Gael Varoquaux wrote:
> Just to give a data point, my research group and I would be very excited
> at the idea of having Fernando's data arrays in Numpy. We can't offer to
> maintain it, because we are already fairly involved in machine learning
> and neuroimaging specific code, but we would be able to rely on it more
> in our packages, and we love it!
>
> Gaël
>
> On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote:
>> [...]

This is very interesting work, especially if it can be used to extend or replace the current record arrays (and perhaps structured arrays). If it can not, then you really need to make a case for yet another data structure. Currently we will have all these unnecessary and incompatible hybrids rather than a single option - competition is not good. I really dislike the current impasse with numpy's Matrix class and do not wish this to happen again. However, I am not saying that you can not create another scikit, rather that there has to be some consideration if it is to go back into numpy/scipy.

As per Wes's reply in this thread, I really do think that a set of specific behaviors that are expected for this new data structure need to be agreed upon. Currently speed should not be an issue until the basic functionality is covered. I think that there are at least the following concerns that people need to agree on:

1) Indexing especially related to slicing and broadcasting.
2) Joining data structures - what to do when all data structures have the same 'metadata' (axes, labels, dtypes) and when each of these differ. Also, do you allow union (so the result is includes all axes, labels etc present all data structures) or intersection (keep only the axes and labels in common) operations? 3) How do you expect basic mathematical operations to work? For example, what does A +1 mean if A has different data types like strings? 4) How should this interact with the rest of numpy? Bruce From dwf at cs.toronto.edu Wed Jul 7 11:40:43 2010 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 7 Jul 2010 11:40:43 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: <4C3486A5.30800@gmail.com> References: <20100706180933.GB8523@phare.normalesup.org> <4C3486A5.30800@gmail.com> Message-ID: <98733BEB-106C-44ED-B9E2-FB4B211C62B3@cs.toronto.edu> On 2010-07-07, at 9:52 AM, Bruce Southey wrote: > This is very interesting work especially if can be used to extend or > replace the current record arrays (and perhaps structured arrays). It's unlikely that this would happen. They serve a different purpose, as far as I can tell. It would be perfectly acceptable to have structured arrays with axis labels, for example. > If it can not then you really need to make a case for yet another data > structure. Currently we will have all these unnecessary and incompatible > hybrids rather than a single option - competition is not good. I really > dislike the current impasse with numpy's Matrix class and do not wish > this to happen again. There was discussion a month or two ago about deprecating matrix-array operations that introduced ambiguity, and I still think that's a reasonable compromise for people who like the matrix class for teaching. David From josh.holbrook at gmail.com Wed Jul 7 11:40:52 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 7 Jul 2010 07:40:52 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: <4C3486A5.30800@gmail.com> References: <20100706180933.GB8523@phare.normalesup.org> <4C3486A5.30800@gmail.com> Message-ID: On Wed, Jul 7, 2010 at 5:52 AM, Bruce Southey wrote: > On 07/06/2010 01:09 PM, Gael Varoquaux wrote: >> Just to give a data point, my research group and I would be very excited >> at the idea of having Fernando's data arrays in Numpy. We can't offer to >> maintain it, because we are already fairly involved in machine learning >> and neuroimaging specific code, but we would be able to rely on it more >> in our packages, and we love it! >> >> Ga?l >> >> On Mon, Jul 05, 2010 at 11:31:02PM -0500, Jonathan March wrote: >> >>> ? ? Fernando Perez proposed a NumPy enhancement, an ndarray with named axes, >>> ? ? prototyped as DataArray by him, Mike Trumpis, Jonathan Taylor, Matthew >>> ? ? Brett, Kilian Koepsell and Stefan van der Walt. >>> >> >>> ? ? At SciPy 2010 on July 1, Fernando convened a BOF (Birds of a Feather) >>> ? ? discussion of this proposal. >>> >> >>> ? ? The notes from this BOF can be found at: >>> ? ? [1]http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes >>> ? ? (linked from the Plans section of [2]http://projects.scipy.org/numpy ) >>> >> >>> ? ? HELP NEEDED: Fernando does not have the resources to drive the project >>> ? ? beyond this prototype, which already does what he needs. If this is to go >>> ? ? anywhere, it needs people to do the work. Please step forward. >>> >> >>> References >>> >> >>> ? ? 
Visible links >>> ? ? 1. http://projects.scipy.org/numpy/wiki/NdarrayWithNamedAxes >>> ? ? 2. http://projects.scipy.org/numpy >>> It's 7:30am, so if I say something crazy bear with me. ;) > This is very interesting work especially if can be used to extend or > replace the current record arrays (and perhaps structured arrays). I don't think record arrays are intended to solve quite the same problem. I think of record arrays as arrays of tuples, whereas datarray&friends are giving labels to axes and indices. In fact, there's really no reason why you couldn't label the axes and indices of a record array. To be honest, though, I haven't really used the record array previously, and tbh I'm eyeing it with some suspicion. If anyone wants to defend the poor defenseless record array, I'm all ears! (Speaking of the matrix: If nobody uses it, why not deprecate it?) > If it can not then you really need to make a case for yet another data > structure. Currently we will have all these unnecessary and incompatible > hybrids rather than a single option - competition is not good. ?I really > dislike the current impasse with numpy's Matrix class and do not wish > this to happen again. Sure. I think the case is pretty easy, though: Look at all the ad-hoc implementations of something like this elsewhere. Just off the top of my head: Larry, pandas, datarray, metaarray (from the cookbook), tabular, and pyDataFrame. There is clearly a lot of demand for something like this. On the other hand, many of these solutions (pandas and tabular in particular) have goals quite beyond just the datatype. For examples, pandas is meant for 2-d and 3-d financial data in particular, and tabular was written to emulate a 2-d spreadsheet. So, clearly, a nice, solid, basic labeled array that's been accepted into what's really *the* de facto standard numerical library for python, is something that a lot of people would appreciate, and many developers have said they would use something like datarray in numpy were it available. > However, I am not saying that you can not create > another scikit rather that there has to be some consideration if if is > to go back into numpy/scipy. I don't think datarray itself would best fit in a scikit, though there are definitely some common manipulations that people would want to do to datarrays which may fit in a scikit better than in numpy (in my head I'm already calling it datarraytools). > As per Wes's reply in this thread, I really do think that a set of > specific behaviors that are expected for this new data structure need to > be agreed upon. Currently speed should not an issue until the basic > functionality is covered. I agree that premature optimization is a bad idea. Best to nail down the features and api first. > I think that there are at least the following > concerns that people need to agree on: > > 1) Indexing especially related to slicing and broadcasting. > 2) Joining data structures - what to do when all data structures have > the same 'metadata' (axes, labels, dtypes) and when each of these > differ. Also, do you allow union (so the result is includes all axes, > labels etc present all data structures) ?or intersection (keep only the > axes and labels in common) operations? > 3) How do you expect basic mathematical operations to work? For example, > what does A +1 mean if A has different data types like strings? > 4) How should this interact with the rest of numpy? Why not allow both unions and intersections? Just make separate functions for them. 
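Something like this toy sketch, perhaps -- purely hypothetical code, just to pin down the two behaviors (neither function exists in any of the packages mentioned):

def join_ticks(ticks_a, ticks_b, how='union'):
    # Combine the tick names of one axis from two labeled arrays:
    # 'union' keeps every tick seen in either array, 'intersection'
    # keeps only the shared ones.
    a, b = set(ticks_a), set(ticks_b)
    merged = (a | b) if how == 'union' else (a & b)
    # Preserve the order of ticks_a, then append any new ticks from ticks_b.
    ordered = [t for t in ticks_a if t in merged]
    ordered += [t for t in ticks_b if t in merged and t not in a]
    return ordered

print join_ticks(['1980', '1981'], ['1981', '1982'], 'union')
# -> ['1980', '1981', '1982']
print join_ticks(['1980', '1981'], ['1981', '1982'], 'intersection')
# -> ['1981']

The real work is then aligning the data (filling NaN for ticks one array lacks), but the tick bookkeeping itself is that simple.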
I think the standard behavior of the datarray, assuming that indices themselves don't get into it, should be very similar to that of the stock ndarray. A possible exception would be when two datarrays have the same axes and ticks but are in a different order, since one would either rearrange one set of axes/ticks, or throw an error.

From xscript at gmx.net Wed Jul 7 11:51:58 2010
From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=)
Date: Wed, 07 Jul 2010 17:51:58 +0200
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: <4C3486A5.30800@gmail.com>
References: <20100706180933.GB8523@phare.normalesup.org> <4C3486A5.30800@gmail.com>
Message-ID: <86zky3thlt.wl%lluis@ginnungagap.pc.ac.upc.edu>

Bruce Southey writes:
> 1) Indexing especially related to slicing and broadcasting.

1.1) Absolute indexing/slicing

    a[0], a['tickvalue']

1.2) Partial slicing

For the case of "compound" ticks, that is, merging multiple ticks into a single one:

    a['subtick1value-subtick2value']        (absolute)
    a[::"subtick1 == 'subtick1value'"]      (partial slicing)

That is, I have a dict in an ndarray subclass for the 'tickvalue' -> int translation, but tick values are built themselves with dicts with one key for every subtick, such that the user can flatten/reshape the ndarray subclass and merge the "subticks" into a single tick on any axis/dimension.

This reshaping operation has some complexities regarding the shape homogeneity of the result, which are handled during the reshape operation in sciexp2. Example:

    # 'a' has three "tick/metadata variables": varA, varB, varC
    # a tick is built as '@varA@-@varB@-@varC@'
    a["a1-b1-c1"] = 1.0
    a["a1-b1-c2"] = 1.0
    a["a2-b1-c1"] = 1.0
    # reshape it into 2 dimensions, the first with '@varA@', the second with
    # '@varB@-@varC@'
    b = a.reshape(['varA'], ['varB', 'varC'])
    # then, 'b' is
    Data([        # a1
          [1.0,   # b1-c1
           1.0]   # b1-c2
         ], [     # a2
          [1.0,   # b1-c1
           nan]   # b1-c2
         ])

> 2) Joining data structures - what to do when all data structures have
> the same 'metadata' (axes, labels, dtypes) and when each of these
> differ. Also, do you allow union (so the result is includes all axes,
> labels etc present all data structures) or intersection (keep only the
> axes and labels in common) operations?

First, I assume two levels of semantics on the structure:

* A set of values, reached by indexing multiple axis/dimensions. I use this for identifying experiments, where the parameters of the experiments are spread among an arbitrary number of dimensions (see above).

* A specific value of a set of values, (in my case) reached by indexing a field in a structured array. I use each structure/record to encapsulate all the various outputs of a single experiment, where structure fields can have arbitrarily different types.

What I allow right now in sciexp2 is the "union" of experiment outputs; that is, the union of structured arrays into a single one. On the "experiment" metadata side, I think operations should fail if metadata differs, unless you want to "append" new experiments (in my case this is appending new tick variable values describing the very same tick variables).

Conceptually, I do this like (def append(self, data)):

0) Check both are describing the same type of experiments (i.e., have the same metadata variables, although different values for them):

    assert self.variables() == data.variables()

1) Flatten the affected arrays (flattening is the inverse operation of the above example, which I have not currently implemented but would be easy to if speed were not a concern):

    flat_self = self.flatten()
    flat_data = data.flatten()

2) Concatenate the two sequences of metadata. Will fail if any repeated elements exist:

    res = np.Data(len(flat_self) + len(flat_data),
                  metadata=flat_self.metadata + flat_data.metadata)
    res[:len(flat_self)] = flat_self
    res[len(flat_self):] = flat_data

3) Reshape 'res' metadata like 'self'. This will take care of placing NaN to homogenize the resulting structure.

4) Return res :)

> 3) How do you expect basic mathematical operations to work? For example,
> what does A +1 mean if A has different data types like strings?

I'd opt for forcing the user to specify which structure fields have to be operated on (of course, assuming these really are structured arrays). But one thing that has been bugging me is how to operate on all fields when the operation is compatible with all fields. For example, calculating the average of all experiment results on a given axis. Right now I have to calculate each of them separately, and then perform a "union" of the resulting structured arrays (which in fact are not structured arrays but plain ndarrays).

> 4) How should this interact with the rest of numpy?

Not sure what you mean. I already maintain metadata through all numpy operations, except when indexing with 'numpy.newaxis', for which right now I return a plain ndarray instead of creating a stub dimension metadata.

BTW, I stuck with the 'dimension' wording instead of 'axis' because of 'numpy.ndarray.ndim'. Maybe this should be unified with the 'axis' argument on numeric operations, in order to use a single wording for the concept.

Read you,
    Lluis

-- 
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth

From xscript at gmx.net Wed Jul 7 13:08:51 2010
From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=)
Date: Wed, 07 Jul 2010 19:08:51 +0200
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: <4C3486A5.30800@gmail.com>
References: <20100706180933.GB8523@phare.normalesup.org> <4C3486A5.30800@gmail.com>
Message-ID: <86y6dnte1o.wl%lluis@ginnungagap.pc.ac.upc.edu>

Bruce Southey writes:
> 4) How should this interact with the rest of numpy?

BTW, now I remembered something I wanted to implement, but it required too much monkeypatching right now. For all functions accepting the 'axis' argument, I'd like to provide a string that uniquely identifies a dimension/axis, instead of an integer. In my case, I'd like to provide the name of any of the variables on the dimension, such that if I have a dimension where metadata ticks are built like 'param1-param2', I'd like to:

    data.mean(axis='param1')

Such that I could calculate the mean along that dimension/axis, whatever it is.

Read you,
    Lluis

-- 
"And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From xscript at gmx.net Wed Jul 7 15:09:23 2010 From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=) Date: Wed, 07 Jul 2010 21:09:23 +0200 Subject: [Numpy-discussion] User-defined metadata in dtype Message-ID: <86sk3vt8gs.wl%lluis@ginnungagap.pc.ac.upc.edu> I've been trying to embed extra information into dtype, such that whether I have structured arrays or not, I can alwas have access to: * A name of the field. This is already present on structured arrays, but lost whenever you access a field. * A description of the field. Tried to build a dtype with 'titles', but there can be no repeated titles, which is not always possible. * Units of the field. By using __array_wrap__ I could always carry along the current units of the information on every field of the structured array (or the units of the array if it's a plain ndarray) with a package like 'units'. My current implementation uses __array_wrap__, __getitem__ and __setitem__ to maintain these fields, but I'm still not sure if that information should better reside inside dtype itself (I've tried to subclass and monkeypatch dtype from python with no success). My intent is to maintain all that information for later retrieval when plotting it into figures. Read you, Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From fperez.net at gmail.com Wed Jul 7 17:42:45 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 7 Jul 2010 14:42:45 -0700 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: Sorry to be super-brief, I've been offline for days and only have a brief window of access for now. Many thanks to Jonathan for the summary! On Tue, Jul 6, 2010 at 9:42 AM, Skipper Seabold wrote: > > > I think what Josh said is right. ?However, we proposed having all of > the new labeled axis access pushed to a .aix (or whatever) method, so > as to avoid any confusion, as the original object can be accessed just > as an ndarray. ?I'm not sure where this leaves us vis-a-vis ints as > ticks. I think we agreed that once we have a separate .whatever[labeled_indexing] accessor so that there is no ambiguity with integer indexing on the main object, then integer ticks would be OK. Multiple people expressed valid use cases for them. Regards, f From cgohlke at uci.edu Thu Jul 8 00:13:05 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Wed, 07 Jul 2010 21:13:05 -0700 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot Message-ID: <4C355051.2000902@uci.edu> Dear NumPy developers, I am trying to solve some scipy.sparse TypeError failures reported in [1] and reduced them to the following example: >>> import numpy >>> a = numpy.array([[1]]) >>> numpy.dot(a.astype('single'), a.astype('longdouble')) array([[1.0]], dtype=float64) >>> numpy.dot(a.astype('double'), a.astype('longdouble')) Traceback (most recent call last): File "", line 1, in TypeError: array cannot be safely cast to required type Is this exception expected? Also I noticed this: >>> numpy.array([1]).astype('longdouble').dtype.num 13 >>> numpy.array([1.0]).astype('longdouble').dtype.num 12 I am using Python 2.6.5 for Windows and numpy 1.4.1 compiled with msvc9, where sizeof(longdouble) == sizeof(double). 
[1] http://aspn.activestate.com/ASPN/Mail/Message/scipy-user/3875416 -- Christoph From cournape at gmail.com Thu Jul 8 00:25:08 2010 From: cournape at gmail.com (David Cournapeau) Date: Thu, 8 Jul 2010 06:25:08 +0200 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: <4C355051.2000902@uci.edu> References: <4C355051.2000902@uci.edu> Message-ID: On Thu, Jul 8, 2010 at 6:13 AM, Christoph Gohlke wrote: > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > >>>> import numpy >>>> a = numpy.array([[1]]) > >>>> numpy.dot(a.astype('single'), a.astype('longdouble')) > array([[1.0]], dtype=float64) > >>>> numpy.dot(a.astype('double'), a.astype('longdouble')) > Traceback (most recent call last): > ? File "", line 1, in > TypeError: array cannot be safely cast to required type > > > Is this exception expected? No, I don't think so. The error seems to be platform specific - I have the expected result on my macbook. > > Also I noticed this: > >>>> numpy.array([1]).astype('longdouble').dtype.num > 13 >>>> numpy.array([1.0]).astype('longdouble').dtype.num > 12 This is unexpected. There maybe some untested/buggy codepaths for the windows case (where sizeof(double) == sizeof(long double)). I will try to look into it, but please post an issue on trac so that it does not get lost, David From charlesr.harris at gmail.com Thu Jul 8 00:43:34 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Jul 2010 22:43:34 -0600 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: <4C355051.2000902@uci.edu> References: <4C355051.2000902@uci.edu> Message-ID: On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke wrote: > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > > >>> import numpy > >>> a = numpy.array([[1]]) > > >>> numpy.dot(a.astype('single'), a.astype('longdouble')) > array([[1.0]], dtype=float64) > > >>> numpy.dot(a.astype('double'), a.astype('longdouble')) > Traceback (most recent call last): > File "", line 1, in > TypeError: array cannot be safely cast to required type > > > Is this exception expected? > > I think not. On some platforms longdouble is the same as double, on others it is extended precision or quad precision. On your platform this looks like a bug, on my platform it would be correct except there is a fallback version of dot that works with extended precision. Is there a mix of compilers here, or is it msvc all the way down. In [5]: a = array([[1]]) In [6]: dot(a.astype('single'), a.astype('longdouble')) Out[6]: array([[1.0]], dtype=float128) Also I noticed this: > > >>> numpy.array([1]).astype('longdouble').dtype.num > 13 > >>> numpy.array([1.0]).astype('longdouble').dtype.num > 12 > > Yeah, that is probably correct in a strange sort of way since the two types are identical under the hood. On ubuntu I get In [1]: array([1]).astype('longdouble').dtype.num Out[1]: 13 In [2]: array([1.]).astype('longdouble').dtype.num Out[2]: 13 Type numbers aren't a good way to determine precision in a platform independent way. > I am using Python 2.6.5 for Windows and numpy 1.4.1 compiled with msvc9, > where sizeof(longdouble) == sizeof(double). > > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Thu Jul 8 00:59:32 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 7 Jul 2010 22:59:32 -0600 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: <4C355051.2000902@uci.edu> References: <4C355051.2000902@uci.edu> Message-ID: On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke wrote: > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > > >>> import numpy > >>> a = numpy.array([[1]]) > > >>> numpy.dot(a.astype('single'), a.astype('longdouble')) > array([[1.0]], dtype=float64) > > >>> numpy.dot(a.astype('double'), a.astype('longdouble')) > Traceback (most recent call last): > File "", line 1, in > TypeError: array cannot be safely cast to required type > > Just for laughs, what happens if you reverse the order of the arguments? Type promotion in numpy is not always symmetric. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rspeer at MIT.EDU Thu Jul 8 02:25:29 2010 From: rspeer at MIT.EDU (Rob Speer) Date: Thu, 8 Jul 2010 02:25:29 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: Glad I finally found this discussion. I implemented some of the ideas from the SciPy BOAF discussion, and Joshua has already merged them into his datarray on GitHub (thanks, Joshua, for being so fast on the merge button). To introduce these changes, here's a couple of examples of how you could index into a matrix whose rows represent countries, and whose columns represent something that is observed every four years (hmm...). >>> arr.country.named('Netherlands').year.named(2010) >>> arr.country.named('Spain').year.named(slice(1994, 2010)) >>> arr.year.named(2006).country[0:2] First of all, a bit of terminology. Axes can have labels. Ticks (which are particular rows, columns, etc.) can have names. Axes and ticks also have indices (the sequential numbers they've always had). Feel free to suggest alternate terminology, I just used what sounded the most natural to me in the method names. Addressing by indices and addressing by tick names are separate, which allows integers to be tick names without a conflict. You use the "named" method of an axis to address it by name, while __getitem__ only addresses it by indices. You can still take slices of names (makes sense for things like years), but you have to spell out "slice" because it's not inside square brackets. Then, at the axis level: My impression from the SciPy discussion was that people wanted to be able to look up multiple labeled axes at once without repeating themselves, and .aix and stuples were not satisfying, but we didn't come up with anything else during the discussion. My choice was to add a bit of attribute magic: if you get an attribute of a datarray that is (a) not a real attribute and (b) matches the label of one of its axes, you'll get that axis. So "arr.axis.country" can be shortened to "arr.country", for example, but if you decided to name your axis "T", you would be stuck with "arr.axis.T". So this is the state of the code at http://github.com/rspeer/datarray (and also at http://github.com/jesusabdullah/datarray now). I'll even try to make the documentation catch up with this code if people think the changes are good. 
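For anyone who'd rather not read the diff: the attribute magic boils down to a __getattr__ fallback. A simplified sketch -- illustrative only, not the exact datarray code (it assumes each entry of self.axes carries a .label):

import numpy as np

class LabeledArray(np.ndarray):
    # Toy stand-in for a datarray.
    def __getattr__(self, name):
        # Only reached when normal attribute lookup fails -- condition (a).
        for ax in self.__dict__.get('axes', ()):
            if ax.label == name:   # condition (b): name matches an axis label
                return ax
        raise AttributeError(name)

So arr.country resolves whenever 'country' labels an axis, while anything that collides with a real attribute (like 'T') still resolves normally and has to be reached through arr.axis.T.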
-- Rob From josh.holbrook at gmail.com Thu Jul 8 03:07:24 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Wed, 7 Jul 2010 23:07:24 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: On Wed, Jul 7, 2010 at 10:25 PM, Rob Speer wrote: > Glad I finally found this discussion. > > I implemented some of the ideas from the SciPy BOAF discussion, and > Joshua has already merged them into his datarray on GitHub (thanks, > Joshua, for being so fast on the merge button). > > To introduce these changes, here's a couple of examples of how you > could index into a matrix whose rows represent countries, and whose > columns represent something that is observed every four years > (hmm...). >>>> arr.country.named('Netherlands').year.named(2010) >>>> arr.country.named('Spain').year.named(slice(1994, 2010)) >>>> arr.year.named(2006).country[0:2] > > First of all, a bit of terminology. Axes can have labels. Ticks (which > are particular rows, columns, etc.) can have names. Axes and ticks > also have indices (the sequential numbers they've always had). Feel > free to suggest alternate terminology, I just used what sounded the > most natural to me in the method names. > > Addressing by indices and addressing by tick names are separate, which > allows integers to be tick names without a conflict. You use the > "named" method of an axis to address it by name, while __getitem__ > only addresses it by indices. You can still take slices of names > (makes sense for things like years), but you have to spell out "slice" > because it's not inside square brackets. > > Then, at the axis level: My impression from the SciPy discussion was > that people wanted to be able to look up multiple labeled axes at once > without repeating themselves, and .aix and stuples were not > satisfying, but we didn't come up with anything else during the > discussion. > > My choice was to add a bit of attribute magic: if you get an attribute > of a datarray that is (a) not a real attribute and (b) matches the > label of one of its axes, you'll get that axis. So "arr.axis.country" > can be shortened to "arr.country", for example, but if you decided to > name your axis "T", you would be stuck with "arr.axis.T". > > So this is the state of the code at http://github.com/rspeer/datarray > (and also at http://github.com/jesusabdullah/datarray now). I'll even > try to make the documentation catch up with this code if people think > the changes are good. > -- Rob > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > While I haven't had a chance to really look in-depth at the changes myself (I'm a busy man! So many mailing lists!), I so far like the look and sound of them. That's just my opinion, though. While on the subject of docs: The current sphinx docs look like they got a bit jumbled somewhere along the way. I don't really know my sphinxes (or restructuredtexts) yet, but these docs are definitely something I'd like to get in-order. 
--Josh --Josh From cgohlke at uci.edu Thu Jul 8 03:24:28 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 08 Jul 2010 00:24:28 -0700 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: References: <4C355051.2000902@uci.edu> Message-ID: <4C357D2C.5000302@uci.edu> On 7/7/2010 9:43 PM, Charles R Harris wrote: > > > On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke > wrote: > > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > > > >> import numpy > > >> a = numpy.array([[1]]) > > > >> numpy.dot(a.astype('single'), a.astype('longdouble')) > array([[1.0]], dtype=float64) > > > >> numpy.dot(a.astype('double'), a.astype('longdouble')) > Traceback (most recent call last): > File "", line 1, in > TypeError: array cannot be safely cast to required type > > > Is this exception expected? > > > I think not. On some platforms longdouble is the same as double, on > others it is extended precision or quad precision. On your platform this > looks like a bug, on my platform it would be correct except there is a > fallback version of dot that works with extended precision. Is there a > mix of compilers here, or is it msvc all the way down. Yes, msvc9 all the way down. It fails no matter whether I build with setup.py or setupscons.py. Using mingw build gives the expected results. > > In [5]: a = array([[1]]) > > In [6]: dot(a.astype('single'), a.astype('longdouble')) > Out[6]: array([[1.0]], dtype=float128) > > > Also I noticed this: > > > >> numpy.array([1]).astype('longdouble').dtype.num > 13 > > >> numpy.array([1.0]).astype('longdouble').dtype.num > 12 > > > Yeah, that is probably correct in a strange sort of way since the two > types are identical under the hood. On ubuntu I get > > In [1]: array([1]).astype('longdouble').dtype.num > Out[1]: 13 > > In [2]: array([1.]).astype('longdouble').dtype.num > Out[2]: 13 > > Type numbers aren't a good way to determine precision in a platform > independent way. I should have mentioned that the following example works for me: >>> a = numpy.array([[1.0]]) >>> numpy.dot(a.astype('double'), a.astype('longdouble')) array([[ 1.]]) -- Christoph From cgohlke at uci.edu Thu Jul 8 03:27:55 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 08 Jul 2010 00:27:55 -0700 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: References: <4C355051.2000902@uci.edu> Message-ID: <4C357DFB.5030507@uci.edu> On 7/7/2010 9:59 PM, Charles R Harris wrote: > > > On Wed, Jul 7, 2010 at 10:13 PM, Christoph Gohlke > wrote: > > Dear NumPy developers, > > I am trying to solve some scipy.sparse TypeError failures reported in > [1] and reduced them to the following example: > > > > >> import numpy > > >> a = numpy.array([[1]]) > > > >> numpy.dot(a.astype('single'), a.astype('longdouble')) > array([[1.0]], dtype=float64) > > > >> numpy.dot(a.astype('double'), a.astype('longdouble')) > Traceback (most recent call last): > File "", line 1, in > TypeError: array cannot be safely cast to required type > > > Just for laughs, what happens if you reverse the order of the arguments? > Type promotion in numpy is not always symmetric. 
> This works as expected: >>> numpy.dot(a.astype('longdouble'), a.astype('double')) array([[1.0]], dtype=float64) -- Christoph From cgohlke at uci.edu Thu Jul 8 04:17:34 2010 From: cgohlke at uci.edu (Christoph Gohlke) Date: Thu, 08 Jul 2010 01:17:34 -0700 Subject: [Numpy-discussion] TypeError when using double , longdouble in numpy.dot In-Reply-To: References: <4C355051.2000902@uci.edu> Message-ID: <4C35899E.7040402@uci.edu> On 7/7/2010 9:25 PM, David Cournapeau wrote: > On Thu, Jul 8, 2010 at 6:13 AM, Christoph Gohlke wrote: >> Dear NumPy developers, >> >> I am trying to solve some scipy.sparse TypeError failures reported in >> [1] and reduced them to the following example: >> >> >>>>> import numpy >>>>> a = numpy.array([[1]]) >> >>>>> numpy.dot(a.astype('single'), a.astype('longdouble')) >> array([[1.0]], dtype=float64) >> >>>>> numpy.dot(a.astype('double'), a.astype('longdouble')) >> Traceback (most recent call last): >> File "", line 1, in >> TypeError: array cannot be safely cast to required type >> >> >> Is this exception expected? > > No, I don't think so. The error seems to be platform specific - I have > the expected result on my macbook. > >> >> Also I noticed this: >> >>>>> numpy.array([1]).astype('longdouble').dtype.num >> 13 >>>>> numpy.array([1.0]).astype('longdouble').dtype.num >> 12 > > This is unexpected. There maybe some untested/buggy codepaths for the > windows case (where sizeof(double) == sizeof(long double)). I will try > to look into it, but please post an issue on trac so that it does not > get lost, > Thank you. I opened a ticket: http://projects.scipy.org/numpy/ticket/1539 -- Christoph From xscript at gmx.net Thu Jul 8 07:13:57 2010 From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=) Date: Thu, 08 Jul 2010 13:13:57 +0200 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> Rob Speer writes: >>>> arr.country.named('Netherlands').year.named(2010) >>>> arr.country.named('Spain').year.named(slice(1994, 2010)) >>>> arr.year.named(2006).country[0:2] This looks too verbose to me. As axis always have a total order, I'd go for the most compact representation (assuming 'country' is the first axis, and 'year' the second one): arr['Netherlands','2010'] arr['Spain','1994':'2010'] arr[0:2,'2006'] This is my current implementation, which also allows for slices with mixed integers and names everywhere. I understand this might not be the desired default behaviour, as requires looking into the types of every item in '__getitem__', and this might be a performance issue (although my current implementation tries to optimize for the case of integer indexes). Thus, we can use something in the middle: arr[0,1] arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' arr.country['Spain'].year[1994:2010] The default '__getitem__' still has full speed, but accessing the 'named' attribute allows for accessing on the lines of my previous example, while still allowing the access through axis name without requiring an explicit 'slice'. Although this is not my preferred syntax, I think it is a good compromise, and I could always subclass this to redirect the default '__getitem__' into 'names.__getitem__'. Btw, I store the names to index translations on an ordered dict (indexed by name), such that I can also provide an 'arr.iteritems' method that returns tuples with 'name/tick' and the array contents of that index. 
In the above syntax, this would probably be 'arr..iteritems'. Another feature I like is being able to translate back and forth from names/ticks to integers, which I do through my 'Dimension.__getitem__' method (Dimension is the equivalent of datarray's 'Axis'). PS: I also have a separation between axis and their naming, meaning that I can have a single axis with both 'country' and 'year', such that I would index with 'Netherlands-2010' (other examples do make more sense), but still be able to access them separately (this reduces the size of the full ndarray, as there is no need for so many NaNs to make the ndarray homoheneus on size, and it brings the ndarray closer to the structuring of data on the mind of the user). Read you, Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From hannes.bretschneider at wiwi.hu-berlin.de Thu Jul 8 09:26:03 2010 From: hannes.bretschneider at wiwi.hu-berlin.de (Hannes Bretschneider) Date: Thu, 8 Jul 2010 13:26:03 +0000 (UTC) Subject: [Numpy-discussion] Memory usage of numpy-arrays Message-ID: Dear NumPy developers, I have to process some big data files with high-frequency financial data. I am trying to load a delimited text file having ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The machine is a Debian Lenny server 32bit with 3GB of memory. Since the file is just 700MB I am naively assuming that it should fit into memory in whole. However, when I attempt to load it, python fills the entire available memory and then fails with Traceback (most recent call last): File "", line 1, in File "/usr/local/lib/python2.6/site-packages/numpy/lib/io.py", line 1318, in genfromtxt errmsg = "\n".join(errmsg) MemoryError Is there a way to load this file without crashing? Thanks, Hannes From wesmckinn at gmail.com Thu Jul 8 09:52:59 2010 From: wesmckinn at gmail.com (Wes McKinney) Date: Thu, 8 Jul 2010 09:52:59 -0400 Subject: [Numpy-discussion] Memory usage of numpy-arrays In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider wrote: > Dear NumPy developers, > > I have to process some big data files with high-frequency > financial data. I am trying to load a delimited text file having > ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The > machine is a Debian Lenny server 32bit with 3GB of memory. ?Since > the file is just 700MB I am naively assuming that it should fit > into memory in whole. However, when I attempt to load it, python > fills the entire available memory and then fails with > > > Traceback (most recent call last): > ?File "", line 1, in > ?File "/usr/local/lib/python2.6/site-packages/numpy/lib/io.py", line 1318, in genfromtxt > ? ?errmsg = "\n".join(errmsg) > MemoryError > > > Is there a way to load this file without crashing? > > Thanks, Hannes > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > >From my experience I might suggest using PyTables (HDF5) as intermediate storage for the data which can be populated iteratively (you'll have to parse the data yourself, marking missing data could be a problem). This of course requires that you know the column schema ahead of time which is one thing that np.genfromtxt will handle automatically. 
Particularly if you have a large static data set this can be worthwhile, as reading the data out of HDF5 will be many times faster than parsing the text file.

I believe you can also append rows to the PyTables Table structure in chunks, which would be faster than appending one row at a time.

hth,
Wes

From bsouthey at gmail.com Thu Jul 8 10:46:17 2010
From: bsouthey at gmail.com (Bruce Southey)
Date: Thu, 08 Jul 2010 09:46:17 -0500
Subject: [Numpy-discussion] Memory usage of numpy-arrays
In-Reply-To: 
References: 
Message-ID: <4C35E4B9.5000505@gmail.com>

On 07/08/2010 08:52 AM, Wes McKinney wrote:
> On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider
> wrote:
>> Dear NumPy developers,
>>
>> I have to process some big data files with high-frequency
>> financial data. I am trying to load a delimited text file having
>> ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The
>> machine is a Debian Lenny server 32bit with 3GB of memory. Since
>> the file is just 700MB I am naively assuming that it should fit
>> into memory in whole. However, when I attempt to load it, python
>> fills the entire available memory and then fails with
>>
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in <module>
>>   File "/usr/local/lib/python2.6/site-packages/numpy/lib/io.py", line 1318, in genfromtxt
>>     errmsg = "\n".join(errmsg)
>> MemoryError
>>
>> Is there a way to load this file without crashing?
>>
>> Thanks, Hannes
>>
>> _______________________________________________
>> NumPy-Discussion mailing list
>> NumPy-Discussion at scipy.org
>> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>>
> From my experience I might suggest using PyTables (HDF5) as
> intermediate storage for the data which can be populated iteratively
> (you'll have to parse the data yourself, marking missing data could be
> a problem). This of course requires that you know the column schema
> ahead of time which is one thing that np.genfromtxt will handle
> automatically. Particularly if you have a large static data set this
> can be worthwhile as reading the data out of HDF5 will be many times
> faster than parsing the text file.
>
> I believe you can also append rows to the PyTables Table structure in
> chunks which would be faster than appending one row at a time.
>
> hth,
> Wes
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

There have been past discussions on this. Numpy needs contiguous memory, so you are running out of memory because loading the original data and building the numpy array together exhaust your available contiguous memory. Note that a file of ~700 MB does not translate into ~700 MB of memory, since it depends on the dtypes. Also a system with 3GB of memory probably has about 1.5GB of free memory available (you might get closer to 2GB if you have a very lean system).

If you know your data then you have to do all the hard work yourself to minimize memory usage, or use something like hdf5 or PyTables.
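For example, here is a rough sketch of the chunked PyTables approach Wes described. The file name and the two-column schema are made up -- adapt them to the real data (this uses the PyTables 2.x API):

import tables

# Invented schema; a real one must match the columns of the text file.
class Tick(tables.IsDescription):
    price  = tables.Float64Col(pos=0)
    volume = tables.Int64Col(pos=1)

h5 = tables.openFile('ticks.h5', mode='w')
table = h5.createTable('/', 'ticks', Tick)

chunk = []
for line in open('ticks.txt'):
    fields = line.split(',')
    chunk.append((float(fields[0]), int(fields[1])))
    if len(chunk) >= 100000:   # append in chunks, not one row at a time
        table.append(chunk)
        chunk = []
if chunk:
    table.append(chunk)
table.flush()
h5.close()

Memory use stays bounded by the chunk size, and the resulting HDF5 table can later be read back, in whole or in slices, far faster than re-parsing the text.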
Bruce From josh.holbrook at gmail.com Thu Jul 8 10:54:58 2010 From: josh.holbrook at gmail.com (Joshua Holbrook) Date: Thu, 8 Jul 2010 06:54:58 -0800 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> Message-ID: On Thu, Jul 8, 2010 at 3:13 AM, Llu?s wrote: > Rob Speer writes: > >>>>> arr.country.named('Netherlands').year.named(2010) >>>>> arr.country.named('Spain').year.named(slice(1994, 2010)) >>>>> arr.year.named(2006).country[0:2] > > This looks too verbose to me. > > As axis always have a total order, I'd go for the most compact representation > (assuming 'country' is the first axis, and 'year' the second one): > > ? arr['Netherlands','2010'] > ? arr['Spain','1994':'2010'] > ? arr[0:2,'2006'] > > This is my current implementation, which also allows for slices with mixed > integers and names everywhere. > > I understand this might not be the desired default behaviour, as requires > looking into the types of every item in '__getitem__', and this might be a > performance issue (although my current implementation tries to optimize for the > case of integer indexes). > > Thus, we can use something in the middle: > > ? arr[0,1] > ? arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' > ? arr.country['Spain'].year[1994:2010] > > The default '__getitem__' still has full speed, but accessing the 'named' > attribute allows for accessing on the lines of my previous example, while still > allowing the access through axis name without requiring an explicit 'slice'. > > Although this is not my preferred syntax, I think it is a good compromise, and I > could always subclass this to redirect the default '__getitem__' into > 'names.__getitem__'. > > Btw, I store the names to index translations on an ordered dict (indexed by > name), such that I can also provide an 'arr.iteritems' method that returns > tuples with 'name/tick' and the array contents of that index. In the above > syntax, this would probably be 'arr..iteritems'. > > Another feature I like is being able to translate back and forth from > names/ticks to integers, which I do through my 'Dimension.__getitem__' method > (Dimension is the equivalent of datarray's 'Axis'). > > PS: I also have a separation between axis and their naming, meaning that I can > have a single axis with both 'country' and 'year', such that I would index with > 'Netherlands-2010' (other examples do make more sense), but still be able to > access them separately (this reduces the size of the full ndarray, as there is > no need for so many NaNs to make the ndarray homoheneus on size, and it brings > the ndarray closer to the structuring of data on the mind of the user). > > Read you, > ? ? Lluis > > -- > ?"And it's much the same thing with knowledge, for whenever you learn > ?something new, the whole world becomes that much richer." > ?-- The Princess of Pure Reason, as told by Norton Juster in The Phantom > ?Tollbooth > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > arr['Netherlands','2010'] Isn't this the __getitem___ action we were trying to avoid? 
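To make the overhead concern concrete, a toy timing sketch (hypothetical code, not from any of the packages discussed):

import timeit

setup = """
import numpy as np
a = np.arange(1000)
def checked_getitem(arr, key):
    # stand-in for a __getitem__ that must inspect its key
    if isinstance(key, str):
        raise KeyError(key)  # name lookup would go here
    return arr[key]
"""
print timeit.timeit('a[500]', setup, number=100000)
print timeit.timeit('checked_getitem(a, 500)', setup, number=100000)

Even the no-op isinstance check adds a function call and a type test to every plain integer access, which is the cost that pushing name lookups into a separate accessor avoids.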
--Josh From rspeer at MIT.EDU Thu Jul 8 11:49:14 2010 From: rspeer at MIT.EDU (Rob Speer) Date: Thu, 8 Jul 2010 11:49:14 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> Message-ID: On Thu, Jul 8, 2010 at 7:13 AM, Llu?s wrote: > Thus, we can use something in the middle: > > ? arr[0,1] > ? arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' Ah ha. So this is the case with positional axes but named ticks, which we haven't really brought up yet. I'm definitely thinking of making the top-level datarray support "named" as well, which would make it into: >>> arr.named('Netherlands', 2010) But the other change you've got here is to make "named" into a __getitem__-able object instead of a method, so you use square brackets with it and can use slice syntax. I could do it this way as well. But I don't understand your second example: > ? arr.country['Spain'].year[1994:2010] That seems to run straight into the index/name ambiguity. Shouldn't that take the 1994th through 2010th indices along the "year" axis? Not every axis will have names, so you can't make *all* the indexing go by names. If named were a getitem-able object, that would be: >>> arr.country.named['Spain'].year.named[1994:2010] -- Rob From xscript at gmx.net Thu Jul 8 11:55:00 2010 From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=) Date: Thu, 08 Jul 2010 17:55:00 +0200 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> Message-ID: <86pqyyt1d7.wl%lluis@ginnungagap.pc.ac.upc.edu> Joshua Holbrook writes: > On Thu, Jul 8, 2010 at 3:13 AM, Llu?s wrote: >> Rob Speer writes: >> >>>>>> arr.country.named('Netherlands').year.named(2010) >>>>>> arr.country.named('Spain').year.named(slice(1994, 2010)) >>>>>> arr.year.named(2006).country[0:2] >> >> This looks too verbose to me. >> >> As axis always have a total order, I'd go for the most compact representation >> (assuming 'country' is the first axis, and 'year' the second one): >> >> ? arr['Netherlands','2010'] >> ? arr['Spain','1994':'2010'] >> ? arr[0:2,'2006'] >> [...] >> >> Thus, we can use something in the middle: >> >> ? arr[0,1] >> ? arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' >> ? arr.country['Spain'].year[1994:2010] >> [...] >> arr['Netherlands','2010'] > Isn't this the __getitem___ action we were trying to avoid? Sorry but I hooked into the whole naming discussion just now, so I'm not aware of much previous discussions except for this thread. What I assumed is that 'arr[...]' is not a desired syntax because of a possible performance loss. That's why I think 'arr.names[...]' might be a good compromise. Use 'arr[]' for the standard integer-based indexing, and 'arr.names[]' for the fancy mixed integer+string indexing. My opinion is that no integer name/tick must be allowed (thus the above example would be arr.names['Netherlands','2010']), such that the user is able to mix "real" indexes with names. Whether this mix makes any sense or not, is something that I'm not sure about, but I'd try to eliminate "unnecessary" typing as much as possible. Read you, Lluis -- "And it's much the same thing with knowledge, for whenever you learn something new, the whole world becomes that much richer." 
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth From seb.haase at gmail.com Thu Jul 8 12:01:57 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Thu, 8 Jul 2010 18:01:57 +0200 Subject: [Numpy-discussion] Memory usage of numpy-arrays In-Reply-To: <4C35E4B9.5000505@gmail.com> References: <4C35E4B9.5000505@gmail.com> Message-ID: On Thu, Jul 8, 2010 at 4:46 PM, Bruce Southey wrote: > On 07/08/2010 08:52 AM, Wes McKinney wrote: >> On Thu, Jul 8, 2010 at 9:26 AM, Hannes Bretschneider >> ?wrote: >> >>> Dear NumPy developers, >>> >>> I have to process some big data files with high-frequency >>> financial data. I am trying to load a delimited text file having >>> ~700 MB with ~ 10 million lines using numpy.genfromtxt(). The >>> machine is a Debian Lenny server 32bit with 3GB of memory. ?Since >>> the file is just 700MB I am naively assuming that it should fit >>> into memory in whole. However, when I attempt to load it, python >>> fills the entire available memory and then fails with >>> >>> >>> Traceback (most recent call last): >>> ? File "", line 1, in >>> ? File "/usr/local/lib/python2.6/site-packages/numpy/lib/io.py", line 1318, in genfromtxt >>> ? ? errmsg = "\n".join(errmsg) >>> MemoryError >>> >>> >>> Is there a way to load this file without crashing? >>> >>> Thanks, Hannes >>> >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > From my experience I might suggest using PyTables (HDF5) as >> intermediate storage for the data which can be populated iteratively >> (you'll have to parse the data yourself, marking missing data could be >> a problem). This of course requires that you know the column schema >> ahead of time which is one thing that np.genfromtxt will handle >> automatically. Particularly if you have a large static data set this >> can be worthwhile as reading the data out of HDF5 will be many times >> faster than parsing the text file. >> >> I believe you can also append rows to the PyTables Table structure in >> chunks which would be faster than appending one row at a time. >> >> hth, >> Wes >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > There have been past discussions on this. Numpy needs contiguous memory > so you are running out of memory because as loading the original data > and the numpy array will exhaust your available contiguous memory. Note > that a file of ~700 MB does not translate into ~700 MB of memory since > it depends on the dtypes. Also a system with 3GB of memory probably has > about 1.5GB of free memory available (you might get closer to 2GB if you > have a very lean system). > > If you know your data then you have do all the hard work yourself to > minimize memory usage or use something like hdf5 or PyTables. > > Bruce > I would expect a 700MB text file translate into less than 200MB of data - assuming that you are talking about decimal numbers (maybe total of 10 digits each + spaces) and saving as float32 binary. So the problem would "only" be the loading in - rather, going through - all lines of text from start to end without choking. This might be better done "by hand", i.e. 
in standard (non numpy) python: nums = [] for line in file("myTextFile.txt"): fields = line.split() nums.extend (map(float, fields)) The last line converts to python-floats which is float64. Using lists adds extra bytes behind the scenes. So, one would have to read in in blocks and blockwise convert to float32 numpy arrays. There is not much more to say unless we know more about the format of the text file. Regards, Sebastian Haase From rspeer at MIT.EDU Thu Jul 8 12:02:52 2010 From: rspeer at MIT.EDU (Rob Speer) Date: Thu, 8 Jul 2010 12:02:52 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: > While I haven't had a chance to really look in-depth at the changes > myself (I'm a busy man! So many mailing lists!), I so far like the > look and sound of them. That's just my opinion, though. If people are okay with the attribute magic, I have a proposal for more of it. In my own project where I use labeled arrays (http://github.com/commonsense/divisi2), I don't have labeled axes. But I assumed everything was 1 or 2-D, and gave the 2-D matrices methods like "row_named", "col_named", etc., to encourage readable code. With the current implementation of datarray, I could get that by labeling the axes "row" and "col", except the moment you transpose a matrix like that you get rows named "col" and columns named "row", so that's not the right answer. My proposal is that datarray.row should be equivalent to datarray.axes[0], and datarray.column should be equivalent to datarray.axes[1], so that you can always ask for something like "arr.column.named(2010)" (replace those with square brackets if you like). Not sure yet what the right way is to generalize this to 1-D and n-D. -- Rob From jsseabold at gmail.com Thu Jul 8 12:39:22 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Thu, 8 Jul 2010 12:39:22 -0400 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: Message-ID: On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote: >> While I haven't had a chance to really look in-depth at the changes >> myself (I'm a busy man! So many mailing lists!), I so far like the >> look and sound of them. That's just my opinion, though. > > If people are okay with the attribute magic, I have a proposal for more of it. > > In my own project where I use labeled arrays > (http://github.com/commonsense/divisi2), I don't have labeled axes. > But I assumed everything was 1 or 2-D, and gave the 2-D matrices > methods like "row_named", "col_named", etc., to encourage readable > code. > > With the current implementation of datarray, I could get that by > labeling the axes "row" and "col", except the moment you transpose a > matrix like that you get rows named "col" and columns named "row", so > that's not the right answer. > > My proposal is that datarray.row should be equivalent to > datarray.axes[0], and datarray.column should be equivalent to > datarray.axes[1], so that you can always ask for something like > "arr.column.named(2010)" (replace those with square brackets if you > like). > > Not sure yet what the right way is to generalize this to 1-D and n-D. I think we have to start from the nD case, even if I (and I think most users) will tend to think in 2D. The rest is just going to have to be up to developers how they want users to interact with what we, the developers, see as axes. 
No end-user wants to think about the 6th axis of the data, but I don't want to be pegged into rows and columns thinking because I don't think it works for the below example. Forgive me if this is has already been addressed, but my question is what happens when we have more than one "label" (not as in a labeled axis but an observation label -- but not a tick because they're not unique!) per say row axis and heterogenous dtypes. This is really the problem that I would like to see addressed and from the BoF comments I'm not sure this use case is going to be covered. I'm also not sure I expressed myself clearly enough or understood what's already available. For me, this is the single most common use case and most of what we are talking about now is just convenient slicing but ignoring some basic and prominent concerns. Please correct me if I'm wrong. I need to play more with DataArray implementation but haven't had time yet. I often have data that looks like this (not really, but it gives the idea in a general way I think). city, month, year, region, precipitation, temperature "Austin", "January", 1980, "South", 12.1, 65.4, "Austin", "February", 1980, "South", 24.3, 55.4 "Austin", "March", 1980, "South", 3, 69.1 .... "Austin", "December", 2009, 1, 62.1 "Boston", "January", 1980, "Northeast", 1.5, 19.2 .... "Boston","December", 2009, "Northeast", 2.1, 23.5 ... "Memphis","January",1980, "South", 2.1, 35.6 ... "Memphis","December",2009, "South", 1.2, 33.5 ... Sometimes, I want, say, to know what the average temperature is in December. Sometimes I want to know what the average temperature is in Memphis. Sometimes I want to know the average temperature in Memphis in December or in Memphis in 1985. If I do this with structured arrays, most group-by type operations are at best O(n). Really this isn't feasible. An even more difficult question is what if I want descriptive statistics on the "region" variable? Ie., I want to know how many observations I have for each region. This one can wait, but is still important for doing statistics. Can these use cases be covered right now by DataArray? Pandas, larry, divisi? Others? I'm having trouble thinking how it could be done with DataArray. Skipper From xscript at gmx.net Thu Jul 8 13:03:47 2010 From: xscript at gmx.net (=?UTF-8?B?TGx1w61z?=) Date: Thu, 08 Jul 2010 19:03:47 +0200 Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes In-Reply-To: References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu> Message-ID: <86ocehucr0.wl%lluis@ginnungagap.pc.ac.upc.edu> Rob Speer writes: > On Thu, Jul 8, 2010 at 7:13 AM, Llu?s wrote: >> Thus, we can use something in the middle: >> >> ? arr[0,1] >> ? arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks' > Ah ha. So this is the case with positional axes but named ticks, which > we haven't really brought up yet. I'm definitely thinking of making > the top-level datarray support "named" as well, which would make it > into: >>>> arr.named('Netherlands', 2010) > But the other change you've got here is to make "named" into a > __getitem__-able object instead of a method, so you use square > brackets with it and can use slice syntax. I could do it this way as > well. Right, seamless slicing is precisely why I have __getitem__. > But I don't understand your second example: >> ? arr.country['Spain'].year[1994:2010] > That seems to run straight into the index/name ambiguity. Shouldn't > that take the 1994th through 2010th indices along the "year" axis? 
-- Rob

From jsseabold at gmail.com Thu Jul 8 12:39:22 2010
From: jsseabold at gmail.com (Skipper Seabold)
Date: Thu, 8 Jul 2010 12:39:22 -0400
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: 
References: 
Message-ID: 

On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote:
>> While I haven't had a chance to really look in-depth at the changes
>> myself (I'm a busy man! So many mailing lists!), I so far like the
>> look and sound of them. That's just my opinion, though.
>
> If people are okay with the attribute magic, I have a proposal for more of it.
>
> In my own project where I use labeled arrays
> (http://github.com/commonsense/divisi2), I don't have labeled axes.
> But I assumed everything was 1 or 2-D, and gave the 2-D matrices
> methods like "row_named", "col_named", etc., to encourage readable
> code.
>
> With the current implementation of datarray, I could get that by
> labeling the axes "row" and "col", except the moment you transpose a
> matrix like that you get rows named "col" and columns named "row", so
> that's not the right answer.
>
> My proposal is that datarray.row should be equivalent to
> datarray.axes[0], and datarray.column should be equivalent to
> datarray.axes[1], so that you can always ask for something like
> "arr.column.named(2010)" (replace those with square brackets if you
> like).
>
> Not sure yet what the right way is to generalize this to 1-D and n-D.

I think we have to start from the nD case, even if I (and I think most users) will tend to think in 2D. The rest is just going to have to be up to developers how they want users to interact with what we, the developers, see as axes. No end-user wants to think about the 6th axis of the data, but I don't want to be pegged into rows-and-columns thinking, because I don't think it works for the example below.

Forgive me if this has already been addressed, but my question is what happens when we have more than one "label" per, say, row axis (not as in a labeled axis, but an observation label -- and not a tick, because they're not unique!), together with heterogeneous dtypes. This is really the problem that I would like to see addressed, and from the BoF comments I'm not sure this use case is going to be covered. I'm also not sure I expressed myself clearly enough or understood what's already available. For me, this is the single most common use case, and most of what we are talking about now is just convenient slicing that ignores some basic and prominent concerns. Please correct me if I'm wrong. I need to play more with the DataArray implementation but haven't had time yet.

I often have data that looks like this (not really, but it gives the idea in a general way, I think):

city, month, year, region, precipitation, temperature
"Austin", "January", 1980, "South", 12.1, 65.4
"Austin", "February", 1980, "South", 24.3, 55.4
"Austin", "March", 1980, "South", 3, 69.1
....
"Austin", "December", 2009, "South", 1, 62.1
"Boston", "January", 1980, "Northeast", 1.5, 19.2
....
"Boston", "December", 2009, "Northeast", 2.1, 23.5
...
"Memphis", "January", 1980, "South", 2.1, 35.6
...
"Memphis", "December", 2009, "South", 1.2, 33.5
...

Sometimes I want, say, to know what the average temperature is in December. Sometimes I want to know what the average temperature is in Memphis. Sometimes I want to know the average temperature in Memphis in December, or in Memphis in 1985. If I do this with structured arrays, most group-by type operations are at best O(n). Really, this isn't feasible.
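For concreteness, here is roughly what one of those queries looks like on a structured array (toy data along the lines of the table above; every mask is a full pass over the records):

    import numpy as np

    data = np.array([('Austin', 'January', 1980, 'South', 12.1, 65.4),
                     ('Memphis', 'December', 2009, 'South', 1.2, 33.5)],
                    dtype=[('city', 'S10'), ('month', 'S10'), ('year', int),
                           ('region', 'S10'), ('precipitation', float),
                           ('temperature', float)])

    # Each comparison builds a length-n boolean mask, so every
    # group-by style question costs at least one scan of all rows:
    mask = (data['city'] == 'Memphis') & (data['month'] == 'December')
    print data['temperature'][mask].mean()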
An even more difficult question is what if I want descriptive statistics on the "region" variable? I.e., I want to know how many observations I have for each region. This one can wait, but it is still important for doing statistics.

Can these use cases be covered right now by DataArray? Pandas, larry, divisi? Others? I'm having trouble thinking how it could be done with DataArray.

Skipper

From xscript at gmx.net Thu Jul 8 13:03:47 2010
From: xscript at gmx.net (Lluís)
Date: Thu, 08 Jul 2010 19:03:47 +0200
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: 
References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu>
Message-ID: <86ocehucr0.wl%lluis@ginnungagap.pc.ac.upc.edu>

Rob Speer writes:
> On Thu, Jul 8, 2010 at 7:13 AM, Lluís wrote:
>> Thus, we can use something in the middle:
>>
>>    arr[0,1]
>>    arr.names['Netherlands',2010] # I'd rather go for 'names' instead of 'ticks'

> Ah ha. So this is the case with positional axes but named ticks, which
> we haven't really brought up yet. I'm definitely thinking of making
> the top-level datarray support "named" as well, which would make it
> into:
>>>> arr.named('Netherlands', 2010)

> But the other change you've got here is to make "named" into a
> __getitem__-able object instead of a method, so you use square
> brackets with it and can use slice syntax. I could do it this way as
> well.

Right, seamless slicing is precisely why I have __getitem__.

> But I don't understand your second example:
>>    arr.country['Spain'].year[1994:2010]

> That seems to run straight into the index/name ambiguity. Shouldn't
> that take the 1994th through 2010th indices along the "year" axis? Not
> every axis will have names, so you can't make *all* the indexing go by
> names.

Sorry, I just copied and pasted without placing the necessary ' characters.

> If named were a getitem-able object, that would be:
>>>> arr.country.named['Spain'].year.named[1994:2010]

Or what I was striving for:

    arr.year.named[1994:2010]
    arr.year['1994':'2010']
    arr.year['1994':-3]
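As a toy sketch of that behaviour (not datarray code -- the Named class is made up, and I'm assuming keys that match a tick label are translated to positions, while anything else falls through as a plain positional index):

    import numpy as np

    class Named(object):
        def __init__(self, data, ticks):
            self.data = data
            self.ticks = list(ticks)
        def _pos(self, key):
            # Tick labels become positions; None and integers pass through.
            return self.ticks.index(key) if key in self.ticks else key
        def __getitem__(self, key):
            if isinstance(key, slice):
                return self.data[self._pos(key.start):self._pos(key.stop):key.step]
            return self.data[self._pos(key)]

    year = Named(np.arange(17), [str(y) for y in range(1994, 2011)])
    print year['1994':'2010']   # both endpoints are tick labels
    print year['1994':-3]       # mixed: tick start, positional stop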
I already have the code for managing all kinds of indexing methods in sciexp2, so if you want I could try to integrate it into datarray.

Read you,
Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth

From rspeer at MIT.EDU Thu Jul 8 13:35:29 2010
From: rspeer at MIT.EDU (Rob Speer)
Date: Thu, 8 Jul 2010 13:35:29 -0400
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: 
References: 
Message-ID: 

> Forgive me if this has already been addressed, but my question is
> what happens when we have more than one "label" per, say, row axis
> (not as in a labeled axis, but an observation label -- and not a
> tick, because they're not unique!), together with heterogeneous
> dtypes. This is really the problem that I would like to see
> addressed, and from the BoF comments I'm not sure this use case is
> going to be covered. I'm also not sure I expressed myself clearly
> enough or understood what's already available. For me, this is the
> single most common use case, and most of what we are talking about
> now is just convenient slicing that ignores some basic and prominent
> concerns. Please correct me if I'm wrong. I need to play more with
> the DataArray implementation but haven't had time yet.
>
> I often have data that looks like this (not really, but it gives the
> idea in a general way, I think):
>
> city, month, year, region, precipitation, temperature
> "Austin", "January", 1980, "South", 12.1, 65.4
> "Austin", "February", 1980, "South", 24.3, 55.4
> "Austin", "March", 1980, "South", 3, 69.1
> ....
> "Austin", "December", 2009, "South", 1, 62.1
> "Boston", "January", 1980, "Northeast", 1.5, 19.2
> ....
> "Boston", "December", 2009, "Northeast", 2.1, 23.5
> ...
> "Memphis", "January", 1980, "South", 2.1, 35.6
> ...
> "Memphis", "December", 2009, "South", 1.2, 33.5
> ...

Your labels are unique if you look at them the right way. Here's how I would represent that in a datarray:

* axis0 = 'city', ['Austin', 'Boston', ...]
* axis1 = 'month', ['January', 'February', ...]
* axis2 = 'year', [1980, 1981, ...]
* axis3 = 'region', ['Northeast', 'South', ...]
* axis4 = 'measurement', ['precipitation', 'temperature']

and then I'd make a 5-D datarray labeled with [axis0, axis1, axis2, axis3, axis4].

Now I realize not everyone wants to represent their tabular data as a big tensor that they index every which way, and I think this is one thing that pandas is for.

Oh, and the other problem with the 5-D datarray is that you'd probably want it to be sparse. This is another discussion worth having. I want to eventually replace the labeling stuff in Divisi with datarray, but sparse matrices are largely the point of using Divisi. So how do we make a sparse datarray?

One answer would be to have datarray be a wrapper that encapsulates any sufficiently matrix-like type. This is approximately what I did in the now-obsolete Divisi1. Nobody liked the fact that you had to wrap and unwrap your arrays to accomplish anything that we hadn't thought of in writing Divisi. I would not recommend this route.

The other option, which is more like Divisi2, would be to provide the functionality of datarray using a mixin. Then a standard dense datarray could inherit from (np.ndarray, Datarray), while a sparse datarray could inherit from (sparse.csr_matrix, Datarray), for example.

-- Rob

From xscript at gmx.net Thu Jul 8 13:38:08 2010
From: xscript at gmx.net (Lluís)
Date: Thu, 08 Jul 2010 19:38:08 +0200
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: 
References: 
Message-ID: <86mxu1ub5r.wl%lluis@ginnungagap.pc.ac.upc.edu>

Skipper Seabold writes:
> On Thu, Jul 8, 2010 at 12:02 PM, Rob Speer wrote:
[...]
>> My proposal is that datarray.row should be equivalent to
>> datarray.axes[0], and datarray.column should be equivalent to
>> datarray.axes[1], so that you can always ask for something like
>> "arr.column.named(2010)" (replace those with square brackets if you
>> like).
>>
>> Not sure yet what the right way is to generalize this to 1-D and n-D.

> I think we have to start from the nD case, even if I (and I think most
> users) will tend to think in 2D. The rest is just going to have to be
> up to developers how they want users to interact with what we, the
> developers, see as axes. No end-user wants to think about the 6th
> axis of the data, but I don't want to be pegged into rows-and-columns
> thinking, because I don't think it works for the example below.

You could simply provide a subclass of datarray called 'table' that automatically labels the two (mandatory) axes as 'column' and 'row'.

[...]

> city, month, year, region, precipitation, temperature
> "Austin", "January", 1980, "South", 12.1, 65.4
> "Austin", "February", 1980, "South", 24.3, 55.4
> "Austin", "March", 1980, "South", 3, 69.1
> ....
> "Austin", "December", 2009, "South", 1, 62.1
> "Boston", "January", 1980, "Northeast", 1.5, 19.2
> ....
> "Boston", "December", 2009, "Northeast", 2.1, 23.5
> ...
> "Memphis", "January", 1980, "South", 2.1, 35.6
> ...
> "Memphis", "December", 2009, "South", 1.2, 33.5
> ...

> Sometimes I want, say, to know what the average temperature is in
> December. Sometimes I want to know what the average temperature is in
> Memphis. Sometimes I want to know the average temperature in Memphis
> in December, or in Memphis in 1985. If I do this with structured
> arrays, most group-by type operations are at best O(n). Really, this
> isn't feasible.

If I understood well, you could have 4 axes (assuming that an Axis can only handle a single label/variable):

    a = DatArray(numpy.array([...], dtype = [("precipitation", float),
                                             ("temperature", float)]),
                 (("city", ["Austin", ...]),
                  ("month", ["January", ...]),
                  ...))

Then, you can:

    a.city.named("Memphis").month.named("December")["temperature"].mean()
    a.city.named("Memphis").year.named(1985)["temperature"].mean()

Or shorter:

    a.named["Memphis","December"]["temperature"].mean()
    a.named["Memphis",:,1985]["temperature"].mean()

This raises the problem of non-homogeneous measurements. For example, if you had only a few measurements for Austin, the rest would be just NaNs to make the shape homogeneous.

I solved this in sciexp2 with (this is not the API, but translated into a DatArray-like interface for clarity):

    a = Data(numpy.array([...], dtype = [("precipitation", float),
                                         ("temperature", float)]),
             (("measurement", "@city@-@month@-@year@-@region@",
               [{"city": "Austin", "month": "January",
                 "year": 1980, "region": "South"}, ...]),))

    a.named[::"city == 'Memphis' && month == 'December'"]["temperature"].mean()
    a.named[::"city == 'Memphis' && year == 1985"]["temperature"].mean()

But of course, this represents a tradeoff between "wasted" space and speed. The internals are along the lines of (using ordered dicts):

    { 'city'  : { 'Memphis': set([...]), ... },
      'month' : { 'December': set([...]), ... },
      ... }

which translates into:

    a[intersection( d['city']['Memphis'], d['month']['December'] )]

(the query wants rows matching both conditions, so the per-label row sets are intersected). There is a less optimized path that supports arbitrary expressions (less than, greater than or equal, etc.), but it has a cost of O(n).
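A toy version of those internals (made-up data, just to make the tradeoff concrete -- the per-label row sets are built once, and an AND query is then a set intersection instead of a scan):

    import numpy as np

    rows = [('Austin', 'December'), ('Memphis', 'December'),
            ('Memphis', 'January')]
    temps = np.array([62.1, 33.5, 35.6])

    # Build the index once: variable -> label value -> set of row numbers.
    index = {'city': {}, 'month': {}}
    for i, (city, month) in enumerate(rows):
        index['city'].setdefault(city, set()).add(i)
        index['month'].setdefault(month, set()).add(i)

    # city == 'Memphis' && month == 'December' is a set intersection
    # over the (usually small) per-label sets:
    hit = index['city']['Memphis'] & index['month']['December']
    print temps[sorted(hit)].mean()   # -> 33.5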
> An even more difficult question is what if I want descriptive
> statistics on the "region" variable? I.e., I want to know how many
> observations I have for each region. This one can wait, but it is
> still important for doing statistics.

This _should_ be:

    a.region.named("South").size

Read you,
Lluis

-- 
 "And it's much the same thing with knowledge, for whenever you learn
 something new, the whole world becomes that much richer."
 -- The Princess of Pure Reason, as told by Norton Juster in The Phantom Tollbooth

From rspeer at MIT.EDU Thu Jul 8 13:41:34 2010
From: rspeer at MIT.EDU (Rob Speer)
Date: Thu, 8 Jul 2010 13:41:34 -0400
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: <86ocehucr0.wl%lluis@ginnungagap.pc.ac.upc.edu>
References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu>
	<86ocehucr0.wl%lluis@ginnungagap.pc.ac.upc.edu>
Message-ID: 

>> But I don't understand your second example:
>>>    arr.country['Spain'].year[1994:2010]
>
>> That seems to run straight into the index/name ambiguity. Shouldn't
>> that take the 1994th through 2010th indices along the "year" axis? Not
>> every axis will have names, so you can't make *all* the indexing go by
>> names.
>
> Sorry, I just copied and pasted without placing the necessary ' characters.
>
>> If named were a getitem-able object, that would be:
>>>>> arr.country.named['Spain'].year.named[1994:2010]
>
> Or what I was striving for:
>
>    arr.year.named[1994:2010]
>    arr.year['1994':'2010']
>    arr.year['1994':-3]

So your proposal is, whenever there's an index that is not an integer, look it up by name, and use .named only if you want integer tick names? This feels too inconsistent to me. It adds a fair amount of confusion to save a small amount of typing. If keystrokes are that important, I'd rather replace "named" with something shorter than lose the distinction entirely.

-- Rob

From xscript at gmx.net Thu Jul 8 13:56:32 2010
From: xscript at gmx.net (Lluís)
Date: Thu, 08 Jul 2010 19:56:32 +0200
Subject: [Numpy-discussion] BOF notes: Fernando's proposal: NumPy ndarray with named axes
In-Reply-To: 
References: <86r5jetedm.wl%lluis@ginnungagap.pc.ac.upc.edu>
	<86ocehucr0.wl%lluis@ginnungagap.pc.ac.upc.edu>
Message-ID: <86k4p5uab3.wl%lluis@ginnungagap.pc.ac.upc.edu>

Rob Speer writes:
>> Or what I was striving for:
>>
>>    arr.year.named[1994:2010]
>>    arr.year['1994':'2010']
>>    arr.year['1994':-3]

> So your proposal is, whenever there's an index that is not an integer,
> look it up by name, and use .named only if you want integer tick
> names? This feels too inconsistent to me. It adds a fair amount of
> confusion to save a small amount of typing. If keystrokes are that
> important, I'd rather replace "named" with something shorter than lose
> the distinction entirely.

No. I'd rather go for eliminating 'arr.year.named' and providing only:

* arr.__getitem__
* arr.named.__getitem__
* arr.