From schlesin at cshl.edu Wed Dec 1 00:46:39 2010 From: schlesin at cshl.edu (Felix Schlesinger) Date: Wed, 1 Dec 2010 05:46:39 +0000 (UTC) Subject: [Numpy-discussion] A faster median (Wirth's method) References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: > > import numpy as np > > cimport numpy as cnp > > > cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a): > > return np.nanmean(a) # just a placeholder > > > is not allowed? It works for me. Is it a cython version thing? > > (I've got 0.13), > > Oh, that's nice! I'm using 0.11.2. OK, time to upgrade. Oh wow, does that mean that http://trac.cython.org/cython_trac/ticket/177 is fixed? I couldn't find anything in the release notes about that, but it would be great news. Does the cdef function acquire and hold the buffer? Felix From jeanluc.menut at free.fr Wed Dec 1 05:23:22 2010 From: jeanluc.menut at free.fr (Jean-Luc Menut) Date: Wed, 01 Dec 2010 11:23:22 +0100 Subject: [Numpy-discussion] numpy speed question In-Reply-To: References: <4CEE36DD.8000105@free.fr> Message-ID: <4CF6221A.6020804@free.fr> Le 26/11/2010 17:48, Bruce Sherwood a ?crit : > Although this was mentioned earlier, it's worth emphasizing that if > you need to use functions such as cosine with scalar arguments, you > should use math.cos(), not numpy.cos(). The numpy versions of these > functions are optimized for handling array arguments and are much > slower than the math versions for scalar arguments. Yes I understand that. I just want to stress that it was not a benchmark (nor a critic) but a test to know if it was interesting to translate directly an IDL code into python/numpy before trying to optimize it (I know more python than IDL). I expected to have approximatively the same speed for both, was surprised by the result, and wanted to know if there was an obvious reason besides the unoptimization for scalars. From John.Hornstein at nrl.navy.mil Wed Dec 1 09:26:02 2010 From: John.Hornstein at nrl.navy.mil (John Hornstein) Date: Wed, 1 Dec 2010 09:26:02 -0500 Subject: [Numpy-discussion] Python versions for NumPy 1.5 Message-ID: <004d01cb9163$ae317de0$0a9479a0$@Hornstein@nrl.navy.mil> Does NumPy 1.5 work with Python 2.7 or Python 3.x? From charlesr.harris at gmail.com Wed Dec 1 09:32:51 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 1 Dec 2010 07:32:51 -0700 Subject: [Numpy-discussion] Python versions for NumPy 1.5 In-Reply-To: <493532460441506022@unknownmsgid> References: <493532460441506022@unknownmsgid> Message-ID: On Wed, Dec 1, 2010 at 7:26 AM, John Hornstein wrote: > Does NumPy 1.5 work with Python 2.7 or Python 3.x? > > > Yes, both. NumPy 1.5.1 fixes some small bugs and that is what you should use. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Wed Dec 1 11:07:18 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 1 Dec 2010 08:07:18 -0800 Subject: [Numpy-discussion] A faster median (Wirth's method) In-Reply-To: References: <4A9C9DDA.9060503@molden.no> <4A9D7B5E.6040009@student.matnat.uio.no> <4A9D9432.20907@molden.no> Message-ID: @Keith Goodman I think I figured it out. I believe something like the following will do what you want, iterating across one axis specially, so it can apply a median function along an axis. This code in particular is for calculating a moving average and seems to work (though I haven't checked my math). Let me know if you find any problems. 
def ewma(a, d, int axis = -1):
    out = np.empty(a.shape, dtype)

    cdef np.flatiter ita, ito
    ita = np.PyArray_IterAllButAxis(a, &axis)
    ito = np.PyArray_IterAllButAxis(out, &axis)

    cdef int i
    cdef int axis_length = a.shape[axis]
    cdef int a_axis_stride = a.strides[axis]/a.itemsize
    cdef int o_axis_stride = out.strides[axis]/out.itemsize

    cdef double avg = 0.0
    cdef double weight = 1.0 - np.exp(-d)

    while np.PyArray_ITER_NOTDONE(ita):
        avg = 0.0
        for i in range(axis_length):
            avg += (<double*> np.PyArray_ITER_DATA(ita))[i * a_axis_stride] * weight + avg * (1 - weight)
            (<double*> np.PyArray_ITER_DATA(ito))[i * o_axis_stride] = avg
        np.PyArray_ITER_NEXT(ita)
        np.PyArray_ITER_NEXT(ito)

    return out

On Tue, Nov 30, 2010 at 9:46 PM, Felix Schlesinger wrote:

> > > import numpy as np
> > > cimport numpy as cnp
> > >
> > > cdef cnp.float64_t namean(cnp.ndarray[cnp.float64_t, ndim=1] a):
> > >     return np.nanmean(a)  # just a placeholder
> > >
> > > is not allowed? It works for me. Is it a cython version thing?
> > > (I've got 0.13),
> >
> > Oh, that's nice! I'm using 0.11.2. OK, time to upgrade.
>
> Oh wow, does that mean that http://trac.cython.org/cython_trac/ticket/177
> is fixed? I couldn't find anything in the release notes about that,
> but it would be great news. Does the cdef function acquire and hold
> the buffer?
>
> Felix
>
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From gregwh at gmail.com Wed Dec 1 14:16:13 2010
From: gregwh at gmail.com (greg whittier)
Date: Wed, 1 Dec 2010 14:16:13 -0500
Subject: [Numpy-discussion] broadcasting with numpy.interp
In-Reply-To:
References:
Message-ID:

On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt <
friedrichromstedt at gmail.com> wrote:

> 2010/11/16 greg whittier :
> > I'd like to be able to speed up the following code.
> >
> > def replace_dead(cube, dead):
> >     # cube.shape == (320, 640, 1200)
> >     # dead.shape == (320, 640)
> >     # cube[i,j,:] are bad points to be replaced via interpolation if
> >     # dead[i,j] == True
> >
> >     bands = np.arange(0, cube.shape[0])
> >     for line in range(cube.shape[1]):
> >         dead_bands = bands[dead[:, line] == True]
> >         good_bands = bands[dead[:, line] == False]
> >         for sample in range(cube.shape[2]):
> >             # interp returns fp[0] for x < xp[0] and fp[-1] for x > xp[-1]
> >             cube[dead_bands, line, sample] = \
> >                 np.interp(dead_bands,
> >                           good_bands,
> >                           cube[good_bands, line, sample])
>
> I assume you just need *some* interpolation, not that specific one?
> In that case, I'd suggest the following:
>
> 1) Use a 2d interpolation, taking into account all nearest neighbours.
> 2) For this, use a looped interpolation in this nearest-neighbour sense:
>    a) Generate sums of all unmasked nearest-neighbour values
>    b) Generate counts for the nearest neighbours present
>    c) Replace the bad values by the sums divided by the count.
>    d) Continue at (a) if there are bad values left
>
> Bad values which are neighbouring each other (>= 3) need multiple
> passes through the loop. It should be pretty fast.
>
> If this is what you have in mind, maybe we (or I) can make up some code.
>
> Friedrich
>

Thanks so much for the response! Sorry I didn't respond earlier. I put it
aside until I found time to try and understand part 2 of your response and
forgot about it. I'm not really looking for 2d interpolation at the moment,
but I can see needing it in the future.
Right now, I just want to interpolate along one of the three axes. I think what you're suggesting might work for 1d or 2d depending on how you find the nearest neighbors. What routine would you use? Also, when you say "unmasked" do you mean literally using masked arrays? Thanks, Greg -------------- next part -------------- An HTML attachment was scrubbed... URL: From kbasye1 at jhu.edu Wed Dec 1 14:18:36 2010 From: kbasye1 at jhu.edu (Ken Basye) Date: Wed, 01 Dec 2010 14:18:36 -0500 Subject: [Numpy-discussion] printoption to allow hexified floats? Message-ID: <4CF69F8C.2090603@jhu.edu> Hi Numpy folks, When working with floats, I prefer to have exact string representations in doctests and other reference-based testing; I find it helps a lot to avoid chasing cross-platform differences that are really about the string conversion rather than about numerical differences. Since Python 2.6, the, the hex() method on floats has been available and it gives an exact representation. Is there any way to have Numpy arrays of floats printed using this representation? If not, would there be interest in adding that? On a somewhat related note, is there a table someplace which shows which versions of Python are supported in each release of Numpy? I found an FAQ that mentioned 2.4 and 2.5, but since it didn't mention 2.6 or 2.7 (much less 3.1), I assume it's out of date. This relates to the above since it would be harder to support a new hex printoption for Pythons before 2.6. Thanks, Ken B. From kwgoodman at gmail.com Wed Dec 1 14:47:36 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 1 Dec 2010 11:47:36 -0800 Subject: [Numpy-discussion] A Cython apply_along_axis function Message-ID: It's hard to write Cython code that can handle all dtypes and arbitrary number of dimensions. The former is typically dealt with using templates, but what do people do about the latter? I'm trying to take baby steps towards writing an apply_along_axis function that takes as input a cython function, numpy array, and axis. I'm using the following numpy ticket as a guide but I'm really just copying and pasting without understanding: http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx Can anyone spot why I get a segfault on the call to nanmean_1d in apply_along_axis? import numpy as np cimport numpy as np import cython cdef double NAN = np.nan ctypedef np.float64_t (*func_t)(void *buf, np.npy_intp size, np.npy_intp s) def apply_along_axis(np.ndarray[np.float64_t, ndim=1] a, int axis): cdef func_t nanmean_1d cdef np.npy_intp stride, itemsize cdef int ndim = a.ndim cdef np.float64_t out itemsize = a.itemsize if ndim == 1: stride = a.strides[0] // itemsize # convert stride bytes --> items out = nanmean_1d(a.data, a.shape[0], stride) else: raise ValueError("Not yet coded") return out cdef np.float64_t nanmean_1d(void *buf, np.npy_intp n, np.npy_intp s): "nanmean of buffer." 
cdef np.float64_t *a = buf # cdef np.npy_intp i, count = 0 cdef np.float64_t asum, ai if s == 1: for i in range(n): ai = a[i] if ai == ai: asum += ai count += 1 else: for i in range(n): ai = a[i*s] if ai == ai: asum += ai count += 1 if count > 0: return asum / count else: return NAN From dagss at student.matnat.uio.no Wed Dec 1 15:00:38 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 01 Dec 2010 21:00:38 +0100 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: Message-ID: <4CF6A966.2070002@student.matnat.uio.no> On 12/01/2010 08:47 PM, Keith Goodman wrote: > It's hard to write Cython code that can handle all dtypes and > arbitrary number of dimensions. The former is typically dealt with > using templates, but what do people do about the latter? > What you typically do is to use the C-level iterator API. In fact there's a recent thread on cython-users that does exactly this ("How can I use PyArray_IterAllButAxis..."). Of course, make sure you take the comments of that thread into account (!). I feel that is easier to work with than what you do below. Not saying it couldn't be easier, but it's not too bad once you get used to it. Dag Sverre > I'm trying to take baby steps towards writing an apply_along_axis > function that takes as input a cython function, numpy array, and axis. > I'm using the following numpy ticket as a guide but I'm really just > copying and pasting without understanding: > > http://projects.scipy.org/numpy/attachment/ticket/1213/_selectmodule.pyx > > Can anyone spot why I get a segfault on the call to nanmean_1d in > apply_along_axis? > > import numpy as np > cimport numpy as np > import cython > > cdef double NAN = np.nan > ctypedef np.float64_t (*func_t)(void *buf, np.npy_intp size, np.npy_intp s) > > def apply_along_axis(np.ndarray[np.float64_t, ndim=1] a, int axis): > > cdef func_t nanmean_1d > cdef np.npy_intp stride, itemsize > cdef int ndim = a.ndim > cdef np.float64_t out > > itemsize = a.itemsize > > if ndim == 1: > stride = a.strides[0] // itemsize # convert stride bytes --> items > out = nanmean_1d(a.data, a.shape[0], stride) > else: > raise ValueError("Not yet coded") > > return out > > cdef np.float64_t nanmean_1d(void *buf, np.npy_intp n, np.npy_intp s): > "nanmean of buffer." > cdef np.float64_t *a = buf # > cdef np.npy_intp i, count = 0 > cdef np.float64_t asum, ai > if s == 1: > for i in range(n): > ai = a[i] > if ai == ai: > asum += ai > count += 1 > else: > for i in range(n): > ai = a[i*s] > if ai == ai: > asum += ai > count += 1 > if count> 0: > return asum / count > else: > return NAN > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david at silveregg.co.jp Wed Dec 1 20:53:43 2010 From: david at silveregg.co.jp (David) Date: Thu, 02 Dec 2010 10:53:43 +0900 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: Message-ID: <4CF6FC27.5040005@silveregg.co.jp> Hi Keith, On 12/02/2010 04:47 AM, Keith Goodman wrote: > It's hard to write Cython code that can handle all dtypes and > arbitrary number of dimensions. The former is typically dealt with > using templates, but what do people do about the latter? The only way that I know to do that systematically is iterator. There is a relatively simple example in scipy/signal (lfilter.c.src). I wonder if it would be possible to add better support for numpy iterators in cython... 
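For comparison, a rough pure-NumPy fallback for the reduction being discussed (nanmean over one axis of an array with any number of dimensions) is sketched below; the function name and the float64 cast are only assumptions for illustration, and it trades the speed of the Cython approaches for simplicity by looping over 1-D slices:

import numpy as np

def nanmean_along_axis(a, axis=-1):
    # Move the reduction axis to the end, flatten the rest, and loop over
    # the 1-D slices.  Slow, but dtype- and ndim-generic without templates.
    a = np.asarray(a, dtype=np.float64)
    axis = axis % a.ndim
    b = np.rollaxis(a, axis, a.ndim)
    flat = b.reshape(-1, b.shape[-1])
    out = np.empty(flat.shape[0])
    for i, row in enumerate(flat):
        good = row[~np.isnan(row)]
        out[i] = good.mean() if good.size else np.nan
    return out.reshape(b.shape[:-1])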
cheers, David From kwgoodman at gmail.com Wed Dec 1 21:07:04 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 1 Dec 2010 18:07:04 -0800 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: <4CF6FC27.5040005@silveregg.co.jp> References: <4CF6FC27.5040005@silveregg.co.jp> Message-ID: On Wed, Dec 1, 2010 at 5:53 PM, David wrote: > On 12/02/2010 04:47 AM, Keith Goodman wrote: >> It's hard to write Cython code that can handle all dtypes and >> arbitrary number of dimensions. The former is typically dealt with >> using templates, but what do people do about the latter? > > The only way that I know to do that systematically is iterator. There is > a relatively simple example in scipy/signal (lfilter.c.src). > > I wonder if it would be possible to add better support for numpy > iterators in cython... Thanks for the tip. I'm starting to think that for now I should just template both dtype and ndim. From jsalvati at u.washington.edu Wed Dec 1 21:09:27 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 1 Dec 2010 18:09:27 -0800 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: <4CF6FC27.5040005@silveregg.co.jp> Message-ID: On Wed, Dec 1, 2010 at 6:07 PM, Keith Goodman wrote: > On Wed, Dec 1, 2010 at 5:53 PM, David wrote: > > > On 12/02/2010 04:47 AM, Keith Goodman wrote: > >> It's hard to write Cython code that can handle all dtypes and > >> arbitrary number of dimensions. The former is typically dealt with > >> using templates, but what do people do about the latter? > > > > The only way that I know to do that systematically is iterator. There is > > a relatively simple example in scipy/signal (lfilter.c.src). > > > > I wonder if it would be possible to add better support for numpy > > iterators in cython... > > Thanks for the tip. I'm starting to think that for now I should just > template both dtype and ndim. > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > I enthusiastically support better iterator support for cython -------------- next part -------------- An HTML attachment was scrubbed... URL: From wardefar at iro.umontreal.ca Wed Dec 1 21:53:21 2010 From: wardefar at iro.umontreal.ca (David Warde-Farley) Date: Wed, 1 Dec 2010 21:53:21 -0500 Subject: [Numpy-discussion] printoption to allow hexified floats? In-Reply-To: <4CF69F8C.2090603@jhu.edu> References: <4CF69F8C.2090603@jhu.edu> Message-ID: On 2010-12-01, at 2:18 PM, Ken Basye wrote: > On a somewhat related note, is there a table someplace which shows > which versions of Python are supported in each release of Numpy? I > found an FAQ that mentioned 2.4 and 2.5, but since it didn't mention 2.6 > or 2.7 (much less 3.1), I assume it's out of date. This relates to the > above since it would be harder to support a new hex printoption for > Pythons before 2.6. NumPy 1.5.x still aims to support Python >= 2.4. I don't know what the plans are for dropping 2.4 support, but I don't think 2.5 would be dropped until some time after official support for 2.4 is phased out. I'm confused how having an exact hex representation of a float would help with doctests, though. It seems like it would exacerbate platform issues. One thought is to include a 'print' and explicit format specifier, which (I think?) is fairly consistent across platforms... or is this what you mean to say you're already doing? 
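For what it's worth, a minimal sketch of the exact, platform-independent formatting being discussed, using the float.hex() method available since Python 2.6 (the helper name is made up for illustration):

import numpy as np

def hex_strings(a):
    # One exact hex string per element; two platforms that compute
    # bit-identical doubles will also print identical strings.
    return [float(x).hex() for x in np.asarray(a, dtype=np.float64).ravel()]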
David From robert.kern at gmail.com Wed Dec 1 22:02:02 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 1 Dec 2010 21:02:02 -0600 Subject: [Numpy-discussion] printoption to allow hexified floats? In-Reply-To: <4CF69F8C.2090603@jhu.edu> References: <4CF69F8C.2090603@jhu.edu> Message-ID: On Wed, Dec 1, 2010 at 13:18, Ken Basye wrote: > Hi Numpy folks, > ? ? When working with floats, I prefer to have exact string > representations in doctests and other reference-based testing; I find it > helps a lot to avoid chasing cross-platform differences that are really > about the string conversion rather than about numerical differences. Unfortunately, there are still cross-platform numerical differences that are real (but are irrelevant to the validity of the code under test). Hex-printing for floats only helps a little to make doctests useful for numerical code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From jsalvati at u.washington.edu Wed Dec 1 22:35:42 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 1 Dec 2010 19:35:42 -0800 Subject: [Numpy-discussion] MultiIter version of PyArray_IterAllButAxis ? Message-ID: Hello, I am writing a UFunc creation utility, and I would like to know: is there a way to mimic the behavior ofPyArray_IterAllButAxis for multiple arrays at a time? I would like to be able to write UFuncs that take an axis argument and also take multiple array arguments, for example I want to be able to create a moving average with a weighting that changes according to another array. Best Regards, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Wed Dec 1 22:56:36 2010 From: david at silveregg.co.jp (David) Date: Thu, 02 Dec 2010 12:56:36 +0900 Subject: [Numpy-discussion] MultiIter version of PyArray_IterAllButAxis ? In-Reply-To: References: Message-ID: <4CF718F4.3010508@silveregg.co.jp> On 12/02/2010 12:35 PM, John Salvatier wrote: > Hello, > > I am writing a UFunc creation utility, and I would like to know: is > there a way to mimic the behavior ofPyArray_IterAllButAxis for multiple > arrays at a time? Is there a reason why creating a separate iterator for each array is not possible ? cheers, David From jsalvati at u.washington.edu Wed Dec 1 23:00:25 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 1 Dec 2010 20:00:25 -0800 Subject: [Numpy-discussion] MultiIter version of PyArray_IterAllButAxis ? In-Reply-To: <4CF718F4.3010508@silveregg.co.jp> References: <4CF718F4.3010508@silveregg.co.jp> Message-ID: On Wed, Dec 1, 2010 at 7:56 PM, David wrote: > On 12/02/2010 12:35 PM, John Salvatier wrote: > > Hello, > > > > I am writing a UFunc creation utility, and I would like to know: is > > there a way to mimic the behavior ofPyArray_IterAllButAxis for multiple > > arrays at a time? > > Is there a reason why creating a separate iterator for each array is not > possible ? > If the arrays are not the same shape, separate iterators won't be aligned, so if the results are going into a broadcasted result array, the computation won't be correct. -------------- next part -------------- An HTML attachment was scrubbed... 
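For reference, one pure-NumPy way to get the alignment described above is to broadcast the operands to a common shape before iterating over the non-reduction axes; a rough sketch, with the function name and the cumsum kernel standing in for a real weighted-average computation:

import numpy as np

def running_kernel(x, w, axis=-1):
    # Broadcast first so x, w and the output stay aligned, then work on
    # 1-D slices along `axis`.
    x, w = np.broadcast_arrays(np.asarray(x, float), np.asarray(w, float))
    out = np.empty(x.shape)
    axis = axis % x.ndim
    other = [n for i, n in enumerate(x.shape) if i != axis]
    for pos in np.ndindex(*other):
        index = list(pos)
        index.insert(axis, slice(None))
        index = tuple(index)
        out[index] = np.cumsum(x[index] * w[index])  # placeholder kernel
    return out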
URL: From robertwb at math.washington.edu Thu Dec 2 02:17:50 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 1 Dec 2010 23:17:50 -0800 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: <4CF6FC27.5040005@silveregg.co.jp> Message-ID: On Wed, Dec 1, 2010 at 6:09 PM, John Salvatier wrote: > On Wed, Dec 1, 2010 at 6:07 PM, Keith Goodman wrote: >> >> On Wed, Dec 1, 2010 at 5:53 PM, David wrote: >> >> > On 12/02/2010 04:47 AM, Keith Goodman wrote: >> >> It's hard to write Cython code that can handle all dtypes and >> >> arbitrary number of dimensions. The former is typically dealt with >> >> using templates, but what do people do about the latter? >> > >> > The only way that I know to do that systematically is iterator. There is >> > a relatively simple example in scipy/signal (lfilter.c.src). >> > >> > I wonder if it would be possible to add better support for numpy >> > iterators in cython... >> >> Thanks for the tip. I'm starting to think that for now I should just >> template both dtype and ndim. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > I enthusiastically support better iterator support for cython I enthusiastically welcome contributions along this line. - Robert From dagss at student.matnat.uio.no Thu Dec 2 04:08:12 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 02 Dec 2010 10:08:12 +0100 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: <4CF6FC27.5040005@silveregg.co.jp> Message-ID: <4CF761FC.60804@student.matnat.uio.no> On 12/02/2010 08:17 AM, Robert Bradshaw wrote: > On Wed, Dec 1, 2010 at 6:09 PM, John Salvatier > wrote: > >> On Wed, Dec 1, 2010 at 6:07 PM, Keith Goodman wrote: >> >>> On Wed, Dec 1, 2010 at 5:53 PM, David wrote: >>> >>> >>>> On 12/02/2010 04:47 AM, Keith Goodman wrote: >>>> >>>>> It's hard to write Cython code that can handle all dtypes and >>>>> arbitrary number of dimensions. The former is typically dealt with >>>>> using templates, but what do people do about the latter? >>>>> >>>> The only way that I know to do that systematically is iterator. There is >>>> a relatively simple example in scipy/signal (lfilter.c.src). >>>> >>>> I wonder if it would be possible to add better support for numpy >>>> iterators in cython... >>>> >>> Thanks for the tip. I'm starting to think that for now I should just >>> template both dtype and ndim. >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >> I enthusiastically support better iterator support for cython >> > I enthusiastically welcome contributions along this line. > Me too :-) I guess we're moving into more Cython-list territory, so let's move any follow-ups there (posting this one both places). Just in case anybody is wondering what something like this could look like, here's a rough scetch complete with bugs. The idea would be to a) add some rudimentary support for using the yield keyword in Cython to make a generator function, b) inline the generator function if the generator is used directly in a for-loop. This should result in very efficient code, and would also be much easier to implement than a general purpose generator. 
@cython.inline cdef array_iter_double(np.ndarray a, int axis=-1): cdef np.flatiter it ita = np.PyArray_IterAllButAxis(a, &axis) cdef Py_ssize_t stride = a.strides[axis], length = a.shape[axis], i while np.PyArray_ITER_NOTDONE(ita): for i in range(length): yield (np.PyArray_ITER_DATA(it) + )[i * stride])[0] # TODO: Probably yield indices as well np.PyArray_ITER_NEXT(it) # TODO: add faster special-cases for stride == sizeof(double) # Use NumPy iterator API to sum all values of array with # arbitrary number of dimensions: cdef double s = 0, value for value in array_iter_double(myarray): s += value # at this point, the contents of the array_iter_double function is copied, # and "s += value" simply inserted everywhere "yield" occurs in the function Dag Sverre From friedrichromstedt at gmail.com Thu Dec 2 07:04:46 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Thu, 2 Dec 2010 13:04:46 +0100 Subject: [Numpy-discussion] broadcasting with numpy.interp In-Reply-To: References: Message-ID: 2010/12/1 greg whittier : > On Wed, Nov 24, 2010 at 3:16 PM, Friedrich Romstedt > wrote: >> I assume you just need *some* interpolation, not that specific one? >> In that case, I'd suggest the following: >> >> 1) ?Use a 2d interpolation, taking into account all nearest neighbours. >> 2) ?For this, use a looped interpolation in this nearest-neighbour sense: >> ? ?a) ?Generate sums of all unmasked nearest-neighbour values >> ? ?b) ?Generate counts for the nearest neighbours present >> ? ?c) ?Replace the bad values by the sums divided by the count. >> ? ?d) ?Continue at (a) if there are bad values left >> >> Bad values which are neighbouring each other (>= 3) need multiple >> passes through the loop. ?It should be pretty fast. >> >> If this is what you have in mind, maybe we (or I) can make up some code. >> >> Friedrich > > Thanks so much for the response!? Sorry I didn't respond earlier.? I put it > aside until I found time to try and understand part 2 of your response and > forgot about it.? I'm not really looking for 2d interpolation at the moment, > but I can see needing it in the future.? Right now, I just want to > interpolate along one of the three axes.? I think what you're suggesting > might work for 1d or 2d depending on how you find the nearest neighbors. > What routine would you use?? Also, when you say "unmasked" do you mean > literally using masked arrays? Hi Greg, if you can estimate that you'll need a more sophisticated algorithm in future I'd recommend to write it in full glory, in a general way, in the end it'll save you time (this is what I would do). Yes, you're right, by choosing just neighbours along one axis you could do simple one-axis interpolation, but in some corner cases it'll not work properly since it will work the following (some ascii graphics): "x" are present values, "-" are missing values. The chain might look like the following: xxxx-xxxx In this case, interpolation will work. It'll pick the two neighbours, and interpolate them. But consider this: xxxx--xxxx This will just propagate the end points to the neighbours. The missing points will have just one neighbour, hence this behaviour. After the propagation, all values are filled, and you end up with a step in the middle. If such neighbouring missing data points are rare, it might still be considerable over Python loops with numpy.interp(). I don't see a way to vectorize interp(), since the run lengthes are different in each case. 
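As a point of comparison, a single numpy.interp() call per line already handles missing runs of any length, since every bad position is filled from its two bracketing good points (end gaps get the nearest good value); a small sketch, assuming a boolean dead mask and at least one good value per line:

import numpy as np

def fill_dead_1d(y, dead):
    # y: 1-D data, dead: boolean mask of bad samples.
    y = np.asarray(y, dtype=float).copy()
    dead = np.asarray(dead, dtype=bool)
    good = ~dead
    y[dead] = np.interp(np.flatnonzero(dead), np.flatnonzero(good), y[good])
    return y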
You might consider writing a C or Cython function, but I cannot give any advise with this. I'm thinking about a way to propagate the values over more than one step. You might know that interpolation (in images) uses also kernels extending beyond the next neighbours. But I don't know precisely how to design them. First, I'd like to know if you have or have not such neighbouring missing data points. And why do you prefer interpolation in only one axis? I can help with the code, but I'd prefer to do it the following way: You write the code, and when you're stuck, seriously, you write back to the list. I'm sure I could do the code, but 1) it might (might?) save me time, 2) You might profit from doing it yourself :-) Would you mind putting the code online in a github repo? Might well be that I sometimes run across a similar problem. Considering your masking question, I would keep the mask array separate, but this is rather because I'm not familiar with masked arrays. Another thing which comes into my mind would be to rewrite or write a new interp() which takes care of masked entries, but it would be quite an amount of work for me (I'm not familiar with the C interior of numpy either). And it would be restricted to one dimension only. If you can please give more detail on you data, where it comes from etc. Friedrich From totonixsame at gmail.com Thu Dec 2 07:35:38 2010 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Thu, 2 Dec 2010 10:35:38 -0200 Subject: [Numpy-discussion] Threshold Message-ID: Hi all, I' m developing a medical software named InVesalius [1], it is a free software. It uses numpy arrays to store the medical images (CT and MRI) and the mask, the mask is used to mark the region of interest and to create 3D surfaces. Those array generally have 512x512 elements. The mask is created based in threshold, with lower and upper bound, this way: mask = numpy.zeros(medical_image.shape, dtype="uint16") mask[ numpy.logical_and( medical_image >= lower, medical_image <= upper)] = 255 Where lower and upper are the threshold bounds. Here I' m marking the array positions where medical_image is between the threshold bounds with 255, where isn' t with 0. The question is: Is there a better way to do that? Thank! [1] - svn.softwarepublico.gov.br/trac/invesalius From zachary.pincus at yale.edu Thu Dec 2 08:14:03 2010 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Thu, 2 Dec 2010 08:14:03 -0500 Subject: [Numpy-discussion] Threshold In-Reply-To: References: Message-ID: <5CFD7508-E92B-47B2-96A1-38FBBC235DFE@yale.edu> > mask = numpy.zeros(medical_image.shape, dtype="uint16") > mask[ numpy.logical_and( medical_image >= lower, medical_image <= > upper)] = 255 > > Where lower and upper are the threshold bounds. Here I' m marking the > array positions where medical_image is between the threshold bounds > with 255, where isn' t with 0. The question is: Is there a better > way to do that? This will give you a True/False boolean mask: mask = numpy.logical_and( medical_image >= lower, medical_image <= upper) And this a 0/255 mask: mask = 255*numpy.logical_and( medical_image >= lower, medical_image <= upper) You can make the code a bit more terse/idiomatic by using the bitwise operators, which do logical operations on boolean arrays: mask = 255*((medical_image >= lower) & (medical_image <= upper)) Though this is a bit annoying as the bitwise ops (& | ^ ~) have higher precedence than the comparison ops (< <= > >=), so you need to parenthesize carefully, as above. 
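A small illustration of why the parentheses matter (the array and the bounds here are arbitrary):

import numpy as np

img = np.array([50, 120, 200], dtype=np.int16)
lower, upper = 100, 180

mask = 255 * ((img >= lower) & (img <= upper))  # gives [0, 255, 0]
# 255 * (img >= lower & img <= upper) raises ValueError: & binds tighter
# than >=, leaving an ambiguous chained comparison on arrays.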
Zach On Dec 2, 2010, at 7:35 AM, totonixsame at gmail.com wrote: > Hi all, > > I' m developing a medical software named InVesalius [1], it is a free > software. It uses numpy arrays to store the medical images (CT and > MRI) and the mask, the mask is used to mark the region of interest and > to create 3D surfaces. Those array generally have 512x512 elements. > The mask is created based in threshold, with lower and upper bound, > this way: > > mask = numpy.zeros(medical_image.shape, dtype="uint16") > mask[ numpy.logical_and( medical_image >= lower, medical_image <= > upper)] = 255 > > Where lower and upper are the threshold bounds. Here I' m marking the > array positions where medical_image is between the threshold bounds > with 255, where isn' t with 0. The question is: Is there a better way > to do that? > > Thank! > > [1] - svn.softwarepublico.gov.br/trac/invesalius > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From totonixsame at gmail.com Thu Dec 2 09:06:02 2010 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Thu, 2 Dec 2010 12:06:02 -0200 Subject: [Numpy-discussion] Threshold In-Reply-To: <5CFD7508-E92B-47B2-96A1-38FBBC235DFE@yale.edu> References: <5CFD7508-E92B-47B2-96A1-38FBBC235DFE@yale.edu> Message-ID: On Thu, Dec 2, 2010 at 11:14 AM, Zachary Pincus wrote: >> mask = numpy.zeros(medical_image.shape, dtype="uint16") >> mask[ numpy.logical_and( medical_image >= lower, medical_image <= >> upper)] = 255 >> >> Where lower and upper are the threshold bounds. Here I' m marking the >> array positions where medical_image is between the threshold bounds >> with 255, where isn' t with 0. The question is: Is there a better >> way to do that? > > This will give you a True/False boolean mask: > mask = numpy.logical_and( medical_image >= lower, medical_image <= > upper) > > And this a 0/255 mask: > mask = 255*numpy.logical_and( medical_image >= lower, medical_image <= > upper) > > You can make the code a bit more terse/idiomatic by using the bitwise > operators, which do logical operations on boolean arrays: > mask = 255*((medical_image >= lower) & (medical_image <= upper)) > > Though this is a bit annoying as the bitwise ops (& | ^ ~) have higher > precedence than the comparison ops (< <= > >=), so you need to > parenthesize carefully, as above. > > Zach Thanks, Zach! I stayed with the last one. From kbasye1 at jhu.edu Thu Dec 2 12:17:07 2010 From: kbasye1 at jhu.edu (Ken Basye) Date: Thu, 02 Dec 2010 12:17:07 -0500 Subject: [Numpy-discussion] printoption to allow hexified floats? In-Reply-To: References: Message-ID: <4CF7D493.3060802@jhu.edu> Thanks for the replies. Robert is right; many numerical operations, particularly complex ones, generate different values across platforms, and we deal with these by storing the values from some platform as a reference and using allclose(), which requires extra work. But many basic operations generate the same underlying values on IEEE 754-compliant platforms but don't always format floats consistently (see http://bugs.python.org/issue1580 for a lengthy discussion on this). My impression is that Python 2.7 does a better job here, but at this point a lot of differences also crop up between 2.6 (or less) and 2.7 due to the changed formatting built into 2.7, and these are the result of formatting differences; the numbers themselves are identical (in our experience so far, at any rate). 
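A concrete example of that formatting change (the stored double is bit-identical in both cases, and the hex form does not depend on the Python version):

x = 0.1
# Python 2.6:  repr(x) -> '0.10000000000000001'
# Python 2.7:  repr(x) -> '0.1'   (new shortest round-tripping repr)
# both:        x.hex() -> '0x1.999999999999ap-4'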
This is a current pain-point which an exact representation would alleviate. In response to David, we haven't implemented a separate print; we rely on the Numpy repr/str for ndarrays and the printoptions that allow some control over float formatting. I'm basically proposing to add a bit more control there. And thanks for the info on supported versions of Python. Ken On 12/2/10 8:14 AM, Robert Kern wrote: > On Wed, Dec 1, 2010 at 13:18, Ken Basye wrote: >> Hi Numpy folks, >> ? ? When working with floats, I prefer to have exact string >> representations in doctests and other reference-based testing; I find it >> helps a lot to avoid chasing cross-platform differences that are really >> about the string conversion rather than about numerical differences. > Unfortunately, there are still cross-platform numerical differences > that are real (but are irrelevant to the validity of the code under > test). Hex-printing for floats only helps a little to make doctests > useful for numerical code. From charlesr.harris at gmail.com Thu Dec 2 12:20:27 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Dec 2010 10:20:27 -0700 Subject: [Numpy-discussion] Float16 and PEP 3118 Message-ID: Hi Folks, Now that the float16 type is in I was wondering if we should do anything to support it in the PEP 3118 buffer interface. This would probably affect the Cython folks as well as the people working on fixing up the structure module for Python 3.x. There is a fairly long thread about the latter and it also looks like what the Python folks are doing with structure alignment isn't going to be compatible with Numpy structured arrays. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Thu Dec 2 12:41:25 2010 From: ben.root at ou.edu (Benjamin Root) Date: Thu, 2 Dec 2010 11:41:25 -0600 Subject: [Numpy-discussion] printoption to allow hexified floats? In-Reply-To: <4CF7D493.3060802@jhu.edu> References: <4CF7D493.3060802@jhu.edu> Message-ID: On Thu, Dec 2, 2010 at 11:17 AM, Ken Basye wrote: > Thanks for the replies. > > Robert is right; many numerical operations, particularly complex ones, > generate different values across platforms, and we deal with these by > storing the values from some platform as a reference and using > allclose(), which requires extra work. But many basic operations > generate the same underlying values on IEEE 754-compliant platforms but > don't always format floats consistently (see > http://bugs.python.org/issue1580 for a lengthy discussion on this). My > impression is that Python 2.7 does a better job here, but at this point > a lot of differences also crop up between 2.6 (or less) and 2.7 due to > the changed formatting built into 2.7, and these are the result of > formatting differences; the numbers themselves are identical (in our > experience so far, at any rate). This is a current pain-point which an > exact representation would alleviate. > > In response to David, we haven't implemented a separate print; we rely > on the Numpy repr/str for ndarrays and the printoptions that allow some > control over float formatting. I'm basically proposing to add a bit > more control there. And thanks for the info on supported versions of > Python. > > Ken > > Another approach to consider is to save the numerical data in a platform-independent standard file format (maybe like netcdf?). 
While this isn't a fool-proof approach because the calculations themselves may introduce differences that are platform dependent, this at least puts strong controls on one aspect of the overall problem. One caveat that does come across my mind is if the save/load process for the file might have some platform-dependent differences based on the compression/decompression schemes. For example, the GRIB file format does a compression where the mean value and the differences from those means are stored. Calculations like these might introduce some slight differences on various platforms. Just food for thought, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu Dec 2 13:16:47 2010 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 2 Dec 2010 18:16:47 +0000 (UTC) Subject: [Numpy-discussion] Float16 and PEP 3118 References: Message-ID: Thu, 02 Dec 2010 10:20:27 -0700, Charles R Harris wrote: > Now that the float16 type is in I was wondering if we should do anything > to support it in the PEP 3118 buffer interface. This would probably > affect the Cython folks as well as the people working on fixing up the > structure module for Python 3.x. Before introducing a PEP 3118 type code for half floats in the PEP, one would need to argue the Python people to add it to the struct module. Before that, the choices probably are: - refuse to export buffers containing half floats - export half floats as two bytes > There is a fairly long thread about the latter and it also looks like > what the Python folks are doing with structure alignment isn't going to > be compatible with Numpy structured arrays. Thoughts? I think it would be useful for the Python people to have feedback from us here. AFAIK, the native-aligned mode that was discussed there is compatible with what dtype(..., align=True) produces: Numpy aligns structs as given by the maximum alignment of its fields. -- Pauli Virtanen From pearu.peterson at gmail.com Thu Dec 2 15:52:49 2010 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Thu, 2 Dec 2010 22:52:49 +0200 Subject: [Numpy-discussion] Pushing changes to numpy git repo problem Message-ID: Hi, I have followed Development workflow instructions in http://docs.scipy.org/doc/numpy/dev/gitwash/ but I am having a problem with the last step: $ git push upstream ticket1679:master fatal: remote error: You can't push to git://github.com/numpy/numpy.git Use git at github.com:numpy/numpy.git What I am doing wrong? 
Here's some additional info: $ git remote -v show origin git at github.com:pearu/numpy.git (fetch) origin git at github.com:pearu/numpy.git (push) upstream git://github.com/numpy/numpy.git (fetch) upstream git://github.com/numpy/numpy.git (push) $ git branch -a master * ticket1679 remotes/origin/HEAD -> origin/master remotes/origin/maintenance/1.0.3.x remotes/origin/maintenance/1.1.x remotes/origin/maintenance/1.2.x remotes/origin/maintenance/1.3.x remotes/origin/maintenance/1.4.x remotes/origin/maintenance/1.5.x remotes/origin/master remotes/origin/ticket1679 remotes/upstream/maintenance/1.0.3.x remotes/upstream/maintenance/1.1.x remotes/upstream/maintenance/1.2.x remotes/upstream/maintenance/1.3.x remotes/upstream/maintenance/1.4.x remotes/upstream/maintenance/1.5.x remotes/upstream/master Thanks, Pearu From fperez.net at gmail.com Thu Dec 2 16:07:14 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Thu, 2 Dec 2010 13:07:14 -0800 Subject: [Numpy-discussion] Pushing changes to numpy git repo problem In-Reply-To: References: Message-ID: On Thu, Dec 2, 2010 at 12:52 PM, Pearu Peterson wrote: > > What I am doing wrong? > > Here's some additional info: > $ git remote -v show > origin ?git at github.com:pearu/numpy.git (fetch) > origin ?git at github.com:pearu/numpy.git (push) > upstream ? ? ? ?git://github.com/numpy/numpy.git (fetch) > upstream ? ? ? ?git://github.com/numpy/numpy.git (push) The git:// protocol is read-only, for write access you need ssh access. Just edit your /path-to-repo/.git/config file and change the git://github.com/numpy/numpy.git lines for git at github.com:numpy/numpy.git in the upstream description. That should be sufficient. Regards, f From charlesr.harris at gmail.com Thu Dec 2 16:08:10 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 2 Dec 2010 14:08:10 -0700 Subject: [Numpy-discussion] Pushing changes to numpy git repo problem In-Reply-To: References: Message-ID: On Thu, Dec 2, 2010 at 1:52 PM, Pearu Peterson wrote: > Hi, > > I have followed Development workflow instructions in > > http://docs.scipy.org/doc/numpy/dev/gitwash/ > > but I am having a problem with the last step: > > $ git push upstream ticket1679:master > fatal: remote error: > You can't push to git://github.com/numpy/numpy.git > Use git at github.com:numpy/numpy.git > > Do what the message says, the first address is readonly. You can change the settings in .git/config, mine looks like [core] repositoryformatversion = 0 filemode = true bare = false logallrefupdates = true [remote "origin"] fetch = +refs/heads/*:refs/remotes/origin/* url = git at github.com:charris/numpy [branch "master"] remote = origin merge = refs/heads/master [remote "upstream"] url = git at github.com:numpy/numpy fetch = +refs/heads/*:refs/remotes/upstream/* [alias] mb = merge --no-ff Where upstream is the numpy repository. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pearu.peterson at gmail.com Thu Dec 2 16:14:05 2010 From: pearu.peterson at gmail.com (Pearu Peterson) Date: Thu, 2 Dec 2010 23:14:05 +0200 Subject: [Numpy-discussion] Pushing changes to numpy git repo problem In-Reply-To: References: Message-ID: Thanks! 
Pearu On Thu, Dec 2, 2010 at 11:08 PM, Charles R Harris wrote: > > > On Thu, Dec 2, 2010 at 1:52 PM, Pearu Peterson > wrote: >> >> Hi, >> >> I have followed Development workflow instructions in >> >> ?http://docs.scipy.org/doc/numpy/dev/gitwash/ >> >> but I am having a problem with the last step: >> >> $ git push upstream ticket1679:master >> fatal: remote error: >> ?You can't push to git://github.com/numpy/numpy.git >> ?Use git at github.com:numpy/numpy.git >> > > Do what the message says, the first address is readonly. You can change the > settings in .git/config, mine looks like > > [core] > ??????? repositoryformatversion = 0 > ??????? filemode = true > ??????? bare = false > ??????? logallrefupdates = true > [remote "origin"] > ??????? fetch = +refs/heads/*:refs/remotes/origin/* > ??????? url = git at github.com:charris/numpy > [branch "master"] > ??????? remote = origin > ??????? merge = refs/heads/master > [remote "upstream"] > ??????? url = git at github.com:numpy/numpy > ??????? fetch = +refs/heads/*:refs/remotes/upstream/* > [alias] > ??????? mb = merge --no-ff > > Where upstream is the numpy repository. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From nwagner at iam.uni-stuttgart.de Fri Dec 3 02:29:34 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 03 Dec 2010 08:29:34 +0100 Subject: [Numpy-discussion] numpy.test() Program received signal SIGABRT, Aborted. Message-ID: Hi all, I have installed the latest version of numpy. >>> numpy.__version__ '2.0.0.dev-6aacc2d' numpy.test(verbose=2) received signal SIGABRT. test_cdouble_2 (test_linalg.TestEig) ... ok test_csingle (test_linalg.TestEig) ... FAIL *** glibc detected *** /data/home/nwagner/local/bin/python: free(): invalid next size (fast): 0x000000001c2887b0 *** ======= Backtrace: ========= /lib64/libc.so.6[0x383cc71684] /lib64/libc.so.6(cfree+0x8c)[0x383cc74ccc] /data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/multiarray.so[0x2b33e06f710e] (gdb) bt #0 0x000000383cc30155 in raise () from /lib64/libc.so.6 #1 0x000000383cc31bf0 in abort () from /lib64/libc.so.6 #2 0x000000383cc6a3db in __libc_message () from /lib64/libc.so.6 #3 0x000000383cc71684 in _int_free () from /lib64/libc.so.6 #4 0x000000383cc74ccc in free () from /lib64/libc.so.6 #5 0x00002b33e06f710e in array_dealloc (self=0x1c65fa00) at numpy/core/src/multiarray/arrayobject.c:209 #6 0x00000000004d6dbb in frame_dealloc (f=0x1c65eec0) at Objects/frameobject.c:416 Nils From charlesr.harris at gmail.com Fri Dec 3 02:42:16 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Dec 2010 00:42:16 -0700 Subject: [Numpy-discussion] numpy.test() Program received signal SIGABRT, Aborted. In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 12:29 AM, Nils Wagner wrote: > Hi all, > > I have installed the latest version of numpy. > > >>> numpy.__version__ > '2.0.0.dev-6aacc2d' > > I don't see that here or on the buildbots. There was a problem with segfaults that was fixed in commit c0e1c0000f27b55dfd5aCan you check that your installation is clean, etc. Also, what platform are you running on? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
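A quick way to confirm which NumPy actually gets imported after such a rebuild (a routine sanity check, not something from this thread):

import numpy
print numpy.__version__   # should be the freshly built dev version
print numpy.__file__      # should point at the new install, not a stale copy
numpy.test(verbose=2)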
URL: From nwagner at iam.uni-stuttgart.de Fri Dec 3 02:47:32 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 03 Dec 2010 08:47:32 +0100 Subject: [Numpy-discussion] numpy.test() Program received signal SIGABRT, Aborted. In-Reply-To: References: Message-ID: On Fri, 3 Dec 2010 00:42:16 -0700 Charles R Harris wrote: > On Fri, Dec 3, 2010 at 12:29 AM, Nils Wagner > wrote: > >> Hi all, >> >> I have installed the latest version of numpy. >> >> >>> numpy.__version__ >> '2.0.0.dev-6aacc2d' >> >> > > I don't see that here or on the buildbots. There was a >problem with > segfaults that was fixed in commit > c0e1c0000f27b55dfd5aCan > you check that your installation is clean, etc. Also, >what platform > are > you running on? I have removed the build directory. Is it also neccessary to remove numpy in thr installation directory ? /data/home/nwagner/local/lib/python2.5/site-packages/ Platform 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 x86_64 x86_64 GNU/Linux Nils From nwagner at iam.uni-stuttgart.de Fri Dec 3 02:56:02 2010 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Fri, 03 Dec 2010 08:56:02 +0100 Subject: [Numpy-discussion] numpy.test() Program received signal SIGABRT, Aborted. In-Reply-To: References: Message-ID: On Fri, 03 Dec 2010 08:47:32 +0100 "Nils Wagner" wrote: > On Fri, 3 Dec 2010 00:42:16 -0700 > Charles R Harris wrote: >> On Fri, Dec 3, 2010 at 12:29 AM, Nils Wagner >> wrote: >> >>> Hi all, >>> >>> I have installed the latest version of numpy. >>> >>> >>> numpy.__version__ >>> '2.0.0.dev-6aacc2d' >>> >>> >> >> I don't see that here or on the buildbots. There was a >>problem with >> segfaults that was fixed in commit >> c0e1c0000f27b55dfd5aCan >> you check that your installation is clean, etc. Also, >>what platform >> are >> you running on? > > I have removed the build directory. > Is it also neccessary to remove numpy in thr >installation > directory ? > > /data/home/nwagner/local/lib/python2.5/site-packages/ > > Platform > > 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 > x86_64 x86_64 GNU/Linux > > > Nils > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I have also removed the numpy directory within /data/home/nwagner/local/lib/python2.5/site-packages/. Now all tests pass. Ran 3080 tests in 12.288s OK (KNOWNFAIL=4, SKIP=1) How is the build process implemented on the build bots ? Nils From oc-spam66 at laposte.net Fri Dec 3 06:29:49 2010 From: oc-spam66 at laposte.net (oc-spam66) Date: Fri, 03 Dec 2010 12:29:49 +0100 Subject: [Numpy-discussion] numpy.r_[True, False] is not a boolean array Message-ID: <4CF8D4AD.801@laposte.net> Hello, I observe the following behavior: numpy.r_[True, False] -> array([1, 0], dtype=int8) numpy.r_[True] -> array([ True], dtype=bool) I would expect the first line to give a boolean array: array([ True, False], dtype=bool) Is it normal? Is it a bug? -- O.C. numpy.__version__ = '1.4.1' From josef.pktd at gmail.com Fri Dec 3 06:48:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 3 Dec 2010 06:48:10 -0500 Subject: [Numpy-discussion] numpy.r_[True, False] is not a boolean array In-Reply-To: <4CF8D4AD.801@laposte.net> References: <4CF8D4AD.801@laposte.net> Message-ID: On Fri, Dec 3, 2010 at 6:29 AM, oc-spam66 wrote: > Hello, > > I observe the following behavior: > > numpy.r_[True, False] ? -> array([1, 0], dtype=int8) > numpy.r_[True] ? ? ? ? 
?-> array([ True], dtype=bool) and >>> np.r_[[True], [False]] array([ True, False], dtype=bool) >>> np.r_[[True, False]] array([ True, False], dtype=bool) > > I would expect the first line to give a boolean array: > array([ True, False], dtype=bool) > > Is it normal? Is it a bug? Looks like a bug to me. Josef > > -- > O.C. > numpy.__version__ = '1.4.1' > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From moura.mario at gmail.com Fri Dec 3 07:31:20 2010 From: moura.mario at gmail.com (Mario Moura) Date: Fri, 3 Dec 2010 10:31:20 -0200 Subject: [Numpy-discussion] itertools.combinations to numpy Message-ID: Hi Folks I have this situation >>> from timeit import Timer >>> reps = 5 >>> >>> t = Timer('itertools.combinations(range(1,10),3)', 'import itertools') >>> print sum(t.repeat(repeat=reps, number=1)) / reps 1.59740447998e-05 >>> t = Timer('itertools.combinations(range(1,100),3)', 'import itertools') >>> print sum(t.repeat(repeat=reps, number=1)) / reps 1.74999237061e-05 >>> >>> t = Timer('list(itertools.combinations(range(1,10),3))', 'import itertools') >>> print sum(t.repeat(repeat=reps, number=1)) / reps 5.31673431396e-05 >>> t = Timer('list(itertools.combinations(range(1,100),3))', 'import itertools') >>> print sum(t.repeat(repeat=reps, number=1)) / reps 0.0556231498718 >>> You can see list(itertools.combinations(range(1,100),3)) is terrible!! If you change to range(1,100000) your computer will lock. So I would like to know a good way to convert to ndarray? fast! without use list Is it possible? >>> x = itertools.combinations(range(1,10),3) >>> x >>> I tried this from http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations >>> numpy.fromiter(itertools.combinations(range(1,10),3), int, count=-1) Traceback (most recent call last): File "", line 1, in ValueError: setting an array element with a sequence. >>> and this from http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations import numpy from itertools import * from numpy import * def combinations(iterable, r): pool = tuple(iterable) n = len(pool) for indices in permutations(range(n), r): if sorted(indices) == list(indices): yield tuple(pool[i] for i in indices) numpy.fromiter(combinations(range(1,10),3), int, count=-1) >>> numpy.fromiter(combinations(range(1,10),3), int, count=-1) Traceback (most recent call last): File "", line 1, in ValueError: setting an array element with a sequence. >>> I like itertools.combinations performance but I need convert it to numpy. Best Regards mario From sturla at molden.no Fri Dec 3 07:32:46 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 3 Dec 2010 13:32:46 +0100 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: Message-ID: <1e1a8219245e238092b40c2656799f27.squirrel@webmail.uio.no> > if ndim == 1: > stride = a.strides[0] // itemsize # convert stride bytes --> items Oh, did I really do this in selectmodule.pyx? :( That is clearly an error. I don't have time to fix it now. Sturla From sturla at molden.no Fri Dec 3 07:42:50 2010 From: sturla at molden.no (Sturla Molden) Date: Fri, 3 Dec 2010 13:42:50 +0100 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: Message-ID: <24f90a063c03155ad1952f7ee25191d9.squirrel@webmail.uio.no> > It's hard to write Cython code that can handle all dtypes and > arbitrary number of dimensions. 
The former is typically dealt with > using templates, but what do people do about the latter? There are number of ways to do it. NumPy's C API has an iterator that returns an axis on demand. Mine just collects an array with pointers to the first element in each axis. The latter is more friendly to parallelization (OpenMP or Python threads with released GIL), which is why I wrote it, otherwise it has no advantage over NumPy's. Sturla From warren.weckesser at enthought.com Fri Dec 3 09:50:57 2010 From: warren.weckesser at enthought.com (Warren Weckesser) Date: Fri, 3 Dec 2010 08:50:57 -0600 Subject: [Numpy-discussion] itertools.combinations to numpy In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 6:31 AM, Mario Moura wrote: > Hi Folks > > I have this situation > > >>> from timeit import Timer > >>> reps = 5 > >>> > >>> t = Timer('itertools.combinations(range(1,10),3)', 'import itertools') > >>> print sum(t.repeat(repeat=reps, number=1)) / reps > 1.59740447998e-05 > >>> t = Timer('itertools.combinations(range(1,100),3)', 'import itertools') > >>> print sum(t.repeat(repeat=reps, number=1)) / reps > 1.74999237061e-05 > >>> > >>> t = Timer('list(itertools.combinations(range(1,10),3))', 'import > itertools') > >>> print sum(t.repeat(repeat=reps, number=1)) / reps > 5.31673431396e-05 > >>> t = Timer('list(itertools.combinations(range(1,100),3))', 'import > itertools') > >>> print sum(t.repeat(repeat=reps, number=1)) / reps > 0.0556231498718 > >>> > > You can see list(itertools.combinations(range(1,100),3)) is terrible!! > > If you change to range(1,100000) your computer will lock. > > So I would like to know a good way to convert object> to ndarray? fast! without use list > Is it possible? > > >>> x = itertools.combinations(range(1,10),3) > >>> x > > >>> > > I tried this from > > http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations > > >>> numpy.fromiter(itertools.combinations(range(1,10),3), int, count=-1) > Traceback (most recent call last): > File "", line 1, in > ValueError: setting an array element with a sequence. > >>> > > and this from > > http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations > > import numpy > from itertools import * > from numpy import * > > def combinations(iterable, r): > pool = tuple(iterable) > n = len(pool) > for indices in permutations(range(n), r): > if sorted(indices) == list(indices): > yield tuple(pool[i] for i in indices) > > > numpy.fromiter(combinations(range(1,10),3), int, count=-1) > > >>> numpy.fromiter(combinations(range(1,10),3), int, count=-1) > Traceback (most recent call last): > File "", line 1, in > ValueError: setting an array element with a sequence. > >>> > > > I like itertools.combinations performance but I need convert it to numpy. > > The docstring for numpy.fromiter() says it creates a 1D array. You can use it with itertools.combinations if you specify a dtype for a 1D structured array. 
Here's an example (I'm using ipython with the -pylab option, so the numpy functions have all been imported): In [1]: from itertools import combinations In [2]: dt = dtype('i,i,i') In [3]: a = fromiter(combinations(range(100),3), dtype=dt, count=-1) In [4]: b = array(list(combinations(range(100),3))) In [5]: all(a.view(int).reshape(-1,3) == b) Out[5]: True In [6]: timeit a = fromiter(combinations(range(100),3), dtype=dt, count=-1) 10 loops, best of 3: 92.7 ms per loop In [7]: timeit b = array(list(combinations(range(100),3))) 1 loops, best of 3: 627 ms per loop In [8]: a[:3] Out[8]: array([(0, 1, 2), (0, 1, 3), (0, 1, 4)], dtype=[('f0', ' From fabian.pedregosa at inria.fr Fri Dec 3 10:24:49 2010 From: fabian.pedregosa at inria.fr (Fabian Pedregosa) Date: Fri, 3 Dec 2010 16:24:49 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports Message-ID: Hi all. Macports installs gfortran as part of the gcc package, but names it gfortran-mp-$version, without providing a symbolic link to a default gcfortran executable, and thus numpy.distutils is unable to find the right executable. The attached patch very simple, it just extends possible_executables with those names, but makes the build of scipy work without having to restore to obscure fc_config flags. Fabian. -------------- next part -------------- A non-text attachment was scrubbed... Name: 0001-FIX-recognize-macports-gfortran-compiler.patch Type: application/octet-stream Size: 1165 bytes Desc: not available URL: From charlesr.harris at gmail.com Fri Dec 3 11:00:45 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Dec 2010 09:00:45 -0700 Subject: [Numpy-discussion] numpy.test() Program received signal SIGABRT, Aborted. In-Reply-To: References: Message-ID: On Fri, Dec 3, 2010 at 12:56 AM, Nils Wagner wrote: > On Fri, 03 Dec 2010 08:47:32 +0100 > "Nils Wagner" wrote: > > On Fri, 3 Dec 2010 00:42:16 -0700 > > Charles R Harris wrote: > >> On Fri, Dec 3, 2010 at 12:29 AM, Nils Wagner > >> wrote: > >> > >>> Hi all, > >>> > >>> I have installed the latest version of numpy. > >>> > >>> >>> numpy.__version__ > >>> '2.0.0.dev-6aacc2d' > >>> > >>> > >> > >> I don't see that here or on the buildbots. There was a > >>problem with > >> segfaults that was fixed in commit > >> c0e1c0000f27b55dfd5a< > https://github.com/numpy/numpy/commit/c0e1c0000f27b55dfd5aa4b1674a8c1b6ac38c36 > >Can > >> you check that your installation is clean, etc. Also, > >>what platform > >> are > >> you running on? > > > > I have removed the build directory. > > Is it also neccessary to remove numpy in thr > >installation > > directory ? > > > > /data/home/nwagner/local/lib/python2.5/site-packages/ > > > > Platform > > > > 2.6.18-92.el5 #1 SMP Tue Jun 10 18:51:06 EDT 2008 x86_64 > > x86_64 x86_64 GNU/Linux > > > > > > Nils > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > I have also removed the numpy directory within > /data/home/nwagner/local/lib/python2.5/site-packages/. > Now all tests pass. > Ran 3080 tests in 12.288s > > OK (KNOWNFAIL=4, SKIP=1) > > > Great. > How is the build process implemented on the build bots ? > > I don't know the details, but it looks to me like they do a clean checkout and fresh install. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From charlesr.harris at gmail.com Fri Dec 3 11:02:21 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 3 Dec 2010 09:02:21 -0700 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: Message-ID: Hi Fabian, On Fri, Dec 3, 2010 at 8:24 AM, Fabian Pedregosa wrote: > Hi all. > > Macports installs gfortran as part of the gcc package, but names it > gfortran-mp-$version, without providing a symbolic link to a default > gcfortran executable, and thus numpy.distutils is unable to find the > right executable. > > The attached patch very simple, it just extends possible_executables > with those names, but makes the build of scipy work without having to > restore to obscure fc_config flags. > > Can you open a ticket for this so it doesn't get lost? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From moura.mario at gmail.com Fri Dec 3 11:14:04 2010 From: moura.mario at gmail.com (Mario Moura) Date: Fri, 3 Dec 2010 14:14:04 -0200 Subject: [Numpy-discussion] itertools.combinations to numpy In-Reply-To: References: Message-ID: Hi Mr. Weckesser Thanks a lot! Works fine! Regards Mario 2010/12/3 Warren Weckesser : > > > On Fri, Dec 3, 2010 at 6:31 AM, Mario Moura wrote: >> >> Hi Folks >> >> I have this situation >> >> >>> from timeit import Timer >> >>> reps = 5 >> >>> >> >>> t = Timer('itertools.combinations(range(1,10),3)', 'import itertools') >> >>> print sum(t.repeat(repeat=reps, number=1)) / reps >> 1.59740447998e-05 >> >>> t = Timer('itertools.combinations(range(1,100),3)', 'import >> >>> itertools') >> >>> print sum(t.repeat(repeat=reps, number=1)) / reps >> 1.74999237061e-05 >> >>> >> >>> t = Timer('list(itertools.combinations(range(1,10),3))', 'import >> >>> itertools') >> >>> print sum(t.repeat(repeat=reps, number=1)) / reps >> 5.31673431396e-05 >> >>> t = Timer('list(itertools.combinations(range(1,100),3))', 'import >> >>> itertools') >> >>> print sum(t.repeat(repeat=reps, number=1)) / reps >> 0.0556231498718 >> >>> >> >> You can see list(itertools.combinations(range(1,100),3)) is terrible!! >> >> If you change to range(1,100000) your computer will lock. >> >> So I would like to know a good way to convert > object> to ndarray? fast! without use list >> Is it possible? >> >> >>> x = itertools.combinations(range(1,10),3) >> >>> x >> >> >>> >> >> I tried this from >> >> http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations >> >> >>> numpy.fromiter(itertools.combinations(range(1,10),3), int, count=-1) >> Traceback (most recent call last): >> ?File "", line 1, in >> ValueError: setting an array element with a sequence. >> >>> >> >> and this from >> >> http://docs.python.org/library/itertools.html?highlight=itertools#itertools.combinations >> >> import numpy >> from itertools import * >> from numpy import * >> >> def combinations(iterable, r): >> ? ?pool = tuple(iterable) >> ? ?n = len(pool) >> ? ?for indices in permutations(range(n), r): >> ? ? ? ?if sorted(indices) == list(indices): >> ? ? ? ? ? ?yield tuple(pool[i] for i in indices) >> >> >> numpy.fromiter(combinations(range(1,10),3), int, count=-1) >> >> >>> numpy.fromiter(combinations(range(1,10),3), int, count=-1) >> Traceback (most recent call last): >> ?File "", line 1, in >> ValueError: setting an array element with a sequence. >> >>> >> >> >> I like itertools.combinations performance but I need convert it to numpy. >> > > > The docstring for numpy.fromiter() says it creates a 1D array.? 
You can use > it with itertools.combinations if you specify a dtype for a 1D? structured > array.? Here's an example (I'm using ipython with the -pylab option, so the > numpy functions have all been imported): > > > In [1]: from itertools import combinations > > In [2]: dt = dtype('i,i,i') > > In [3]: a = fromiter(combinations(range(100),3), dtype=dt, count=-1) > > In [4]: b = array(list(combinations(range(100),3))) > > In [5]: all(a.view(int).reshape(-1,3) == b) > Out[5]: True > > In [6]: timeit a = fromiter(combinations(range(100),3), dtype=dt, count=-1) > 10 loops, best of 3: 92.7 ms per loop > > In [7]: timeit b = array(list(combinations(range(100),3))) > 1 loops, best of 3: 627 ms per loop > > In [8]: a[:3] > Out[8]: > array([(0, 1, 2), (0, 1, 3), (0, 1, 4)], > ????? dtype=[('f0', ' > In [9]: b[:3] > Out[9]: > array([[0, 1, 2], > ?????? [0, 1, 3], > ?????? [0, 1, 4]]) > > > In the above example, 'a' is a 1D structured array; each element of 'a' > holds one of the combinations.? If you need it, you can create a 2D view > with a.view(int).reshape(-1,3). > > Warren > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From mwwiebe at gmail.com Fri Dec 3 12:23:15 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Fri, 3 Dec 2010 09:23:15 -0800 Subject: [Numpy-discussion] Float16 and PEP 3118 In-Reply-To: References: Message-ID: On Thu, Dec 2, 2010 at 10:16 AM, Pauli Virtanen wrote: > Before introducing a PEP 3118 type code for half floats in the PEP, one > would need to argue the Python people to add it to the struct module. > > Before that, the choices probably are: > > - refuse to export buffers containing half floats > I think this is the better option, code that needs to do this can create an int16 view for the time being. > - export half floats as two bytes > This would throw away the byte-order, a problem much harder to track down for the user than the other option. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri Dec 3 12:50:35 2010 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 3 Dec 2010 17:50:35 +0000 (UTC) Subject: [Numpy-discussion] Float16 and PEP 3118 References: Message-ID: Fri, 03 Dec 2010 09:23:15 -0800, Mark Wiebe wrote: [clip] >> - refuse to export buffers containing half floats > > I think this is the better option, code that needs to do this can create > an int16 view for the time being. That's also easier to implement -- no changes are needed :) Pauli From paul.anton.letnes at gmail.com Sat Dec 4 04:00:42 2010 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sat, 4 Dec 2010 10:00:42 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: Message-ID: On 3. des. 2010, at 16.24, Fabian Pedregosa wrote: > Hi all. > > Macports installs gfortran as part of the gcc package, but names it > gfortran-mp-$version, without providing a symbolic link to a default > gcfortran executable, and thus numpy.distutils is unable to find the > right executable. > > The attached patch very simple, it just extends possible_executables > with those names, but makes the build of scipy work without having to > restore to obscure fc_config flags. > > Fabian. 
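(A rough illustration for readers who do not have the attachment: numpy.distutils compiler classes keep a possible_executables list of candidate program names, and the patch being described simply adds the MacPorts-style names to that list. The file and the exact version names below are illustrative guesses, not the contents of the actual patch.)

    # Illustrative sketch only -- not the attached patch.
    # E.g. in numpy/distutils/fcompiler/gnu.py, extend the candidate
    # names so MacPorts' gfortran-mp-<version> binaries are also probed:
    possible_executables = ['gfortran', 'f95']
    possible_executables += ['gfortran-mp-4.3', 'gfortran-mp-4.4',
                             'gfortran-mp-4.5']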
> <0001-FIX-recognize-macports-gfortran-compiler.patch>_______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Correct me if I am wrong here: If you run "(sudo) gcc_select gfortran-mp-XY", where XY are the version numbers (e.g. 45 for gfortran 4.5), you should get symbolic links for the selected gcc/gfortran version. I believe that macports should probably make this clearer, and perhaps automatically when you do a "port install gccXY", but I am not sure if this needs any patching? Again, I might be wrong on this. Cheers Paul. From fabian.pedregosa at inria.fr Sat Dec 4 04:25:52 2010 From: fabian.pedregosa at inria.fr (Fabian Pedregosa) Date: Sat, 4 Dec 2010 10:25:52 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> Message-ID: > > Correct me if I am wrong here: If you run "(sudo) gcc_select gfortran-mp-XY", where XY are the version numbers (e.g. 45 for gfortran 4.5), you should get symbolic links for the selected gcc/gfortran version. I believe that macports should probably make this clearer, and perhaps automatically when you do a "port install gccXY", but I am not sure if this needs any patching? Again, I might be wrong on this. Thanks! I didn't know about gcc_select. The correct command is "sudo gcc_select mp-gcc45" which effectively does all the symbolic links for you and works like a charm, so please ignore my previous patch. Cheers, fabian From gael.varoquaux at normalesup.org Sat Dec 4 04:29:11 2010 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Sat, 4 Dec 2010 10:29:11 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> Message-ID: <20101204092911.GB30391@phare.normalesup.org> On Sat, Dec 04, 2010 at 10:25:52AM +0100, Fabian Pedregosa wrote: > The correct command is "sudo gcc_select mp-gcc45" which effectively > does all the symbolic links for you and works like a charm, so please > ignore my previous patch. I am not a mac user, so I guess that my opinion is not very educated, but isn't your patch still useful: test if 'gcc' exists, and if not fallback to your patch, so that it still works for the clueless user? My 2 cents, Ga?l From fabian.pedregosa at inria.fr Sat Dec 4 08:47:48 2010 From: fabian.pedregosa at inria.fr (Fabian Pedregosa) Date: Sat, 4 Dec 2010 14:47:48 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> Message-ID: On Sat, Dec 4, 2010 at 10:29 AM, Gael Varoquaux wrote: > On Sat, Dec 04, 2010 at 10:25:52AM +0100, Fabian Pedregosa wrote: >> The correct command is "sudo gcc_select mp-gcc45" which effectively >> does all the symbolic links for you and works like a charm, so please >> ignore my previous patch. > > I am not a mac user, so I guess that my opinion is not very educated, but > isn't your patch still useful: test if 'gcc' exists, and if not fallback > to your patch, so that it still works for the clueless user? 
Indeed, having scipy build out of the box would be nice, but it's not for me to decide if numpy.distutils should overcome these limitations in macports ... On the other hand, as installing on macports is not that trivial, I strongly feel that a subsection 'Macports' should be added to scipy's INSTALL.txt file, where it details needed packages, the gcc_select trick and options needed in site.cfg for umfpack. I'll gladly provide a patch for that if people are OK. Fabian. > > My 2 cents, > > Ga?l > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ralf.gommers at googlemail.com Sat Dec 4 09:04:16 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sat, 4 Dec 2010 22:04:16 +0800 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> Message-ID: On Sat, Dec 4, 2010 at 9:47 PM, Fabian Pedregosa wrote: > On Sat, Dec 4, 2010 at 10:29 AM, Gael Varoquaux > wrote: > > On Sat, Dec 04, 2010 at 10:25:52AM +0100, Fabian Pedregosa wrote: > >> The correct command is "sudo gcc_select mp-gcc45" which effectively > >> does all the symbolic links for you and works like a charm, so please > >> ignore my previous patch. > > > > I am not a mac user, so I guess that my opinion is not very educated, but > > isn't your patch still useful: test if 'gcc' exists, and if not fallback > > to your patch, so that it still works for the clueless user? > > Indeed, having scipy build out of the box would be nice, but it's not > for me to decide if numpy.distutils should overcome these limitations > in macports ... > I would prefer to just document the gcc_select solution, since it solves the problem at hand. > > On the other hand, as installing on macports is not that trivial, I > strongly feel that a subsection 'Macports' should be added to scipy's > INSTALL.txt file, where it details needed packages, the gcc_select > trick and options needed in site.cfg for umfpack. I'll gladly provide > a patch for that if people are OK. > The most up-to-date instructions are at http://www.scipy.org/Installing_SciPy/Mac_OS_X, so those should be updated as well. That said, if a "Macports" section is added there should be a strong disclaimer that it is *not* the recommended way to install numpy/scipy. A good portion of the build problems reported on these lists are related to Fortran on OS X, and a specific gfortran build at http://r.research.att.com/tools/ is recommended for a reason. If a user really wants to use Macports some notes in the docs may help, but let's not give the impression that it's a good/default option for a new user. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Dec 4 11:20:37 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Dec 2010 09:20:37 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. Message-ID: Hi Jason, Just wondering if this is temporary or the intention is to change the build process? I also note that the *.h files in libndarray are not complete and a *lot* of trailing whitespace has crept into the files. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: 

From garyfallidis at gmail.com Sat Dec 4 14:00:43 2010
From: garyfallidis at gmail.com (Eleftherios Garyfallidis)
Date: Sat, 4 Dec 2010 19:00:43 +0000
Subject: [Numpy-discussion] Faster than ndindex?
Message-ID: 

Hi guys,

I would like to know if there is any way to make the following operation faster.

def test():
    shape=(200,200,200,3)
    refinds = np.ndindex(shape[:3])
    reftmp=np.zeros(shape)
    for ijk_t in refinds:
        i,j,k = ijk_t
        reftmp[i,j,k,0]=i
        reftmp[i,j,k,1]=j
        reftmp[i,j,k,2]=k

%timeit test()
1 loops, best of 3: 19.5 s per loop

I am using ndindex and then a for loop. Is there a better/faster way?

Thank you,
Eleftherios
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From ischnell at enthought.com Sat Dec 4 14:07:54 2010
From: ischnell at enthought.com (Ilan Schnell)
Date: Sat, 4 Dec 2010 13:07:54 -0600
Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process.
In-Reply-To: 
References: 
Message-ID: 

Hello Charles,

it was indeed the intention to change the build process of the core
libndarray to use autoconf. I've tested it on Linux, Mac, Solaris, and
it works very well. libndarray is really a separate project, which only
resides for current development inside the numpy project.
The point > is that you can build libndarray without having a particular Python > installed. The hope is that libndarray becomes used by other projects > which are not Python based, for example: > * a pure C program > * a Perl C extension > * a Ruby C extension > > I thought that autoconf was the obvious choice for doing this, and also > that is cleaner than numpy.distutils. > > So does numpy currently build on top of libndarray or is that something for the future also? It would also be useful for David C. to offer his thoughts on building/packaging, did you consult with him by any chance? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sat Dec 4 14:52:50 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 4 Dec 2010 19:52:50 +0000 (UTC) Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. References: Message-ID: On Sat, 04 Dec 2010 12:21:15 -0700, Charles R Harris wrote: [clip] > So does numpy currently build on top of libndarray or is that something > for the future also? [clip] It does. If you look how it works, most of the heavy lifting has been moved there, leaving the multiarray module mostly as Python-specific wrappers. -- Pauli Virtanen From ischnell at enthought.com Sat Dec 4 14:59:12 2010 From: ischnell at enthought.com (Ilan Schnell) Date: Sat, 4 Dec 2010 13:59:12 -0600 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: Yes, numpy-refactor builds of top of libndarray. The whole point was that the libndarray is independent of the interface, i.e. the CPython or the IronPython interface, and possibly other (Jython) in the future. Looking at different building/packaging solutions for libndarray, autoconf make things very easy, it's a well established pattern, I'm sure David C. will agree. - Ilan On Sat, Dec 4, 2010 at 1:21 PM, Charles R Harris wrote: > > So does numpy currently build on top of libndarray or is that something for > the future also? It would also be useful for David C. to offer his thoughts > on building/packaging, did you consult with him by any chance? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From garyfallidis at gmail.com Sat Dec 4 15:05:02 2010 From: garyfallidis at gmail.com (Eleftherios Garyfallidis) Date: Sat, 4 Dec 2010 20:05:02 +0000 Subject: [Numpy-discussion] Faster than ndindex? In-Reply-To: References: Message-ID: This is beautiful! Thank you Pauli. On Sat, Dec 4, 2010 at 7:16 PM, Pauli Virtanen wrote: > On Sat, 04 Dec 2010 19:00:43 +0000, Eleftherios Garyfallidis wrote: > [clip] > > I am using ndindex and then a for loop. Is there a better/faster way? > > Yes: > > import numpy as np > from numpy import newaxis > > x = np.zeros((200, 200, 200, 3)) > x[...,0] = np.arange(200)[:,newaxis,newaxis] > x[...,1] = np.arange(200)[newaxis,:,newaxis] > x[...,2] = np.arange(200)[newaxis,newaxis,:] > x[1,3,2] > # -> array([ 1., 3., 2.]) > > Depending on what you use this array for, it's possible that you can > avoid constructing it (and use broadcasting etc. instead). > > -- > Pauli Virtanen > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
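(A small aside, not from the original exchange: the same index array can also be built in one step with np.indices, which may be convenient when all three coordinate planes are needed at once.)

    import numpy as np

    # np.indices((200, 200, 200)) has shape (3, 200, 200, 200) with
    # idx[0][i,j,k] == i, idx[1][i,j,k] == j, idx[2][i,j,k] == k;
    # moving the first axis to the end reproduces the array above.
    idx = np.indices((200, 200, 200))
    x = np.rollaxis(idx, 0, 4).astype(float)   # shape (200, 200, 200, 3)
    # x[1, 3, 2] -> array([ 1.,  3.,  2.])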
URL: From charlesr.harris at gmail.com Sat Dec 4 15:11:48 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Dec 2010 13:11:48 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Sat, Dec 4, 2010 at 12:59 PM, Ilan Schnell wrote: > Yes, numpy-refactor builds of top of libndarray. The whole point > was that the libndarray is independent of the interface, i.e. the > CPython or the IronPython interface, and possibly other (Jython) > in the future. > Looking at different building/packaging solutions for libndarray, > autoconf make things very easy, it's a well established pattern, > I'm sure David C. will agree. > > I know he has expressed reservations about it on non-posix platforms and some large projects have moved away from it. I'm not saying it isn't the best short term solution so you folks can get on with the job, but it may be that long term we will want to look elsewhere. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Dec 4 15:19:11 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Dec 2010 13:19:11 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Sat, Dec 4, 2010 at 12:52 PM, Pauli Virtanen wrote: > On Sat, 04 Dec 2010 12:21:15 -0700, Charles R Harris wrote: > [clip] > > So does numpy currently build on top of libndarray or is that something > > for the future also? > [clip] > > It does. If you look how it works, most of the heavy lifting has been > moved there, leaving the multiarray module mostly as Python-specific > wrappers. > > Would it unreasonable to move the libndarray stuff to the current master branch of numpy while leaving the rest of things intact? The needed changes to the current core/src could be brought in later. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ischnell at enthought.com Sat Dec 4 15:24:49 2010 From: ischnell at enthought.com (Ilan Schnell) Date: Sat, 4 Dec 2010 14:24:49 -0600 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: I'm not sure how reasonable it would be to move only libndarray into the master, because I've been working on EPD for the last couple of week. But Jason will know how complete libndarray is. - Ilan On Sat, Dec 4, 2010 at 2:19 PM, Charles R Harris wrote: > Would it unreasonable to move the libndarray stuff to the current master > branch of numpy while leaving the rest of things intact? The needed changes > to the current core/src could be brought in later. > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pav at iki.fi Sat Dec 4 15:45:59 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 4 Dec 2010 20:45:59 +0000 (UTC) Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. References: Message-ID: On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote: > I'm not sure how reasonable it would be to move only libndarray into the > master, because I've been working on EPD for the last couple of week. > But Jason will know how complete libndarray is. The main question is whether moving it will make things easier or more difficult, I think. 
It's one tree more to keep track of. In any case, it would be a first part in the merge, and it would split the hunk of changes into two parts. *** Technically, the move could be done like this, so that merge tracking still works: --------refactor--------------- new-refactor / / /--------libndarray----------x / \ start---------------------- master----- new-master -- Pauli Virtanen From dagss at student.matnat.uio.no Sat Dec 4 15:57:31 2010 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sat, 04 Dec 2010 21:57:31 +0100 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: <4CFAAB3B.3060509@student.matnat.uio.no> On 12/04/2010 09:11 PM, Charles R Harris wrote: > > > On Sat, Dec 4, 2010 at 12:59 PM, Ilan Schnell > wrote: > > Yes, numpy-refactor builds of top of libndarray. The whole point > was that the libndarray is independent of the interface, i.e. the > CPython or the IronPython interface, and possibly other (Jython) > in the future. > Looking at different building/packaging solutions for libndarray, > autoconf make things very easy, it's a well established pattern, > I'm sure David C. will agree. > > > > I know he has expressed reservations about it on non-posix platforms > and some large projects have moved away from it. I'm not saying it > isn't the best short term solution so you folks can get on with the > job, but it may be that long term we will want to look elsewhere. Such as perhaps waf for building libndarray, which seems like it will be much easier to make work nicely with Bento etc. than autoconf (again, speaking long-term). Also, it'd be good to avoid a seperate build system for Windows (problem of keeping changes sync-ed with Visual Studio projects etc. etc.). Dag Sverre -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat Dec 4 16:01:03 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 4 Dec 2010 14:01:03 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Sat, Dec 4, 2010 at 1:45 PM, Pauli Virtanen wrote: > On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote: > > I'm not sure how reasonable it would be to move only libndarray into the > > master, because I've been working on EPD for the last couple of week. > > But Jason will know how complete libndarray is. > > The main question is whether moving it will make things easier or more > difficult, I think. It's one tree more to keep track of. > > In any case, it would be a first part in the merge, and it would split > the hunk of changes into two parts. > > That would be a good thing IMHO. It would also bring a bit more numpy reality to the refactor and since we are implicitly relying on it for the next release sometime next spring the closer to reality it gets the better. > *** > > Technically, the move could be done like this, so that merge tracking > still works: > > --------refactor--------------- new-refactor > / / > /--------libndarray----------x > / \ > start---------------------- master----- new-master > > Looks good to me. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Sat Dec 4 19:41:17 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Sat, 4 Dec 2010 16:41:17 -0800 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. 
In-Reply-To: References: Message-ID: On Sat, Dec 4, 2010 at 12:45 PM, Pauli Virtanen wrote: > > Technically, the move could be done like this, so that merge tracking > still works: > > --------refactor--------------- new-refactor > / / > /--------libndarray----------x > / \ > start---------------------- master----- new-master > Switching to use libndarray is a big ABI+API change, right? If there's an idea to release an ABI-compatible 1.6, wouldn't this end up being more difficult? Maybe I'm misunderstanding this idea. I looked a little bit at the 1.4.0 ABI issue, and if the only blocking problem was the cast[] array in ArrFuncs, I think that can be worked around without too much difficulty. Would people want an ABI-compatible 1.6 release adding date-time and float16? -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From paul.anton.letnes at gmail.com Sun Dec 5 02:58:57 2010 From: paul.anton.letnes at gmail.com (Paul Anton Letnes) Date: Sun, 5 Dec 2010 08:58:57 +0100 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> Message-ID: Mabe I am wrong somehow, but in my experience the easiest install of scipy is 'port install py26-scipy'. For new users, I do not see why one would recommend to build manually from source? Macports can do it for you, automagically... Paul 4. des.. 2010 15.04 "Ralf Gommers" : On Sat, Dec 4, 2010 at 9:47 PM, Fabian Pedregosa wrote: > > On Sat, Dec ... I would prefer to just document the gcc_select solution, since it solves the problem at hand. > > > On the other hand, as installing on macports is not that trivial, I > strongly feel that a sub... The most up-to-date instructions are at http://www.scipy.org/Installing_SciPy/Mac_OS_X, so those should be updated as well. That said, if a "Macports" section is added there should be a strong disclaimer that it is *not* the recommended way to install numpy/scipy. A good portion of the build problems reported on these lists are related to Fortran on OS X, and a specific gfortran build at http://r.research.att.com/tools/ is recommended for a reason. If a user really wants to use Macports some notes in the docs may help, but let's not give the impression that it's a good/default option for a new user. Cheers, Ralf _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Dec 5 06:28:44 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Dec 2010 19:28:44 +0800 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> Message-ID: On Sun, Dec 5, 2010 at 3:58 PM, Paul Anton Letnes < paul.anton.letnes at gmail.com> wrote: > Mabe I am wrong somehow, but in my experience the easiest install of scipy > is 'port install py26-scipy'. For new users, I do not see why one would > recommend to build manually from source? Macports can do it for you, > automagically... > > Well, by far the easiest method is to just grab a binary installer. The other choices you have are build from source, or try to use Macports/Fink/Homebrew/easy_install/pip/buildout-recipe/. 
Those all rely on source builds as well, they're just hiding the details. Which makes things way more confusing when something goes wrong. About Macports specifically, I haven't tried in a few years but certainly don't remember things always working out of the box. And AFAIK Homebrew is a replacement for Macports for many people because the latter was issues. Cheers, Ralf Paul > > 4. des.. 2010 15.04 "Ralf Gommers" : > > > > On Sat, Dec 4, 2010 at 9:47 PM, Fabian Pedregosa < > fabian.pedregosa at inria.fr> wrote: > > > > On Sat, Dec ... > > I would prefer to just document the gcc_select solution, since it solves > the problem at hand. > > > > > > > On the other hand, as installing on macports is not that trivial, I > > strongly feel that a sub... > > The most up-to-date instructions are at > http://www.scipy.org/Installing_SciPy/Mac_OS_X, so those should be updated > as well. That said, if a "Macports" section is added there should be a > strong disclaimer that it is *not* the recommended way to install > numpy/scipy. A good portion of the build problems reported on these lists > are related to Fortran on OS X, and a specific gfortran build at > http://r.research.att.com/tools/ is recommended for a reason. If a user > really wants to use Macports some notes in the docs may help, but let's not > give the impression that it's a good/default option for a new user. > > Cheers, > Ralf > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun Dec 5 07:10:56 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Dec 2010 20:10:56 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 In-Reply-To: <4CEAC5EF.1030205@noaa.gov> References: <4CEAC5EF.1030205@noaa.gov> Message-ID: On Tue, Nov 23, 2010 at 3:35 AM, Christopher Barker wrote: > On 11/20/10 11:04 PM, Ralf Gommers wrote: > >> I am pleased to announce the availability of NumPy 1.5.1. >> > > Binaries, sources and release notes can be found at >> https://sourceforge.net/projects/numpy/files/. >> >> Thank you to everyone who contributed to this release. >> > > Yes, thanks so much -- in particular thanks to the team that build the OS-X > binaries -- looks like a complete set! > > It does look like a complete set. And it was named correctly and in sync with python.org for a single week. From pythonmac list: "With Python 2.7, there are two Mac OS X installer variants available for download: the "traditional" 32-bit-only (Intel and PPC) version that installs and runs on all versions of OS X from 10.3.9 through current 10.6.x; and a new 64-bit/32-bit (Intel only) variant. As discussed in http://bugs.python.org/issue9227, there were problems using Tkinter and IDLE with the original 2.7 64/32 installer. The problem is that the only supported non-X11 64-bit Tcl/Tk at the moment is the one supplied by Apple in 10.6 and the installer tried unsuccessfully to support both 10.5 and 10.6. For 2.7.1, the 64/32 installer now only supports 10.6.x and will only use the Apple-supplied Tcl/Tk 8.5. 
The 32-bit-only installer is still built to link with either an Active/State Tcl/Tk 8.4, if installed in /Library/Frameworks, or fallback to the Apple-supplied Tcl/Tk 8.4 in OS X 10.4 through 10.6." So for the next release we'll build the 10.6 binary on 10.6 again. Sigh. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From friedrichromstedt at gmail.com Sun Dec 5 07:33:22 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Sun, 5 Dec 2010 13:33:22 +0100 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 In-Reply-To: References: <4CEAC5EF.1030205@noaa.gov> Message-ID: Hi Ralf, 2010/12/5 Ralf Gommers : > It does look like a complete set. And it was named correctly and in sync > with python.org for a single week. From pythonmac list: > > "With Python 2.7, there are two Mac OS X installer variants available for > download: the "traditional" 32-bit-only (Intel and PPC) version that > installs and runs on all versions of OS X from 10.3.9 through current > 10.6.x; and a new 64-bit/32-bit (Intel only) variant. ?As discussed in > http://bugs.python.org/issue9227, there were problems using Tkinter and > IDLE with the original 2.7 64/32 installer. ?The problem is that the > only supported non-X11 64-bit Tcl/Tk at the moment is the one supplied > by Apple in 10.6 and the installer tried unsuccessfully to support both > 10.5 and 10.6. ?For 2.7.1, the 64/32 installer now only supports 10.6.x > and will only use the Apple-supplied Tcl/Tk 8.5. ?The 32-bit-only > installer is still built to link with either an Active/State Tcl/Tk 8.4, > if installed in /Library/Frameworks, or fallback to the Apple-supplied > Tcl/Tk 8.4 in OS X 10.4 through 10.6." > > So for the next release we'll build the 10.6 binary on 10.6 again. Sigh. But the i386/ppc version should still be built on 10.5? Shall I give you commit rights on my repo for the build logs (concerning the i386/x86_64 10.6 build)? Who is going to do the 10.6 builds? Friedrich From ralf.gommers at googlemail.com Sun Dec 5 09:34:27 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 5 Dec 2010 22:34:27 +0800 Subject: [Numpy-discussion] ANN: NumPy 1.5.1 In-Reply-To: References: <4CEAC5EF.1030205@noaa.gov> Message-ID: On Sun, Dec 5, 2010 at 8:33 PM, Friedrich Romstedt < friedrichromstedt at gmail.com> wrote: > Hi Ralf, > > 2010/12/5 Ralf Gommers : > > It does look like a complete set. And it was named correctly and in sync > > with python.org for a single week. From pythonmac list: > > > > "With Python 2.7, there are two Mac OS X installer variants available for > > download: the "traditional" 32-bit-only (Intel and PPC) version that > > installs and runs on all versions of OS X from 10.3.9 through current > > 10.6.x; and a new 64-bit/32-bit (Intel only) variant. As discussed in > > http://bugs.python.org/issue9227, there were problems using Tkinter and > > IDLE with the original 2.7 64/32 installer. The problem is that the > > only supported non-X11 64-bit Tcl/Tk at the moment is the one supplied > > by Apple in 10.6 and the installer tried unsuccessfully to support both > > 10.5 and 10.6. For 2.7.1, the 64/32 installer now only supports 10.6.x > > and will only use the Apple-supplied Tcl/Tk 8.5. The 32-bit-only > > installer is still built to link with either an Active/State Tcl/Tk 8.4, > > if installed in /Library/Frameworks, or fallback to the Apple-supplied > > Tcl/Tk 8.4 in OS X 10.4 through 10.6." > > > > So for the next release we'll build the 10.6 binary on 10.6 again. Sigh. 
> > But the i386/ppc version should still be built on 10.5? > > Yes. > Shall I give you commit rights on my repo for the build logs > (concerning the i386/x86_64 10.6 build)? > Sure. > > Who is going to do the 10.6 builds? > I'll do that one. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From ben.root at ou.edu Sun Dec 5 10:40:56 2010 From: ben.root at ou.edu (Benjamin Root) Date: Sun, 5 Dec 2010 09:40:56 -0600 Subject: [Numpy-discussion] [PATCH] gfortran under macports In-Reply-To: References: <1135702146.1178800.1291453261307.JavaMail.root@zmbs3.inria.fr> <374052771.1189935.1291454962784.JavaMail.root@zmbs3.inria.fr> Message-ID: On Sun, Dec 5, 2010 at 5:28 AM, Ralf Gommers wrote: > > > On Sun, Dec 5, 2010 at 3:58 PM, Paul Anton Letnes < > paul.anton.letnes at gmail.com> wrote: > >> Mabe I am wrong somehow, but in my experience the easiest install of scipy >> is 'port install py26-scipy'. For new users, I do not see why one would >> recommend to build manually from source? Macports can do it for you, >> automagically... >> >> Well, by far the easiest method is to just grab a binary installer. The > other choices you have are build from source, or try to use > Macports/Fink/Homebrew/easy_install/pip/buildout-recipe/. > Those all rely on source builds as well, they're just hiding the details. > Which makes things way more confusing when something goes wrong. > > About Macports specifically, I haven't tried in a few years but certainly > don't remember things always working out of the box. And AFAIK Homebrew is a > replacement for Macports for many people because the latter was issues. > > Cheers, > Ralf > > I did a Macports install of numpy/scipy/matplotlib on my wife's macbook a few months ago just because I was curious. Besides the fact that it took forever (it had trouble obtaining the various compilers from the servers, and it did a full-blown ATLAS tuning and compiling...) it did eventually install and work. YMMV, Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Sun Dec 5 20:03:05 2010 From: david at silveregg.co.jp (David) Date: Mon, 06 Dec 2010 10:03:05 +0900 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: <4CFAAB3B.3060509@student.matnat.uio.no> References: <4CFAAB3B.3060509@student.matnat.uio.no> Message-ID: <4CFC3649.2000203@silveregg.co.jp> On 12/05/2010 05:57 AM, Dag Sverre Seljebotn wrote: > On 12/04/2010 09:11 PM, Charles R Harris wrote: >> >> >> On Sat, Dec 4, 2010 at 12:59 PM, Ilan Schnell > > wrote: >> >> Yes, numpy-refactor builds of top of libndarray. The whole point >> was that the libndarray is independent of the interface, i.e. the >> CPython or the IronPython interface, and possibly other (Jython) >> in the future. >> Looking at different building/packaging solutions for libndarray, >> autoconf make things very easy, it's a well established pattern, >> I'm sure David C. will agree. >> >> >> >> I know he has expressed reservations about it on non-posix platforms >> and some large projects have moved away from it. I'm not saying it >> isn't the best short term solution so you folks can get on with the >> job, but it may be that long term we will want to look elsewhere. > > Such as perhaps waf for building libndarray, which seems like it will be > much easier to make work nicely with Bento etc. than autoconf (again, > speaking long-term). 
> > Also, it'd be good to avoid a seperate build system for Windows (problem > of keeping changes sync-ed with Visual Studio projects etc. etc.). Is support for visual studio projects a requirement for the refactoring ? If so, the only alternative to keeping changes in sync is to be able to generate the project files from a description, which is not so easy (and quite time consuming). I know of at least two tools doing that: cmake and gpy (the build system used for chrome). cheers, David From tungwaiyip at yahoo.com Sun Dec 5 22:44:25 2010 From: tungwaiyip at yahoo.com (Wai Yip Tung) Date: Sun, 05 Dec 2010 19:44:25 -0800 Subject: [Numpy-discussion] Structured array? recarray? issue access by attribute name Message-ID: I'm trying to use numpy to manipulate CSV file. I'm looking for feature similar to relational database. So I come across a class recarray that seems to meet my need. And then I see other references of structured array. Are these just different name of the same feature? Also I encounter a problem trying to access an field by attribute name. I have In [303]: arr = np.array([ .....: (1, 2.2, 0.0), .....: (3, 4.5, 0.0) .....: ], .....: dtype=[ .....: ('unit',int), .....: ('price',float), .....: ('amount',float), .....: ] .....: ) In [304]: data0 = arr.view(recarray) In [305]: data0.price[0] Out[305]: 2.2000000000000002 It works fine when I get a price vector and pick the first element of it. But if instead I select the first row and try to access its price attribute, it wouldn't work In [306]: data0[0].price --------------------------------------------------------------------------- AttributeError Traceback (most recent call last) c:\Python26\Lib\site-packages\numpy\ in () AttributeError: 'numpy.void' object has no attribute 'price' Then I come across an alternative way to build a recarray. In that case both usage work fine. In [307]: data1 = np.rec.fromarrays( .....: [[1,3],[2.2,4.5],[0.0,0.0]], .....: names='unit,price,amount') In [309]: data1.price[0] Out[309]: 2.2000000000000002 In [310]: data1[0].price Out[310]: 2.2000000000000002 What's going on here? Wai Yip From tungwaiyip at yahoo.com Sun Dec 5 22:56:51 2010 From: tungwaiyip at yahoo.com (Wai Yip Tung) Date: Sun, 05 Dec 2010 19:56:51 -0800 Subject: [Numpy-discussion] Can I add rows and columns to recarray? Message-ID: I'm fairly new to numpy and I'm trying to figure out the right way to do things. Continuing on my question about using recarray as a relation. I have a recarray like this In [339]: arr = np.array([ .....: (1, 2.2, 0.0), .....: (3, 4.5, 0.0) .....: ], .....: dtype=[ .....: ('unit',int), .....: ('price',float), .....: ('amount',float), .....: ] .....: ) In [340]: data = arr.view(recarray) One of the most common thing I want to do is to append rows to data. I think concatenate() might be the method. But I get a problem: In [342]: np.concatenate((data0,[1,9.0,9.0])) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) c:\Python26\Lib\site-packages\numpy\ in () TypeError: expected a readable buffer object The other thing I want to do is to calculate the column value. Right now it can do great thing like In [343]: data.amount = data.unit * data.price But sometimes it may require me to add a new column not already exist, e.g.: In [344]: data.discount_price = data.price * 0.9 How can I add a new column? I tried column_stack. But it give a similar TypeError. I figure I need to first specify the type of the column. But I don't know how. 
Thanks, Wai Yip From jsseabold at gmail.com Sun Dec 5 23:22:04 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 5 Dec 2010 23:22:04 -0500 Subject: [Numpy-discussion] Structured array? recarray? issue access by attribute name In-Reply-To: References: Message-ID: On Sun, Dec 5, 2010 at 10:44 PM, Wai Yip Tung wrote: > I'm trying to use numpy to manipulate CSV file. I'm looking for feature > similar to relational database. So I come across a class recarray that > seems to meet my need. And then I see other references of structured > array. Are these just different name of the same feature? > > Also I encounter a problem trying to access an field by attribute name. I > have > > > In [303]: arr = np.array([ > ? ?.....: ? ? (1, 2.2, 0.0), > ? ?.....: ? ? (3, 4.5, 0.0) > ? ?.....: ? ? ], > ? ?.....: ? ? dtype=[ > ? ?.....: ? ? ? ? ('unit',int), > ? ?.....: ? ? ? ? ('price',float), > ? ?.....: ? ? ? ? ('amount',float), > ? ?.....: ? ? ] > ? ?.....: ) > > In [304]: data0 = arr.view(recarray) > > In [305]: data0.price[0] > Out[305]: 2.2000000000000002 > You don't have to take a view as a recarray if you don't want to. You lose attribute lookup but gain some speed. In [14]: arr['price'] Out[14]: array([ 2.2, 4.5]) > > > It works fine when I get a price vector and pick the first element of it. > But if instead I select the first row and try to access its price > attribute, it wouldn't work > I'm not sure why this doesn't work. It looks like taking a view of the structured array as a recarray does not cast the structs to records Is this a bug? Note that you can do In [19]: arr[0]['price'] Out[19]: 2.2000000000000002 In [20]: data0[0]['price'] Out[20]: 2.2000000000000002 also slicing seems to work In [27]: data0[0:1].price Out[27]: array([ 2.2]) Skipper > > > In [306]: data0[0].price > --------------------------------------------------------------------------- > AttributeError ? ? ? ? ? ? ? ? ? ? ? ? ? ?Traceback (most recent call last) > > c:\Python26\Lib\site-packages\numpy\ in () > > AttributeError: 'numpy.void' object has no attribute 'price' > > > > Then I come across an alternative way to build a recarray. In that case > both usage work fine. > > > > In [307]: data1 = np.rec.fromarrays( > ? ?.....: ? ? [[1,3],[2.2,4.5],[0.0,0.0]], > ? ?.....: ? ? names='unit,price,amount') > > In [309]: data1.price[0] > Out[309]: 2.2000000000000002 > > In [310]: data1[0].price > Out[310]: 2.2000000000000002 > > > What's going on here? > > > Wai Yip > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From jsseabold at gmail.com Sun Dec 5 23:23:51 2010 From: jsseabold at gmail.com (Skipper Seabold) Date: Sun, 5 Dec 2010 23:23:51 -0500 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: References: Message-ID: On Sun, Dec 5, 2010 at 10:56 PM, Wai Yip Tung wrote: > I'm fairly new to numpy and I'm trying to figure out the right way to do > things. Continuing on my question about using recarray as a relation. I > have a recarray like this > > > In [339]: arr = np.array([ > ? ?.....: ? ? (1, 2.2, 0.0), > ? ?.....: ? ? (3, 4.5, 0.0) > ? ?.....: ? ? ], > ? ?.....: ? ? dtype=[ > ? ?.....: ? ? ? ? ('unit',int), > ? ?.....: ? ? ? ? ('price',float), > ? ?.....: ? ? ? ? ('amount',float), > ? ?.....: ? ? ] > ? ?.....: ) > > In [340]: data = arr.view(recarray) > > > One of the most common thing I want to do is to append rows to data. 
?I > think concatenate() might be the method. But I get a problem: > > > In [342]: np.concatenate((data0,[1,9.0,9.0])) > --------------------------------------------------------------------------- > TypeError ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? Traceback (most recent call last) > > c:\Python26\Lib\site-packages\numpy\ in () > > TypeError: expected a readable buffer object > > > > The other thing I want to do is to calculate the column value. Right now > it can do great thing like > > > > In [343]: data.amount = data.unit * data.price > > > > But sometimes it may require me to add a new column not already exist, > e.g.: > > > In [344]: data.discount_price = data.price * 0.9 > > > How can I add a new column? I tried column_stack. But it give a similar > TypeError. I figure I need to first specify the type of the column. But I > don't know how. > Check out numpy.lib.recfunctions I often have import numpy.lib.recfunctions as nprf Skipper From washakie at gmail.com Mon Dec 6 09:21:18 2010 From: washakie at gmail.com (John) Date: Mon, 6 Dec 2010 15:21:18 +0100 Subject: [Numpy-discussion] numpy rec array and sorting Message-ID: Hello, I have been trying two methods for creating a rec array from my data (or a structured array -- I'm still not completely clear on the distinction). In terms of data, you can see what types they are, basically simple (n,1) np.ndarrays. I had to reshape them to (n,1) to get them to work with hstack. The 'NON WORKING' method returns no errors, but when I go to 'sort' the data array that is returned, no sorting takes place, whereas with the 'WORKING' method, I can do: data.sort(order='sza') and my data.indices match the sorted 'sza' data. You can see I also tried to include a 2-d array, but I haven't managed to get this to work... Could someone please explain what is going on here? Thanks, john ## Create recarray so we can easily sort dtype=np.dtype([('indices','int32'),('time','f8'),('zen','f4'),\ ('az','f4'),('sza','f4'),('saz','f4'),('muslope','f4'),\ ('roll','f4'),('pitch','f4'),('yaw','f4')\ #,('spectra',np.ma.core.MaskedArray,c.shape) ]) ### WORKING METHOD values = np.hstack((indices,time,zen,az,sza,saz,musl,roll,pitch,yaw)) data = [[] for dummy in xrange(len(dtype))] for i in xrange(len(dtype)): data[i] = cast[dtype[i]](values[:,i]) data = np.rec.array(data,dtype=dtype) ### NON WORKING METHOD ### values = (indices,time,zen,az,sza,saz,musl,roll,pitch,yaw) data = np.rec.fromarrays(values,dtype=dtype) From Chris.Barker at noaa.gov Mon Dec 6 13:26:59 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 06 Dec 2010 10:26:59 -0800 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: References: Message-ID: <4CFD2AF3.3070205@noaa.gov> On 12/5/10 7:56 PM, Wai Yip Tung wrote: > I'm fairly new to numpy and I'm trying to figure out the right way to do > things. Continuing on my question about using recarray as a relation. note that recarrays (or structured arrays, AFAIK, the difference is atturube access only -- I don't use recarrays) are far more static than a database table. So you may really want to use a database, or maybe pytables. Or maybe even just stick with lists. But if you are keeping things in memory, should be able to do what you want. 
> In [339]: arr = np.array([ > .....: (1, 2.2, 0.0), > .....: (3, 4.5, 0.0) > .....: ], > .....: dtype=[ > .....: ('unit',int), > .....: ('price',float), > .....: ('amount',float), > .....: ] > .....: ) > > In [340]: data = arr.view(recarray) > > > One of the most common thing I want to do is to append rows to data. numpy arrays do not naturally support appending, as you have discovered. > I > think concatenate() might be the method. yes. > But I get a problem: > In [342]: np.concatenate((data0,[1,9.0,9.0])) > --------------------------------------------------------------------------- > TypeError Traceback (most recent call last) > > c:\Python26\Lib\site-packages\numpy\ in() > > TypeError: expected a readable buffer object concatenate expects two arrays to be joined. If you pass in something that can easily be turned into an array, it will work, but a tuple can be converted to multiple types of arrays, so it doesn't know what to do. So you need to re-construct the second array: a2 = np.array( [(3,5.5, 3)], dtype=dt) arr = np.concatenate( (arr, a2) ) > In [343]: data.amount = data.unit * data.price yup > But sometimes it may require me to add a new column not already exist, > e.g.: > > In [344]: data.discount_price = data.price * 0.9 > > > How can I add a new column? you can't. what you need to do is create a new array with a new dtype that includes the new field. The trick is that numpy only supports homogenous arrays -- evey item is the same data type. So when you could a strut array like above, numpy does not define it as a 2-d table, but rather, a 1-d array, each element of which is a structure. so you need to do something like: # create a new array data2 = np.zeros(len(data), dtype=dt2) # fill the array: for field_name in dt.fields.keys(): data2[field_name] = data[field_name] # now some calculations: data2['discount_price'] = data2['price'] * 0.9 I don't know of a way to avoid that loop when filling the array. Better yet -- anticipate your needs and create the array with all the fields you need in the first place. You can see that ndarrays are pretty static -- struct arrays can be useful data storage, but are not very suitable when things are changing much. You could write a class that wraps an andarray, and supports what you need better -- it could be a pretty usefull general purpose class, too. I've got one that handle the appending part, but nothing with adding new fields. Here's appending with my class: data3 = accumulator.accumulator(dtype = dt2) data3.append((1, 2.2, 0.0, 0.0)) data3.append((3, 4.5, 0.0, 0.0)) data3.append((2, 1.2, 0.0, 0.0)) data3.append((5, 4.2, 0.0, 0.0)) print repr(data3) # convert to regular array for calculations: data3 = np.array(data3) # now some calculations: data3['discount_price'] = data3['price'] * 0.9 You wouldn't have to convert to a regular array, except that I haven't written the code to support field access yet -- I don't think it would be too hard, though. I've enclosed some test code, and my accumulator class, in case you find it useful. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov -------------- next part -------------- A non-text attachment was scrubbed... Name: struct_test.py Type: application/x-python Size: 1589 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: accumulator.py Type: application/x-python Size: 4651 bytes Desc: not available URL: From ben.root at ou.edu Mon Dec 6 14:00:30 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 6 Dec 2010 13:00:30 -0600 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: <4CFD2AF3.3070205@noaa.gov> References: <4CFD2AF3.3070205@noaa.gov> Message-ID: On Mon, Dec 6, 2010 at 12:26 PM, Christopher Barker wrote: > On 12/5/10 7:56 PM, Wai Yip Tung wrote: > >> I'm fairly new to numpy and I'm trying to figure out the right way to do >> things. Continuing on my question about using recarray as a relation. >> > > note that recarrays (or structured arrays, AFAIK, the difference is > atturube access only -- I don't use recarrays) are far more static than a > database table. So you may really want to use a database, or maybe pytables. > Or maybe even just stick with lists. > > But if you are keeping things in memory, should be able to do what you > want. > > > In [339]: arr = np.array([ >> .....: (1, 2.2, 0.0), >> .....: (3, 4.5, 0.0) >> .....: ], >> .....: dtype=[ >> .....: ('unit',int), >> .....: ('price',float), >> .....: ('amount',float), >> .....: ] >> .....: ) >> >> In [340]: data = arr.view(recarray) >> >> >> One of the most common thing I want to do is to append rows to data. >> > > numpy arrays do not naturally support appending, as you have discovered. > > > I >> think concatenate() might be the method. >> > > yes. > > > But I get a problem: >> > > In [342]: np.concatenate((data0,[1,9.0,9.0])) >> >> --------------------------------------------------------------------------- >> TypeError Traceback (most recent call >> last) >> >> c:\Python26\Lib\site-packages\numpy\ in() >> >> TypeError: expected a readable buffer object >> > > concatenate expects two arrays to be joined. If you pass in something that > can easily be turned into an array, it will work, but a tuple can be > converted to multiple types of arrays, so it doesn't know what to do. So you > need to re-construct the second array: > > a2 = np.array( [(3,5.5, 3)], dtype=dt) > arr = np.concatenate( (arr, a2) ) > > > In [343]: data.amount = data.unit * data.price >> > > yup > > > But sometimes it may require me to add a new column not already exist, >> e.g.: >> >> In [344]: data.discount_price = data.price * 0.9 >> >> >> How can I add a new column? >> > > you can't. what you need to do is create a new array with a new dtype that > includes the new field. > > The trick is that numpy only supports homogenous arrays -- evey item is the > same data type. So when you could a strut array like above, numpy does not > define it as a 2-d table, but rather, a 1-d array, each element of which is > a structure. > > so you need to do something like: > > # create a new array > data2 = np.zeros(len(data), dtype=dt2) > > # fill the array: > for field_name in dt.fields.keys(): > data2[field_name] = data[field_name] > > # now some calculations: > data2['discount_price'] = data2['price'] * 0.9 > > I don't know of a way to avoid that loop when filling the array. > > Better yet -- anticipate your needs and create the array with all the > fields you need in the first place. > > You can see that ndarrays are pretty static -- struct arrays can be useful > data storage, but are not very suitable when things are changing much. > > You could write a class that wraps an andarray, and supports what you need > better -- it could be a pretty usefull general purpose class, too. 
I've got > one that handle the appending part, but nothing with adding new fields. > > Here's appending with my class: > > data3 = accumulator.accumulator(dtype = dt2) > data3.append((1, 2.2, 0.0, 0.0)) > data3.append((3, 4.5, 0.0, 0.0)) > data3.append((2, 1.2, 0.0, 0.0)) > data3.append((5, 4.2, 0.0, 0.0)) > print repr(data3) > > # convert to regular array for calculations: > data3 = np.array(data3) > > # now some calculations: > data3['discount_price'] = data3['price'] * 0.9 > > You wouldn't have to convert to a regular array, except that I haven't > written the code to support field access yet -- I don't think it would be > too hard, though. > > I've enclosed some test code, and my accumulator class, in case you find it > useful. > > > > -Chris > > numpy.lib.recfunctions has a method for easily adding new columns. Of course, it really returns a new recarray rather than adding it to an existing recarray. Appending records to such an array, however is a different story, and you have to do something like you demonstrated above. Ben Root -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon Dec 6 14:28:58 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 06 Dec 2010 11:28:58 -0800 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: References: <4CFD2AF3.3070205@noaa.gov> Message-ID: <4CFD397A.7010204@noaa.gov> On 12/6/10 11:00 AM, Benjamin Root wrote: > numpy.lib.recfunctions has a method for easily adding new columns. cool! There is a lot of other nifty- looking stuff in there too. The OP should really take a look. And maybe an appending function is in order, too. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From tungwaiyip at yahoo.com Mon Dec 6 16:00:29 2010 From: tungwaiyip at yahoo.com (Wai Yip Tung) Date: Mon, 06 Dec 2010 13:00:29 -0800 Subject: [Numpy-discussion] Can I add rows and columns to recarray? References: Message-ID: Thank you for the quick response and Christopher's explanation on the design background. All my tables fit in-memory. I want to explore the data interactively and relational database is does not provide me a lot of value. I was rolling my own library before I come to numpy. Then I find numpy's universal function awesome and really fit what I want to do. Now I just need to find out what to add row which is easy in Python. It is OK if it rebuild an array when I add a column, which should happen infrequently. But if adding row build a new array, this will lead to O(n^2) complexity. In anycase, I will explore the recfunctions. Thank you Wai Yip > On Sun, Dec 5, 2010 at 10:56 PM, Wai Yip Tung > wrote: >> I'm fairly new to numpy and I'm trying to figure out the right way to do >> things. Continuing on my question about using recarray as a relation. I >> have a recarray like this >> >> >> In [339]: arr = np.array([ >> .....: (1, 2.2, 0.0), >> .....: (3, 4.5, 0.0) >> .....: ], >> .....: dtype=[ >> .....: ('unit',int), >> .....: ('price',float), >> .....: ('amount',float), >> .....: ] >> .....: ) >> >> In [340]: data = arr.view(recarray) >> >> >> One of the most common thing I want to do is to append rows to data. I >> think concatenate() might be the method. 
But I get a problem: >> >> >> In [342]: np.concatenate((data0,[1,9.0,9.0])) >> --------------------------------------------------------------------------- >> TypeError Traceback (most recent call >> last) >> >> c:\Python26\Lib\site-packages\numpy\ in () >> >> TypeError: expected a readable buffer object >> >> >> >> The other thing I want to do is to calculate the column value. Right now >> it can do great thing like >> >> >> >> In [343]: data.amount = data.unit * data.price >> >> >> >> But sometimes it may require me to add a new column not already exist, >> e.g.: >> >> >> In [344]: data.discount_price = data.price * 0.9 >> >> >> How can I add a new column? I tried column_stack. But it give a similar >> TypeError. I figure I need to first specify the type of the column. But >> I >> don't know how. >> > > Check out numpy.lib.recfunctions > > I often have > > import numpy.lib.recfunctions as nprf > > Skipper From Chris.Barker at noaa.gov Mon Dec 6 17:44:54 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 06 Dec 2010 14:44:54 -0800 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: References: Message-ID: <4CFD6766.7080104@noaa.gov> On 12/6/10 1:00 PM, Wai Yip Tung wrote: > Thank you for the quick response and Christopher's explanation on the > design background. you're welcome. > But if adding row build a new array, this will lead to O(n^2) complexity. if you are adding a lot of rows one at a time, yes, you can have performance issues -- though re-allocating data is pretty fast, too -- maybe it won't matter. If it does, consider the accumulator code I sent, or use it as inspiration to write your own. If you do improve it, please send your improvements back to me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From faltet at pytables.org Mon Dec 6 18:06:54 2010 From: faltet at pytables.org (Francesc Alted) Date: Tue, 7 Dec 2010 00:06:54 +0100 Subject: [Numpy-discussion] Can I add rows and columns to recarray? In-Reply-To: References: Message-ID: <201012070006.54624.faltet@pytables.org> A Monday 06 December 2010 22:00:29 Wai Yip Tung escrigu?: > Thank you for the quick response and Christopher's explanation on the > design background. > > All my tables fit in-memory. I want to explore the data interactively > and relational database is does not provide me a lot of value. > > I was rolling my own library before I come to numpy. Then I find > numpy's universal function awesome and really fit what I want to do. > Now I just need to find out what to add row which is easy in Python. > It is OK if it rebuild an array when I add a column, which should > happen infrequently. But if adding row build a new array, this will > lead to O(n^2) complexity. In anycase, I will explore the > recfunctions. If you want a container with a better complexity for adding columns than O(n^2), you may want to have a look at the ctable object in carray package: https://github.com/FrancescAlted/carray carray is about providing compressed, in-memory data containers for both homogeneous (arrays) and heterogeneous data (structured arrays). 
Here it is an example of use: >>> import numpy as np >>> import carray as ca >>> NR = 1000*1000 >>> r = np.fromiter(((i,i*i) for i in xrange(NR)), dtype="i4,i8") >>> new_field = np.arange(NR, dtype='f8')**3 >>> rc = ca.ctable(r) >>> rc ctable((1000000,), [('f0', '>> time rc.addcol(new_field, "f2") CPU times: user 0.03 s, sys: 0.00 s, total: 0.03 s Wall time: 0.03 s that is, only 30 ms for appending a column. This is basically the time to copy (and compress) the data (i.e. O(n)). If you append an already compressed column, the cost of adding it is O(1): >>> r = np.fromiter(((i,i*i) for i in xrange(NR)), dtype="i4,i8") >>> rc = ca.ctable(r) >>> cnew_field = ca.carray(np.arange(NR, dtype='f8')**3) >>> time rc.addcol(cnew_field, "f2") CPU times: user 0.00 s, sys: 0.00 s, total: 0.00 s Wall time: 0.00 s On his hand, using plain structured arrays is pretty more costly: >>> import numpy.lib.recfunctions as nprf >>> time r2 = nprf.rec_append_fields(r, 'f2', new_field, 'f8') CPU times: user 0.34 s, sys: 0.02 s, total: 0.36 s Wall time: 0.36 s Appending data at the end of ctable objects is also very fast: >>> timeit rc.append(row) 100000 loops, best of 3: 13.1 ?s per loop Compare this with an append with an structured array: >>> timeit np.concatenate((r2, row)) 100 loops, best of 3: 6.84 ms per loop Unfortunately you cannot do the full range of operations supported by structured arrays with ctables, and a ctable object is rather meant to be used as an efficient, compressed container for structures in memory: >>> r2[2] (2, 4, 8.0) >>> rc[2] (2, 4, 8.0) >>> r2['f1'] array([0, 1, 4, ..., 1, 1, 1]) >>> rc['f1'] carray((1452223,), int64) nbytes: 11.08 MB; cbytes: 1.62 MB; ratio: 6.85 cparams := cparams(clevel=5, shuffle=True) [0, 1, 4, ..., 1, 1, 1] But still, you can do funny things like complex queries: >>> [r for r in rc.getif("(f0<10)&(f2>4)", ["__nrow__", "f1"])] [(2, 4), (3, 9), (4, 16), (5, 25), (6, 36), (7, 49), (8, 64), (9, 81), (1041112, 1)] The queries are also very fast (both Numexpr and Blosc are used under the hood): >>> timeit [r for r in rc.getif("(f0<10)&(f2>4)")] 10 loops, best of 3: 58.6 ms per loop >>> timeit r2[(r2['f0']<10)&(r2['f2']>4)] 10 loops, best of 3: 28 ms per loop So, queries on ctables are only 2x slower than using plain structured arrays --of course, the secret goal is to make these sort of queries actually faster than using structured arrays :) I still need to finish the docs, but I plan to release carray 0.3 later this week. 
Cheers, -- Francesc Alted From moura.mario at gmail.com Mon Dec 6 21:18:41 2010 From: moura.mario at gmail.com (Mario Moura) Date: Tue, 7 Dec 2010 00:18:41 -0200 Subject: [Numpy-discussion] The power of strides - Combinations Message-ID: Hi Folks Is it possible some example how deal with strides with combinations, let see: >>> from numpy import * >>> import itertools >>> dt = dtype('i,i,i') >>> a = fromiter(itertools.combinations(range(10),3), dtype=dt, count=-1) >>> a array([(0, 1, 2), (0, 1, 3), (0, 1, 4), (0, 1, 5), (0, 1, 6), (0, 1, 7), (0, 1, 8), (0, 1, 9), (0, 2, 3), (0, 2, 4), (0, 2, 5), (0, 2, 6), (0, 2, 7), (0, 2, 8), (0, 2, 9), (0, 3, 4), (0, 3, 5), (0, 3, 6), (0, 3, 7), (0, 3, 8), (0, 3, 9), (0, 4, 5), (0, 4, 6), (0, 4, 7), (0, 4, 8), (0, 4, 9), (0, 5, 6), (0, 5, 7), (0, 5, 8), (0, 5, 9), (0, 6, 7), (0, 6, 8), (0, 6, 9), (0, 7, 8), (0, 7, 9), (0, 8, 9), (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 2, 6), (1, 2, 7), (1, 2, 8), (1, 2, 9), (1, 3, 4), (1, 3, 5), (1, 3, 6), (1, 3, 7), (1, 3, 8), (1, 3, 9), (1, 4, 5), (1, 4, 6), (1, 4, 7), (1, 4, 8), (1, 4, 9), (1, 5, 6), (1, 5, 7), (1, 5, 8), (1, 5, 9), (1, 6, 7), (1, 6, 8), (1, 6, 9), (1, 7, 8), (1, 7, 9), (1, 8, 9), (2, 3, 4), (2, 3, 5), (2, 3, 6), (2, 3, 7), (2, 3, 8), (2, 3, 9), (2, 4, 5), (2, 4, 6), (2, 4, 7), (2, 4, 8), (2, 4, 9), (2, 5, 6), (2, 5, 7), (2, 5, 8), (2, 5, 9), (2, 6, 7), (2, 6, 8), (2, 6, 9), (2, 7, 8), (2, 7, 9), (2, 8, 9), (3, 4, 5), (3, 4, 6), (3, 4, 7), (3, 4, 8), (3, 4, 9), (3, 5, 6), (3, 5, 7), (3, 5, 8), (3, 5, 9), (3, 6, 7), (3, 6, 8), (3, 6, 9), (3, 7, 8), (3, 7, 9), (3, 8, 9), (4, 5, 6), (4, 5, 7), (4, 5, 8), (4, 5, 9), (4, 6, 7), (4, 6, 8), (4, 6, 9), (4, 7, 8), (4, 7, 9), (4, 8, 9), (5, 6, 7), (5, 6, 8), (5, 6, 9), (5, 7, 8), (5, 7, 9), (5, 8, 9), (6, 7, 8), (6, 7, 9), (6, 8, 9), (7, 8, 9)], dtype=[('f0', '>> Many thanks Mr. Warren about this ((itertools.combinations(range(10),3), dtype=dt, count=-1)) But as you can see itertools.combinations are emitted in lexicographic sort order but NOT with "power of" strides. So what I see is every element in this array into one memory spot but I would like to know if is possible, use "the power of strides"! >>> x = a.reshape(120,1) x = stride_tricks.as_strided(a,shape=(120,),strides=(4,4)) Should I use some sub-class like record array, scalar array? So what I want is repetitive elements on same memory spot. I want save memory in big arrays (main reason) and want go fast. How can I deal with this in random arrays but with repetitive elements? Is it possible have custom strides in subclass(that change in dimension) ? How do this? Best Regards Mario From robert.kern at gmail.com Mon Dec 6 21:47:34 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 6 Dec 2010 20:47:34 -0600 Subject: [Numpy-discussion] The power of strides - Combinations In-Reply-To: References: Message-ID: On Mon, Dec 6, 2010 at 20:18, Mario Moura wrote: > Hi Folks > > Is it possible some example how deal with strides with combinations, let see: No, sorry. It is not possible to generate combinations just using strides. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? 
-- Umberto Eco From rbanerj at fas.harvard.edu Mon Dec 6 22:20:35 2010 From: rbanerj at fas.harvard.edu (Rajat Banerjee) Date: Mon, 6 Dec 2010 22:20:35 -0500 Subject: [Numpy-discussion] fromrecords yields "ValueError: invalid itemsize in generic type tuple" Message-ID: Hi All, I have been using Numpy for a while with great success. I left my little project for a little while (http://web.mit.edu/stardev/cluster/) and now some of my code is broken. I have some Numpy code to create graphs of activity on a cluster with matplotlib. It ran just fine in July / August 2010, but has since stopped working. I have updated numpy on my machine, I think. In [2]: np.version.version Out[2]: '1.5.1' My call to np.rec.fromrecords() is throwing this exception: File "/home/rajat/Envs/StarCluster/lib/python2.6/site-packages/numpy/core/records.py", line 607, in fromrecords descr = sb.dtype((record, dtype)) ValueError: invalid itemsize in generic type tuple Here is the code with some irrelevant stuff stripped: for line in file: a = [datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S.%f'), int(parts[1]), int(parts[2]), int(parts[3]), int(parts[4]), int(parts[5]), int(parts[6]), float(parts[7])] list.append(a) file.close() names = ['dt', 'hosts', 'running_jobs', 'queued_jobs',\ 'slots', 'avg_duration', 'avg_wait', 'avg_load'] descriptor = {'names': ('dt,hosts,running_jobs,queued_jobs,slots,avg_duration,avg_wait,avg_load'),\ 'formats' : ('S20','u','u','u','u','u','u','f')} self.records = np.rec.fromrecords(list,','.join(names)) #used to work #self.records = np.rec.fromrecords(list, dtype=descriptor) #new attempt Here is one "line" from the array "list": >>> parts (8) = ['2010-12-07 03:09:46.855712', '2', '2', '177', '2', '86', '370', '1.05']. Neither of those np.rec.fromrecords() calls works. I've tried both separately. They both throw the exact same exception, ValueError: invalid itemsize in generic type tuple Can anybody help me? Am I doing something dumb? Thank you. Rajat Banerjee Masters Candidate, Computer Science, Harvard University From ben.root at ou.edu Mon Dec 6 22:51:06 2010 From: ben.root at ou.edu (Benjamin Root) Date: Mon, 6 Dec 2010 21:51:06 -0600 Subject: [Numpy-discussion] The power of strides - Combinations In-Reply-To: References: Message-ID: On Monday, December 6, 2010, Robert Kern wrote: > On Mon, Dec 6, 2010 at 20:18, Mario Moura wrote: >> Hi Folks >> >> Is it possible some example how deal with strides with combinations, let see: > > No, sorry. It is not possible to generate combinations just using strides. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Just wondering, would using ogrid[] in numpy help the OP? Ben From robert.kern at gmail.com Mon Dec 6 22:58:12 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 6 Dec 2010 21:58:12 -0600 Subject: [Numpy-discussion] The power of strides - Combinations In-Reply-To: References: Message-ID: On Mon, Dec 6, 2010 at 21:51, Benjamin Root wrote: > On Monday, December 6, 2010, Robert Kern wrote: >> On Mon, Dec 6, 2010 at 20:18, Mario Moura wrote: >>> Hi Folks >>> >>> Is it possible some example how deal with strides with combinations, let see: >> >> No, sorry. 
It is not possible to generate combinations just using strides. > > Just wondering, would using ogrid[] in numpy help the OP? No. The limitations of the stride mechanisms apply with ogrid just the same. Generating combinations is not regular enough. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From sebastian.walter at gmail.com Tue Dec 7 09:15:24 2010 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Tue, 7 Dec 2010 15:15:24 +0100 Subject: [Numpy-discussion] ctypes and numpy Message-ID: Hello all, I'd like to call a Python function from a C++ code. The Python function has numpy.ndarrays as input. I figured that the easiest way would be to use ctypes. However, I can't get numpy and ctypes to work together. ----------- run.c ------------ #include #include void run(PyArrayObject *y, PyObject *f) { npy_intp Ny = PyArray_SIZE(y); } ------- end run.c ---------- -------- run.py ---------- import os, ctypes, numpy _r = numpy.ctypeslib.load_library('librun.so', os.path.dirname(__file__)) _r.run.argtypes = [ctypes.py_object] x = numpy.array([1,2,3.]) _r.run(x) -------- end run.py ---------- Compiling gives me the warning: gcc -o librun.so -I/usr/include/python2.6 -I/usr/lib/python2.6/dist-packages/numpy/core/include -O0 -fpic -shared -Wall run.c run.c: In function ?run?: run.c:5: warning: unused variable ?Ny? run.c: At top level: /usr/include/python2.6/numpy/__multiarray_api.h:968: warning: ?_import_array? defined but not used and when I run it I get a segmentation fault. I guess I'm not the first one who has this problem, but I couldn't find something useful on the web. Any pointers are suggestions are welcome. cheers, Sebastian From jmccampbell at enthought.com Tue Dec 7 13:34:25 2010 From: jmccampbell at enthought.com (Jason McCampbell) Date: Tue, 7 Dec 2010 12:34:25 -0600 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: Sorry for the late reply... I missed this thread. Thanks to Ilan for pointing it out. A variety of comments below... On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris wrote: > Just wondering if this is temporary or the intention is to change the > build process? I also note that the *.h files in libndarray are not complete > and a *lot* of trailing whitespace has crept into the files. For the purposes of our immediate project the intent is to use autoconf since it's widely available and makes building this part Python-independent and easier than working it into both distutils and numscons. Going forward it's certainly open to discussion. Currently all of the .h and .c files are generated as a part of the build rather than being checked in just because it saves a build step. Checking in the intermediate files isn't a problem either. Does the trailing whitespace cause problems? We saw it in the coding guidelines and planned to run a filter over it once the code stabilizes, but none of us had seen a guideline like that before and weren't sure why it was there. On Sat, Dec 4, 2010 at 3:01 PM, Charles R Harris wrote: > > > On Sat, Dec 4, 2010 at 1:45 PM, Pauli Virtanen wrote: > >> On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote: >> > I'm not sure how reasonable it would be to move only libndarray into the >> > master, because I've been working on EPD for the last couple of week. 
>> > But Jason will know how complete libndarray is. >> >> The main question is whether moving it will make things easier or more >> difficult, I think. It's one tree more to keep track of. >> >> In any case, it would be a first part in the merge, and it would split >> the hunk of changes into two parts. >> >> > That would be a good thing IMHO. It would also bring a bit more numpy > reality to the refactor and since we are implicitly relying on it for the > next release sometime next spring the closer to reality it gets the better. > > >> *** >> >> Technically, the move could be done like this, so that merge tracking >> still works: >> >> --------refactor--------------- new-refactor >> / / >> /--------libndarray----------x >> / \ >> start---------------------- master----- new-master >> >> > Looks good to me. > Doing this isn't a problem, though I'm not sure if it buys us much. 90% of the changes are the refactoring, moving substantial amounts of code from numpy/core/src/multiarray and /umath into libndarray and then all of the assorted fix-ups. The rest is the .NET interface layer which is isolated in numpy/NumpyDotNet for now. We can leave this directory out, but everything else is the same between libndarray and refactor. Or am I misunderstanding the reason? The current state of the refactor branch is that it passes the bulk of regressions on Python 2.6 and 3.? (Ilan, what version did you use?) and is up-to-date with the master branch. There are a few failing regression test that we need to look at vs. the master branch but less than dozen. Switching to use libndarray is a big ABI+API change, right? If there's an > idea to release an ABI-compatible 1.6, wouldn't this end up being more > difficult? Maybe I'm misunderstanding this idea. Definitely a big ABI change and effectively a big API change. The API itself should be close to 100% compatible, except that the data structures all change to introduce a new layer of indirection. Code that strictly uses the macro accessors will build fine, but that is turning out to be quite rare. The changes are quite mechanical but still non-trivial for code that directly accesses the structure fields. Changes to Cython as a part of the project take care of some of the work. A new numpy.pdx file is needed and will mask the changes as long as the Python (as opposed to the CPython) interface is used. -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 7 13:57:13 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 11:57:13 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 11:34 AM, Jason McCampbell wrote: > Sorry for the late reply... I missed this thread. Thanks to Ilan for > pointing it out. A variety of comments below... > > On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris< > charlesr.harris at gmail.com> wrote: > >> Just wondering if this is temporary or the intention is to change the >> build process? I also note that the *.h files in libndarray are not complete >> and a *lot* of trailing whitespace has crept into the files. > > > For the purposes of our immediate project the intent is to use autoconf > since it's widely available and makes building this part Python-independent > and easier than working it into both distutils and numscons. Going forward > it's certainly open to discussion. > > Yes, maintaining multiple build systems is a hassle. 
I'm wondering if we shouldn't remove the scons stuff and stick with distutils until we definitely decide there is a better way. As to autotools, I think it is a fine short term solution for development purposes, but probably needs to be replaced down the road. > Currently all of the .h and .c files are generated as a part of the build > rather than being checked in just because it saves a build step. Checking > in the intermediate files isn't a problem either. > > The idea of having separate .h files is that you can test compile without a complete build. They might also be helpful in the separate compilation case (I haven't checked). But in any case, the *.h.src files are there just to make maintaining the .h file easier, they shouldn't be used as part of the build. > Does the trailing whitespace cause problems? We saw it in the coding > guidelines and planned to run a filter over it once the code stabilizes, but > none of us had seen a guideline like that before and weren't sure why it was > there. > > It should be cleaned up before anything becomes official. Git can be set up to warn about trailing whitespace. The general guideline is no trailing whitespace. For one thing you end up with repository changes that unintentionally involve whitespace. Most editors can be set up to flag trailing whitespace, which will increase the desire to keep the file clean. On Sat, Dec 4, 2010 at 3:01 PM, Charles R Harris wrote: > >> >> >> On Sat, Dec 4, 2010 at 1:45 PM, Pauli Virtanen wrote: >> >>> On Sat, 04 Dec 2010 14:24:49 -0600, Ilan Schnell wrote: >>> > I'm not sure how reasonable it would be to move only libndarray into >>> the >>> > master, because I've been working on EPD for the last couple of week. >>> > But Jason will know how complete libndarray is. >>> >>> The main question is whether moving it will make things easier or more >>> difficult, I think. It's one tree more to keep track of. >>> >>> In any case, it would be a first part in the merge, and it would split >>> the hunk of changes into two parts. >>> >>> >> That would be a good thing IMHO. It would also bring a bit more numpy >> reality to the refactor and since we are implicitly relying on it for the >> next release sometime next spring the closer to reality it gets the better. >> >> >>> *** >>> >>> Technically, the move could be done like this, so that merge tracking >>> still works: >>> >>> --------refactor--------------- new-refactor >>> / / >>> /--------libndarray----------x >>> / \ >>> start---------------------- master----- new-master >>> >>> >> Looks good to me. >> > > Doing this isn't a problem, though I'm not sure if it buys us much. 90% of > the changes are the refactoring, moving substantial amounts of code from > numpy/core/src/multiarray and /umath into libndarray and then all of the > assorted fix-ups. The rest is the .NET interface layer which is isolated in > numpy/NumpyDotNet for now. We can leave this directory out, but everything > else is the same between libndarray and refactor. Or am I misunderstanding > the reason? > > The idea is to keep things moving along and maybe encourage others to take a bigger role in the merge. We wouldn't touch the current master branch of numpy yet. > The current state of the refactor branch is that it passes the bulk of > regressions on Python 2.6 and 3.? (Ilan, what version did you use?) and is > up-to-date with the master branch. There are a few failing regression test > that we need to look at vs. the master branch but less than dozen. 
> > Switching to use libndarray is a big ABI+API change, right? If there's an >> idea to release an ABI-compatible 1.6, wouldn't this end up being more >> difficult? Maybe I'm misunderstanding this idea. > > > Definitely a big ABI change and effectively a big API change. The API > itself should be close to 100% compatible, except that the data structures > all change to introduce a new layer of indirection. Code that strictly uses > the macro accessors will build fine, but that is turning out to be quite > rare. The changes are quite mechanical but still non-trivial for code that > directly accesses the structure fields. > > Changes to Cython as a part of the project take care of some of the work. A > new numpy.pdx file is needed and will mask the changes as long as the Python > (as opposed to the CPython) interface is used. > > There probably needs to be some discussion of a release schedule so we can plan ahead. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Tue Dec 7 19:36:56 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Wed, 8 Dec 2010 08:36:56 +0800 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Wed, Dec 8, 2010 at 2:57 AM, Charles R Harris wrote: > > > On Tue, Dec 7, 2010 at 11:34 AM, Jason McCampbell < > jmccampbell at enthought.com> wrote: > >> Sorry for the late reply... I missed this thread. Thanks to Ilan for >> pointing it out. A variety of comments below... >> >> On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris< >> charlesr.harris at gmail.com> wrote: >> >>> Just wondering if this is temporary or the intention is to change the >>> build process? I also note that the *.h files in libndarray are not complete >>> and a *lot* of trailing whitespace has crept into the files. >> >> >> For the purposes of our immediate project the intent is to use autoconf >> since it's widely available and makes building this part Python-independent >> and easier than working it into both distutils and numscons. Going forward >> it's certainly open to discussion. >> >> > Yes, maintaining multiple build systems is a hassle. I'm wondering if we > shouldn't remove the scons stuff and stick with distutils until we > definitely decide there is a better way. > Why would you want to remove scons before we settle on a final new way of doing things? It's not that much effort to maintain as far as I'm aware, and more useful (at least to me) than distutils. I don't see a reason not to keep it until we have something that's actually better (hopefully bento). Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue Dec 7 19:45:38 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 7 Dec 2010 17:45:38 -0700 Subject: [Numpy-discussion] Refactor fork uses the ./configure, make, make install process. In-Reply-To: References: Message-ID: On Tue, Dec 7, 2010 at 5:36 PM, Ralf Gommers wrote: > > > On Wed, Dec 8, 2010 at 2:57 AM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Tue, Dec 7, 2010 at 11:34 AM, Jason McCampbell < >> jmccampbell at enthought.com> wrote: >> >>> Sorry for the late reply... I missed this thread. Thanks to Ilan for >>> pointing it out. A variety of comments below... 
>>> >>> On Sat, Dec 4, 2010 at 10:20 AM, Charles R Harris< >>> charlesr.harris at gmail.com> wrote: >>> >>>> Just wondering if this is temporary or the intention is to change the >>>> build process? I also note that the *.h files in libndarray are not complete >>>> and a *lot* of trailing whitespace has crept into the files. >>> >>> >>> For the purposes of our immediate project the intent is to use autoconf >>> since it's widely available and makes building this part Python-independent >>> and easier than working it into both distutils and numscons. Going forward >>> it's certainly open to discussion. >>> >>> >> Yes, maintaining multiple build systems is a hassle. I'm wondering if we >> shouldn't remove the scons stuff and stick with distutils until we >> definitely decide there is a better way. >> > > Why would you want to remove scons before we settle on a final new way of > doing things? It's not that much effort to maintain as far as I'm aware, and > more useful (at least to me) than distutils. I don't see a reason not to > keep it until we have something that's actually better (hopefully bento). > > Actually, I was waiting to see if you liked scons ;) I don't use it myself but I was wondering if you used it for the releases. I agree it will be interesting to see what David comes up with, I think at the moment he likes waf as a build system to use with bento. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From matt.gregory at oregonstate.edu Wed Dec 8 12:12:44 2010 From: matt.gregory at oregonstate.edu (Gregory, Matthew) Date: Wed, 8 Dec 2010 09:12:44 -0800 Subject: [Numpy-discussion] creating zonal statistics from two arrays Message-ID: <1D673F86DDA00841A1216F04D1CE70D6426800037D@EXCH2.nws.oregonstate.edu> Hi all, Likely a very newbie type of question. I'm using numpy with GDAL to calculate zonal statistics on images. The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population. (I'm well aware that this approach may have memory issues with large rasters ...) GDAL ReadAsArray gives you a chunk of raster data as a numpy array. Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way. zone_stats = {} zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size) value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size) for row in xrange(y_size): for col in xrange(x_size): zone = zone_block[row][col] value = value_block[row][col] try: zone_stats[zone].append(value) except KeyError: zone_stats[zone] = [value] # Then calculate stats per zone ... Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed. matt From josef.pktd at gmail.com Wed Dec 8 12:48:51 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 8 Dec 2010 12:48:51 -0500 Subject: [Numpy-discussion] creating zonal statistics from two arrays In-Reply-To: <1D673F86DDA00841A1216F04D1CE70D6426800037D@EXCH2.nws.oregonstate.edu> References: <1D673F86DDA00841A1216F04D1CE70D6426800037D@EXCH2.nws.oregonstate.edu> Message-ID: On Wed, Dec 8, 2010 at 12:12 PM, Gregory, Matthew wrote: > Hi all, > > Likely a very newbie type of question. ?I'm using numpy with GDAL to calculate zonal statistics on images. 
?The basic approach is that I have a zone raster and a value raster which are aligned spatially and I am storing each zone's corresponding values in a dictionary, then calculating the statistics on that population. ?(I'm well aware that this approach may have memory issues with large rasters ...) > > GDAL ReadAsArray gives you a chunk of raster data as a numpy array. ?Currently I'm iterating over rows and columns of that chunk, but I'm guessing there's a better (and more numpy-like) way. > > zone_stats = {} > zone_block = zone_band.ReadAsArray(x_off, y_off, x_size, y_size) > value_block = value_band.ReadAsArray(x_off, y_off, x_size, y_size) > for row in xrange(y_size): > ? ?for col in xrange(x_size): > ? ? ? ?zone = zone_block[row][col] > ? ? ? ?value = value_block[row][col] > ? ? ? ?try: > ? ? ? ? ? ?zone_stats[zone].append(value) > ? ? ? ?except KeyError: > ? ? ? ? ? ?zone_stats[zone] = [value] > > # Then calculate stats per zone > ... Just a thought since I'm not doing spatial statistics. If you can create (integer) labels that assigns each point to a zone, then you can treat it essentially as a 1d grouped data, and you could use np.bincount to calculate some statistics, or alternatively scipy.ndimage.measurements for some additional statistics. This would avoid any python loop, but require a full label array. Josef > > Thanks for all suggestions on how to make this better, especially if the initial approach I'm taking is flawed. > > matt > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From n.becker at amolf.nl Fri Dec 10 04:13:11 2010 From: n.becker at amolf.nl (Nils Becker) Date: Fri, 10 Dec 2010 10:13:11 +0100 Subject: [Numpy-discussion] truth value of dtypes Message-ID: <4D01EF27.1080102@amolf.nl> Hi, why is >>> bool(np.dtype(np.float)) False ? I came across this when using this python idiom: def f(dtype=None): ....if not dtype: ........print 'using default dtype' If there is no good reason to have a False truth value, I would vote for making it True since that is what one would expect (no?) N. From markbak at gmail.com Fri Dec 10 04:38:01 2010 From: markbak at gmail.com (Mark Bakker) Date: Fri, 10 Dec 2010 10:38:01 +0100 Subject: [Numpy-discussion] status of date-times Message-ID: Hello List, Can someone update us on the status of the date-times datatype? Is it working yet? If not, what are the plans? I really appreciate all the work and am looking forward to using the new date-times, Best regards, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ingwer.Wurzel at gmx.net Fri Dec 10 05:33:18 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Fri, 10 Dec 2010 11:33:18 +0100 Subject: [Numpy-discussion] Numpy and Python3 Message-ID: <1291977198.4723.137.camel@Speranza> Hello everyone, first, I'm really apologise for my English-skills. But I have only one simple questions. Does NumPy work on Python3 now. I read so many articles on the Internet, but you can only read some speculation and not a clear state about this topic. At the moment I try numpy1.5.1 on Python3.0, but I get only Errors. If Numpy works on Python3, are the support libraries the same as by Python2.6? 
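Coming back to the zonal-statistics exchange a few messages up: a minimal sketch of the label-plus-bincount approach suggested there, assuming the zone raster holds small non-negative integer labels. The toy blocks below just stand in for GDAL's ReadAsArray output:

import numpy as np

# stand-ins for zone_band.ReadAsArray(...) and value_band.ReadAsArray(...)
zone_block = np.array([[0, 0, 1],
                       [1, 2, 2]])
value_block = np.array([[1.0, 2.0, 3.0],
                        [4.0, 5.0, 6.0]])

zones = zone_block.ravel()
values = value_block.ravel()

# per-zone count, sum and mean with no Python loop over pixels
counts = np.bincount(zones)
sums = np.bincount(zones, weights=values)
means = sums / counts

for zone in range(len(counts)):
    print zone, counts[zone], means[zone]

The measurement functions in scipy.ndimage (mean, sum, variance and so on) take the same kind of label array and cover the statistics that bincount alone does not.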
(By the way, I use Linux (Ubuntu 9.04)) /With kind regards Ingwer From ralf.gommers at googlemail.com Fri Dec 10 07:05:29 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Fri, 10 Dec 2010 20:05:29 +0800 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1291977198.4723.137.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> Message-ID: On Fri, Dec 10, 2010 at 6:33 PM, Katharina wrote: > Hello everyone, > > first, I'm really apologise for my English-skills. But I have only one > simple questions. Does NumPy work on Python3 now. > I read so many articles on the Internet, but you can only read some > speculation and not a clear state about this topic. > It works fine with Python 3.1. > > At the moment I try numpy1.5.1 on Python3.0, but I get only Errors. > If Numpy works on Python3, are the support libraries the same as by > Python2.6? > Which support libraries? Just Lapack/Blas or Atlas should be all you need, and just "$ python3.1 setup.py install --prefix=/home/XXX/pick-a-folder" should work fine. If you encounter a problem, please send us the exact build command you used, the build log and compiler versions. Ralf > (By the way, I use Linux (Ubuntu 9.04)) > /With kind regards > Ingwer > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Fri Dec 10 09:33:40 2010 From: robert.kern at gmail.com (Robert Kern) Date: Fri, 10 Dec 2010 08:33:40 -0600 Subject: [Numpy-discussion] truth value of dtypes In-Reply-To: <4D01EF27.1080102@amolf.nl> References: <4D01EF27.1080102@amolf.nl> Message-ID: On Fri, Dec 10, 2010 at 03:13, Nils Becker wrote: > Hi, > > why is > >>>> bool(np.dtype(np.float)) > False > > ? > > I came across this when using this python idiom: > > def f(dtype=None): > ....if not dtype: > ........print 'using default dtype' The default truth value probably should be True, but not for this reason. The correct idiom to use is this: def f(dtype=None): if dtype is None: print 'using default dtype' -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From alan.isaac at gmail.com Fri Dec 10 09:47:59 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Fri, 10 Dec 2010 09:47:59 -0500 Subject: [Numpy-discussion] truth value of dtypes In-Reply-To: <4D01EF27.1080102@amolf.nl> References: <4D01EF27.1080102@amolf.nl> Message-ID: <4D023D9F.8050707@gmail.com> On 12/10/2010 4:13 AM, Nils Becker wrote: > def f(dtype=None): > ....if not dtype: I think you want: if dtype is None: fwiw, Alan From charlesr.harris at gmail.com Fri Dec 10 10:45:47 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 10 Dec 2010 08:45:47 -0700 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1291977198.4723.137.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> Message-ID: On Fri, Dec 10, 2010 at 3:33 AM, Katharina wrote: > Hello everyone, > > first, I'm really apologise for my English-skills. But I have only one > simple questions. Does NumPy work on Python3 now. > I read so many articles on the Internet, but you can only read some > speculation and not a clear state about this topic. > > At the moment I try numpy1.5.1 on Python3.0, but I get only Errors. > If Numpy works on Python3, are the support libraries the same as by > Python2.6? > (By the way, I use Linux (Ubuntu 9.04)) > > We don't support 3.0, only 3.1 and above. 
Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Fri Dec 10 12:25:25 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 10 Dec 2010 09:25:25 -0800 Subject: [Numpy-discussion] A Cython apply_along_axis function In-Reply-To: References: <4CF6FC27.5040005@silveregg.co.jp> Message-ID: On Wed, Dec 1, 2010 at 6:07 PM, Keith Goodman wrote: > On Wed, Dec 1, 2010 at 5:53 PM, David wrote: > >> On 12/02/2010 04:47 AM, Keith Goodman wrote: >>> It's hard to write Cython code that can handle all dtypes and >>> arbitrary number of dimensions. The former is typically dealt with >>> using templates, but what do people do about the latter? >> >> The only way that I know to do that systematically is iterator. There is >> a relatively simple example in scipy/signal (lfilter.c.src). >> >> I wonder if it would be possible to add better support for numpy >> iterators in cython... > > Thanks for the tip. I'm starting to think that for now I should just > template both dtype and ndim. I ended up templating both dtype and axis. For the axis templating I used two functions: looper and loop_cdef. LOOPER Make a 3d loop template: >>> loop = ''' .... for iINDEX0 in range(nINDEX0): .... for iINDEX1 in range(nINDEX1): .... amin = MAXDTYPE .... for iINDEX2 in range(nINDEX2): .... ai = a[INDEXALL] .... if ai <= amin: .... amin = ai .... y[INDEXPOP] = amin .... ''' Import the looper function: >>> from bottleneck.src.template.template import looper Make a loop over axis=0: >>> print looper(loop, ndim=3, axis=0) for i1 in range(n1): for i2 in range(n2): amin = MAXDTYPE for i0 in range(n0): ai = a[i0, i1, i2] if ai <= amin: amin = ai y[i1, i2] = amin Make a loop over axis=1: >>> print looper(loop, ndim=3, axis=1) for i0 in range(n0): for i2 in range(n2): amin = MAXDTYPE for i1 in range(n1): ai = a[i0, i1, i2] if ai <= amin: amin = ai y[i0, i2] = amin LOOP_CDEF Define parameters: >>> ndim = 3 >>> dtype = 'float64' >>> axis = 1 >>> is_reducing_function = True Import loop_cdef: >>> from bottleneck.src.template.template import loop_cdef Make loop initialization code: >>> print loop_cdef(ndim, dtype, axis, is_reducing_function) cdef Py_ssize_t i0, i1, i2 cdef int n0 = a.shape[0] cdef int n1 = a.shape[1] cdef int n2 = a.shape[2] cdef np.npy_intp *dims = [n0, n2] cdef np.ndarray[np.float64_t, ndim=2] y = PyArray_EMPTY(2, dims, NPY_float64, 0) From kwgoodman at gmail.com Fri Dec 10 16:42:49 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 10 Dec 2010 13:42:49 -0800 Subject: [Numpy-discussion] np.var() and ddof Message-ID: Why does ddof=2 and ddof=3 give the same result? >> np.var([1, 2, 3], ddof=0) 0.66666666666666663 >> np.var([1, 2, 3], ddof=1) 1.0 >> np.var([1, 2, 3], ddof=2) 2.0 >> np.var([1, 2, 3], ddof=3) 2.0 >> np.var([1, 2, 3], ddof=4) -2.0 I expected NaN for ddof=3. From josef.pktd at gmail.com Fri Dec 10 17:26:54 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 10 Dec 2010 17:26:54 -0500 Subject: [Numpy-discussion] np.var() and ddof In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 4:42 PM, Keith Goodman wrote: > Why does ddof=2 and ddof=3 give the same result? > >>> np.var([1, 2, 3], ddof=0) > ? 0.66666666666666663 >>> np.var([1, 2, 3], ddof=1) > ? 1.0 >>> np.var([1, 2, 3], ddof=2) > ? 2.0 >>> np.var([1, 2, 3], ddof=3) > ? 2.0 >>> np.var([1, 2, 3], ddof=4) > ? -2.0 > > I expected NaN for ddof=3. 
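For reference, the other results above follow the textbook formula var = ss / (n - ddof), with ss the sum of squared deviations. A quick worked sketch for [1, 2, 3], where ss = 2 and n = 3, also shows why ddof=3 (the case being asked about) is special:

import numpy as np

a = np.array([1.0, 2.0, 3.0])
ss = ((a - a.mean()) ** 2).sum()   # 2.0
n = a.size                         # 3

for ddof in (0, 1, 2, 4):
    print ddof, ss / (n - ddof)    # 0.666..., 1.0, 2.0, -2.0

# ddof=3 makes the divisor zero, which is where the nan-or-inf
# expectation below comes from.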
It's a floating point calculation, so I would expect np.inf Josef > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kwgoodman at gmail.com Fri Dec 10 17:32:24 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 10 Dec 2010 14:32:24 -0800 Subject: [Numpy-discussion] np.var() and ddof In-Reply-To: References: Message-ID: On Fri, Dec 10, 2010 at 2:26 PM, wrote: > On Fri, Dec 10, 2010 at 4:42 PM, Keith Goodman wrote: >> Why does ddof=2 and ddof=3 give the same result? >> >>>> np.var([1, 2, 3], ddof=0) >> ? 0.66666666666666663 >>>> np.var([1, 2, 3], ddof=1) >> ? 1.0 >>>> np.var([1, 2, 3], ddof=2) >> ? 2.0 >>>> np.var([1, 2, 3], ddof=3) >> ? 2.0 >>>> np.var([1, 2, 3], ddof=4) >> ? -2.0 >> >> I expected NaN for ddof=3. > > It's a floating point calculation, so I would expect np.inf Right, NAFN (F=Finite). Unless, of course, the numerator is zero too. From Ingwer.Wurzel at gmx.net Sat Dec 11 07:41:59 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sat, 11 Dec 2010 13:41:59 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: References: <1291977198.4723.137.camel@Speranza> Message-ID: <1292071320.4874.26.camel@Speranza> Hi, I install Python3.1, but I get the same Error: -------------------------------------------------------------------------------------------------------- sudo python3 setup.py build --fcompiler=gnu95 Converting to Python3 via 2to3... RefactoringTool: Skipping implicit fixer: buffer RefactoringTool: Skipping implicit fixer: idioms RefactoringTool: Skipping implicit fixer: set_literal RefactoringTool: Skipping implicit fixer: ws_comma RefactoringTool: No files need to be modified. Running from numpy source directory.Traceback (most recent call last): File "setup.py", line 211, in setup_package() File "setup.py", line 188, in setup_package from numpy.distutils.core import setup File "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/__init__.py", line 22, in import numpy.distutils.ccompiler File "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/ccompiler.py", line 15, in from numpy.distutils.exec_command import exec_command File "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/exec_command.py", line 58, in from numpy.compat import open_latin1 File "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/compat/__init__.py", line 14, in from .py3k import * AttributeError: 'module' object has no attribute 'unicode' -------------------------------------------------------------------------------------------------------- Can somebody see, what's the problem? I'm really pleased for any help. /With kind regards Ingwer Am Freitag, den 10.12.2010, 20:05 +0800 schrieb Ralf Gommers: > > > On Fri, Dec 10, 2010 at 6:33 PM, Katharina > wrote: > Hello everyone, > > first, I'm really apologise for my English-skills. But I have > only one > simple questions. Does NumPy work on Python3 now. > I read so many articles on the Internet, but you can only read > some > speculation and not a clear state about this topic. > > It works fine with Python 3.1. > > > At the moment I try numpy1.5.1 on Python3.0, but I get only > Errors. > If Numpy works on Python3, are the support libraries the same > as by > Python2.6? > > Which support libraries? 
Just Lapack/Blas or Atlas should be all you > need, and just "$ python3.1 setup.py install > --prefix=/home/XXX/pick-a-folder" should work fine. > > If you encounter a problem, please send us the exact build command you > used, the build log and compiler versions. > > Ralf > > > > (By the way, I use Linux (Ubuntu 9.04)) > > /With kind regards > Ingwer > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Dec 11 10:05:17 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Dec 2010 08:05:17 -0700 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1292071320.4874.26.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> Message-ID: On Sat, Dec 11, 2010 at 5:41 AM, Katharina wrote: > Hi, > I install Python3.1, but I get the same Error: > > -------------------------------------------------------------------------------------------------------- > sudo python3 setup.py build --fcompiler=gnu95 > Converting to Python3 via 2to3... > RefactoringTool: Skipping implicit fixer: buffer > RefactoringTool: Skipping implicit fixer: idioms > RefactoringTool: Skipping implicit fixer: set_literal > RefactoringTool: Skipping implicit fixer: ws_comma > RefactoringTool: No files need to be modified. > Running from numpy source directory.Traceback (most recent call last): > File "setup.py", line 211, in > setup_package() > File "setup.py", line 188, in setup_package > from numpy.distutils.core import setup > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/__init__.py", > line 22, in > import numpy.distutils.ccompiler > Are you doing the build in /usr/local/lib/python3.1/site-packages/ ? Usually the build is done in a working directory and installed by "python setup.py install". I don't know that that is the problem, but it is unusual. File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/ccompiler.py", > line 15, in > from numpy.distutils.exec_command import exec_command > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/exec_command.py", > line 58, in > from numpy.compat import open_latin1 > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/compat/__init__.py", > line 14, in > from .py3k import * > AttributeError: 'module' object has no attribute 'unicode' > > > -------------------------------------------------------------------------------------------------------- > > Can somebody see, what's the problem? > I'm really pleased for any help. > > Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ingwer.Wurzel at gmx.net Sat Dec 11 13:53:30 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sat, 11 Dec 2010 19:53:30 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> Message-ID: <1292093610.4874.31.camel@Speranza> Hi, yes my build is in /usr/local/lib/python3.1/site-packages/numpy-1.5.1. Is't wrong? 
/ Ingwer Am Samstag, den 11.12.2010, 08:05 -0700 schrieb Charles R Harris: > > > On Sat, Dec 11, 2010 at 5:41 AM, Katharina > wrote: > Hi, > I install Python3.1, but I get the same Error: > -------------------------------------------------------------------------------------------------------- > sudo python3 setup.py build --fcompiler=gnu95 > Converting to Python3 via 2to3... > RefactoringTool: Skipping implicit fixer: buffer > RefactoringTool: Skipping implicit fixer: idioms > RefactoringTool: Skipping implicit fixer: set_literal > RefactoringTool: Skipping implicit fixer: ws_comma > RefactoringTool: No files need to be modified. > Running from numpy source directory.Traceback (most recent > call last): > File "setup.py", line 211, in > setup_package() > File "setup.py", line 188, in setup_package > from numpy.distutils.core import setup > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/__init__.py", line 22, in > import numpy.distutils.ccompiler > > Are you doing the build in /usr/local/lib/python3.1/site-packages/ ? > Usually the build is done in a working directory and installed by > "python setup.py install". I don't know that that is the problem, but > it is unusual. > > > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/ccompiler.py", line 15, in > from numpy.distutils.exec_command import exec_command > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/distutils/exec_command.py", line 58, in > from numpy.compat import open_latin1 > File > "/usr/local/lib/python3.1/site-packages/numpy-1.5.1/build/py3k/numpy/compat/__init__.py", line 14, in > from .py3k import * > AttributeError: 'module' object has no attribute 'unicode' > > -------------------------------------------------------------------------------------------------------- > > Can somebody see, what's the problem? > I'm really pleased for any help. > > > > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Dec 11 14:06:23 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Dec 2010 12:06:23 -0700 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1292093610.4874.31.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> Message-ID: On Sat, Dec 11, 2010 at 11:53 AM, Katharina wrote: > Hi, > yes my build is in /usr/local/lib/python3.1/site-packages/numpy-1.5.1. > Is't wrong? > > Well, let's find out ;) Move your numpy download somewhere like ~/numpy-1.5.1, then do cd numpy-1.5.1 python3.1 setup.py build sudo python3.1 setup.py install You should probably also do sudo rm -rf /usr/local/lib/python3.1/site-packages/numpy-1.5.1 before the build as well as remove your local build directory. You might also need to change ownership of the files from root to yourself. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Ingwer.Wurzel at gmx.net Sat Dec 11 14:49:04 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sat, 11 Dec 2010 20:49:04 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> Message-ID: <1292096944.4874.43.camel@Speranza> I'm really sorry, but the Error is the same: ---------------------------------------------------------------------------------------- ~/Desktop/numpy-1.5.1$ python3.1 setup.py build Converting to Python3 via 2to3... RefactoringTool: Skipping implicit fixer: buffer RefactoringTool: Skipping implicit fixer: idioms RefactoringTool: Skipping implicit fixer: set_literal RefactoringTool: Skipping implicit fixer: ws_comma RefactoringTool: No files need to be modified. Running from numpy source directory.Traceback (most recent call last): File "setup.py", line 211, in setup_package() File "setup.py", line 188, in setup_package from numpy.distutils.core import setup File "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/__init__.py", line 22, in import numpy.distutils.ccompiler File "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/ccompiler.py", line 15, in from numpy.distutils.exec_command import exec_command File "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/exec_command.py", line 58, in from numpy.compat import open_latin1 File "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/compat/__init__.py", line 14, in from .py3k import * AttributeError: 'module' object has no attribute 'unicode' ---------------------------------------------------------------------------------------- I don't do if it helps. But I try to install numpy1.5.1 on python2.6 and get this Erros: ---------------------------------------------------------------------------------------- /usr/local/lib/python2.6/site-packages/numpy-1.5.1$ python setup.py build fcompiler=gnu95 Traceback (most recent call last): File "setup.py", line 25, in import builtins as builtins ImportError: No module named builtins ---------------------------------------------------------------------------------------- / Ingwer Am Samstag, den 11.12.2010, 12:06 -0700 schrieb Charles R Harris: > > > On Sat, Dec 11, 2010 at 11:53 AM, Katharina > wrote: > Hi, > yes my build is > in /usr/local/lib/python3.1/site-packages/numpy-1.5.1. > Is't wrong? > > > Well, let's find out ;) Move your numpy download somewhere like > ~/numpy-1.5.1, then do > > cd numpy-1.5.1 > python3.1 setup.py build > sudo python3.1 setup.py install > > You should probably also do > > sudo rm -rf /usr/local/lib/python3.1/site-packages/numpy-1.5.1 before > the build as well as remove your local build directory. You might also > need to change ownership of the files from root to yourself. > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From Ingwer.Wurzel at gmx.net Sat Dec 11 14:58:35 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sat, 11 Dec 2010 20:58:35 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1292096944.4874.43.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> <1292096944.4874.43.camel@Speranza> Message-ID: <1292097515.4874.47.camel@Speranza> Oh... the Problem with python2.6 is solved. I take the numpy Version, which was transformed with 2to3. 
sorry /Ingwer Am Samstag, den 11.12.2010, 20:49 +0100 schrieb Katharina: > I'm really sorry, but the Error is the same: > > ---------------------------------------------------------------------------------------- > ~/Desktop/numpy-1.5.1$ python3.1 setup.py build > Converting to Python3 via 2to3... > RefactoringTool: Skipping implicit fixer: buffer > RefactoringTool: Skipping implicit fixer: idioms > RefactoringTool: Skipping implicit fixer: set_literal > RefactoringTool: Skipping implicit fixer: ws_comma > RefactoringTool: No files need to be modified. > Running from numpy source directory.Traceback (most recent call last): > File "setup.py", line 211, in > setup_package() > File "setup.py", line 188, in setup_package > from numpy.distutils.core import setup > File > "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/__init__.py", line 22, in > import numpy.distutils.ccompiler > File > "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/ccompiler.py", line 15, in > from numpy.distutils.exec_command import exec_command > File > "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/distutils/exec_command.py", line 58, in > from numpy.compat import open_latin1 > File > "/home/natta/Desktop/numpy-1.5.1/build/py3k/numpy/compat/__init__.py", > line 14, in > from .py3k import * > AttributeError: 'module' object has no attribute 'unicode' > ---------------------------------------------------------------------------------------- > > > I don't do if it helps. But I try to install numpy1.5.1 on python2.6 > and get this Erros: > ---------------------------------------------------------------------------------------- > /usr/local/lib/python2.6/site-packages/numpy-1.5.1$ python setup.py > build fcompiler=gnu95 > Traceback (most recent call last): > File "setup.py", line 25, in > import builtins as builtins > ImportError: No module named builtins > ---------------------------------------------------------------------------------------- > > / Ingwer > > > > > > Am Samstag, den 11.12.2010, 12:06 -0700 schrieb Charles R Harris: > > > > > > On Sat, Dec 11, 2010 at 11:53 AM, Katharina > > wrote: > > Hi, > > yes my build is > > in /usr/local/lib/python3.1/site-packages/numpy-1.5.1. > > Is't wrong? > > > > > > Well, let's find out ;) Move your numpy download somewhere like > > ~/numpy-1.5.1, then do > > > > cd numpy-1.5.1 > > python3.1 setup.py build > > sudo python3.1 setup.py install > > > > You should probably also do > > > > sudo rm -rf /usr/local/lib/python3.1/site-packages/numpy-1.5.1 before > > the build as well as remove your local build directory. You might also > > need to change ownership of the files from root to yourself. 
> > > > Chuck > > > > > > > > _______________________________________________ > > NumPy-Discussion mailing list > > NumPy-Discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sat Dec 11 15:19:05 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 11 Dec 2010 13:19:05 -0700 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1292097515.4874.47.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> <1292096944.4874.43.camel@Speranza> <1292097515.4874.47.camel@Speranza> Message-ID: On Sat, Dec 11, 2010 at 12:58 PM, Katharina wrote: > Oh... the Problem with python2.6 is solved. > I take the numpy Version, which was transformed with 2to3. > > Wait, how did you do that? Setup should automatically select the right version. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From seb.haase at gmail.com Sat Dec 11 17:40:35 2010 From: seb.haase at gmail.com (Sebastian Haase) Date: Sat, 11 Dec 2010 23:40:35 +0100 Subject: [Numpy-discussion] np.lookfor -- still supported / working ? Message-ID: Hi all, I recently discovered numpy's lookfor function, which is supposed to look through "all kinds" of doc strings and list relevant functions related to given keywords. However it does not seem to work for me: >>> N.__version__ '1.3.0' >>> N.lookfor('fft', module=None, import_modules=True, regenerate=False) Traceback (most recent call last): File "", line 1, in File "C:\cygwin\home\haase\Priithon_25_win\numpy\lib\utils.py", line 622, in lookfor found.sort(relevance_sort) TypeError: comparison function must return int >>> 1/2 # I use from future import division .... 0.5 >>> I especially like, that it is supposed to work for any other (not just numpy or scipy) module. But that also doesn't work: >>> import wx >>> N.lookfor('background', module='wx', import_modules=True, regenerate=False) Traceback (most recent call last): File "", line 1, in File "C:\cygwin\home\haase\Priithon_25_win\numpy\lib\utils.py", line 574, in lookfor cache = _lookfor_generate_cache(module, import_modules, regenerate) File "C:\cygwin\home\haase\Priithon_25_win\numpy\lib\utils.py", line 729, in _lookfor_generate_cache doc = inspect.getdoc(item) File "C:\cygwin\home\haase\Priithon_25_win\Python25\lib\inspect.py", line 313, in getdoc doc = object.__doc__ NameError: Unknown C global variable >>> I tried it on more recent numpy (1.5.1 I think) and got same problems. Any comments ? Thanks, Sebastian Haase From pav at iki.fi Sat Dec 11 18:53:58 2010 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 11 Dec 2010 23:53:58 +0000 (UTC) Subject: [Numpy-discussion] np.lookfor -- still supported / working ? References: Message-ID: On Sat, 11 Dec 2010 23:40:35 +0100, Sebastian Haase wrote: > Hi all, > > I recently discovered numpy's lookfor function, which is supposed to > look through "all kinds" of doc strings and list relevant functions > related to given keywords. 
However it does not seem to work for me: >>>> N.__version__ > '1.3.0' >>>> N.lookfor('fft', module=None, import_modules=True, regenerate=False) [clip] Worksforme >>> import numpy as np >>> np.__version__ '1.5.1' >>> np.lookfor('fft', module=None, import_modules=True, regenerate=False) Search results for 'fft' ------------------------ numpy.fft.hfft Compute the FFT of a signal whose spectrum has Hermitian symmetry. ... [clip] > I especially like, that it is supposed to work for any other (not just > numpy or scipy) module. > But that also doesn't work: >>>> import wx >>>> N.lookfor('background', module='wx', import_modules=True, >>>> regenerate=False) > Traceback (most recent call last): > File "", line 1, in > File "C:\cygwin\home\haase\Priithon_25_win\numpy\lib\utils.py", line > 574, in lookfor > cache = _lookfor_generate_cache(module, import_modules, regenerate) > File "C:\cygwin\home\haase\Priithon_25_win\numpy\lib\utils.py", line > 729, in _lookfor_generate_cache > doc = inspect.getdoc(item) > File "C:\cygwin\home\haase\Priithon_25_win\Python25\lib\inspect.py", > line 313, in getdoc > doc = object.__doc__ > NameError: Unknown C global variable [clip] That's more of an issue in the `wx` module in that it behaves in a non- standard way under introspection. But yes, it would be possible to catch that exception and ignore it. -- Pauli Virtanen From olivier.grisel at ensta.org Sun Dec 12 08:41:25 2010 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Sun, 12 Dec 2010 14:41:25 +0100 Subject: [Numpy-discussion] [ANN] FOSDEM datadevroom - Feb. 5 2011 - Brussels - Call for Presentations Message-ID: Hello numpy users, We (Isabel Drost, Nicolas Maillot and I) are organizing a Data Analytics Devroom that will take place during the next edition of the FOSDEM in Brussels on Feb. 5. Here is the CFP: http://datadevroom.couch.it/CFP You might be interested in attending the event and take the opportunity to speak about your projects. Important Dates (all dates in GMT +2): Submission deadline: 2010-12-17 Notification of accepted speakers: 2010-12-20 Publication of final schedule: 2011-01-10 Meetup: 2011-02-05 The event will comprise presentations on scalable data processing. We invite you to submit talks on the topics: Information retrieval / Search Large Scale data processing, Machine Learning, Text Mining, Computer vision, [Linked] Open Data. High quality, technical submissions are called for, ranging from principles to practice. We are looking for presentations on the implementation of the systems themselves, real world applications and case studies. Submissions should be based on free software solutions. Please re-distribute this CFP to people who might be interested. Looking forward to meeting you face to face in Brussels, -- Olivier http://twitter.com/ogrisel - http://github.com/ogrisel From Ingwer.Wurzel at gmx.net Sun Dec 12 13:06:10 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sun, 12 Dec 2010 19:06:10 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> <1292096944.4874.43.camel@Speranza> <1292097515.4874.47.camel@Speranza> Message-ID: <1292177170.8349.25.camel@Speranza> Hi Chuck, You are right, it works. I had so many versions of Numpy, that in the end I lost track. *Sorry* But now it works perfectly. Thank you for the help. /Ingwer ps: I know SciPY has its own Mail list, but it could be, that somebody can answer my question. 
Does SciPy works on Python3.1? Am Samstag, den 11.12.2010, 13:19 -0700 schrieb Charles R Harris: > > > On Sat, Dec 11, 2010 at 12:58 PM, Katharina > wrote: > Oh... the Problem with python2.6 is solved. > I take the numpy Version, which was transformed with 2to3. > > > Wait, how did you do that? Setup should automatically select the right > version. > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From charlesr.harris at gmail.com Sun Dec 12 14:23:21 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 12 Dec 2010 12:23:21 -0700 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: <1292177170.8349.25.camel@Speranza> References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> <1292096944.4874.43.camel@Speranza> <1292097515.4874.47.camel@Speranza> <1292177170.8349.25.camel@Speranza> Message-ID: On Sun, Dec 12, 2010 at 11:06 AM, Katharina wrote: > Hi Chuck, > You are right, it works. > I had so many versions of Numpy, that in the end I lost track. > *Sorry* > > But now it works perfectly. > Thank you for the help. > > /Ingwer > > > ps: I know SciPY has its own Mail list, but it could be, that somebody > can answer my question. > Does SciPy works on Python3.1? > > Support for 3.1 will be in the next scipy release. It should be available in 4-6 weeks. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Ingwer.Wurzel at gmx.net Sun Dec 12 14:48:39 2010 From: Ingwer.Wurzel at gmx.net (Katharina) Date: Sun, 12 Dec 2010 20:48:39 +0100 Subject: [Numpy-discussion] Numpy and Python3 In-Reply-To: References: <1291977198.4723.137.camel@Speranza> <1292071320.4874.26.camel@Speranza> <1292093610.4874.31.camel@Speranza> <1292096944.4874.43.camel@Speranza> <1292097515.4874.47.camel@Speranza> <1292177170.8349.25.camel@Speranza> Message-ID: <1292183319.8349.34.camel@Speranza> ok, thanks /Ingwer Am Sonntag, den 12.12.2010, 12:23 -0700 schrieb Charles R Harris: > > > On Sun, Dec 12, 2010 at 11:06 AM, Katharina > wrote: > Hi Chuck, > You are right, it works. > I had so many versions of Numpy, that in the end I lost track. > *Sorry* > > But now it works perfectly. > Thank you for the help. > > /Ingwer > > > ps: I know SciPY has its own Mail list, but it could be, that > somebody > can answer my question. > Does SciPy works on Python3.1? > > > Support for 3.1 will be in the next scipy release. It should be > available in 4-6 weeks. > > Chuck > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From kwgoodman at gmail.com Mon Dec 13 12:59:48 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 13 Dec 2010 09:59:48 -0800 Subject: [Numpy-discussion] Output dtype Message-ID: >From the np.median doc string: "If the input contains integers, or floats of smaller precision than 64, then the output data-type is float64." >> arr = np.array([[0,1,2,3,4,5]], dtype='float32') >> np.median(arr, axis=0).dtype dtype('float32') >> np.median(arr, axis=1).dtype dtype('float32') >> np.median(arr, axis=None).dtype dtype('float64') So the output doesn't agree with the doc string. What is the desired dtype of the accumulator and the output for when the input dtype is less than float64? Should it depend on axis? 
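For reference, np.mean does expose an explicit dtype argument for the accumulator (np.median takes no such argument), so the float64 upcast in the flattened case can at least be overridden by hand; a quick interactive check on the same array, against the 1.5.x behaviour shown above:

>>> import numpy as np
>>> arr = np.array([[0, 1, 2, 3, 4, 5]], dtype='float32')
>>> np.mean(arr).dtype                    # flattened case upcasts
dtype('float64')
>>> np.mean(arr, dtype=np.float32).dtype  # accumulator forced to float32
dtype('float32')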
I'm trying to duplicate the behavior of np.median (and other numpy/scipy functions) in the Bottleneck package and am running into a few corner cases while unit testing. Here's another one: >> np.sum([np.nan]).dtype dtype('float64') >> np.nansum([1,np.nan]).dtype dtype('float64') >> np.nansum([np.nan]).dtype AttributeError: 'float' object has no attribute 'dtype' I just duplicated the numpy behavior for that one since it was easy to do. From morph at debian.org Mon Dec 13 14:51:56 2010 From: morph at debian.org (Sandro Tosi) Date: Mon, 13 Dec 2010 20:51:56 +0100 Subject: [Numpy-discussion] where to ship libnpymath.a ? Message-ID: Hi, in Debian we had a bug report[1] requesting to ship libnpymath.a . [1] http://bugs.debian.org/596987 Our python packaging tools doesn't handle .a files, so I'd like to ask you where exactly should I ship that file. In the build directory I have: $ find . -name "*.a" | xargs md5sum 4c2371b98c138756b0471a4fb364e0ae ./debian/tmp/usr/lib/python2.5/site-packages/numpy/core/lib/libnpymath.a fc6040f2bd4354cca8ef130abc5c8b17 ./debian/tmp/usr/lib/python2.6/dist-packages/numpy/core/lib/libnpymath.a 0c9870a2e5cf61669c92677d9b12c116 ./build/temp_d.linux-x86_64-2.5/libnpymath.a 47fdd29b85570ce80b1c616c6c02f41a ./build/temp.linux-x86_64-2.6-pydebug/libnpymath.a 4c2371b98c138756b0471a4fb364e0ae ./build/temp.linux-x86_64-2.5/libnpymath.a fc6040f2bd4354cca8ef130abc5c8b17 ./build/temp.linux-x86_64-2.6/libnpymath.a (the md5sum is to show that they are actually different between python version and/or debug build): the first 2 are in the "temporary" debian package preparation dir, while the other 4 are for 2.5/2.6 + normal/debug build. So, back to the original question: where should I put libnpymath.a to be useful for our users (main request: new scipy)? maybe in ..../numpy/core/lib/ ? Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From bsouthey at gmail.com Mon Dec 13 15:20:01 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 13 Dec 2010 14:20:01 -0600 Subject: [Numpy-discussion] Output dtype In-Reply-To: References: Message-ID: <4D067FF1.9090001@gmail.com> On 12/13/2010 11:59 AM, Keith Goodman wrote: > > From the np.median doc string: "If the input contains integers, or > floats of smaller precision than 64, then the output data-type is > float64." > >>> arr = np.array([[0,1,2,3,4,5]], dtype='float32') >>> np.median(arr, axis=0).dtype > dtype('float32') >>> np.median(arr, axis=1).dtype > dtype('float32') >>> np.median(arr, axis=None).dtype > dtype('float64') > > So the output doesn't agree with the doc string. > > What is the desired dtype of the accumulator and the output for when > the input dtype is less than float64? Should it depend on axis? > > I'm trying to duplicate the behavior of np.median (and other > numpy/scipy functions) in the Bottleneck package and am running into a > few corner cases while unit testing. > > Here's another one: > >>> np.sum([np.nan]).dtype > dtype('float64') >>> np.nansum([1,np.nan]).dtype > dtype('float64') >>> np.nansum([np.nan]).dtype > > AttributeError: 'float' object has no attribute 'dtype' > > I just duplicated the numpy behavior for that one since it was easy to do. 
> _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion Unless something has changed since the docstring was written, this is probably an inherited 'bug' from np.mean() as the author expected that the docstring of mean was correct. For my 'old' 2.0 dev version: >>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype dtype('float32') >>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype dtype('float64') Bruce From kwgoodman at gmail.com Mon Dec 13 15:32:25 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 13 Dec 2010 12:32:25 -0800 Subject: [Numpy-discussion] Output dtype In-Reply-To: <4D067FF1.9090001@gmail.com> References: <4D067FF1.9090001@gmail.com> Message-ID: On Mon, Dec 13, 2010 at 12:20 PM, Bruce Southey wrote: > On 12/13/2010 11:59 AM, Keith Goodman wrote: >> > From the np.median doc string: "If the input contains integers, or >> floats of smaller precision than 64, then the output data-type is >> float64." >> >>>> arr = np.array([[0,1,2,3,4,5]], dtype='float32') >>>> np.median(arr, axis=0).dtype >> ? ? dtype('float32') >>>> np.median(arr, axis=1).dtype >> ? ? dtype('float32') >>>> np.median(arr, axis=None).dtype >> ? ? dtype('float64') >> >> So the output doesn't agree with the doc string. >> >> What is the desired dtype of the accumulator and the output for when >> the input dtype is less than float64? Should it depend on axis? >> >> I'm trying to duplicate the behavior of np.median (and other >> numpy/scipy functions) in the Bottleneck package and am running into a >> few corner cases while unit testing. >> >> Here's another one: >> >>>> np.sum([np.nan]).dtype >> ? ? dtype('float64') >>>> np.nansum([1,np.nan]).dtype >> ? ? dtype('float64') >>>> np.nansum([np.nan]).dtype >> >> AttributeError: 'float' object has no attribute 'dtype' >> >> I just duplicated the numpy behavior for that one since it was easy to do. >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > Unless something has changed since the docstring was written, this is > probably an inherited 'bug' from np.mean() as the author expected that > the docstring of mean was correct. For my 'old' 2.0 dev version: > > ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype > dtype('float32') > ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype > dtype('float64') Same issue with np.std and np.var. From pav at iki.fi Mon Dec 13 16:05:10 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 13 Dec 2010 21:05:10 +0000 (UTC) Subject: [Numpy-discussion] where to ship libnpymath.a ? References: Message-ID: On Mon, 13 Dec 2010 20:51:56 +0100, Sandro Tosi wrote: [clip] > So, back to the original question: where should I put libnpymath.a to be > useful for our users (main request: new scipy)? maybe in > ..../numpy/core/lib/ ? In the place pointed to by npymath.ini, which is where "python setup.py install" puts it. The point is that numpy.distutils should be able to locate this library file for building extension modules that depend on it. 
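A minimal sketch of how a dependent package's setup.py can pick the library up through that mechanism (get_info('npymath') reads npymath.ini; the package and extension names below are only placeholders):

from numpy.distutils.misc_util import Configuration, get_info

def configuration(parent_package='', top_path=None):
    config = Configuration('mypkg', parent_package, top_path)
    # get_info('npymath') returns the include dirs, library dirs and the
    # 'npymath' library to link against, as recorded in npymath.ini
    config.add_extension('_foo',
                         sources=['_foo.c'],
                         extra_info=get_info('npymath'))
    return config

if __name__ == '__main__':
    from numpy.distutils.core import setup
    setup(configuration=configuration)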
-- Pauli Virtanen From Kathleen.M.Tacina at nasa.gov Mon Dec 13 16:39:20 2010 From: Kathleen.M.Tacina at nasa.gov (Kathleen M Tacina) Date: Mon, 13 Dec 2010 16:39:20 -0500 Subject: [Numpy-discussion] same name and title in structured arrays Message-ID: <1292276360.7055.31.camel@moses.grc.nasa.gov> Hi, I've been finding numpy/scipy/matplotlib a very useful tool for data analysis. However, a recent change has caused me some problems. Numpy used to allow the name and title of a column of a structured array or recarray to be the same (at least in the svn version as of early last winter). Now, it seems that this is not allowed; see below. Python 2.6.5 (r265:79063, Apr 27 2010, 12:20:23) [GCC 4.2.2] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy as np >>> np.__version__ '2.0.0.dev-799179d' >>> data = np.ndarray((5,1),dtype=[(('T','T'),float)]) Traceback (most recent call last): File "", line 1, in ValueError: title already used as a name or title. >>> data = np.ndarray((5,1),dtype=[(('T at 0.25-in in F','T'),float)]) >>> Would it be possible to change the tests to allow the name and title to be the same for the same component? I can work around this new limitation for new data files, but I'm having trouble reading data files I created last winter. (But even for new stuff, it would be nice if it name=title was allowed. I like using both names and titles so that I can record interesting information like point location and units in the titles. Sometimes, though, there isn't anything else interesting to say about a column (e.g., 'point','date', 'time'), and I'd like to (1) have both a title and a name to be consistent with more interesting columns and (2) have the title be equal to the name.) Thanks for any help you can give me with this! Kathy Tacina -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Mon Dec 13 17:53:45 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 13 Dec 2010 14:53:45 -0800 Subject: [Numpy-discussion] Output dtype In-Reply-To: <4D067FF1.9090001@gmail.com> References: <4D067FF1.9090001@gmail.com> Message-ID: On Mon, Dec 13, 2010 at 12:20 PM, Bruce Southey wrote: > Unless something has changed since the docstring was written, this is > probably an inherited 'bug' from np.mean() as the author expected that > the docstring of mean was correct. For my 'old' 2.0 dev version: > > ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype > dtype('float32') > ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype > dtype('float64') Are you saying the bug is in the doc string, the output, or both? I think it is both; I expect the second result above to be float32. From bsouthey at gmail.com Mon Dec 13 21:50:01 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 13 Dec 2010 20:50:01 -0600 Subject: [Numpy-discussion] Output dtype In-Reply-To: References: <4D067FF1.9090001@gmail.com> Message-ID: On Mon, Dec 13, 2010 at 4:53 PM, Keith Goodman wrote: > On Mon, Dec 13, 2010 at 12:20 PM, Bruce Southey wrote: > >> Unless something has changed since the docstring was written, this is >> probably an inherited 'bug' from np.mean() as the author expected that >> the docstring of mean was correct. 
For my 'old' 2.0 dev version: >> >> ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32'), axis=1).dtype >> dtype('float32') >> ?>>> np.mean( np.array([[0,1,2,3,4,5]], dtype='float32')).dtype >> dtype('float64') > > Are you saying the bug is in the doc string, the output, or both? I > think it is both; I expect the second result above to be float32. This was a surprise to me as this 'misunderstanding' goes back to at least numpy 1.1. Both! The documentation is wrong when using axis argument. There is a bug because the output should be the same dtype for all possible axis values - which should be a ticket regardless. The recent half-float dtype or if users want the lower precision suggests that it might be a good time to ensure the 'correct' option is used (whatever that is). Bruce From morph at debian.org Tue Dec 14 02:31:46 2010 From: morph at debian.org (Sandro Tosi) Date: Tue, 14 Dec 2010 08:31:46 +0100 Subject: [Numpy-discussion] where to ship libnpymath.a ? In-Reply-To: References: Message-ID: Hi, On Mon, Dec 13, 2010 at 22:05, Pauli Virtanen wrote: > On Mon, 13 Dec 2010 20:51:56 +0100, Sandro Tosi wrote: > [clip] >> So, back to the original question: where should I put libnpymath.a to be >> useful for our users (main request: new scipy)? maybe in >> ..../numpy/core/lib/ ? > > In the place pointed to by npymath.ini, which is where > "python setup.py install" puts it. The point is that numpy.distutils > should be able to locate this library file for building extension modules > that depend on it. Yep, now I see: I think I've prepared the package shipping libnpymath.a in the right place, let's see :) Thanks a lot for your help! Cheers, -- Sandro Tosi (aka morph, morpheus, matrixhasu) My website: http://matrixhasu.altervista.org/ Me at Debian: http://wiki.debian.org/SandroTosi From mjanikas at esri.com Tue Dec 14 13:20:00 2010 From: mjanikas at esri.com (Mark Janikas) Date: Tue, 14 Dec 2010 10:20:00 -0800 Subject: [Numpy-discussion] Most efficient trim of arrays Message-ID: Hello All, I was wondering what the best way to trim an array based on some values I do not want.... I could use NUM.where or NUM.take... but let me give you an example: import numpy as NUM n = 100 (Length of my dataset) data = NUM.empty((n,), float) badRecords = [] for ind, record in enumerate(records): if record == someValueIDOntWant: badRecords.append(ind) else: data[ind] = record Now, I want to "trim" my array using badRecords. I guess I want to avoid copying. Any thoughts on the best way to do it? I do not want to use lists and then subsequently array the result as it is nice to pre-allocate the space. Thanks much, MJ -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue Dec 14 13:32:45 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 14 Dec 2010 12:32:45 -0600 Subject: [Numpy-discussion] Most efficient trim of arrays In-Reply-To: References: Message-ID: On Tue, Dec 14, 2010 at 12:20, Mark Janikas wrote: > Hello All, > > I was wondering what the best way to trim an array based on some values I do > not want?.? I could use NUM.where or NUM.take? but let me give you an > example: > > import numpy as NUM > > n = 100 (Length of my dataset) > data = NUM.empty((n,), float) > badRecords = [] > for ind, record in enumerate(records): > ??????????????? if record == someValueIDOntWant: > ??????????????????????????????? badRecords.append(ind) > ??????????????? else: > ??????????????????????????????? data[ind] = record > > Now, I want to ?trim? 
my array using badRecords. ?I guess I want to avoid > copying.? Any thoughts on the best way to do it?? I do not want to use lists > and then subsequently array the result as it is nice to pre-allocate the > space. Don't fear the copy. Use boolean indexing. http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html#boolean -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From tmp50 at ukr.net Wed Dec 15 10:24:20 2010 From: tmp50 at ukr.net (Dmitrey) Date: Wed, 15 Dec 2010 17:24:20 +0200 Subject: [Numpy-discussion] new quarterly OpenOpt/FuncDesigner release 0.32 In-Reply-To: <4D085CD7.5080104@gmail.com> References: <4D085CD7.5080104@gmail.com> Message-ID: Hi all, I'm glad to inform you about new quarterly OpenOpt/FuncDesigner release (0.32): OpenOpt: * New class: LCP (and related solver) * New QP solver: qlcp * New NLP solver: sqlcp * New large-scale NSP (nonsmooth) solver gsubg. Currently it still requires lots of improvements (especially for constraints - their handling is very premature yet and often fails), but since the solver sometimes already works better than ipopt, algencan and other competitors it was tried with, I decided to include the one into the release. * Now SOCP can handle Ax <= b constraints (and bugfix for handling lb <= x <= ub has been committed) * Some other fixes and improvements > FuncDesigner: * Add new function removeAttachedConstraints * Add new oofuns min and max (their capabilities are quite restricted yet) * Systems of nonlinear equations: possibility to assign personal tolerance for an equation * Some fixes and improvements > > For more details see our forum entry > http://forum.openopt.org/viewtopic.php?id=325 > > Regards, D. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Mon Dec 20 07:15:17 2010 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 20 Dec 2010 13:15:17 +0100 Subject: [Numpy-discussion] same name and title in structured arrays In-Reply-To: <1292276360.7055.31.camel@moses.grc.nasa.gov> References: <1292276360.7055.31.camel@moses.grc.nasa.gov> Message-ID: <1292847317.2876.0.camel@talisman> On Mon, 13 Dec 2010 16:39:20 -0500, Kathleen M Tacina wrote: > I've been finding numpy/scipy/matplotlib a very useful tool for data > analysis. However, a recent change has caused me some problems. > > Numpy used to allow the name and title of a column of a structured array > or recarray to be the same (at least in the svn version as of early last > winter). Now, it seems that this is not allowed; see below. > > Python 2.6.5 (r265:79063, Apr 27 2010, 12:20:23) [GCC 4.2.2] on linux2 > Type "help", "copyright", "credits" or "license" for more information. >>>> import numpy as np >>>> np.__version__ > '2.0.0.dev-799179d' >>>> data = np.ndarray((5,1),dtype=[(('T','T'),float)]) > Traceback (most recent call last): > File "", line 1, in > ValueError: title already used as a name or title. This behavior was changed when fixing #1254: http://projects.scipy.org/numpy/ticket/1254 It seems that it will not be possible to just revert to the old behavior, since apparently allowing that was a design mistake. The data loading routines however could in principle be changed to handle duplicate title/field combinations. How did you save your data, with numpy.save/numpy.savez or via pickling? 
-- Pauli Virtanen From alan.isaac at gmail.com Mon Dec 20 11:28:47 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 20 Dec 2010 11:28:47 -0500 Subject: [Numpy-discussion] sample without replacement Message-ID: <4D0F843F.3070705@gmail.com> I want to sample *without* replacement from a vector (as with Python's random.sample). I don't see a direct replacement for this, and I don't want to carry two PRNG's around. Is the best way something like this? permutation(myvector)[:samplesize] Thanks, Alan Isaac From qubax at gmx.at Sun Dec 19 10:40:13 2010 From: qubax at gmx.at (qubax at gmx.at) Date: Sun, 19 Dec 2010 16:40:13 +0100 Subject: [Numpy-discussion] Efficient Matrix-matrix product of hermitian matrices, zhemm (blas) and numpy Message-ID: <20101219154013.GA7960@tux.hotze.com> I need to calculate several products of matrices where at least one of them is always hermitian. The function zhemm (in blas, level 3) seems to directly do that in an efficient manner. However ... how can i access that function and dirctly apply it on numpy arrays? If you know alternatives that are equivalent or even faster, please let me know. Any help is highly appreciated. Q -- The king who needs to remind his people of his rank, is no king. A beggar's mistake harms no one but the beggar. A king's mistake, however, harms everyone but the king. Too often, the measure of power lies not in the number who obey your will, but in the number who suffer your stupidity. From fperez.net at gmail.com Fri Dec 17 09:16:13 2010 From: fperez.net at gmail.com (Fernando Perez) Date: Fri, 17 Dec 2010 19:46:13 +0530 Subject: [Numpy-discussion] Links to doc guidelines broken Message-ID: Howdy, In the ipython doc guide (and many other places) we point to the numpy coding guidelines (especially for documentation), but today while conducting a sprint at the Scipy India conference, I noticed this link is now dead: http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines It seems the docs got moved over to github, which is fine: https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt but it would be really nice if whoever went through the Trac wiki deleting stuff could have left a link pointing to the new proper location of this document. For those who knew what they were looking for it's easy enough to find it againg with a bit of googling around, but newcomers who may be trying to read these guidelines and simply get Trac's version of a 404 are likely to be left confused. I realize that in moving to a new infrastructure broken links are hard to avoid, but for documents as widely used as the numpy coding guidelines, perhaps leaving a link to the new location in the old location would be a good idea... I fixed the page with a github link, but there may be other important pages needing a similar treatment, and I think as a policy it's generally a good idea to leave proper redirects when important pages are deleted. Thanks f From jsalvati at u.washington.edu Mon Dec 20 12:13:21 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 20 Dec 2010 09:13:21 -0800 Subject: [Numpy-discussion] sample without replacement In-Reply-To: <4D0F843F.3070705@gmail.com> References: <4D0F843F.3070705@gmail.com> Message-ID: I think this is not possible to do efficiently with just numpy. If you want to do this efficiently, I wrote a no-replacement sampler in Cython some time ago (below). I hearby release it to the public domain. 
''' Created on Oct 24, 2009 http://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement @author: johnsalvatier ''' from __future__ import division import numpy def random_no_replace(sampleSize, populationSize, numSamples): samples = numpy.zeros((numSamples, sampleSize),dtype=int) # Use Knuth's variable names cdef int n = sampleSize cdef int N = populationSize cdef i = 0 cdef int t = 0 # total input records dealt with cdef int m = 0 # number of items selected so far cdef double u while i < numSamples: t = 0 m = 0 while m < n : u = numpy.random.uniform() # call a uniform(0,1) random number generator if (N - t)*u >= n - m : t += 1 else: samples[i,m] = t t += 1 m += 1 i += 1 return samples On Mon, Dec 20, 2010 at 8:28 AM, Alan G Isaac wrote: > I want to sample *without* replacement from a vector > (as with Python's random.sample). I don't see a direct > replacement for this, and I don't want to carry two > PRNG's around. Is the best way something like this? > > permutation(myvector)[:samplesize] > > Thanks, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpscipy at gmail.com Mon Dec 20 15:25:05 2010 From: jpscipy at gmail.com (Justin Peel) Date: Mon, 20 Dec 2010 13:25:05 -0700 Subject: [Numpy-discussion] Reversing an array in-place Message-ID: I noticed that there is currently no way to reverse a numpy array in-place. The current way to reverse a numpy array is using slicing, ala arr[::-1]. This is okay for small matrices, but for really large ones, this can be prohibitive. Not only that, but an in-place reverse is much faster than slicing. It seems like a reverse method could be added to arrays that would reverse the array along a given axis fairly easily. Is there any opposition to this? Also, there is consideration for simply marking a given axis as being reverse similar to how transposes are taken. However, I see this as a problem for a method like reshape to deal with and therefore think that it is better to just add a reverse method. What are your opinions? I'm quite willing to make such a method if it will be accepted. Justin Peel From jpscipy at gmail.com Mon Dec 20 15:25:29 2010 From: jpscipy at gmail.com (Justin Peel) Date: Mon, 20 Dec 2010 13:25:29 -0700 Subject: [Numpy-discussion] Short circuiting the all() and any() methods/functions Message-ID: It has come to my attention that the all() and any() methods/functions do not short circuit. It takes nearly as much time to call any() on an array which has 1 as the first entry as it does to call it on an array of the same size full of zeros. The cause of the problem is that all() and any() just call reduce() with the appropriate operator. Is anyone opposed to changing the implementations of these functions so that they short-circuit? By the way, Python already short circuits all() and any() correctly so it certainly makes sense to enact this change. I'm willing to head this up if there isn't any opposition to it. 
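In the meantime, a user-level workaround is straightforward to sketch: scan the array in fixed-size blocks and stop as soon as a block decides the answer, which recovers most of the short-circuit benefit without touching the C reduce loop (the block size below is an arbitrary choice):

import numpy as np

def any_blockwise(a, blocksize=4096):
    """any() that can stop early by testing fixed-size blocks."""
    a = np.asarray(a).ravel()  # note: ravel() copies if a is not contiguous
    for start in range(0, a.size, blocksize):
        if a[start:start + blocksize].any():
            return True
    return False

def all_blockwise(a, blocksize=4096):
    """all() that can stop early by testing fixed-size blocks."""
    a = np.asarray(a).ravel()
    for start in range(0, a.size, blocksize):
        if not a[start:start + blocksize].all():
            return False
    return True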
Justin Peel From matt.gregory at oregonstate.edu Mon Dec 20 15:44:17 2010 From: matt.gregory at oregonstate.edu (Matt Gregory) Date: Mon, 20 Dec 2010 12:44:17 -0800 Subject: [Numpy-discussion] creating zonal statistics from two arrays In-Reply-To: References: <1D673F86DDA00841A1216F04D1CE70D6426800037D@EXCH2.nws.oregonstate.edu> Message-ID: On 12/8/2010 9:48 AM, josef.pktd at gmail.com wrote: > Just a thought since I'm not doing spatial statistics. > > If you can create (integer) labels that assigns each point to a zone, > then you can treat it essentially as a 1d grouped data, and you could > use np.bincount to calculate some statistics, or alternatively > scipy.ndimage.measurements for some additional statistics. > > This would avoid any python loop, but require a full label array. Josef, The measurements module did the trick; thanks for the pointer. I just stumbled across a very similar thread on the scipy listserv that you answered basically the same question with some nice code (sorry for the redundancy): http://mail.scipy.org/pipermail/scipy-user/2009-February/019850.html BTW, the OP on that thread (Jose Gomez-Dans) has a script out there for doing just this type of operation that I was after: http://sites.google.com/site/spatialpython/zonal-statistics thanks, matt From charlesr.harris at gmail.com Mon Dec 20 16:12:22 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Dec 2010 14:12:22 -0700 Subject: [Numpy-discussion] Short circuiting the all() and any() methods/functions In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 1:25 PM, Justin Peel wrote: > It has come to my attention that the all() and any() methods/functions > do not short circuit. It takes nearly as much time to call any() on an > array which has 1 as the first entry as it does to call it on an array > of the same size full of zeros. > > The cause of the problem is that all() and any() just call reduce() > with the appropriate operator. Is anyone opposed to changing the > implementations of these functions so that they short-circuit? > > Recent version of reduce do short circuit. What version of numpy are you using? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Mon Dec 20 16:15:25 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 20 Dec 2010 14:15:25 -0700 Subject: [Numpy-discussion] Reversing an array in-place In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 1:25 PM, Justin Peel wrote: > I noticed that there is currently no way to reverse a numpy array > in-place. The current way to reverse a numpy array is using slicing, > ala arr[::-1]. This is okay for small matrices, but for really large > ones, this can be prohibitive. Not only that, but an in-place reverse > is much faster than slicing. It seems like a reverse method could be > added to arrays that would reverse the array along a given axis fairly > easily. Is there any opposition to this? > > The reversed matrix is a view, no copyihg is done. It is even faster than an inplace reversal. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jsalvati at u.washington.edu Mon Dec 20 16:42:26 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Mon, 20 Dec 2010 13:42:26 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis Message-ID: A while ago, I asked a whether it was possible to multi-iterate over several ndarrays but exclude a certain axis( http://www.mail-archive.com/numpy-discussion at scipy.org/msg29204.html), sort of a combination of PyArray_IterAllButAxis and PyArray_MultiIterNew. My goal was to allow creation of relatively complex ufuncs that can allow reduction or directionally dependent computation and still use broadcasting (for example a moving averaging ufunc that can have changing averaging parameters). I didn't get any solutions, which I take to mean that no one knew how to do this. I am thinking about trying to make a numpy patch with this functionality, and I have some questions: 1) How difficult would this kind of task be for someone with non-expert C knowledge and good numpy knowledge? 2) Does anyone have advice on how to do this kind of thing? Best Regards, John -------------- next part -------------- An HTML attachment was scrubbed... URL: From jpscipy at gmail.com Mon Dec 20 17:32:28 2010 From: jpscipy at gmail.com (Justin Peel) Date: Mon, 20 Dec 2010 15:32:28 -0700 Subject: [Numpy-discussion] Short circuiting the all() and any() methods/functions In-Reply-To: References: Message-ID: I'm using version 2.0.0.dev8716, which should be new enough I would think. Let me show you what makes me think that there isn't short-circuiting going on. I'll do two timeit's from the command line: $ python -m timeit -s 'import numpy as np; x = np.ones(200000)' 'x.all()' 100 loops, best of 3: 3.87 msec per loop $ python -m timeit -s 'import numpy as np; x = np.ones(200000); x[0] = 0' 'x.all()' 100 loops, best of 3: 2.76 msec per loop You can try different sizes for the arrays if you like, but the ratio of the times seems to hold pretty well. I would think that the second statement would be much, much faster than the first. Instead, it is only about 29% faster. I'm guessing that this speed isn't so much from short-circuiting as that the logical AND operator is faster when the first argument is 0 (the second argument doesn't need to be checked). What do you think? On Mon, Dec 20, 2010 at 2:12 PM, Charles R Harris wrote: > > > On Mon, Dec 20, 2010 at 1:25 PM, Justin Peel wrote: >> >> It has come to my attention that the all() and any() methods/functions >> do not short circuit. It takes nearly as much time to call any() on an >> array which has 1 as the first entry as it does to call it on an array >> of the same size full of zeros. >> >> The cause of the problem is that all() and any() just call reduce() >> with the appropriate operator. Is anyone opposed to changing the >> implementations of these functions so that they short-circuit? >> > > Recent version of reduce do short circuit. What version of numpy are you > using? > > Chuck > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From jpscipy at gmail.com Mon Dec 20 17:36:17 2010 From: jpscipy at gmail.com (Justin Peel) Date: Mon, 20 Dec 2010 15:36:17 -0700 Subject: [Numpy-discussion] Reversing an array in-place In-Reply-To: References: Message-ID: Oh, you're quite right. I should have looked more closely into this. Thanks for the reply. 
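A quick interactive check makes the point concrete: the reversed array is a negative-stride view that shares memory with the original, so nothing is copied:

>>> import numpy as np
>>> a = np.arange(5)
>>> b = a[::-1]        # view, not a copy
>>> b.base is a
True
>>> b[0] = 99          # writes through to a[-1]
>>> a
array([ 0,  1,  2,  3, 99])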
On Mon, Dec 20, 2010 at 2:15 PM, Charles R Harris wrote: > > > On Mon, Dec 20, 2010 at 1:25 PM, Justin Peel wrote: >> >> I noticed that there is currently no way to reverse a numpy array >> in-place. The current way to reverse a numpy array is using slicing, >> ala arr[::-1]. This is okay for small matrices, but for really large >> ones, this can be prohibitive. Not only that, but an in-place reverse >> is much faster than slicing. It seems like a reverse method could be >> added to arrays that would reverse the array along a given axis fairly >> easily. Is there any opposition to this? >> > > The reversed matrix is a view,? no copyihg is done. It is even faster than > an inplace reversal. > > Chuck > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pav at iki.fi Mon Dec 20 19:15:08 2010 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 21 Dec 2010 01:15:08 +0100 Subject: [Numpy-discussion] Short circuiting the all() and any() methods/functions In-Reply-To: References: Message-ID: <1292890508.3169.3.camel@Obelisk> ma, 2010-12-20 kello 15:32 -0700, Justin Peel kirjoitti: > I'm using version 2.0.0.dev8716, which should be new enough I would > think. Let me show you what makes me think that there isn't > short-circuiting going on. > > I'll do two timeit's from the command line: > > $ python -m timeit -s 'import numpy as np; x = np.ones(200000)' 'x.all()' > 100 loops, best of 3: 3.87 msec per loop > $ python -m timeit -s 'import numpy as np; x = np.ones(200000); x[0] = > 0' 'x.all()' > 100 loops, best of 3: 2.76 msec per loop The short-circuit is made only for bool arrays. $ python -m timeit -s 'import numpy as np; x = np.ones(200000, dtype=bool)' 'x.all()' 1000 loops, best of 3: 779 usec per loop $ python -m timeit -s 'import numpy as np; x = np.ones(200000, dtype=bool); x[0] = 0' 'x.all()' 100000 loops, best of 3: 3.12 usec per loop Could be easily generalized to all types, though, apart from maybe handling the thruth value of NaN correctly. -- Pauli Virtanen From ralf.gommers at googlemail.com Mon Dec 20 19:39:44 2010 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Tue, 21 Dec 2010 08:39:44 +0800 Subject: [Numpy-discussion] Links to doc guidelines broken In-Reply-To: References: Message-ID: On Fri, Dec 17, 2010 at 10:16 PM, Fernando Perez wrote: > Howdy, > > In the ipython doc guide (and many other places) we point to the numpy > coding guidelines (especially for documentation), but today while > conducting a sprint at the Scipy India conference, I noticed this > link is now dead: > > http://projects.scipy.org/numpy/wiki/CodingStyleGuidelines > > It seems the docs got moved over to github, which is fine: > > https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt > > but it would be really nice if whoever went through the Trac wiki > deleting stuff could have left a link pointing to the new proper > location of this document. For those who knew what they were looking > for it's easy enough to find it againg with a bit of googling around, > but newcomers who may be trying to read these guidelines and simply > get Trac's version of a 404 are likely to be left confused. > I realize that in moving to a new infrastructure broken links are hard > to avoid, but for documents as widely used as the numpy coding > guidelines, perhaps leaving a link to the new location in the old > location would be a good idea... > > That's my mistake, sorry. 
I changed the front page links and cleaned up the rest. I'll check for other pages and put them back if necessary. Just noticed that for links to the svn repo we have the opposite problem BTW, the pages still exist with no warning that the content is outdated. Ralf > I fixed the page with a github link, but there may be other important > pages needing a similar treatment, and I think as a policy it's > generally a good idea to leave proper redirects when important pages > are deleted. > > > Thanks > > f > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon Dec 20 21:41:16 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 20 Dec 2010 21:41:16 -0500 Subject: [Numpy-discussion] sample without replacement In-Reply-To: <4D0F843F.3070705@gmail.com> References: <4D0F843F.3070705@gmail.com> Message-ID: On Mon, Dec 20, 2010 at 11:28 AM, Alan G Isaac wrote: > I want to sample *without* replacement from a vector > (as with Python's random.sample). ?I don't see a direct > replacement for this, and I don't want to carry two > PRNG's around. ?Is the best way something like ?this? > > ? ? ? ?permutation(myvector)[:samplesize] python has it in random sample( population, k) Return a k length list of unique elements chosen from the population sequence. Used for random sampling without replacement. New in version 2.3 Josef > > Thanks, > Alan Isaac > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Mon Dec 20 22:19:21 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Mon, 20 Dec 2010 22:19:21 -0500 Subject: [Numpy-discussion] sample without replacement In-Reply-To: References: <4D0F843F.3070705@gmail.com> Message-ID: <4D101CB9.2090907@gmail.com> On 12/20/2010 9:41 PM, josef.pktd at gmail.com wrote: > python has it in random > > sample( population, k) Yes, I mentioned this in my original post: http://www.mail-archive.com/numpy-discussion at scipy.org/msg29324.html But good simulation practice is perhaps to seed a simulation specific random number generator (not just rely on a global), and I don't want to pass around two different instances. So I want to get this functionality from numpy.random. Which reminds me of another question. numpy.random.RandomState accepts an int array as a seed: what is the *intended* use? Thanks, Alan From josef.pktd at gmail.com Mon Dec 20 22:49:25 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 20 Dec 2010 22:49:25 -0500 Subject: [Numpy-discussion] sample without replacement In-Reply-To: <4D101CB9.2090907@gmail.com> References: <4D0F843F.3070705@gmail.com> <4D101CB9.2090907@gmail.com> Message-ID: On Mon, Dec 20, 2010 at 10:19 PM, Alan G Isaac wrote: > On 12/20/2010 9:41 PM, josef.pktd at gmail.com wrote: >> python has it in random >> >> sample( population, k) > > > Yes, I mentioned this in my original post: > http://www.mail-archive.com/numpy-discussion at scipy.org/msg29324.html > > But good simulation practice is perhaps to seed > a simulation specific random number generator > (not just rely on a global), and I don't want > to pass around two different instances. > So I want to get this functionality from numpy.random. 
Sorry, I was reading to fast, and I might be tired. What's the difference between a numpy Random and a python random.Random instance of separate states of the random number generators? Josef > > Which reminds me of another question. > numpy.random.RandomState accepts an int array as a seed: > what is the *intended* use? > > Thanks, > Alan > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From alan.isaac at gmail.com Tue Dec 21 08:25:33 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 21 Dec 2010 08:25:33 -0500 Subject: [Numpy-discussion] sample without replacement In-Reply-To: References: <4D0F843F.3070705@gmail.com> <4D101CB9.2090907@gmail.com> Message-ID: <4D10AACD.5020907@gmail.com> On 12/20/2010 10:49 PM, josef.pktd at gmail.com wrote: > What's the difference between a numpy Random and a python > random.Random instance of separate states of the random number > generators? Sorry, I don't understand the question. The difference for my use is that a np.RandomState instance provides access to a different set of methods, which unfortunately does not include an equivalent to random.Random's sample method but which does include others I need. Would it be appropriate to request that an analog to random.sample be added to numpy.random? (It might sample only a range, since producing indexes would provide the base functionality.) Or is this functionality absent intentionally? Alan From aarchiba at physics.mcgill.ca Tue Dec 21 10:39:01 2010 From: aarchiba at physics.mcgill.ca (Anne Archibald) Date: Tue, 21 Dec 2010 10:39:01 -0500 Subject: [Numpy-discussion] sample without replacement In-Reply-To: References: <4D0F843F.3070705@gmail.com> Message-ID: I know this question came up on the mailing list some time ago (19/09/2008), and the conclusion was that yes, you can do it more or less efficiently in pure python; the trick is to use two different methods. If your sample is more than, say, a quarter the size of the set you're drawing from, you permute the set and take the first few. If your sample is smaller, you draw with replacement, then redraw the duplicates, and repeat until there aren't any more duplicates. Since you only do this when your sample is much smaller than the population you don't need to repeat many times. Here's the code I posted to the previous discussion (not tested this time around) with comments: ''' def choose_without_replacement(m,n,repeats=None): """Choose n nonnegative integers less than m without replacement Returns an array of shape n, or (n,repeats). """ if repeats is None: r = 1 else: r = repeats if n>m: raise ValueError, "Cannot find %d nonnegative integers less than %d" % (n,m) if n>m/2: res = np.sort(np.random.rand(m,r).argsort(axis=0)[:n,:],axis=0) else: res = np.random.random_integers(m,size=(n,r)) while True: res = np.sort(res,axis=0) w = np.nonzero(np.diff(res,axis=0)==0) nr = len(w[0]) if nr==0: break res[w] = np.random.random_integers(m,size=nr) if repeats is None: return res[:,0] else: return res For really large values of repeats it does too much sorting; I didn't have the energy to make it pull all the ones with repeats to the beginning so that only they need to be re-sorted the next time through. Still, the expected number of trips through the while loop grows only logarithmically with repeats, so it shouldn't be too bad. 
''' Anne On 20 December 2010 12:13, John Salvatier wrote: > I think this is not possible to do efficiently with just numpy. If you want > to do this efficiently, I wrote a no-replacement sampler in Cython some time > ago (below). I hearby release it to the public domain. > > ''' > > Created on Oct 24, 2009 > http://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement > @author: johnsalvatier > > ''' > > from __future__ import division > > import numpy > > def random_no_replace(sampleSize, populationSize, numSamples): > > > > ?? ?samples? = numpy.zeros((numSamples, sampleSize),dtype=int) > > > > ?? ?# Use Knuth's variable names > > ?? ?cdef int n = sampleSize > > ?? ?cdef int N = populationSize > > ?? ?cdef i = 0 > > ?? ?cdef int t = 0 # total input records dealt with > > ?? ?cdef int m = 0 # number of items selected so far > > ?? ?cdef double u > > ?? ?while i < numSamples: > > ?? ? ? ?t = 0 > > ?? ? ? ?m = 0 > > ?? ? ? ?while m < n : > > > > ?? ? ? ? ? ?u = numpy.random.uniform() # call a uniform(0,1) random number > generator > > ?? ? ? ? ? ?if? (N - t)*u >= n - m : > > > > ?? ? ? ? ? ? ? ?t += 1 > > > > ?? ? ? ? ? ?else: > > > > ?? ? ? ? ? ? ? ?samples[i,m] = t > > ?? ? ? ? ? ? ? ?t += 1 > > ?? ? ? ? ? ? ? ?m += 1 > > > > ?? ? ? ?i += 1 > > > > ?? ?return samples > > > > On Mon, Dec 20, 2010 at 8:28 AM, Alan G Isaac wrote: >> >> I want to sample *without* replacement from a vector >> (as with Python's random.sample). ?I don't see a direct >> replacement for this, and I don't want to carry two >> PRNG's around. ?Is the best way something like ?this? >> >> ? ? ? ?permutation(myvector)[:samplesize] >> >> Thanks, >> Alan Isaac >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From alan.isaac at gmail.com Tue Dec 21 13:53:47 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 21 Dec 2010 13:53:47 -0500 Subject: [Numpy-discussion] bincount question Message-ID: <4D10F7BB.3000905@gmail.com> :: >>> np.bincount([]) Traceback (most recent call last): File "", line 1, in ValueError: The first argument cannot be empty. Why not? (I.e., why isn't an empty array the right answer?) Thanks, Alan Isaac From alan.isaac at gmail.com Tue Dec 21 14:00:26 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Tue, 21 Dec 2010 14:00:26 -0500 Subject: [Numpy-discussion] bincount and generators In-Reply-To: References: <4D0F843F.3070705@gmail.com> Message-ID: <4D10F94A.60209@gmail.com> bincount does not currently allow a generator as an argument. I'm wondering if it is considered too costly to extend it to allow this. (Motivation: I'm counting based on an attribute of a large number of objects, and I don't need a list of the data.) Thanks, Alan Isaac From sturla at molden.no Tue Dec 21 14:33:49 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 21 Dec 2010 20:33:49 +0100 Subject: [Numpy-discussion] sample without replacement In-Reply-To: References: <4D0F843F.3070705@gmail.com> Message-ID: <7732552513b8620ba14bfbf82404a5e3.squirrel@webmail.uio.no> We often need to generate more than one such sample from an array, e.g. for permutation tests. 
If we shuffle an array x of size N and use x[:M] as a random sample "without replacement", we just need to put them back randomly to get the next sample (cf. Fisher-Yates shuffle). That way we get O(M) amortized complexity for each sample of size M. Only the first sample will have complexity O(N). Sturla > I know this question came up on the mailing list some time ago > (19/09/2008), and the conclusion was that yes, you can do it more or > less efficiently in pure python; the trick is to use two different > methods. If your sample is more than, say, a quarter the size of the > set you're drawing from, you permute the set and take the first few. > If your sample is smaller, you draw with replacement, then redraw the > duplicates, and repeat until there aren't any more duplicates. Since > you only do this when your sample is much smaller than the population > you don't need to repeat many times. > > Here's the code I posted to the previous discussion (not tested this > time around) with comments: > > ''' > def choose_without_replacement(m,n,repeats=None): > """Choose n nonnegative integers less than m without replacement > > Returns an array of shape n, or (n,repeats). > """ > if repeats is None: > r = 1 > else: > r = repeats > if n>m: > raise ValueError, "Cannot find %d nonnegative integers less > than %d" % (n,m) > if n>m/2: > res = np.sort(np.random.rand(m,r).argsort(axis=0)[:n,:],axis=0) > else: > res = np.random.random_integers(m,size=(n,r)) > while True: > res = np.sort(res,axis=0) > w = np.nonzero(np.diff(res,axis=0)==0) > nr = len(w[0]) > if nr==0: > break > res[w] = np.random.random_integers(m,size=nr) > > if repeats is None: > return res[:,0] > else: > return res > > For really large values of repeats it does too much sorting; I didn't > have the energy to make it pull all the ones with repeats to the > beginning so that only they need to be re-sorted the next time > through. Still, the expected number of trips through the while loop > grows only logarithmically with repeats, so it shouldn't be too bad. > ''' > > Anne > > On 20 December 2010 12:13, John Salvatier > wrote: >> I think this is not possible to do efficiently with just numpy. If you >> want >> to do this efficiently, I wrote a no-replacement sampler in Cython some >> time >> ago (below). I hearby release it to the public domain. >> >> ''' >> >> Created on Oct 24, 2009 >> http://stackoverflow.com/questions/311703/algorithm-for-sampling-without-replacement >> @author: johnsalvatier >> >> ''' >> >> from __future__ import division >> >> import numpy >> >> def random_no_replace(sampleSize, populationSize, numSamples): >> >> >> >> ?? ?samples? = numpy.zeros((numSamples, sampleSize),dtype=int) >> >> >> >> ?? ?# Use Knuth's variable names >> >> ?? ?cdef int n = sampleSize >> >> ?? ?cdef int N = populationSize >> >> ?? ?cdef i = 0 >> >> ?? ?cdef int t = 0 # total input records dealt with >> >> ?? ?cdef int m = 0 # number of items selected so far >> >> ?? ?cdef double u >> >> ?? ?while i < numSamples: >> >> ?? ? ? ?t = 0 >> >> ?? ? ? ?m = 0 >> >> ?? ? ? ?while m < n : >> >> >> >> ?? ? ? ? ? ?u = numpy.random.uniform() # call a uniform(0,1) random >> number >> generator >> >> ?? ? ? ? ? ?if? (N - t)*u >= n - m : >> >> >> >> ?? ? ? ? ? ? ? ?t += 1 >> >> >> >> ?? ? ? ? ? ?else: >> >> >> >> ?? ? ? ? ? ? ? ?samples[i,m] = t >> >> ?? ? ? ? ? ? ? ?t += 1 >> >> ?? ? ? ? ? ? ? ?m += 1 >> >> >> >> ?? ? ? ?i += 1 >> >> >> >> ?? 
?return samples >> >> >> >> On Mon, Dec 20, 2010 at 8:28 AM, Alan G Isaac >> wrote: >>> >>> I want to sample *without* replacement from a vector >>> (as with Python's random.sample). ?I don't see a direct >>> replacement for this, and I don't want to carry two >>> PRNG's around. ?Is the best way something like ?this? >>> >>> ? ? ? ?permutation(myvector)[:samplesize] >>> >>> Thanks, >>> Alan Isaac >>> _______________________________________________ >>> NumPy-Discussion mailing list >>> NumPy-Discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From sturla at molden.no Tue Dec 21 14:51:18 2010 From: sturla at molden.no (Sturla Molden) Date: Tue, 21 Dec 2010 20:51:18 +0100 Subject: [Numpy-discussion] Reversing an array in-place In-Reply-To: References: Message-ID: <79490512fd8661f01cf9675aa235d9bb.squirrel@webmail.uio.no> Chuck wrote: > The reversed matrix is a view, no copyihg is done. It is even faster than > an inplace reversal. This is why I love NumPy. In C, Fortran or Matlab most programmers would probably form the reversed array. In NumPy we just change some metainformation (data pointer and strides) behind the scenes. It cannot be done more efficiently than that. Sturla From robert.kern at gmail.com Tue Dec 21 14:53:11 2010 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 21 Dec 2010 13:53:11 -0600 Subject: [Numpy-discussion] sample without replacement In-Reply-To: <4D0F843F.3070705@gmail.com> References: <4D0F843F.3070705@gmail.com> Message-ID: On Mon, Dec 20, 2010 at 10:28, Alan G Isaac wrote: > I want to sample *without* replacement from a vector > (as with Python's random.sample). ?I don't see a direct > replacement for this, and I don't want to carry two > PRNG's around. ?Is the best way something like ?this? > > ? ? ? ?permutation(myvector)[:samplesize] For one of my personal projects, I copied over the mtrand package and added a method to RandomState for doing this kind of thing using reservoir sampling. http://en.wikipedia.org/wiki/Reservoir_sampling def subset_reservoir(self, long nselected, long ntotal, object size=None): """ Sample a given number integers from the set [0, ntotal) without replacement using a reservoir algorithm. Parameters ---------- nselected : int The number of integers to sample. ntotal : int The size of the set to sample from. size : int, sequence of ints, or None The number of subsets to sample or a shape tuple. An axis of the length nselected will be appended to a shape. Returns ------- out : ndarray The sampled subsets. The order of the items is not necessarily random. Use a slice from the result of permutation() if you need the order of the items to be randomized. 
""" cdef long total_size, length, i, j, u cdef cnp.ndarray[cnp.int_t, ndim=2] out if size is None: shape = (nselected,) total_size = nselected length = 1 elif isinstance(size, int): shape = (size, nselected) total_size = size * nselected length = size else: shape = size + (nselected,) length = 1 for i from 0 <= i < len(size): length *= size[i] total_size = length * nselected out = np.empty((length, nselected), dtype=int) for i from 0 <= i < length: for j from 0 <= j < nselected: out[i,j] = j for j from nselected <= j < ntotal: u = rk_interval(j+1, self.internal_state) if u < nselected: out[i,u] = j return out.reshape(shape) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From mwwiebe at gmail.com Tue Dec 21 19:53:55 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 21 Dec 2010 16:53:55 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs Message-ID: Hello NumPy-ers, After some performance analysis, I've designed and implemented a new iterator designed to speed up ufuncs and allow for easier multi-dimensional iteration. The new code is fairly large, but works quite well already. If some people could read the NEP and give some feedback, that would be great! Here's a link: https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst I would also love it if someone could try building the code and play around with it a bit. The github branch is here: https://github.com/m-paradox/numpy/tree/new_iterator To give a taste of the iterator's functionality, below is an example from the NEP for how to implement a "Lambda UFunc." With just a few lines of code, it's possible to replicate something similar to the numexpr library (numexpr still gets a bigger speedup, though). In the example expression I chose, execution time went from 138ms to 61ms. Hopefully this is a good Christmas present for NumPy. :) Cheers, Mark Here is the definition of the ``luf`` function.:: def luf(lamdaexpr, *args, **kwargs): """Lambda UFunc e.g. c = luf(lambda i,j:i+j, a, b, order='K', casting='safe', buffersize=8192) c = np.empty(...) luf(lambda i,j:i+j, a, b, out=c, order='K', casting='safe', buffersize=8192) """ nargs = len(args) op = args + (kwargs.get('out',None),) it = np.newiter(op, ['buffered','no_inner_iteration'], [['readonly','nbo_aligned']]*nargs + [['writeonly','allocate','no_broadcast']], order=kwargs.get('order','K'), casting=kwargs.get('casting','safe'), buffersize=kwargs.get('buffersize',0)) while not it.finished: it[-1] = lamdaexpr(*it[:-1]) it.iternext() return it.operands[-1] Then, by using ``luf`` instead of straight Python expressions, we can gain some performance from better cache behavior.:: In [2]: a = np.random.random((50,50,50,10)) In [3]: b = np.random.random((50,50,1,10)) In [4]: c = np.random.random((50,50,50,1)) In [5]: timeit 3*a+b-(a/c) 1 loops, best of 3: 138 ms per loop In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) 10 loops, best of 3: 60.9 ms per loop In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) Out[7]: True -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Tue Dec 21 19:59:15 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 21 Dec 2010 16:59:15 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: That is an amazing christmas present. 
On Tue, Dec 21, 2010 at 4:53 PM, Mark Wiebe wrote: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new > iterator designed to speed up ufuncs and allow for easier multi-dimensional > iteration. The new code is fairly large, but works quite well already. If > some people could read the NEP and give some feedback, that would be great! > Here's a link: > > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst > > I would also love it if someone could try building the code and play around > with it a bit. The github branch is here: > > https://github.com/m-paradox/numpy/tree/new_iterator > > To give a taste of the iterator's functionality, below is an example from > the NEP for how to implement a "Lambda UFunc." With just a few lines of > code, it's possible to replicate something similar to the numexpr library > (numexpr still gets a bigger speedup, though). In the example expression I > chose, execution time went from 138ms to 61ms. > > Hopefully this is a good Christmas present for NumPy. :) > > Cheers, > Mark > > Here is the definition of the ``luf`` function.:: > > def luf(lamdaexpr, *args, **kwargs): > """Lambda UFunc > > e.g. > c = luf(lambda i,j:i+j, a, b, order='K', > casting='safe', buffersize=8192) > > c = np.empty(...) > luf(lambda i,j:i+j, a, b, out=c, order='K', > casting='safe', buffersize=8192) > """ > > nargs = len(args) > op = args + (kwargs.get('out',None),) > it = np.newiter(op, ['buffered','no_inner_iteration'], > [['readonly','nbo_aligned']]*nargs + > [['writeonly','allocate','no_broadcast']], > order=kwargs.get('order','K'), > casting=kwargs.get('casting','safe'), > buffersize=kwargs.get('buffersize',0)) > while not it.finished: > it[-1] = lamdaexpr(*it[:-1]) > it.iternext() > > return it.operands[-1] > > Then, by using ``luf`` instead of straight Python expressions, we > can gain some performance from better cache behavior.:: > > In [2]: a = np.random.random((50,50,50,10)) > In [3]: b = np.random.random((50,50,1,10)) > In [4]: c = np.random.random((50,50,50,1)) > > In [5]: timeit 3*a+b-(a/c) > 1 loops, best of 3: 138 ms per loop > > In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 60.9 ms per loop > > In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) > Out[7]: True > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Tue Dec 21 20:12:15 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 21 Dec 2010 17:12:15 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: On Mon, Dec 20, 2010 at 1:42 PM, John Salvatier wrote: > A while ago, I asked a whether it was possible to multi-iterate over > several ndarrays but exclude a certain axis( > http://www.mail-archive.com/numpy-discussion at scipy.org/msg29204.html), > sort of a combination of PyArray_IterAllButAxis and PyArray_MultiIterNew. My > goal was to allow creation of relatively complex ufuncs that can allow > reduction or directionally dependent computation and still use broadcasting > (for example a moving averaging ufunc that can have changing averaging > parameters). I didn't get any solutions, which I take to mean that no one > knew how to do this. 
> > I am thinking about trying to make a numpy patch with this functionality, > and I have some questions: 1) How difficult would this kind of task be for > someone with non-expert C knowledge and good numpy knowledge? 2) Does anyone > have advice on how to do this kind of thing? > You may be able to do what you would like with the new iterator I've written. In particular, it supports nesting multiple iterators by providing either pointers or offsets, and allowing you to specify any subset of the axes to iterate. Here's how the code to do this in a simple 3D case might look, for making axis 1 the inner loop: PyArrayObject *op[2] = {a,b}; npy_intp axes_outer[2] = {0,2}}; npy_intp *op_axes[2]; npy_intp axis_inner = 1; npy_int32 flags[2] = {NPY_ITER_READONLY, NPY_ITER_READONLY}; NpyIter *outer, *inner; NpyIter_IterNext_Fn oiternext, iiternext; npy_intp *ooffsets; char **idataptrs; op_axes[0] = op_axes[1] = axes_outer; outer = NpyIter_MultiNew(2, op, NPY_ITER_OFFSETS, NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 2, op_axes, 0); op_axes[0] = op_axes[1] = &axis_inner; inner = NpyIter_MultiNew(2, op, 0, NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 1, op_axes, 0); oiternext = NpyIter_GetIterNext(outer); iiternext = NpyIter_GetIterNext(inner); ooffsets = (npy_intp *)NpyIter_GetDataPtrArray(outer); idataptrs = NpyIter_GetDataPtrArray(inner); do { do { char *a_data = idataptrs[0] + ooffsets[0], *b_data = idataptrs[0] + ooffsets[0]; /* Do stuff with the data */ } while(iiternext()); NpyIter_Reset(inner); } while(oiternext()); NpyIter_Deallocate(outer); NpyIter_Deallocate(inner); Extending to more dimensions, or making both the inner and outer loops have multiple dimensions, isn't too crazy. Is this along the lines of what you need? If you check out my code, note that it currently isn't exposed as NumPy API yet, but you can try a lot of things with the Python exposure. Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From jsalvati at u.washington.edu Tue Dec 21 21:00:49 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 21 Dec 2010 18:00:49 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: I applaud you on your vision. I only have one small suggestion: I suggest you put a table of contents at the beginning of your NEP so people may skip to the part that most interests them. On Tue, Dec 21, 2010 at 4:59 PM, John Salvatier wrote: > That is an amazing christmas present. > > On Tue, Dec 21, 2010 at 4:53 PM, Mark Wiebe wrote: > >> Hello NumPy-ers, >> >> After some performance analysis, I've designed and implemented a new >> iterator designed to speed up ufuncs and allow for easier multi-dimensional >> iteration. The new code is fairly large, but works quite well already. If >> some people could read the NEP and give some feedback, that would be great! >> Here's a link: >> >> >> https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst >> >> I would also love it if someone could try building the code and play >> around with it a bit. The github branch is here: >> >> https://github.com/m-paradox/numpy/tree/new_iterator >> >> To give a taste of the iterator's functionality, below is an example from >> the NEP for how to implement a "Lambda UFunc." With just a few lines of >> code, it's possible to replicate something similar to the numexpr library >> (numexpr still gets a bigger speedup, though). In the example expression I >> chose, execution time went from 138ms to 61ms. 
>> >> Hopefully this is a good Christmas present for NumPy. :) >> >> Cheers, >> Mark >> >> Here is the definition of the ``luf`` function.:: >> >> def luf(lamdaexpr, *args, **kwargs): >> """Lambda UFunc >> >> e.g. >> c = luf(lambda i,j:i+j, a, b, order='K', >> casting='safe', buffersize=8192) >> >> c = np.empty(...) >> luf(lambda i,j:i+j, a, b, out=c, order='K', >> casting='safe', buffersize=8192) >> """ >> >> nargs = len(args) >> op = args + (kwargs.get('out',None),) >> it = np.newiter(op, ['buffered','no_inner_iteration'], >> [['readonly','nbo_aligned']]*nargs + >> [['writeonly','allocate','no_broadcast']], >> order=kwargs.get('order','K'), >> casting=kwargs.get('casting','safe'), >> buffersize=kwargs.get('buffersize',0)) >> while not it.finished: >> it[-1] = lamdaexpr(*it[:-1]) >> it.iternext() >> >> return it.operands[-1] >> >> Then, by using ``luf`` instead of straight Python expressions, we >> can gain some performance from better cache behavior.:: >> >> In [2]: a = np.random.random((50,50,50,10)) >> In [3]: b = np.random.random((50,50,1,10)) >> In [4]: c = np.random.random((50,50,50,1)) >> >> In [5]: timeit 3*a+b-(a/c) >> 1 loops, best of 3: 138 ms per loop >> >> In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) >> 10 loops, best of 3: 60.9 ms per loop >> >> In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) >> Out[7]: True >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at silveregg.co.jp Tue Dec 21 21:06:35 2010 From: david at silveregg.co.jp (David) Date: Wed, 22 Dec 2010 11:06:35 +0900 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: <4D115D2B.7070904@silveregg.co.jp> Hi Mark, On 12/22/2010 09:53 AM, Mark Wiebe wrote: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new > iterator designed to speed up ufuncs and allow for easier > multi-dimensional iteration. The new code is fairly large, but works > quite well already. If some people could read the NEP and give some > feedback, that would be great! Here's a link: > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst This looks pretty cool. I hope to be able to take a look at it during the christmas holidays. I cannot comment in details yet, but it seems to address several issues I encountered myself while implementing the neighborhood iterator (which I will try to update to use the new one). One question: which CPU/platform did you test it on ? cheers, David From mwwiebe at gmail.com Tue Dec 21 21:15:06 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 21 Dec 2010 18:15:06 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: That's a good suggestion - added. Unfortunately, it looks like the github rst converter doesn't make a table of contents with working links. Cheers, Mark On Tue, Dec 21, 2010 at 6:00 PM, John Salvatier wrote: > I applaud you on your vision. I only have one small suggestion: I suggest > you put a table of contents at the beginning of your NEP so people may skip > to the part that most interests them. > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From mwwiebe at gmail.com Tue Dec 21 21:20:32 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Tue, 21 Dec 2010 18:20:32 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <4D115D2B.7070904@silveregg.co.jp> References: <4D115D2B.7070904@silveregg.co.jp> Message-ID: On Tue, Dec 21, 2010 at 6:06 PM, David wrote: > > This looks pretty cool. I hope to be able to take a look at it during > the christmas holidays. > Thanks! > > I cannot comment in details yet, but it seems to address several issues > I encountered myself while implementing the neighborhood iterator (which > I will try to update to use the new one). > > One question: which CPU/platform did you test it on ? > The system I'm testing on is a bit old: a dual core Athlon 64 X2 4200+ on 64-bit Fedora 13, gcc 4.4.5. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Wed Dec 22 01:05:36 2010 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 21 Dec 2010 23:05:36 -0700 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 5:53 PM, Mark Wiebe wrote: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new > iterator designed to speed up ufuncs and allow for easier multi-dimensional > iteration. The new code is fairly large, but works quite well already. If > some people could read the NEP and give some feedback, that would be great! > Here's a link: > > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst > > I would also love it if someone could try building the code and play around > with it a bit. The github branch is here: > > https://github.com/m-paradox/numpy/tree/new_iterator > > To give a taste of the iterator's functionality, below is an example from > the NEP for how to implement a "Lambda UFunc." With just a few lines of > code, it's possible to replicate something similar to the numexpr library > (numexpr still gets a bigger speedup, though). In the example expression I > chose, execution time went from 138ms to 61ms. > > Hopefully this is a good Christmas present for NumPy. :) > > Cheers, > Mark > > Here is the definition of the ``luf`` function.:: > > def luf(lamdaexpr, *args, **kwargs): > """Lambda UFunc > > e.g. > c = luf(lambda i,j:i+j, a, b, order='K', > casting='safe', buffersize=8192) > > c = np.empty(...) > luf(lambda i,j:i+j, a, b, out=c, order='K', > casting='safe', buffersize=8192) > """ > > nargs = len(args) > op = args + (kwargs.get('out',None),) > it = np.newiter(op, ['buffered','no_inner_iteration'], > [['readonly','nbo_aligned']]*nargs + > [['writeonly','allocate','no_broadcast']], > order=kwargs.get('order','K'), > casting=kwargs.get('casting','safe'), > buffersize=kwargs.get('buffersize',0)) > while not it.finished: > it[-1] = lamdaexpr(*it[:-1]) > it.iternext() > > return it.operands[-1] > > Then, by using ``luf`` instead of straight Python expressions, we > can gain some performance from better cache behavior.:: > > In [2]: a = np.random.random((50,50,50,10)) > In [3]: b = np.random.random((50,50,1,10)) > In [4]: c = np.random.random((50,50,50,1)) > > In [5]: timeit 3*a+b-(a/c) > 1 loops, best of 3: 138 ms per loop > > In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 60.9 ms per loop > > In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) > Out[7]: True > > > Wow, that's a really nice design and write up. 
Small typo: /* Only allow exactly equivalent types */ NPY_NO_CASTING=0, /* Allow casts between equivalent types of different byte orders */ NPY_EQUIV_CASTING=0, Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From cmutel at gmail.com Wed Dec 22 02:51:46 2010 From: cmutel at gmail.com (Christopher Mutel) Date: Wed, 22 Dec 2010 08:51:46 +0100 Subject: [Numpy-discussion] take from structured array is faster than boolean indexing, but reshapes columns to 2D Message-ID: Dear all- Structured arrays are great, but I am having problems filtering them efficiently. Reading through the mailing list, it seems like boolean arrays are the recommended approach to filtering arrays for arbitrary conditions, but my testing shows that a combination of take and where can be much faster when dealing with structured arrays: import timeit setup = "from numpy import random, where, zeros; r = random.random_integers(1e3, size=1e6); q = zeros((1e6), dtype=[('foo', 'u4'), ('bar', 'u4'), ('baz', 'u4')]); q['foo'] = r" statement1 = "s = q.take(where(q['foo'] < 500))" statement2 = "s = q[q['foo'] < 500]" t = timeit.Timer(statement1, setup) t.timeit(10) t = timeit.Timer(statement2, setup) t.timeit(10) Using the boolean array is about 4 times slower when dealing with large arrays. In my case, these operations are supposed to happen on a web server with a large number of requests, so the efficiency gain is important. However, the combination of take and where reshapes the columns of structured arrays to be 2-dimensional: q['foo'].shape >> (1000000,) s = q[q['foo'] < 500] s['foo'].shape >> (499102,) s = q.take(where(q['foo'] < 500)) s['foo'].shape >> (1, 499102) Is there a way to use this seemingly more efficient approach (take & where) and not have to manually reshape the columns? This seems ungainly for larger structured arrays. Or should I file this as a bug? Perhaps there are even more efficient approaches that I haven't thought of, but are obvious to others? Thanks in advance, Yours, -Chris -- ############################ Chris Mutel ?kologisches Systemdesign - Ecological Systems Design Institut f.Umweltingenieurwissenschaften - Institute for Environmental Engineering ETH Z?rich - HIF C 42 - Schafmattstr. 6 8093 Z?rich Telefon: +41 44 633 71 45 - Fax: +41 44 633 10 61 ############################ From faltet at pytables.org Wed Dec 22 03:21:39 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 09:21:39 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: <201012220921.39669.faltet@pytables.org> A Wednesday 22 December 2010 01:53:55 Mark Wiebe escrigu?: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new > iterator designed to speed up ufuncs and allow for easier > multi-dimensional iteration. The new code is fairly large, but > works quite well already. If some people could read the NEP and > give some feedback, that would be great! Here's a link: > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator > -ufunc.rst > > I would also love it if someone could try building the code and play > around with it a bit. The github branch is here: > > https://github.com/m-paradox/numpy/tree/new_iterator > > To give a taste of the iterator's functionality, below is an example > from the NEP for how to implement a "Lambda UFunc." 
With just a few > lines of code, it's possible to replicate something similar to the > numexpr library (numexpr still gets a bigger speedup, though). In > the example expression I chose, execution time went from 138ms to > 61ms. > > Hopefully this is a good Christmas present for NumPy. :) > > Cheers, > Mark > > Here is the definition of the ``luf`` function.:: > > def luf(lamdaexpr, *args, **kwargs): > """Lambda UFunc > > e.g. > c = luf(lambda i,j:i+j, a, b, order='K', > casting='safe', buffersize=8192) > > c = np.empty(...) > luf(lambda i,j:i+j, a, b, out=c, order='K', > casting='safe', buffersize=8192) > """ > > nargs = len(args) > op = args + (kwargs.get('out',None),) > it = np.newiter(op, ['buffered','no_inner_iteration'], > [['readonly','nbo_aligned']]*nargs + > > [['writeonly','allocate','no_broadcast']], > order=kwargs.get('order','K'), > casting=kwargs.get('casting','safe'), > buffersize=kwargs.get('buffersize',0)) > while not it.finished: > it[-1] = lamdaexpr(*it[:-1]) > it.iternext() > > return it.operands[-1] > > Then, by using ``luf`` instead of straight Python expressions, we > can gain some performance from better cache behavior.:: > > In [2]: a = np.random.random((50,50,50,10)) > In [3]: b = np.random.random((50,50,1,10)) > In [4]: c = np.random.random((50,50,50,1)) > > In [5]: timeit 3*a+b-(a/c) > 1 loops, best of 3: 138 ms per loop > > In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 60.9 ms per loop > > In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, > c)) Out[7]: True Wow, really nice work! It would be great if that could make into NumPy :-) Regarding your comment on numexpr being faster, I'm not sure (your new_iterator branch does not work for me; it gives me an error like: AttributeError: 'module' object has no attribute 'newiter'), but my guess is that your approach seems actually faster: >>> a = np.random.random((50,50,50,10)) >>> b = np.random.random((50,50,1,10)) >>> c = np.random.random((50,50,50,1)) >>> timeit 3*a+b-(a/c) 10 loops, best of 3: 67.5 ms per loop >>> import numexpr as ne >>> ne.evaluate("3*a+b-(a/c) >>> timeit ne.evaluate("3*a+b-(a/c)") 10 loops, best of 3: 42.8 ms per loop i.e. numexpr is not able to achieve the 2x speedup mark that you are getting with ``luf`` (using a Core2 @ 3 GHz here). -- Francesc Alted From jsalvati at u.washington.edu Wed Dec 22 03:44:28 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Wed, 22 Dec 2010 00:44:28 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: This now makes sense to me, and I think it should work :D. This is all very cool. This is going to do big things for cython and numpy. Some hopefully constructive criticism: When first reading through the API description, the way oa_ndim and oa_axes work is not clear. I think your description would be clearer if you explain what oa_ndim means (I gather something like "the number of axes over which you wish to iterate"), currently it just says "These parameters let you control in detail how the axes of the operand arrays get matched together and iterated." It's also not totally clear to me how offsetting works. What are the offsets measured from? It seems like they are measured from another iterator, but I'm not sure and I don't see how it gets that information. 
John On Tue, Dec 21, 2010 at 5:12 PM, Mark Wiebe wrote: > On Mon, Dec 20, 2010 at 1:42 PM, John Salvatier > wrote: > >> A while ago, I asked a whether it was possible to multi-iterate over >> several ndarrays but exclude a certain axis( >> http://www.mail-archive.com/numpy-discussion at scipy.org/msg29204.html), >> sort of a combination of PyArray_IterAllButAxis and PyArray_MultiIterNew. My >> goal was to allow creation of relatively complex ufuncs that can allow >> reduction or directionally dependent computation and still use broadcasting >> (for example a moving averaging ufunc that can have changing averaging >> parameters). I didn't get any solutions, which I take to mean that no one >> knew how to do this. >> >> I am thinking about trying to make a numpy patch with this functionality, >> and I have some questions: 1) How difficult would this kind of task be for >> someone with non-expert C knowledge and good numpy knowledge? 2) Does anyone >> have advice on how to do this kind of thing? >> > > You may be able to do what you would like with the new iterator I've > written. In particular, it supports nesting multiple iterators by providing > either pointers or offsets, and allowing you to specify any subset of the > axes to iterate. Here's how the code to do this in a simple 3D case might > look, for making axis 1 the inner loop: > > PyArrayObject *op[2] = {a,b}; > npy_intp axes_outer[2] = {0,2}}; > npy_intp *op_axes[2]; > npy_intp axis_inner = 1; > npy_int32 flags[2] = {NPY_ITER_READONLY, NPY_ITER_READONLY}; > NpyIter *outer, *inner; > NpyIter_IterNext_Fn oiternext, iiternext; > npy_intp *ooffsets; > char **idataptrs; > > op_axes[0] = op_axes[1] = axes_outer; > outer = NpyIter_MultiNew(2, op, NPY_ITER_OFFSETS, > NPY_KEEPORDER, NPY_NO_CASTING, flags, NULL, 2, > op_axes, 0); > op_axes[0] = op_axes[1] = &axis_inner; > inner = NpyIter_MultiNew(2, op, 0, NPY_KEEPORDER, NPY_NO_CASTING, flags, > NULL, 1, op_axes, 0); > > oiternext = NpyIter_GetIterNext(outer); > iiternext = NpyIter_GetIterNext(inner); > > ooffsets = (npy_intp *)NpyIter_GetDataPtrArray(outer); > idataptrs = NpyIter_GetDataPtrArray(inner); > > do { > do { > char *a_data = idataptrs[0] + ooffsets[0], *b_data = idataptrs[0] + > ooffsets[0]; > /* Do stuff with the data */ > } while(iiternext()); > NpyIter_Reset(inner); > } while(oiternext()); > > NpyIter_Deallocate(outer); > NpyIter_Deallocate(inner); > > Extending to more dimensions, or making both the inner and outer loops have > multiple dimensions, isn't too crazy. Is this along the lines of what you > need? > > If you check out my code, note that it currently isn't exposed as NumPy API > yet, but you can try a lot of things with the Python exposure. > > Cheers, > Mark > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Dec 22 03:48:03 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 00:48:03 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012220921.39669.faltet@pytables.org> References: <201012220921.39669.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 12:21 AM, Francesc Alted wrote: > > Wow, really nice work! 
It would be great if that could make into NumPy > :-) Regarding your comment on numexpr being faster, I'm not sure (your > new_iterator branch does not work for me; it gives me an error like: > AttributeError: 'module' object has no attribute 'newiter'), What are you using to build it? So far I've just modified the setup.py scripts, I still need to add it to numscons. > but my > guess is that your approach seems actually faster: > > >>> a = np.random.random((50,50,50,10)) > >>> b = np.random.random((50,50,1,10)) > >>> c = np.random.random((50,50,50,1)) > >>> timeit 3*a+b-(a/c) > 10 loops, best of 3: 67.5 ms per loop > >>> import numexpr as ne > >>> ne.evaluate("3*a+b-(a/c) > >>> timeit ne.evaluate("3*a+b-(a/c)") > 10 loops, best of 3: 42.8 ms per loop > > i.e. numexpr is not able to achieve the 2x speedup mark that you are > getting with ``luf`` (using a Core2 @ 3 GHz here). > That's promising! I based my assertion on getting a slower speedup than numexpr does on their front page example. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Wed Dec 22 03:54:16 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 09:54:16 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012220921.39669.faltet@pytables.org> Message-ID: <201012220954.16336.faltet@pytables.org> A Wednesday 22 December 2010 09:48:03 Mark Wiebe escrigu?: > On Wed, Dec 22, 2010 at 12:21 AM, Francesc Alted wrote: > > > > Wow, really nice work! It would be great if that could make into > > NumPy > > > > :-) Regarding your comment on numexpr being faster, I'm not sure > > :(your > > > > new_iterator branch does not work for me; it gives me an error > > like: AttributeError: 'module' object has no attribute 'newiter'), > > What are you using to build it? So far I've just modified the > setup.py scripts, I still need to add it to numscons. Well, just the typical "git clone ...; python setup.py install" dance. > > i.e. numexpr is not able to achieve the 2x speedup mark that you > > are getting with ``luf`` (using a Core2 @ 3 GHz here). > > That's promising! I based my assertion on getting a slower speedup > than numexpr does on their front page example. I see :-) Well, I'd think that numexpr is not specially efficient when handling broadcasting, so this might be the reason your approach is faster. I suppose that with operands with the same shape, things might look different. -- Francesc Alted From michel.dupront at hotmail.fr Wed Dec 22 04:55:30 2010 From: michel.dupront at hotmail.fr (Michel Dupront) Date: Wed, 22 Dec 2010 10:55:30 +0100 Subject: [Numpy-discussion] use of the PyArray_SETITEM numpy method Message-ID: Hello, Please somebody help me ! I am really confused with all these void that I see everywhere. The documentation says: PyObject * PyArray_SETITEM(PyObject * arr, void* itemptr, PyObject* obj) I created a 1D array of doubles with PyArray_SimpleNew. Now I am in a loop of index i to feed my array with values. What should I give for itemptr ? Thank you -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Wed Dec 22 05:11:17 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 11:11:17 +0100 Subject: [Numpy-discussion] use of the PyArray_SETITEM numpy method In-Reply-To: References: Message-ID: <201012221111.17356.faltet@pytables.org> A Wednesday 22 December 2010 10:55:30 Michel Dupront escrigu?: > Hello, > > Please somebody help me ! 
> > I am really confused with all these void that I see everywhere. > > The documentation says: > PyObject * PyArray_SETITEM(PyObject * arr, void* itemptr, PyObject* > obj) > > I created a 1D array of doubles with PyArray_SimpleNew. > Now I am in a loop of index i to feed my array with values. > What should I give for itemptr ? The pointer to your data indeed. For example, if you declare your item as: double myitem then, pass it as &myitem. -- Francesc Alted From ndbecker2 at gmail.com Wed Dec 22 08:08:33 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 22 Dec 2010 08:08:33 -0500 Subject: [Numpy-discussion] savetxt to a string? Message-ID: Is the formatting of savetxt available with a string as the destination? If not, shouldn't this functionality be factored out of savetxt? From numpy-discussion at maubp.freeserve.co.uk Wed Dec 22 08:26:03 2010 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Dec 2010 13:26:03 +0000 Subject: [Numpy-discussion] savetxt to a string? In-Reply-To: References: Message-ID: On Wed, Dec 22, 2010 at 1:08 PM, Neal Becker wrote: > > Is the formatting of savetxt available with a string as the destination? > > If not, shouldn't this functionality be factored out of savetxt? > Have you tried using a StringIO handle? Peter From ndbecker2 at gmail.com Wed Dec 22 08:53:22 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 22 Dec 2010 08:53:22 -0500 Subject: [Numpy-discussion] savetxt to a string? References: Message-ID: Peter wrote: > On Wed, Dec 22, 2010 at 1:08 PM, Neal Becker wrote: >> >> Is the formatting of savetxt available with a string as the destination? >> >> If not, shouldn't this functionality be factored out of savetxt? >> > > Have you tried using a StringIO handle? > > Peter Yup. But wouldn't it be cleaner to factor out this functionality and make saving to a file use this? From ijstokes at hkl.hms.harvard.edu Wed Dec 22 09:16:17 2010 From: ijstokes at hkl.hms.harvard.edu (Ian Stokes-Rees) Date: Wed, 22 Dec 2010 09:16:17 -0500 Subject: [Numpy-discussion] counting non-zero entries in an ndarray Message-ID: <4D120831.80805@hkl.hms.harvard.edu> What is the most efficient way to do the Matlab equivalent of nnz(M) (nnz = number-of-non-zeros function)? I've tried Google, but no luck. My assumption is that something like a != 0 will be used, but I'm not sure then how to "count" the number of "True" entries. TIA. Ian -------------- next part -------------- A non-text attachment was scrubbed... Name: ijstokes.vcf Type: text/x-vcard Size: 380 bytes Desc: not available URL: From alan.isaac at gmail.com Wed Dec 22 09:20:02 2010 From: alan.isaac at gmail.com (Alan G Isaac) Date: Wed, 22 Dec 2010 09:20:02 -0500 Subject: [Numpy-discussion] counting non-zero entries in an ndarray In-Reply-To: <4D120831.80805@hkl.hms.harvard.edu> References: <4D120831.80805@hkl.hms.harvard.edu> Message-ID: <4D120912.20701@gmail.com> On 12/22/2010 9:16 AM, Ian Stokes-Rees wrote: > a != 0 > > will be used, but I'm not sure then how to "count" the number of "True" > entries. (a != 0).sum() hth, Alan Isaac From numpy-discussion at maubp.freeserve.co.uk Wed Dec 22 09:33:05 2010 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Wed, 22 Dec 2010 14:33:05 +0000 Subject: [Numpy-discussion] savetxt to a string? In-Reply-To: References: Message-ID: On Wed, Dec 22, 2010 at 1:53 PM, Neal Becker wrote: > > Peter wrote: > >> On Wed, Dec 22, 2010 at 1:08 PM, Neal Becker wrote: >>> >>> Is the formatting of savetxt available with a string as the destination? 
>>> >>> If not, shouldn't this functionality be factored out of savetxt? >>> >> >> Have you tried using a StringIO handle? >> >> Peter > > Yup. ?But wouldn't it be cleaner to factor out this functionality and make > saving to a file use this? I doubt it. I would think from a code point of view no - taking filenames or handles means you can write everything using handle.write(...) statements, and send the data to disk (or a StringIO handle) gradually. This scales well with large data. If you wanted a single code base to support filenames, handles, or output as a string I think you'd be forced to build a large string in memory, and then (if applicable) write it to disk (in one go). This won't scale well with large data. Peter From ndbecker2 at gmail.com Wed Dec 22 10:24:02 2010 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 22 Dec 2010 10:24:02 -0500 Subject: [Numpy-discussion] savetxt to a string? References: Message-ID: Peter wrote: > On Wed, Dec 22, 2010 at 1:53 PM, Neal Becker wrote: >> >> Peter wrote: >> >>> On Wed, Dec 22, 2010 at 1:08 PM, Neal Becker >>> wrote: >>>> >>>> Is the formatting of savetxt available with a string as the >>>> destination? >>>> >>>> If not, shouldn't this functionality be factored out of savetxt? >>>> >>> >>> Have you tried using a StringIO handle? >>> >>> Peter >> >> Yup. But wouldn't it be cleaner to factor out this functionality and >> make saving to a file use this? > > I doubt it. > > I would think from a code point of view no - taking filenames or handles > means you can write everything using handle.write(...) statements, and > send the data to disk (or a StringIO handle) gradually. This scales well > with large data. > > If you wanted a single code base to support filenames, handles, or > output as a string I think you'd be forced to build a large string in > memory, and then (if applicable) write it to disk (in one go). This > won't scale well with large data. > > Peter Good point. From mwwiebe at gmail.com Wed Dec 22 11:21:24 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 08:21:24 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: On Tue, Dec 21, 2010 at 10:05 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > Wow, that's a really nice design and write up. Small typo: > > /* Only allow exactly equivalent types */ > NPY_NO_CASTING=0, > /* Allow casts between equivalent types of different byte orders */ > > NPY_EQUIV_CASTING=0, > > Good catch, turns out the test that should have caught it was broken too. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Dec 22 11:25:13 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 08:25:13 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012220954.16336.faltet@pytables.org> References: <201012220921.39669.faltet@pytables.org> <201012220954.16336.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 12:54 AM, Francesc Alted wrote: > A Wednesday 22 December 2010 09:48:03 Mark Wiebe escrigu?: > > On Wed, Dec 22, 2010 at 12:21 AM, Francesc Alted > wrote: > > > > > > new_iterator branch does not work for me; it gives me an error > > > like: AttributeError: 'module' object has no attribute 'newiter'), > > > > What are you using to build it? So far I've just modified the > > setup.py scripts, I still need to add it to numscons. > > Well, just the typical "git clone ...; python setup.py install" dance. 
> Can you print out your np.__version__, and try running the tests? If newiter didn't build for some reason, its tests should be throwing a bunch of exceptions. > > > i.e. numexpr is not able to achieve the 2x speedup mark that you > > > are getting with ``luf`` (using a Core2 @ 3 GHz here). > > > > That's promising! I based my assertion on getting a slower speedup > > than numexpr does on their front page example. > > I see :-) Well, I'd think that numexpr is not specially efficient when > handling broadcasting, so this might be the reason your approach is > faster. I suppose that with operands with the same shape, things might > look different. > I haven't looked at the numexpr code, but I think the ufuncs will need SSE versions to make up part of the remaining difference. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From mwwiebe at gmail.com Wed Dec 22 11:54:11 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 08:54:11 -0800 Subject: [Numpy-discussion] Giving numpy the ability to multi-iterate excluding an axis In-Reply-To: References: Message-ID: On Wed, Dec 22, 2010 at 12:44 AM, John Salvatier wrote: > This now makes sense to me, and I think it should work :D. This is all very > cool. This is going to do big things for cython and numpy. > > Some hopefully constructive criticism: > > When first reading through the API description, the way oa_ndim and oa_axes > work is not clear. I think your description would be clearer if you explain > what oa_ndim means (I gather something like "the number of axes over which > you wish to iterate"), currently it just says "These parameters let you > control in detail how the axes of the operand arrays get matched together > and iterated." > Thanks, I've tried to clean up the description a bit. > It's also not totally clear to me how offsetting works. What are the > offsets measured from? It seems like they are measured from another > iterator, but I'm not sure and I don't see how it gets that information. > I added an example to the NEP to try to make it more clear, here's what I wrote: To help understand how the offsets work, here is a simple nested iteration example. Let's say our array a has shape (2, 3, 4), and strides (48, 16, 4). The data pointer for element (i, j, k) is at address PyArray_BYTES(a) + 48*i + 16*j + 4*k. Now consider two iterators with custom op_axes (0,1) and (2,). The first one will produce addresses like PyArray_BYTES(a) + 48*i + 16*j, and the second one will produce addresses likePyArray_BYTES(a) + 4*k. Simply adding together these values would produce invalid pointers. Instead, we can make the outer iterator produce offsets, in which case it will produce the values 48*i + 16*j, and its sum with the other iterator's pointer gives the correct data address. It's important to note that this will not work if any of the iterators share an axis. The iterator cannot check this, so your code must handle it. Additionally, taking a look at the ndarray strides documentation might help: http://docs.scipy.org/doc/numpy/reference/generated/numpy.ndarray.strides.html Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From faltet at pytables.org Wed Dec 22 12:07:13 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 18:07:13 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012220954.16336.faltet@pytables.org> Message-ID: <201012221807.13516.faltet@pytables.org> A Wednesday 22 December 2010 17:25:13 Mark Wiebe escrigu?: > Can you print out your np.__version__, and try running the tests? If > newiter didn't build for some reason, its tests should be throwing a > bunch of exceptions. I'm a bit swamped now. Let's see if I can do that later on. > > I see :-) Well, I'd think that numexpr is not specially efficient > > when handling broadcasting, so this might be the reason your > > approach is faster. I suppose that with operands with the same > > shape, things might look different. > > I haven't looked at the numexpr code, but I think the ufuncs will > need SSE versions to make up part of the remaining difference. Uh, I doubt that SSE can do a lot for accelerating operations like 3*a+b-(a/c), as this computation is mainly bounded by memory (although threading does certainly help). Numexpr can use SSE only via Intel's VML, which is very good for accelerating the computation of transcendental functions (sin, cos, sqrt, exp, log...). -- Francesc Alted From mwwiebe at gmail.com Wed Dec 22 12:21:28 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 09:21:28 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012221807.13516.faltet@pytables.org> References: <201012220954.16336.faltet@pytables.org> <201012221807.13516.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 9:07 AM, Francesc Alted wrote: > A Wednesday 22 December 2010 17:25:13 Mark Wiebe escrigu?: > > Can you print out your np.__version__, and try running the tests? If > > newiter didn't build for some reason, its tests should be throwing a > > bunch of exceptions. > > I'm a bit swamped now. Let's see if I can do that later on. > Ok. > > I see :-) Well, I'd think that numexpr is not specially efficient > > > when handling broadcasting, so this might be the reason your > > > approach is faster. I suppose that with operands with the same > > > shape, things might look different. > > > > I haven't looked at the numexpr code, but I think the ufuncs will > > need SSE versions to make up part of the remaining difference. > > Uh, I doubt that SSE can do a lot for accelerating operations like > 3*a+b-(a/c), as this computation is mainly bounded by memory (although > threading does certainly help). Numexpr can use SSE only via Intel's > VML, which is very good for accelerating the computation of > transcendental functions (sin, cos, sqrt, exp, log...). > The reason I think it might help is that with 'luf' is that it's calculating the expression on smaller sized arrays, which possibly just got buffered. If the memory allocator for the temporaries keeps giving back the same addresses, all this will be in one of the caches very close to the CPU. Unless this cache is still too slow to feed the SSE instructions, there should be a speed benefit. The ufunc inner loops could also use the SSE prefetch instructions based on the stride to give some strong hints about where the next memory bytes to use will be. -Mark -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From tkg at lanl.gov Wed Dec 22 12:43:55 2010 From: tkg at lanl.gov (Thomas K Gamble) Date: Wed, 22 Dec 2010 10:43:55 -0700 Subject: [Numpy-discussion] counting non-zero entries in an ndarray In-Reply-To: <4D120831.80805@hkl.hms.harvard.edu> References: <4D120831.80805@hkl.hms.harvard.edu> Message-ID: <201012221043.55668.tkg@lanl.gov> On Wednesday, December 22, 2010 07:16:17 am Ian Stokes-Rees wrote: > What is the most efficient way to do the Matlab equivalent of nnz(M) > (nnz = number-of-non-zeros function)? > > I've tried Google, but no luck. > > My assumption is that something like > > a != 0 > > will be used, but I'm not sure then how to "count" the number of "True" > entries. > > TIA. > > Ian one possibility: len(where(a != 0)[0]) -- Thomas K. Gamble Research Technologist, System/Network Administrator Chemical Diagnostics and Engineering (C-CDE) Los Alamos National Laboratory MS-E543,p:505-665-4323 f:505-665-4267 There cannot be a crisis next week. My schedule is already full. Henry Kissinger From faltet at pytables.org Wed Dec 22 13:41:10 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 19:41:10 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012221807.13516.faltet@pytables.org> Message-ID: <201012221941.11032.faltet@pytables.org> A Wednesday 22 December 2010 18:21:28 Mark Wiebe escrigu?: > On Wed, Dec 22, 2010 at 9:07 AM, Francesc Alted wrote: > > A Wednesday 22 December 2010 17:25:13 Mark Wiebe escrigu?: > > > Can you print out your np.__version__, and try running the tests? > > > If newiter didn't build for some reason, its tests should be > > > throwing a bunch of exceptions. $ PYTHONPATH=numpy python -c "import numpy; numpy.test()" Running unit tests for numpy NumPy version 2.0.0.dev-147f817 NumPy is installed in /tmp/numpy/numpy Python version 2.6.1 (r261:67515, Feb 3 2009, 17:34:37) [GCC 4.3.2 [gcc-4_3-branch revision 141291]] nose version 0.11.0 [clip] Warning: divide by zero encountered in log Warning: divide by zero encountered in log [clip] Ran 3094 tests in 16.771s OK (KNOWNFAIL=4, SKIP=1) IPython seems to work well too: >>> np.__version__ '2.0.0.dev-147f817' >>> timeit 3*a+b-(a/c) 10 loops, best of 3: 67.5 ms per loop However, when trying you luf function: >>> cpaste [the luf code here] -- >>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) [clip] AttributeError: 'module' object has no attribute 'newiter' > The reason I think it might help is that with 'luf' is that it's > calculating the expression on smaller sized arrays, which possibly > just got buffered. If the memory allocator for the temporaries keeps > giving back the same addresses, all this will be in one of the > caches very close to the CPU. Unless this cache is still too slow to > feed the SSE instructions, there should be a speed benefit. The > ufunc inner loops could also use the SSE prefetch instructions based > on the stride to give some strong hints about where the next memory > bytes to use will be. Ah, okay. However, Numexpr is not meant to accelerate calculations with small operands. I suppose that this is where your new iterator makes more sense: accelerating operations where some of the operands are small (i.e. fit in cache) and have to be broadcasted to match the dimensionality of the others. 
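Just for reference, the no-broadcasting comparison I have in mind (all
operands with the same shape, so only the chunking and the temporaries
matter) would simply be:

>>> a = np.random.random((50,50,50,10))
>>> b = np.random.random((50,50,50,10))
>>> c = np.random.random((50,50,50,10))
>>> timeit 3*a+b-(a/c)
>>> timeit ne.evaluate("3*a+b-(a/c)")
>>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c)

No figures from me yet, though; I'll run this once I manage to build the
new_iterator branch.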
-- Francesc Alted From mwwiebe at gmail.com Wed Dec 22 13:52:45 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 10:52:45 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012221941.11032.faltet@pytables.org> References: <201012221807.13516.faltet@pytables.org> <201012221941.11032.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 10:41 AM, Francesc Alted wrote: > NumPy version 2.0.0.dev-147f817 > There's your problem, it looks like the PYTHONPATH isn't seeing your new build for some reason. That build is off of this commit in the NumPy master branch: https://github.com/numpy/numpy/commit/147f817eefd5efa56fa26b03953a51d533cc27ec > The reason I think it might help is that with 'luf' is that it's > > calculating the expression on smaller sized arrays, which possibly > > just got buffered. If the memory allocator for the temporaries keeps > > giving back the same addresses, all this will be in one of the > > caches very close to the CPU. Unless this cache is still too slow to > > feed the SSE instructions, there should be a speed benefit. The > > ufunc inner loops could also use the SSE prefetch instructions based > > on the stride to give some strong hints about where the next memory > > bytes to use will be. > > Ah, okay. However, Numexpr is not meant to accelerate calculations with > small operands. I suppose that this is where your new iterator makes > more sense: accelerating operations where some of the operands are small > (i.e. fit in cache) and have to be broadcasted to match the > dimensionality of the others. > It's not about small operands, but small chunks of the operands at a time, with temporary arrays for intermediate calculations. It's the small chunks + temporaries which must fit in cache to get the benefit, not the whole array. The numexpr front page explains this fairly well in the section "Why It Works": http://code.google.com/p/numexpr/#Why_It_Works -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Wed Dec 22 13:58:41 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 19:58:41 +0100 Subject: [Numpy-discussion] ANN: carray 0.3 released Message-ID: <201012221958.41105.faltet@pytables.org> ===================== Announcing carray 0.3 ===================== What's new ========== A lot of stuff. The most outstanding feature in this version is the introduction of a `ctable` object. A `ctable` is similar to a structured array in NumPy, but instead of storing the data row-wise, it uses a column-wise arrangement. This allows for much better performance for very wide tables, which is one of the scenarios where a `ctable` makes more sense. Of course, as `ctable` is based on `carray` objects, it inherits all its niceties (like on-the-flight compression and fast iterators). Also, the `carray` object itself has received many improvements, like new constructors (arange(), fromiter(), zeros(), ones(), fill()), iterators (where(), wheretrue()) or resize mehtods (resize(), trim()). Most of these also work with the new `ctable`. Besides, Numexpr is supported now (but it is optional) in order to carry out stunningly fast queries on `ctable` objects. For example, doing a query on a table with one million rows and one thousand columns can be up to 2x faster than using a plain structured array, and up to 20x faster than using SQLite (using the ":memory:" backend and indexing). See 'bench/ctable-query.py' for details. 
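If you want a quick feel for the new constructors before reading the
manual, a minimal session would look something like this (just a sketch;
please check the manual for the exact signatures and defaults):

  >>> import carray as ca
  >>> a = ca.arange(1e7)    # one of the new constructors
  >>> a.sum()               # reductions work directly on the compressed container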
Finally, binaries for Windows (both 32-bit and 64-bit) are provided. For more detailed info, see the release notes in: https://github.com/FrancescAlted/carray/wiki/Release-0.3 What it is ========== carray is a container for numerical data that can be compressed in-memory. The compression process is carried out internally by Blosc, a high-performance compressor that is optimized for binary data. Having data compressed in-memory can reduce the stress of the memory subsystem. The net result is that carray operations may be faster than using a traditional ndarray object from NumPy. carray also supports fully 64-bit addressing (both in UNIX and Windows). Below, a carray with 1 trillion of rows has been created (7.3 TB total), filled with zeros, modified some positions, and finally, summed-up:: >>> %time b = ca.zeros(1e12) CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s Wall time: 55.23 s >>> %time b[[1, 1e9, 1e10, 1e11, 1e12-1]] = (1,2,3,4,5) CPU times: user 2.08 s, sys: 0.00 s, total: 2.08 s Wall time: 2.09 s >>> b carray((1000000000000,), float64) nbytes: 7450.58 GB; cbytes: 2.27 GB; ratio: 3275.35 cparams := cparams(clevel=5, shuffle=True) [0.0, 1.0, 0.0, ..., 0.0, 0.0, 5.0] >>> %time b.sum() CPU times: user 10.08 s, sys: 0.00 s, total: 10.08 s Wall time: 10.15 s 15.0 ['%time' is a magic function provided by the IPyhton shell] Please note that the example above is provided for demonstration purposes only. Do not try to run this at home unless you have more than 3 GB of RAM available, or you will get into trouble. Resources ========= Visit the main carray site repository at: http://github.com/FrancescAlted/carray You can download a source package from: http://carray.pytables.org/download Manual: http://carray.pytables.org/manual Home of Blosc compressor: http://blosc.pytables.org User's mail list: carray at googlegroups.com http://groups.google.com/group/carray Share your experience ===================== Let us know of any bugs, suggestions, gripes, kudos, etc. you may have. ---- Enjoy! -- Francesc Alted From faltet at pytables.org Wed Dec 22 14:16:52 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 20:16:52 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012221941.11032.faltet@pytables.org> Message-ID: <201012222016.52648.faltet@pytables.org> A Wednesday 22 December 2010 19:52:45 Mark Wiebe escrigu?: > On Wed, Dec 22, 2010 at 10:41 AM, Francesc Alted wrote: > > NumPy version 2.0.0.dev-147f817 > > There's your problem, it looks like the PYTHONPATH isn't seeing your > new build for some reason. That build is off of this commit in the > NumPy master branch: > > https://github.com/numpy/numpy/commit/147f817eefd5efa56fa26b03953a51d > 533cc27ec Uh, I think I'm a bit lost here. I've cloned this repo: $ git clone git://github.com/m-paradox/numpy.git Is that wrong? > > Ah, okay. However, Numexpr is not meant to accelerate calculations > > with small operands. I suppose that this is where your new > > iterator makes more sense: accelerating operations where some of > > the operands are small (i.e. fit in cache) and have to be > > broadcasted to match the dimensionality of the others. > > It's not about small operands, but small chunks of the operands at a > time, with temporary arrays for intermediate calculations. It's the > small chunks + temporaries which must fit in cache to get the > benefit, not the whole array. 
But you need to transport those small chunks from main memory to cache before you can start doing the computation for this piece, right? This is what I'm saying that the bottleneck for evaluating arbitrary expressions (like "3*a+b-(a/c)", i.e. not including transcendental functions, nor broadcasting) is memory bandwidth (and more in particular RAM bandwidth). > The numexpr front page explains this > fairly well in the section "Why It Works": > > http://code.google.com/p/numexpr/#Why_It_Works I know. I wrote that part (based on the notes by David Cooke, the original author ;-) -- Francesc Alted From mwwiebe at gmail.com Wed Dec 22 14:42:54 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 11:42:54 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012222016.52648.faltet@pytables.org> References: <201012221941.11032.faltet@pytables.org> <201012222016.52648.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 11:16 AM, Francesc Alted wrote: > A Wednesday 22 December 2010 19:52:45 Mark Wiebe escrigu?: > > On Wed, Dec 22, 2010 at 10:41 AM, Francesc Alted > wrote: > > > NumPy version 2.0.0.dev-147f817 > > > > There's your problem, it looks like the PYTHONPATH isn't seeing your > > new build for some reason. That build is off of this commit in the > > NumPy master branch: > > > > https://github.com/numpy/numpy/commit/147f817eefd5efa56fa26b03953a51d > > 533cc27ec > > Uh, I think I'm a bit lost here. I've cloned this repo: > > $ git clone git://github.com/m-paradox/numpy.git > > Is that wrong? > That's right, it was my mistake to assume that the page for a branch on github would give you that branch. You need the 'new_iterator' branch, so after that clone, you should do this: $ git checkout origin/new_iterator > > Ah, okay. However, Numexpr is not meant to accelerate calculations > > > with small operands. I suppose that this is where your new > > > iterator makes more sense: accelerating operations where some of > > > the operands are small (i.e. fit in cache) and have to be > > > broadcasted to match the dimensionality of the others. > > > > It's not about small operands, but small chunks of the operands at a > > time, with temporary arrays for intermediate calculations. It's the > > small chunks + temporaries which must fit in cache to get the > > benefit, not the whole array. > > But you need to transport those small chunks from main memory to cache > before you can start doing the computation for this piece, right? This > is what I'm saying that the bottleneck for evaluating arbitrary > expressions (like "3*a+b-(a/c)", i.e. not including transcendental > functions, nor broadcasting) is memory bandwidth (and more in particular > RAM bandwidth). > In the example expression, I believe the evaluation would go something like this. Assuming the memory allocator keeps giving back the same locations to 'luf', all temporary variables will already be in cache after the first chunk. temp1 = 3 * a # a is read from main memory temp2 = temp1 + b # b is read from main memory temp3 = a / c # a is already in cache, c is read from main memory result = temp2 + temp3 # result is written to data from main memory So there are 4 reads and writes to chunks from outside of the cache, but 12 total reads and writes to chunks, so speeding up the parts already in cache would appear to be beneficial. The benefit will get better with more complicated expressions. 
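In plain NumPy code, the chunked scheme would look something like the sketch below (the block size and the same-shape assumption are mine; this is not numexpr's actual implementation):

import numpy as np

def blocked_expr(a, b, c, blocksize=8192):
    # evaluate 3*a+b-(a/c) one block at a time, reusing small temporaries
    # (assumes a, b and c all have the same shape, i.e. no broadcasting)
    out = np.empty(a.shape)
    af, bf, cf, of = a.reshape(-1), b.reshape(-1), c.reshape(-1), out.reshape(-1)
    t1, t2, t3 = np.empty(blocksize), np.empty(blocksize), np.empty(blocksize)
    for i in range(0, af.size, blocksize):
        s = slice(i, min(i + blocksize, af.size))
        n = s.stop - s.start
        np.multiply(af[s], 3, t1[:n])       # a: read from main memory
        np.add(t1[:n], bf[s], t2[:n])       # b: read from main memory
        np.divide(af[s], cf[s], t3[:n])     # a: already in cache; c: read
        np.subtract(t2[:n], t3[:n], of[s])  # result: written back to main memory
    return out

Since t1/t2/t3 are reused for every block, after the first pass they live in one of the caches close to the CPU.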
I think as long as the operation is slower than a memcpy, the RAM bandwidth isn't the main bottleneck to be concerned with, but instead produces an upper bound on performance. I'm not sure how to precisely measure that overhead, though. > > > The numexpr front page explains this > > fairly well in the section "Why It Works": > > > > http://code.google.com/p/numexpr/#Why_It_Works > > I know. I wrote that part (based on the notes by David Cooke, the > original author ;-) > Cool :) -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Wed Dec 22 15:05:09 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 21:05:09 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012222016.52648.faltet@pytables.org> Message-ID: <201012222105.09278.faltet@pytables.org> A Wednesday 22 December 2010 20:42:54 Mark Wiebe escrigu?: > On Wed, Dec 22, 2010 at 11:16 AM, Francesc Alted wrote: > > A Wednesday 22 December 2010 19:52:45 Mark Wiebe escrigu?: > > > On Wed, Dec 22, 2010 at 10:41 AM, Francesc Alted > > > > wrote: > > > > NumPy version 2.0.0.dev-147f817 > > > > > > There's your problem, it looks like the PYTHONPATH isn't seeing > > > your new build for some reason. That build is off of this > > > commit in the NumPy master branch: > > > > > > https://github.com/numpy/numpy/commit/147f817eefd5efa56fa26b03953 > > > a51d 533cc27ec > > > > Uh, I think I'm a bit lost here. I've cloned this repo: > > > > $ git clone git://github.com/m-paradox/numpy.git > > > > Is that wrong? > > That's right, it was my mistake to assume that the page for a branch > on github would give you that branch. You need the 'new_iterator' > branch, so after that clone, you should do this: > > $ git checkout origin/new_iterator Ah, things go well now: >>> timeit 3*a+b-(a/c) 10 loops, best of 3: 67.7 ms per loop >>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) 10 loops, best of 3: 27.8 ms per loop >>> timeit ne.evaluate("3*a+b-(a/c)") 10 loops, best of 3: 42.8 ms per loop So, yup, I'm seeing the good speedup here too :-) > > But you need to transport those small chunks from main memory to > > cache before you can start doing the computation for this piece, > > right? This is what I'm saying that the bottleneck for evaluating > > arbitrary expressions (like "3*a+b-(a/c)", i.e. not including > > transcendental functions, nor broadcasting) is memory bandwidth > > (and more in particular RAM bandwidth). > > In the example expression, I believe the evaluation would go > something like this. Assuming the memory allocator keeps giving > back the same locations to 'luf', all temporary variables will > already be in cache after the first chunk. > > temp1 = 3 * a # a is read from main memory > temp2 = temp1 + b # b is read from main memory > temp3 = a / c # a is already in cache, c is read from > main memory > result = temp2 + temp3 # result is written to data from main memory > > So there are 4 reads and writes to chunks from outside of the cache, > but 12 total reads and writes to chunks, so speeding up the parts > already in cache would appear to be beneficial. The benefit will > get better with more complicated expressions. I think as long as > the operation is slower than a memcpy, the RAM bandwidth isn't the > main bottleneck to be concerned with, but instead produces an upper > bound on performance. I'm not sure how to precisely measure that > overhead, though. 
Well, see the timings for the non-broadcasting case: >>> a = np.random.random((50,50,50,10)) >>> b = np.random.random((50,50,50,10)) >>> c = np.random.random((50,50,50,10)) >>> timeit 3*a+b-(a/c) 10 loops, best of 3: 31.1 ms per loop >>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) 10 loops, best of 3: 24.5 ms per loop >>> timeit ne.evaluate("3*a+b-(a/c)") 100 loops, best of 3: 10.4 ms per loop However, the above comparison is not fair, as numexpr uses all your cores by default (2 for the case above). If we force using only one core: >>> ne.set_num_threads(1) >>> timeit ne.evaluate("3*a+b-(a/c)") 100 loops, best of 3: 16 ms per loop which is still faster than luf. In this case numexpr was not using SSE, but in case luf does so, this does not imply better speed. -- Francesc Alted From jrocher at enthought.com Wed Dec 22 15:29:54 2010 From: jrocher at enthought.com (Jonathan Rocher) Date: Wed, 22 Dec 2010 14:29:54 -0600 Subject: [Numpy-discussion] counting non-zero entries in an ndarray In-Reply-To: <201012221043.55668.tkg@lanl.gov> References: <4D120831.80805@hkl.hms.harvard.edu> <201012221043.55668.tkg@lanl.gov> Message-ID: To answer the part about the most efficient way to do that, In [1]: a = array([0,1,4,76,3,0,4,67,9,5,3,9,0,5,23,3,0,5,3,3,0,5,0]) In [8]: %timeit len(where(a!=0)[0]) 100000 loops, best of 3: 6.54 us per loop In [9]: %timeit (a!=0).sum() 100000 loops, best of 3: 9.81 us per loop Seems like the where option is faster. Now I create a large array In [13]: a = hstack([a,a,a,a,a,a,a,a,a,a,a,a]) In [14]: %timeit len(where(a!=0)[0]) 100000 loops, best of 3: 12.3 us per loop In [15]: %timeit (a!=0).sum() 100000 loops, best of 3: 11 us per loop Now the fastest way is using the sum. The where function is not vectorized because it doesn't know in advance the size of the final array. In the case of a big array, there will be a lot of copy in the memory, as it grows. And the difference increases fast... In [20]: a = hstack([a,a,a,a,a,a,a,a,a,a,a,a]) In [21]: %timeit len(where(a!=0)[0]) 10000 loops, best of 3: 79.1 us per loop In [22]: %timeit (a!=0).sum() 10000 loops, best of 3: 24.5 us per loop Regards, Jonathan On Wed, Dec 22, 2010 at 11:43 AM, Thomas K Gamble wrote: > On Wednesday, December 22, 2010 07:16:17 am Ian Stokes-Rees wrote: > > What is the most efficient way to do the Matlab equivalent of nnz(M) > > (nnz = number-of-non-zeros function)? > > > > I've tried Google, but no luck. > > > > My assumption is that something like > > > > a != 0 > > > > will be used, but I'm not sure then how to "count" the number of "True" > > entries. > > > > TIA. > > > > Ian > > one possibility: > > len(where(a != 0)[0]) > > -- > Thomas K. Gamble > Research Technologist, System/Network Administrator > Chemical Diagnostics and Engineering (C-CDE) > Los Alamos National Laboratory > MS-E543,p:505-665-4323 f:505-665-4267 > > There cannot be a crisis next week. My schedule is already full. > Henry Kissinger > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Jonathan Rocher, Enthought, Inc. jrocher at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... 
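For really big arrays where even the boolean temporary is a concern, counting in chunks keeps the extra memory bounded (a quick sketch; the chunk size is arbitrary and a contiguous array is assumed):

import numpy as np

def nnz_chunked(a, chunk=1 << 20):
    # count non-zeros one block at a time, so the boolean temporary
    # never grows beyond 'chunk' elements
    flat = a.reshape(-1)
    total = 0
    for i in range(0, flat.size, chunk):
        total += int((flat[i:i + chunk] != 0).sum())
    return total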
URL: From mwwiebe at gmail.com Wed Dec 22 15:42:43 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 12:42:43 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: <201012222105.09278.faltet@pytables.org> References: <201012222016.52648.faltet@pytables.org> <201012222105.09278.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 12:05 PM, Francesc Alted wrote: > > > Ah, things go well now: > > >>> timeit 3*a+b-(a/c) > 10 loops, best of 3: 67.7 ms per loop > >>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 27.8 ms per loop > >>> timeit ne.evaluate("3*a+b-(a/c)") > 10 loops, best of 3: 42.8 ms per loop > > So, yup, I'm seeing the good speedup here too :-) > Great! > > Well, see the timings for the non-broadcasting case: > > >>> a = np.random.random((50,50,50,10)) > >>> b = np.random.random((50,50,50,10)) > >>> c = np.random.random((50,50,50,10)) > > >>> timeit 3*a+b-(a/c) > 10 loops, best of 3: 31.1 ms per loop > >>> timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 24.5 ms per loop > >>> timeit ne.evaluate("3*a+b-(a/c)") > 100 loops, best of 3: 10.4 ms per loop > > However, the above comparison is not fair, as numexpr uses all your > cores by default (2 for the case above). If we force using only one > core: > > >>> ne.set_num_threads(1) > >>> timeit ne.evaluate("3*a+b-(a/c)") > 100 loops, best of 3: 16 ms per loop > > which is still faster than luf. In this case numexpr was not using SSE, > but in case luf does so, this does not imply better speed. Ok, I get pretty close to the same ratios (and my machine feels a bit slow...): In [6]: timeit 3*a+b-(a/c) 10 loops, best of 3: 101 ms per loop In [7]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) 10 loops, best of 3: 53.4 ms per loop In [8]: timeit ne.evaluate("3*a+b-(a/c)") 10 loops, best of 3: 27.8 ms per loop In [9]: ne.set_num_threads(1) In [10]: timeit ne.evaluate("3*a+b-(a/c)") 10 loops, best of 3: 33.6 ms per loop I think the closest to a "memcpy" we can do here would be just adding, which shows the expression evaluation can be estimated to have 20% overhead. While that's small compared the speedup over straight NumPy, I think it's still worth considering. In [11]: timeit ne.evaluate("a+b+c") 10 loops, best of 3: 27.9 ms per loop Even just switching from add to divide gives more than 10% overhead. With SSE2 these divides could be done two at a time for doubles or four at a time for floats to cut that down. In [12]: timeit ne.evaluate("a/b/c") 10 loops, best of 3: 31.7 ms per loop This all shows that the 'luf' Python interpreter overhead is still pretty big, the new iterator can't defeat numexpr by itself. I think numexpr could get a nice boost from using the new iterator internally though - if I go back to the original motivation, different memory orderings, 'luf' is 10x faster than single-threaded numexpr. In [15]: a = np.random.random((50,50,50,10)).T In [16]: b = np.random.random((50,50,50,10)).T In [17]: c = np.random.random((50,50,50,10)).T In [18]: timeit ne.evaluate("3*a+b-(a/c)") 1 loops, best of 3: 556 ms per loop In [19]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) 10 loops, best of 3: 52.5 ms per loop Cheers, Mark -------------- next part -------------- An HTML attachment was scrubbed... 
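To see why the memory ordering matters so much, it is enough to look at the strides (the numbers below are for float64 and these particular shapes):

import numpy as np

a = np.random.random((50, 50, 50, 10))
at = a.T                  # same data, strides reversed

# C order: the last axis is contiguous (8 bytes per step).
# After .T, stepping along the last axis jumps 200000 bytes, so a naive
# elementwise loop in C index order thrashes the cache.
print a.strides           # (200000, 4000, 80, 8)
print at.strides          # (8, 80, 4000, 200000)

An iterator that is free to pick the iteration order (order='K', as in 'luf') can walk the transposed operands in their natural memory order, which is where most of that 10x comes from.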
URL: From mwwiebe at gmail.com Wed Dec 22 16:02:03 2010 From: mwwiebe at gmail.com (Mark Wiebe) Date: Wed, 22 Dec 2010 13:02:03 -0800 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: <201012222016.52648.faltet@pytables.org> <201012222105.09278.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 12:42 PM, Mark Wiebe wrote: > I think numexpr could get a nice boost from using the new iterator > internally though > There's actually a trivial way to do this with very minimal changes to numexpr - the 'itview' mechanism. Create the new iterator, call NpyIter_GetIterView(it,i) (or it.itviews in Python) to get compatibly reordered views of the inputs, then continue with the existing code. -Mark -------------- next part -------------- An HTML attachment was scrubbed... URL: From ijstokes at hkl.hms.harvard.edu Wed Dec 22 16:10:05 2010 From: ijstokes at hkl.hms.harvard.edu (Ian Stokes-Rees) Date: Wed, 22 Dec 2010 16:10:05 -0500 Subject: [Numpy-discussion] counting non-zero entries in an ndarray In-Reply-To: <4D120831.80805@hkl.hms.harvard.edu> References: <4D120831.80805@hkl.hms.harvard.edu> Message-ID: <4D12692D.8060508@hkl.hms.harvard.edu> On 12/22/10 9:16 AM, Ian Stokes-Rees wrote: > What is the most efficient way to do the Matlab equivalent of nnz(M) > (nnz = number-of-non-zeros function)? Thanks to all the various responses. I should have mentioned that I'm using scipy.sparse, and lil_matrix objects have a method "getnnz()" which gives me the number I want. Ian From faltet at pytables.org Wed Dec 22 16:20:40 2010 From: faltet at pytables.org (Francesc Alted) Date: Wed, 22 Dec 2010 22:20:40 +0100 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: <201012222220.40355.faltet@pytables.org> A Wednesday 22 December 2010 22:02:03 Mark Wiebe escrigu?: > On Wed, Dec 22, 2010 at 12:42 PM, Mark Wiebe > wrote: > > > I think numexpr could get a nice boost from using the new iterator > > internally though > > There's actually a trivial way to do this with very minimal changes > to numexpr - the 'itview' mechanism. Create the new iterator, call > NpyIter_GetIterView(it,i) (or it.itviews in Python) to get > compatibly reordered views of the inputs, then continue with the > existing code. That's interesting. I'll think about this (patches are very welcome too!). Thanks! -- Francesc Alted From ijstokes at hkl.hms.harvard.edu Wed Dec 22 16:32:23 2010 From: ijstokes at hkl.hms.harvard.edu (Ian Stokes-Rees) Date: Wed, 22 Dec 2010 16:32:23 -0500 Subject: [Numpy-discussion] How to control column for pretty print line wrap of ndarrays Message-ID: <4D126E67.3020104@hkl.hms.harvard.edu> Like most people these days, I have multiple 24" monitors. I don't need "print" of ndarrays to wrap after 72 columns. Is there some way to change this? TIA Ian CURRENT: [ NaN NaN NaN NaN NaN 5.1882094 1.19646584]] DESIRED: [ NaN NaN NaN NaN NaN 5.1882094 1.19646584]] (Although your mail client mail mangle it...) From robert.kern at gmail.com Wed Dec 22 16:52:08 2010 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 22 Dec 2010 16:52:08 -0500 Subject: [Numpy-discussion] How to control column for pretty print line wrap of ndarrays In-Reply-To: <4D126E67.3020104@hkl.hms.harvard.edu> References: <4D126E67.3020104@hkl.hms.harvard.edu> Message-ID: On Wed, Dec 22, 2010 at 16:32, Ian Stokes-Rees wrote: > Like most people these days, I have multiple 24" monitors. ?I don't need > "print" of ndarrays to wrap after 72 columns. ?Is there some way to > change this? 
np.set_printoptions(linewidth=100) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From roger at quantumbioinc.com Thu Dec 23 08:13:08 2010 From: roger at quantumbioinc.com (Roger Martin) Date: Thu, 23 Dec 2010 08:13:08 -0500 Subject: [Numpy-discussion] Installing on CentOS 5 claims invalid Python installation Message-ID: <4D134AE4.6090001@quantumbioinc.com> Hi, NumPy looks like the way to get computation done in Python. Now I'm going through the learning curve of installing the module into different linux OS's and Python versions. An extra need is to install google code's h5py http://code.google.com/p/h5py/ which depends on numpy. In trying a number of Python versions the 2.x's are yielding the message " invalid Python installation" --------------- raise DistutilsPlatformError(my_msg) distutils.errors.DistutilsPlatformError: invalid Python installation: unable to open /home/roger/Python-2.6.6/dist/lib/python2.6/config/Makefile (No such file or directory) --------------- From reading on the web it appears a Python-2.x.x-devel version is needed. Yet no search combination comes back with where to get such a thing(note: I need user installs/builds for security reasons). Where are Python versions compatible with numpy? Building Python-2.6.6 Python-2.7.1(fails to build) Python3.2beta2 numpy1.5.1 invalid Python installation NA success h5py1.3.1 needs numpy NA fails To start I need just one successful combination but will need more cases depending on users of a new integration project. Interestingly your numpy 1.5.1's setup is in good shape to build with Python3.2 yet I need to allow older versions for people's systems not ready to upgrade that far. -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu Dec 23 14:58:25 2010 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 23 Dec 2010 13:58:25 -0600 Subject: [Numpy-discussion] Installing on CentOS 5 claims invalid Python installation In-Reply-To: <4D134AE4.6090001@quantumbioinc.com> References: <4D134AE4.6090001@quantumbioinc.com> Message-ID: <4D13A9E1.6030602@gmail.com> On 12/23/2010 07:13 AM, Roger Martin wrote: > Hi, > > NumPy looks like the way to get computation done in Python. Now I'm > going through the learning curve of installing the module into > different linux OS's and Python versions. An extra need is to install > google code's h5py http://code.google.com/p/h5py/ which depends on numpy. > > In trying a number of Python versions the 2.x's are yielding the > message " invalid Python installation" > --------------- > raise DistutilsPlatformError(my_msg) > distutils.errors.DistutilsPlatformError: invalid Python installation: > unable to open > /home/roger/Python-2.6.6/dist/lib/python2.6/config/Makefile (No such > file or directory) > --------------- > > From reading on the web it appears a Python-2.x.x-devel version is > needed. Yet no search combination comes back with where to get such a > thing(note: I need user installs/builds for security reasons). Where > are Python versions compatible with numpy? > > Building > Python-2.6.6 > Python-2.7.1(fails to build) > Python3.2beta2 > numpy1.5.1 > invalid Python installation NA > success > h5py1.3.1 > needs numpy > NA > fails > > > To start I need just one successful combination but will need more > cases depending on users of a new integration project. 
> > Interestingly your numpy 1.5.1's setup is in good shape to build with > Python3.2 yet I need to allow older versions for people's systems not > ready to upgrade that far. > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I thought that Centos 5 ships Python 2.4 so how did you get Python 2.6, 2.7 and 3.2? If these are from some repository then the developmental libraries should also be there - if these are not there then either find another repository or build Python yourself. Bruce -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Thu Dec 23 15:03:40 2010 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Thu, 23 Dec 2010 12:03:40 -0800 Subject: [Numpy-discussion] Installing on CentOS 5 claims invalid Python installation In-Reply-To: <4D134AE4.6090001@quantumbioinc.com> References: <4D134AE4.6090001@quantumbioinc.com> Message-ID: <4D13AB1C.8070006@noaa.gov> On 12/23/10 5:13 AM, Roger Martin wrote: > NumPy looks like the way to get computation done in Python. yup -- welcome! > Now I'm > going through the learning curve of installing the module into different > linux OS's and Python versions. hmm -- usually it's pretty straightforward on Linux (except maybe getting an optimized LAPACK, which you may or may not need). > An extra need is to install google > code's h5py http://code.google.com/p/h5py/ which depends on numpy. I'll leave that for the next step. > In trying a number of Python versions the 2.x's are yielding the message > " invalid Python installation" > --------------- > raise DistutilsPlatformError(my_msg) > distutils.errors.DistutilsPlatformError: invalid Python installation: > unable to open > /home/roger/Python-2.6.6/dist/lib/python2.6/config/Makefile (No such > file or directory) > --------------- > > From reading on the web it appears a Python-2.x.x-devel version is > needed. yup -- many of the Linux package systems split the stuff you need to run Python code from what you need to compile stuff against it -- common with other libs, packages as well. > Yet no search combination comes back with where to get such a > thing each distro has it's own naming convention -- look for anything like "python-devel", "python-dev", etc. > (note: I need user installs/builds for security reasons). Ahh -- a different story -- AFAIK (and I'm NOT an expert) the distro's packages will install python into system directories -- if you really need each user to have their own install, you may need to install from source. That should be pretty straight forward, too. Get the source tarball from python.org, and follow the build instructions. You'll need to specify a user install in that process somehow. The latest numpy should work with any recent python -- If you are free to choose, use 2.7.1 -- it's the latest production version of the 2.* series. 3.* is still a bit bleeding edge. YOU can grab the tarball here: http://www.python.org/download/releases/2.7.1/ Once you've got a python working, a simple "python setup.py install" should do for numpy. HTH, -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From paul.z.thunemann at boeing.com Thu Dec 23 17:24:02 2010 From: paul.z.thunemann at boeing.com (Thunemann, Paul Z) Date: Thu, 23 Dec 2010 14:24:02 -0800 Subject: [Numpy-discussion] numpy for jython Message-ID: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> I'd be very interested in hearing more about a numpy port to Java and Jython. If anyone has more info about how to get involved please let me know. -Zack From jsalvati at u.washington.edu Thu Dec 23 17:27:59 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Thu, 23 Dec 2010 14:27:59 -0800 Subject: [Numpy-discussion] numpy for jython In-Reply-To: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> References: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> Message-ID: I'm curious whether this kind of thing is expected to be relatively easy after the numpy refactor. On Thu, Dec 23, 2010 at 2:24 PM, Thunemann, Paul Z < paul.z.thunemann at boeing.com> wrote: > I'd be very interested in hearing more about a numpy port to Java and > Jython. If anyone has more info about how to get involved please let me > know. > > -Zack > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From numpy-discussion at maubp.freeserve.co.uk Thu Dec 23 17:50:58 2010 From: numpy-discussion at maubp.freeserve.co.uk (Peter) Date: Thu, 23 Dec 2010 22:50:58 +0000 Subject: [Numpy-discussion] numpy for jython In-Reply-To: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> References: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> Message-ID: On Thu, Dec 23, 2010 at 10:24 PM, Thunemann, Paul Z wrote: > > I'd be very interested in hearing more about a numpy port to Java and > Jython. ?If anyone has more info about how to get involved please let > me know. > > -Zack I'd find even a minimal version useful for Jython (just using pure Python) as long as it provided the basic data structures and some core functionality. I'm thinking here of other Python libraries that use NumPy, but don't necessarily need it for speed reasons alone. Peter From paul.z.thunemann at boeing.com Thu Dec 23 18:38:23 2010 From: paul.z.thunemann at boeing.com (Thunemann, Paul Z) Date: Thu, 23 Dec 2010 15:38:23 -0800 Subject: [Numpy-discussion] numpy for jython Message-ID: <11965FB92DB9D44490740AF7806ED4EB379941591F@XCH-NW-12V.nw.nos.boeing.com> If the refactor separates numpy from the Cpython objects and results in a clean C or C++ api, then porting to Java is still a chore but it's doable. I've used JNI and SWIG extensively to port math libraries and could get involved but I don't know who else might be working on this (if anyone). 
-Zack From david at silveregg.co.jp Thu Dec 23 21:39:42 2010 From: david at silveregg.co.jp (David) Date: Fri, 24 Dec 2010 11:39:42 +0900 Subject: [Numpy-discussion] numpy for jython In-Reply-To: References: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> Message-ID: <4D1407EE.20502@silveregg.co.jp> On 12/24/2010 07:27 AM, John Salvatier wrote: > I'm curious whether this kind of thing is expected to be relatively easy > after the numpy refactor. It would help, but it won't make it easy. I asked this exact question some time ago to Enthought developers, and java would be more complicated because there is no equivalent to C++/CLI in java world. Don't take my word for it, though, because I know very little about ways to wrap native code on the jvm (or CLR for that matter). I think more than one person is interested, though (I for one am more interested in the JVM than the CLR), cheers, David From bioinformed at gmail.com Thu Dec 23 23:34:07 2010 From: bioinformed at gmail.com (Kevin Jacobs ) Date: Thu, 23 Dec 2010 23:34:07 -0500 Subject: [Numpy-discussion] ANN: carray 0.3 released In-Reply-To: <201012221958.41105.faltet@pytables.org> References: <201012221958.41105.faltet@pytables.org> Message-ID: On Wed, Dec 22, 2010 at 1:58 PM, Francesc Alted wrote: > >>> %time b = ca.zeros(1e12) > CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s > Wall time: 55.23 s > I know this is somewhat missing the point of your demonstration, but 55 seconds to create an empty 3 GB data structure to represent a multi-TB dense array doesn't seem all that fast to me. Compression can do a lot of things, but isn't this a case where a true sparse data structure would be the right tool for the job? I'm more interested in seeing what a carray can do with census data, web logs, or somethat vaguely real world where direct binary representations are used by default and assumed to be reasonable optimal (i.e., anything sensibly stored in sqlite tables). -Kevin -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Fri Dec 24 00:24:21 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 23 Dec 2010 23:24:21 -0600 Subject: [Numpy-discussion] NEP for faster ufuncs In-Reply-To: References: Message-ID: <4E081B0A-3E30-48D6-9463-6BFA434FF9ED@enthought.com> This is very cool! I would like to see this get into NumPy 2.0. Thanks for all the great work! -Travis On Dec 21, 2010, at 6:53 PM, Mark Wiebe wrote: > Hello NumPy-ers, > > After some performance analysis, I've designed and implemented a new iterator designed to speed up ufuncs and allow for easier multi-dimensional iteration. The new code is fairly large, but works quite well already. If some people could read the NEP and give some feedback, that would be great! Here's a link: > > https://github.com/m-paradox/numpy/blob/mw_neps/doc/neps/new-iterator-ufunc.rst > > I would also love it if someone could try building the code and play around with it a bit. The github branch is here: > > https://github.com/m-paradox/numpy/tree/new_iterator > > To give a taste of the iterator's functionality, below is an example from the NEP for how to implement a "Lambda UFunc." With just a few lines of code, it's possible to replicate something similar to the numexpr library (numexpr still gets a bigger speedup, though). In the example expression I chose, execution time went from 138ms to 61ms. > > Hopefully this is a good Christmas present for NumPy. 
:) > > Cheers, > Mark > > Here is the definition of the ``luf`` function.:: > > def luf(lamdaexpr, *args, **kwargs): > """Lambda UFunc > > e.g. > c = luf(lambda i,j:i+j, a, b, order='K', > casting='safe', buffersize=8192) > > c = np.empty(...) > luf(lambda i,j:i+j, a, b, out=c, order='K', > casting='safe', buffersize=8192) > """ > > nargs = len(args) > op = args + (kwargs.get('out',None),) > it = np.newiter(op, ['buffered','no_inner_iteration'], > [['readonly','nbo_aligned']]*nargs + > [['writeonly','allocate','no_broadcast']], > order=kwargs.get('order','K'), > casting=kwargs.get('casting','safe'), > buffersize=kwargs.get('buffersize',0)) > while not it.finished: > it[-1] = lamdaexpr(*it[:-1]) > it.iternext() > > return it.operands[-1] > > Then, by using ``luf`` instead of straight Python expressions, we > can gain some performance from better cache behavior.:: > > In [2]: a = np.random.random((50,50,50,10)) > In [3]: b = np.random.random((50,50,1,10)) > In [4]: c = np.random.random((50,50,50,1)) > > In [5]: timeit 3*a+b-(a/c) > 1 loops, best of 3: 138 ms per loop > > In [6]: timeit luf(lambda a,b,c:3*a+b-(a/c), a, b, c) > 10 loops, best of 3: 60.9 ms per loop > > In [7]: np.all(3*a+b-(a/c) == luf(lambda a,b,c:3*a+b-(a/c), a, b, c)) > Out[7]: True > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion --- Travis Oliphant Enthought, Inc. oliphant at enthought.com 1-512-536-1057 http://www.enthought.com -------------- next part -------------- An HTML attachment was scrubbed... URL: From oliphant at enthought.com Fri Dec 24 00:29:48 2010 From: oliphant at enthought.com (Travis Oliphant) Date: Thu, 23 Dec 2010 23:29:48 -0600 Subject: [Numpy-discussion] numpy for jython In-Reply-To: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> References: <11965FB92DB9D44490740AF7806ED4EB37993A872D@XCH-NW-12V.nw.nos.boeing.com> Message-ID: <902DE6C3-621B-46EC-9482-33FCF9EF0E2F@enthought.com> On Dec 23, 2010, at 4:24 PM, Thunemann, Paul Z wrote: > I'd be very interested in hearing more about a numpy port to Java and Jython. If anyone has more info about how to get involved please let me know. The numpy-refactor should help with this. You basically need to write a Java extension which uses the new libndarray (presumably using JNI). I am not an expert on Java, but the design of NumPy / SciPy for .NET was to allow a NumPy for Java (and Jython) as well. There is probably about 1-3 man months of work however to build the interface. Currently, there are two interfaces: a CPython and a .NET interface (written in C#), a Java interface written in Java (using JNI for the native code interaction) would make NumPy available to Jython. The SciPy port to .NET is being accomplished by porting SciPy to use Cython and Fwrap. This will allow SciPy for Jython as well, once a Java / JNI backend to Cython is completed. 
-Travis From faltet at pytables.org Fri Dec 24 10:09:32 2010 From: faltet at pytables.org (Francesc Alted) Date: Fri, 24 Dec 2010 16:09:32 +0100 Subject: [Numpy-discussion] ANN: carray 0.3 released In-Reply-To: References: <201012221958.41105.faltet@pytables.org> Message-ID: 2010/12/24, Kevin Jacobs : > On Wed, Dec 22, 2010 at 1:58 PM, Francesc Alted wrote: > >> >>> %time b = ca.zeros(1e12) >> CPU times: user 54.76 s, sys: 0.03 s, total: 54.79 s >> Wall time: 55.23 s >> > > I know this is somewhat missing the point of your demonstration, but 55 > seconds to create an empty 3 GB data structure to represent a multi-TB dense > array doesn't seem all that fast to me. Yes, this was not the point of the demo, but just showing 64-bit addressing (a feature that I implemented recently and was eager to show). But, agreed, I'm guilty to show times, so your observation is pertinent. But mind that I'm not creating an *empty* structure, but a *zeroed* structure; that's a bit different (that does not mean that the process cannot be speed-up, but we all surely agree that there is little sense in optimizing this scenario ;-). > Compression can do a lot of things, > but isn't this a case where a true sparse data structure would be the right > tool for the job? I'm more interested in seeing what a carray can do with > census data, web logs, or somethat vaguely real world where direct binary > representations are used by default and assumed to be reasonable optimal > (i.e., anything sensibly stored in sqlite tables). Well, I'm just creating the tool; it is up to the users to find real-world applications. I'm pretty sure that some of you will find some good ones. Cheers! -- Francesc Alted From enzomich at gmail.com Sun Dec 26 03:51:57 2010 From: enzomich at gmail.com (Enzo Michelangeli) Date: Sun, 26 Dec 2010 16:51:57 +0800 Subject: [Numpy-discussion] Optimization suggestion sought Message-ID: For a pivoted algorithm, I have to perform an operation that in fully vectorized form can be expressed as: pivot = tableau[locat,:]/tableau[locat,cand] tableau -= tableau[:,cand:cand+1]*pivot tableau[locat,:] = pivot tableau is a rather large bidimensional array, and I'd like to avoid the allocation of a temporary array of the same size holding the result of the right-hand side expression in the second line of code (the outer product of tableau[:,cand] and pivot). On the other hand, if I replace that line with: for i in xrange(tableau.shape[0]): tableau[i] -= tableau[i,cand]*pivot ...I incur some CPU overhead for the "for" loop -- and this part of code is the botteneck of the whole algorithm. Is there any smarter (i.e., more time-efficient) way of achieving my goal? TIA -- Enzo From josef.pktd at gmail.com Sun Dec 26 09:34:10 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 26 Dec 2010 09:34:10 -0500 Subject: [Numpy-discussion] Optimization suggestion sought In-Reply-To: References: Message-ID: On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli wrote: > For a pivoted algorithm, I have to perform an operation that in fully > vectorized form can be expressed as: > > ? ?pivot = tableau[locat,:]/tableau[locat,cand] > ? ?tableau -= tableau[:,cand:cand+1]*pivot > ? ?tableau[locat,:] = pivot > > tableau is a rather large bidimensional array, and I'd like to avoid the > allocation of a temporary array of the same size holding the result of the > right-hand side expression in the second line of code (the outer product of > tableau[:,cand] and pivot). On the other hand, if I replace that line with: > > ? 
?for i in xrange(tableau.shape[0]): > ? ? ? ?tableau[i] -= tableau[i,cand]*pivot > > ...I incur some CPU overhead for the "for" loop -- and this part of code is > the botteneck of the whole algorithm. Is there any smarter (i.e., more > time-efficient) way of achieving my goal? just a generic answer: Working in batches can be a good compromise in some cases. I instead of working in a loop with one row at a time, loop and handle, for example, 1000 rows at a time. Josef > > TIA -- > > Enzo > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From Bruce_Sherwood at ncsu.edu Sun Dec 26 17:26:36 2010 From: Bruce_Sherwood at ncsu.edu (Bruce Sherwood) Date: Sun, 26 Dec 2010 15:26:36 -0700 Subject: [Numpy-discussion] How to call import_array() properly? Message-ID: In my Python code I have import cvisual cvisual.init_numpy() and in my C++ code I have void init_numpy() { import_array(); } import_array() in numpy/core/include/numpy/_multiarray_api.h is a macro: #if PY_VERSION_HEX >= 0x03000000 #define NUMPY_IMPORT_ARRAY_RETVAL NULL #else #define NUMPY_IMPORT_ARRAY_RETVAL #endif #define import_array() {if (_import_array() < 0) {PyErr_Print(); PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to import"); return NUMPY_IMPORT_ARRAY_RETVAL; } } Note that for Python 3 there is a change so that the macro returns NULL, whereas for Python 2 it returned nothing. On Windows and Mac, this works fine for Python 2, and it works fine for a with Python 3, but with either Microsoft Visual Studio 2008 or 2010 this fails for Python 3 with the message "'void' function returning a value", presumably due to the return NULL for Python 3, something that doesn't bother the Mac. So my dumb question is, how should I call import_array() from my routine init_numpy() to get around this problem? I have found a workaround, consisting of defining init_numpy to be of type int on Windows with Python 3, but this seems like an odd kludge, since it isn't needed on the Mac, and I think it's also not needed on Linux. Bruce Sherwood From Bruce_Sherwood at ncsu.edu Sun Dec 26 17:42:33 2010 From: Bruce_Sherwood at ncsu.edu (Bruce Sherwood) Date: Sun, 26 Dec 2010 15:42:33 -0700 Subject: [Numpy-discussion] How to call import_array() properly? In-Reply-To: References: Message-ID: I made a mistake: the Mac behaves the same way when I repeat the experiment. I guess I simply have to define init_numpy() to be of type int for Python 3 on both machines. Nevertheless, if you see a more elegant coding, I'd be interested. Thanks. Bruce Sherwood On Sun, Dec 26, 2010 at 3:26 PM, Bruce Sherwood wrote: > In my Python code I have > > import cvisual > cvisual.init_numpy() > > and in my C++ code I have > > void > init_numpy() > { > ? ?import_array(); > } > > import_array() in numpy/core/include/numpy/_multiarray_api.h is a macro: > > #if PY_VERSION_HEX >= 0x03000000 > #define NUMPY_IMPORT_ARRAY_RETVAL NULL > #else > #define NUMPY_IMPORT_ARRAY_RETVAL > #endif > > #define import_array() {if (_import_array() < 0) {PyErr_Print(); > PyErr_SetString(PyExc_ImportError, "numpy.core.multiarray failed to > import"); return NUMPY_IMPORT_ARRAY_RETVAL; } } > > Note that for Python 3 there is a change so that the macro returns > NULL, whereas for Python 2 it returned nothing. 
> > On Windows and Mac, this works fine for Python 2, and it works fine > for a with Python 3, but with either Microsoft Visual Studio 2008 or > 2010 this fails for Python 3 with the message "'void' function > returning a value", presumably due to the return NULL for Python 3, > something that doesn't bother the Mac. > > So my dumb question is, how should I call import_array() from my > routine init_numpy() to get around this problem? I have found a > workaround, consisting of defining init_numpy to be of type int on > Windows with Python 3, but this seems like an odd kludge, since it > isn't needed on the Mac, and I think it's also not needed on Linux. > > Bruce Sherwood > From jpscipy at gmail.com Mon Dec 27 01:51:23 2010 From: jpscipy at gmail.com (Justin Peel) Date: Sun, 26 Dec 2010 23:51:23 -0700 Subject: [Numpy-discussion] Optimization suggestion sought In-Reply-To: References: Message-ID: On Sun, Dec 26, 2010 at 7:34 AM, wrote: > On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli wrote: >> For a pivoted algorithm, I have to perform an operation that in fully >> vectorized form can be expressed as: >> >> ? ?pivot = tableau[locat,:]/tableau[locat,cand] >> ? ?tableau -= tableau[:,cand:cand+1]*pivot >> ? ?tableau[locat,:] = pivot >> >> tableau is a rather large bidimensional array, and I'd like to avoid the >> allocation of a temporary array of the same size holding the result of the >> right-hand side expression in the second line of code (the outer product of >> tableau[:,cand] and pivot). On the other hand, if I replace that line with: >> >> ? ?for i in xrange(tableau.shape[0]): >> ? ? ? ?tableau[i] -= tableau[i,cand]*pivot >> >> ...I incur some CPU overhead for the "for" loop -- and this part of code is >> the botteneck of the whole algorithm. Is there any smarter (i.e., more >> time-efficient) way of achieving my goal? > > just a generic answer: > > Working in batches can be a good compromise in some cases. I instead > of working in a loop with one row at a time, loop and handle, for > example, 1000 rows at a time. > > Josef > >> >> TIA -- >> >> Enzo >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > If this is really such a big bottleneck, then I would look into using Cython for this part. With just a few cdef's, I bet that that you could speed up the for loop tremendously. Depending on the details of your algorithm, you might want to make a Cython function that takes tableau, cand and pivot as inputs and just does the for loop part. From enzomich at gmail.com Mon Dec 27 09:20:40 2010 From: enzomich at gmail.com (Enzo Michelangeli) Date: Mon, 27 Dec 2010 22:20:40 +0800 Subject: [Numpy-discussion] Optimization suggestion sought References: Message-ID: Many thanks to Josef and Justin for their replies. Josef's hint sounds like a good way of reducing peak memory allocation especially when the row size is large, which makes the "for" overhead for each iteration comparatively lower. However, time is still spent in back-and-forth conversions between numpy arrays and the native BLAS data structures, and copying data from the temporary array holding the intermediate results and tableau. 
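If I read the suggestion correctly, the batched version would look roughly like this (same names as in my snippet above; the batch size of 1000 is arbitrary):

import numpy as np

def pivot_update_batched(tableau, locat, cand, batch=1000):
    pivot = tableau[locat, :] / tableau[locat, cand]
    for start in xrange(0, tableau.shape[0], batch):
        block = tableau[start:start + batch]       # a view, so -= updates tableau
        # the temporary outer product is only (batch, ncols) instead of
        # the full size of tableau
        block -= block[:, cand:cand + 1] * pivot
    tableau[locat, :] = pivot
    return pivot

That bounds the size of the temporary while keeping the Python-level overhead down to one iteration per batch instead of one per row.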
Regarding Justin's suggestion, before trying Cython (which, according to http://wiki.cython.org/tutorials/numpy , seems to require a bit of work to handle numpy arrays properly) I was looking at weave.blitz . Unfortunately, this doesn't seems to like my code. Running code containing: expr = "tableau = tableau - tableau[:,cand:cand+1]*pivot" weave.blitz(expr) ...elicits: /---------------------------------------------------------- distutils.errors.CompileError: error: Command "g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.o" failed with exit status 1 \---------------------------------------------------------- >From the error message issued by g++, it would appear that blitz can't figure out the type of cand: /---------------------------------------------------------- C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py>g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.o In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:37, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h: In member function 'bool blitz::Range::isAscendingContiguous() const': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h:120: warning: suggest parentheses around '&&' within '||' c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp: In function 'PyObject* compiled_func(PyObject*, PyObject*)': c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: error: ambiguous overload for 'operator+' in 'cand + 1' c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: candidates are: operator+(PyObject*, int) c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(int, int) c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(float, int) c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(double, int) c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_6cc64ceb623b02ae7511c559ef81fb661.cpp:728: note: operator+(char*, int) \---------------------------------------------------------- Using a temporary variable to keep cand out of blitz: tmp = 
tableau[:,cand:cand+1] expr = "tableau = tableau - tmp*pivot" weave.blitz(expr) ...produces an even uglier error message, which makes me think that blitz doesn't understand that the product between a (n,1)-shaped array and an (n,)-shaped one is meant to be an outer product: /---------------------------------------------------------- C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py>LCPSolve.py Found executable C:\Program Files\pythonxy\mingw\bin\g++.exe In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:37, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h: In member function 'bool blitz::Range::isAscendingContiguous() const': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/range.h:120: warning: suggest parentheses around '&&' within '||' In file included from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array-impl.h:2504, from C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array.h:26, from c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:11: C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h: In member function 'typename P_op::T_numtype blitz::_bz_ArrayExprBinaryOp::operator()(const blitz::TinyVector&) [with int N_rank = 2, P_expr1 = blitz::FastArrayIterator, P_expr2 = blitz::FastArrayIterator, P_op = blitz::Multiply]': C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:144: instantiated from 'typename P_expr::T_numtype blitz::_bz_ArrayExpr::operator()(const blitz::TinyVector&) [with int N_rank = 2, P_expr = blitz::_bz_ArrayExprBinaryOp, blitz::FastArrayIterator, blitz::Multiply >]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:486: instantiated from 'typename P_op::T_numtype blitz::_bz_ArrayExprBinaryOp::operator()(const blitz::TinyVector&) [with int N_rank = 2, P_expr1 = blitz::FastArrayIterator, P_expr2 = blitz::_bz_ArrayExpr, blitz::FastArrayIterator, blitz::Multiply > >, P_op = blitz::Subtract]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:144: instantiated from 'typename P_expr::T_numtype blitz::_bz_ArrayExpr::operator()(const blitz::TinyVector&) [with int N_rank = 2, P_expr = blitz::_bz_ArrayExprBinaryOp, blitz::_bz_ArrayExpr, blitz::FastArrayIterator, blitz::Multiply > >, blitz::Subtract >]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/eval.cc:670: instantiated from 'blitz::Array& blitz::Array::evaluateWithIndexTraversal1(T_expr, T_update) [with T_expr = blitz::_bz_ArrayExpr, blitz::_bz_ArrayExpr, blitz::FastArrayIterator, blitz::Multiply > >, blitz::Subtract > >, T_update = blitz::_bz_update, P_numtype = double, int N_rank = 2]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/eval.cc:171: instantiated from 'blitz::Array& blitz::Array::evaluate(T_expr, T_update) [with T_expr = blitz::_bz_ArrayExpr, blitz::_bz_ArrayExpr, blitz::FastArrayIterator, blitz::Multiply > >, blitz::Subtract > >, T_update = blitz::_bz_update, P_numtype = double, int N_rank = 2]' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/ops.cc:45: instantiated from 'blitz::Array& blitz::Array::operator=(const blitz::ETBase&) [with T_expr = blitz::_bz_ArrayExpr, blitz::_bz_ArrayExpr, blitz::FastArrayIterator, blitz::Multiply > >, blitz::Subtract > >, P_numtype = double, int N_rank = 2]' 
c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp:732: instantiated from here C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/expr.h:486: error: no match for call to '(blitz::FastArrayIterator) (const blitz::TinyVector&)' C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:74: note: candidates are: P_numtype blitz::FastArrayIterator::operator()(const blitz::TinyVector&) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:202: note: P_numtype& blitz::FastArrayIterator::operator()(int) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:208: note: P_numtype& blitz::FastArrayIterator::operator()(int, int) [with P_numtype = double, int N_rank = 1] C:\Python26\lib\site-packages\scipy\weave\blitz/blitz/array/fastiter.h:214: note: P_numtype& blitz::FastArrayIterator::operator()(int, int, int) [with P_numtype = double, int N_rank = 1] Traceback (most recent call last): File "C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py\LCPSolve.py", line 132, in w, z, retcode = LCPSolve(M,q) File "C:\Documents and Settings\ADMIN\My Documents\Projects\Valerio\py\LCPSolve.py", line 94, in LCPSolve weave.blitz(expr) File "C:\Python26\lib\site-packages\scipy\weave\blitz_tools.py", line 65, in blitz **kw) File "C:\Python26\lib\site-packages\scipy\weave\inline_tools.py", line 482, in compile_function verbose=verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\ext_tools.py", line 367, in compile verbose = verbose, **kw) File "C:\Python26\lib\site-packages\scipy\weave\build_tools.py", line 273, in build_extension setup(name = module_name, ext_modules = [ext],verbose=verb) File "C:\Python26\lib\site-packages\numpy\distutils\core.py", line 186, in setup return old_setup(**new_attr) File "C:\Python26\lib\distutils\core.py", line 169, in setup raise SystemExit, "error: " + str(msg) distutils.errors.CompileError: error: Command "g++ -mno-cygwin -O2 -Wall -IC:\Python26\lib\site-packages\scipy\weave -IC:\Python26\lib\site-packages\scipy\weave\scxx -IC:\Python26\lib\site-packages\scipy\weave\blitz -IC:\Python26\lib\site-packages\numpy\core\include -IC:\Python26\include -IC:\Python26\PC -c c:\docume~1\admin\locals~1\temp\ADMIN\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.cpp -o c:\docume~1\admin\locals~1\temp\ADMIN\python26_intermediate\compiler_4b433fbf94fa137eaa5ee69a06987eda\Release\docume~1\admin\locals~1\temp\admin\python26_compiled\sc_c9159c98b571d3181d8848337bf1e50a1.o" failed with exit status 1 \---------------------------------------------------------- So, for the time being, no speed breakthrough... 
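One more thing I may still try is to preallocate the temporary once, outside the pivoting loop, and reuse it through the ufuncs' output arguments, so that at least the repeated allocation disappears (just a sketch, I have not measured whether it helps):

import numpy as np

def pivot_update_prealloc(tableau, locat, cand, scratch):
    # scratch is allocated once by the caller: scratch = np.empty_like(tableau)
    pivot = tableau[locat, :] / tableau[locat, cand]
    np.multiply(tableau[:, cand:cand + 1], pivot, scratch)  # outer product into scratch
    np.subtract(tableau, scratch, tableau)                   # in-place update of tableau
    tableau[locat, :] = pivot
    return pivot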
Enzo ----- Original Message ----- From: "Justin Peel" To: "Discussion of Numerical Python" Sent: Monday, December 27, 2010 2:51 PM Subject: Re: [Numpy-discussion] Optimization suggestion sought On Sun, Dec 26, 2010 at 7:34 AM, wrote: > On Sun, Dec 26, 2010 at 3:51 AM, Enzo Michelangeli > wrote: >> For a pivoted algorithm, I have to perform an operation that in fully >> vectorized form can be expressed as: >> >> pivot = tableau[locat,:]/tableau[locat,cand] >> tableau -= tableau[:,cand:cand+1]*pivot >> tableau[locat,:] = pivot >> >> tableau is a rather large bidimensional array, and I'd like to avoid the >> allocation of a temporary array of the same size holding the result of >> the >> right-hand side expression in the second line of code (the outer product >> of >> tableau[:,cand] and pivot). On the other hand, if I replace that line >> with: >> >> for i in xrange(tableau.shape[0]): >> tableau[i] -= tableau[i,cand]*pivot >> >> ...I incur some CPU overhead for the "for" loop -- and this part of code >> is >> the botteneck of the whole algorithm. Is there any smarter (i.e., more >> time-efficient) way of achieving my goal? > > just a generic answer: > > Working in batches can be a good compromise in some cases. I instead > of working in a loop with one row at a time, loop and handle, for > example, 1000 rows at a time. > > Josef > >> >> TIA -- >> >> Enzo >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > If this is really such a big bottleneck, then I would look into using Cython for this part. With just a few cdef's, I bet that that you could speed up the for loop tremendously. Depending on the details of your algorithm, you might want to make a Cython function that takes tableau, cand and pivot as inputs and just does the for loop part. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Mon Dec 27 10:20:14 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Dec 2010 10:20:14 -0500 Subject: [Numpy-discussion] How to call import_array() properly? In-Reply-To: References: Message-ID: On Sun, Dec 26, 2010 at 17:26, Bruce Sherwood wrote: > In my Python code I have > > import cvisual > cvisual.init_numpy() > > and in my C++ code I have > > void > init_numpy() > { > ? ?import_array(); > } The import_array() call goes into the initialization function for your module, e.g. initcvisual(). Do not put it into a separate function for the user of your module to call. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From Bruce_Sherwood at ncsu.edu Mon Dec 27 13:09:57 2010 From: Bruce_Sherwood at ncsu.edu (Bruce Sherwood) Date: Mon, 27 Dec 2010 11:09:57 -0700 Subject: [Numpy-discussion] How to call import_array() properly? In-Reply-To: References: Message-ID: Thanks for the good suggestion. 
I now see that it was purely historical that import_array was driven (indirectly through init_numpy) from the pure Python component of the module rather than in the import of the C++ component, and I've changed that. However, I'm still curious as to whether there's a more intelligent or elegant way to drive import_array than the following code: #if PY_MAJOR_VERSION >= 3 int init_numpy() { import_array(); } #else void init_numpy() { import_array(); } #endif Bruce Sherwood On Mon, Dec 27, 2010 at 8:20 AM, Robert Kern wrote: > On Sun, Dec 26, 2010 at 17:26, Bruce Sherwood wrote: >> In my Python code I have >> >> import cvisual >> cvisual.init_numpy() >> >> and in my C++ code I have >> >> void >> init_numpy() >> { >> ? ?import_array(); >> } > > The import_array() call goes into the initialization function for your > module, e.g. initcvisual(). Do not put it into a separate function for > the user of your module to call. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From moura.mario at gmail.com Mon Dec 27 13:36:38 2010 From: moura.mario at gmail.com (Mario Moura) Date: Mon, 27 Dec 2010 16:36:38 -0200 Subject: [Numpy-discussion] How construct custom slice Message-ID: Hi Folks a = np.zeros((4,3,5,55,5),dtype='|S8') myLen = 4 # here I use myLen = len(something) li = [3,2,4] # li from a list.append(something) sl = slice(0,myLen) tmpIndex = tuple(li) + sl + 4 # <== Here my problem a[tmpIndex] # So What I want is: fillMe = np.array(['foo','bar','hello','world']) # But I cant contruct by hand like this a[3,2,4,:4,4] = fillMe a Again. I need construct custom slice from here tmpIndex = tuple(li) + sl + 4 a[tmpIndex] Who can help me? Best Regards Mario Moura From kwgoodman at gmail.com Mon Dec 27 13:48:48 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 27 Dec 2010 10:48:48 -0800 Subject: [Numpy-discussion] How construct custom slice In-Reply-To: References: Message-ID: On Mon, Dec 27, 2010 at 10:36 AM, Mario Moura wrote: > Hi Folks > > a = np.zeros((4,3,5,55,5),dtype='|S8') > myLen = 4 # here I use myLen = len(something) > li = [3,2,4] # li from a list.append(something) > sl = slice(0,myLen) > tmpIndex = tuple(li) + sl + 4 ?# <== Here my problem > a[tmpIndex] > > # So What I want is: > fillMe = np.array(['foo','bar','hello','world']) > # But I cant contruct by hand like this > a[3,2,4,:4,4] = fillMe > a > > Again. I need construct custom slice from here > tmpIndex = tuple(li) + sl + 4 > a[tmpIndex] First let's do it by hand: >> a = np.zeros((4,3,5,55,5),dtype='|S8') >> fillMe = np.array(['foo','bar','hello','world']) >> a[3,2,4,:4,4] = fillMe Now let's try using an index: >> b = np.zeros((4,3,5,55,5),dtype='|S8') >> myLen = 4 >> li = [3,2,4] >> sl = slice(0,myLen) Make index: >> idx = range(a.ndim) >> idx[:3] = li >> idx[3] = sl >> idx[4] = 4 >> idx = tuple(idx) Compare results: >> b[idx] = fillMe >> (a == b).all() True From moura.mario at gmail.com Mon Dec 27 13:58:38 2010 From: moura.mario at gmail.com (Mario Moura) Date: Mon, 27 Dec 2010 16:58:38 -0200 Subject: [Numpy-discussion] How construct custom slice In-Reply-To: References: Message-ID: Hi Mr. Goodman Thanks a lot. 
Works Fine Reagards Mario Moura 2010/12/27 Keith Goodman : > On Mon, Dec 27, 2010 at 10:36 AM, Mario Moura wrote: >> Hi Folks >> >> a = np.zeros((4,3,5,55,5),dtype='|S8') >> myLen = 4 # here I use myLen = len(something) >> li = [3,2,4] # li from a list.append(something) >> sl = slice(0,myLen) >> tmpIndex = tuple(li) + sl + 4 ?# <== Here my problem >> a[tmpIndex] >> >> # So What I want is: >> fillMe = np.array(['foo','bar','hello','world']) >> # But I cant contruct by hand like this >> a[3,2,4,:4,4] = fillMe >> a >> >> Again. I need construct custom slice from here >> tmpIndex = tuple(li) + sl + 4 >> a[tmpIndex] > > First let's do it by hand: > >>> a = np.zeros((4,3,5,55,5),dtype='|S8') >>> fillMe = np.array(['foo','bar','hello','world']) >>> a[3,2,4,:4,4] = fillMe > > Now let's try using an index: > >>> b = np.zeros((4,3,5,55,5),dtype='|S8') >>> myLen = 4 >>> li = [3,2,4] >>> sl = slice(0,myLen) > > Make index: > >>> idx = range(a.ndim) >>> idx[:3] = li >>> idx[3] = sl >>> idx[4] = 4 >>> idx = tuple(idx) > > Compare results: > >>> b[idx] = fillMe >>> (a == b).all() > ? True > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From korn at freisingnet.de Mon Dec 27 12:58:24 2010 From: korn at freisingnet.de (Johannes Korn) Date: Mon, 27 Dec 2010 18:58:24 +0100 Subject: [Numpy-discussion] Strange problem with h5py and numpy Message-ID: Hi, I have a strange problem with h5py or with numpy. I try to read a bunch of hdf files in a loop. The problem is that I get an error at the second file because the file handle is of type It seems the file is opened and instantaneous closed again. Meanwhile I found the root of the evil: First I read the contents of a dataset to the numpy array tmp. In the next step I paste this array into a bigger one "alb_c1_tmp". The shape of tmp is (651, 1701). Everything works well if I omit the pasting. alb_c1_tmp = zeros([3712,3712]) c1_eu = File(filename_ch1_eu,mode='r') print c1_eu tmp = c1_eu['AL-SP-BH'][:] #Source of the evil alb_c1_tmp[ 3012:3663, 462:2163 ] = tmp c1_eu.close() The code as above crashes always when the second file is read. However originally I had the boundaries of alb_c1_tmp set dynamically from attributes of the hdf files (they are identical for all files), this code crashed after a random number of files. Here?s the line that caused the randomly delayed crashes: alb_c1_tmp[ 1855 + c1_eu.attrs['LOFF'] - c1_eu.attrs['NL'] : 1855 + c1_eu.attrs['LOFF'] , 1855 + c1_eu.attrs['COFF'] - c1_eu.attrs['NC'] : 1855 + c1_eu.attrs['COFF'] ] = tmp Is it a bug or did I something wrong? Python 2.6.5 (r265:79063, Oct 28 2010, 20:56:56) Numpy-Version is probably 1.5.0 h5py 1.3.0 Happens under 32bit as well as under 64bit SuSE-release 11.3 Kind regards! Johannes From robert.kern at gmail.com Mon Dec 27 14:15:05 2010 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 27 Dec 2010 14:15:05 -0500 Subject: [Numpy-discussion] How to call import_array() properly? In-Reply-To: References: Message-ID: On Mon, Dec 27, 2010 at 13:09, Bruce Sherwood wrote: > Thanks for the good suggestion. I now see that it was purely > historical that import_array was driven (indirectly through > init_numpy) from the pure Python component of the module rather than > in the import of the C++ component, and I've changed that. 
However, > I'm still curious as to whether there's a more intelligent or elegant > way to drive import_array than the following code: > > #if PY_MAJOR_VERSION >= 3 > int > init_numpy() > { > ? ? ? ?import_array(); > } > #else > void > init_numpy() > { > ? ? ? ?import_array(); > } > #endif Just put "import_array();" into initcvisual(). You should not put it in any other function. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." ? -- Umberto Eco From roger at quantumbioinc.com Mon Dec 27 14:40:11 2010 From: roger at quantumbioinc.com (Roger Martin) Date: Mon, 27 Dec 2010 14:40:11 -0500 Subject: [Numpy-discussion] Installing on CentOS 5 claims invalid Python installation In-Reply-To: <4D13A9E1.6030602@gmail.com> References: <4D134AE4.6090001@quantumbioinc.com> <4D13A9E1.6030602@gmail.com> Message-ID: <4D18EB9B.10308@quantumbioinc.com> Hi Bruce and Chris, This was a user build and install of Python (particularly 2.6.6 since 2.7.1 has build troubles on CentOS 5). The original python 2.4 in the system is ignored for this effort because I can't get to it. Since I was unfamiliar with building Python from source I didn't know it should produce python development where the --prefix points the install. It is supposed to under the altinstall target. By looking at the make I found(with the autoconf 2.63 version) the make altinstall target simply wasn't running all its subtargets even though no install error occurred. Ran the inclinstall, libainstall, sharedinstall targets and the distribution was populated with the devel components needed! In fact the top of the configure.in of python 2.6.6 source build it says dnl NOTE: autoconf 2.64 doesn't seem to work (use 2.61). and I was at 2.63; not sure if same problem they were noting. ......... ./configure --prefix=/home/roger/Python-2.6.6/dist make #make test make altinstall make inclinstall make libainstall make sharedinstall #make oldsharedinstall ......... This is all python install issues and has nothing to do with numpy install. The numpy install followed: ......... export PYTHONPATH=/home/roger/Python-2.6.6/dist/lib/python2.6 export PYTHONHOME=/home/roger/Python-2.6.6/dist export MKLROOT=/share/apps/intel/mkl/10.2.5.035 export PATH=/home/roger/Python-2.6.6/dist/bin:$PATH export LD_LIBRARY_PATH=$MKLROOT/lib/em64t:/home/roger/Python-2.6.6/build/lib.linux-x86_64-2.6:/lib64:/usr/lib64:/lib:$LD_LIBRARY_PATH export PYTHONPATH=/home/roger/Python-2.6.6/dist/lib/python2.6:/home/roger/Python-2.6.6/build/lib.linux-x86_64-2.6:/home/roger/Python-2.6.6/Lib:/home/roger/Python-2.6.6/Modules export PYTHONHOME=/home/roger/Python-2.6.6/dist export PATH=/home/roger/Python-2.6.6/dist/bin:$PATH #export PYTHONVERBOSE=1 #python2.6 setup.py clean python2.6 setup.py build --fcompiler=gnu95 python2.6 setup.py install ......... Success! Then a quick test: ........ Python 2.6.6 (r266:84292, Dec 22 2010, 13:28:53) [GCC 4.1.2 20080704 (Red Hat 4.1.2-48)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import numpy >>> a = numpy.arange(10).reshape(2,5) >>> a array([[0, 1, 2, 3, 4], [5, 6, 7, 8, 9]]) >>> ........ Tested successfully! Python 3.2 didn't have any of the python install issues and numpy installs and functions on it too. Now on to h5py utilizing numpy. 
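As a first smoke test of h5py together with numpy I'll try something along these lines (the file name is just an example):

import numpy as np
import h5py

f = h5py.File('check.h5', 'w')
f['x'] = np.arange(10)       # write a small dataset
f.close()
f = h5py.File('check.h5', 'r')
print(f['x'][:])             # read it back as a numpy array
f.close()
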
Thanks for the discussion; you lead me to understand when building python from source the devel portion should be there, Roger On 12/23/2010 02:58 PM, Bruce Southey wrote: > On 12/23/2010 07:13 AM, Roger Martin wrote: >> Hi, >> >> NumPy looks like the way to get computation done in Python. Now I'm >> going through the learning curve of installing the module into >> different linux OS's and Python versions. An extra need is to >> install google code's h5py http://code.google.com/p/h5py/ which >> depends on numpy. >> >> In trying a number of Python versions the 2.x's are yielding the >> message " invalid Python installation" >> --------------- >> raise DistutilsPlatformError(my_msg) >> distutils.errors.DistutilsPlatformError: invalid Python installation: >> unable to open >> /home/roger/Python-2.6.6/dist/lib/python2.6/config/Makefile (No such >> file or directory) >> --------------- >> >> From reading on the web it appears a Python-2.x.x-devel version is >> needed. Yet no search combination comes back with where to get such >> a thing(note: I need user installs/builds for security reasons). >> Where are Python versions compatible with numpy? >> >> Building >> Python-2.6.6 >> Python-2.7.1(fails to build) >> Python3.2beta2 >> numpy1.5.1 >> invalid Python installation NA >> success >> h5py1.3.1 >> needs numpy >> NA >> fails >> >> >> To start I need just one successful combination but will need more >> cases depending on users of a new integration project. >> >> Interestingly your numpy 1.5.1's setup is in good shape to build with >> Python3.2 yet I need to allow older versions for people's systems not >> ready to upgrade that far. >> >> >> >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > I thought that Centos 5 ships Python 2.4 so how did you get Python > 2.6, 2.7 and 3.2? > If these are from some repository then the developmental libraries > should also be there - if these are not there then either find another > repository or build Python yourself. > > Bruce > > > > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From Bruce_Sherwood at ncsu.edu Mon Dec 27 14:51:35 2010 From: Bruce_Sherwood at ncsu.edu (Bruce Sherwood) Date: Mon, 27 Dec 2010 12:51:35 -0700 Subject: [Numpy-discussion] How to call import_array() properly? In-Reply-To: References: Message-ID: The module I'm working with, which uses Boost, doesn't have a function "initcvisual". Rather there's a section headed with BOOST_PYTHON_MODULE( cvisual). Placing the import_array macro directly in this section causes an unwanted return. I guess it doesn't matter, since what I've done works okay. And I realized that I could collapse init_numpy a bit: #if PY_MAJOR_VERSION >= 3 int #else void #endif init_numpy() { import_array(); } Bruce Sherwood On Mon, Dec 27, 2010 at 12:15 PM, Robert Kern wrote: > On Mon, Dec 27, 2010 at 13:09, Bruce Sherwood wrote: >> Thanks for the good suggestion. I now see that it was purely >> historical that import_array was driven (indirectly through >> init_numpy) from the pure Python component of the module rather than >> in the import of the C++ component, and I've changed that. 
However, >> I'm still curious as to whether there's a more intelligent or elegant >> way to drive import_array than the following code: >> >> #if PY_MAJOR_VERSION >= 3 >> int >> init_numpy() >> { >> ? ? ? ?import_array(); >> } >> #else >> void >> init_numpy() >> { >> ? ? ? ?import_array(); >> } >> #endif > > Just put "import_array();" into initcvisual(). You should not put it > in any other function. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ? -- Umberto Eco > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From dsdale24 at gmail.com Tue Dec 28 08:46:05 2010 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 28 Dec 2010 08:46:05 -0500 Subject: [Numpy-discussion] Strange problem with h5py and numpy In-Reply-To: References: Message-ID: On Mon, Dec 27, 2010 at 12:58 PM, Johannes Korn wrote: > Hi, > > I have a strange problem with h5py or with numpy. I think this question belongs on the h5py mailing list. > I try to read a bunch of hdf files in a loop. The problem is that I get > an error at the second file because the file handle is of type HDF5 file> The code you posted only involves one file. From korn at freisingnet.de Tue Dec 28 09:13:42 2010 From: korn at freisingnet.de (Johannes Korn) Date: Tue, 28 Dec 2010 15:13:42 +0100 Subject: [Numpy-discussion] Strange problem with h5py and numpy In-Reply-To: References: Message-ID: On 28.12.2010 14:46, Darren Dale wrote:: > On Mon, Dec 27, 2010 at 12:58 PM, Johannes Korn wrote: >> I try to read a bunch of hdf files in a loop. The problem is that I get >> an error at the second file because the file handle is of type> HDF5 file> > > The code you posted only involves one file. The code I posted is part of the inside of a loop over the files. The filename changes of course and the files are there. If I try to open a non existing file the error message is different. From korn at freisingnet.de Tue Dec 28 10:14:20 2010 From: korn at freisingnet.de (Johannes Korn) Date: Tue, 28 Dec 2010 16:14:20 +0100 Subject: [Numpy-discussion] Strange problem with h5py and numpy In-Reply-To: References: Message-ID: On 28.12.2010 15:13, Johannes Korn wrote:: > On 28.12.2010 14:46, Darren Dale wrote:: >> On Mon, Dec 27, 2010 at 12:58 PM, Johannes Korn wrote: > >>> I try to read a bunch of hdf files in a loop. The problem is that I get >>> an error at the second file because the file handle is of type>> HDF5 file> >> >> The code you posted only involves one file. > > The code I posted is part of the inside of a loop over the files. The > filename changes of course and the files are there. If I try to open a > non existing file the error message is different. Found the solution: incompatibility between HDF 1.8.5 and h5py 1.3.0. Seems that upgrade to h5py 1.3.1 beta fixed the problem From kwgoodman at gmail.com Tue Dec 28 18:32:55 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Tue, 28 Dec 2010 15:32:55 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() Message-ID: I'm looking for the C-API equivalent of the np.float64 function, something that I could use inline in a Cython function. I don't know how to write the function. Anyone have one sitting around? I'd like to use it, if it is faster than np.float64 (np.int32, np.float32, ...) 
in the Bottleneck package when the output is a scalar, for example bn.median(arr, axis=None). From jsalvati at u.washington.edu Tue Dec 28 23:10:53 2010 From: jsalvati at u.washington.edu (John Salvatier) Date: Tue, 28 Dec 2010 20:10:53 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: Wouldn't that be a cast? You do casts in Cython with (expression) and that should be the equivalent of float64 I think. On Tue, Dec 28, 2010 at 3:32 PM, Keith Goodman wrote: > I'm looking for the C-API equivalent of the np.float64 function, > something that I could use inline in a Cython function. > > I don't know how to write the function. Anyone have one sitting > around? I'd like to use it, if it is faster than np.float64 (np.int32, > np.float32, ...) in the Bottleneck package when the output is a > scalar, for example bn.median(arr, axis=None). > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robertwb at math.washington.edu Wed Dec 29 02:22:55 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Tue, 28 Dec 2010 23:22:55 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier wrote: > Wouldn't that be a cast? You do casts in Cython with (expression) > and that should be the equivalent of float64 I think. Or even (expression) if you've cimported numpy (though as mentioned this is the same as double on every platform I know of). Even easier is just to use the expression in a the right context and it will convert it for you. - Robert > On Tue, Dec 28, 2010 at 3:32 PM, Keith Goodman wrote: >> >> I'm looking for the C-API equivalent of the np.float64 function, >> something that I could use inline in a Cython function. >> >> I don't know how to write the function. Anyone have one sitting >> around? I'd like to use it, if it is faster than np.float64 (np.int32, >> np.float32, ...) in the Bottleneck package when the output is a >> scalar, for example bn.median(arr, axis=None). >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From robertwb at math.washington.edu Wed Dec 29 03:47:21 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 29 Dec 2010 00:47:21 -0800 Subject: [Numpy-discussion] Optimization suggestion sought In-Reply-To: References: Message-ID: On Mon, Dec 27, 2010 at 6:20 AM, Enzo Michelangeli wrote: > Many thanks to Josef and Justin for their replies. > > Josef's hint sounds like a good way of reducing peak memory allocation > especially when the row size is large, which makes the "for" overhead for > each iteration comparatively lower. However, time is still spent in > back-and-forth conversions between numpy arrays and the native BLAS data > structures, and copying data from the temporary array holding the > intermediate results and tableau. 
> > Regarding Justin's suggestion, before trying Cython (which, according to > http://wiki.cython.org/tutorials/numpy , seems to require a bit of work to > handle numpy arrays properly) Cython doesn't have to be that complicated. For your example, you just have to unroll the vectorization (and account for the fact that the result is mutated in place, which was your original goal). cimport numpy def do_it(numpy.ndarray[double, ndim=2] tableau, int locat, int cand, bint vectorize=True): cdef numpy.ndarray[double, ndim=1] pivot pivot = tableau[locat,:]/tableau[locat,cand] if vectorize: tableau -= tableau[:,cand:cand+1]*pivot else: for i in range(tableau.shape[0]): for j in range(tableau.shape[1]): if j != cand: tableau[i,j] -= tableau[i,cand] * pivot[j] tableau[:,cand] = 0 tableau[locat,:] = pivot return tableau - Robert From friedrichromstedt at gmail.com Wed Dec 29 06:28:56 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Wed, 29 Dec 2010 12:28:56 +0100 Subject: [Numpy-discussion] fromrecords yields "ValueError: invalid itemsize in generic type tuple" In-Reply-To: References: Message-ID: 2010/12/7 Rajat Banerjee : > Hi All, > I have been using Numpy for a while with great success. I left my > little project for a little while > (http://web.mit.edu/stardev/cluster/) and now some of my code is > broken. > > I have some Numpy code to create graphs of activity on a cluster with > matplotlib. It ran just fine in July / August 2010, but has since > stopped working. I have updated numpy on my machine, I think. > > In [2]: np.version.version > Out[2]: '1.5.1' > > My call to np.rec.fromrecords() is throwing this exception: > > File "/home/rajat/Envs/StarCluster/lib/python2.6/site-packages/numpy/core/records.py", > line 607, in fromrecords > descr = sb.dtype((record, dtype)) > ValueError: invalid itemsize in generic type tuple > > Here is the code with some irrelevant stuff stripped: > > for line in file: > a = [datetime.strptime(parts[0], '%Y-%m-%d %H:%M:%S.%f'), > int(parts[1]), int(parts[2]), int(parts[3]), int(parts[4]), > int(parts[5]), int(parts[6]), float(parts[7])] > list.append(a) > file.close() > names = ['dt', 'hosts', 'running_jobs', 'queued_jobs',\ > 'slots', 'avg_duration', 'avg_wait', 'avg_load'] > descriptor = {'names': > ('dt,hosts,running_jobs,queued_jobs,slots,avg_duration,avg_wait,avg_load'),\ > 'formats' : ('S20','u','u','u','u','u','u','f')} > self.records = np.rec.fromrecords(list,','.join(names)) #used to work > #self.records = np.rec.fromrecords(list, dtype=descriptor) #new attempt > > Here is one "line" from the array "list": >>>> parts (8) = ['2010-12-07 03:09:46.855712', '2', '2', '177', '2', '86', '370', '1.05']. > > Neither of those np.rec.fromrecords() calls works. I've tried both > separately. They both throw the exact same exception, ValueError: > invalid itemsize in generic type tuple Hi Rajat, seems to be good that I read all email on the list, seems to be bad that it's such a long queue. Consider the script attached. Remarks: * Use tuples as rows in the numpy.rec array "raw" argument. It works for the first conversion with [] too, but I think more by incident than by design. For the second case, which you will need, it does not work with lists. * Always use keyword args to fromrecords(). I believe this is a) more error-prone b) there is no specification for positional arguments, so their order might change (as it seems to have happened). With positional "names", it ceases working. 
I don't know what it thinks you are requesting, but for sure not "names". :-) * Don't use the *dtype* in the way you did. I'm not authoritative with the *dtype* arg, but at least it doesn't work this way. Use the names= and formats= kwargs instead. I just tinkered a bit around with your code without deep knowledge of the numpy.rec package. I just used fromrecords() some time ago in the way I did use it here. Friedrich P.S.: Please reply, if you don't I'll resend the email to you OL in the assumtion that you desperately disappointedly unsubscribed. -------------- next part -------------- A non-text attachment was scrubbed... Name: rec.py Type: application/octet-stream Size: 405 bytes Desc: not available URL: From kwgoodman at gmail.com Wed Dec 29 12:05:52 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 09:05:52 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Tue, Dec 28, 2010 at 11:22 PM, Robert Bradshaw wrote: > On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier > wrote: >> Wouldn't that be a cast? You do casts in Cython with (expression) >> and that should be the equivalent of float64 I think. > > Or even (expression) if you've cimported numpy > (though as mentioned this is the same as double on every platform I > know of). Even easier is just to use the expression in a the right > context and it will convert it for you. That will give me a float object but it will not have dtype, shape, ndim, etc methods. >> m = np.mean([1,2,3]) >> m 2.0 >> m.dtype dtype('float64') >> m.ndim 0 using gives: AttributeError: 'float' object has no attribute 'dtype' From robertwb at math.washington.edu Wed Dec 29 12:37:15 2010 From: robertwb at math.washington.edu (Robert Bradshaw) Date: Wed, 29 Dec 2010 09:37:15 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Wed, Dec 29, 2010 at 9:05 AM, Keith Goodman wrote: > On Tue, Dec 28, 2010 at 11:22 PM, Robert Bradshaw > wrote: >> On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier >> wrote: >>> Wouldn't that be a cast? You do casts in Cython with (expression) >>> and that should be the equivalent of float64 I think. >> >> Or even (expression) if you've cimported numpy >> (though as mentioned this is the same as double on every platform I >> know of). Even easier is just to use the expression in a the right >> context and it will convert it for you. > > That will give me a float object but it will not have dtype, shape, > ndim, etc methods. > >>> m = np.mean([1,2,3]) >>> m > ? 2.0 >>> m.dtype > ? dtype('float64') >>> m.ndim > ? 0 > > using gives: > > AttributeError: 'float' object has no attribute 'dtype' Well, in this case I doubt your'e going to be able to do much better than np.float64(expr), as the bulk or the time is probably spent in object allocation (and you're really asking for an object here). If you knew the right C calls, you might be able to get a 2x speedup. - Robert From kwgoodman at gmail.com Wed Dec 29 12:44:58 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 09:44:58 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Wed, Dec 29, 2010 at 9:37 AM, Robert Bradshaw wrote: > On Wed, Dec 29, 2010 at 9:05 AM, Keith Goodman wrote: >> On Tue, Dec 28, 2010 at 11:22 PM, Robert Bradshaw >> wrote: >>> On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier >>> wrote: >>>> Wouldn't that be a cast? 
You do casts in Cython with (expression) >>>> and that should be the equivalent of float64 I think. >>> >>> Or even (expression) if you've cimported numpy >>> (though as mentioned this is the same as double on every platform I >>> know of). Even easier is just to use the expression in a the right >>> context and it will convert it for you. >> >> That will give me a float object but it will not have dtype, shape, >> ndim, etc methods. >> >>>> m = np.mean([1,2,3]) >>>> m >> ? 2.0 >>>> m.dtype >> ? dtype('float64') >>>> m.ndim >> ? 0 >> >> using gives: >> >> AttributeError: 'float' object has no attribute 'dtype' > > Well, in this case I doubt your'e going to be able to do much better > than np.float64(expr), as the bulk or the time is probably spent in > object allocation (and you're really asking for an object here). If > you knew the right C calls, you might be able to get a 2x speedup. A factor of 2 would be great! A tenth of a micro second is a lot of overhead for small input arrays. I'm guessing it is one of these functions but I don't understand the signatures (nor ref counting): PyObject* PyArray_Scalar(void* data, PyArray_Descr* dtype, PyObject* itemsize) Return an array scalar object of the given enumerated typenum and itemsize by copying from memory pointed to by data . If swap is nonzero then this function will byteswap the data if appropriate to the data-type because array scalars are always in correct machine-byte order. PyObject* PyArray_ToScalar(void* data, PyArrayObject* arr) Return an array scalar object of the type and itemsize indicated by the array object arr copied from the memory pointed to by data and swapping if the data in arr is not in machine byte-order. From matthew.brett at gmail.com Wed Dec 29 12:48:05 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 Dec 2010 17:48:05 +0000 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: Hi, On Wed, Dec 29, 2010 at 5:37 PM, Robert Bradshaw wrote: > On Wed, Dec 29, 2010 at 9:05 AM, Keith Goodman wrote: >> On Tue, Dec 28, 2010 at 11:22 PM, Robert Bradshaw >> wrote: >>> On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier >>> wrote: >>>> Wouldn't that be a cast? You do casts in Cython with (expression) >>>> and that should be the equivalent of float64 I think. >>> >>> Or even (expression) if you've cimported numpy >>> (though as mentioned this is the same as double on every platform I >>> know of). Even easier is just to use the expression in a the right >>> context and it will convert it for you. >> >> That will give me a float object but it will not have dtype, shape, >> ndim, etc methods. >> >>>> m = np.mean([1,2,3]) >>>> m >> ? 2.0 >>>> m.dtype >> ? dtype('float64') >>>> m.ndim >> ? 0 >> >> using gives: >> >> AttributeError: 'float' object has no attribute 'dtype' Forgive me if I haven't understood your question, but can you use PyArray_DescrFromType with e.g NPY_FLOAT64 ? Best, Matthew From kwgoodman at gmail.com Wed Dec 29 12:55:49 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 09:55:49 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Wed, Dec 29, 2010 at 9:48 AM, Matthew Brett wrote: > Hi, > > On Wed, Dec 29, 2010 at 5:37 PM, Robert Bradshaw > wrote: >> On Wed, Dec 29, 2010 at 9:05 AM, Keith Goodman wrote: >>> On Tue, Dec 28, 2010 at 11:22 PM, Robert Bradshaw >>> wrote: >>>> On Tue, Dec 28, 2010 at 8:10 PM, John Salvatier >>>> wrote: >>>>> Wouldn't that be a cast? 
You do casts in Cython with (expression) >>>>> and that should be the equivalent of float64 I think. >>>> >>>> Or even (expression) if you've cimported numpy >>>> (though as mentioned this is the same as double on every platform I >>>> know of). Even easier is just to use the expression in a the right >>>> context and it will convert it for you. >>> >>> That will give me a float object but it will not have dtype, shape, >>> ndim, etc methods. >>> >>>>> m = np.mean([1,2,3]) >>>>> m >>> ? 2.0 >>>>> m.dtype >>> ? dtype('float64') >>>>> m.ndim >>> ? 0 >>> >>> using gives: >>> >>> AttributeError: 'float' object has no attribute 'dtype' > > Forgive me if I haven't understood your question, but can you use > PyArray_DescrFromType with e.g ?NPY_FLOAT64 ? I'm pretty hopeless here. I don't know how to put all that together in a function. From matthew.brett at gmail.com Wed Dec 29 13:13:09 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 Dec 2010 18:13:09 +0000 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: >> Forgive me if I haven't understood your question, but can you use >> PyArray_DescrFromType with e.g ?NPY_FLOAT64 ? > > I'm pretty hopeless here. I don't know how to put all that together in > a function. That might be because I'm not understanding you very well, but I was thinking that: cdef dtype descr = PyArray_DescrFromType(NPY_FLOAT64) would give you the float64 dtype that I thought you wanted? I'm shooting from the hip here, in between nieces competing for the computer and my attention. See you, Matthew From kwgoodman at gmail.com Wed Dec 29 13:27:34 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 10:27:34 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Wed, Dec 29, 2010 at 10:13 AM, Matthew Brett wrote: >>> Forgive me if I haven't understood your question, but can you use >>> PyArray_DescrFromType with e.g ?NPY_FLOAT64 ? >> >> I'm pretty hopeless here. I don't know how to put all that together in >> a function. > > That might be because I'm not understanding you very well, but I was > thinking that: > > cdef dtype descr = PyArray_DescrFromType(NPY_FLOAT64) > > would give you the float64 dtype that I thought you wanted? ?I'm > shooting from the hip here, in between nieces competing for the > computer and my attention. I think I need a function. One that does this: >> n = 10.0 >> hasattr(n, 'ndim') False >> m = np.float64(n) >> hasattr(m, 'ndim') True np.float64 is fast, just hoping someone had a C-API inline version of np.float64() that is faster. From matthew.brett at gmail.com Wed Dec 29 14:43:22 2010 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 29 Dec 2010 19:43:22 +0000 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: Hi, >> That might be because I'm not understanding you very well, but I was >> thinking that: >> >> cdef dtype descr = PyArray_DescrFromType(NPY_FLOAT64) >> >> would give you the float64 dtype that I thought you wanted? ?I'm >> shooting from the hip here, in between nieces competing for the >> computer and my attention. > > I think I need a function. One that does this: > >>> n = 10.0 >>> hasattr(n, 'ndim') > ? False >>> m = np.float64(n) >>> hasattr(m, 'ndim') > ? True Now the nieces have gone, I see that I did completely misunderstand. I think you want the C-API calls to be able to create a 0-dim ndarray object from a python float. 
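For the scalar case, something along these lines may already be enough; this is an untested sketch from memory, so please check the signature and the reference counting against the numpy headers before relying on it:

cimport numpy as cnp
cnp.import_array()

cdef inline object make_float64(double value):
    # roughly what np.float64(value) does: build a float64 array scalar
    return cnp.PyArray_Scalar(&value, cnp.PyArray_DescrFromType(cnp.NPY_FLOAT64), None)
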
There was a thread on C-API array creation on the cython list a little while ago: http://www.mail-archive.com/cython-dev at codespeak.net/msg07703.html Code in scipy here: https://github.com/scipy/scipy-svn/blob/master/scipy/io/matlab/mio5_utils.pyx See around line 36 there, and 432, and the header file I copied from Dag Sverre: https://github.com/scipy/scipy-svn/blob/master/scipy/io/matlab/numpy_rephrasing.h As you can see, it's a little horrible, in that you have to take care to get the references right to the dtype and to the data. I actually did not investigate in detail whether this lower-level array creation was speeding my code up much. I hope that's more useful... Matthew From kwgoodman at gmail.com Wed Dec 29 14:53:35 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 11:53:35 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: On Wed, Dec 29, 2010 at 11:43 AM, Matthew Brett wrote: > Hi, > >>> That might be because I'm not understanding you very well, but I was >>> thinking that: >>> >>> cdef dtype descr = PyArray_DescrFromType(NPY_FLOAT64) >>> >>> would give you the float64 dtype that I thought you wanted? ?I'm >>> shooting from the hip here, in between nieces competing for the >>> computer and my attention. >> >> I think I need a function. One that does this: >> >>>> n = 10.0 >>>> hasattr(n, 'ndim') >> ? False >>>> m = np.float64(n) >>>> hasattr(m, 'ndim') >> ? True > > Now the nieces have gone, I see that I did completely misunderstand. > I think you want the C-API calls to be able to create a 0-dim ndarray > object from a python float. > > There was a thread on C-API array creation on the cython list a little > while ago: > > http://www.mail-archive.com/cython-dev at codespeak.net/msg07703.html > > Code in scipy here: > > https://github.com/scipy/scipy-svn/blob/master/scipy/io/matlab/mio5_utils.pyx > > See around line 36 there, and 432, and the header file I copied from > Dag Sverre: > > https://github.com/scipy/scipy-svn/blob/master/scipy/io/matlab/numpy_rephrasing.h > > As you can see, it's a little horrible, in that you have to take care > to get the references right to the dtype and to the data. ?I actually > did not investigate in detail whether this lower-level array creation > was speeding my code up much. > > I hope that's more useful... Wow! That's a mouthful of code. Yes, very handy to have an example to work from. Thank you. From pav at iki.fi Wed Dec 29 14:54:03 2010 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 29 Dec 2010 21:54:03 +0200 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: References: Message-ID: <1293652443.27212.2.camel@Nokia-N900-42-11> Keith Goodman wrote: > np.float64 is fast, just hoping someone had a C-API inline version of > np.float64() that is faster. You're looking for PyArrayScalar_New and _ASSIGN. See https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/arrayscalars.h Undocumented (bad), but AFAIK public. From kwgoodman at gmail.com Wed Dec 29 15:13:03 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 29 Dec 2010 12:13:03 -0800 Subject: [Numpy-discussion] NumPy C-API equivalent of np.float64() In-Reply-To: <1293652443.27212.2.camel@Nokia-N900-42-11> References: <1293652443.27212.2.camel@Nokia-N900-42-11> Message-ID: On Wed, Dec 29, 2010 at 11:54 AM, Pauli Virtanen wrote: > Keith Goodman wrote: >> np.float64 is fast, just hoping someone had a C-API inline version of >> np.float64() that is faster. 
> > You're looking for PyArrayScalar_New and _ASSIGN. > See https://github.com/numpy/numpy/blob/master/numpy/core/include/numpy/arrayscalars.h > > Undocumented (bad), but AFAIK public. Those look nice. I'm stuck since I can't cimport them. I'll have to read up on how to tell cython about those functions. From kmichael.aye at gmail.com Thu Dec 30 08:27:46 2010 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Thu, 30 Dec 2010 15:27:46 +0200 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? Message-ID: Dear all, I'm a bit puzzled that there seems just no way to cleanly code an interval with evenly spaced numbers that includes the stop point given? linspace offers to include the stop point, but arange does not? Am I missing something? (I am aware, that I could do arange(9,15.0001,0.1) but that's what I want to avoid!) Best regards and Happy New Year! Michael From friedrichromstedt at gmail.com Thu Dec 30 09:02:46 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Thu, 30 Dec 2010 15:02:46 +0100 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? In-Reply-To: References: Message-ID: 2010/12/30 K.-Michael Aye : > I'm a bit puzzled that there seems just no way to cleanly code an > interval with evenly spaced numbers that includes the stop point given? > linspace offers to include the stop point, but arange does not? > Am I missing something? (I am aware, that I could do > arange(9,15.0001,0.1) but that's what I want to avoid!) Use numpy.linspace(9, 15, 7 * 10 + 1). FYI, there is also numpy.logspace(). Friedrich From friedrichromstedt at gmail.com Thu Dec 30 09:08:09 2010 From: friedrichromstedt at gmail.com (Friedrich Romstedt) Date: Thu, 30 Dec 2010 15:08:09 +0100 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? In-Reply-To: References: Message-ID: 2010/12/30 Friedrich Romstedt : > 2010/12/30 K.-Michael Aye : >> I'm a bit puzzled that there seems just no way to cleanly code an >> interval with evenly spaced numbers that includes the stop point given? >> linspace offers to include the stop point, but arange does not? >> Am I missing something? (I am aware, that I could do >> arange(9,15.0001,0.1) but that's what I want to avoid!) > > Use numpy.linspace(9, 15, 7 * 10 + 1). ?FYI, there is also numpy.logspace(). Oh sorry, I overlooked that you're aware of the linspace functionality. Sorry. I think opting in or opting out the end point in arange() is at even rate, because it's in both cases the same unreliable (about including or not including the end point). Because it might pick a) if opting in a point just 1e-14 above so not opting in as desired and b) vice verse if opting out, it might pick a point just 1e-14 below. But I believe someone more educated about fp issues will give a more authoritative reply. Friedrich From josef.pktd at gmail.com Thu Dec 30 09:43:12 2010 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 30 Dec 2010 09:43:12 -0500 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? In-Reply-To: References: Message-ID: On Thu, Dec 30, 2010 at 9:08 AM, Friedrich Romstedt wrote: > 2010/12/30 Friedrich Romstedt : >> 2010/12/30 K.-Michael Aye : >>> I'm a bit puzzled that there seems just no way to cleanly code an >>> interval with evenly spaced numbers that includes the stop point given? >>> linspace offers to include the stop point, but arange does not? >>> Am I missing something? (I am aware, that I could do >>> arange(9,15.0001,0.1) but that's what I want to avoid!) 
>> >> Use numpy.linspace(9, 15, 7 * 10 + 1). ?FYI, there is also numpy.logspace(). > > Oh sorry, I overlooked that you're aware of the linspace functionality. ?Sorry. > > I think opting in or opting out the end point in arange() is at even > rate, because it's in both cases the same unreliable (about including > or not including the end point). ?Because it might pick a) if opting > in a point just 1e-14 above so not opting in as desired and b) vice > verse if opting out, it might pick a point just 1e-14 below. ?But I > believe someone more educated about fp issues will give a more > authoritative reply. Since linspace exists, I don't see much point in adding the stop point in arange. I use arange mainly for integers as numpy equivalent of python's range. And I often need arange(n+1) which is less writing than arange(n, include_end_point=True) Josef > > Friedrich > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From kmichael.aye at gmail.com Thu Dec 30 09:57:50 2010 From: kmichael.aye at gmail.com (K.-Michael Aye) Date: Thu, 30 Dec 2010 16:57:50 +0200 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? References: Message-ID: On 2010-12-30 16:43:12 +0200, josef.pktd at gmail.com said: > > Since linspace exists, I don't see much point in adding the stop point > in arange. I use arange mainly for integers as numpy equivalent of > python's range. And I often need arange(n+1) which is less writing > than arange(n, include_end_point=True) I agree with the point of writing gets more in some cases. But arange(a, n+1, 0.1) would of course fail in this case. And the big difference is, that I need to calculate first how many steps it is for linspace to achieve what I believe is a frequent user case. As we already have the 'convenience' of both linspace and arange, which in principle could be done by one function alone if we'd precalculate all required information ourselves, why not go the full way, and take all overhead away from the user? Michael > > Josef > >> >> Friedrich >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthieu.brucher at gmail.com Thu Dec 30 10:12:03 2010 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 30 Dec 2010 16:12:03 +0100 Subject: [Numpy-discussion] Why arange has no stop-point opt-in? In-Reply-To: References: Message-ID: 2010/12/30 K.-Michael Aye : > On 2010-12-30 16:43:12 +0200, josef.pktd at gmail.com said: > >> >> Since linspace exists, I don't see much point in adding the stop point >> in arange. I use arange mainly for integers as numpy equivalent of >> python's range. And I often need arange(n+1) which is less writing >> than arange(n, include_end_point=True) > > I agree with the point of writing gets more in some cases. > But arange(a, n+1, 0.1) would of course fail in this case. > And the big difference is, that I need to calculate first how many > steps it is for linspace to achieve what I believe is a frequent user > case. > As we already have the 'convenience' of both linspace and arange, which > in principle could be done by one function alone if we'd precalculate > all required information ourselves, why not go the full way, and take > all overhead away from the user? I think arange() should really be seen as just the numpy version of range(). 
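If the inclusive end point really matters, a tiny wrapper over linspace() can do the counting once. Purely illustrative (the name is made up, and it assumes the span is close to an integer number of steps):

import numpy as np

def crange(start, stop, step):
    n = int(round((stop - start) / step)) + 1
    return np.linspace(start, stop, n)

crange(9.0, 15.0, 0.1)   # 61 points, the last one exactly 15.0
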
The issue with including the stop point is that it well may be the case when you do arange(0, 1, 0.1). It's just a matter of loat precision. In this case, I think the safest course of action is to let the user decide how it can handle this. If the step can be expressed as a rational fraction, then using arange with floats and a step of one, it may be the simplest way to achieve what you want. i.e. : np.arange(90., 150.+1) / 10 Matthieu -- Information System Engineer, Ph.D. Blog: http://matt.eifelle.com LinkedIn: http://www.linkedin.com/in/matthieubrucher From qubax at gmx.at Thu Dec 30 14:07:14 2010 From: qubax at gmx.at (qubax at gmx.at) Date: Thu, 30 Dec 2010 20:07:14 +0100 Subject: [Numpy-discussion] How to efficiently multiply 2**10 x 2**10 hermitian matrices Message-ID: <20101230190714.GA7993@tux.hotze.com> I'll have to work with large hermitian matrices and calculate traces, eigenvalues and perform several matric products. In order to speed those up, i noticed that blas includes a function called 'zhemm' for efficient matrix products with at least one hermitian matrix. is there a way to call that one directly for numpy arrays? are there other, more efficient methods for multiplying that large matrices that one of you might be aware of? especially with the knowledge that they are symmetric/hermitian. i'd appreciate any help in that regard. thanks, q ps: i tried to port the functionality of zhemm into cython, but this is still about a factor of 10 slower than directly using numpy.dot -- There are two things children should get from their parents: roots and wings. The king who needs to remind his people of his rank, is no king. A beggar's mistake harms no one but the beggar. A king's mistake, however, harms everyone but the king. Too often, the measure of power lies not in the number who obey your will, but in the number who suffer your stupidity. From erik at rigtorp.com Thu Dec 30 21:30:21 2010 From: erik at rigtorp.com (Erik Rigtorp) Date: Thu, 30 Dec 2010 21:30:21 -0500 Subject: [Numpy-discussion] Simple shared arrays Message-ID: Hi, I was trying to parallelize some algorithms and needed a writable array shared between processes. It turned out to be quite simple and gave a nice speed up almost linear in number of cores. Of course you need to know what you are doing to avoid segfaults and such. But I still think something like this should be included with NumPy for power users. This works by inheriting anonymous mmaped memory. Not sure if this works on windows. import numpy as np import multiprocessing as mp class shared(np.ndarray): """Shared writable array""" def __new__(subtype, shape, interface=None): size = np.prod(shape) if interface == None: buffer = mp.RawArray('d', size) self = np.ndarray.__new__(subtype, shape, float, buffer) else: class Dummy(object): pass buffer = Dummy() buffer.__array_interface__ = interface a = np.asarray(buffer) self = np.ndarray.__new__(subtype, shape=a.shape, buffer=a) return self def __reduce_ex__(self, protocol): return shared, (self.shape, self.__array_interface__) def __reduce__(self): return __reduce_ex__(self, 0) Also see attached file for example usage. Erik -------------- next part -------------- A non-text attachment was scrubbed... 
Name: shared.py Type: text/x-python Size: 1364 bytes Desc: not available URL: From pivanov314 at gmail.com Fri Dec 31 02:13:21 2010 From: pivanov314 at gmail.com (Paul Ivanov) Date: Thu, 30 Dec 2010 23:13:21 -0800 Subject: [Numpy-discussion] Simple shared arrays In-Reply-To: References: Message-ID: <20101231071321.GE19675@ykcyc> Erik Rigtorp, on 2010-12-30 21:30, wrote: > Hi, > > I was trying to parallelize some algorithms and needed a writable > array shared between processes. It turned out to be quite simple and > gave a nice speed up almost linear in number of cores. Of course you > need to know what you are doing to avoid segfaults and such. But I > still think something like this should be included with NumPy for > power users. > > This works by inheriting anonymous mmaped memory. Not sure if this > works on windows. --snip-- I've successfully used (what I think is) Sturla Molden's shmem_as_ndarray as outline here [1] and here [2] for these purposes. 1. http://groups.google.com/group/comp.lang.python/browse_thread/thread/79fcf022b01b7fc3 2. http://folk.uio.no/sturlamo/python/multiprocessing-tutorial.pdf -- Paul Ivanov 314 address only used for lists, off-list direct email at: http://pirsquared.org | GPG/PGP key id: 0x0F3E28F7 -------------- next part -------------- A non-text attachment was scrubbed... Name: signature.asc Type: application/pgp-signature Size: 198 bytes Desc: Digital signature URL: From erik at rigtorp.com Fri Dec 31 08:52:53 2010 From: erik at rigtorp.com (Erik Rigtorp) Date: Fri, 31 Dec 2010 08:52:53 -0500 Subject: [Numpy-discussion] Faster NaN functions Message-ID: Hi, I just send a pull request for some faster NaN functions, https://github.com/rigtorp/numpy. I implemented the following generalized ufuncs: nansum(), nancumsum(), nanmean(), nanstd() and for fun mean() and std(). It turns out that the generalized ufunc mean() and std() is faster than the current numpy functions. I'm also going to add nanprod(), nancumprod(), nanmax(), nanmin(), nanargmax(), nanargmin(). The current implementation is not optimized in any way and there are probably some speedups possible. I hope we can get this into numpy 2.0, me and people around me seems to have a need for these functions. Erik From erik at rigtorp.com Fri Dec 31 09:02:14 2010 From: erik at rigtorp.com (Erik Rigtorp) Date: Fri, 31 Dec 2010 09:02:14 -0500 Subject: [Numpy-discussion] Simple shared arrays In-Reply-To: <20101231071321.GE19675@ykcyc> References: <20101231071321.GE19675@ykcyc> Message-ID: On Fri, Dec 31, 2010 at 02:13, Paul Ivanov wrote: > Erik Rigtorp, on 2010-12-30 21:30, ?wrote: >> Hi, >> >> I was trying to parallelize some algorithms and needed a writable >> array shared between processes. It turned out to be quite simple and >> gave a nice speed up almost linear in number of cores. Of course you >> need to know what you are doing to avoid segfaults and such. But I >> still think something like this should be included with NumPy for >> power users. >> >> This works by inheriting anonymous mmaped memory. Not sure if this >> works on windows. > --snip-- > > I've successfully used (what I think is) Sturla Molden's > shmem_as_ndarray as outline here [1] and here [2] for these > purposes. > Yeah, i saw that code too. My implementation is even more lax, but easier to use. It sends arrays by memory reference to subprocesses. Dangerous: yes, effective: very. It would be nice if we could stamp out some good effective patterns using multiprocessing and include them with numpy. 
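The kind of pattern I have in mind looks roughly like this; illustrative only, it relies on fork() just like the class above, and the worker function is made up:

import numpy as np
import multiprocessing as mp

def fill_row(args):
    a, i = args                      # 'a' arrives as a view on the same memory
    a[i] = i * np.arange(a.shape[1])

if __name__ == '__main__':
    a = shared((4, 10))              # the shared class from my previous mail
    mp.Pool(4).map(fill_row, [(a, i) for i in range(a.shape[0])])
    print(a)                         # rows were filled in place by the workers
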
The best solution is probably a parallel_for function: def parallel_for(func, inherit_args, iterable): ... Where func should be def func(inherit_args, item): ... And parallel_for makes sure inherit_args are viewable as a class shared() with writable shared memory. Erik From lev at columbia.edu Fri Dec 31 11:21:14 2010 From: lev at columbia.edu (Lev Givon) Date: Fri, 31 Dec 2010 11:21:14 -0500 Subject: [Numpy-discussion] Faster NaN functions In-Reply-To: References: Message-ID: <20101231162114.GA17179@avicenna.ee.columbia.edu> Received from Erik Rigtorp on Fri, Dec 31, 2010 at 08:52:53AM EST: > Hi, > > I just send a pull request for some faster NaN functions, > https://github.com/rigtorp/numpy. > > I implemented the following generalized ufuncs: nansum(), nancumsum(), > nanmean(), nanstd() and for fun mean() and std(). It turns out that > the generalized ufunc mean() and std() is faster than the current > numpy functions. I'm also going to add nanprod(), nancumprod(), > nanmax(), nanmin(), nanargmax(), nanargmin(). > > The current implementation is not optimized in any way and there are > probably some speedups possible. > > I hope we can get this into numpy 2.0, me and people around me seems > to have a need for these functions. > > Erik > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > How does this compare to Bottleneck? http://pypi.python.org/pypi/Bottleneck/ L.G. From kwgoodman at gmail.com Fri Dec 31 12:20:45 2010 From: kwgoodman at gmail.com (Keith Goodman) Date: Fri, 31 Dec 2010 09:20:45 -0800 Subject: [Numpy-discussion] Faster NaN functions In-Reply-To: <20101231162114.GA17179@avicenna.ee.columbia.edu> References: <20101231162114.GA17179@avicenna.ee.columbia.edu> Message-ID: On Fri, Dec 31, 2010 at 8:21 AM, Lev Givon wrote: > Received from Erik Rigtorp on Fri, Dec 31, 2010 at 08:52:53AM EST: >> Hi, >> >> I just send a pull request for some faster NaN functions, >> https://github.com/rigtorp/numpy. >> >> I implemented the following generalized ufuncs: nansum(), nancumsum(), >> nanmean(), nanstd() and for fun mean() and std(). It turns out that >> the generalized ufunc mean() and std() is faster than the current >> numpy functions. I'm also going to add nanprod(), nancumprod(), >> nanmax(), nanmin(), nanargmax(), nanargmin(). >> >> The current implementation is not optimized in any way and there are >> probably some speedups possible. >> >> I hope we can get this into numpy 2.0, me and people around me seems >> to have a need for these functions. >> >> Erik >> _______________________________________________ >> NumPy-Discussion mailing list >> NumPy-Discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > How does this compare to Bottleneck? > > http://pypi.python.org/pypi/Bottleneck/ I had all sorts of problems with ABI differences (this is the first time I've tried numpy 2.0). So I couldn't get ipython, etc to work with Erik's new nan functions. That's why my speed comparison below might be hard to follow and only tests one example. 
From gideon.simpson at gmail.com Fri Dec 31 16:44:15 2010 From: gideon.simpson at gmail.com (Gideon) Date: Fri, 31 Dec 2010 13:44:15 -0800 (PST) Subject: [Numpy-discussion] OS X binaries. Message-ID: I noticed that 1.5.1 was released, and sourceforge is suggesting I use the package numpy-1.5.1-py2.6-python.org-macosx10.3.dmg. However, I have an OS X 10.6 machine. Can/should I use this binary? Should I just compile from source? From totonixsame at gmail.com Fri Dec 31 16:47:27 2010 From: totonixsame at gmail.com (totonixsame at gmail.com) Date: Fri, 31 Dec 2010 19:47:27 -0200 Subject: [Numpy-discussion] OS X binaries. In-Reply-To: References: Message-ID: On Fri, Dec 31, 2010 at 7:44 PM, Gideon wrote: > I noticed that 1.5.1 was released, and sourceforge is suggesting I use > the package numpy-1.5.1-py2.6-python.org-macosx10.3.dmg. However, I > have an OS X 10.6 machine. > > Can/should I use this binary? > > Should I just compile from source? I suggest you install pip [1] and then use it to install numpy with this command:

pip install numpy

It compiles fast. [1] - http://pypi.python.org/pypi/pip From erik at rigtorp.com Fri Dec 31 23:29:14 2010 From: erik at rigtorp.com (Erik Rigtorp) Date: Fri, 31 Dec 2010 23:29:14 -0500 Subject: [Numpy-discussion] Rolling window (moving average, moving std, and more) Message-ID: Hi, Implementing moving average, moving std and other functions working over rolling windows using Python for loops is slow. This is an effective stride trick I learned from Keith Goodman's Bottleneck code, but generalized to arrays of any dimension. This trick allows the loop to be performed in C code and, in the future, hopefully using multiple cores.

import numpy as np

def rolling_window(a, window):
    """
    Make an ndarray with a rolling window of the last dimension.

    Parameters
    ----------
    a : array_like
        Array to add rolling window to
    window : int
        Size of rolling window

    Returns
    -------
    Array that is a view of the original array with an added
    dimension of size window.
    Examples
    --------
    >>> x=np.arange(10).reshape((2,5))
    >>> rolling_window(x, 3)
    array([[[0, 1, 2],
            [1, 2, 3],
            [2, 3, 4]],

           [[5, 6, 7],
            [6, 7, 8],
            [7, 8, 9]]])

    Calculate rolling mean of last dimension:

    >>> np.mean(rolling_window(x, 3), -1)
    array([[ 1.,  2.,  3.],
           [ 6.,  7.,  8.]])
    """
    if window < 1:
        raise ValueError, "`window` must be at least 1."
    if window > a.shape[-1]:
        raise ValueError, "`window` is too long."
    shape = a.shape[:-1] + (a.shape[-1] - window + 1, window)
    strides = a.strides + (a.strides[-1],)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

Using np.swapaxes(-1, axis), rolling aggregations over any axis can be computed. I submitted a pull request to add this to the stride_tricks module.

Erik
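A small usage sketch of the swapaxes remark above, using the rolling_window function just defined; the array x and the window size here are only examples.

import numpy as np

x = np.arange(20.).reshape(4, 5)

# Rolling median over the last axis, window of 3:
med = np.median(rolling_window(x, 3), -1)   # shape (4, 3)

# Rolling std along axis 0: swap the target axis to the end, apply the
# window, reduce over the window, and swap back.
std0 = np.std(rolling_window(x.swapaxes(-1, 0), 3), -1).swapaxes(-1, 0)
# std0 has shape (2, 5): 4 - 3 + 1 window positions along the original axis 0.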