From ndbecker2 at gmail.com Fri May 1 14:28:48 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 01 May 2009 14:28:48 -0400 Subject: [Numpy-discussion] apply table lookup to each element Message-ID: Suggestion for efficient way to apply a table lookup to each element of an integer array? import numpy as np _cos = np.empty ((2**rom_in_bits,), dtype=int) _sin = np.empty ((2**rom_in_bits,), dtype=int) for address in xrange (2**12): _cos[address] = nint ((2.0**(rom_out_bits-1)-1) * cos (2 * pi * address * (2.0**-rom_in_bits))) _sin[address] = nint ((2.0**(rom_out_bits-1)-1) * sin (2 * pi * address * (2.0**-rom_in_bits))) Now _cos, _sin are arrays of integers (quantized sin, cos lookup tables) How to apply _cos lookup to each element of an integer array: phase = np.array (..., dtype =int) cos_out = lookup (phase, _cos) ??? From ndbecker2 at gmail.com Fri May 1 15:02:32 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 01 May 2009 15:02:32 -0400 Subject: [Numpy-discussion] Really strange result Message-ID: In [16]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*2).dtype Out[16]: dtype('uint64') In [17]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*n).dtype Out[17]: dtype('float64') In [18]: type(n) Out[18]: Now that's just strange. What's going on? From charlesr.harris at gmail.com Fri May 1 18:58:42 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 May 2009 16:58:42 -0600 Subject: [Numpy-discussion] Really strange result In-Reply-To: References: Message-ID: On Fri, May 1, 2009 at 1:02 PM, Neal Becker wrote: > In [16]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*2).dtype > Out[16]: dtype('uint64') > > In [17]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*n).dtype > Out[17]: dtype('float64') > > In [18]: type(n) > Out[18]: > > Now that's just strange. What's going on? > > The n is signed, uint64 is unsigned. So a signed type that can hold uint64 is needed. There ain't no such integer, so float64 is used. I think the logic here is a bit goofy myself since float64 doesn't have the needed 64 bit precision and the conversion from int kind to float kind is confusing. I think it would be better to raise a NotAvailable error or some such. Lest you think this is an isolated oddity, sometimes numeric arrays can be converted to object arrays. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Fri May 1 21:24:19 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Fri, 01 May 2009 21:24:19 -0400 Subject: [Numpy-discussion] Really strange result References: Message-ID: Charles R Harris wrote: > On Fri, May 1, 2009 at 1:02 PM, Neal Becker wrote: > >> In [16]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*2).dtype >> Out[16]: dtype('uint64') >> >> In [17]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*n).dtype >> Out[17]: dtype('float64') >> >> In [18]: type(n) >> Out[18]: >> >> Now that's just strange. What's going on? >> >> > The n is signed, uint64 is unsigned. So a signed type that can hold > uint64 is needed. There ain't no such integer, so float64 is used. I think > the logic here is a bit goofy myself since float64 doesn't have the needed > 64 bit precision and the conversion from int kind to float kind is > confusing. I think it would be better to raise a NotAvailable error or > some such. Lest you think this is an isolated oddity, sometimes numeric > arrays can be converted to object arrays. 
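For the table-lookup question at the top of this digest: as the follow-up later in the digest notes, plain fancy indexing (table[int_array]) applies the lookup to every element. A self-contained sketch, where rom_in_bits = 12 and rom_out_bits = 10 are made-up illustration values, not anything from the original post:

import numpy as np

rom_in_bits = 12          # assumed table size (4096 entries), illustration only
rom_out_bits = 10         # assumed output width, illustration only
address = np.arange(2**rom_in_bits)
scale = 2.0**(rom_out_bits - 1) - 1
_cos = np.round(scale * np.cos(2 * np.pi * address * 2.0**-rom_in_bits)).astype(int)
_sin = np.round(scale * np.sin(2 * np.pi * address * 2.0**-rom_in_bits)).astype(int)

phase = np.array([0, 1024, 2048, 3072])   # any integer array of table addresses
cos_out = _cos[phase]                     # fancy indexing: one lookup per element
sin_out = _sin[phase]
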
> > Chuck I don't think that any type of integer arithmetic should ever be automatically promoted to float. Besides that, what about the first example? There, I used '2' rather than 'n'. Is not '2' also an int? From charlesr.harris at gmail.com Fri May 1 21:39:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 May 2009 19:39:32 -0600 Subject: [Numpy-discussion] Really strange result In-Reply-To: References: Message-ID: On Fri, May 1, 2009 at 7:24 PM, Neal Becker wrote: > Charles R Harris wrote: > > > On Fri, May 1, 2009 at 1:02 PM, Neal Becker wrote: > > > >> In [16]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*2).dtype > >> Out[16]: dtype('uint64') > >> > >> In [17]: (np.linspace (0, len (x)-1, len(x)).astype (np.uint64)*n).dtype > >> Out[17]: dtype('float64') > >> > >> In [18]: type(n) > >> Out[18]: > >> > >> Now that's just strange. What's going on? > >> > >> > > The n is signed, uint64 is unsigned. So a signed type that can hold > > uint64 is needed. There ain't no such integer, so float64 is used. I > think > > the logic here is a bit goofy myself since float64 doesn't have the > needed > > 64 bit precision and the conversion from int kind to float kind is > > confusing. I think it would be better to raise a NotAvailable error or > > some such. Lest you think this is an isolated oddity, sometimes numeric > > arrays can be converted to object arrays. > > > > Chuck > > I don't think that any type of integer arithmetic should ever be > automatically promoted to float. > > Besides that, what about the first example? There, I used '2' rather than > 'n'. Is not '2' also an int? What version of numpy are you using? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Fri May 1 21:40:21 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 1 May 2009 19:40:21 -0600 Subject: [Numpy-discussion] Really strange result In-Reply-To: References: Message-ID: On Fri, May 1, 2009 at 7:39 PM, Charles R Harris wrote: > > > On Fri, May 1, 2009 at 7:24 PM, Neal Becker wrote: > >> Charles R Harris wrote: >> >> > On Fri, May 1, 2009 at 1:02 PM, Neal Becker >> wrote: >> > >> >> In [16]: (np.linspace (0, len (x)-1, len(x)).astype >> (np.uint64)*2).dtype >> >> Out[16]: dtype('uint64') >> >> >> >> In [17]: (np.linspace (0, len (x)-1, len(x)).astype >> (np.uint64)*n).dtype >> >> Out[17]: dtype('float64') >> >> >> >> In [18]: type(n) >> >> Out[18]: >> >> >> >> Now that's just strange. What's going on? >> >> >> >> >> > The n is signed, uint64 is unsigned. So a signed type that can hold >> > uint64 is needed. There ain't no such integer, so float64 is used. I >> think >> > the logic here is a bit goofy myself since float64 doesn't have the >> needed >> > 64 bit precision and the conversion from int kind to float kind is >> > confusing. I think it would be better to raise a NotAvailable error or >> > some such. Lest you think this is an isolated oddity, sometimes numeric >> > arrays can be converted to object arrays. >> > >> > Chuck >> >> I don't think that any type of integer arithmetic should ever be >> automatically promoted to float. >> >> Besides that, what about the first example? There, I used '2' rather than >> 'n'. Is not '2' also an int? > > > What version of numpy are you using? > And what is the value of n? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From ndbecker2 at gmail.com Sat May 2 07:38:04 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Sat, 02 May 2009 07:38:04 -0400 Subject: [Numpy-discussion] Really strange result References: Message-ID: Charles R Harris wrote: > On Fri, May 1, 2009 at 7:39 PM, Charles R Harris > wrote: > >> >> >> On Fri, May 1, 2009 at 7:24 PM, Neal Becker wrote: >> >>> Charles R Harris wrote: >>> >>> > On Fri, May 1, 2009 at 1:02 PM, Neal Becker >>> wrote: >>> > >>> >> In [16]: (np.linspace (0, len (x)-1, len(x)).astype >>> (np.uint64)*2).dtype >>> >> Out[16]: dtype('uint64') >>> >> >>> >> In [17]: (np.linspace (0, len (x)-1, len(x)).astype >>> (np.uint64)*n).dtype >>> >> Out[17]: dtype('float64') >>> >> >>> >> In [18]: type(n) >>> >> Out[18]: >>> >> >>> >> Now that's just strange. What's going on? >>> >> >>> >> >>> > The n is signed, uint64 is unsigned. So a signed type that can hold >>> > uint64 is needed. There ain't no such integer, so float64 is used. I >>> think >>> > the logic here is a bit goofy myself since float64 doesn't have the >>> needed >>> > 64 bit precision and the conversion from int kind to float kind is >>> > confusing. I think it would be better to raise a NotAvailable error or >>> > some such. Lest you think this is an isolated oddity, sometimes >>> > numeric arrays can be converted to object arrays. >>> > >>> > Chuck >>> >>> I don't think that any type of integer arithmetic should ever be >>> automatically promoted to float. >>> >>> Besides that, what about the first example? There, I used '2' rather >>> than >>> 'n'. Is not '2' also an int? >> >> >> What version of numpy are you using? >> > > And what is the value of n? > > Chuck np.version.version Out[5]: '1.3.0' (I think the previous test was on 1.2.0 and did the same thing) (np.linspace (0, 1023,1024).astype(np.uint64)*2).dtype Out[2]: dtype('uint64') In [3]: n=-7 In [4]: (np.linspace (0, 1023,1024).astype(np.uint64)*n).dtype Out[4]: dtype('float64') From matthew.brett at gmail.com Sun May 3 00:54:44 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Sat, 2 May 2009 21:54:44 -0700 Subject: [Numpy-discussion] Structured array with no fields - possible? Message-ID: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> Hello, I'm trying to fix a bug in the scipy matlab loading routines, and this requires me to somehow represent an empty structured array. In matlab this is: >> a = struct() In numpy, you can do this: In [1]: dt = np.dtype([]) but then you can't use it: In [2]: np.zeros((),dtype=dt) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/mb312/tmp/ in () ValueError: Empty data-type Is there any way of representing a structured / record array, but with no fields? Thanks for any thoughts, Matthew From david at ar.media.kyoto-u.ac.jp Sun May 3 02:22:02 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sun, 03 May 2009 15:22:02 +0900 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> Message-ID: <49FD380A.9080600@ar.media.kyoto-u.ac.jp> Hi Matthew, Matthew Brett wrote: > Hello, > > I'm trying to fix a bug in the scipy matlab loading routines, and this > requires me to somehow represent an empty structured array. > Do you need the struct to be empty (size is 0) or to have no fields ? 
What would you expect np.zeros((), dtype=np.dtype([])) to return, for example ? cheers, David From cournape at gmail.com Sun May 3 02:49:34 2009 From: cournape at gmail.com (David Cournapeau) Date: Sun, 3 May 2009 15:49:34 +0900 Subject: [Numpy-discussion] Porting strategy for py3k In-Reply-To: <49F16F17.9000303@student.matnat.uio.no> References: <5b8d13220904230638i553d0e7bx2f3d93572861940c@mail.gmail.com> <5b8d13220904230752u3c8dd7fyb14f937d358472fd@mail.gmail.com> <49F095F0.90604@noaa.gov> <49F13EEF.6060205@ar.media.kyoto-u.ac.jp> <49F16F17.9000303@student.matnat.uio.no> Message-ID: <5b8d13220905022349v2f103cf5r3793cf8b54cdc405@mail.gmail.com> On Fri, Apr 24, 2009 at 4:49 PM, Dag Sverre Seljebotn wrote: > David Cournapeau wrote: >> Christopher Barker wrote: >>> Though I'm a bit surprised that that's not how the print function is >>> written in the first place (maybe it is in py3k -- I'm testing on 2.5) >>> >> >> That's actually how it works as far as I can tell. The thing with >> removing those print is that we can do it without too much trouble. As >> long as we cannot actually test any py3k code, warnings from python 2.6 >> is all we can get. >> >> I think we should aim at getting "something" which builds and runs (even >> if does not go further than import stage), so we can gradually port. For >> now, porting py3k is this huge thing that nobody can work on for say one >> hour. I would like to make sure we get at that stage, so that many >> people can take part of it, instead of the currently quite few people >> who are deeply intimate with numpy. > > One thing somebody *could* work on rather independently for some hours > is proper PEP 3118 support, as that is available in Python 2.6+ as well > and could be conditionally used on those systems. Yes, this could be done independently. I am not familiar with PEP 3118; from the python-dev ML, it looks like the current buffer API has some serious shortcomings, I don't whether this implies to numpy or not. Do you have more on this ? Another relatively independent thing is to be able to bootstrap our build from py3k. At least, distutils and the code for bootstrapping, so that we can then run 2to3 on the source code from distutils. Not being able to bootstrap our build process under py3k from distutils sounds too much of a pain. The only real alternative I could see is to have two codebases, because 2to3 does not seem able to convert numpy.distutils 100 % automatically. David From charlesr.harris at gmail.com Sun May 3 03:10:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 3 May 2009 01:10:32 -0600 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> Message-ID: On Sat, May 2, 2009 at 10:54 PM, Matthew Brett wrote: > Hello, > > I'm trying to fix a bug in the scipy matlab loading routines, and this > requires me to somehow represent an empty structured array. > > In matlab this is: > > >> a = struct() > Wouldn't a dictionary fit the matlab structure a bit better? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
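On the empty-structured-array question: a quick sketch of the distinction David raises, length zero versus no fields. The zero-length case works fine; the zero-field dtype is accepted, but (as Matthew showed) building an array from it fails in numpy 1.3:

import numpy as np

# length 0, but with fields: perfectly legal
a = np.zeros((0,), dtype=[('x', 'f8'), ('y', 'i4')])
print a.shape, a.dtype.names        # (0,) ('x', 'y')

# no fields at all: the dtype itself is accepted ...
dt = np.dtype([])
# ... but np.zeros((), dtype=dt) raises "ValueError: Empty data-type" (numpy 1.3)
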
URL: From dagss at student.matnat.uio.no Sun May 3 12:24:47 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Sun, 3 May 2009 18:24:47 +0200 (CEST) Subject: [Numpy-discussion] Porting strategy for py3k In-Reply-To: <5b8d13220905022349v2f103cf5r3793cf8b54cdc405@mail.gmail.com> References: <5b8d13220904230638i553d0e7bx2f3d93572861940c@mail.gmail.com> <5b8d13220904230752u3c8dd7fyb14f937d358472fd@mail.gmail.com> <49F095F0.90604@noaa.gov> <49F13EEF.6060205@ar.media.kyoto-u.ac.jp> <49F16F17.9000303@student.matnat.uio.no> <5b8d13220905022349v2f103cf5r3793cf8b54cdc405@mail.gmail.com> Message-ID: David Cournapeau wrote: > On Fri, Apr 24, 2009 at 4:49 PM, Dag Sverre Seljebotn >> One thing somebody *could* work on rather independently for some hours >> is proper PEP 3118 support, as that is available in Python 2.6+ as well >> and could be conditionally used on those systems. > > Yes, this could be done independently. I am not familiar with PEP > 3118; from the python-dev ML, it looks like the current buffer API has > some serious shortcomings, I don't whether this implies to numpy or > not. Do you have more on this ? Not sure what you refer to ... I'll just write more and hope it answers your question. The difference with PEP 3118 is that many more memory models are supported (beyond 1D contiguous). All of NumPy's strided arrays are very easy to expose (after all, Travis Oliphant did the PEP), with no information lost (i.e. the dtype, shape and strides can be communicated). This means that one should in Python 3/2.6+ be able to use other CPython libraries (like image libraries etc.) in a seamless fashion with NumPy arrays without those libraries having to know about NumPy as such; they can simply request a strided view of the data through PEP 3118. To support clients one mainly has to copy out information that is already there into a struct when requested. The one small challenge is creating a format string for the buffer dtype (which is incompatible with the current string representations of dtype). In addition it would be natural to act as a client, so that calling "np.array(obj)" (and/or np.view?) would acquire the data through PEP 3118. There is a class of buffers which doesn't fit in NumPy's memory model (e.g. pointers to rows of a matrix) and for which a copy would have to be made, but a lot of them could be used through a direct view as well. Dag Sverre From sccolbert at gmail.com Sun May 3 15:30:28 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 15:30:28 -0400 Subject: [Numpy-discussion] is there a way to get Nd indices of max of Nd array Message-ID: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> my case is only for 2d, but should apply to Nd as well. It would be convienent if np.max would return a tuple of the max value and its Nd location indices. Is there an easier way than just using the 1d flattened array max index (np.argmax) and calculating its corresponding Nd location? Chris -------------- next part -------------- An HTML attachment was scrubbed... 
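For the Nd-index-of-max question just above, the replies that follow boil down to two one-liners; a small sketch of both:

import numpy as np

a = np.random.randint(5, size=(4, 3))

# (1) position of the first occurrence of the maximum
i, j = np.unravel_index(np.argmax(a), a.shape)
print a[i, j] == a.max()                  # True

# (2) positions of every occurrence of the maximum
rows, cols = np.nonzero(a == a.max())
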
URL: From josef.pktd at gmail.com Sun May 3 16:34:23 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 May 2009 16:34:23 -0400 Subject: [Numpy-discussion] is there a way to get Nd indices of max of Nd array In-Reply-To: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> References: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> Message-ID: <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> On Sun, May 3, 2009 at 3:30 PM, Chris Colbert wrote: > my case is only for 2d, but should apply to Nd as well. > > It would be convienent if np.max would return a tuple of the max value and > its Nd location indices. > > Is there an easier way than just using the 1d flattened array max index > (np.argmax)?and calculating its corresponding Nd location? > >>> factors = np.random.randint(5,size=(4,3)) >>> factors array([[1, 1, 3], [0, 2, 1], [4, 4, 1], [2, 2, 4]]) >>> factors.max() 4 >>> np.argmax(factors) 6 >>> np.nonzero(factors==factors.max()) (array([2, 2, 3]), array([0, 1, 2])) Josef From sccolbert at gmail.com Sun May 3 18:30:37 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 18:30:37 -0400 Subject: [Numpy-discussion] is there a way to get Nd indices of max of Nd array In-Reply-To: <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> References: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> Message-ID: <7f014ea60905031530p6fb47439w4e101bfd4afff1a1@mail.gmail.com> but this gives me just the locations of the column/row maximums. I need the (x,y) location of the array maximum. Chris On Sun, May 3, 2009 at 4:34 PM, wrote: > On Sun, May 3, 2009 at 3:30 PM, Chris Colbert > wrote: > > my case is only for 2d, but should apply to Nd as well. > > > > It would be convienent if np.max would return a tuple of the max value > and > > its Nd location indices. > > > > Is there an easier way than just using the 1d flattened array max index > > (np.argmax) and calculating its corresponding Nd location? > > > > >>> factors = np.random.randint(5,size=(4,3)) > >>> factors > array([[1, 1, 3], > [0, 2, 1], > [4, 4, 1], > [2, 2, 4]]) > >>> factors.max() > 4 > >>> np.argmax(factors) > 6 > >>> np.nonzero(factors==factors.max()) > (array([2, 2, 3]), array([0, 1, 2])) > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Sun May 3 18:31:37 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 18:31:37 -0400 Subject: [Numpy-discussion] is there a way to get Nd indices of max of Nd array In-Reply-To: <7f014ea60905031530p6fb47439w4e101bfd4afff1a1@mail.gmail.com> References: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> <7f014ea60905031530p6fb47439w4e101bfd4afff1a1@mail.gmail.com> Message-ID: <7f014ea60905031531j431a4d92j53303a1c1b45f4dc@mail.gmail.com> wait, nevermind. your're right. Thanks! On Sun, May 3, 2009 at 6:30 PM, Chris Colbert wrote: > but this gives me just the locations of the column/row maximums. > > I need the (x,y) location of the array maximum. > > Chris > > On Sun, May 3, 2009 at 4:34 PM, wrote: > >> On Sun, May 3, 2009 at 3:30 PM, Chris Colbert >> wrote: >> > my case is only for 2d, but should apply to Nd as well. 
>> > >> > It would be convienent if np.max would return a tuple of the max value >> and >> > its Nd location indices. >> > >> > Is there an easier way than just using the 1d flattened array max index >> > (np.argmax) and calculating its corresponding Nd location? >> > >> >> >>> factors = np.random.randint(5,size=(4,3)) >> >>> factors >> array([[1, 1, 3], >> [0, 2, 1], >> [4, 4, 1], >> [2, 2, 4]]) >> >>> factors.max() >> 4 >> >>> np.argmax(factors) >> 6 >> >>> np.nonzero(factors==factors.max()) >> (array([2, 2, 3]), array([0, 1, 2])) >> >> Josef >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Sun May 3 19:34:55 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 19:34:55 -0400 Subject: [Numpy-discussion] index into a 3d histogram Message-ID: <7f014ea60905031634h35b54586x193e21e5a3a2adab@mail.gmail.com> Lets say I have histogram of a color image that is of size [16, 16, 16]. Now, I have a function that converts my rgb image into the format where each rgb color (i.e. img[x, y, :] = (r, g, b)) is an integer in the range(0, 16) I want create a new 2d array where new2darray[x, y] = hist[img[x,y, :]] that is, for each 3 tuple in img, use that as an index into the 3d histogram and store the value of the histogram into the (x,y) position of the 2d array. I've prototyped all the algorithms using for loops, now im just trying to speed it up. I can't quite wrap my head fully around this fancy indexing yet. Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Sun May 3 19:55:22 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 19:55:22 -0400 Subject: [Numpy-discussion] index into a 3d histogram In-Reply-To: <7f014ea60905031634h35b54586x193e21e5a3a2adab@mail.gmail.com> References: <7f014ea60905031634h35b54586x193e21e5a3a2adab@mail.gmail.com> Message-ID: <7f014ea60905031655k69c81200m85bafb1cbddb2813@mail.gmail.com> solved my own question: for future googlers: newarray = histogram[img[:, :, 0], img[:, :, 1], img[:, :, 2]) where histogram is the 3d histogram and img is the 3d img of (r, g, b) histogram localization bins. This version is also 300x faster than nested for loops. Chris On Sun, May 3, 2009 at 7:34 PM, Chris Colbert wrote: > Lets say I have histogram of a color image that is of size [16, 16, 16]. > > Now, I have a function that converts my rgb image into the format where > each rgb color (i.e. img[x, y, :] = (r, g, b)) is an integer in the range(0, > 16) > > I want create a new 2d array where new2darray[x, y] = hist[img[x,y, :]] > > that is, for each 3 tuple in img, use that as an index into the 3d > histogram and store the value of the histogram into the (x,y) position of > the 2d array. > > I've prototyped all the algorithms using for loops, now im just trying to > speed it up. > > I can't quite wrap my head fully around this fancy indexing yet. > > Chris > -------------- next part -------------- An HTML attachment was scrubbed... 
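A self-contained version of the back-projection one-liner above (note the stray ')' in the original, which should be ']'), with made-up data standing in for the real histogram and image:

import numpy as np

hist = np.random.rand(16, 16, 16)                 # stand-in for the 3d colour histogram
img = np.random.randint(16, size=(480, 640, 3))   # per-pixel (r, g, b) bin indices, 0..15

back_proj = hist[img[:, :, 0], img[:, :, 1], img[:, :, 2]]
print back_proj.shape                             # (480, 640)
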
URL: From sccolbert at gmail.com Sun May 3 20:15:55 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 20:15:55 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation Message-ID: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> in my endless pursuit of perfomance, i'm searching for a quick way to create a 3d histogram from a 3d rgb image. Here is what I have so far for a (16,16,16) 3d histogram: def hist3d(imgarray): histarray = N.zeros((16, 16, 16)) temp = imgarray.copy() (i, j) = imgarray.shape[0:2] temp = (temp - temp % 16) / 16 for a in range(i): for b in range(j): (b1, b2, b3) = temp[a, b, :] histarray[b1, b2, b3] += 1 return histarray this works, but takes about 4 seconds for a 640x480 image. I tried doing the inverse of my previous post, namely replacing the nested for loop with: histarray[temp[:,:,0], temp[:,:,1], temp[:,:,2]] += 1 but that doesn't work for whatever reason. It gives me number, but they're incorrect. Any ideas? Chris -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Sun May 3 20:31:58 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 4 May 2009 02:31:58 +0200 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> Message-ID: <9457e7c80905031731r7db53e40p45c8a5d6e066b6e8@mail.gmail.com> Hi Chris 2009/5/4 Chris Colbert : > in my endless pursuit of perfomance, i'm searching for a quick way to create > a 3d histogram from a 3d rgb image. Does histogramdd do what you want? Regards St?fan From sccolbert at gmail.com Sun May 3 20:36:04 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sun, 3 May 2009 20:36:04 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <9457e7c80905031731r7db53e40p45c8a5d6e066b6e8@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <9457e7c80905031731r7db53e40p45c8a5d6e066b6e8@mail.gmail.com> Message-ID: <7f014ea60905031736s283d015ev2d4736007fbe5c4a@mail.gmail.com> Stefan, I'm not sure: the docs say the input has to be: sample : array_like Data to histogram passed as a sequence of D arrays of length N, or as an (N,D) array i have an (N,M,D) array and not sure how to get it to conform to input required, mainly because I don't understand what it's asking. Chris 2009/5/3 St?fan van der Walt > Hi Chris > > 2009/5/4 Chris Colbert : > > in my endless pursuit of perfomance, i'm searching for a quick way to > create > > a 3d histogram from a 3d rgb image. > > Does histogramdd do what you want? > > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
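On the "(N, D) array" wording that is causing confusion just above: histogramdd wants one row per observation, so a (rows, cols, 3) image only needs a reshape to (rows*cols, 3). A sketch with a fake image; the explicit range argument is an addition that pins the bin edges at 0, 16, 32, ... regardless of the actual pixel values:

import numpy as np

img = np.random.randint(256, size=(480, 640, 3))   # fake rgb image
sample = img.reshape(-1, 3)                        # (N*M, 3): one pixel per row
hist, edges = np.histogramdd(sample, bins=16, range=[(0, 256)] * 3)
print hist.shape                                   # (16, 16, 16)
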
URL: From josef.pktd at gmail.com Sun May 3 20:36:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 3 May 2009 20:36:09 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> Message-ID: <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> On Sun, May 3, 2009 at 8:15 PM, Chris Colbert wrote: > in my endless pursuit of perfomance, i'm searching for a quick way to create > a 3d histogram from a 3d rgb image. > > Here is what I have so far for a (16,16,16) 3d histogram: > > def hist3d(imgarray): > ??? histarray = N.zeros((16, 16, 16)) > ??? temp = imgarray.copy() > ??? (i, j) = imgarray.shape[0:2] > ??? temp = (temp - temp % 16) / 16 > ??? for a in range(i): > ??????? for b in range(j): > ??????????? (b1, b2, b3) = temp[a, b, :] > ??????????? histarray[b1, b2, b3] += 1 > ????return histarray > > this works, but takes about 4 seconds for a 640x480 image. > > I tried doing the inverse of my previous post, namely replacing the nested > for loop with: > histarray[temp[:,:,0], temp[:,:,1], temp[:,:,2]] += 1 > > > but that doesn't work for whatever reason. It gives me number, but they're > incorrect. > > Any ideas? I'm not sure what exactly you need, but did you look at np.histogramdd ? reading the help file, this might work numpy.histogramdd(temp[:,:,0].ravel(), temp[:,:,1].ravel(), temp[:,:,2].ravel(), bins=16) but I never used histogramdd. also looking at the source of numpy is often very instructive, lots of good tricks to find in there: np.source(np.histogramdd). Josef From stefan at sun.ac.za Sun May 3 20:57:23 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 4 May 2009 02:57:23 +0200 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905031736s283d015ev2d4736007fbe5c4a@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <9457e7c80905031731r7db53e40p45c8a5d6e066b6e8@mail.gmail.com> <7f014ea60905031736s283d015ev2d4736007fbe5c4a@mail.gmail.com> Message-ID: <9457e7c80905031757n9605cbka8e44dcf580c42e7@mail.gmail.com> Hi Chris 2009/5/4 Chris Colbert : > I'm not sure: > > the docs say the input has to be: > sample : array_like > ??? Data to histogram passed as a sequence of D arrays of length N, or > ??? as an (N,D) array > > i have an (N,M,D) array and not sure how to get it to conform to input > required, mainly because I don't understand what it's asking. Try count, bins = np.histogramdd(x.reshape((-1,3)), bins=16) Regards St?fan From stefan at sun.ac.za Sun May 3 21:00:45 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 4 May 2009 03:00:45 +0200 Subject: [Numpy-discussion] is there a way to get Nd indices of max of Nd array In-Reply-To: <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> References: <7f014ea60905031230l679e61d0h601d25aa29f218eb@mail.gmail.com> <1cd32cbb0905031334oee89fccw864e662d3349aabd@mail.gmail.com> Message-ID: <9457e7c80905031800r7cf83580q78d6b1e41f3d9f86@mail.gmail.com> 2009/5/3 : >>>> factors = np.random.randint(5,size=(4,3)) >>>> factors > array([[1, 1, 3], > ? ? ? [0, 2, 1], > ? ? ? [4, 4, 1], > ? ? ? [2, 2, 4]]) >>>> factors.max() > 4 >>>> np.argmax(factors) > 6 >>>> np.nonzero(factors==factors.max()) > (array([2, 2, 3]), array([0, 1, 2])) Since you have more than one maximum here, you have to do something like Josef outlined above. 
If you only want the indices of the first maximum argument, you can use np.unravel_index(6, factors.shape) Regards St?fan From cournape at gmail.com Mon May 4 00:03:37 2009 From: cournape at gmail.com (David Cournapeau) Date: Mon, 4 May 2009 13:03:37 +0900 Subject: [Numpy-discussion] Porting strategy for py3k In-Reply-To: References: <5b8d13220904230638i553d0e7bx2f3d93572861940c@mail.gmail.com> <5b8d13220904230752u3c8dd7fyb14f937d358472fd@mail.gmail.com> <49F095F0.90604@noaa.gov> <49F13EEF.6060205@ar.media.kyoto-u.ac.jp> <49F16F17.9000303@student.matnat.uio.no> <5b8d13220905022349v2f103cf5r3793cf8b54cdc405@mail.gmail.com> Message-ID: <5b8d13220905032103w5ff20605q7afa936bae05ce3f@mail.gmail.com> On Mon, May 4, 2009 at 1:24 AM, Dag Sverre Seljebotn wrote: > David Cournapeau wrote: >> On Fri, Apr 24, 2009 at 4:49 PM, Dag Sverre Seljebotn >>> One thing somebody *could* work on rather independently for some hours >>> is proper PEP 3118 support, as that is available in Python 2.6+ as well >>> and could be conditionally used on those systems. >> >> Yes, this could be done independently. I am not familiar with PEP >> 3118; from the python-dev ML, it looks like the current buffer API has >> some serious shortcomings, I don't whether this implies to numpy or >> not. Do you have more on this ? > > Not sure what you refer to ... http://mail.python.org/pipermail/python-dev/2009-April/088211.html Thank you for those information. I don't understand what is meant by "not implemented for multi-dimensional array", and the consequences for numpy. Does it mean that PEP 3118 is not fully implemented ? Is the status of the buffer interface the same for python 2.6 and python 3 ? > In addition it would be natural to act as a client, so that calling > "np.array(obj)" (and/or np.view?) would acquire the data through PEP 3118. Yes, it would help making sure we implement the interface correctly for once :) I am almost done having a numpy.distutils which can bootstrap itself to the point of converting the rest of the python code to py3k. With the buffer interface, this should enable moving forward in a piecewise manner. thank you, David From sccolbert at gmail.com Mon May 4 00:31:40 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 4 May 2009 00:31:40 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> Message-ID: <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> this actually sort of worked. Thanks for putting me on the right track. Here is what I ended up with. this is what I ended up with: def hist3d(imgarray): histarray = N.zeros((16, 16, 16)) temp = imgarray.copy() bins = N.arange(0, 257, 16) histarray = N.histogramdd((temp[:,:,0].ravel(), temp[:,:,1].ravel(), temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] return histarray this creates a 3d histogram of rgb image values in the range 0,255 using 16 bins per component color. on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for loop. not quite framerate, but good enough for prototyping. Thanks! Chris On Sun, May 3, 2009 at 8:36 PM, wrote: > On Sun, May 3, 2009 at 8:15 PM, Chris Colbert > wrote: > > in my endless pursuit of perfomance, i'm searching for a quick way to > create > > a 3d histogram from a 3d rgb image. 
> > > > Here is what I have so far for a (16,16,16) 3d histogram: > > > > def hist3d(imgarray): > > histarray = N.zeros((16, 16, 16)) > > temp = imgarray.copy() > > (i, j) = imgarray.shape[0:2] > > temp = (temp - temp % 16) / 16 > > for a in range(i): > > for b in range(j): > > (b1, b2, b3) = temp[a, b, :] > > histarray[b1, b2, b3] += 1 > > return histarray > > > > this works, but takes about 4 seconds for a 640x480 image. > > > > I tried doing the inverse of my previous post, namely replacing the > nested > > for loop with: > > histarray[temp[:,:,0], temp[:,:,1], temp[:,:,2]] += 1 > > > > > > but that doesn't work for whatever reason. It gives me number, but > they're > > incorrect. > > > > Any ideas? > > I'm not sure what exactly you need, but did you look at np.histogramdd ? > > reading the help file, this might work > > numpy.histogramdd(temp[:,:,0].ravel(), temp[:,:,1].ravel(), > temp[:,:,2].ravel(), bins=16) > > but I never used histogramdd. > > also looking at the source of numpy is often very instructive, lots of > good tricks to find in there: np.source(np.histogramdd). > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 4 07:00:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 4 May 2009 07:00:13 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> Message-ID: <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert wrote: > this actually sort of worked. Thanks for putting me on the right track. > > Here is what I ended up with. > > this is what I ended up with: > > def hist3d(imgarray): > ??? histarray = N.zeros((16, 16, 16)) > ??? temp = imgarray.copy() > ????bins = N.arange(0, 257, 16) > ??? histarray = N.histogramdd((temp[:,:,0].ravel(), temp[:,:,1].ravel(), > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] > ??? return histarray > > this creates a 3d histogram of rgb image values in the range 0,255 using 16 > bins per component color. > > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for > loop. > > not quite framerate, but good enough for prototyping. > I don't think your copy to temp is necessary, and use reshape(-1,3) as in the example of Stefan, which will avoid copying the array 3 times. If you need to gain some more speed, then rewriting histogramdd and removing some of the unnecessary checks and calculations looks possible. Josef From ndbecker2 at gmail.com Mon May 4 07:54:34 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Mon, 04 May 2009 07:54:34 -0400 Subject: [Numpy-discussion] apply table lookup to each element References: Message-ID: Neal Becker wrote: > Suggestion for efficient way to apply a table lookup to each element of an > integer array? 
> > import numpy as np > _cos = np.empty ((2**rom_in_bits,), dtype=int) > _sin = np.empty ((2**rom_in_bits,), dtype=int) > for address in xrange (2**12): > _cos[address] = nint ((2.0**(rom_out_bits-1)-1) * cos (2 * pi * > address > * (2.0**-rom_in_bits))) > _sin[address] = nint ((2.0**(rom_out_bits-1)-1) * sin (2 * pi * > address > * (2.0**-rom_in_bits))) > > Now _cos, _sin are arrays of integers (quantized sin, cos lookup tables) > > How to apply _cos lookup to each element of an integer array: > > phase = np.array (..., dtype =int) > cos_out = lookup (phase, _cos) ??? Turns out that if A is an np.array and B is an np.array, then A[B] will do exactly what I wanted. Is this mentioned anywhere in the documentation? From stefan at sun.ac.za Mon May 4 08:10:58 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 4 May 2009 14:10:58 +0200 Subject: [Numpy-discussion] apply table lookup to each element In-Reply-To: References: Message-ID: <9457e7c80905040510v5244d611k5d160ef8bd6ad76d@mail.gmail.com> 2009/5/4 Neal Becker : > Turns out that if A is an np.array and B is an np.array, then > A[B] will do exactly what I wanted. > > Is this mentioned anywhere in the documentation? http://docs.scipy.org/numpy/docs/numpy-docs/reference/arrays.indexing.rst/#arrays-indexing St?fan From dagss at student.matnat.uio.no Mon May 4 08:22:34 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Mon, 04 May 2009 14:22:34 +0200 Subject: [Numpy-discussion] Porting strategy for py3k In-Reply-To: <5b8d13220905032103w5ff20605q7afa936bae05ce3f@mail.gmail.com> References: <5b8d13220904230638i553d0e7bx2f3d93572861940c@mail.gmail.com> <5b8d13220904230752u3c8dd7fyb14f937d358472fd@mail.gmail.com> <49F095F0.90604@noaa.gov> <49F13EEF.6060205@ar.media.kyoto-u.ac.jp> <49F16F17.9000303@student.matnat.uio.no> <5b8d13220905022349v2f103cf5r3793cf8b54cdc405@mail.gmail.com> <5b8d13220905032103w5ff20605q7afa936bae05ce3f@mail.gmail.com> Message-ID: <49FEDE0A.30105@student.matnat.uio.no> David Cournapeau wrote: > On Mon, May 4, 2009 at 1:24 AM, Dag Sverre Seljebotn > wrote: > >> David Cournapeau wrote: >> >>> On Fri, Apr 24, 2009 at 4:49 PM, Dag Sverre Seljebotn >>> >>>> One thing somebody *could* work on rather independently for some hours >>>> is proper PEP 3118 support, as that is available in Python 2.6+ as well >>>> and could be conditionally used on those systems. >>>> >>> Yes, this could be done independently. I am not familiar with PEP >>> 3118; from the python-dev ML, it looks like the current buffer API has >>> some serious shortcomings, I don't whether this implies to numpy or >>> not. Do you have more on this ? >>> >> Not sure what you refer to ... >> > > http://mail.python.org/pipermail/python-dev/2009-April/088211.html > > Thank you for those information. > > I don't understand what is meant by "not implemented for > multi-dimensional array", and the consequences for numpy. Does it mean > that PEP 3118 is not fully implemented ? Is the status of the buffer > interface the same for python 2.6 and python 3 ? > The "memoryview" is not implemented on 2.6, but that's just a utility for being able to acquire a buffer and inspect it from Python-space. From Cython or C one still has access. I think this just refers to there not being any multidimensional consumers nor exporters in the standard library. So from the point of view of the standard library it is a bit useless; but it is not if one uses 3rd party libraries. The API itself is working fine, and you can e.g. 
export a multidimensional buffer and use it in Cython (defined __getbuffer__ for a cdef class and then access it through classname[dtype, ndim=...]), under both 2.6 and 3.0. Dag Sverre From bsouthey at gmail.com Mon May 4 11:14:34 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Mon, 04 May 2009 10:14:34 -0500 Subject: [Numpy-discussion] Really strange result In-Reply-To: References: Message-ID: <49FF065A.6050307@gmail.com> Neal Becker wrote: > Charles R Harris wrote: > > >> On Fri, May 1, 2009 at 7:39 PM, Charles R Harris >> wrote: >> >> >>> On Fri, May 1, 2009 at 7:24 PM, Neal Becker wrote: >>> >>> >>>> Charles R Harris wrote: >>>> >>>> >>>>> On Fri, May 1, 2009 at 1:02 PM, Neal Becker >>>>> >>>> wrote: >>>> >>>>>> In [16]: (np.linspace (0, len (x)-1, len(x)).astype >>>>>> >>>> (np.uint64)*2).dtype >>>> >>>>>> Out[16]: dtype('uint64') >>>>>> >>>>>> In [17]: (np.linspace (0, len (x)-1, len(x)).astype >>>>>> >>>> (np.uint64)*n).dtype >>>> >>>>>> Out[17]: dtype('float64') >>>>>> >>>>>> In [18]: type(n) >>>>>> Out[18]: >>>>>> >>>>>> Now that's just strange. What's going on? >>>>>> >>>>>> >>>>>> >>>>> The n is signed, uint64 is unsigned. So a signed type that can hold >>>>> uint64 is needed. There ain't no such integer, so float64 is used. I >>>>> >>>> think >>>> >>>>> the logic here is a bit goofy myself since float64 doesn't have the >>>>> >>>> needed >>>> >>>>> 64 bit precision and the conversion from int kind to float kind is >>>>> confusing. I think it would be better to raise a NotAvailable error or >>>>> some such. Lest you think this is an isolated oddity, sometimes >>>>> numeric arrays can be converted to object arrays. >>>>> >>>>> Chuck >>>>> >>>> I don't think that any type of integer arithmetic should ever be >>>> automatically promoted to float. >>>> >>>> Besides that, what about the first example? There, I used '2' rather >>>> than >>>> 'n'. Is not '2' also an int? >>>> >>> What version of numpy are you using? >>> >>> >> And what is the value of n? >> >> > > >> Chuck >> > > np.version.version > Out[5]: '1.3.0' > (I think the previous test was on 1.2.0 and did the same thing) > > (np.linspace (0, 1023,1024).astype(np.uint64)*2).dtype > Out[2]: dtype('uint64') > > In [3]: n=-7 > > In [4]: (np.linspace (0, 1023,1024).astype(np.uint64)*n).dtype > Out[4]: dtype('float64') > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi, //I think this behavior has been raised before. IIRC, Numpy is trying to do the operation that is requested by converting the dtype into floats since this is a generic solution that will avoid overflow with any ints not just unsigned ints. Note that you get a different result if you use subtraction than multiplication. 
>>> np.linspace (0, 1023,1024) array([ 0.00000000e+00, 1.00000000e+00, 2.00000000e+00, ..., 1.02100000e+03, 1.02200000e+03, 1.02300000e+03]) >>> np.linspace (0, 1023,1024).astype(np.uint64)*-7 array([ -0.00000000e+00, -7.00000000e+00, -1.40000000e+01, ..., -7.14700000e+03, -7.15400000e+03, -7.16100000e+03]) >>> np.linspace (0, 1023,1024).astype(np.uint64)-7 array([18446744073709551609, 18446744073709551610, 18446744073709551611, ..., 1014, 1015, 1016], dtype=uint64) Bruce From dfnsonfsduifb at gmx.de Mon May 4 11:40:30 2009 From: dfnsonfsduifb at gmx.de (Johannes Bauer) Date: Mon, 04 May 2009 17:40:30 +0200 Subject: [Numpy-discussion] Efficient scaling of array Message-ID: <49FF0C6E.2040609@gmx.de> Hello list, is there a possibility to scale an array by interpolation, automatically? For illustration a 1D-example would be an array of size 5, which is scaled to size 3: before: [ 1, 2, 3, 4, 5 ] 1/1 2/3 1/3 1 1/3 2/3 1 after : [ 2.33, 5, 7.66 ] The same thing should be possible in nD, with the obvious analogy. Is there such a function in numpy? Kind regards, Johannes From zachary.pincus at yale.edu Mon May 4 11:52:20 2009 From: zachary.pincus at yale.edu (Zachary Pincus) Date: Mon, 4 May 2009 11:52:20 -0400 Subject: [Numpy-discussion] Efficient scaling of array In-Reply-To: <49FF0C6E.2040609@gmx.de> References: <49FF0C6E.2040609@gmx.de> Message-ID: scipy.ndimage.zoom (and related interpolation functions) would be a good bet -- different orders of interpolation are available, too, which can be useful. Zach On May 4, 2009, at 11:40 AM, Johannes Bauer wrote: > Hello list, > > is there a possibility to scale an array by interpolation, > automatically? For illustration a 1D-example would be an array of size > 5, which is scaled to size 3: > > before: [ 1, 2, 3, 4, 5 ] > 1/1 2/3 > 1/3 1 1/3 > 2/3 1 > after : [ 2.33, 5, 7.66 ] > > > The same thing should be possible in nD, with the obvious analogy. Is > there such a function in numpy? > > Kind regards, > Johannes > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From taste_of_r at yahoo.com Mon May 4 12:43:24 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Mon, 4 May 2009 09:43:24 -0700 (PDT) Subject: [Numpy-discussion] How to convert a list into a structured array? Message-ID: <41459.25081.qm@web43503.mail.sp1.yahoo.com> Hi,All: ? My first post! I am very excited to find out structured array (record array) in Python. Since I do data manipulation every day, this is truly great. However, I typically download data using pyodbc, the default output is a big list. So I am wondering how to convert that big list into a structured array? using array() will turn it into a text array, afaik. it is even better if anybody can show me some tricks to download the data directly as a structured array. ? Thanks a lot for the help. ? Wei Su ? BTW: I am also interested in Python's ability to handle large data. Any hints or suggestion is welcome. -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Mon May 4 12:48:55 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 04 May 2009 18:48:55 +0200 Subject: [Numpy-discussion] stop criterion for an alternating signal Message-ID: Hi all, How can I define a stop criterion for an alternating series ? Any pointer would be appreciated. 
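Back to Wei Su's question above about turning a pyodbc result list into a structured array: feeding a list of tuples plus an explicit dtype to np.array is usually enough. A sketch with made-up column names and types (pyodbc rows may need an explicit tuple() conversion first):

import numpy as np

rows = [(1, 'AAPL', 127.3), (2, 'MSFT', 28.1)]     # e.g. [tuple(r) for r in cursor.fetchall()]
dt = np.dtype([('id', 'i4'), ('ticker', 'S8'), ('price', 'f8')])
a = np.array(rows, dtype=dt)
print a['price'].mean()

# np.rec.fromrecords(rows, names='id,ticker,price') is an alternative that
# guesses the column types for you.
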
Nils from numpy import loadtxt, arange from pylab import plot, show A = loadtxt('alternate.dat') m = len(A) x = arange(0,m) plot(x,A) show() -------------- next part -------------- A non-text attachment was scrubbed... Name: alternate.dat Type: video/mpeg Size: 210 bytes Desc: not available URL: From charlesr.harris at gmail.com Mon May 4 12:52:59 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 May 2009 10:52:59 -0600 Subject: [Numpy-discussion] stop criterion for an alternating signal In-Reply-To: References: Message-ID: On Mon, May 4, 2009 at 10:48 AM, Nils Wagner wrote: > Hi all, > > How can I define a stop criterion for an alternating series ? > > Any pointer would be appreciated. > Where does the series come from and what are you trying to do? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Mon May 4 12:59:57 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 04 May 2009 18:59:57 +0200 Subject: [Numpy-discussion] stop criterion for an alternating signal In-Reply-To: References: Message-ID: On Mon, 4 May 2009 10:52:59 -0600 Charles R Harris wrote: > On Mon, May 4, 2009 at 10:48 AM, Nils Wagner > wrote: > >> Hi all, >> >> How can I define a stop criterion for an alternating >>series ? >> >> Any pointer would be appreciated. >> > > Where does the series come from and what are you trying >to do? > > Chuck The data come from an iterative process. I am looking for convergence criteria. It should be possible to stop the process after 10-15 iterations. Nils From dfnsonfsduifb at gmx.de Mon May 4 13:17:47 2009 From: dfnsonfsduifb at gmx.de (Johannes Bauer) Date: Mon, 04 May 2009 19:17:47 +0200 Subject: [Numpy-discussion] Efficient scaling of array In-Reply-To: References: <49FF0C6E.2040609@gmx.de> Message-ID: <49FF233B.6080608@gmx.de> Zachary Pincus schrieb: > scipy.ndimage.zoom (and related interpolation functions) would be a > good bet -- different orders of interpolation are available, too, > which can be useful. Thanks a lot - exactly what I was looking for! Kind regards, Johannes From Chris.Barker at noaa.gov Mon May 4 13:23:43 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 04 May 2009 10:23:43 -0700 Subject: [Numpy-discussion] Really strange result In-Reply-To: References: Message-ID: <49FF249F.1010205@noaa.gov> Neal Becker wrote: > In [3]: n=-7 > > In [4]: (np.linspace (0, 1023,1024).astype(np.uint64)*n).dtype > Out[4]: dtype('float64') what would you like (expect) to happen when you multiply an unsigned type by a negative number? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Mon May 4 13:45:03 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 4 May 2009 11:45:03 -0600 Subject: [Numpy-discussion] stop criterion for an alternating signal In-Reply-To: References: Message-ID: On Mon, May 4, 2009 at 10:59 AM, Nils Wagner wrote: > On Mon, 4 May 2009 10:52:59 -0600 > Charles R Harris wrote: > > On Mon, May 4, 2009 at 10:48 AM, Nils Wagner > > wrote: > > > >> Hi all, > >> > >> How can I define a stop criterion for an alternating > >>series ? > >> > >> Any pointer would be appreciated. > >> > > > > Where does the series come from and what are you trying > >to do? > > > > Chuck > > The data come from an iterative process. 
> I am looking for convergence criteria. Well, the example didn't show convergence. If you are working on the harmonic series it takes an awful lot of iterations to get anywhere. So if your example is representative the algorithm needs fixing to accelerate the convergence. Assuming it actually converges and I'm not convinced of that. > > It should be possible to stop the process after 10-15 > iterations. > When an alternating series converges there is a decreasing upper bound and increasing lower bound and the difference goes to zero. Pick a cutoff and quit when the bounds are less than that apart. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.huard at gmail.com Mon May 4 15:18:20 2009 From: david.huard at gmail.com (David Huard) Date: Mon, 4 May 2009 15:18:20 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> Message-ID: <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> On Mon, May 4, 2009 at 7:00 AM, wrote: > On Mon, May 4, 2009 at 12:31 AM, Chris Colbert > wrote: > > this actually sort of worked. Thanks for putting me on the right track. > > > > Here is what I ended up with. > > > > this is what I ended up with: > > > > def hist3d(imgarray): > > histarray = N.zeros((16, 16, 16)) > > temp = imgarray.copy() > > bins = N.arange(0, 257, 16) > > histarray = N.histogramdd((temp[:,:,0].ravel(), temp[:,:,1].ravel(), > > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] > > return histarray > > > > this creates a 3d histogram of rgb image values in the range 0,255 using > 16 > > bins per component color. > > > > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for > > loop. > > > > not quite framerate, but good enough for prototyping. > > > > I don't think your copy to temp is necessary, and use reshape(-1,3) as > in the example of Stefan, which will avoid copying the array 3 times. > > If you need to gain some more speed, then rewriting histogramdd and > removing some of the unnecessary checks and calculations looks > possible. Indeed, the strategy used in the histogram function is faster than the one used in the histogramdd case, so porting one to the other should speed things up. David > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Mon May 4 16:00:31 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 4 May 2009 16:00:31 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> Message-ID: <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> i'll take a look at them over the next few days and see what i can hack out. 
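A worked toy version of the stopping rule Chuck describes in the alternating-series thread (successive iterates of a convergent alternating sequence bracket the limit, so their gap bounds the remaining error); the Leibniz series here is only a stand-in for the real iteration:

tol = 1e-3
total, prev = 0.0, None
for k in range(100000):
    total += (-1) ** k / (2.0 * k + 1)     # alternating series -> pi/4
    if prev is not None and abs(total - prev) < tol:
        break                              # consecutive iterates bracket the limit
    prev = total
print k, total                             # stops after ~500 terms, within tol of pi/4
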
Chris On Mon, May 4, 2009 at 3:18 PM, David Huard wrote: > > > On Mon, May 4, 2009 at 7:00 AM, wrote: > >> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert >> wrote: >> > this actually sort of worked. Thanks for putting me on the right track. >> > >> > Here is what I ended up with. >> > >> > this is what I ended up with: >> > >> > def hist3d(imgarray): >> > histarray = N.zeros((16, 16, 16)) >> > temp = imgarray.copy() >> > bins = N.arange(0, 257, 16) >> > histarray = N.histogramdd((temp[:,:,0].ravel(), temp[:,:,1].ravel(), >> > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] >> > return histarray >> > >> > this creates a 3d histogram of rgb image values in the range 0,255 using >> 16 >> > bins per component color. >> > >> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for >> > loop. >> > >> > not quite framerate, but good enough for prototyping. >> > >> >> I don't think your copy to temp is necessary, and use reshape(-1,3) as >> in the example of Stefan, which will avoid copying the array 3 times. >> >> If you need to gain some more speed, then rewriting histogramdd and >> removing some of the unnecessary checks and calculations looks >> possible. > > > Indeed, the strategy used in the histogram function is faster than the one > used in the histogramdd case, so porting one to the other should speed > things up. > > David > > >> >> Josef >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bruno.piguet at gmail.com Mon May 4 16:06:54 2009 From: bruno.piguet at gmail.com (bruno Piguet) Date: Mon, 4 May 2009 22:06:54 +0200 Subject: [Numpy-discussion] loadtxt example problem ? Message-ID: Hello, I'm new to numpy, and considering using loadtxt() to read a data file. As a starter, I tried the example of the doc page ( http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) : >>> from StringIO import StringIO # StringIO behaves like a file object >>> c = StringIO("0 1\n2 3") >>> np.loadtxt(c) I didn't get the expectd answer, but : Traceback (moste recent call last): File"(stdin)", line 1, in File "C:\Python25\lib\sire-packages\numpy\core\numeric.py", line 725, in loadtxt X = array(X, dtype) ValueError: setting an array element with a sequence. I'm using verison 1.0.4 of numpy). I got the same problem on a Ms-Windows and a Linux Machine. I could run the example by adding a \n at the end of c : c = StringIO("0 1\n2 3\n") Is it the normal and expected behaviour ? Bruno. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rmay31 at gmail.com Mon May 4 16:10:25 2009 From: rmay31 at gmail.com (Ryan May) Date: Mon, 4 May 2009 15:10:25 -0500 Subject: [Numpy-discussion] loadtxt example problem ? In-Reply-To: References: Message-ID: On Mon, May 4, 2009 at 3:06 PM, bruno Piguet wrote: > Hello, > > I'm new to numpy, and considering using loadtxt() to read a data file. 
> > As a starter, I tried the example of the doc page ( > http://docs.scipy.org/doc/numpy/reference/generated/numpy.loadtxt.html) : > > > >>> from StringIO import StringIO # StringIO behaves like a file object > >>> c = StringIO("0 1\n2 3") > >>> np.loadtxt(c) > I didn't get the expectd answer, but : > > Traceback (moste recent call last): > File"(stdin)", line 1, in > File "C:\Python25\lib\sire-packages\numpy\core\numeric.py", line 725, in loadtxt > X = array(X, dtype) > ValueError: setting an array element with a sequence. > > > I'm using verison 1.0.4 of numpy). > > I got the same problem on a Ms-Windows and a Linux Machine. > > I could run the example by adding a \n at the end of c : > c = StringIO("0 1\n2 3\n") > > > Is it the normal and expected behaviour ? > > Bruno. > > It's a bug that's been fixed. Numpy 1.0.4 is quite a bit out of date, so I'd recommend updating to the latest (1.3). Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 4 16:18:11 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 4 May 2009 16:18:11 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> Message-ID: <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> On Mon, May 4, 2009 at 4:00 PM, Chris Colbert wrote: > i'll take a look at them over the next few days and see what i can hack out. > > Chris > > On Mon, May 4, 2009 at 3:18 PM, David Huard wrote: >> >> >> On Mon, May 4, 2009 at 7:00 AM, wrote: >>> >>> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert >>> wrote: >>> > this actually sort of worked. Thanks for putting me on the right track. >>> > >>> > Here is what I ended up with. >>> > >>> > this is what I ended up with: >>> > >>> > def hist3d(imgarray): >>> > ??? histarray = N.zeros((16, 16, 16)) >>> > ??? temp = imgarray.copy() >>> > ????bins = N.arange(0, 257, 16) >>> > ??? histarray = N.histogramdd((temp[:,:,0].ravel(), >>> > temp[:,:,1].ravel(), >>> > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] >>> > ??? return histarray >>> > >>> > this creates a 3d histogram of rgb image values in the range 0,255 >>> > using 16 >>> > bins per component color. >>> > >>> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a for >>> > loop. >>> > >>> > not quite framerate, but good enough for prototyping. >>> > >>> >>> I don't think your copy to temp is necessary, and use reshape(-1,3) as >>> in the example of Stefan, which will avoid copying the array 3 times. >>> >>> If you need to gain some more speed, then rewriting histogramdd and >>> removing some of the unnecessary checks and calculations looks >>> possible. >> >> Indeed, the strategy used in the histogram function is faster than the one >> used in the histogramdd case, so porting one to the other should speed >> things up. >> >> David is searchsorted faster than digitize and bincount ? Using the idea of histogramdd, I get a bit below a tenth of a second, my best for this problem is below. 
I was trying for a while what the fastest way is to convert a two dimensional array into a one dimensional index for bincount. I found that using the return index of unique1d is very slow compared to numeric index calculation. Josef example timed for: nobs = 307200 nbins = 16 factors = np.random.randint(256,size=(nobs,3)).copy() factors2 = factors.reshape(-1,480,3).copy() def hist3(factorsin, nbins): if factorsin.ndim != 2: factors = factorsin.reshape(-1,factorsin.shape[-1]) else: factors = factorsin N, D = factors.shape darr = np.empty(factors.T.shape, dtype=int) nele = np.max(factors)+1 bins = np.arange(0, nele, nele/nbins) bins[-1] += 1 for i in range(D): darr[i] = np.digitize(factors[:,i],bins) - 1 #add weighted rows darrind = darr[D-1] for i in range(D-1): darrind += darr[i]*nbins**(D-i-1) return np.bincount(darrind) # return flat not reshaped From dwf at cs.toronto.edu Mon May 4 16:55:59 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Mon, 4 May 2009 16:55:59 -0400 Subject: [Numpy-discussion] object arrays and == Message-ID: <33C0A55A-45AB-45A5-8F75-DF99DF4182F1@cs.toronto.edu> Hi, Is there a simple way to compare each element of an object array to a single object? objarray == None, for example, gives me a single "False". I couldn't find any reference to it in the documentation, but I'll admit, I wasn't quite sure where to look. David From rmay31 at gmail.com Mon May 4 17:02:52 2009 From: rmay31 at gmail.com (Ryan May) Date: Mon, 4 May 2009 16:02:52 -0500 Subject: [Numpy-discussion] object arrays and == In-Reply-To: <33C0A55A-45AB-45A5-8F75-DF99DF4182F1@cs.toronto.edu> References: <33C0A55A-45AB-45A5-8F75-DF99DF4182F1@cs.toronto.edu> Message-ID: On Mon, May 4, 2009 at 3:55 PM, David Warde-Farley wrote: > Hi, > > Is there a simple way to compare each element of an object array to a > single object? objarray == None, for example, gives me a single > "False". I couldn't find any reference to it in the documentation, but > I'll admit, I wasn't quite sure where to look. > I think it might depend on some factors: In [1]: a = np.array(['a','b'], dtype=np.object) In [2]: a=='a' Out[2]: array([ True, False], dtype=bool) In [3]: a==None Out[3]: False In [4]: a == [] Out[4]: False In [5]: a == '' Out[5]: array([False, False], dtype=bool) In [6]: a == dict() Out[6]: array([False, False], dtype=bool) In [7]: numpy.__version__ Out[7]: '1.4.0.dev6885' In [8]: a == 5 Out[8]: array([False, False], dtype=bool) In [9]: a == 5. Out[9]: array([False, False], dtype=bool) But based on these results, I have no idea what the factors might be. I know this works with datetime objects, but I'm really not sure why None and the empty list don't work. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Mon May 4 17:19:59 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Mon, 4 May 2009 14:19:59 -0700 Subject: [Numpy-discussion] object arrays and == In-Reply-To: References: <33C0A55A-45AB-45A5-8F75-DF99DF4182F1@cs.toronto.edu> Message-ID: On Mon, May 4, 2009 at 2:02 PM, Ryan May wrote: > On Mon, May 4, 2009 at 3:55 PM, David Warde-Farley > wrote: >> >> Hi, >> >> Is there a simple way to compare each element of an object array to a >> single object? objarray == None, for example, gives me a single >> "False". 
I couldn't find any reference to it in the documentation, but >> I'll admit, I wasn't quite sure where to look. > > I think it might depend on some factors: > > In [1]: a = np.array(['a','b'], dtype=np.object) > > In [2]: a=='a' > Out[2]: array([ True, False], dtype=bool) > > In [3]: a==None > Out[3]: False > > In [4]: a == [] > Out[4]: False > > In [5]: a == '' > Out[5]: array([False, False], dtype=bool) > > In [6]: a == dict() > Out[6]: array([False, False], dtype=bool) > > In [7]: numpy.__version__ > Out[7]: '1.4.0.dev6885' > > In [8]: a == 5 > Out[8]: array([False, False], dtype=bool) > > In [9]: a == 5. > Out[9]: array([False, False], dtype=bool) > > But based on these results, I have no idea what the factors might be.? I > know this works with datetime objects, but I'm really not sure why None and > the empty list don't work. Doing a little poking around, I found this: >> a = np.array([True, False]) >> a == None False >> np.equal(a, None) array([False, False], dtype=bool) From python at beyondcode.org Mon May 4 18:19:51 2009 From: python at beyondcode.org (Philipp K. Janert) Date: Mon, 4 May 2009 15:19:51 -0700 Subject: [Numpy-discussion] SVD failure Message-ID: <200905041519.52056.python@beyondcode.org> The following code: from scipy import * from scipy import linalg m = matrix( [ [1,1,0,0], [1,1,0,0], [0,0,1,1], [0,0,1,1] ] ) u,s,v = linalg.svd( m ) fails with the following message: Traceback (most recent call last): File "boo.py", line 10, in u,s,v = linalg.svd( m ) File "/usr/lib64/python2.6/site-packages/scipy/linalg/decomp.py", line 509, in svd lwork = calc_lwork.gesdd(gesdd.prefix,m,n,compute_uv)[1] RuntimeError: more argument specifiers than keyword list entries (remaining format:'|:calc_lwork.gesdd') On the other hand, calculating la, v = eig( m ) works just fine. If I see this correctly, my SciPy version is 0.6.0; running on 64bit Suse 11. Any thoughts? Best, Ph. From bolme1234 at comcast.net Mon May 4 18:58:52 2009 From: bolme1234 at comcast.net (David Bolme) Date: Mon, 4 May 2009 16:58:52 -0600 Subject: [Numpy-discussion] FaceL - Facile Face Labeling Message-ID: I have been working on a open source face recognition demo tool called FaceL for the past few months. FaceL is a simple and fun face processing and labeling tool that labels faces in a live video from an iSight camera or webcam. It uses OpenCV for face detection, ASEF correlation filters for eye localization, and a Support Vector Machine for face classification. FaceL is implemented in python (using PyVision, wxPython, SciPy, OpenCV, libsvm, and PIL libraries) with a binary executable available on Mac OS 10.5 Intel. (Windows and Linux versions expected soon.) FaceL has been fun to work on and I thought that the NumPy community might like to see their software in action. FaceL source code, videos, and executable can be found at: http://pyvision.sourceforge.net/facel -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon May 4 19:21:29 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 5 May 2009 01:21:29 +0200 Subject: [Numpy-discussion] SVD failure In-Reply-To: <200905041519.52056.python@beyondcode.org> References: <200905041519.52056.python@beyondcode.org> Message-ID: <9457e7c80905041621n64404f0apeb47260adbd8dc21@mail.gmail.com> Hi Philipp 2009/5/5 Philipp K. Janert : > If I see this correctly, my SciPy version > is 0.6.0; running on 64bit Suse 11. 
SciPy 0.6 is quite old, and it is likely that the problem was fixed in the mean time. On SciPy 0.7 I see: In [31]: u,s,v = linalg.svd(m) In [32]: u Out[32]: array([[-0.70710678, 0. , -0.70710678, 0. ], [-0.70710678, 0. , 0.70710678, 0. ], [ 0. , -0.70710678, 0. , -0.70710678], [ 0. , -0.70710678, 0. , 0.70710678]]) In [33]: s Out[33]: array([ 2., 2., 0., 0.]) In [34]: v Out[34]: array([[-0.70710678, -0.70710678, -0. , -0. ], [-0. , -0. , -0.70710678, -0.70710678], [-0.70710678, 0.70710678, 0. , 0. ], [ 0. , 0. , -0.70710678, 0.70710678]]) Regards St?fan From stefan at sun.ac.za Mon May 4 19:32:28 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 5 May 2009 01:32:28 +0200 Subject: [Numpy-discussion] FaceL - Facile Face Labeling In-Reply-To: References: Message-ID: <9457e7c80905041632x6cd497e8t22d14d1f7b489d9@mail.gmail.com> 2009/5/5 David Bolme : > I have been working on a?open source face recognition demo tool called > FaceL?for the past few months. ? FaceL is a simple and fun face processing > and labeling tool that labels faces in a live video from an iSight camera or > webcam. That's really neat -- I've always wanted to rig our office with biometric access control, and now you've provided a key ingredient! Thanks for sharing, David. Cheers St?fan From python at beyondcode.org Mon May 4 19:39:40 2009 From: python at beyondcode.org (Philipp K. Janert) Date: Mon, 4 May 2009 16:39:40 -0700 Subject: [Numpy-discussion] SVD failure In-Reply-To: <9457e7c80905041621n64404f0apeb47260adbd8dc21@mail.gmail.com> References: <200905041519.52056.python@beyondcode.org> <9457e7c80905041621n64404f0apeb47260adbd8dc21@mail.gmail.com> Message-ID: <200905041639.41083.python@beyondcode.org> Thanks for the quick reply. I'll try upgrading. Best, Ph. On Monday 04 May 2009 04:21:29 pm St?fan van der Walt wrote: > Hi Philipp > > 2009/5/5 Philipp K. Janert : > > If I see this correctly, my SciPy version > > is 0.6.0; running on 64bit Suse 11. > > SciPy 0.6 is quite old, and it is likely that the problem was fixed in > the mean time. > > On SciPy 0.7 I see: > > In [31]: u,s,v = linalg.svd(m) > > In [32]: u > Out[32]: > array([[-0.70710678, 0. , -0.70710678, 0. ], > [-0.70710678, 0. , 0.70710678, 0. ], > [ 0. , -0.70710678, 0. , -0.70710678], > [ 0. , -0.70710678, 0. , 0.70710678]]) > > In [33]: s > Out[33]: array([ 2., 2., 0., 0.]) > > In [34]: v > Out[34]: > array([[-0.70710678, -0.70710678, -0. , -0. ], > [-0. , -0. , -0.70710678, -0.70710678], > [-0.70710678, 0.70710678, 0. , 0. ], > [ 0. , 0. , -0.70710678, 0.70710678]]) > > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From matthew.brett at gmail.com Tue May 5 00:56:22 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Mon, 4 May 2009 21:56:22 -0700 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <49FD380A.9080600@ar.media.kyoto-u.ac.jp> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> <49FD380A.9080600@ar.media.kyoto-u.ac.jp> Message-ID: <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> Hi, >> I'm trying to fix a bug in the scipy matlab loading routines, and this >> requires me to somehow represent an empty structured array. >> > > Do you need the struct to be empty (size is 0) or to have no fields ? > What would you expect np.zeros((), dtype=np.dtype([])) to return, for > example ? 
Yes, I've got nothing - I have no idea what that might return. I'm afraid what I need is some way of representing the fact that I have read, from matlab, a structure with no fields (and therefore no data) that can be - say - shape (10,2) - or any other. Some time ago we thought of switching to structured arrays to represent matlab structs, but this begins to make me think again. Thanks for the replies, Matthew From sccolbert at gmail.com Tue May 5 01:38:37 2009 From: sccolbert at gmail.com (S. Chris Colbert) Date: Tue, 5 May 2009 01:38:37 -0400 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com><49FD380A.9080600@ar.media.kyoto-u.ac.jp> <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> Message-ID: you can create an array without changing the values of the allocated memory by using numpy.empty() or numpy.ndarray() this will allow you to create an array of any size without specifying the contents beforehand. I'm not sure what you mean by "empty", because any memory address will have a value, whether its assigned to or not. Chris -------------------------------------------------- From: "Matthew Brett" Sent: Tuesday, May 05, 2009 12:56 AM To: "Discussion of Numerical Python" Subject: Re: [Numpy-discussion] Structured array with no fields - possible? > Hi, > >>> I'm trying to fix a bug in the scipy matlab loading routines, and this >>> requires me to somehow represent an empty structured array. >>> >> >> Do you need the struct to be empty (size is 0) or to have no fields ? >> What would you expect np.zeros((), dtype=np.dtype([])) to return, for >> example ? > > Yes, I've got nothing - I have no idea what that might return. > > I'm afraid what I need is some way of representing the fact that I > have read, from matlab, a structure with no fields (and therefore no > data) that can be - say - shape (10,2) - or any other. > > Some time ago we thought of switching to structured arrays to > represent matlab structs, but this begins to make me think again. > > Thanks for the replies, > > Matthew > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From faltet at pytables.org Tue May 5 03:10:25 2009 From: faltet at pytables.org (Francesc Alted) Date: Tue, 5 May 2009 09:10:25 +0200 Subject: [Numpy-discussion] How to convert a list into a structured array? In-Reply-To: <41459.25081.qm@web43503.mail.sp1.yahoo.com> References: <41459.25081.qm@web43503.mail.sp1.yahoo.com> Message-ID: <200905050910.25603.faltet@pytables.org> Welcome Wei! A Monday 04 May 2009, Wei Su escrigu?: > Hi,All: > ? > My first post! I am very excited to find out structured array (record > array) in Python. Since I do data manipulation every day, this is > truly great. However, I typically download data using pyodbc, the > default output is a big list. So I am wondering how to convert that > big list into a structured array? using array() will turn it into a > text array, afaik. it is even better if anybody can show me some > tricks to download the data directly as a structured array. > Thanks a lot for the help. Please, could you provide an example of the list that you are getting from your database? With that we can probably figure out your needs much better. > BTW: I am also interested in Python's ability to handle large data. 
> Any hints or suggestion is welcome. This is also a bit generic question. What kind of data you have to deal with? What sort of operations do you want to perform over it? Do you need a lot of speed or flexibility is more important? Some example? Cheers, -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'. In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. Dykstra "On the cruelty of really teaching computer science" From cournape at gmail.com Tue May 5 05:33:09 2009 From: cournape at gmail.com (David Cournapeau) Date: Tue, 5 May 2009 18:33:09 +0900 Subject: [Numpy-discussion] [review] py3k_bootstrap branch Message-ID: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> Hi, I spent some more time on making numpy.distutils runnable under python 3. I finally made up to the point where it breaks at C code compilation, so we can start working on the hard part. The branch is there for review http://github.com/cournape/numpy/commits/py3k_bootstrap The code is quite ugly to be honest, but I have not found a better way; suggestions are welcomed. The biggest pain is by far exception catching (you can't do except IOError, e in python 3), and then print. Most other things can be handled by careful application of 2to3 with the fixers which keep python2 compatibility (print is unfortunately not one of them). There are also a few python 3.* bugs in distutils (I guess few C-based extensions made it for python 3 already). The rationale for making numpy.distutils runnable under both python2 and python3 (instead of just applying 2to3 on it): - it enables us to bootstrap our build process through the distutils 2to3 command (which is supposed to convert code to python 3 from python 2 sources on the fly). - The few informations I found on non trivial port all made sure their setup.py was python 2 and 3 compatible - which means numpy.distutils for us. - 2to3 is very slow (takes 5 minutes for me on numpy), so having to apply it every time from pristine source for python 3 support would be very painful IMHO. cheers, David From timmichelsen at gmx-topmail.de Tue May 5 05:56:55 2009 From: timmichelsen at gmx-topmail.de (Timmie) Date: Tue, 5 May 2009 09:56:55 +0000 (UTC) Subject: [Numpy-discussion] numpy docstrings Message-ID: Hello, is it possible to add sections to the allowed sections in the numpy docstring standard? What to think of a section like: Todo ---- Regards, Timmie From robert.kern at gmail.com Tue May 5 07:34:19 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 5 May 2009 07:34:19 -0400 Subject: [Numpy-discussion] numpy docstrings In-Reply-To: References: Message-ID: <3d375d730905050434l32cec624wc3fb562c3a4e1880@mail.gmail.com> On Tue, May 5, 2009 at 05:56, Timmie wrote: > Hello, > is it possible to add sections to the allowed sections in the numpy docstring > standard? > > What to think of a section like: > > Todo > ---- I prefer to keep such things in comments rather than docstrings, myself. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From david.huard at gmail.com Tue May 5 09:46:38 2009 From: david.huard at gmail.com (David Huard) Date: Tue, 5 May 2009 09:46:38 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> Message-ID: <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> On Mon, May 4, 2009 at 4:18 PM, wrote: > On Mon, May 4, 2009 at 4:00 PM, Chris Colbert wrote: > > i'll take a look at them over the next few days and see what i can hack > out. > > > > Chris > > > > On Mon, May 4, 2009 at 3:18 PM, David Huard > wrote: > >> > >> > >> On Mon, May 4, 2009 at 7:00 AM, wrote: > >>> > >>> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert > >>> wrote: > >>> > this actually sort of worked. Thanks for putting me on the right > track. > >>> > > >>> > Here is what I ended up with. > >>> > > >>> > this is what I ended up with: > >>> > > >>> > def hist3d(imgarray): > >>> > histarray = N.zeros((16, 16, 16)) > >>> > temp = imgarray.copy() > >>> > bins = N.arange(0, 257, 16) > >>> > histarray = N.histogramdd((temp[:,:,0].ravel(), > >>> > temp[:,:,1].ravel(), > >>> > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] > >>> > return histarray > >>> > > >>> > this creates a 3d histogram of rgb image values in the range 0,255 > >>> > using 16 > >>> > bins per component color. > >>> > > >>> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a > for > >>> > loop. > >>> > > >>> > not quite framerate, but good enough for prototyping. > >>> > > >>> > >>> I don't think your copy to temp is necessary, and use reshape(-1,3) as > >>> in the example of Stefan, which will avoid copying the array 3 times. > >>> > >>> If you need to gain some more speed, then rewriting histogramdd and > >>> removing some of the unnecessary checks and calculations looks > >>> possible. > >> > >> Indeed, the strategy used in the histogram function is faster than the > one > >> used in the histogramdd case, so porting one to the other should speed > >> things up. > >> > >> David > > is searchsorted faster than digitize and bincount ? > That depends on the number of bins and whether or not the bin width is uniform. A 1D benchmark I did a while ago showed that if the bin width is uniform, then the best strategy is to create a counter initialized to 0, loop through the data, compute i = (x-bin0) /binwidth and increment counter i by 1 (or by the weight of the data). If the bins are non uniform, then for nbin > 30 you'd better use searchsort, and digitize otherwise. For those interested in speeding up histogram code, I recommend reading a thread started by Cameron Walsh on the 12/12/06 named "Histograms of extremely large data sets" Code and benchmarks were posted. Chris, if your bins all have the same width, then you can certainly write an histogramdd routine that is way faster by using the indexing trick instead of digitize or searchsort. Cheers, David > > Using the idea of histogramdd, I get a bit below a tenth of a second, > my best for this problem is below. 
> I was trying for a while what the fastest way is to convert a two > dimensional array into a one dimensional index for bincount. I found > that using the return index of unique1d is very slow compared to > numeric index calculation. > > Josef > > example timed for: > nobs = 307200 > nbins = 16 > factors = np.random.randint(256,size=(nobs,3)).copy() > factors2 = factors.reshape(-1,480,3).copy() > > def hist3(factorsin, nbins): > if factorsin.ndim != 2: > factors = factorsin.reshape(-1,factorsin.shape[-1]) > else: > factors = factorsin > N, D = factors.shape > darr = np.empty(factors.T.shape, dtype=int) > nele = np.max(factors)+1 > bins = np.arange(0, nele, nele/nbins) > bins[-1] += 1 > for i in range(D): > darr[i] = np.digitize(factors[:,i],bins) - 1 > > #add weighted rows > darrind = darr[D-1] > for i in range(D-1): > darrind += darr[i]*nbins**(D-i-1) > return np.bincount(darrind) # return flat not reshaped > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From malkarouri at yahoo.co.uk Tue May 5 10:36:22 2009 From: malkarouri at yahoo.co.uk (Muhammad Alkarouri) Date: Tue, 5 May 2009 14:36:22 +0000 (GMT) Subject: [Numpy-discussion] linalg.svd not working? Message-ID: <164180.3450.qm@web24203.mail.ird.yahoo.com> Hi everyone, I have installed numpy 1.3.0 on Python 2.5.1 in an x86_64 machine, and it hangs when I do a numpy.test(verbose=10) on test_pinv (test_defmatrix.TestProperties) ... which I believe hangs on a call to numpy.linalg.svd. Can you please help me with this problem? The installation and configuration is probably a bit non-standard, so I am including the output of python -m numpy.distutils.system_info below. In particular, all the installation was done using CC='gcc -m32' to enforce 32 bit executables, as Python is a 32 bit executable here. 
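Picking up the equal-width "indexing trick" described in the histogram thread above (bin index = value // bin_width, then a single bincount), a rough sketch for the RGB case; it assumes integer channel values in [0, vmax), the names are illustrative, and nothing here has been benchmarked:

import numpy as np

def hist3d_uniform(rgb, nbins=16, vmax=256):
    # With equal-width bins the bin index is plain integer division, so
    # digitize/searchsorted are not needed at all.
    rgb = rgb.reshape(-1, 3)
    width = vmax // nbins
    idx = (rgb // width).astype(np.int64)   # per-channel bin index, 0..nbins-1
    # Collapse the three indices into one flat index for bincount.
    flat = (idx[:, 0] * nbins + idx[:, 1]) * nbins + idx[:, 2]
    counts = np.zeros(nbins ** 3, dtype=int)
    b = np.bincount(flat)
    counts[:b.size] = b                     # pad any missing trailing bins
    return counts.reshape(nbins, nbins, nbins)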
Regards, Muhammad Alkarouri lapack_info: libraries lapack not found in /GWD/appbase/common/lib libraries lapack not found in /usr/local/lib FOUND: libraries = ['lapack'] library_dirs = ['/usr/lib'] language = f77 lapack_opt_info: lapack_mkl_info: mkl_info: libraries mkl,vml,guide not found in /GWD/appbase/common/lib libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide not found in /usr/lib NOT AVAILABLE NOT AVAILABLE atlas_threads_info: Setting PTATLAS=ATLAS libraries lapack_atlas not found in /users/d88/ma856388/lib __main__.atlas_threads_info Setting PTATLAS=ATLAS Setting PTATLAS=ATLAS FOUND: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = f77 customize GnuFCompiler Found executable /usr/bin/g77 gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -m32 -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -m32 _configtest.o -m32 -fPIC -L/users/d88/ma856388/lib -llapack -lptf77blas -lptcblas -latlas -o _configtest ATLAS version 3.8.3 built by ma856388 on Fri May 1 14:56:25 BST 2009: UNAME : Linux stvwolx028 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux INSTFLG : -1 0 -a 1 ARCHDEFS : -DATL_OS_Linux -DATL_ARCH_P4E -DATL_CPUMHZ=3200 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 F2CDEFS : -DAdd__ -DF77_INTEGER=int -DStringSunStyle CACHEEDGE: 8388608 F77 : g77, version GNU Fortran (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) F77FLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 -fPIC -m32 SMC : gcc, version gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) SMCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 SKC : gcc, version gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) SKCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 success! removing: _configtest.c _configtest.o _configtest FOUND: libraries = ['lapack', 'ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = f77 define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] wx_info: Could not locate executable wx-config File not found: None. Cannot determine wx info. 
NOT AVAILABLE lapack_atlas_info: libraries lapack_atlas,f77blas,cblas,atlas not found in /users/d88/ma856388/lib libraries lapack_atlas not found in /users/d88/ma856388/lib libraries lapack_atlas,f77blas,cblas,atlas not found in /GWD/appbase/common/lib libraries lapack_atlas not found in /GWD/appbase/common/lib libraries lapack_atlas,f77blas,cblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries lapack_atlas,f77blas,cblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries lapack_atlas,f77blas,cblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib __main__.lapack_atlas_info NOT AVAILABLE umfpack_info: libraries umfpack not found in /GWD/appbase/common/lib libraries umfpack not found in /usr/local/lib libraries umfpack not found in /usr/lib NOT AVAILABLE _pkg_config_info: Found executable /usr/bin/pkg-config NOT AVAILABLE lapack_atlas_threads_info: Setting PTATLAS=ATLAS libraries lapack_atlas,ptf77blas,ptcblas,atlas not found in /users/d88/ma856388/lib libraries lapack_atlas not found in /users/d88/ma856388/lib libraries lapack_atlas,ptf77blas,ptcblas,atlas not found in /GWD/appbase/common/lib libraries lapack_atlas not found in /GWD/appbase/common/lib libraries lapack_atlas,ptf77blas,ptcblas,atlas not found in /usr/local/lib libraries lapack_atlas not found in /usr/local/lib libraries lapack_atlas,ptf77blas,ptcblas,atlas not found in /usr/lib/sse2 libraries lapack_atlas not found in /usr/lib/sse2 libraries lapack_atlas,ptf77blas,ptcblas,atlas not found in /usr/lib libraries lapack_atlas not found in /usr/lib __main__.lapack_atlas_threads_info NOT AVAILABLE x11_info: FOUND: libraries = ['X11'] library_dirs = ['/usr/X11R6/lib'] blas_info: libraries blas not found in /GWD/appbase/common/lib libraries blas not found in /usr/local/lib FOUND: libraries = ['blas'] library_dirs = ['/usr/lib'] language = f77 fftw_info: libraries fftw3 not found in /GWD/appbase/common/lib libraries fftw3 not found in /usr/local/lib libraries fftw3 not found in /usr/lib fftw3 not found libraries rfftw,fftw not found in /GWD/appbase/common/lib libraries rfftw,fftw not found in /usr/local/lib libraries rfftw,fftw not found in /usr/lib fftw2 not found NOT AVAILABLE f2py_info: FOUND: sources = ['/users/d88/ma856388/lib/python/numpy/f2py/src/fortranobject.c'] include_dirs = ['/users/d88/ma856388/lib/python/numpy/f2py/src'] gdk_pixbuf_xlib_2_info: FOUND: libraries = ['gdk_pixbuf_xlib-2.0', 'gdk_pixbuf-2.0', 'm', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GDK_PIXBUF_XLIB_2_INFO', '"\\"2.4.13\\""'), ('GDK_PIXBUF_XLIB_VERSION_2_4_13', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] dfftw_threads_info: libraries drfftw_threads,dfftw_threads not found in /GWD/appbase/common/lib libraries drfftw_threads,dfftw_threads not found in /usr/local/lib libraries drfftw_threads,dfftw_threads not found in /usr/lib dfftw threads not found NOT AVAILABLE atlas_blas_info: FOUND: libraries = ['f77blas', 'cblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = c fftw3_info: libraries fftw3 not found in /GWD/appbase/common/lib libraries fftw3 not found in /usr/local/lib libraries fftw3 not found in /usr/lib fftw3 not found NOT AVAILABLE blas_opt_info: blas_mkl_info: libraries mkl,vml,guide not found in /GWD/appbase/common/lib libraries mkl,vml,guide not found in /usr/local/lib libraries mkl,vml,guide 
not found in /usr/lib NOT AVAILABLE atlas_blas_threads_info: Setting PTATLAS=ATLAS Setting PTATLAS=ATLAS Setting PTATLAS=ATLAS FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = c customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config compiling '_configtest.c': /* This file is generated from numpy/distutils/system_info.py */ void ATL_buildinfo(void); int main(void) { ATL_buildinfo(); return 0; } C compiler: gcc -m32 -fno-strict-aliasing -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC compile options: '-c' gcc: _configtest.c gcc -m32 _configtest.o -m32 -fPIC -L/users/d88/ma856388/lib -lptf77blas -lptcblas -latlas -o _configtest ATLAS version 3.8.3 built by ma856388 on Fri May 1 14:56:25 BST 2009: UNAME : Linux stvwolx028 2.6.9-22.ELsmp #1 SMP Mon Sep 19 18:00:54 EDT 2005 x86_64 x86_64 x86_64 GNU/Linux INSTFLG : -1 0 -a 1 ARCHDEFS : -DATL_OS_Linux -DATL_ARCH_P4E -DATL_CPUMHZ=3200 -DATL_SSE3 -DATL_SSE2 -DATL_SSE1 -DATL_GAS_x8632 F2CDEFS : -DAdd__ -DF77_INTEGER=int -DStringSunStyle CACHEEDGE: 8388608 F77 : g77, version GNU Fortran (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) F77FLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 -fPIC -m32 SMC : gcc, version gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) SMCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 SKC : gcc, version gcc (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2) SKCFLAGS : -fomit-frame-pointer -mfpmath=387 -O2 -falign-loops=4 -fPIC -m32 success! removing: _configtest.c _configtest.o _configtest FOUND: libraries = ['ptf77blas', 'ptcblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = c define_macros = [('ATLAS_INFO', '"\\"3.8.3\\""')] sfftw_info: libraries srfftw,sfftw not found in /GWD/appbase/common/lib libraries srfftw,sfftw not found in /usr/local/lib libraries srfftw,sfftw not found in /usr/lib sfftw not found NOT AVAILABLE xft_info: FOUND: libraries = ['Xft', 'X11', 'freetype', 'Xrender', 'fontconfig'] library_dirs = ['/usr/X11R6/lib64'] define_macros = [('XFT_INFO', '"\\"2.1.2.2\\""'), ('XFT_VERSION_2_1_2_2', None)] include_dirs = ['/usr/X11R6/include', '/usr/include/freetype2', '/usr/include/freetype2/config'] fft_opt_info: fftw2_info: libraries rfftw,fftw not found in /GWD/appbase/common/lib libraries rfftw,fftw not found in /usr/local/lib libraries rfftw,fftw not found in /usr/lib fftw2 not found NOT AVAILABLE dfftw_info: libraries drfftw,dfftw not found in /GWD/appbase/common/lib libraries drfftw,dfftw not found in /usr/local/lib libraries drfftw,dfftw not found in /usr/lib dfftw not found NOT AVAILABLE djbfft_info: NOT AVAILABLE NOT AVAILABLE gdk_x11_2_info: FOUND: libraries = ['gdk-x11-2.0', 'gdk_pixbuf-2.0', 'm', 'pangoxft-1.0', 'pangox-1.0', 'pango-1.0', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GDK_X11_2_INFO', '"\\"2.4.13\\""'), ('GDK_X11_VERSION_2_4_13', None), ('XTHREADS', None), ('_REENTRANT', None), ('XUSE_MTSAFE_API', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/lib64/gtk-2.0/include', '/usr/X11R6/include', '/usr/include/pango-1.0', '/usr/include/freetype2', '/usr/include/freetype2/config', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] agg2_info: NOT AVAILABLE numarray_info: NOT AVAILABLE blas_src_info: NOT AVAILABLE fftw_threads_info: libraries 
rfftw_threads,fftw_threads not found in /GWD/appbase/common/lib libraries rfftw_threads,fftw_threads not found in /usr/local/lib libraries rfftw_threads,fftw_threads not found in /usr/lib fftw threads not found NOT AVAILABLE _numpy_info: FOUND: define_macros = [('NUMERIC_VERSION', '"\\"23.8\\""'), ('NUMERIC', None)] gdk_info: FOUND: libraries = ['gdk', 'Xi', 'Xext', 'X11', 'm', 'glib'] library_dirs = ['/usr/X11R6/lib64'] define_macros = [('GDK_INFO', '"\\"1.2.10\\""'), ('GDK_VERSION_1_2_10', None)] include_dirs = ['/usr/include/gtk-1.2', '/usr/X11R6/include', '/usr/include/glib-1.2', '/usr/lib64/glib/include'] gtkp_x11_2_info: FOUND: libraries = ['gtk-x11-2.0', 'gdk-x11-2.0', 'atk-1.0', 'gdk_pixbuf-2.0', 'm', 'pangoxft-1.0', 'pangox-1.0', 'pango-1.0', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GTKP_X11_2_INFO', '"\\"2.4.13\\""'), ('GTK_X11_VERSION_2_4_13', None), ('XTHREADS', None), ('_REENTRANT', None), ('XUSE_MTSAFE_API', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/lib64/gtk-2.0/include', '/usr/X11R6/include', '/usr/include/atk-1.0', '/usr/include/pango-1.0', '/usr/include/freetype2', '/usr/include/freetype2/config', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] sfftw_threads_info: libraries srfftw_threads,sfftw_threads not found in /GWD/appbase/common/lib libraries srfftw_threads,sfftw_threads not found in /usr/local/lib libraries srfftw_threads,sfftw_threads not found in /usr/lib sfftw threads not found NOT AVAILABLE boost_python_info: NOT AVAILABLE freetype2_info: FOUND: libraries = ['freetype', 'z'] define_macros = [('FREETYPE2_INFO', '"\\"9.7.3\\""'), ('FREETYPE2_VERSION_9_7_3', None)] include_dirs = ['/usr/include/freetype2'] gdk_2_info: FOUND: libraries = ['gdk-x11-2.0', 'gdk_pixbuf-2.0', 'm', 'pangoxft-1.0', 'pangox-1.0', 'pango-1.0', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GDK_2_INFO', '"\\"2.4.13\\""'), ('GDK_VERSION_2_4_13', None), ('XTHREADS', None), ('_REENTRANT', None), ('XUSE_MTSAFE_API', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/lib64/gtk-2.0/include', '/usr/X11R6/include', '/usr/include/pango-1.0', '/usr/include/freetype2', '/usr/include/freetype2/config', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] lapack_src_info: NOT AVAILABLE gtkp_2_info: FOUND: libraries = ['gtk-x11-2.0', 'gdk-x11-2.0', 'atk-1.0', 'gdk_pixbuf-2.0', 'm', 'pangoxft-1.0', 'pangox-1.0', 'pango-1.0', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GTKP_2_INFO', '"\\"2.4.13\\""'), ('GTK_VERSION_2_4_13', None), ('XTHREADS', None), ('_REENTRANT', None), ('XUSE_MTSAFE_API', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/lib64/gtk-2.0/include', '/usr/X11R6/include', '/usr/include/atk-1.0', '/usr/include/pango-1.0', '/usr/include/freetype2', '/usr/include/freetype2/config', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] gdk_pixbuf_2_info: FOUND: libraries = ['gdk_pixbuf-2.0', 'm', 'gobject-2.0', 'gmodule-2.0', 'dl', 'glib-2.0'] extra_link_args = ['-Wl,--export-dynamic'] define_macros = [('GDK_PIXBUF_2_INFO', '"\\"2.4.13\\""'), ('GDK_PIXBUF_VERSION_2_4_13', None)] include_dirs = ['/usr/include/gtk-2.0', '/usr/include/glib-2.0', '/usr/lib64/glib-2.0/include'] amd_info: libraries amd not found in /GWD/appbase/common/lib libraries amd not found in /usr/local/lib libraries amd not found in /usr/lib NOT AVAILABLE atlas_info: libraries lapack_atlas not found in 
/users/d88/ma856388/lib __main__.atlas_info FOUND: libraries = ['lapack', 'f77blas', 'cblas', 'atlas'] library_dirs = ['/users/d88/ma856388/lib'] language = f77 Numeric_info: FOUND: define_macros = [('NUMERIC_VERSION', '"\\"23.8\\""'), ('NUMERIC', None)] numerix_info: numpy_info: FOUND: define_macros = [('NUMPY_VERSION', '"\\"1.3.0\\""'), ('NUMPY', None)] FOUND: define_macros = [('NUMPY_VERSION', '"\\"1.3.0\\""'), ('NUMPY', None)] From Chris.Barker at noaa.gov Tue May 5 11:20:59 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 May 2009 08:20:59 -0700 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> <49FD380A.9080600@ar.media.kyoto-u.ac.jp> <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> Message-ID: <4A00595B.3040109@noaa.gov> Matthew Brett wrote: > I'm afraid what I need is some way of representing the fact that I > have read, from matlab, a structure with no fields (and therefore no > data) that can be - say - shape (10,2) - or any other. how about: >>> a = np.empty(size, dtype=np.object) >>> >>> a array([[None, None, None, None], [None, None, None, None], [None, None, None, None]], dtype=object) I also thinking of putting an empty as the items, but I couldn't figure out how to do that: >>> a[:] = () Traceback (most recent call last): File "", line 1, in ValueError: shape mismatch: objects cannot be broadcast to a single shape >>> a[0] = () Traceback (most recent call last): File "", line 1, in ValueError: shape mismatch: objects cannot be broadcast to a single shape Some folks think the way to spell a struct in python is a clas with only attributes, so: >>> class empty: ... def __repr__(self): ... return "empty class" ... >>> a[:] = empty() >>> a array([[empty class, empty class, empty class, empty class], [empty class, empty class, empty class, empty class], [empty class, empty class, empty class, empty class]], dtype=object) or you may be able to some trick with strides that would give you zero-size elements, though I suppose you'd need at least one byte allocated for the data pointer. Can you have an empty struct in C? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Tue May 5 11:24:53 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 May 2009 09:24:53 -0600 Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: <164180.3450.qm@web24203.mail.ird.yahoo.com> References: <164180.3450.qm@web24203.mail.ird.yahoo.com> Message-ID: On Tue, May 5, 2009 at 8:36 AM, Muhammad Alkarouri wrote: > > Hi everyone, > > I have installed numpy 1.3.0 on Python 2.5.1 in an x86_64 machine, and it > hangs when I do a numpy.test(verbose=10) on > test_pinv (test_defmatrix.TestProperties) ... > which I believe hangs on a call to numpy.linalg.svd. Can you please help me > with this problem? > > The installation and configuration is probably a bit non-standard, so I am > including the output of python -m numpy.distutils.system_info below. In > particular, all the installation was done using CC='gcc -m32' to enforce 32 > bit executables, as Python is a 32 bit executable here. > This is almost always an ATLAS problem. 
Where did your ATLAS come from and what distro are you running? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Tue May 5 11:26:04 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Tue, 05 May 2009 10:26:04 -0500 Subject: [Numpy-discussion] [review] py3k_bootstrap branch In-Reply-To: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> References: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> Message-ID: <4A005A8C.4060709@gmail.com> David Cournapeau wrote: > Hi, > > I spent some more time on making numpy.distutils runnable under python > 3. I finally made up to the point where it breaks at C code > compilation, so we can start working on the hard part. The branch is > there for review > > http://github.com/cournape/numpy/commits/py3k_bootstrap > > The code is quite ugly to be honest, but I have not found a better > way; suggestions are welcomed. The biggest pain is by far exception > catching (you can't do except IOError, e in python 3), and then print. > Most other things can be handled by careful application of 2to3 with > the fixers which keep python2 compatibility (print is unfortunately > not one of them). There are also a few python 3.* bugs in distutils (I > guess few C-based extensions made it for python 3 already). > > The rationale for making numpy.distutils runnable under both python2 > and python3 (instead of just applying 2to3 on it): > - it enables us to bootstrap our build process through the distutils > 2to3 command (which is supposed to convert code to python 3 from > python 2 sources on the fly). > - The few informations I found on non trivial port all made sure > their setup.py was python 2 and 3 compatible - which means > numpy.distutils for us. > - 2to3 is very slow (takes 5 minutes for me on numpy), so having to > apply it every time from pristine source for python 3 support would be > very painful IMHO. > > cheers, > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > Hi, This is really impressive! I agree that there should only be one source for Python 2 and Python 3. Although it does mean that any new code must be compatible with both Python 2.4+ and Python 3.+. I have only been browsing some of the code and was wondering about the usage of print. In many cases it seems that the print statements are perhaps warnings. If so, should the print statements be changed to warnings? For example, I think, in setup.py 663d9e7, this clearly should be a warning. http://github.com/cournape/numpy/commit/663d9e7e29bfea0f7adc8de5ff0e9d83264c3962 print(" --- Could not run svn info --- ") Bruce From nwagner at iam.uni-stuttgart.de Tue May 5 11:50:02 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 05 May 2009 17:50:02 +0200 Subject: [Numpy-discussion] cannot build numpy from trunk Message-ID: ... In file included from numpy/core/src/multiarray/ctors.c:16, from numpy/core/src/multiarray/multiarraymodule_onefile.c:13: numpy/core/src/multiarray/ctors.h: At top level: numpy/core/src/multiarray/ctors.h:68: warning: conflicting types for ?byte_swap_vector? numpy/core/src/multiarray/ctors.h:68: error: static declaration of ?byte_swap_vector? follows non-static declaration numpy/core/src/multiarray/scalarapi.c:640: error: previous implicit declaration of ?byte_swap_vector? 
was here error: Command "/usr/bin/gcc -fno-strict-aliasing -DNDEBUG -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 -fstack-protector -funwind-tables -fasynchronous-unwind-tables -g -fwrapv -fPIC -Inumpy/core/include -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy -Inumpy/core/src/multiarray -Inumpy/core/src/umath -Inumpy/core/include -I/usr/include/python2.6 -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c numpy/core/src/multiarray/multiarraymodule_onefile.c -o build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" failed with exit status 1 Nils From cournape at gmail.com Tue May 5 11:50:43 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 00:50:43 +0900 Subject: [Numpy-discussion] [review] py3k_bootstrap branch In-Reply-To: <4A005A8C.4060709@gmail.com> References: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> <4A005A8C.4060709@gmail.com> Message-ID: <5b8d13220905050850i125ebe82x2ec026af4035b11c@mail.gmail.com> On Wed, May 6, 2009 at 12:26 AM, Bruce Southey wrote: > David Cournapeau wrote: >> Hi, >> >> I spent some more time on making numpy.distutils runnable under python >> 3. I finally made up to the point where it breaks at C code >> compilation, so we can start working on the hard part. The branch is >> there for review >> >> http://github.com/cournape/numpy/commits/py3k_bootstrap >> >> The code is quite ugly to be honest, but I have not found a better >> way; suggestions are welcomed. The biggest pain is by far exception >> catching (you can't do except IOError, e in python 3), and then print. >> Most other things can be handled by careful application of 2to3 with >> the fixers which keep python2 compatibility (print is unfortunately >> not one of them). There are also a few python 3.* bugs in distutils (I >> guess few C-based extensions made it for python 3 already). >> >> The rationale for making numpy.distutils runnable under both python2 >> and python3 (instead of just applying 2to3 on it): >> ?- it enables us to bootstrap our build process through the distutils >> 2to3 command (which is supposed to convert code to python 3 from >> python 2 sources on the fly). >> ?- The few informations I found on non trivial port all made sure >> their setup.py was python 2 and 3 compatible - which means >> numpy.distutils for us. >> ?- 2to3 is very slow (takes 5 minutes for me on numpy), so having to >> apply it every time from pristine source for python 3 support would be >> very painful IMHO. >> >> cheers, >> >> David >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > Hi, > This is really impressive! > > I agree that there should only be one source for Python 2 and Python 3. > Although it does mean that any new code must be compatible with both > Python 2.4+ and Python 3.+. That's almost impossible. It would be extremely painful to be source compatible. But we should aim at being able to produce most python 3 code from 2to3. > > I have only been browsing some of the code and was wondering about the > usage of print. In many cases it seems that the print statements are > perhaps warnings. If so, should the print statements be changed to warnings? yes, there are many things which could be done better. 
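As a small aside, one way the print-versus-warnings suggestion above could look in practice (the message text is taken from the quoted example; the choice of warning category is an assumption, not something settled in this thread):

import warnings

# A warning instead of a bare print lets callers filter, log, or escalate
# the message with the standard warnings machinery.
warnings.warn(" --- Could not run svn info --- ", RuntimeWarning)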
Ideally, we should first clean up numpy.distutils code, but that's not a very exciting task :) The goal is more reach something which works as quickly as possible, so that we can focus on the real issues (C code and design decision for strings vs bytes, etc...). David From charlesr.harris at gmail.com Tue May 5 12:04:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 May 2009 10:04:11 -0600 Subject: [Numpy-discussion] cannot build numpy from trunk In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 9:50 AM, Nils Wagner wrote: > ... > In file included from > numpy/core/src/multiarray/ctors.c:16, > from > numpy/core/src/multiarray/multiarraymodule_onefile.c:13: > numpy/core/src/multiarray/ctors.h: At top level: > numpy/core/src/multiarray/ctors.h:68: warning: conflicting > types for ?byte_swap_vector? > numpy/core/src/multiarray/ctors.h:68: error: static > declaration of ?byte_swap_vector? follows non-static > declaration > numpy/core/src/multiarray/scalarapi.c:640: error: previous > implicit declaration of ?byte_swap_vector? was here > error: Command "/usr/bin/gcc -fno-strict-aliasing -DNDEBUG > -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 > -fstack-protector -funwind-tables > -fasynchronous-unwind-tables -g -fwrapv -fPIC > -Inumpy/core/include > -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/include -I/usr/include/python2.6 > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > > build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 > What happens if you delete the build directory first? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Tue May 5 12:12:36 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 05 May 2009 18:12:36 +0200 Subject: [Numpy-discussion] cannot build numpy from trunk In-Reply-To: References: Message-ID: On Tue, 5 May 2009 10:04:11 -0600 Charles R Harris wrote: > On Tue, May 5, 2009 at 9:50 AM, Nils Wagner >wrote: > >> ... >> In file included from >> numpy/core/src/multiarray/ctors.c:16, >> from >> numpy/core/src/multiarray/multiarraymodule_onefile.c:13: >> numpy/core/src/multiarray/ctors.h: At top level: >> numpy/core/src/multiarray/ctors.h:68: warning: >>conflicting >> types for ?byte_swap_vector? >> numpy/core/src/multiarray/ctors.h:68: error: static >> declaration of ?byte_swap_vector? follows non-static >> declaration >> numpy/core/src/multiarray/scalarapi.c:640: error: >>previous >> implicit declaration of ?byte_swap_vector? was here >> error: Command "/usr/bin/gcc -fno-strict-aliasing >>-DNDEBUG >> -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 >> -fstack-protector -funwind-tables >> -fasynchronous-unwind-tables -g -fwrapv -fPIC >> -Inumpy/core/include >> -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy >> -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> -Inumpy/core/include -I/usr/include/python2.6 >> -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray >> -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c >> numpy/core/src/multiarray/multiarraymodule_onefile.c -o >> >> build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" >> failed with exit status 1 >> > > What happens if you delete the build directory first? 
> > Chuck I have done that before ;-) Nils From matthew.brett at gmail.com Tue May 5 12:33:53 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Tue, 5 May 2009 09:33:53 -0700 Subject: [Numpy-discussion] Structured array with no fields - possible? In-Reply-To: <4A00595B.3040109@noaa.gov> References: <1e2af89e0905022154o375b48b7u1f27da260f7286eb@mail.gmail.com> <49FD380A.9080600@ar.media.kyoto-u.ac.jp> <1e2af89e0905042156x6aad087bj19109de1916b299e@mail.gmail.com> <4A00595B.3040109@noaa.gov> Message-ID: <1e2af89e0905050933p40f8ea0fi2726379b28e31ff5@mail.gmail.com> Hi, >> I'm afraid what I need is some way of representing the fact that I >> have read, from matlab, a structure with no fields (and therefore no >> data) that can be - say - shape (10,2) - or any other. > > how about: > ?>>> a = np.empty(size, dtype=np.object) > ?>>> > ?>>> a > array([[None, None, None, None], > ? ? ? ?[None, None, None, None], > ? ? ? ?[None, None, None, None]], dtype=object) Yes, that's the solution I came to in the end, the problem being that it is hard for the roundtrip (matlab->python->matlab) to tell that this Python thing should be converted to an empty struct; normally structs are numpy structured arrays, and I used object arrays for matlab cell arrays. Your empty class idea is good, and more obviously identifiable, at least to the code. But thanks for the thoughts, it's helpful to try and think it through, Matthew From dmitrey.kroshko at scipy.org Tue May 5 12:48:11 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Tue, 5 May 2009 09:48:11 -0700 (PDT) Subject: [Numpy-discussion] error building numpy: no file refecount.c Message-ID: Hi all, I've got the error during building numpy from latest svn snapshot - any ideas? D. ... executing numpy/core/code_generators/generate_numpy_api.py adding 'build/src.linux-x86_64-2.6/numpy/core/include/numpy/ __multiarray_api.h' to sources. numpy.core - nothing done with h_files = ['build/src.linux-x86_64-2.6/ numpy/core/include/numpy/config.h', 'build/src.linux-x86_64-2.6/numpy/ core/include/numpy/numpyconfig.h', 'build/src.linux-x86_64-2.6/numpy/ core/include/numpy/__multiarray_api.h'] building extension "numpy.core.multiarray" sources error: src/multiarray/refecount.c: No such file or directory From charlesr.harris at gmail.com Tue May 5 13:08:54 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 May 2009 11:08:54 -0600 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 10:48 AM, dmitrey wrote: > Hi all, > I've got the error during building numpy from latest svn snapshot - > any ideas? > D. > I would guess it is a consequence of David's ongoing breakup of the src files. Did you try the usual delete of the build directory? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 5 13:12:21 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 May 2009 11:12:21 -0600 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 11:08 AM, Charles R Harris wrote: > > > On Tue, May 5, 2009 at 10:48 AM, dmitrey wrote: > >> Hi all, >> I've got the error during building numpy from latest svn snapshot - >> any ideas? >> D. >> > > I would guess it is a consequence of David's ongoing breakup of the src > files. Did you try the usual delete of the build directory? 
> And David, it is probably time to slow down the grinding of src into little bits. I don't think it needs to be rushed. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nmb at wartburg.edu Tue May 5 13:32:09 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Tue, 05 May 2009 12:32:09 -0500 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: References: Message-ID: <4A007819.8050808@wartburg.edu> On 2009-05-05 12:12 , Charles R Harris wrote: > > > On Tue, May 5, 2009 at 11:08 AM, Charles R Harris > > wrote: > > > > On Tue, May 5, 2009 at 10:48 AM, dmitrey > wrote: > > Hi all, > I've got the error during building numpy from latest svn snapshot - > any ideas? > D. > > > I would guess it is a consequence of David's ongoing breakup of the > src files. Did you try the usual delete of the build directory? > > > And David, it is probably time to slow down the grinding of src into > little bits. I don't think it needs to be rushed. Some bisection shows that the problem is not present in r6944, so for now, one can "svn up -r 6944" until David gets the problem resolved. While understanding that making sure the trunk builds on many platforms is a problem, I think that numpy could do better at keeping the trunk buildable and doing disruptive things on long-lived feature branches that could then be merged. -Neil From pav at iki.fi Tue May 5 13:38:37 2009 From: pav at iki.fi (Pauli Virtanen) Date: Tue, 5 May 2009 17:38:37 +0000 (UTC) Subject: [Numpy-discussion] error building numpy: no file refecount.c References: <4A007819.8050808@wartburg.edu> Message-ID: Tue, 05 May 2009 12:32:09 -0500, Neil Martinsen-Burrell wrote: [clip] > While understanding that making sure the trunk builds on many platforms > is a problem, I think that numpy could do better at keeping the trunk > buildable and doing disruptive things on long-lived feature branches > that could then be merged. I don't think broken trunk has often been a significant problem in Numpy in the past. Anyway, feature branches are good, and we have the buildbot.scipy.org, so there's no reason not to check it after committing. -- Pauli Virtanen From taste_of_r at yahoo.com Tue May 5 14:42:04 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Tue, 5 May 2009 11:42:04 -0700 (PDT) Subject: [Numpy-discussion] How to download data directly from SQL into NumPy as a record array or structured array. Message-ID: <910467.47742.qm@web43516.mail.sp1.yahoo.com> ? Hi, Everyone: ? This is what I need to do everyday. Now I have to first save data as .csv file and the use csv2rec() to read the data as a record array. Anybody can give me some advice on how to directly get the data as record arrays? It will save me tons of time. ? Thanks in advance. ? Wei Su -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Tue May 5 14:44:31 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 5 May 2009 12:44:31 -0600 Subject: [Numpy-discussion] cannot build numpy from trunk In-Reply-To: References: Message-ID: On Tue, May 5, 2009 at 10:12 AM, Nils Wagner wrote: > On Tue, 5 May 2009 10:04:11 -0600 > Charles R Harris wrote: > > On Tue, May 5, 2009 at 9:50 AM, Nils Wagner > >wrote: > > > >> ... 
> >> In file included from > >> numpy/core/src/multiarray/ctors.c:16, > >> from > >> numpy/core/src/multiarray/multiarraymodule_onefile.c:13: > >> numpy/core/src/multiarray/ctors.h: At top level: > >> numpy/core/src/multiarray/ctors.h:68: warning: > >>conflicting > >> types for ?byte_swap_vector? > >> numpy/core/src/multiarray/ctors.h:68: error: static > >> declaration of ?byte_swap_vector? follows non-static > >> declaration > >> numpy/core/src/multiarray/scalarapi.c:640: error: > >>previous > >> implicit declaration of ?byte_swap_vector? was here > >> error: Command "/usr/bin/gcc -fno-strict-aliasing > >>-DNDEBUG > >> -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 > >> -fstack-protector -funwind-tables > >> -fasynchronous-unwind-tables -g -fwrapv -fPIC > >> -Inumpy/core/include > >> -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > >> -Inumpy/core/src/multiarray -Inumpy/core/src/umath > >> -Inumpy/core/include -I/usr/include/python2.6 > >> -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > >> -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c > >> numpy/core/src/multiarray/multiarraymodule_onefile.c -o > >> > >> > build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" > >> failed with exit status 1 > >> > > > > What happens if you delete the build directory first? > > > > Chuck > > I have done that before ;-) > Is this from the latest svn? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Tue May 5 14:46:20 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Tue, 05 May 2009 20:46:20 +0200 Subject: [Numpy-discussion] cannot build numpy from trunk In-Reply-To: References: Message-ID: On Tue, 5 May 2009 12:44:31 -0600 Charles R Harris wrote: > On Tue, May 5, 2009 at 10:12 AM, Nils Wagner > wrote: > >> On Tue, 5 May 2009 10:04:11 -0600 >> Charles R Harris wrote: >> > On Tue, May 5, 2009 at 9:50 AM, Nils Wagner >> >wrote: >> > >> >> ... >> >> In file included from >> >> numpy/core/src/multiarray/ctors.c:16, >> >> from >> >> >>numpy/core/src/multiarray/multiarraymodule_onefile.c:13: >> >> numpy/core/src/multiarray/ctors.h: At top level: >> >> numpy/core/src/multiarray/ctors.h:68: warning: >> >>conflicting >> >> types for ?byte_swap_vector? >> >> numpy/core/src/multiarray/ctors.h:68: error: static >> >> declaration of ?byte_swap_vector? follows non-static >> >> declaration >> >> numpy/core/src/multiarray/scalarapi.c:640: error: >> >>previous >> >> implicit declaration of ?byte_swap_vector? was here >> >> error: Command "/usr/bin/gcc -fno-strict-aliasing >> >>-DNDEBUG >> >> -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 >> >> -fstack-protector -funwind-tables >> >> -fasynchronous-unwind-tables -g -fwrapv -fPIC >> >> -Inumpy/core/include >> >> -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy >> >> -Inumpy/core/src/multiarray -Inumpy/core/src/umath >> >> -Inumpy/core/include -I/usr/include/python2.6 >> >> >>-Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray >> >> -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c >> >> numpy/core/src/multiarray/multiarraymodule_onefile.c >>-o >> >> >> >> >> build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" >> >> failed with exit status 1 >> >> >> > >> > What happens if you delete the build directory first? >> > >> > Chuck >> >> I have done that before ;-) >> > > Is this from the latest svn? 
> > Chuck ------------------------------------------------------------------------ r6955 | cdavid | 2009-05-05 13:10:29 +0200 (Di, 05. Mai 2009) | 1 line Put buffer protocol in separate file. Nils From dsdale24 at gmail.com Tue May 5 14:57:45 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Tue, 5 May 2009 14:57:45 -0400 Subject: [Numpy-discussion] [review] py3k_bootstrap branch In-Reply-To: <5b8d13220905050850i125ebe82x2ec026af4035b11c@mail.gmail.com> References: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> <4A005A8C.4060709@gmail.com> <5b8d13220905050850i125ebe82x2ec026af4035b11c@mail.gmail.com> Message-ID: On Tue, May 5, 2009 at 11:50 AM, David Cournapeau wrote: > On Wed, May 6, 2009 at 12:26 AM, Bruce Southey wrote: > > David Cournapeau wrote: > >> Hi, > >> > >> I spent some more time on making numpy.distutils runnable under python > >> 3. I finally made up to the point where it breaks at C code > >> compilation, so we can start working on the hard part. The branch is > >> there for review > >> > >> http://github.com/cournape/numpy/commits/py3k_bootstrap > >> > >> The code is quite ugly to be honest, but I have not found a better > >> way; suggestions are welcomed. The biggest pain is by far exception > >> catching (you can't do except IOError, e in python 3), and then print. > >> Most other things can be handled by careful application of 2to3 with > >> the fixers which keep python2 compatibility (print is unfortunately > >> not one of them). There are also a few python 3.* bugs in distutils (I > >> guess few C-based extensions made it for python 3 already). > >> > >> The rationale for making numpy.distutils runnable under both python2 > >> and python3 (instead of just applying 2to3 on it): > >> - it enables us to bootstrap our build process through the distutils > >> 2to3 command (which is supposed to convert code to python 3 from > >> python 2 sources on the fly). > >> - The few informations I found on non trivial port all made sure > >> their setup.py was python 2 and 3 compatible - which means > >> numpy.distutils for us. > >> - 2to3 is very slow (takes 5 minutes for me on numpy), so having to > >> apply it every time from pristine source for python 3 support would be > >> very painful IMHO. > >> > >> cheers, > >> > >> David > >> _______________________________________________ > >> Numpy-discussion mailing list > >> Numpy-discussion at scipy.org > >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > Hi, > > This is really impressive! > > > > I agree that there should only be one source for Python 2 and Python 3. > > Although it does mean that any new code must be compatible with both > > Python 2.4+ and Python 3.+. > > That's almost impossible. It would be extremely painful to be source > compatible. But we should aim at being able to produce most python 3 > code from 2to3. > There is a lot of interest in a 3to2 tool, and I have read speculation ( http://sayspy.blogspot.com/2009/04/pycon-2009-recap-best-pycon-ever.html) that going from 3 to 2 should be easier than the other way around. Maybe it will be worth keeping an eye on. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: From pgmdevlist at gmail.com Tue May 5 15:15:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Tue, 5 May 2009 15:15:23 -0400 Subject: [Numpy-discussion] How to download data directly from SQL into NumPy as a record array or structured array. 
In-Reply-To: <910467.47742.qm@web43516.mail.sp1.yahoo.com> References: <910467.47742.qm@web43516.mail.sp1.yahoo.com> Message-ID: <50254212-E8F8-4C79-B30F-8308687EA685@gmail.com> On May 5, 2009, at 2:42 PM, Wei Su wrote: > > Hi, Everyone: > > This is what I need to do everyday. Now I have to first save data > as .csv file and the use csv2rec() to read the data as a record > array. Anybody can give me some advice on how to directly get the > data as record arrays? It will save me tons of time. Wei, Have a look to numpi.lib.io.genfromtxt, that should give you some ideas. From stefan at sun.ac.za Tue May 5 15:35:38 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Tue, 5 May 2009 21:35:38 +0200 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: <4A007819.8050808@wartburg.edu> References: <4A007819.8050808@wartburg.edu> Message-ID: <9457e7c80905051235s1afde19ak61179ddfcf66cf84@mail.gmail.com> 2009/5/5 Neil Martinsen-Burrell : > While understanding that making sure the trunk builds on many platforms > is a problem, I think that numpy could do better at keeping the trunk > buildable and doing disruptive things on long-lived feature branches > that could then be merged. David frequently submits his branches for review (although few people take the time to comment). If he breaks the build once in a hundred commits, there really is no reason to complain. If you want to live on the bleeding edge, you must be prepared to bleed a little! Regards St?fan From taste_of_r at yahoo.com Tue May 5 19:39:13 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Tue, 5 May 2009 16:39:13 -0700 (PDT) Subject: [Numpy-discussion] How to convert a list into a structured array? Message-ID: <202381.57946.qm@web43504.mail.sp1.yahoo.com> ? Hi, Francesc: ? Thanks a lot for offering me help. My code is really simple as of now. ? ********************************************************************************** from pyodbc import * from rpy import * cnxn = connect('DRIVER={SQL Server};SERVER=srdata01\\sql2k5;DATABASE=Qai;UID=;PWD=') cursor = cnxn.cursor() cursor.execute("select IsrCode, MstrName from qai..qaiLinkBase") data = cursor.fetchall() cursor.close() *************************************************** The result, data, I got from the above code tends to be a giant list, which is very hard to handle. My goal is to to turn it into a record array so that i can access the field directly by name or by index. My data is typically numerical, character and datetime variables. no other complications. ? >From the above code, you can also see that I used R for some time. But I have to switch to something else because I sometimes cannot even download all my data via R due to its memory limit under windows. I thought NumPy might be the solution. But I am not sure. Anybody can let me know whether Python has a memory limit? or can I use virtual memory by calling some Python module? ? Thanks in advance. ? Wei? Su ? ? --- On Tue, 5/5/09, Francesc Alted wrote: From: Francesc Alted Subject: Re: [Numpy-discussion] How to convert a list into a structured array? To: "Discussion of Numerical Python" Date: Tuesday, May 5, 2009, 7:10 AM Welcome Wei! A Monday 04 May 2009, Wei Su escrigu?: > Hi,All: > ? > My first post! I am very excited to find out structured array (record > array) in Python. Since I do data manipulation every day, this is > truly great. However, I typically download data using pyodbc, the > default output is a big list. 
So I am wondering how to convert that > big list into a structured array? using array() will turn it into a > text array, afaik. it is even better if anybody can show me some > tricks to download the data directly as a structured array. > Thanks a lot for the help. Please, could you provide an example of the list that you are getting from your database?? With that we can probably figure out your needs much better. > BTW: I am also interested in Python's ability to handle large data. > Any hints or suggestion is welcome. This is also a bit generic question.? What kind of data you have to deal with?? What sort of operations do you want to perform over it?? Do you need a lot of speed or flexibility is more important?? Some example? Cheers, -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'.? In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. Dykstra ???"On the cruelty of really teaching computer science" _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue May 5 20:40:03 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 09:40:03 +0900 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: <4A007819.8050808@wartburg.edu> References: <4A007819.8050808@wartburg.edu> Message-ID: <5b8d13220905051740p202fae1bxa0c023d4f4aaa9a8@mail.gmail.com> On Wed, May 6, 2009 at 2:32 AM, Neil Martinsen-Burrell wrote: > While understanding that making sure the trunk builds on many platforms > is a problem, I think that numpy could do better at keeping the trunk > buildable and doing disruptive things on long-lived feature branches > that could then be merged. The trunk is rarely broken more than a few hours. In this case, it is just a file which was not added to the trunk, hence I did not detect the problem. Using feature branches would not have prevented the problem, cheers, David From cournape at gmail.com Tue May 5 21:01:07 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 10:01:07 +0900 Subject: [Numpy-discussion] error building numpy: no file refecount.c In-Reply-To: References: Message-ID: <5b8d13220905051801wee8fdax3cca34cb407fcd67@mail.gmail.com> On Wed, May 6, 2009 at 1:48 AM, dmitrey wrote: > Hi all, > I've got the error during building numpy from latest svn snapshot - > any ideas? The problem should be fixed now, David From myeates at jpl.nasa.gov Tue May 5 21:01:49 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Tue, 05 May 2009 18:01:49 -0700 Subject: [Numpy-discussion] difficult optimization problem Message-ID: <4A00E17D.6040203@jpl.nasa.gov> Hi I'm trying to solve an optimization problem where the search domain is limited. Suppose I want to minimize the function f(x,y) but f(x,y) is only valid over a subset (unknown without calling f) of (x,y)? I tried looking at OpenOpt but ... kind of unusable without some documentation. 
Thanks Mathew From cournape at gmail.com Tue May 5 21:02:04 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 10:02:04 +0900 Subject: [Numpy-discussion] cannot build numpy from trunk In-Reply-To: References: Message-ID: <5b8d13220905051802i40d2a8d4n8b611cbc24e0f027@mail.gmail.com> On Wed, May 6, 2009 at 12:50 AM, Nils Wagner wrote: > ... > In file included from > numpy/core/src/multiarray/ctors.c:16, > ? ? ? ? ? ? ? ? ?from > numpy/core/src/multiarray/multiarraymodule_onefile.c:13: > numpy/core/src/multiarray/ctors.h: At top level: > numpy/core/src/multiarray/ctors.h:68: warning: conflicting > types for ?byte_swap_vector? > numpy/core/src/multiarray/ctors.h:68: error: static > declaration of ?byte_swap_vector? follows non-static > declaration > numpy/core/src/multiarray/scalarapi.c:640: error: previous > implicit declaration of ?byte_swap_vector? was here > error: Command "/usr/bin/gcc -fno-strict-aliasing -DNDEBUG > -fmessage-length=0 -O2 -Wall -D_FORTIFY_SOURCE=2 > -fstack-protector -funwind-tables > -fasynchronous-unwind-tables -g -fwrapv -fPIC > -Inumpy/core/include > -Ibuild/src.linux-x86_64-2.6/numpy/core/include/numpy > -Inumpy/core/src/multiarray -Inumpy/core/src/umath > -Inumpy/core/include -I/usr/include/python2.6 > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/multiarray > -Ibuild/src.linux-x86_64-2.6/numpy/core/src/umath -c > numpy/core/src/multiarray/multiarraymodule_onefile.c -o > build/temp.linux-x86_64-2.6/numpy/core/src/multiarray/multiarraymodule_onefile.o" > failed with exit status 1 Should be fixed now, David From cournape at gmail.com Tue May 5 21:12:20 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 10:12:20 +0900 Subject: [Numpy-discussion] [review] py3k_bootstrap branch In-Reply-To: References: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> <4A005A8C.4060709@gmail.com> <5b8d13220905050850i125ebe82x2ec026af4035b11c@mail.gmail.com> Message-ID: <5b8d13220905051812o39874428k6077cd5d9f3dc225@mail.gmail.com> On Wed, May 6, 2009 at 3:57 AM, Darren Dale wrote: > > There is a lot of interest in a 3to2 tool, and I have read speculation > (http://sayspy.blogspot.com/2009/04/pycon-2009-recap-best-pycon-ever.html) > that going from 3 to 2 should be easier than the other way around. Maybe it > will be worth keeping an eye on. I can see how this could help people who have a working python 3 implementation, but in numpy's case, I am not so sure. Do you know which version of python is targeted by 3to2 ? 2.6, 2.5 or even below ? cheers, David From mark.wendell at gmail.com Tue May 5 22:37:00 2009 From: mark.wendell at gmail.com (Mark Wendell) Date: Tue, 5 May 2009 20:37:00 -0600 Subject: [Numpy-discussion] array membership test? Message-ID: Is there a numpy equivalent of python's membership test (eg, "5 in [1,3,4,5]" returns True)? I'd like a quick way to test if a given number is in an array, without stepping through the elements individually. I realize this can be tricky with floats, but if there is such a thing for ints, that would be great. thanks Mark -- -- Mark Wendell From josef.pktd at gmail.com Tue May 5 23:42:26 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 5 May 2009 23:42:26 -0400 Subject: [Numpy-discussion] array membership test? In-Reply-To: References: Message-ID: <1cd32cbb0905052042v7725cd97i279936d435aa4ae6@mail.gmail.com> On Tue, May 5, 2009 at 10:37 PM, Mark Wendell wrote: > Is there a numpy equivalent of python's membership test (eg, ?"5 in > [1,3,4,5]" returns True)? 
I'd like a quick way to test if a given > number is in an array, without stepping through the elements > individually. I realize this can be tricky with floats, but if there > is such a thing for ints, that would be great. > >>> import numpy as np >>> (5==np.array([1,3,4,5])).any() True >>> np.setmember1d([5],[1,3,4,5]) array([ True], dtype=bool) >>> np.setmember1d([3,5,6],[1,3,4,5]) array([ True, True, False], dtype=bool) >>> np.setmember1d([3.1,5.3,6.],[1.,3.1,4.,5.3]) array([ True, True, False], dtype=bool) setmember1d requires unique elements in both arrays, but there is a version for non-unique arrays in a trac ticket Josef From mail at stevesimmons.com Wed May 6 01:46:54 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Wed, 06 May 2009 07:46:54 +0200 Subject: [Numpy-discussion] How to convert a list into a structured array? In-Reply-To: <202381.57946.qm@web43504.mail.sp1.yahoo.com> References: <202381.57946.qm@web43504.mail.sp1.yahoo.com> Message-ID: <4A01244E.8030508@stevesimmons.com> Wei Su wrote: > > Hi, Francesc: > > Thanks a lot for offering me help. My code is really simple as of now. > > ********************************************************************************** > > from pyodbc import * > > from rpy import * > > cnxn = connect(/'DRIVER={SQL > Server};SERVER=srdata01\\sql2k5;DATABASE=_Qai_;UID=;PWD='/) > > cursor = cnxn.cursor() > > cursor.execute(/"select IsrCode, MstrName from _qai_..qaiLinkBase"/) > > data = cursor.fetchall() > > cursor.close() > > *************************************************** > The result, data, I got from the above code tends to be a giant list, > which is very hard to handle. My goal is to to turn it into a record > array so that i can access the field directly by name or by index. My > data is typically numerical, character and datetime variables. no > other complications. > > From the above code, you can also see that I used R for some time. But > I have to switch to something else because I sometimes cannot even > download all my data via R due to its memory limit under windows. I > thought NumPy might be the solution. But I am not sure. Anybody can > let me know whether Python has a memory limit? or can I use virtual > memory by calling some Python module? > > Thanks in advance. > > Wei Su > > > Hi Wei Su, Below is an example from the code I use to read text files into recarrays. The same approach can be used for your SQL data by redefining the inner iterator(path) function to execute your SQL query. If your data is really big, you could also use the PyTables package (written by Francesc actually) to store SQL extracts as numpy-compatible HDF tables. The HDF format can compress the data transparently, so the resulting data files are 1/10 the size of an equivalent text dump. You can then read any or all rows into memory for subsequent process using table.read[row_from, row_to], thereby avoiding running out of memory if your dataset is really big. PyTables/HDF is also really fast for reading. As an example, my three year old laptop with slow hard drive achieves up to 250,000 row per second speeds on GROUP BY-style subtotals. This uses PyTables for storing the data and numpy's bincount() function for doing the aggregation. Stephen def text_file_to_sorted_numpy_array(path, dtype, row_fn, max_rows=None, header=None, order_by=None, min_row_length=None): """ Read a database extract into a numpy recarray, which is possibly sorted then returned. path Path to the text file. dtype String giving column names and numpy data types e.g. 
'COL1,S8 COL2,i4' row_fn Optional function splitting a row into a list that is compatible with the numpy array's dtype. The function can indicate the row should be skipped by returning None. If not given, the row has leading and trailing whitespace removed and then is split on '|'. order_by Optional list of column names used to sort the array. header Optional prefix for a header line. If given, there must be a line with this prefix within the first 20 lines. Any leading whitespace is removed before checking. max_rows Optional maximum number of rows that a file will contain. min_row_length Optional length of row in text file, used to estimate upper bound on size of final array. One or both of max_rows and min_row_length must be given. """ # Create a numpy array large enough to hold the entire file in memory if min_row_length: file_size = os.stat(path).st_size num_rows_upper_bound = file_size/min_row_length else: num_rows_upper_bound = max_rows if num_rows_upper_bound is None: raise ValueError('No information given about size of the final array') if max_rows and num_rows_upper_bound>max_rows: raise ValueError("'%s' is %d bytes long, too large to fit in memory" % (os.path.basename(path), file_size)) # Define an iterator that reads the data file def iterator(path): # Read the file with file(path, 'rU') as fh: ftype, prefix = os.path.splitext(os.path.basename(path))[0].split('-', 2) pb = ProgressBar(prefix=prefix) # Read the data lines ctr = idx = 0 for s in fh: s = s.strip() if s in ('\x1A', '-', '') or s.startswith('-------'): # Empty lines after end of real data continue res = row_fn(s) if res: yield res ctr+=1 if ctr%1000==0: total_rows = float(file_size*ctr)/float(fh.tell()) pb(ctr, total=total_rows) pb(ctr, last=True) # Create an empty array to hold all data, then fill in blocks of 5000 rows # Doing this by blocks is 4x faster than adding one row at a time. dtype = list( tuple(x.split(',')) for x in dtype.split() ) arr = numpy.zeros(num_rows_upper_bound, dtype) def block_iterator(iterator, blk_size): "Group iterator into lists with blk_size elements" res = [] for i in iterator: res.append(i) if len(res)==blk_size: yield res res = [] if res: yield res # Now fill the array i = 0 try: for blk in block_iterator(iterator(path), 5000): b = len(blk) tmp = numpy.rec.fromrecords(blk, dtype=dtype, shape=b) arr[i:i+b] = tmp i+=b except KeyboardInterrupt: pass arr = arr[:i] # Remove unused rows at the end of the array # Sort array if required if order_by: print " Sorting %d-row array on %r" % (len(arr), order_by) arr.sort(order=order_by) # Return the final array return arr -------------- next part -------------- An HTML attachment was scrubbed... URL: From taste_of_r at yahoo.com Wed May 6 01:57:55 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Tue, 5 May 2009 22:57:55 -0700 (PDT) Subject: [Numpy-discussion] How to convert a list into a structured array? Message-ID: <940469.68210.qm@web43513.mail.sp1.yahoo.com> ? Hi, Stephen: ? This is fantastic. I shall read your codes carefully next week. (I am taking the rest of the week off for vacation.) Hopefully I am not?so dumb that I need to ask again. ? Regards, ? Wei Su --- On Wed, 5/6/09, Stephen Simmons wrote: From: Stephen Simmons Subject: Re: [Numpy-discussion] How to convert a list into a structured array? To: "Discussion of Numerical Python" Date: Wednesday, May 6, 2009, 5:46 AM Wei Su wrote: ? Hi, Francesc: ? Thanks a lot for offering me help. My code is really simple as of now. ? 
********************************************************************************** from pyodbc import * from rpy import * cnxn = connect('DRIVER={SQL Server};SERVER=srdata01\\sql2k5;DATABASE=Qai;UID=;PWD=') cursor = cnxn.cursor() cursor.execute("select IsrCode, MstrName from qai..qaiLinkBase") data = cursor.fetchall() cursor.close() *************************************************** The result, data, I got from the above code tends to be a giant list, which is very hard to handle. My goal is to to turn it into a record array so that i can access the field directly by name or by index. My data is typically numerical, character and datetime variables. no other complications. ? >From the above code, you can also see that I used R for some time. But I have to switch to something else because I sometimes cannot even download all my data via R due to its memory limit under windows. I thought NumPy might be the solution. But I am not sure. Anybody can let me know whether Python has a memory limit? or can I use virtual memory by calling some Python module? ? Thanks in advance. ? Wei? Su ? ? Hi Wei Su, Below is an example from the code I use to read text files into recarrays. The same approach can be used for your SQL data by redefining the inner iterator(path) function to execute your SQL query. If your data is really big, you could also use the PyTables package (written by Francesc actually) to store SQL extracts as numpy-compatible HDF tables. The HDF format can compress the data transparently, so the resulting data files are 1/10 the size of an equivalent text dump. You can then read any or all rows into memory for subsequent process using table.read[row_from, row_to], thereby avoiding running out of memory if your dataset is really big. PyTables/HDF is also really fast for reading. As an example, my three year old laptop with slow hard drive achieves up to 250,000 row per second speeds on GROUP BY-style subtotals. This uses PyTables for storing the data and numpy's bincount() function for doing the aggregation. Stephen def text_file_to_sorted_numpy_array(path, dtype, row_fn, max_rows=None, header=None, ?????????????????????????????????????????? order_by=None, min_row_length=None): ??? """ ??? Read a database extract into a numpy recarray, which is possibly sorted then returned. ??????? path??????????? Path to the text file. ??????? dtype?????????? String giving column names and numpy data types ??????????????????????? e.g. 'COL1,S8 COL2,i4' ??????? row_fn????????? Optional function splitting a row into a list that is ??????????????????????? compatible with the numpy array's dtype. The function ??????????????????????? can indicate the row should be skipped by returning ??????????????????????? None. If not given, the row has leading and trailing ??????????????????????? whitespace removed and then is split on '|'. ??????? order_by??????? Optional list of column names used to sort the array. ??????? header????????? Optional prefix for a header line. If given, there ??????????????????????? must be a line with this prefix within the first 20 lines. ??????????????????????? Any leading whitespace is removed before checking. ??????? max_rows??????? Optional maximum number of rows that a file will contain. ??????? min_row_length? Optional length of row in text file, used to estimate ??????????????????????? upper bound on size of final array. One or both of ??????????????????????? max_rows and min_row_length must be given. ??? """ ??? 
# Create a numpy array large enough to hold the entire file in memory ??? if min_row_length: ??????? file_size = os.stat(path).st_size ??????? num_rows_upper_bound = file_size/min_row_length ??? else: ??????? num_rows_upper_bound = max_rows ??? if num_rows_upper_bound is None: ??????? raise ValueError('No information given about size of the final array') ??? if max_rows and num_rows_upper_bound>max_rows: ??????? raise ValueError("'%s' is %d bytes long, too large to fit in memory" % (os.path.basename(path), file_size)) ??? # Define an iterator that reads the data file??? ??? def iterator(path): ??????? # Read the file ??????? with file(path, 'rU') as fh: ??????????? ftype, prefix = os.path.splitext(os.path.basename(path))[0].split('-', 2) ??????????? pb = ProgressBar(prefix=prefix) ??????????? # Read the data lines ??????????? ctr = idx = 0??????????? ??????????? for s in fh: ??????????????? s = s.strip() ??????????????? if s in ('\x1A', '-', '') or s.startswith('-------'): ??????????????????? # Empty lines after end of real data ??????????????????? continue ??????????????? res = row_fn(s) ??????????????? if res: ??????????????????? yield res ??????????????? ctr+=1 ??????????????? if ctr%1000==0: ??????????????????? total_rows = float(file_size*ctr)/float(fh.tell()) ??????????????????? pb(ctr, total=total_rows) ??????????? pb(ctr, last=True) ??? # Create an empty array to hold all data, then fill in blocks of 5000 rows ??? # Doing this by blocks is 4x faster than adding one row at a time.. ??? dtype = list( tuple(x.split(',')) for x in dtype.split() ) ??? arr = numpy.zeros(num_rows_upper_bound, dtype) ??? def block_iterator(iterator, blk_size): ??????? "Group iterator into lists with blk_size elements" ??????? res = [] ??????? for i in iterator: ??????????? res.append(i) ??????????? if len(res)==blk_size: ??????????????? yield res ??????????????? res = [] ??????? if res: ??????????? yield res ??? # Now fill the array??????????? ??? i = 0 ??? try: ??????? for blk in block_iterator(iterator(path), 5000): ??????????? b = len(blk) ??????????? tmp = numpy.rec.fromrecords(blk, dtype=dtype, shape=b) ??????????? arr[i:i+b] = tmp ??????????? i+=b ??? except KeyboardInterrupt: ??????? pass ??? arr = arr[:i]?????? # Remove unused rows at the end of the array ??? # Sort array if required ??? if order_by: ??????? print "? Sorting %d-row array on %r" % (len(arr), order_by) ??????? arr.sort(order=order_by) ??? # Return the final array ??? return arr ??? -----Inline Attachment Follows----- _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed May 6 02:03:58 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 05 May 2009 23:03:58 -0700 Subject: [Numpy-discussion] OS-X binary name... Message-ID: <4A01284E.6070104@noaa.gov> Hi all, The binary for OS-X on sourceforge is called: numpy-1.3.0-py2.5-macosx10.5.dmg However, as far as I can tell, it works just fine on OS-X 10.4, and maybe even 10.3.9. Perhaps a re-naming is in order? But to what? I'd say: numpy-1.3.0-py2.5-macosx10.4.dmg but would folks think that it's only for 10.4? maybe: numpy-1.3.0-py2.5-macosx-python.org.dmg to indicate that it's for the python.org build of python2.5, though I'v never seen anyone use that convention. -Chris -- Christopher Barker, Ph.D. 
Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From malkarouri at yahoo.co.uk Wed May 6 03:56:48 2009 From: malkarouri at yahoo.co.uk (Muhammad Alkarouri) Date: Wed, 6 May 2009 07:56:48 +0000 (GMT) Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: Message-ID: <798430.55699.qm@web24204.mail.ird.yahoo.com> > Date: Tue, 5 May 2009 09:24:53 -0600 > From: Charles R Harris ... > This is almost always an ATLAS problem. Where did your > ATLAS come from and > what distro are you running? You are probably right. I compiled and installed ATLAS from source. The distro is Redhat Enterprise Linux 4. I had to because the ones from the distro are compiled targetting 64-bit architecture. I largely followed the instructions at http://www.scipy.org/Installing_SciPy/Linux#head-eecf834fad12bf7a625752528547588a93f8263c . Built lapack 3.1.1, then copied the library to ATLAS and compiled per instructions. The compilers are CC='gcc -m32' version 3.4.4 (enforcing 32 bit compilation), g77 3.4.4. At various points I needed to define flags to ensure 32 bits, though not in numpy compilation. I have gfortran on the system but I didn't use it. liblapack.so and other .so files are linked to libg2c.so not libgfortran. ATLAS (3.8.3) was configured with the additional -32 flag as well. make check works nicely. What should I check in order to find the error with ATLAS configuration and/or installation? Or is there a 32 bit version binary I can download/ use (even if only for testing)? Regards, Muhammad Alkarouri From david at ar.media.kyoto-u.ac.jp Wed May 6 03:45:59 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 06 May 2009 16:45:59 +0900 Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: <798430.55699.qm@web24204.mail.ird.yahoo.com> References: <798430.55699.qm@web24204.mail.ird.yahoo.com> Message-ID: <4A014037.5080205@ar.media.kyoto-u.ac.jp> Muhammad Alkarouri wrote: >> Date: Tue, 5 May 2009 09:24:53 -0600 >> From: Charles R Harris >> > ... > >> This is almost always an ATLAS problem. Where did your >> ATLAS come from and >> what distro are you running? >> > > You are probably right. I compiled and installed ATLAS from source. The distro is Redhat Enterprise Linux 4. I had to because the ones from the distro are compiled targetting 64-bit architecture. > > I largely followed the instructions at http://www.scipy.org/Installing_SciPy/Linux#head-eecf834fad12bf7a625752528547588a93f8263c . Built lapack 3.1.1, then copied the library to ATLAS and compiled per instructions. The compilers are CC='gcc -m32' version 3.4.4 (enforcing 32 bit compilation), g77 3.4.4. At various points I needed to define flags to ensure 32 bits, though not in numpy compilation. I have gfortran on the system but I didn't use it. > What does ldd lapack_lite.so returns (lapack_lite.so is in numpy/linalg, in your installed directory) ? It may be that numpy uses gfortran, whereas ATLAS is built with g77. gfortran and g77 should not be mixed, cheers, David From sebastian.walter at gmail.com Wed May 6 04:07:12 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Wed, 6 May 2009 10:07:12 +0200 Subject: [Numpy-discussion] difficult optimization problem In-Reply-To: <4A00E17D.6040203@jpl.nasa.gov> References: <4A00E17D.6040203@jpl.nasa.gov> Message-ID: I tried looking at your question but ... kind of unusable without some documentation. 
You need to give at least the following information: what kind of optimization problem? LP,NLP, Mixed Integer LP, Stochastic, semiinfinite, semidefinite? Most solvers require the problem in the following form min_x f(x) subject to g(x)<=0 h(x) = 0 In your case that would mean: g(x) = g(x,f(x)). On Wed, May 6, 2009 at 3:01 AM, Mathew Yeates wrote: > Hi > I'm trying to solve an optimization problem where the search domain is > limited. Suppose I want to minimize the function f(x,y) but f(x,y) is > only valid over a subset (unknown without calling f) of (x,y)? > > I tried looking at OpenOpt but ... kind of unusable without some > documentation. > > Thanks > Mathew > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From malkarouri at yahoo.co.uk Wed May 6 05:15:39 2009 From: malkarouri at yahoo.co.uk (Muhammad Alkarouri) Date: Wed, 6 May 2009 09:15:39 +0000 (GMT) Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: <4A014037.5080205@ar.media.kyoto-u.ac.jp> Message-ID: <208618.12427.qm@web24203.mail.ird.yahoo.com> --- On Wed, 6/5/09, David Cournapeau wrote: ... > What does ldd lapack_lite.so returns (lapack_lite.so is in > numpy/linalg, > in your installed directory) ? It may be that numpy uses > gfortran, > whereas ATLAS is built with g77. gfortran and g77 should > not be mixed, Thanks David. I went there and found that lapack_lite.so didn't link to ATLAS in the first place. So I rebuilt numpy to ensure that. Now I have: ma856388 at H:linalg>ldd lapack_lite.so linux-gate.so.1 => (0xffffe000) liblapack.so => /users/d88/ma856388/lib/liblapack.so (0xf790f000) libptf77blas.so => /users/d88/ma856388/lib/libptf77blas.so (0xf78f0000) libptcblas.so => /users/d88/ma856388/lib/libptcblas.so (0xf78cf000) libatlas.so => /users/d88/ma856388/lib/libatlas.so (0xf755d000) libf77blas.so => /users/d88/ma856388/lib/libf77blas.so (0xf753e000) libcblas.so => /users/d88/ma856388/lib/libcblas.so (0xf751c000) libg2c.so.0 => /usr/lib/libg2c.so.0 (0xf74d2000) libm.so.6 => /lib/tls/libm.so.6 (0xf74af000) libgcc_s.so.1 => /lib/libgcc_s.so.1 (0xf74a7000) libc.so.6 => /lib/tls/libc.so.6 (0xf737d000) libpthread.so.0 => /lib/tls/libpthread.so.0 (0xf736b000) /lib/ld-linux.so.2 (0x56555000) but still, test_pinv hangs using almost 100% of CPU time. Any suggestions? Regards, Muhammad Alkarouri From david at ar.media.kyoto-u.ac.jp Wed May 6 05:10:20 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 06 May 2009 18:10:20 +0900 Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: <208618.12427.qm@web24203.mail.ird.yahoo.com> References: <208618.12427.qm@web24203.mail.ird.yahoo.com> Message-ID: <4A0153FC.8000106@ar.media.kyoto-u.ac.jp> Muhammad Alkarouri wrote: > --- On Wed, 6/5/09, David Cournapeau wrote: > ... > >> What does ldd lapack_lite.so returns (lapack_lite.so is in >> numpy/linalg, >> in your installed directory) ? It may be that numpy uses >> gfortran, >> whereas ATLAS is built with g77. gfortran and g77 should >> not be mixed, >> > > Thanks David. I went there and found that lapack_lite.so didn't link to ATLAS in the first place. So I rebuilt numpy to ensure that. Now I have: > Ok, so that's not a gfortran problem. As Chuck, I think that's an atlas problem (you could check by compiling without ATLAS: ATLAS=None python setup.py build after removing the build directory). 
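Once that no-ATLAS build is installed, a quick sanity check in a fresh
interpreter would be something along these lines (a rough sketch only):

    import numpy as np
    a = np.random.randn(50, 50)
    u, s, vt = np.linalg.svd(a)   # lapack_lite path; should return almost immediately
    print np.allclose(a, np.dot(u * s, vt))

If that still hangs, the problem is not in ATLAS.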
Your gcc compiler is quite old, so I would not be surprised it that were related, cheers, David From dsdale24 at gmail.com Wed May 6 07:17:52 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Wed, 6 May 2009 07:17:52 -0400 Subject: [Numpy-discussion] [review] py3k_bootstrap branch In-Reply-To: <5b8d13220905051812o39874428k6077cd5d9f3dc225@mail.gmail.com> References: <5b8d13220905050233m14c4a1e1l80dd6f231e100d5a@mail.gmail.com> <4A005A8C.4060709@gmail.com> <5b8d13220905050850i125ebe82x2ec026af4035b11c@mail.gmail.com> <5b8d13220905051812o39874428k6077cd5d9f3dc225@mail.gmail.com> Message-ID: On Tue, May 5, 2009 at 9:12 PM, David Cournapeau wrote: > On Wed, May 6, 2009 at 3:57 AM, Darren Dale wrote: > > > > > There is a lot of interest in a 3to2 tool, and I have read speculation > > ( > http://sayspy.blogspot.com/2009/04/pycon-2009-recap-best-pycon-ever.html) > > that going from 3 to 2 should be easier than the other way around. Maybe > it > > will be worth keeping an eye on. > > I can see how this could help people who have a working python 3 > implementation, but in numpy's case, I am not so sure. Do you know > which version of python is targeted by 3to2 ? 2.6, 2.5 or even below ? I was thinking further down the road, once numpy has a python-3 implementation. Based on http://wiki.python.org/moin/3to2 , it looks like people are thinking about the possibility of supporting 2.5 and earlier. Darren -------------- next part -------------- An HTML attachment was scrubbed... URL: From schut at sarvision.nl Wed May 6 07:28:11 2009 From: schut at sarvision.nl (Vincent Schut) Date: Wed, 06 May 2009 13:28:11 +0200 Subject: [Numpy-discussion] bitwise view on numpy array Message-ID: Hi, I'm gonna have large (e.g. 2400x2400) arrays of 16 and 32 bit bitfields. I've been searching in vain for an efficient and convenient way to represent these array's individual bit's (or, even better, configureable bitfields of 1-4 bits each). Of course I know I can 'split' the array in its separate bitfields using bitwise operators and shifts, but this will greatly increase the memory usage because it'll create one byte array for each bitfield. So I was looking for a way to create a bitwise view on the original array's data. I've been looking at recarray's, but the smallest element these can use are bytes, correct?. I've been looking at ctypes arrays of Structure subclasses, which can define bitfields. However, these will give me an object array of elements with the Structure class subclass, and only allow me to access the bits per array element instead of for the entire array (or a subset), e.g. data[:].bit17-19 or someting like that. After searching the net in vain for some hours, the list is my last resort :-) Anyone having ideas of how to get both memory-efficient and convenient access to single bits of a numpy array? On a slightly related note, during my search I found some comments saying that numpy.bool arrays use an entire byte for each element. Could someone confirm (or, better, negate) that? Thanks, Vincent. From stefan at sun.ac.za Wed May 6 07:45:09 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 6 May 2009 13:45:09 +0200 Subject: [Numpy-discussion] bitwise view on numpy array In-Reply-To: References: Message-ID: <9457e7c80905060445o54657ba9pc66cfefc99c87c4c@mail.gmail.com> Hi Vincent Take a look at http://pypi.python.org/pypi/bitarray/ I'm not sure if you can initialise bitarrays from NumPy arrays. 
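With plain numpy you can also pull a field out on demand with shifts and
masks, so only the field you are actually looking at gets materialised,
rather than one array per field up front -- a rough, untested sketch
(the field layout below is hypothetical):

    import numpy as np

    def bitfield(arr, start, width):
        # extract `width` bits starting at bit `start` of an unsigned integer array
        mask = (1 << width) - 1
        return (arr >> start) & mask

    flags = np.zeros((2400, 2400), dtype=np.uint16)  # stand-in for the real data
    cloud = bitfield(flags, 0, 2)   # say, a 2-bit field in bits 0-1
    land = bitfield(flags, 2, 1)    # and a 1-bit field in bit 2

Each call creates one temporary of the same shape, but nothing sticks
around unless you keep a reference to it.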
If not, you'll have to implement a conversion scheme, but that can be done without making a copy. Regards St?fan 2009/5/6 Vincent Schut : > Hi, > > I'm gonna have large (e.g. 2400x2400) arrays of 16 and 32 bit bitfields. > ?I've been searching in vain for an efficient and convenient way to > represent these array's individual bit's (or, even better, configureable > bitfields of 1-4 bits each). From malkarouri at yahoo.co.uk Wed May 6 09:31:39 2009 From: malkarouri at yahoo.co.uk (Muhammad Alkarouri) Date: Wed, 6 May 2009 13:31:39 +0000 (GMT) Subject: [Numpy-discussion] linalg.svd not working? In-Reply-To: <4A0153FC.8000106@ar.media.kyoto-u.ac.jp> Message-ID: <881804.58063.qm@web24202.mail.ird.yahoo.com> --- On Wed, 6/5/09, David Cournapeau wrote: ... > Ok, so that's not a gfortran problem. As Chuck, I think > that's an atlas > problem (you could check by compiling without ATLAS: It is an atlas problem. Not that I knew how to correct it, but I was able to build numpy with a standard package blas and lapack, and the tests passed without incident. Many thanks. I guess I will leave the atlas benefits for another day. Cheers, Muhammad Alkarouri From Gerry.Talbot at amd.com Wed May 6 09:44:36 2009 From: Gerry.Talbot at amd.com (Talbot, Gerry) Date: Wed, 6 May 2009 08:44:36 -0500 Subject: [Numpy-discussion] Recurrence relationships Message-ID: Does anyone know how to efficiently implement a recurrence relationship in numpy such as: y[n] = A*x[n] + B*y[n-1] Thanks, Gerry -------------- next part -------------- An HTML attachment was scrubbed... URL: From kwgoodman at gmail.com Wed May 6 09:53:25 2009 From: kwgoodman at gmail.com (Keith Goodman) Date: Wed, 6 May 2009 06:53:25 -0700 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: References: Message-ID: On Wed, May 6, 2009 at 6:44 AM, Talbot, Gerry wrote: > Does anyone know how to efficiently implement a recurrence relationship in > numpy such as: > > > > ???????????? y[n] = A*x[n] + B*y[n-1] On an intel chip I'd use a Monte Carlo simulation. On an amd chip I'd use: >> x = np.array([1,2,3]) >> y = np.array([4,5,6]) >> y = x[1:] + y[:-1] >> y array([6, 8]) From Gerry.Talbot at amd.com Wed May 6 10:00:08 2009 From: Gerry.Talbot at amd.com (Talbot, Gerry) Date: Wed, 6 May 2009 09:00:08 -0500 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: References: Message-ID: Sorry, I guess I wasn't clear, I meant: for n in xrange(1,N): y[n] = A*x[n] + B*y[n-1] So y[n-1] is the result from the previous loop iteration. Gerry -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Keith Goodman Sent: Wednesday, May 06, 2009 9:53 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Recurrence relationships On Wed, May 6, 2009 at 6:44 AM, Talbot, Gerry wrote: > Does anyone know how to efficiently implement a recurrence relationship in > numpy such as: > > > > ???????????? y[n] = A*x[n] + B*y[n-1] On an intel chip I'd use a Monte Carlo simulation. 
On an amd chip I'd use: >> x = np.array([1,2,3]) >> y = np.array([4,5,6]) >> y = x[1:] + y[:-1] >> y array([6, 8]) _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From josef.pktd at gmail.com Wed May 6 10:21:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 May 2009 10:21:13 -0400 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: References: Message-ID: <1cd32cbb0905060721y4282cb77w4ca460529774c901@mail.gmail.com> On Wed, May 6, 2009 at 10:00 AM, Talbot, Gerry wrote: > Sorry, I guess I wasn't clear, I meant: > > ? ? ? ?for n in xrange(1,N): > ? ? ? ? ?y[n] = A*x[n] + B*y[n-1] > > So y[n-1] is the result from the previous loop iteration. > I was using scipy.signal for this but I have to look up what I did exactly. I think either signal.correlate or using signal.lti. Josef From natachai_w at hotmail.com Wed May 6 10:18:35 2009 From: natachai_w at hotmail.com (natachai wongchavalidkul) Date: Wed, 6 May 2009 07:18:35 -0700 Subject: [Numpy-discussion] ValueError: dimensions too large. Message-ID: Hello alls, I currently have a problem with creating a multi-dimensional array in numpy. The following is what I am trying to do and the error message. >>> test = zeros((3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2), dtype=float); Traceback (most recent call last): File "", line 1, in test = zeros((3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2), dtype=float); ValueError: dimensions too large. I haven't sure if they should be something to do with the memory or any other suggestions for the way to solve this problem. Anyway, comments or suggestions will be really appreciate though. Thank you _________________________________________________________________ Hotmail? has a new way to see what's up with your friends. http://windowslive.com/Tutorial/Hotmail/WhatsNew?ocid=TXT_TAGLM_WL_HM_Tutorial_WhatsNew1_052009 -------------- next part -------------- An HTML attachment was scrubbed... URL: From aisaac at american.edu Wed May 6 10:25:17 2009 From: aisaac at american.edu (Alan G Isaac) Date: Wed, 06 May 2009 10:25:17 -0400 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: References: Message-ID: <4A019DCD.8030705@american.edu> On 5/6/2009 10:00 AM Talbot, Gerry apparently wrote: > for n in xrange(1,N): > y[n] = A*x[n] + B*y[n-1] So, x is known before you start? How big is N? Also, is y.shape (N,)? Do you need all of y or only y[N]? Alan Isaac From silva at lma.cnrs-mrs.fr Wed May 6 10:29:21 2009 From: silva at lma.cnrs-mrs.fr (Fabrice Silva) Date: Wed, 06 May 2009 16:29:21 +0200 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: <1cd32cbb0905060721y4282cb77w4ca460529774c901@mail.gmail.com> References: <1cd32cbb0905060721y4282cb77w4ca460529774c901@mail.gmail.com> Message-ID: <1241620161.2950.30.camel@localhost.localdomain> Le mercredi 06 mai 2009 ? 10:21 -0400, josef.pktd at gmail.com a ?crit : > On Wed, May 6, 2009 at 10:00 AM, Talbot, Gerry wrote: > > Sorry, I guess I wasn't clear, I meant: > > > > for n in xrange(1,N): > > y[n] = A*x[n] + B*y[n-1] > > > > So y[n-1] is the result from the previous loop iteration. > > > > I was using scipy.signal for this but I have to look up what I did > exactly. I think either signal.correlate or using signal.lti. > > Josef Isn't it what scipy.signal.lfilter does ? y=scipy.signal.lfilter([A],[1,-B],x) You may be careful with initial conditions... 
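For instance, something along these lines (a sketch only -- A, B and the
starting value y0 are made up for illustration):

    import numpy as np
    from scipy import signal

    A, B = 0.5, 0.9
    y0 = 2.0                       # hypothetical y[-1] to continue from
    x = np.random.randn(1000)

    # y[n] = A*x[n] + B*y[n-1]  <=>  b = [A], a = [1, -B]
    # zi is the filter state; for this first-order filter the state that
    # reproduces a known previous output y0 is simply [B * y0]
    y, zf = signal.lfilter([A], [1.0, -B], x, zi=[B * y0])

The returned zf can then be passed back in as zi when filtering the next
block of samples.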
-- Fabrice Silva LMA UPR CNRS 7051 From josef.pktd at gmail.com Wed May 6 10:28:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 May 2009 10:28:46 -0400 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: <1cd32cbb0905060721y4282cb77w4ca460529774c901@mail.gmail.com> References: <1cd32cbb0905060721y4282cb77w4ca460529774c901@mail.gmail.com> Message-ID: <1cd32cbb0905060728y6df45c02n83b978e09904a5f6@mail.gmail.com> On Wed, May 6, 2009 at 10:21 AM, wrote: > On Wed, May 6, 2009 at 10:00 AM, Talbot, Gerry wrote: >> Sorry, I guess I wasn't clear, I meant: >> >> ? ? ? ?for n in xrange(1,N): >> ? ? ? ? ?y[n] = A*x[n] + B*y[n-1] >> >> So y[n-1] is the result from the previous loop iteration. >> > > I was using scipy.signal for this but I have to look up what I did > exactly. I think either signal.correlate or using signal.lti. > No, its signal.lfilter, below is a part of a script I used to simulate and estimate an AR(1) process, which is similar to your example. I haven't looked at it in a while but it might give you the general idea. Josef # Simulate AR(1) #-------------- # ar * y = ma * eta ar = [1, -0.8] ma = [1.0] # generate AR data eta = 0.1 * np.random.randn(1000) yar1 = signal.lfilter(ar, ma, eta) etahat = signal.lfilter(ma, ar, y) np.all(etahat == eta) # find error for given filter on data print 'AR(2)' for rho in [0.01, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.79, 0.8, 0.81, 0.9]: etahatr = signal.lfilter(ma, [1, --rho], yar1) print rho,np.sum(etahatr*etahatr) print 'AR(2)' for rho2 in np.linspace(-0.4,0.4,9): etahatr = signal.lfilter(ma, [1, -0.8, -rho2], yar1) print rho2,np.sum(etahatr*etahatr) def errfn(rho): etahatr = signal.lfilter(ma, [1, -rho], yar1) #print rho,np.sum(etahatr*etahatr) return etahatr def errssfn(rho): etahatr = signal.lfilter(ma, [1, -rho], yar1) return np.sum(etahatr*etahatr) resultls = optimize.leastsq(errfn,[0.5]) print 'LS ARMA(1,0)', resultls resultfmin = optimize.fmin(errssfn, 0.5) print 'fminLS ARMA(1,0)', resultfmin From charlesr.harris at gmail.com Wed May 6 10:32:35 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 6 May 2009 08:32:35 -0600 Subject: [Numpy-discussion] ValueError: dimensions too large. In-Reply-To: References: Message-ID: On Wed, May 6, 2009 at 8:18 AM, natachai wongchavalidkul < natachai_w at hotmail.com> wrote: > > Hello alls, > > I currently have a problem with creating a multi-dimensional array in > numpy. The following is what I am trying to do and the error message. > > >>> test = zeros((3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2), dtype=float); > > Traceback (most recent call last): > File "", line 1, in > test = zeros((3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2), dtype=float); > ValueError: dimensions too large. > > I haven't sure if they should be something to do with the memory or any > other suggestions for the way to solve this problem. Anyway, comments or > suggestions will be really appreciate though. > There is not enough memory to hold the array. In [3]: prod = 1 In [4]: for i in (3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2) : ...: prod *= i ...: In [5]: prod Out[5]: 11085465600L That is 11 gigs of floats, each of which is 8 bytes. So you need about 88 gigs for the array. I expect that that is not what you are trying to do. Do you just want an array with the listed values? Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
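(For future reference, the same estimate in a couple of lines -- np.prod
with an int64 accumulator so the product doesn't overflow on a 32-bit
build:

    import numpy as np
    shape = (3,3,3,3,3,3,10,4,6,2,18,10,11,4,2,2)
    print np.prod(shape, dtype=np.int64) * 8 / 1e9   # roughly 88.7, i.e. ~88 GB of float64

No data is allocated, so it is safe to run on any shape.)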
URL: From Gerry.Talbot at amd.com Wed May 6 10:37:09 2009 From: Gerry.Talbot at amd.com (Talbot, Gerry) Date: Wed, 6 May 2009 09:37:09 -0500 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: <4A019DCD.8030705@american.edu> References: <4A019DCD.8030705@american.edu> Message-ID: The application is essentially filtering 1D arrays, typically N is >20e6, the required result is y[1:N]. Gerry -----Original Message----- From: numpy-discussion-bounces at scipy.org [mailto:numpy-discussion-bounces at scipy.org] On Behalf Of Alan G Isaac Sent: Wednesday, May 06, 2009 10:25 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Recurrence relationships On 5/6/2009 10:00 AM Talbot, Gerry apparently wrote: > for n in xrange(1,N): > y[n] = A*x[n] + B*y[n-1] So, x is known before you start? How big is N? Also, is y.shape (N,)? Do you need all of y or only y[N]? Alan Isaac _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion From cournape at gmail.com Wed May 6 10:55:18 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 6 May 2009 23:55:18 +0900 Subject: [Numpy-discussion] Recurrence relationships In-Reply-To: References: Message-ID: <5b8d13220905060755y60c05847u1ca8556341e56a34@mail.gmail.com> On Wed, May 6, 2009 at 10:44 PM, Talbot, Gerry wrote: > Does anyone know how to efficiently implement a recurrence relationship in > numpy such as: > > > > ???????????? y[n] = A*x[n] + B*y[n-1] That's the direct implement of a linear filter with an infinite impulse response. That's exactly what scipy.signal.lfilter is for, cheers, David From myeates at jpl.nasa.gov Wed May 6 15:16:12 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Wed, 06 May 2009 12:16:12 -0700 Subject: [Numpy-discussion] hairy optimization problem Message-ID: <4A01E1FC.8090701@jpl.nasa.gov> I have a function f(x,y) which produces N values [v1,v2,v3 .... vN] where some of the values are None (only found after evaluation) each evaluation of "f" is expensive and N is large. I want N x,y pairs which produce the optimal value in each column. A brute force approach would be to generate [v11,v12,v13,v14 ....] [v21,v22,v23 ....... ] etc then locate the maximum of each column. This is far too slow ......Any other ideas? From dwf at cs.toronto.edu Wed May 6 18:03:24 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 6 May 2009 18:03:24 -0400 Subject: [Numpy-discussion] OS-X binary name... In-Reply-To: <4A01284E.6070104@noaa.gov> References: <4A01284E.6070104@noaa.gov> Message-ID: On 6-May-09, at 2:03 AM, Christopher Barker wrote: > maybe: > > numpy-1.3.0-py2.5-macosx-python.org.dmg +1 on having python.org in the name. It clarifies and reinforces the case that this isn't for the "Apple-shipped" Python (which I heard comes with NumPy now?). 
David From sccolbert at gmail.com Wed May 6 18:06:09 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Wed, 6 May 2009 18:06:09 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> Message-ID: <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> I decided to hold myself over until being able to take a hard look at the numpy histogramdd code: Here is a quick thing a put together in cython. It's a 40x speedup over histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630, 3) array, this executed in 0.005 seconds on my machine. This only works for arrays with uint8 data types having dimensions (x, y, 3) (common image format). The return array is a (16, 16, 16) equal width bin histogram of the input. If anyone wants the cython C-output, let me know and I will email it to you. If there is interest, I will extend this for different size bins and aliases for different data types. Chris import numpy as np cimport numpy as np DTYPE = np.uint8 DTYPE32 = np.int ctypedef np.uint8_t DTYPE_t ctypedef np.int_t DTYPE_t32 def hist3d(np.ndarray[DTYPE_t, ndim=3] img): cdef int x = img.shape[0] cdef int y = img.shape[1] cdef int z = img.shape[2] cdef int addx cdef int addy cdef int addz cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], dtype=DTYPE32) cdef int i, j, v0, v1, v2 for i in range(x): for j in range(y): v0 = img[i, j, 0] v1 = img[i, j, 1] v2 = img[i, j, 2] addx = (v0 - (v0 % 16)) / 16 addy = (v1 - (v1 % 16)) / 16 addz = (v2 - (v2 % 16)) / 16 out[addx, addy, addz] += 1 return out On Tue, May 5, 2009 at 9:46 AM, David Huard wrote: > > > On Mon, May 4, 2009 at 4:18 PM, wrote: > >> On Mon, May 4, 2009 at 4:00 PM, Chris Colbert >> wrote: >> > i'll take a look at them over the next few days and see what i can hack >> out. >> > >> > Chris >> > >> > On Mon, May 4, 2009 at 3:18 PM, David Huard >> wrote: >> >> >> >> >> >> On Mon, May 4, 2009 at 7:00 AM, wrote: >> >>> >> >>> On Mon, May 4, 2009 at 12:31 AM, Chris Colbert >> >>> wrote: >> >>> > this actually sort of worked. Thanks for putting me on the right >> track. >> >>> > >> >>> > Here is what I ended up with. >> >>> > >> >>> > this is what I ended up with: >> >>> > >> >>> > def hist3d(imgarray): >> >>> > histarray = N.zeros((16, 16, 16)) >> >>> > temp = imgarray.copy() >> >>> > bins = N.arange(0, 257, 16) >> >>> > histarray = N.histogramdd((temp[:,:,0].ravel(), >> >>> > temp[:,:,1].ravel(), >> >>> > temp[:,:,2].ravel()), bins=(bins, bins, bins))[0] >> >>> > return histarray >> >>> > >> >>> > this creates a 3d histogram of rgb image values in the range 0,255 >> >>> > using 16 >> >>> > bins per component color. >> >>> > >> >>> > on a 640x480 image, it executes in 0.3 seconds vs 4.5 seconds for a >> for >> >>> > loop. >> >>> > >> >>> > not quite framerate, but good enough for prototyping. 
>> >>> > >> >>> >> >>> I don't think your copy to temp is necessary, and use reshape(-1,3) as >> >>> in the example of Stefan, which will avoid copying the array 3 times. >> >>> >> >>> If you need to gain some more speed, then rewriting histogramdd and >> >>> removing some of the unnecessary checks and calculations looks >> >>> possible. >> >> >> >> Indeed, the strategy used in the histogram function is faster than the >> one >> >> used in the histogramdd case, so porting one to the other should speed >> >> things up. >> >> >> >> David >> >> is searchsorted faster than digitize and bincount ? >> > > That depends on the number of bins and whether or not the bin width is > uniform. A 1D benchmark I did a while ago showed that if the bin width is > uniform, then the best strategy is to create a counter initialized to 0, > loop through the data, compute i = (x-bin0) /binwidth and increment counter > i by 1 (or by the weight of the data). If the bins are non uniform, then for > nbin > 30 you'd better use searchsort, and digitize otherwise. > > For those interested in speeding up histogram code, I recommend reading a > thread started by Cameron Walsh on the 12/12/06 named "Histograms of > extremely large data sets" Code and benchmarks were posted. > > Chris, if your bins all have the same width, then you can certainly write > an histogramdd routine that is way faster by using the indexing trick > instead of digitize or searchsort. > > Cheers, > > David > > > > >> >> Using the idea of histogramdd, I get a bit below a tenth of a second, >> my best for this problem is below. >> I was trying for a while what the fastest way is to convert a two >> dimensional array into a one dimensional index for bincount. I found >> that using the return index of unique1d is very slow compared to >> numeric index calculation. >> >> Josef >> >> example timed for: >> nobs = 307200 >> nbins = 16 >> factors = np.random.randint(256,size=(nobs,3)).copy() >> factors2 = factors.reshape(-1,480,3).copy() >> >> def hist3(factorsin, nbins): >> if factorsin.ndim != 2: >> factors = factorsin.reshape(-1,factorsin.shape[-1]) >> else: >> factors = factorsin >> N, D = factors.shape >> darr = np.empty(factors.T.shape, dtype=int) >> nele = np.max(factors)+1 >> bins = np.arange(0, nele, nele/nbins) >> bins[-1] += 1 >> for i in range(D): >> darr[i] = np.digitize(factors[:,i],bins) - 1 >> >> #add weighted rows >> darrind = darr[D-1] >> for i in range(D-1): >> darrind += darr[i]*nbins**(D-i-1) >> return np.bincount(darrind) # return flat not reshaped >> > >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed May 6 18:10:43 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 06 May 2009 15:10:43 -0700 Subject: [Numpy-discussion] OS-X binary name... In-Reply-To: References: <4A01284E.6070104@noaa.gov> Message-ID: <4A020AE3.7090005@noaa.gov> David Warde-Farley wrote: > On 6-May-09, at 2:03 AM, Christopher Barker wrote: >> maybe: >> >> numpy-1.3.0-py2.5-macosx-python.org.dmg > > +1 on having python.org in the name. It clarifies and reinforces the > case that this isn't for the "Apple-shipped" Python exactly. 
> (which I heard comes with NumPy now?). yup, but an old and crusty version... -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From thomas.robitaille at gmail.com Wed May 6 19:19:07 2009 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Wed, 6 May 2009 16:19:07 -0700 (PDT) Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop? In-Reply-To: <49E6658A.1090004@jhu.edu> References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> Message-ID: <23417366.post@talk.nabble.com> Hi, I'm having the exact same problem, trying to log in to the trac website for numpy, and getting stuck in a redirect loop. I tried different browsers, and no luck. The browser gets stuck on http://projects.scipy.org/numpy/prefs/account and stops loading after a while because of too many redirects... Is there any way around this? Thanks, Thomas -- View this message in context: http://www.nabble.com/Numpy-Trac-site-redirecting-in-a-loop--tp23067410p23417366.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From robert.kern at gmail.com Wed May 6 19:24:48 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 6 May 2009 19:24:48 -0400 Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop? In-Reply-To: <23417366.post@talk.nabble.com> References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> <23417366.post@talk.nabble.com> Message-ID: <3d375d730905061624k6376e57brfb19d72344b99ec2@mail.gmail.com> On Wed, May 6, 2009 at 19:19, Thomas Robitaille wrote: > > Hi, > > I'm having the exact same problem, trying to log in to the trac website for > numpy, and getting stuck in a redirect loop. I tried different browsers, and > no luck. The browser gets stuck on > > http://projects.scipy.org/numpy/prefs/account > > and stops loading after a while because of too many redirects... > > Is there any way around this? I don't see this on my Mac with Firefox 3. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Wed May 6 19:30:00 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 May 2009 19:30:00 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> Message-ID: <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> On Wed, May 6, 2009 at 6:06 PM, Chris Colbert wrote: > I decided to hold myself over until being able to take a hard look at the > numpy histogramdd code: > > Here is a quick thing a put together in cython. It's a 40x speedup over > histogramdd on Vista 32 using the minGW32 compiler. 
For a (480, 630, 3) > array, this executed in 0.005 seconds on my machine. > > This only works for arrays with uint8 data types having dimensions (x, y, 3) > (common image format). The return array is a (16, 16, 16) equal width bin > histogram of the input. > > If anyone wants the cython C-output, let me know and I will email it to you. > > If there is interest, I will extend this for different size bins and aliases > for different data types. > > Chris > > import numpy as np > > cimport numpy as np > > DTYPE = np.uint8 > DTYPE32 = np.int > > ctypedef np.uint8_t DTYPE_t > ctypedef np.int_t DTYPE_t32 > > def hist3d(np.ndarray[DTYPE_t, ndim=3] img): > ??? cdef int x = img.shape[0] > ??? cdef int y = img.shape[1] > ??? cdef int z = img.shape[2] > ??? cdef int addx > ??? cdef int addy > ??? cdef int addz > ??? cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], > dtype=DTYPE32) > ??? cdef int i, j, v0, v1, v2 > > > ??? for i in range(x): > ??????? for j in range(y): > ??????????? v0 = img[i, j, 0] > ??????????? v1 = img[i, j, 1] > ??????????? v2 = img[i, j, 2] > ??????????? addx = (v0 - (v0 % 16)) / 16 > ??????????? addy = (v1 - (v1 % 16)) / 16 > ??????????? addz = (v2 - (v2 % 16)) / 16 > ??????????? out[addx, addy, addz] += 1 > > ??? return out > Thanks for the example for using cython. Once I figure out what the types are, cython will look very convenient for loops, and pyximport takes care of the compiler. Josef import pyximport; pyximport.install() import hist_rgb #name of .pyx files import numpy as np factors = np.random.randint(256,size=(480, 630, 3)) h = hist_rgb.hist3d(factors.astype(np.uint8)) print h[:,:,0] From thomas.robitaille at gmail.com Wed May 6 19:35:23 2009 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Wed, 6 May 2009 16:35:23 -0700 (PDT) Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop? In-Reply-To: <3d375d730905061624k6376e57brfb19d72344b99ec2@mail.gmail.com> References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> <23417366.post@talk.nabble.com> <3d375d730905061624k6376e57brfb19d72344b99ec2@mail.gmail.com> Message-ID: <23417595.post@talk.nabble.com> Could it be linked to specific users, since the problem occurs when loading the account page? I had the same problem on two different computers with two different browsers. Thomas -- View this message in context: http://www.nabble.com/Numpy-Trac-site-redirecting-in-a-loop--tp23067410p23417595.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
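For reference, the "indexing trick" David Huard mentions for uniform bins can also be written in pure NumPy for the 16-wide RGB bins used in this thread. A rough sketch, not benchmarked here; bincount is padded by hand since the number of occupied bins can be smaller than 16**3:

import numpy as np

img = np.random.randint(0, 256, size=(480, 630, 3)).astype(np.uint8)

rgb = img.reshape(-1, 3).astype(np.intp) // 16        # per-channel bin index, 0..15
flat = (rgb[:, 0] * 16 + rgb[:, 1]) * 16 + rgb[:, 2]  # single index into 16*16*16 bins

counts = np.bincount(flat)
hist = np.zeros(16 ** 3, dtype=counts.dtype)
hist[:counts.size] = counts                           # pad out the unused high bins
hist = hist.reshape(16, 16, 16)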
From sccolbert at gmail.com Wed May 6 19:39:18 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Wed, 6 May 2009 19:39:18 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905031736n99b1907v13de267be7639f39@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> Message-ID: <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> i just realized I don't need the line: cdef int z = img.shape(2) it's left over from tinkering. sorry. And i should probably convert the out array to type float to handle large data sets. Chris On Wed, May 6, 2009 at 7:30 PM, wrote: > On Wed, May 6, 2009 at 6:06 PM, Chris Colbert wrote: > > I decided to hold myself over until being able to take a hard look at the > > numpy histogramdd code: > > > > Here is a quick thing a put together in cython. It's a 40x speedup over > > histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630, 3) > > array, this executed in 0.005 seconds on my machine. > > > > This only works for arrays with uint8 data types having dimensions (x, y, > 3) > > (common image format). The return array is a (16, 16, 16) equal width bin > > histogram of the input. > > > > If anyone wants the cython C-output, let me know and I will email it to > you. > > > > If there is interest, I will extend this for different size bins and > aliases > > for different data types. > > > > Chris > > > > import numpy as np > > > > cimport numpy as np > > > > DTYPE = np.uint8 > > DTYPE32 = np.int > > > > ctypedef np.uint8_t DTYPE_t > > ctypedef np.int_t DTYPE_t32 > > > > def hist3d(np.ndarray[DTYPE_t, ndim=3] img): > > cdef int x = img.shape[0] > > cdef int y = img.shape[1] > > cdef int z = img.shape[2] > > cdef int addx > > cdef int addy > > cdef int addz > > cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], > > dtype=DTYPE32) > > cdef int i, j, v0, v1, v2 > > > > > > for i in range(x): > > for j in range(y): > > v0 = img[i, j, 0] > > v1 = img[i, j, 1] > > v2 = img[i, j, 2] > > addx = (v0 - (v0 % 16)) / 16 > > addy = (v1 - (v1 % 16)) / 16 > > addz = (v2 - (v2 % 16)) / 16 > > out[addx, addy, addz] += 1 > > > > return out > > > > Thanks for the example for using cython. Once I figure out what the > types are, cython will look very convenient for loops, and pyximport > takes care of the compiler. > > Josef > > import pyximport; pyximport.install() > import hist_rgb #name of .pyx files > > import numpy as np > factors = np.random.randint(256,size=(480, 630, 3)) > h = hist_rgb.hist3d(factors.astype(np.uint8)) > print h[:,:,0] > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
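One small aside on the binning arithmetic in the Cython code above: for the non-negative 0..255 values involved, the subtract-modulo-divide expression is just a floor division, so each bin index can be computed in one step (an observation only, not a change anyone made in the thread):

v0 = 201                                       # any value in 0..255
assert (v0 - (v0 % 16)) / 16 == v0 // 16       # both give bin index 12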
URL: From josef.pktd at gmail.com Wed May 6 20:21:45 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 6 May 2009 20:21:45 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <7f014ea60905032131i114b8fdbyb9bfd04ad7d1200a@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> Message-ID: <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> On Wed, May 6, 2009 at 7:39 PM, Chris Colbert wrote: > i just realized I don't need the line: > > cdef int z = img.shape(2) > > it's left over from tinkering. sorry. And i should probably convert the out > array to type float to handle large data sets. > > Chris > > On Wed, May 6, 2009 at 7:30 PM, wrote: >> >> On Wed, May 6, 2009 at 6:06 PM, Chris Colbert wrote: >> > I decided to hold myself over until being able to take a hard look at >> > the >> > numpy histogramdd code: >> > >> > Here is a quick thing a put together in cython. It's a 40x speedup over >> > histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630, 3) >> > array, this executed in 0.005 seconds on my machine. >> > >> > This only works for arrays with uint8 data types having dimensions (x, >> > y, 3) >> > (common image format). The return array is a (16, 16, 16) equal width >> > bin >> > histogram of the input. >> > >> > If anyone wants the cython C-output, let me know and I will email it to >> > you. >> > >> > If there is interest, I will extend this for different size bins and >> > aliases >> > for different data types. >> > >> > Chris >> > >> > import numpy as np >> > >> > cimport numpy as np >> > >> > DTYPE = np.uint8 >> > DTYPE32 = np.int >> > >> > ctypedef np.uint8_t DTYPE_t >> > ctypedef np.int_t DTYPE_t32 >> > >> > def hist3d(np.ndarray[DTYPE_t, ndim=3] img): >> > ??? cdef int x = img.shape[0] >> > ??? cdef int y = img.shape[1] >> > ??? cdef int z = img.shape[2] >> > ??? cdef int addx >> > ??? cdef int addy >> > ??? cdef int addz >> > ??? cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], >> > dtype=DTYPE32) >> > ??? cdef int i, j, v0, v1, v2 >> > >> > >> > ??? for i in range(x): >> > ??????? for j in range(y): >> > ??????????? v0 = img[i, j, 0] >> > ??????????? v1 = img[i, j, 1] >> > ??????????? v2 = img[i, j, 2] >> > ??????????? addx = (v0 - (v0 % 16)) / 16 >> > ??????????? addy = (v1 - (v1 % 16)) / 16 >> > ??????????? addz = (v2 - (v2 % 16)) / 16 >> > ??????????? out[addx, addy, addz] += 1 >> > >> > ??? return out >> > >> >> Thanks for the example for using cython. Once I figure out what the >> types are, cython will look very convenient for loops, and pyximport >> takes care of the compiler. >> >> Josef >> >> import pyximport; pyximport.install() >> import hist_rgb ? 
?#name of .pyx files >> >> import numpy as np >> factors = np.random.randint(256,size=(480, 630, 3)) >> h = hist_rgb.hist3d(factors.astype(np.uint8)) >> print h[:,:,0] playing some more with cython: here is a baby on the fly code generator input type int, output type float64 a dispatch function by type is missing no segfaults, even though most of the time a call the function with the wrong type. Josef code = ''' import numpy as np cimport numpy as np __all__ = ["hist3d"] DTYPE = ${imgtype} DTYPE32 = ${outtype} ctypedef ${imgtype}_t DTYPE_t ctypedef ${outtype}_t DTYPE_t32 def hist3d(np.ndarray[DTYPE_t, ndim=3] img): cdef int x = img.shape[0] cdef int y = img.shape[1] #cdef int z = img.shape[2] cdef int addx cdef int addy cdef int addz cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], dtype=DTYPE32) cdef int i, j, v0, v1, v2 for i in range(x): for j in range(y): v0 = img[i, j, 0] v1 = img[i, j, 1] v2 = img[i, j, 2] addx = (v0 - (v0 % 16)) / 16 addy = (v1 - (v1 % 16)) / 16 addz = (v2 - (v2 % 16)) / 16 out[addx, addy, addz] += 1 return out ''' from string import Template s = Template(code) src = s.substitute({'imgtype': 'np.int', 'outtype': 'np.float64'}) open('histrgbintfl2.pyx','w').write(src) import pyximport; pyximport.install() import histrgbintfl2 import numpy as np factors = np.random.randint(256,size=(480, 630, 3)) h = histrgbintfl2.hist3d(factors)#.astype(np.uint8)) print h[:,:,0] From kbasye1 at jhu.edu Wed May 6 20:32:05 2009 From: kbasye1 at jhu.edu (Ken Basye) Date: Wed, 6 May 2009 17:32:05 -0700 (PDT) Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop? In-Reply-To: <23417366.post@talk.nabble.com> References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> <23417366.post@talk.nabble.com> Message-ID: <23418151.post@talk.nabble.com> I ran into something like this a couple weeks ago. I use Firefox 3 on MacOS. My work-around was to clear all the cookies from scipy.org, clear all authenticated sessions, then register a completely new account name. I never could get my existing account to stop looping. HTH, Ken Thomas Robitaille wrote: > > Hi, > > I'm having the exact same problem, trying to log in to the trac website > for numpy, and getting stuck in a redirect loop. I tried different > browsers, and no luck. The browser gets stuck on > > http://projects.scipy.org/numpy/prefs/account > > and stops loading after a while because of too many redirects... > > Is there any way around this? > > Thanks, > > Thomas > -- View this message in context: http://www.nabble.com/Numpy-Trac-site-redirecting-in-a-loop--tp23067410p23418151.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
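On the "dispatch function by type is missing" point above, a bare-bones sketch of how such a wrapper might look, assuming a .pyx module has already been written out per dtype with the Template script; module names such as histrgb_uint8 are made up for the example:

import numpy as np
import pyximport; pyximport.install()

_modules = {}                                  # dtype name -> imported extension module

def hist3d_any(img):
    # route to the generated extension that matches img.dtype (sketch only)
    name = 'histrgb_%s' % img.dtype.name       # e.g. 'histrgb_uint8' (hypothetical)
    mod = _modules.get(name)
    if mod is None:
        mod = __import__(name)                 # pyximport builds the .pyx on first import
        _modules[name] = mod
    return mod.hist3d(img)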
From sccolbert at gmail.com Wed May 6 20:34:18 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Wed, 6 May 2009 20:34:18 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905040400p4dcd3de7he45b3be942dc2c02@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> Message-ID: <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> nice! This was really my first attempt at doing anything constructive with Cython. It was actually unbelievably easy to work with. I think i spent less time working on this, than I did trying to find an optimized solution using pure numpy and python. Chris On Wed, May 6, 2009 at 8:21 PM, wrote: > On Wed, May 6, 2009 at 7:39 PM, Chris Colbert wrote: > > i just realized I don't need the line: > > > > cdef int z = img.shape(2) > > > > it's left over from tinkering. sorry. And i should probably convert the > out > > array to type float to handle large data sets. > > > > Chris > > > > On Wed, May 6, 2009 at 7:30 PM, wrote: > >> > >> On Wed, May 6, 2009 at 6:06 PM, Chris Colbert > wrote: > >> > I decided to hold myself over until being able to take a hard look at > >> > the > >> > numpy histogramdd code: > >> > > >> > Here is a quick thing a put together in cython. It's a 40x speedup > over > >> > histogramdd on Vista 32 using the minGW32 compiler. For a (480, 630, > 3) > >> > array, this executed in 0.005 seconds on my machine. > >> > > >> > This only works for arrays with uint8 data types having dimensions (x, > >> > y, 3) > >> > (common image format). The return array is a (16, 16, 16) equal width > >> > bin > >> > histogram of the input. > >> > > >> > If anyone wants the cython C-output, let me know and I will email it > to > >> > you. > >> > > >> > If there is interest, I will extend this for different size bins and > >> > aliases > >> > for different data types. > >> > > >> > Chris > >> > > >> > import numpy as np > >> > > >> > cimport numpy as np > >> > > >> > DTYPE = np.uint8 > >> > DTYPE32 = np.int > >> > > >> > ctypedef np.uint8_t DTYPE_t > >> > ctypedef np.int_t DTYPE_t32 > >> > > >> > def hist3d(np.ndarray[DTYPE_t, ndim=3] img): > >> > cdef int x = img.shape[0] > >> > cdef int y = img.shape[1] > >> > cdef int z = img.shape[2] > >> > cdef int addx > >> > cdef int addy > >> > cdef int addz > >> > cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], > >> > dtype=DTYPE32) > >> > cdef int i, j, v0, v1, v2 > >> > > >> > > >> > for i in range(x): > >> > for j in range(y): > >> > v0 = img[i, j, 0] > >> > v1 = img[i, j, 1] > >> > v2 = img[i, j, 2] > >> > addx = (v0 - (v0 % 16)) / 16 > >> > addy = (v1 - (v1 % 16)) / 16 > >> > addz = (v2 - (v2 % 16)) / 16 > >> > out[addx, addy, addz] += 1 > >> > > >> > return out > >> > > >> > >> Thanks for the example for using cython. Once I figure out what the > >> types are, cython will look very convenient for loops, and pyximport > >> takes care of the compiler. 
> >> > >> Josef > >> > >> import pyximport; pyximport.install() > >> import hist_rgb #name of .pyx files > >> > >> import numpy as np > >> factors = np.random.randint(256,size=(480, 630, 3)) > >> h = hist_rgb.hist3d(factors.astype(np.uint8)) > >> print h[:,:,0] > > > playing some more with cython: here is a baby on the fly code generator > input type int, output type float64 > > a dispatch function by type is missing > > no segfaults, even though most of the time a call the function with > the wrong type. > > Josef > > code = ''' > import numpy as np > cimport numpy as np > > __all__ = ["hist3d"] > DTYPE = ${imgtype} > DTYPE32 = ${outtype} > > ctypedef ${imgtype}_t DTYPE_t > ctypedef ${outtype}_t DTYPE_t32 > > def hist3d(np.ndarray[DTYPE_t, ndim=3] img): > cdef int x = img.shape[0] > cdef int y = img.shape[1] > #cdef int z = img.shape[2] > cdef int addx > cdef int addy > cdef int addz > cdef np.ndarray[DTYPE_t32, ndim=3] out = np.zeros([16, 16, 16], > dtype=DTYPE32) > cdef int i, j, v0, v1, v2 > > > for i in range(x): > for j in range(y): > v0 = img[i, j, 0] > v1 = img[i, j, 1] > v2 = img[i, j, 2] > addx = (v0 - (v0 % 16)) / 16 > addy = (v1 - (v1 % 16)) / 16 > addz = (v2 - (v2 % 16)) / 16 > out[addx, addy, addz] += 1 > > return out > ''' > > from string import Template > s = Template(code) > src = s.substitute({'imgtype': 'np.int', 'outtype': 'np.float64'}) > open('histrgbintfl2.pyx','w').write(src) > > import pyximport; pyximport.install() > import histrgbintfl2 > > import numpy as np > factors = np.random.randint(256,size=(480, 630, 3)) > h = histrgbintfl2.hist3d(factors)#.astype(np.uint8)) > print h[:,:,0] > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.huard at gmail.com Wed May 6 23:59:54 2009 From: david.huard at gmail.com (David Huard) Date: Wed, 6 May 2009 23:59:54 -0400 Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: <4A01E1FC.8090701@jpl.nasa.gov> References: <4A01E1FC.8090701@jpl.nasa.gov> Message-ID: <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> Hi Mathew, You could use Newton's method to optimize for each vi sequentially. If you have an expression for the jacobian, it's even better. What I'd do is write a class with a method f(self, x, y) that records the result of f(x,y) each time it is called. I would then sample very coarsely the x,y space where I guess my solutions are. You can then select the x,y where v1 is maximum as your initial point for Newton's method and iterate until you converge to the solution for v1. Since during the search for the optimum your class stores the computed points, your initial guess for v2 should be a bit better than it was for v1, which should speed up the convergence to the solution for v2, etc. If you have multiple processors available, you can scatter function evaluation among them using ipython. It's easier than it looks. Hope someone comes up with a nicer solution, David On Wed, May 6, 2009 at 3:16 PM, Mathew Yeates wrote: > I have a function f(x,y) which produces N values [v1,v2,v3 .... vN] > where some of the values are None (only found after evaluation) > > each evaluation of "f" is expensive and N is large. > I want N x,y pairs which produce the optimal value in each column. > > A brute force approach would be to generate > [v11,v12,v13,v14 ....] > [v21,v22,v23 ....... 
] > etc > > then locate the maximum of each column. > This is far too slow ......Any other ideas? > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Thu May 7 04:06:41 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Thu, 7 May 2009 10:06:41 +0200 Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> References: <4A01E1FC.8090701@jpl.nasa.gov> <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> Message-ID: hi mathew, 1) what does it mean if a value is None? I.e., what is larger: None or 3? Then first thing I would do is convert the None to a number. 2) Are your arrays integer arrays or double arrays? It's much easier if they are doubles because then you could use standard methods for NLP problems, as for example Newton's method as suggested above. But of the sound of it you could possibly enumerate over all possible solutions. This may possibly be formulated as a linear mixed integer program. This is also a hard problem, but can be usually solved quite fast nowadays. In the worst case, you might have to use algorithms as genetic algorithms or a stochastic search as e.g. simulated annealing which often do not give good results. 3) It is not clear to me what exactly you are trying to maximize. As far as I understand you actually have N optimization problems. This is very unusual! Typically the problem at hand can be formulated as *one* optimization problem. Could you tell us, what exactly your problem is and why you want to solve it? I am pretty sure that there is a much better approach than solving N optimization problems. It is good practice to first find the category of the optimization problem. There are quite a lot of them: linear programs, nonlinear programs, mixed integer linear programs, .... and they can further be distinguished by the number of constraints, type of constraints, if the objective function is convex, etc... Once you have identified all that for your given problem, you can start looking for a standard solver that can solve your problem. On Thu, May 7, 2009 at 5:59 AM, David Huard wrote: > Hi Mathew, > > You could use Newton's method to optimize for each vi sequentially. If you > have an expression for the jacobian, it's even better. > > What I'd do is write a class with a method f(self, x, y) that records the > result of f(x,y) each time it is called. I would then sample very coarsely > the x,y space where I guess my solutions are. You can then select the x,y > where v1 is maximum as your initial point for Newton's method and iterate > until you converge to the solution for v1. Since during the search for the > optimum your class stores the computed points, your initial guess for v2 > should be a bit better than it was for v1, which should speed up the > convergence to the solution for v2, etc. > > If you have multiple processors available, you can scatter function > evaluation among them using ipython. It's easier than it looks. > > Hope someone comes up with a nicer solution, > > David > > On Wed, May 6, 2009 at 3:16 PM, Mathew Yeates wrote: >> >> I have a function f(x,y) which produces N values [v1,v2,v3 .... 
vN] >> where some of the values are None (only found after evaluation) >> >> each evaluation of "f" is expensive and N is large. >> I want N x,y pairs which produce the optimal value in each column. >> >> A brute force approach would be to generate >> [v11,v12,v13,v14 ....] >> [v21,v22,v23 ....... ] >> etc >> >> then locate the maximum of each column. >> This is far too slow ......Any other ideas? >> >> >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From stefan at sun.ac.za Thu May 7 07:11:28 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 7 May 2009 13:11:28 +0200 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> Message-ID: <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> 2009/5/7 Chris Colbert : > This was really my first attempt at doing anything constructive with Cython. > It was actually unbelievably easy to work with. I think i spent less time > working on this, than I did trying to find an optimized solution using pure > numpy and python. One aspect we often overlook is how easy it is to write a for-loop in comparison to vectorisation. Besides, for-loops are sometimes easier to read as well! I think the Cython guys are planning some sort of templating, but I'll CC Dag so that he can tell us more. Regards St?fan From dagss at student.matnat.uio.no Thu May 7 07:32:03 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 May 2009 13:32:03 +0200 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> Message-ID: <4A02C6B3.3030404@student.matnat.uio.no> St?fan van der Walt wrote: > 2009/5/7 Chris Colbert : >> This was really my first attempt at doing anything constructive with Cython. >> It was actually unbelievably easy to work with. 
I think i spent less time >> working on this, than I did trying to find an optimized solution using pure >> numpy and python. > > One aspect we often overlook is how easy it is to write a for-loop in > comparison to vectorisation. Besides, for-loops are sometimes easier > to read as well! > > I think the Cython guys are planning some sort of templating, but I'll > CC Dag so that he can tell us more. We were discussing how it would/should look like, but noone's committed to implementing it so it's pretty much up in the blue I think -- someone might jump in and do it next week, or it might go another year, I can't tell. While I'm here, also note in that code Chris wrote that you want to pay attention to the change of default division semantics on Cython 0.12 (especially for speed). http://wiki.cython.org/enhancements/division -- Dag Sverre From dagss at student.matnat.uio.no Thu May 7 07:35:13 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 07 May 2009 13:35:13 +0200 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <4A02C6B3.3030404@student.matnat.uio.no> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <91cf711d0905041218p70bb44ct35a601844c8c262b@mail.gmail.com> <7f014ea60905041300y49b48055i6df8d5e598d0fe80@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> <4A02C6B3.3030404@student.matnat.uio.no> Message-ID: <4A02C771.3040105@student.matnat.uio.no> Dag Sverre Seljebotn wrote: > St?fan van der Walt wrote: >> 2009/5/7 Chris Colbert : >>> This was really my first attempt at doing anything constructive with Cython. >>> It was actually unbelievably easy to work with. I think i spent less time >>> working on this, than I did trying to find an optimized solution using pure >>> numpy and python. >> One aspect we often overlook is how easy it is to write a for-loop in >> comparison to vectorisation. Besides, for-loops are sometimes easier >> to read as well! >> >> I think the Cython guys are planning some sort of templating, but I'll >> CC Dag so that he can tell us more. > > We were discussing how it would/should look like, but noone's committed > to implementing it so it's pretty much up in the blue I think -- someone > might jump in and do it next week, or it might go another year, I can't > tell. BTW the consensus pretty much ended on: cdef class MyClass[T](Ancestor): cdef T evaluate(T x): ... And then instantiate with cdef MyClass[int] obj = MyClass[int]() ... Only class templates would be targeted at first. -- Dag Sverre From myeates at jpl.nasa.gov Thu May 7 09:57:00 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Thu, 07 May 2009 06:57:00 -0700 Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> References: <4A01E1FC.8090701@jpl.nasa.gov> <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> Message-ID: <4A02E8AC.7040603@jpl.nasa.gov> David Huard wrote: > Hi Mathew, > > You could use Newton's method to optimize for each vi sequentially. 
If > you have an expression for the jacobian, it's even better. Here's the problem. Every time f is evaluated, it returns a set of values. (a row in the matrix) But if we are trying to find the minimum of the first column, we only care about the first value in the set. This is really N optimization. problems I want to perform simultaneously. Find N (x,y) values where x1,y1 minimizes f in the first column, x2,y2 minimizes f in the second column, etc. And ... doing this a column at a time is too slow (I just did a quick calculation and my brute force method is going to take 30 days!) > > What I'd do is write a class with a method f(self, x, y) that records > the result of f(x,y) each time it is called. I would then sample very > coarsely the x,y space where I guess my solutions are. You can then > select the x,y where v1 is maximum as your initial point for Newton's > method and iterate until you converge to the solution for v1. Since > during the search for the optimum your class stores the computed > points, your initial guess for v2 should be a bit better than it was > for v1, which should speed up the convergence to the solution for v2, > etc. > > If you have multiple processors available, you can scatter function > evaluation among them using ipython. It's easier than it looks. > > Hope someone comes up with a nicer solution, > > David > > On Wed, May 6, 2009 at 3:16 PM, Mathew Yeates > wrote: > > I have a function f(x,y) which produces N values [v1,v2,v3 .... vN] > where some of the values are None (only found after evaluation) > > each evaluation of "f" is expensive and N is large. > I want N x,y pairs which produce the optimal value in each column. > > A brute force approach would be to generate > [v11,v12,v13,v14 ....] > [v21,v22,v23 ....... ] > etc > > then locate the maximum of each column. > This is far too slow ......Any other ideas? > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From myeates at jpl.nasa.gov Thu May 7 10:05:04 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Thu, 07 May 2009 07:05:04 -0700 Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: References: <4A01E1FC.8090701@jpl.nasa.gov> <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> Message-ID: <4A02EA90.30009@jpl.nasa.gov> Sebastian Walter wrote: > N optimization problems. This is very unusual! Typically the problem > at hand can be formulated as *one* optimization problem. > > yes, this is really not so much an optimization problem as it is a vectorization problem. I am trying to avoid 1) Evaluate f over and over and find the maximum in the first column. Store solution 1. 2) Evaluate f over and over and find the max in the second column. Store solution 2. Rinse, Repeat From kbasye1 at jhu.edu Thu May 7 11:35:48 2009 From: kbasye1 at jhu.edu (Ken Basye) Date: Thu, 7 May 2009 08:35:48 -0700 (PDT) Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: <4A02EA90.30009@jpl.nasa.gov> References: <4A01E1FC.8090701@jpl.nasa.gov> <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> <4A02EA90.30009@jpl.nasa.gov> Message-ID: <23428578.post@talk.nabble.com> Hi Mathew, Here are some things to think about: First, is there a way to decompose 'f' so that it computes only one or a subset of K values, but in 1/N ( K/N) time? If so, you can decompose your problem into N single optimizations. Presumably not, but I think it's worth asking. 
Second, what method would you use if you were only trying to solve the problem for one column? I'm thinking about a heuristic solution involving caching, which is close to what an earlier poster suggested. The idea is to cache complete (length N) results for each call you make. Whenever you need to compute f(x,y), consult the cache to see if there's a result for any point within D of x,y (look up "nearest neighbor search"). Here D is a configurable parameter which will trade off the accuracy of your optimization against time. If there is, use the cached value instead of calling f. Now you just do the "rinse-repeat" algorithm, but it should get progressively faster (per column) as you get more and more cache hits. Possible augmentations: 1) Within the run for a given column, adjust D downward as the optimization progresses so you don't reach a "fixed-point" to early. Trades time for optimization accuracy. 2) When finished, the cache should have "good" values for each column which were found on the pass for that column, but there's no reason not to scan the entire cache one last time to see if a later pass stumbled on a better value for an earlier column. 3) Iterate the entire procedure, using each iteration to seed the starting locations for the next - might be useful if your function has many local minima in some of the N output dimensions. Mathew Yeates wrote: > > Sebastian Walter wrote: >> N optimization problems. This is very unusual! Typically the problem >> at hand can be formulated as *one* optimization problem. >> >> > yes, this is really not so much an optimization problem as it is a > vectorization problem. > I am trying to avoid > 1) Evaluate f over and over and find the maximum in the first column. > Store solution 1. > 2) Evaluate f over and over and find the max in the second column. Store > solution 2. > Rinse, Repeat > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- View this message in context: http://www.nabble.com/hairy-optimization-problem-tp23413559p23428578.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From myeates at jpl.nasa.gov Thu May 7 12:01:35 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Thu, 07 May 2009 09:01:35 -0700 Subject: [Numpy-discussion] hairy optimization problem In-Reply-To: <23428578.post@talk.nabble.com> References: <4A01E1FC.8090701@jpl.nasa.gov> <91cf711d0905062059i57bf84ecxcacbca3f4b1e30f0@mail.gmail.com> <4A02EA90.30009@jpl.nasa.gov> <23428578.post@talk.nabble.com> Message-ID: <4A0305DF.3010201@jpl.nasa.gov> Thanks Ken, I was actually thinking about using caching while on my way into work. Might work. Beats the heck out of using brute force. One other question (maybe I should ask in another thread) what is the canonical method for dealing with missing values? Suppose f(x,y) returns None for some (x,y) pairs (unknown until evaluation). I don't like the idea of setting the return to some small value as this may create local maxima in the solution space. Mathew Ken Basye wrote: > Hi Mathew, > Here are some things to think about: First, is there a way to decompose > 'f' so that it computes only one or a subset of K values, but in 1/N ( K/N) > time? If so, you can decompose your problem into N single optimizations. > Presumably not, but I think it's worth asking. Second, what method would > you use > if you were only trying to solve the problem for one column? 
> I'm thinking about a heuristic solution involving caching, which is close > to what an earlier poster suggested. The idea is to cache complete (length > N) results for each call you make. Whenever you need to compute f(x,y), > consult the cache to see if there's a result for any point within D of x,y > (look up "nearest neighbor search"). Here D is a configurable parameter > which will trade off the accuracy of your optimization against time. If > there is, use the cached value instead of calling f. Now you just do the > "rinse-repeat" algorithm, but it should get progressively faster (per > column) as you get more and more cache hits. > Possible augmentations: 1) Within the run for a given column, adjust D > downward as the optimization progresses so you don't reach a "fixed-point" > to early. Trades time for optimization accuracy. 2) When finished, the > cache should have "good" values for each column which were found on the pass > for that column, but there's no reason not to scan the entire cache one last > time to see if a later pass stumbled on a better value for an earlier > column. 3) Iterate the entire procedure, using each iteration to seed the > starting locations for the next - might be useful if your function has many > local minima in some of the N output dimensions. > > > > Mathew Yeates wrote: > >> Sebastian Walter wrote: >> >>> N optimization problems. This is very unusual! Typically the problem >>> at hand can be formulated as *one* optimization problem. >>> >>> >>> >> yes, this is really not so much an optimization problem as it is a >> vectorization problem. >> I am trying to avoid >> 1) Evaluate f over and over and find the maximum in the first column. >> Store solution 1. >> 2) Evaluate f over and over and find the max in the second column. Store >> solution 2. >> Rinse, Repeat >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > > From myeates at jpl.nasa.gov Thu May 7 12:16:02 2009 From: myeates at jpl.nasa.gov (Mathew Yeates) Date: Thu, 07 May 2009 09:16:02 -0700 Subject: [Numpy-discussion] optimization when there are mssing values Message-ID: <4A030942.1090405@jpl.nasa.gov> What is the canonical method for dealing with missing values? Suppose f(x,y) returns None for some (x,y) pairs (unknown until evaluation). I don't like the idea of setting the return to some small value as this may create local maxima in the solution space. So any of the scipy packages deal with this? Mathew From sccolbert at gmail.com Thu May 7 12:39:38 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 12:39:38 -0400 Subject: [Numpy-discussion] element wise help Message-ID: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> suppose i have two arrays: n and t, both are 1-D arrays. for each value in t, I need to use it to perform an element wise scalar operation on every value in n and then sum the results into a single scalar to be stored in the output array. Is there any way to do this without the for loop like below: for val in t_array: out = (n / val).sum() # not the actual function being done, but you get the idea Thanks, Chris -------------- next part -------------- An HTML attachment was scrubbed... 
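Back on the optimization thread, a bare-bones sketch of the nearest-neighbour cache Ken describes; the names are made up, and a real version would use a proper spatial index rather than a linear scan over the stored points:

import numpy as np

class CachedF(object):
    """Reuse a stored length-N result when an earlier evaluation lies within D of (x, y)."""
    def __init__(self, f, D):
        self.f = f                  # the expensive function returning a length-N array
        self.D = D                  # cache radius, trades accuracy against speed
        self.points = []            # (x, y) pairs already evaluated
        self.values = []            # corresponding length-N results
    def __call__(self, x, y):
        if self.points:
            pts = np.asarray(self.points)
            d2 = (pts[:, 0] - x) ** 2 + (pts[:, 1] - y) ** 2
            i = d2.argmin()
            if d2[i] <= self.D ** 2:
                return self.values[i]          # cache hit: skip the expensive call
        v = self.f(x, y)
        self.points.append((x, y))
        self.values.append(v)
        return v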
URL: From josef.pktd at gmail.com Thu May 7 12:56:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 May 2009 12:56:04 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> Message-ID: <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> On Thu, May 7, 2009 at 12:39 PM, Chris Colbert wrote: > suppose i have two arrays:? n and t, both are 1-D arrays. > > for each value in t, I need to use it to perform an element wise scalar > operation on every value in n and then sum the results into a single scalar > to be stored in the output array. > > Is there any way to do this without the for loop like below: > > for val in t_array: > > ????????? out = (n / val).sum()? # not the actual function being done, but > you get the idea > broad casting should work, e.g. (n[:,np.newaxis] / val[np.newaxis,:]).sum() but it constructs the full product array, which is memory intensive for a reduce operation, if the 1d arrays are large. another candidate for a cython loop if the arrays are large? Josef From sccolbert at gmail.com Thu May 7 13:04:46 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 13:04:46 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> Message-ID: <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> unfortunately, the actual function being processes is not so simple, and involves evaluating user functions input from the prompt as strings. So i have no idea how to do it in Cython. Let me look into this broadcasting. Thanks Josef! On Thu, May 7, 2009 at 12:56 PM, wrote: > On Thu, May 7, 2009 at 12:39 PM, Chris Colbert > wrote: > > suppose i have two arrays: n and t, both are 1-D arrays. > > > > for each value in t, I need to use it to perform an element wise scalar > > operation on every value in n and then sum the results into a single > scalar > > to be stored in the output array. > > > > Is there any way to do this without the for loop like below: > > > > for val in t_array: > > > > out = (n / val).sum() # not the actual function being done, > but > > you get the idea > > > > > broad casting should work, e.g. > > (n[:,np.newaxis] / val[np.newaxis,:]).sum() > > but it constructs the full product array, which is memory intensive > for a reduce operation, if the 1d arrays are large. > > another candidate for a cython loop if the arrays are large? > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
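A small self-contained version of the broadcasting idea above, keeping one summed value per element of t by reducing over the n axis only:

import numpy as np

n = np.arange(1.0, 6.0)                  # shape (5,)
t = np.array([0.5, 1.0, 2.0])            # shape (3,)

out = (n[:, np.newaxis] / t[np.newaxis, :]).sum(axis=0)    # shape (3,): one scalar per t
assert np.allclose(out, [(n / val).sum() for val in t])    # matches the original loop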
URL: From sccolbert at gmail.com Thu May 7 13:08:43 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 13:08:43 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> Message-ID: <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> let me just post my code: t is the time array and n is also an array. For every value of time t, these operations are performed on the entire array n. Then, n is summed to a scalar which represents the system response at time t. I would like to eliminate this for loop if possible. Chris #### code #### b = 4.7 f = [] n = arange(1, N+1, 1) for t in timearray: arg1 = {'S': ((b/t) + (1J*n*pi/t))} exec('from numpy import *', arg1) tempval = eval(transform, arg1)*((-1)**n) rsum = tempval.real.sum() arg2 = {'S': b/t} exec('from numpy import *', arg2) tempval2 = eval(transform, arg2)*0.5 fval = (exp(b) / t) * (tempval2 + rsum) f.append(fval) #### /code ##### On Thu, May 7, 2009 at 1:04 PM, Chris Colbert wrote: > unfortunately, the actual function being processes is not so simple, and > involves evaluating user functions input from the prompt as strings. So i > have no idea how to do it in Cython. > > Let me look into this broadcasting. > > Thanks Josef! > > > On Thu, May 7, 2009 at 12:56 PM, wrote: > >> On Thu, May 7, 2009 at 12:39 PM, Chris Colbert >> wrote: >> > suppose i have two arrays: n and t, both are 1-D arrays. >> > >> > for each value in t, I need to use it to perform an element wise scalar >> > operation on every value in n and then sum the results into a single >> scalar >> > to be stored in the output array. >> > >> > Is there any way to do this without the for loop like below: >> > >> > for val in t_array: >> > >> > out = (n / val).sum() # not the actual function being done, >> but >> > you get the idea >> > >> >> >> broad casting should work, e.g. >> >> (n[:,np.newaxis] / val[np.newaxis,:]).sum() >> >> but it constructs the full product array, which is memory intensive >> for a reduce operation, if the 1d arrays are large. >> >> another candidate for a cython loop if the arrays are large? >> >> Josef >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 7 13:37:40 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 May 2009 13:37:40 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> Message-ID: <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> On Thu, May 7, 2009 at 1:08 PM, Chris Colbert wrote: > let me just post my code: > > t is the time array and n is also an array. > > For every value of time t, these operations are performed on the entire > array n. Then, n is summed to a scalar which represents the system response > at time t. 
> > I would like to eliminate this for loop if possible. > > Chris > > #### code #### > > b = 4.7 > f = [] > n = arange(1, N+1, 1) > > for t in timearray: > ??????? arg1 = {'S': ((b/t) + (1J*n*pi/t))} > ??????? exec('from numpy import *', arg1) > ??????? tempval = eval(transform, arg1)*((-1)**n) > ??????? rsum = tempval.real.sum() > ??????? arg2 = {'S': b/t} > ??????? exec('from numpy import *', arg2) > ??????? tempval2 = eval(transform, arg2)*0.5 > ??????? fval = (exp(b) / t) * (tempval2 + rsum) > ??????? f.append(fval) > > > #### /code ##### > I don't understand what the exec statements are doing, I never use it. what is transform? Can you use regular functions instead or is there a special reason for the exec and eval? In these expressions ((b/t) + (1J*n*pi/t)), (exp(b) / t) broadcasting can be used. Whats the size of t and n? Josef From sccolbert at gmail.com Thu May 7 13:41:31 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 13:41:31 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> Message-ID: <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> its part of a larger program for designing PID controllers. This particular function numerical calculates the inverse laplace transform using riemann sums. The exec statements, from what i gather, allow the follow eval statement to be executed in the scope of numpy and its functions. I don't get how it works either, but it doesnt work without it. I've just about got something working using broadcasting and will post it soon. chris On Thu, May 7, 2009 at 1:37 PM, wrote: > On Thu, May 7, 2009 at 1:08 PM, Chris Colbert wrote: > > let me just post my code: > > > > t is the time array and n is also an array. > > > > For every value of time t, these operations are performed on the entire > > array n. Then, n is summed to a scalar which represents the system > response > > at time t. > > > > I would like to eliminate this for loop if possible. > > > > Chris > > > > #### code #### > > > > b = 4.7 > > f = [] > > n = arange(1, N+1, 1) > > > > for t in timearray: > > arg1 = {'S': ((b/t) + (1J*n*pi/t))} > > exec('from numpy import *', arg1) > > tempval = eval(transform, arg1)*((-1)**n) > > rsum = tempval.real.sum() > > arg2 = {'S': b/t} > > exec('from numpy import *', arg2) > > tempval2 = eval(transform, arg2)*0.5 > > fval = (exp(b) / t) * (tempval2 + rsum) > > f.append(fval) > > > > > > #### /code ##### > > > > I don't understand what the exec statements are doing, I never use it. > what is transform? > Can you use regular functions instead or is there a special reason for > the exec and eval? > > In these expressions ((b/t) + (1J*n*pi/t)), (exp(b) / t) > broadcasting can be used. > > Whats the size of t and n? > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... 
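A stripped-down illustration of the exec/eval pattern described above; the transform string is made up for the example. The exec call fills the dict with NumPy's names, so the later eval resolves both S and functions such as exp from that same namespace:

import numpy as np

transform = '1.0 / (S**2 + 3*S + 2)'                  # hypothetical user-typed expression

namespace = {'S': np.array([0.5 + 1.0j, 2.0 + 0.0j])}
exec('from numpy import *', namespace)                # injects exp, sin, pi, ... into the dict
values = eval(transform, namespace)                   # evaluated with S and the numpy names in scope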
URL: From sccolbert at gmail.com Thu May 7 14:11:08 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 14:11:08 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> Message-ID: <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> alright I got it working. Thanks! This version is an astonishingly 1900x faster than my original implementation which had two for loops. Both versions are below: thanks again! ### new fast code #### b = 4.7 n = arange(1, N+1, 1.0).reshape(N, -1) n1 = (-1)**n prefix = exp(b) / timearray arg1 = {'S': b / timearray} exec('from numpy import *', arg1) term1 = (0.5) * eval(transform, arg1) temp1 = b + (1J * pi * n) temp2 = temp1 / timearray arg2 = {'S': temp2} exec('from numpy import *', arg2) term2 = (eval(transform, arg2) * n1).sum(axis=0).real f = prefix * (term1 + term2) return f ##### old slow code ###### b = 4.7 f = [] for t in timearray: rsum = 0.0 for n in range(1, N+1): arg1 = {'S': ((b/t) + (1J*n*pi/t))} exec('from numpy import *', arg1) tempval = eval(transform, arg1)*((-1)**n) rsum = rsum + tempval.real arg2 = {'S': b/t} exec('from numpy import *', arg2) tempval2 = eval(transform, arg2)*0.5 fval = (exp(b) / t) * (tempval2 + rsum) f.append(fval) return f On Thu, May 7, 2009 at 1:41 PM, Chris Colbert wrote: > its part of a larger program for designing PID controllers. This particular > function numerical calculates the inverse laplace transform using riemann > sums. > > The exec statements, from what i gather, allow the follow eval statement to > be executed in the scope of numpy and its functions. I don't get how it > works either, but it doesnt work without it. > > I've just about got something working using broadcasting and will post it > soon. > > chris > > > On Thu, May 7, 2009 at 1:37 PM, wrote: > >> On Thu, May 7, 2009 at 1:08 PM, Chris Colbert >> wrote: >> > let me just post my code: >> > >> > t is the time array and n is also an array. >> > >> > For every value of time t, these operations are performed on the entire >> > array n. Then, n is summed to a scalar which represents the system >> response >> > at time t. >> > >> > I would like to eliminate this for loop if possible. >> > >> > Chris >> > >> > #### code #### >> > >> > b = 4.7 >> > f = [] >> > n = arange(1, N+1, 1) >> > >> > for t in timearray: >> > arg1 = {'S': ((b/t) + (1J*n*pi/t))} >> > exec('from numpy import *', arg1) >> > tempval = eval(transform, arg1)*((-1)**n) >> > rsum = tempval.real.sum() >> > arg2 = {'S': b/t} >> > exec('from numpy import *', arg2) >> > tempval2 = eval(transform, arg2)*0.5 >> > fval = (exp(b) / t) * (tempval2 + rsum) >> > f.append(fval) >> > >> > >> > #### /code ##### >> > >> >> I don't understand what the exec statements are doing, I never use it. >> what is transform? >> Can you use regular functions instead or is there a special reason for >> the exec and eval? >> >> In these expressions ((b/t) + (1J*n*pi/t)), (exp(b) / t) >> broadcasting can be used. >> >> Whats the size of t and n? 
>> >> Josef >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Thu May 7 14:14:16 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Thu, 07 May 2009 13:14:16 -0500 Subject: [Numpy-discussion] FYI: Numpy and Unladen Swallow Message-ID: <4A0324F8.80801@gmail.com> Hi, LWN.net had a article on the development of Unladen Swallow that aims to speed up CPython. http://lwn.net/SubscriberLink/332038/675304c610f0e34a/ The project link is: http://code.google.com/p/unladen-swallow/ At least on my Linux system, Numpy does run without any test failures. It appears slightly faster than my distros Python2.5 (the installation overwrote my local copy of Python2.6 that I built myself so I can not say if is faster than that). Bruce From josef.pktd at gmail.com Thu May 7 14:45:36 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 May 2009 14:45:36 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> Message-ID: <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> On Thu, May 7, 2009 at 2:11 PM, Chris Colbert wrote: > alright I got it working. Thanks! > > This version is an astonishingly 1900x faster than my original > implementation which had two for loops. Both versions are below: > > thanks again! > > ### new fast code #### > > ??? b = 4.7 > ??? n = arange(1, N+1, 1.0).reshape(N, -1) > ??? n1 = (-1)**n > ??? prefix = exp(b) / timearray > > ??? arg1 = {'S': b / timearray} > ??? exec('from numpy import *', arg1) > ??? term1 = (0.5) * eval(transform, arg1) > > ??? temp1 = b + (1J * pi * n) > ??? temp2 = temp1 / timearray > ??? arg2 = {'S': temp2} > ??? exec('from numpy import *', arg2) > ??? term2 = (eval(transform, arg2) * n1).sum(axis=0).real > > ??? f = prefix * (term1 + term2) > > ??? return f If you don't do code generation and have control over transform, then, I think, it would be more readable to replace the exec and eval by a function call to transform. I haven't found a case yet where eval is necessary, except for code generation as in sympy. Josef > > ##### old slow code ###### > ??? b = 4.7 > ??? f = [] > > ??? for t in timearray: > ??????? rsum = 0.0 > ??????? for n in range(1, N+1): > ??????????? arg1 = {'S': ((b/t) + (1J*n*pi/t))} > ??????????? exec('from numpy import *', arg1) > ??????????? tempval = eval(transform, arg1)*((-1)**n) > ??????????? rsum = rsum + tempval.real > ??????? arg2 = {'S': b/t} > ??????? exec('from numpy import *', arg2) > ??????? tempval2 = eval(transform, arg2)*0.5 > ??????? fval = (exp(b) / t) * (tempval2 + rsum) > ??????? f.append(fval) > > ?? return f > > > > > On Thu, May 7, 2009 at 1:41 PM, Chris Colbert wrote: >> >> its part of a larger program for designing PID controllers. This >> particular function numerical calculates the inverse laplace transform using >> riemann sums. 
>> >> The exec statements, from what i gather, allow the follow eval statement >> to be executed in the scope of numpy and its functions. I don't get how it >> works either, but it doesnt work without it. >> >> I've just about got something working using broadcasting and will post it >> soon. >> >> chris >> >> On Thu, May 7, 2009 at 1:37 PM, wrote: >>> >>> On Thu, May 7, 2009 at 1:08 PM, Chris Colbert >>> wrote: >>> > let me just post my code: >>> > >>> > t is the time array and n is also an array. >>> > >>> > For every value of time t, these operations are performed on the entire >>> > array n. Then, n is summed to a scalar which represents the system >>> > response >>> > at time t. >>> > >>> > I would like to eliminate this for loop if possible. >>> > >>> > Chris >>> > >>> > #### code #### >>> > >>> > b = 4.7 >>> > f = [] >>> > n = arange(1, N+1, 1) >>> > >>> > for t in timearray: >>> > ??????? arg1 = {'S': ((b/t) + (1J*n*pi/t))} >>> > ??????? exec('from numpy import *', arg1) >>> > ??????? tempval = eval(transform, arg1)*((-1)**n) >>> > ??????? rsum = tempval.real.sum() >>> > ??????? arg2 = {'S': b/t} >>> > ??????? exec('from numpy import *', arg2) >>> > ??????? tempval2 = eval(transform, arg2)*0.5 >>> > ??????? fval = (exp(b) / t) * (tempval2 + rsum) >>> > ??????? f.append(fval) >>> > >>> > >>> > #### /code ##### >>> > >>> >>> I don't understand what the exec statements are doing, I never use it. >>> what is transform? >>> Can you use regular functions instead or is there a special reason for >>> the exec and eval? >>> >>> In these expressions ((b/t) + (1J*n*pi/t)), ?(exp(b) / t) >>> broadcasting can be used. >>> >>> Whats the size of t and n? >>> >>> Josef >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From sccolbert at gmail.com Thu May 7 15:10:23 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 15:10:23 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> Message-ID: <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> the user of the program inputs the transform in a text field. So I have no way of know the function apriori. that doesn't mean I still couldn't throw the exec and eval commands into another function just to clean things up. Chris On Thu, May 7, 2009 at 2:45 PM, wrote: > On Thu, May 7, 2009 at 2:11 PM, Chris Colbert wrote: > > alright I got it working. Thanks! > > > > This version is an astonishingly 1900x faster than my original > > implementation which had two for loops. Both versions are below: > > > > thanks again! 
> > > > ### new fast code #### > > > > b = 4.7 > > n = arange(1, N+1, 1.0).reshape(N, -1) > > n1 = (-1)**n > > prefix = exp(b) / timearray > > > > arg1 = {'S': b / timearray} > > exec('from numpy import *', arg1) > > term1 = (0.5) * eval(transform, arg1) > > > > temp1 = b + (1J * pi * n) > > temp2 = temp1 / timearray > > arg2 = {'S': temp2} > > exec('from numpy import *', arg2) > > term2 = (eval(transform, arg2) * n1).sum(axis=0).real > > > > f = prefix * (term1 + term2) > > > > return f > > If you don't do code generation and have control over transform, then, > I think, it would be more readable to replace the exec and eval by a > function call to transform. > > I haven't found a case yet where eval is necessary, except for code > generation as in sympy. > > Josef > > > > > > > ##### old slow code ###### > > b = 4.7 > > f = [] > > > > for t in timearray: > > rsum = 0.0 > > for n in range(1, N+1): > > arg1 = {'S': ((b/t) + (1J*n*pi/t))} > > exec('from numpy import *', arg1) > > tempval = eval(transform, arg1)*((-1)**n) > > rsum = rsum + tempval.real > > arg2 = {'S': b/t} > > exec('from numpy import *', arg2) > > tempval2 = eval(transform, arg2)*0.5 > > fval = (exp(b) / t) * (tempval2 + rsum) > > f.append(fval) > > > > return f > > > > > > > > > > On Thu, May 7, 2009 at 1:41 PM, Chris Colbert > wrote: > >> > >> its part of a larger program for designing PID controllers. This > >> particular function numerical calculates the inverse laplace transform > using > >> riemann sums. > >> > >> The exec statements, from what i gather, allow the follow eval statement > >> to be executed in the scope of numpy and its functions. I don't get how > it > >> works either, but it doesnt work without it. > >> > >> I've just about got something working using broadcasting and will post > it > >> soon. > >> > >> chris > >> > >> On Thu, May 7, 2009 at 1:37 PM, wrote: > >>> > >>> On Thu, May 7, 2009 at 1:08 PM, Chris Colbert > >>> wrote: > >>> > let me just post my code: > >>> > > >>> > t is the time array and n is also an array. > >>> > > >>> > For every value of time t, these operations are performed on the > entire > >>> > array n. Then, n is summed to a scalar which represents the system > >>> > response > >>> > at time t. > >>> > > >>> > I would like to eliminate this for loop if possible. > >>> > > >>> > Chris > >>> > > >>> > #### code #### > >>> > > >>> > b = 4.7 > >>> > f = [] > >>> > n = arange(1, N+1, 1) > >>> > > >>> > for t in timearray: > >>> > arg1 = {'S': ((b/t) + (1J*n*pi/t))} > >>> > exec('from numpy import *', arg1) > >>> > tempval = eval(transform, arg1)*((-1)**n) > >>> > rsum = tempval.real.sum() > >>> > arg2 = {'S': b/t} > >>> > exec('from numpy import *', arg2) > >>> > tempval2 = eval(transform, arg2)*0.5 > >>> > fval = (exp(b) / t) * (tempval2 + rsum) > >>> > f.append(fval) > >>> > > >>> > > >>> > #### /code ##### > >>> > > >>> > >>> I don't understand what the exec statements are doing, I never use it. > >>> what is transform? > >>> Can you use regular functions instead or is there a special reason for > >>> the exec and eval? > >>> > >>> In these expressions ((b/t) + (1J*n*pi/t)), (exp(b) / t) > >>> broadcasting can be used. > >>> > >>> Whats the size of t and n? 
> >>> > >>> Josef > >>> _______________________________________________ > >>> Numpy-discussion mailing list > >>> Numpy-discussion at scipy.org > >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion > >> > > > > > > _______________________________________________ > > Numpy-discussion mailing list > > Numpy-discussion at scipy.org > > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Thu May 7 15:39:43 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 May 2009 15:39:43 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> Message-ID: <1cd32cbb0905071239t264ced31m63ff38d4876f9fe6@mail.gmail.com> On Thu, May 7, 2009 at 3:10 PM, Chris Colbert wrote: > the user of the program inputs the transform in a text field. So I have no > way of know the function apriori. > > that doesn't mean I still couldn't throw the exec and eval commands into > another function just to clean things up. > > Chris No, I think this is ok then, it is similar to what sympy.lambdify does. Now you call exec only twice, which might have been the main slowdown in the loop version. In this case, users need to write vectorized transform functions that handle the full n x t array. Josef > > On Thu, May 7, 2009 at 2:45 PM, wrote: >> >> On Thu, May 7, 2009 at 2:11 PM, Chris Colbert wrote: >> > alright I got it working. Thanks! >> > >> > This version is an astonishingly 1900x faster than my original >> > implementation which had two for loops. Both versions are below: >> > >> > thanks again! >> > >> > ### new fast code #### >> > >> > ??? b = 4.7 >> > ??? n = arange(1, N+1, 1.0).reshape(N, -1) >> > ??? n1 = (-1)**n >> > ??? prefix = exp(b) / timearray >> > >> > ??? arg1 = {'S': b / timearray} >> > ??? exec('from numpy import *', arg1) >> > ??? term1 = (0.5) * eval(transform, arg1) >> > >> > ??? temp1 = b + (1J * pi * n) >> > ??? temp2 = temp1 / timearray >> > ??? arg2 = {'S': temp2} >> > ??? exec('from numpy import *', arg2) >> > ??? term2 = (eval(transform, arg2) * n1).sum(axis=0).real >> > >> > ??? f = prefix * (term1 + term2) >> > >> > ??? return f >> >> If you don't do code generation and have control over transform, then, >> I think, it would be more readable to replace the exec and eval by a >> function call to transform. >> >> I haven't found a case yet where eval is necessary, except for code >> generation as in sympy. >> >> Josef >> >> >> >> > >> > ##### old slow code ###### >> > ??? b = 4.7 >> > ??? f = [] >> > >> > ??? for t in timearray: >> > ??????? rsum = 0.0 >> > ??????? for n in range(1, N+1): >> > ??????????? arg1 = {'S': ((b/t) + (1J*n*pi/t))} >> > ??????????? 
exec('from numpy import *', arg1) >> > ??????????? tempval = eval(transform, arg1)*((-1)**n) >> > ??????????? rsum = rsum + tempval.real >> > ??????? arg2 = {'S': b/t} >> > ??????? exec('from numpy import *', arg2) >> > ??????? tempval2 = eval(transform, arg2)*0.5 >> > ??????? fval = (exp(b) / t) * (tempval2 + rsum) >> > ??????? f.append(fval) >> > >> > ?? return f >> > >> > >> > >> > >> > On Thu, May 7, 2009 at 1:41 PM, Chris Colbert >> > wrote: >> >> >> >> its part of a larger program for designing PID controllers. This >> >> particular function numerical calculates the inverse laplace transform >> >> using >> >> riemann sums. >> >> >> >> The exec statements, from what i gather, allow the follow eval >> >> statement >> >> to be executed in the scope of numpy and its functions. I don't get how >> >> it >> >> works either, but it doesnt work without it. >> >> >> >> I've just about got something working using broadcasting and will post >> >> it >> >> soon. >> >> >> >> chris >> >> >> >> On Thu, May 7, 2009 at 1:37 PM, wrote: >> >>> >> >>> On Thu, May 7, 2009 at 1:08 PM, Chris Colbert >> >>> wrote: >> >>> > let me just post my code: >> >>> > >> >>> > t is the time array and n is also an array. >> >>> > >> >>> > For every value of time t, these operations are performed on the >> >>> > entire >> >>> > array n. Then, n is summed to a scalar which represents the system >> >>> > response >> >>> > at time t. >> >>> > >> >>> > I would like to eliminate this for loop if possible. >> >>> > >> >>> > Chris >> >>> > >> >>> > #### code #### >> >>> > >> >>> > b = 4.7 >> >>> > f = [] >> >>> > n = arange(1, N+1, 1) >> >>> > >> >>> > for t in timearray: >> >>> > ??????? arg1 = {'S': ((b/t) + (1J*n*pi/t))} >> >>> > ??????? exec('from numpy import *', arg1) >> >>> > ??????? tempval = eval(transform, arg1)*((-1)**n) >> >>> > ??????? rsum = tempval.real.sum() >> >>> > ??????? arg2 = {'S': b/t} >> >>> > ??????? exec('from numpy import *', arg2) >> >>> > ??????? tempval2 = eval(transform, arg2)*0.5 >> >>> > ??????? fval = (exp(b) / t) * (tempval2 + rsum) >> >>> > ??????? f.append(fval) >> >>> > >> >>> > >> >>> > #### /code ##### >> >>> > >> >>> >> >>> I don't understand what the exec statements are doing, I never use it. >> >>> what is transform? >> >>> Can you use regular functions instead or is there a special reason for >> >>> the exec and eval? >> >>> >> >>> In these expressions ((b/t) + (1J*n*pi/t)), ?(exp(b) / t) >> >>> broadcasting can be used. >> >>> >> >>> Whats the size of t and n? 
>> >>> >> >>> Josef >> >>> _______________________________________________ >> >>> Numpy-discussion mailing list >> >>> Numpy-discussion at scipy.org >> >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> >> > >> > >> > _______________________________________________ >> > Numpy-discussion mailing list >> > Numpy-discussion at scipy.org >> > http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > >> > >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From josef.pktd at gmail.com Thu May 7 16:22:05 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 7 May 2009 16:22:05 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <1cd32cbb0905071239t264ced31m63ff38d4876f9fe6@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <1cd32cbb0905070956r268dc891jc762e119d69858a@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> <1cd32cbb0905071239t264ced31m63ff38d4876f9fe6@mail.gmail.com> Message-ID: <1cd32cbb0905071322o19a52d6sb9c41f9e96511999@mail.gmail.com> On Thu, May 7, 2009 at 3:39 PM, wrote: > On Thu, May 7, 2009 at 3:10 PM, Chris Colbert wrote: >> the user of the program inputs the transform in a text field. So I have no >> way of know the function apriori. >> >> that doesn't mean I still couldn't throw the exec and eval commands into >> another function just to clean things up. >> >> Chris > > No, I think this is ok then, it is similar to what sympy.lambdify > does. Now you call exec only twice, which might have been the main > slowdown in the loop version. > > In this case, users need to write vectorized transform functions that > handle the full n x t array. > > Josef this would be an alternative, which might also work better in a loop, requires numpy.* in local scope >>> transform = 'sqrt(x)' >>> exec('def fun(x): return ' + transform) >>> from numpy import * >>> fun(5) 2.2360679774997898 >>> fun(np.arange(5)) array([ 0. , 1. , 1.41421356, 1.73205081, 2. 
]) >>> >>> fun Josef From sccolbert at gmail.com Thu May 7 16:25:19 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 16:25:19 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <1cd32cbb0905071322o19a52d6sb9c41f9e96511999@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <7f014ea60905071004n41aef796s8667c0a9745370f1@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> <1cd32cbb0905071239t264ced31m63ff38d4876f9fe6@mail.gmail.com> <1cd32cbb0905071322o19a52d6sb9c41f9e96511999@mail.gmail.com> Message-ID: <7f014ea60905071325o6e6b10dxd192c644c2f4c4a6@mail.gmail.com> that's essentially what the eval statement does. On Thu, May 7, 2009 at 4:22 PM, wrote: > On Thu, May 7, 2009 at 3:39 PM, wrote: > > On Thu, May 7, 2009 at 3:10 PM, Chris Colbert > wrote: > >> the user of the program inputs the transform in a text field. So I have > no > >> way of know the function apriori. > >> > >> that doesn't mean I still couldn't throw the exec and eval commands into > >> another function just to clean things up. > >> > >> Chris > > > > No, I think this is ok then, it is similar to what sympy.lambdify > > does. Now you call exec only twice, which might have been the main > > slowdown in the loop version. > > > > In this case, users need to write vectorized transform functions that > > handle the full n x t array. > > > > Josef > > this would be an alternative, which might also work better in a loop, > requires numpy.* in local scope > > >>> transform = 'sqrt(x)' > >>> exec('def fun(x): return ' + transform) > >>> from numpy import * > >>> fun(5) > 2.2360679774997898 > >>> fun(np.arange(5)) > array([ 0. , 1. , 1.41421356, 1.73205081, 2. ]) > >>> > >>> fun > > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Thu May 7 16:31:06 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 May 2009 16:31:06 -0400 Subject: [Numpy-discussion] element wise help In-Reply-To: <7f014ea60905071325o6e6b10dxd192c644c2f4c4a6@mail.gmail.com> References: <7f014ea60905070939j2c66ecbbu7f7cbb7f0a27e2d6@mail.gmail.com> <7f014ea60905071008r39a6a73eie040dce046fe7d31@mail.gmail.com> <1cd32cbb0905071037h5c9e9754tb4b1346eb53a88cc@mail.gmail.com> <7f014ea60905071041n1b000616nca1ea88ab2330a89@mail.gmail.com> <7f014ea60905071111q6e27f7b0s98b2d3e210d8dc79@mail.gmail.com> <1cd32cbb0905071145i79bfe627r226a4592f1b80157@mail.gmail.com> <7f014ea60905071210k3b4926a6pa99e882e871bac25@mail.gmail.com> <1cd32cbb0905071239t264ced31m63ff38d4876f9fe6@mail.gmail.com> <1cd32cbb0905071322o19a52d6sb9c41f9e96511999@mail.gmail.com> <7f014ea60905071325o6e6b10dxd192c644c2f4c4a6@mail.gmail.com> Message-ID: <3d375d730905071331g5a7e23f7mf82cc7a1d687c44b@mail.gmail.com> On Thu, May 7, 2009 at 16:25, Chris Colbert wrote: > that's essentially what the eval statement does. The difference would be performance. Although I wouldn't bet money on the sign of that difference. 
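Whether re-eval'ing the string on every call or exec'ing it once into a def comes out faster is easy to check directly; a rough timing sketch, in which the transform string, the array shape, and the helper names via_eval, via_def and fun are all invented for illustration:

import timeit
import numpy as np

transform = '1.0/(S + 1.0)'   # hypothetical user-entered transform
ns = {}
exec('from numpy import *', ns)

def via_eval(S):
    # evaluate the string on every call, as in the posted code
    ns['S'] = S
    return eval(transform, ns)

# compile the string into a function once, as in the def-based suggestion
exec('def fun(S): return ' + transform, ns)
via_def = ns['fun']

S = (4.7 + 1j * np.pi * np.arange(1, 51)[:, None]) / np.linspace(0.1, 5.0, 200)[None, :]
print(timeit.timeit(lambda: via_eval(S), number=1000))
print(timeit.timeit(lambda: via_def(S), number=1000))

For array arguments of this size most of the time goes into the numpy arithmetic itself, so the two variants usually land close together; only measuring on the real transform and array sizes settles it.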
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Thu May 7 17:35:45 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 May 2009 15:35:45 -0600 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: <4A02C6B3.3030404@student.matnat.uio.no> References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> <4A02C6B3.3030404@student.matnat.uio.no> Message-ID: 2009/5/7 Dag Sverre Seljebotn > St?fan van der Walt wrote: > > 2009/5/7 Chris Colbert : > >> This was really my first attempt at doing anything constructive with > Cython. > >> It was actually unbelievably easy to work with. I think i spent less > time > >> working on this, than I did trying to find an optimized solution using > pure > >> numpy and python. > > > > One aspect we often overlook is how easy it is to write a for-loop in > > comparison to vectorisation. Besides, for-loops are sometimes easier > > to read as well! > > > > I think the Cython guys are planning some sort of templating, but I'll > > CC Dag so that he can tell us more. > > We were discussing how it would/should look like, but noone's committed > to implementing it so it's pretty much up in the blue I think -- someone > might jump in and do it next week, or it might go another year, I can't > tell. > > While I'm here, also note in that code Chris wrote that you want to pay > attention to the change of default division semantics on Cython 0.12 > (especially for speed). > > http://wiki.cython.org/enhancements/division > Hi Dag, Numpy can now do separate compilations with controlled export of symbols when the object files are linked together to make a module. Does Cython have anyway of controlling the visibility of symbols or should we just include the right files in Numpy to get the needed macros? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Thu May 7 18:20:34 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Thu, 7 May 2009 18:20:34 -0400 Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> <4A02C6B3.3030404@student.matnat.uio.no> Message-ID: <7f014ea60905071520g50de3887q2544de2d98cc1493@mail.gmail.com> after looking at it for a while, I don't see a way to easily speed it up using pure numpy. As a matter of fact, the behavior shown below is a little confusing. 
Using fancy indexing, multiples of the same index are interpreted as a single call to that index, probably this a for a reason that I dont currently understand. I would think multiple calls to the same index would cause multiple increments in the example below. For the life of me, I can't think of how to do this 3d histogram in numpy without a for loop. Chris ########## example code ############# >>> a = np.arange(0,8,1).reshape((2,2,2)) >>> a array([[[0, 1], [2, 3]], [[4, 5], [6, 7]]]) >>> indx = np.array([[0,0,0],[0,0,1],[0,1,0],[0,1,1],[1,0,0],[1,0,1],[1,1,0],[1,1,1]]) >>> indx = indx.reshape(4,2,3) >>> a[indx[:,:,0], indx[:,:,1], indx[:,:,2]]+=1 >>> a array([[[2, 3], [4, 5]], [[6, 7], [8, 9]]]) >>> indx2 = np.zeros((4,2,3)).astype(np.uint8) >>> a[indx2[:,:,0], indx2[:,:,1], indx2[:,:,2]]+=1 >>> a array([[[3, 3], [4, 5]], [[6, 7], [8, 9]]]) >>> On Thu, May 7, 2009 at 5:35 PM, Charles R Harris wrote: > > > 2009/5/7 Dag Sverre Seljebotn > >> St?fan van der Walt wrote: >> > 2009/5/7 Chris Colbert : >> >> This was really my first attempt at doing anything constructive with >> Cython. >> >> It was actually unbelievably easy to work with. I think i spent less >> time >> >> working on this, than I did trying to find an optimized solution using >> pure >> >> numpy and python. >> > >> > One aspect we often overlook is how easy it is to write a for-loop in >> > comparison to vectorisation. Besides, for-loops are sometimes easier >> > to read as well! >> > >> > I think the Cython guys are planning some sort of templating, but I'll >> > CC Dag so that he can tell us more. >> >> We were discussing how it would/should look like, but noone's committed >> to implementing it so it's pretty much up in the blue I think -- someone >> might jump in and do it next week, or it might go another year, I can't >> tell. >> >> While I'm here, also note in that code Chris wrote that you want to pay >> attention to the change of default division semantics on Cython 0.12 >> (especially for speed). >> >> http://wiki.cython.org/enhancements/division >> > > Hi Dag, > > Numpy can now do separate compilations with controlled export of symbols > when the object files are linked together to make a module. Does Cython have > anyway of controlling the visibility of symbols or should we just include > the right files in Numpy to get the needed macros? > > Chuck > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From brennan.williams at visualreservoir.com Thu May 7 19:36:09 2009 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 08 May 2009 11:36:09 +1200 Subject: [Numpy-discussion] replacing Nan's in a string array converted from a float array Message-ID: <4A037069.2030204@visualreservoir.com> I've created an array of strings using something like.... stringarray=self.karray.astype("|S8") If the array value is a Nan I get "1.#QNAN" in my string array. For cosmetic reasons I'd like to change this to something else, e.g. "invalid" or "inactive". My string array can be up to 100,000+ values. Is there a fast way to do this? 
Thanks Brennan From robert.kern at gmail.com Thu May 7 20:00:43 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 7 May 2009 20:00:43 -0400 Subject: [Numpy-discussion] replacing Nan's in a string array converted from a float array In-Reply-To: <4A037069.2030204@visualreservoir.com> References: <4A037069.2030204@visualreservoir.com> Message-ID: <3d375d730905071700l667f7ec5l2d37d99c4ea96bbc@mail.gmail.com> On Thu, May 7, 2009 at 19:36, Brennan Williams wrote: > I've created an array of strings using something like.... > > ? ? ? ? ? ? ?stringarray=self.karray.astype("|S8") > > If the array value is a Nan I get "1.#QNAN" in my string array. > > For cosmetic reasons I'd like to change this to something else, e.g. > "invalid" or "inactive". > > My string array can be up to 100,000+ values. > > Is there a fast way to do this? Well, there is a print option that lets you change how nans are represented when arrays are printed. It is possible that this setting should also be used when converting to string arrays. However, it does not do so currently: In [9]: %push_print --nanstr invalid Precision: 8 Threshold: 1000 Edge items: 3 Line width: 75 Suppress: False NaN: invalid Inf: Inf In [10]: a = zeros(10) In [11]: a[5] = nan In [12]: a Out[12]: array([ 0., 0., 0., 0., 0., invalid, 0., 0., 0., 0.]) In [13]: a.astype('|S8') Out[13]: array(['0.0', '0.0', '0.0', '0.0', '0.0', 'nan', '0.0', '0.0', '0.0', '0.0'], dtype='|S8') You will need to use the typical approach: mask = (stringarray == '1.#QNAN') stringarray[mask] = 'invalid' This will be wasteful of memory, so with your large array size, you might want to consider breaking it into chunks and modifying the chunks in this way. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Thu May 7 20:51:17 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 7 May 2009 18:51:17 -0600 Subject: [Numpy-discussion] replacing Nan's in a string array converted from a float array In-Reply-To: <4A037069.2030204@visualreservoir.com> References: <4A037069.2030204@visualreservoir.com> Message-ID: On Thu, May 7, 2009 at 5:36 PM, Brennan Williams < brennan.williams at visualreservoir.com> wrote: > I've created an array of strings using something like.... > > stringarray=self.karray.astype("|S8") > > If the array value is a Nan I get "1.#QNAN" in my string array. > > For cosmetic reasons I'd like to change this to something else, e.g. > "invalid" or "inactive". > > My string array can be up to 100,000+ values. > > Is there a fast way to do this? > I think this is a bug. Making the printing of nans uniform was one of the goals of numpy 1.3, although a few bits were unfixable. However, this looks fixable. If you are using 1.3 please open a ticket and note the OS and numpy version. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
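A related sketch for the NaN-replacement question above: keying the replacement off the original float array with np.isnan avoids matching the platform-specific NaN text ('1.#QNAN' on Windows, 'nan' elsewhere) altogether. karray below is made-up stand-in data:

import numpy as np

karray = np.array([1.25, np.nan, 3.5, np.nan])   # stand-in for the real float array

stringarray = karray.astype('|S8')
stringarray[np.isnan(karray)] = 'inactive'       # mask comes from the floats, not the strings

Because the mask is computed on the float data, the same code behaves identically on every platform; the only constraint is that the replacement text fit the |S8 field ('inactive' is exactly 8 characters).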
URL: From brennan.williams at visualreservoir.com Thu May 7 20:54:54 2009 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Fri, 08 May 2009 12:54:54 +1200 Subject: [Numpy-discussion] replacing Nan's in a string array converted from a float array In-Reply-To: References: <4A037069.2030204@visualreservoir.com> Message-ID: <4A0382DE.7090202@visualreservoir.com> Charles R Harris wrote: > > > On Thu, May 7, 2009 at 5:36 PM, Brennan Williams > > wrote: > > I've created an array of strings using something like.... > > stringarray=self.karray.astype("|S8") > > If the array value is a Nan I get "1.#QNAN" in my string array. > > For cosmetic reasons I'd like to change this to something else, e.g. > "invalid" or "inactive". > > My string array can be up to 100,000+ values. > > Is there a fast way to do this? > > > I think this is a bug. Making the printing of nans uniform was one of > the goals of numpy 1.3, although a few bits were unfixable. However, > this looks fixable. If you are using 1.3 please open a ticket and note > the OS and numpy version. > ok looks like numpy 1.3.0rc1 on winxp > Chuck > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From ia at jscc.ru Fri May 8 03:45:51 2009 From: ia at jscc.ru (Ilya A. Kozyreff) Date: Fri, 8 May 2009 11:45:51 +0400 Subject: [Numpy-discussion] Installation NumPy v.1.2.1 on Linux Message-ID: <007701c9cfb1$05cde370$1169aa50$@ru> From: Ilya A. Kozyreff [mailto:ia at jscc.ru] Sent: Friday, May 08, 2009 11:16 AM To: 'numpy-discussion at scipy.org' Subject: Installation NumPy v.1.2.1 on Linux Hi, all! I try to install NumPy v.1.2.1 on Red Hat Linux as user like this: $ python setup.py install --prefix=/nethome/ia/usr/ $ python -c 'import numpy; numpy.test()' Traceback (most recent call last): File "", line 1, in ? ImportError: No module named numpy What I do wrong? Best regards, Ilya Kozyrev -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Fri May 8 03:58:12 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 8 May 2009 07:58:12 +0000 (UTC) Subject: [Numpy-discussion] Installation NumPy v.1.2.1 on Linux References: <007701c9cfb1$05cde370$1169aa50$@ru> Message-ID: Fri, 08 May 2009 11:45:51 +0400, Ilya A. Kozyreff kirjoitti: > $ python setup.py install --prefix=/nethome/ia/usr/ > > $ python -c 'import numpy; numpy.test()' [clip] > ImportError: No module named numpy [clip] > What I do wrong? http://docs.python.org/install/index.html#modifying-python-s-search-path From stefan at sun.ac.za Fri May 8 05:37:33 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 8 May 2009 11:37:33 +0200 Subject: [Numpy-discussion] David does C code coverage! Message-ID: <9457e7c80905080237k561f7d9dj46fbbb664521f546@mail.gmail.com> Hi all, David Cournapeau got gcov working with NumPy! Well done, David! http://cournape.wordpress.com/2009/05/08/first-steps-toward-c-code-coverage-in-numpy/ Regards St?fan From malkarouri at yahoo.co.uk Fri May 8 09:33:46 2009 From: malkarouri at yahoo.co.uk (Muhammad Alkarouri) Date: Fri, 8 May 2009 13:33:46 +0000 (GMT) Subject: [Numpy-discussion] linalg.svd not working? 
In-Reply-To: <881804.58063.qm@web24202.mail.ird.yahoo.com> Message-ID: <615547.10687.qm@web24205.mail.ird.yahoo.com> Replying to myself, just that the experience may benefit a later user. --- On Wed, 6/5/09, Muhammad Alkarouri wrote: > From: Muhammad Alkarouri > Subject: Re: [Numpy-discussion] linalg.svd not working? ... > It is an atlas problem. Not that I knew how to correct it, > but I was able to build numpy with a standard package blas > and lapack, and the tests passed without incident. After installing numpy using the standard blas/lapack on red hat enterprise 4, the tests succeeded. Scipy was then compiled and installed, but it failed its test. Specifically, some tests failed with "undefined symbolr: srotmg_". This means that the standard blas library is not complete according to past emails, so I had to go back and try installing Atlas. It turns out that for the compilers I am using (gcc, g77 3.4.4) lapack and atlas must be compiled with the option "-ffloat-store" (https://bugzilla.redhat.com/show_bug.cgi?id=138683) so after doing that, recompiling and installing numpy, numpy tests succeeded (again). The last problem was when installing scipy, I had a series of errors "NoneType object is not callable", and errors with calc_lwork. Turns out they are from a library flinalg. The solution is to compile scipy with the option: python setup.py build_ext -DUNDERSCORE_G77. Probably the reason for the last group of problems would be that scipy used gfortran in some place and that the correct action was to force the g77 compiler, but I didn't check. Anyway, the numpy and scipy tests succeeded after that. For all this installation I used CC='gcc -m32 -ffloat-store' and F77FLAGS='-m32' to get 32 bit installation on an x86_64 machine. Changing most of the other flags will mess up numpy and/or scipy installation. Many thanks, Muhammad Alkarouri From dagss at student.matnat.uio.no Fri May 8 12:52:33 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 8 May 2009 18:52:33 +0200 (CEST) Subject: [Numpy-discussion] efficient 3d histogram creation In-Reply-To: References: <7f014ea60905031715h635a69faof6a06e10c3621ba7@mail.gmail.com> <1cd32cbb0905041318g11dea0b9oa61f08bcc380144d@mail.gmail.com> <91cf711d0905050646q7652eepc55d1aed17b1d1ed@mail.gmail.com> <7f014ea60905061506o7253a315p940d6cbd2cc2b420@mail.gmail.com> <1cd32cbb0905061630t1e73e8a5i4bd454f789e22714@mail.gmail.com> <7f014ea60905061639k21dd12b5j4ee6c2738758284@mail.gmail.com> <1cd32cbb0905061721y233aa4c7uf580ad1d99539b1c@mail.gmail.com> <7f014ea60905061734y3dfcb299w14681ff0020a4de1@mail.gmail.com> <9457e7c80905070411x6d51c768g20f4dd0990b843a8@mail.gmail.com> <4A02C6B3.3030404@student.matnat.uio.no> Message-ID: <854d01dbb4ab7870839fda52c15e9e0b.squirrel@webmail.uio.no> Charles R Harris wrote: > Hi Dag, > > Numpy can now do separate compilations with controlled export of symbols > when the object files are linked together to make a module. Does Cython > have > anyway of controlling the visibility of symbols or should we just include > the right files in Numpy to get the needed macros? I'll try an answer but in general it's better to ask these kind of questions on the Cython list; I'm not an expert on this part of Cython. If you refer to functions you create in Cython, they are static by default and not exported. If you declare them "public" then they are not made static. 
Finally, if you declare them "api" then they will be made static but a symbol table for the module in which they can be looked up is exported (as a Python variable in the module; __pyx_c_api).

Dag Sverre

From sccolbert at gmail.com Fri May 8 14:26:42 2009
From: sccolbert at gmail.com (Chris Colbert)
Date: Fri, 8 May 2009 14:26:42 -0400
Subject: [Numpy-discussion] David does C code coverage!
In-Reply-To: <9457e7c80905080237k561f7d9dj46fbbb664521f546@mail.gmail.com>
References: <9457e7c80905080237k561f7d9dj46fbbb664521f546@mail.gmail.com>
Message-ID: <7f014ea60905081126w23b15e49k3015d7f3beb2d4b0@mail.gmail.com>

Now man up and buy him his beer!

2009/5/8 Stéfan van der Walt

> Hi all,
>
> David Cournapeau got gcov working with NumPy! Well done, David!
>
>
> http://cournape.wordpress.com/2009/05/08/first-steps-toward-c-code-coverage-in-numpy/
>
> Regards
> Stéfan
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From stefan at sun.ac.za Fri May 8 14:49:28 2009
From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=)
Date: Fri, 8 May 2009 20:49:28 +0200
Subject: [Numpy-discussion] David does C code coverage!
In-Reply-To: <7f014ea60905081126w23b15e49k3015d7f3beb2d4b0@mail.gmail.com>
References: <9457e7c80905080237k561f7d9dj46fbbb664521f546@mail.gmail.com> <7f014ea60905081126w23b15e49k3015d7f3beb2d4b0@mail.gmail.com>
Message-ID: <9457e7c80905081149m21ecd35ax26d0dcb36e7bd2e0@mail.gmail.com>

2009/5/8 Chris Colbert :
> Now man up and buy him his beer!

All debt will be repaid at SciPy09! The tally is currently:

Joe Harrington : 1 ice-cream
I lost a bet on the number of people who would sign up to document NumPy. My bet: 30. Current count: approaching 100.

David Cournapeau: 1 beer
Implemented C code coverage, which is something I've wanted for a very long time.

I am so happy I lost these bets/challenges, and I am willing to lose more :-) This also makes me wonder whether it would be worth keeping a page with SciPy Bounties?

Enjoy your weekend!
Stéfan

From pav at iki.fi Fri May 8 15:05:13 2009
From: pav at iki.fi (Pauli Virtanen)
Date: Fri, 8 May 2009 19:05:13 +0000 (UTC)
Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop?
References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> <23417366.post@talk.nabble.com> <3d375d730905061624k6376e57brfb19d72344b99ec2@mail.gmail.com> <23417595.post@talk.nabble.com>
Message-ID:

Wed, 06 May 2009 16:35:23 -0700, Thomas Robitaille wrote:
> Could it be linked to specific users, since the problem occurs when
> loading the account page? I had the same problem on two different
> computers with two different browsers.

Looks like this bug in TracAccountManager:

http://trac-hacks.org/ticket/3233

I applied the patch from the ticket; I think password resets should work now, so you can try using your old accounts again.

-- Pauli Virtanen

From thomas.robitaille at gmail.com Fri May 8 18:42:07 2009
From: thomas.robitaille at gmail.com (Thomas Robitaille)
Date: Fri, 8 May 2009 15:42:07 -0700 (PDT)
Subject: [Numpy-discussion] Numpy Trac site redirecting in a loop?
In-Reply-To: References: <49E64FF7.3050804@jhu.edu> <49E6658A.1090004@jhu.edu> <23417366.post@talk.nabble.com> <3d375d730905061624k6376e57brfb19d72344b99ec2@mail.gmail.com> <23417595.post@talk.nabble.com> Message-ID: <23454826.post@talk.nabble.com> Pauli Virtanen-3 wrote: > > I applied the patch from the ticket; I think password resets should work > now, so you can try using your old accounts again. > That worked, thanks! Now I think of it, the problem started occurring after I had forgotten my password and had to reset it. Thomas -- View this message in context: http://www.nabble.com/Numpy-Trac-site-redirecting-in-a-loop--tp23067410p23454826.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From ebressert at cfa.harvard.edu Sat May 9 17:22:18 2009 From: ebressert at cfa.harvard.edu (Eli Bressert) Date: Sat, 9 May 2009 17:22:18 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? Message-ID: Hi, I'm using masked arrays to compute large-scale standard deviation, multiplication, gaussian, and weighted averages. At first I thought using the masked arrays would be a great way to sidestep looping (which it is), but it's still slower than expected. Here's a snippet of the code that I'm using it for. # Computing nearest neighbor distances. # Output will be about 270,000 rows long for the index # and 270,000x50 for the dist array. tree = ann.kd_tree(np.column_stack([l,b])) index, dist = tree.search(np.column_stack([l,b]),k=nth) # Clipping bad values by replacing them acceptable values av[np.where(av<=-10)] = -10 av[np.where(av>=50)] = 50 # Distance clipping and creating mask dist_arcsec = np.sqrt(dist)*3600 mask = dist_arcsec <= d_thresh # Creating masked array av_good = ma.array(av[index],mask=mask) dist_good = ma.array(dist_arcsec,mask=mask) # Reason why I'm using masked arrays. If these were # ndarrays with nan's, then the output would be nan. Std = np.array(np.std(av_good,axis=1)) Var = Std*Std Rho = np.zeros( (len(av), nth) ) Rho2? = np.zeros( (len(av), nth) ) dist_std = np.std(dist_good,axis=1) for j in range(nth): ??? Rho[:,j] = dist_std ??? Rho2[:,j] = Var # This part takes about 20 seconds to compute for a 270,000x50 masked array. # Using ndarrays of the same size takes about 2 second spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp( - dist_good / (2*Rho**2)) # Like the spatial_weight section, this takes about 20 seconds W = spatial_weight / Rho2 # Takes less than one second. Ave = np.average(av_good,axis=1,weights=W) Any ideas on why it would take such a long time for processing? Especially the spatial_weight and W variables? Would there be a faster way to do this? Or is there a way that numpy.std can process ignore nan's when processing? Thanks, Eli Bressert From efiring at hawaii.edu Sat May 9 18:01:31 2009 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 May 2009 12:01:31 -1000 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: Message-ID: <4A05FD3B.8010205@hawaii.edu> Eli Bressert wrote: > Hi, > > I'm using masked arrays to compute large-scale standard deviation, > multiplication, gaussian, and weighted averages. At first I thought > using the masked arrays would be a great way to sidestep looping > (which it is), but it's still slower than expected. Here's a snippet > of the code that I'm using it for. > > # Computing nearest neighbor distances. > # Output will be about 270,000 rows long for the index > # and 270,000x50 for the dist array. 
> tree = ann.kd_tree(np.column_stack([l,b])) > index, dist = tree.search(np.column_stack([l,b]),k=nth) > > # Clipping bad values by replacing them acceptable values > av[np.where(av<=-10)] = -10 > av[np.where(av>=50)] = 50 > > # Distance clipping and creating mask > dist_arcsec = np.sqrt(dist)*3600 > mask = dist_arcsec <= d_thresh > > # Creating masked array > av_good = ma.array(av[index],mask=mask) > dist_good = ma.array(dist_arcsec,mask=mask) > > # Reason why I'm using masked arrays. If these were > # ndarrays with nan's, then the output would be nan. > Std = np.array(np.std(av_good,axis=1)) > Var = Std*Std > > Rho = np.zeros( (len(av), nth) ) > Rho2 = np.zeros( (len(av), nth) ) > > dist_std = np.std(dist_good,axis=1) > > for j in range(nth): > Rho[:,j] = dist_std > Rho2[:,j] = Var > > # This part takes about 20 seconds to compute for a 270,000x50 masked array. > # Using ndarrays of the same size takes about 2 second > spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp( - dist_good / > (2*Rho**2)) > > # Like the spatial_weight section, this takes about 20 seconds > W = spatial_weight / Rho2 The short answer to your subject line is "yes". A simple illustration of division: In [11]:x = np.ones((270000,50), float) In [12]:y = np.ones((270000,50), float) In [13]:timeit x/y 10 loops, best of 3: 199 ms per loop In [14]:x = np.ma.ones((270000,50), float) In [15]:y = np.ma.ones((270000,50), float) In [16]:x[1,1] = np.ma.masked In [17]:y[1,2] = np.ma.masked In [18]:timeit x/y 10 loops, best of 3: 2.45 s per loop So it is slower by more than a factor of 10. That's much worse than I expected for division (and multiplication is similar). It makes me suspect there is might be a simple way to improve it greatly, but I haven't looked. > > # Takes less than one second. > Ave = np.average(av_good,axis=1,weights=W) > > Any ideas on why it would take such a long time for processing? > Especially the spatial_weight and W variables? Would there be a faster > way to do this? Or is there a way that numpy.std can process ignore > nan's when processing? There is a numpy.nansum; and see the following thread: http://www.mail-archive.com/numpy-discussion at scipy.org/msg09407.html Eric > > Thanks, > > Eli Bressert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From efiring at hawaii.edu Sat May 9 20:06:30 2009 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 May 2009 14:06:30 -1000 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: Message-ID: <4A061A86.5020207@hawaii.edu> Eli Bressert wrote: > Hi, > > I'm using masked arrays to compute large-scale standard deviation, > multiplication, gaussian, and weighted averages. At first I thought > using the masked arrays would be a great way to sidestep looping > (which it is), but it's still slower than expected. Here's a snippet > of the code that I'm using it for. [...] > # Like the spatial_weight section, this takes about 20 seconds > W = spatial_weight / Rho2 > > # Takes less than one second. > Ave = np.average(av_good,axis=1,weights=W) > > Any ideas on why it would take such a long time for processing? A part of the slowdown is what looks to me like unnecessary copying in _MaskedBinaryOperation.__call__. It is using getdata, which applies numpy.array to its input, forcing a copy. 
I think the copy is actually unintentional, in at least one sense, and possibly two: first, because the default argument of getattr is always evaluated, even if it is not needed; and second, because the call to np.array is used where np.asarray or equivalent would suffice. The first file attached below shows the kernprof in the case of multiplying two masked arrays, shape (100000,50), with no masked elements; 2/3 of the time is taken copying the data. Now, if there are actually masked elements in the arrays, it gets much worse: see the second attachment. The total time has increased by more than a factor of 3, and the culprit is numpy.which(), a very slow function. It looks to me like it is doing nothing useful at all; the numpy binary operation is still being executed for all elements, regardless of mask, contrary to the intention implied by the comment in the code. The third attached file has a patch that fixes the getdata problem and eliminates the which(). With this patch applied we get the profile in the 4th file, to be compared to the second profile. Much better. I am pretty sure it could still be sped up quite a bit, though. It looks like the masks are essentially being calculated twice for no good reason, but I don't completely understand all the mask considerations, so at this point I am not trying to fix that problem. Eric > Especially the spatial_weight and W variables? Would there be a faster > way to do this? Or is there a way that numpy.std can process ignore > nan's when processing? > > Thanks, > > Eli Bressert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: prof1.txt URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: prof2.txt URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: macore.diff Type: text/x-patch Size: 2285 bytes Desc: not available URL: -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: prof3.txt URL: From efiring at hawaii.edu Sat May 9 20:17:55 2009 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 09 May 2009 14:17:55 -1000 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: <4A061A86.5020207@hawaii.edu> References: <4A061A86.5020207@hawaii.edu> Message-ID: <4A061D33.7080704@hawaii.edu> Eric Firing wrote: Pierre, ... I pressed "send" too soon. There are test failures with the patch I attached to my last message. I think the basic ideas are correct, but evidently there are wrinkles to be worked out. Maybe putmask() has to be used instead of where() (putmask is much faster) to maintain the ability to do *= and similar, and maybe there are other adjustments. Somehow, though, it should be possible to get decent speed for simple multiplication and division; a 10x penalty relative to ndarray operations is just too much. Eric > Eli Bressert wrote: >> Hi, >> >> I'm using masked arrays to compute large-scale standard deviation, >> multiplication, gaussian, and weighted averages. At first I thought >> using the masked arrays would be a great way to sidestep looping >> (which it is), but it's still slower than expected. Here's a snippet >> of the code that I'm using it for. > [...] 
>> # Like the spatial_weight section, this takes about 20 seconds >> W = spatial_weight / Rho2 >> >> # Takes less than one second. >> Ave = np.average(av_good,axis=1,weights=W) >> >> Any ideas on why it would take such a long time for processing? > > A part of the slowdown is what looks to me like unnecessary copying in > _MaskedBinaryOperation.__call__. It is using getdata, which applies > numpy.array to its input, forcing a copy. I think the copy is actually > unintentional, in at least one sense, and possibly two: first, because > the default argument of getattr is always evaluated, even if it is not > needed; and second, because the call to np.array is used where > np.asarray or equivalent would suffice. > > The first file attached below shows the kernprof in the case of > multiplying two masked arrays, shape (100000,50), with no masked > elements; 2/3 of the time is taken copying the data. > > Now, if there are actually masked elements in the arrays, it gets much > worse: see the second attachment. The total time has increased by more > than a factor of 3, and the culprit is numpy.which(), a very slow > function. It looks to me like it is doing nothing useful at all; the > numpy binary operation is still being executed for all elements, > regardless of mask, contrary to the intention implied by the comment in > the code. > > The third attached file has a patch that fixes the getdata problem and > eliminates the which(). > With this patch applied we get the profile in the 4th file, to be > compared to the second profile. Much better. I am pretty sure it could > still be sped up quite a bit, though. It looks like the masks are > essentially being calculated twice for no good reason, but I don't > completely understand all the mask considerations, so at this point I am > not trying to fix that problem. > > Eric > > >> Especially the spatial_weight and W variables? Would there be a faster >> way to do this? Or is there a way that numpy.std can process ignore >> nan's when processing? >> >> Thanks, >> >> Eli Bressert >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Sat May 9 20:18:49 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Sat, 9 May 2009 20:18:49 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: Message-ID: Short answer to the subject: Oh yes. Basically, MaskedArrays in its current implementation is more of a convenience class than anything. Most of the functions manipulating masked arrays create a lot of temporaries. When performance is needed, I must advise you to work directly on the data and the mask. For example, let's examine the division of 2 MaskedArrays a & b. 
* We take the 2 ndarrays of data (da and db) and the 2 ndarrays of mask (ma and mb) * we create a new array for db using np.where, putting 1 where db==0 and keeping db otherwise (if we were not doing that, we would get some NaNs down the road) * we create a new mask by combining ma and mb * we create the result array using np.where, using da where m is True, da/db otherwise (if we were not doing that, we would be processing the masked data and we may not want that) * Then, we add the mask to the result array. I suspect that the np.where functions are sub-optimal, and there might be a smarter way to achieve the same result while keeping all the functionalities (no NaNs (even masked) in the result, data kept when it should). I agree that these functionalities might be a bit overkill in simpler cases, such as yours. You may then want to use something like >>> ma.masked_array(a.data/b.data, mask=(a.mask | b.mask | (b.data==0)) Using Eric's example, I have 229ms/loop when dividing 2 ndarrays, 2.83s/loop when dividing 2 masked arrays, and down to 493ms/loop when using the quick-and-dirty function above). So anyway, you'll still be slower using MA than ndarrays, but not as slow... On May 9, 2009, at 5:22 PM, Eli Bressert wrote: > Hi, > > I'm using masked arrays to compute large-scale standard deviation, > multiplication, gaussian, and weighted averages. At first I thought > using the masked arrays would be a great way to sidestep looping > (which it is), but it's still slower than expected. Here's a snippet > of the code that I'm using it for. > > # Computing nearest neighbor distances. > # Output will be about 270,000 rows long for the index > # and 270,000x50 for the dist array. > tree = ann.kd_tree(np.column_stack([l,b])) > index, dist = tree.search(np.column_stack([l,b]),k=nth) > > # Clipping bad values by replacing them acceptable values > av[np.where(av<=-10)] = -10 > av[np.where(av>=50)] = 50 > > # Distance clipping and creating mask > dist_arcsec = np.sqrt(dist)*3600 > mask = dist_arcsec <= d_thresh > > # Creating masked array > av_good = ma.array(av[index],mask=mask) > dist_good = ma.array(dist_arcsec,mask=mask) > > # Reason why I'm using masked arrays. If these were > # ndarrays with nan's, then the output would be nan. > Std = np.array(np.std(av_good,axis=1)) > Var = Std*Std > > Rho = np.zeros( (len(av), nth) ) > Rho2 = np.zeros( (len(av), nth) ) > > dist_std = np.std(dist_good,axis=1) > > for j in range(nth): > Rho[:,j] = dist_std > Rho2[:,j] = Var > > # This part takes about 20 seconds to compute for a 270,000x50 > masked array. > # Using ndarrays of the same size takes about 2 second > spatial_weight = 1.0 / (Rho*np.sqrt(2*np.pi)) * np.exp( - dist_good / > (2*Rho**2)) > > # Like the spatial_weight section, this takes about 20 seconds > W = spatial_weight / Rho2 > > # Takes less than one second. > Ave = np.average(av_good,axis=1,weights=W) > > Any ideas on why it would take such a long time for processing? > Especially the spatial_weight and W variables? Would there be a faster > way to do this? Or is there a way that numpy.std can process ignore > nan's when processing? > > Thanks, > > Eli Bressert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pgmdevlist at gmail.com Sat May 9 20:37:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Sat, 9 May 2009 20:37:40 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? 
In-Reply-To: <4A061D33.7080704@hawaii.edu> References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> Message-ID: <268F602E-4DF0-4F5F-910F-D8C797A62040@gmail.com> On May 9, 2009, at 8:17 PM, Eric Firing wrote: > Eric Firing wrote: > > A part of the slowdown is what looks to me like unnecessary copying > in _MaskedBinaryOperation.__call__. It is using getdata, which > applies numpy.array to its input, forcing a copy. I think the copy > is actually unintentional, in at least one sense, and possibly two: > first, because the default argument of getattr is always evaluated, > even if it is not needed; and second, because the call to np.array > is used where np.asarray or equivalent would suffice. Yep, good call. the try/except should be better, and yes, I forgot to force copy=False (thought it was on by default...). I didn't know that getattr always evaluated the default, the docs are scarce on that subject... > Pierre, > > ... I pressed "send" too soon. There are test failures with the > patch I attached to my last message. I think the basic ideas are > correct, but evidently there are wrinkles to be worked out. Maybe > putmask() has to be used instead of where() (putmask is much faster) > to maintain the ability to do *= and similar, and maybe there are > other adjustments. Somehow, though, it should be possible to get > decent speed for simple multiplication and division; a 10x penalty > relative to ndarray operations is just too much. Quite agreed. It was a shock to realize that we were that slow. I gonna have to start testing w/ large arrays... I'm confident we can significantly speed up the _MaskedOperations without losing any of the features. Yes, putmask may be a better option. We could probably use the following MO: * result = a.data/b.data * putmask(result, m, a) However, I gonna need a good couple of weeks before being able to really look into it... From cournape at gmail.com Sun May 10 01:45:54 2009 From: cournape at gmail.com (David Cournapeau) Date: Sun, 10 May 2009 14:45:54 +0900 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) Message-ID: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> Hi, I worked on some code to detect C API mismatches both for developers and for users: http://github.com/cournape/numpy/tree/runtime_feature It adds the following: - if a numpy or ufunc function is added in the C API without the NPY_FEATURE_VERSION to be updated, a warning is generated at built time (the warning is turned into an exception for release) - I added a function PyArray_GetNDArrayCFeatureVersion which returns the C API version, and the version is checked in import_array. If the compile-time version > import-time version, an import error is raised, so the following happens (assuming the ABI is not changed). So we keep backward compatibility (building an extension with say numpy 1.2.1 will still work after installing numpy 1.3), and forward incompatibility is detected (building an extension with numpy 1.3.0 and importing it with installed numpy 1.2.1 will fail). Ironically, adding the function means that we have to add one function to the C API, so this will not be useful for numpy < 1.4, but I don't think it is possible to do it without modifying the C API. cheers, David From cournape at gmail.com Sun May 10 02:55:30 2009 From: cournape at gmail.com (David Cournapeau) Date: Sun, 10 May 2009 15:55:30 +0900 Subject: [Numpy-discussion] OS-X binary name... 
In-Reply-To: <4A01284E.6070104@noaa.gov> References: <4A01284E.6070104@noaa.gov> Message-ID: <5b8d13220905092355v40875665y89eb0372f4692a8d@mail.gmail.com> On Wed, May 6, 2009 at 3:03 PM, Christopher Barker wrote: > Hi all, > > The binary for OS-X on sourceforge is called: > > numpy-1.3.0-py2.5-macosx10.5.dmg > > However, as far as I can tell, it works just fine on OS-X 10.4, and > maybe even 10.3.9. I have to confess I don't understand mac os x backward compatibility story. Are you sure they are compatible ? Or is it just a happy accident ? > > Perhaps a re-naming is in order? But to what? > > I'd say: > > numpy-1.3.0-py2.5-macosx10.4.dmg > > but would folks think that it's only for 10.4? > > maybe: > > numpy-1.3.0-py2.5-macosx-python.org.dmg > > to indicate that it's for the python.org build of python2.5, though I'v > never seen anyone use that convention. At that point, we could just drop macosx altogether I think. I changed the name convention for scipy build scripts. cheers, David From stefan at sun.ac.za Sun May 10 08:15:17 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 10 May 2009 14:15:17 +0200 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> Message-ID: <9457e7c80905100515t1f1b7a4ie97360f6f070a053@mail.gmail.com> 2009/5/10 David Cournapeau : > I worked on some code to detect C API mismatches both for developers > and for users: > > http://github.com/cournape/numpy/tree/runtime_feature Great, thanks for taking care of this! I think the message "ABI version %%x of C-API" is unclear, maybe simply use "ABI version %%x" on its own. The hash file can be loaded in one line with np.loadtxt('/tmp/dat.dat', usecols=(0, 2), dtype=[('api', 'S10'), ('hash', 'S32')]) The rest looks good. Cheers St?fan From charlesr.harris at gmail.com Sun May 10 12:13:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 10 May 2009 10:13:32 -0600 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> Message-ID: On Sat, May 9, 2009 at 11:45 PM, David Cournapeau wrote: > Hi, > > I worked on some code to detect C API mismatches both for developers > and for users: > > http://github.com/cournape/numpy/tree/runtime_feature > > It adds the following: > - if a numpy or ufunc function is added in the C API without the > NPY_FEATURE_VERSION to be updated, a warning is generated at built > time (the warning is turned into an exception for release) > - I added a function PyArray_GetNDArrayCFeatureVersion which returns > the C API version, and the version is checked in import_array. If the > compile-time version > import-time version, an import error is raised, > so the following happens (assuming the ABI is not changed). > > So we keep backward compatibility (building an extension with say > numpy 1.2.1 will still work after installing numpy 1.3), and forward > incompatibility is detected (building an extension with numpy 1.3.0 > and importing it with installed numpy 1.2.1 will fail). 
> > Ironically, adding the function means that we have to add one function > to the C API, so this will not be useful for numpy < 1.4, but I don't > think it is possible to do it without modifying the C API. > Why not just use the current API to get the number? That looks easy to do, what is the problem? I'll fix it up if you want. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 10 12:55:46 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 10 May 2009 10:55:46 -0600 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> Message-ID: On Sun, May 10, 2009 at 10:13 AM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Sat, May 9, 2009 at 11:45 PM, David Cournapeau wrote: > >> Hi, >> >> I worked on some code to detect C API mismatches both for developers >> and for users: >> >> http://github.com/cournape/numpy/tree/runtime_feature >> >> It adds the following: >> - if a numpy or ufunc function is added in the C API without the >> NPY_FEATURE_VERSION to be updated, a warning is generated at built >> time (the warning is turned into an exception for release) >> - I added a function PyArray_GetNDArrayCFeatureVersion which returns >> the C API version, and the version is checked in import_array. If the >> compile-time version > import-time version, an import error is raised, >> so the following happens (assuming the ABI is not changed). >> >> So we keep backward compatibility (building an extension with say >> numpy 1.2.1 will still work after installing numpy 1.3), and forward >> incompatibility is detected (building an extension with numpy 1.3.0 >> and importing it with installed numpy 1.2.1 will fail). >> >> Ironically, adding the function means that we have to add one function >> to the C API, so this will not be useful for numpy < 1.4, but I don't >> think it is possible to do it without modifying the C API. >> > > Why not just use the current API to get the number? That looks easy to do, > what is the problem? I'll fix it up if you want. > As you may have noticed, I really, really, don't like adding functions to the API ;) Especially unneeded ones or ones that could be done at the python level. So I think the thing to do here is split the version into two 16 bit parts, then start with API version 0x000A and ABI version 0x0100 with the version number being 0x01000009. There is nothing sacred about the numbers as long as they remain ordered. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From rgr001 at sbcglobal.net Sun May 10 13:08:57 2009 From: rgr001 at sbcglobal.net (Robert Radocinski) Date: Sun, 10 May 2009 10:08:57 -0700 Subject: [Numpy-discussion] Subcripts for arrays of nested structures Message-ID: <4A070A29.6020705@sbcglobal.net> In order to illustrate my question which involves two related subscript expressions, on arrays of nested structures, I have created a short code example which is given below. In the code example, I create a nested "structure" data type (called dtype3) and two numpy.ndarray's (called arecords and brecords) which are identical in shape and dtype. The remaining code consists of the replacement statement (lines 34 and 49) at the heart of my question and print statements. 
Specifically, my question centers on my expectation that the statement on line 34 (arecords["data"][1] = data_values) would have the same effect on arecords as the statement on line 49 (brecords[1]["data"]=data_values) would have on brecords. From running the code example, this is obviously not the case. In the former case, all four of the "data" arrays of the second record in arecords are set to the values found in data_values. In the latter case, only the first "data" array of the second record in brecords is set to a value in data_values and the other three "data" values remain unchanged. I am baffled by the latter case involving brecords. If you examine the left-hand side and righthand side of the statement on line 49 (brecords[1]["data"]=data_values), both sides have the same shape(1-dim with 4 elements) and dtype. I am having a great deal of difficulty trying to understand why the replacement statement only effects the first "data" array and not all 4 "data" arrays. What am I overlooking? Any help in explaining this behavior would be appreciated. Thanks, RR CODE EXAMPLE ---------------------------------------------------------------------------- import numpy type1 = numpy.dtype([("a", numpy.int32), ("b", numpy.int32)]) type2 = numpy.dtype([("alpha", numpy.float64), ("beta", numpy.float64), ("gamma", numpy.float64)]) type3 = numpy.dtype([("header", type1), ("data", type2, 4)]) header_values = numpy.empty(1, dtype=type1) data_values = numpy.empty(4, dtype=type2) header_values["a"] = 1000 header_values["b"] = 2000 data_values["alpha"] = [11.0, 21.0, 31.0, 41.0] data_values["beta"] = [12.0, 22.0, 32.0, 42.0] data_values["gamma"] = [13.0, 23.0, 33.0, 43.0] arecords = numpy.empty(2, dtype=type3) brecords = numpy.empty(2, dtype=type3) print print "Case A:" print print arecords, ': arecords:'.upper() print data_values, ': data_values'.upper() print arecords["data"][1].shape, ': arecords["data"][1].shape'.upper() print arecords[1]["data"].shape, ': arecords[1]["data"].shape'.upper() print data_values.shape, ': data_values.shape' print arecords["data"][1].dtype, ': arecords["data"][1].dtype'.upper() print arecords[1]["data"].dtype, ': arecords[1]["data"].dtype'.upper() print data_values.dtype, ': data_values.dtype'.upper() arecords["header"][1] = header_values arecords["data"][1] = data_values print arecords, ': arecords:'.upper() print print "Case B:" print print brecords, ': brecords:'.upper() print data_values, ': data_values'.upper() print brecords["data"][1].shape, ': brecords["data"][1].shape'.upper() print brecords[1]["data"].shape, ': brecords[1]["data"].shape'.upper() print data_values.shape, ': data_values.shape' print brecords["data"][1].dtype, ': brecords["data"][1].dtype'.upper() print brecords[1]["data"].dtype, ': brecords[1]["data"].dtype'.upper() print data_values.dtype, ': data_values.dtype'.upper() brecords[1]["header"] = header_values brecords[1]["data"] = data_values print brecords, ': brecords:'.upper() From david at ar.media.kyoto-u.ac.jp Sun May 10 22:30:26 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 11 May 2009 11:30:26 +0900 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> Message-ID: <4A078DC2.5060006@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > As you may have noticed, I really, really, don't like adding functions > to the API ;) Me neither :) > 
Especially unneeded ones or ones that could be done at the python > level. So I think the thing to do here is split the version into two > 16 bit parts, then start with API version 0x000A and ABI version > 0x0100 with the version number being 0x01000009. I don't think you can do that: you will break a lot of code if you do so. The currently built extensions will fail to load if the number is any different with a new numpy. We need two independent numbers: one which should prevents loading if the number is any different (the ABI part) and one which should prevents loading if the compile-time number is strictly greater than the runtime one. As the ABI part is already checked in since at least numpy 1.2 and maybe lower, you can't change it easily. They could be part of the same underlying int if we did that from the beginning, but that's not the case. > There is nothing sacred about the numbers as long as they remain ordered. if you change the number NPY_VERSION, you break every single extension already built. With my scheme, you needs one more function, but you don't break backward compatibility. cheers, David From david at ar.media.kyoto-u.ac.jp Sun May 10 22:35:11 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 11 May 2009 11:35:11 +0900 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: <9457e7c80905100515t1f1b7a4ie97360f6f070a053@mail.gmail.com> References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> <9457e7c80905100515t1f1b7a4ie97360f6f070a053@mail.gmail.com> Message-ID: <4A078EDF.8080804@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > 2009/5/10 David Cournapeau : > >> I worked on some code to detect C API mismatches both for developers >> and for users: >> >> http://github.com/cournape/numpy/tree/runtime_feature >> > > Great, thanks for taking care of this! > > I think the message "ABI version %%x of C-API" is unclear, maybe > simply use "ABI version %%x" on its own. > Ok, I changed it. > The hash file can be loaded in one line with > > np.loadtxt('/tmp/dat.dat', usecols=(0, 2), dtype=[('api', 'S10'), > ('hash', 'S32')]) > Well, we need to do this at build time, and we can't assume numpy is already installed when building numpy :) David From nwagner at iam.uni-stuttgart.de Mon May 11 06:28:20 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 12:28:20 +0200 Subject: [Numpy-discussion] List of arrays Message-ID: Hi all, How can I convert a list of arrays into one array ? Nils >>> data [array([ 40. , 285.6, 45. , 285.3, 50. , 285.1, 55. , 284.8]), array([ 60. , 284.5, 65. , 282.8, 70. , 281.1, 75. , 280. ]), array([ 80. , 278.8, 85. , 278.1, 90. , 277.4, 95. , 276.9]), array([ 100. , 276.3, 105. , 276.1, 110. , 275.9, 115. , 275.7]), array([ 120. , 275.5, 125. , 275.2, 130. , 274.8, 135. , 274.5]), array([ 140. , 274.1, 145. , 273.7, 150. , 273.2, 155. , 272.7]), array([ 160. , 272.2, 165. , 272.1, 170. , 272. , 175. , 271.8]), array([ 180. , 271.6, 185. , 271. , 190. , 270.3, 195. , 269.5]), array([ 200. , 268.5, 205. , 267.4, 210. , 266.1, 215. , 263.5]), array([ 220. , 260.1, 225. , 256.1, 230. , 249.9, 235. , 239.3]), array([ 238.7, 186.2, 240., 160. , 245. , 119.7, 250. , 111.3])] newdata=array([ 40. , 285.6, 45. , 285.3, 50. , 285.1, 55. , 284.8, 60. , 284.5, 65. 
, 282.8, ..., 111.3]) Nils From faltet at pytables.org Mon May 11 06:40:01 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 11 May 2009 12:40:01 +0200 Subject: [Numpy-discussion] List of arrays In-Reply-To: References: Message-ID: <200905111240.01943.faltet@pytables.org> A Monday 11 May 2009, Nils Wagner escrigu?: > Hi all, > > How can I convert a list of arrays into one array ? > > Nils > > >>> data > > [array([ 40. , 285.6, 45. , 285.3, 50. , 285.1, > 55. , 284.8]), array([ 60. , 284.5, 65. , 282.8, > 70. , 281.1, 75. , 280. ]), array([ 80. , 278.8, > 85. , 278.1, 90. , 277.4, 95. , 276.9]), array([ > 100. , 276.3, 105. , 276.1, 110. , 275.9, 115. , > 275.7]), array([ 120. , 275.5, 125. , 275.2, 130. , > 274.8, 135. , 274.5]), array([ 140. , 274.1, 145. , > 273.7, 150. , 273.2, 155. , 272.7]), array([ 160. , > 272.2, 165. , 272.1, 170. , 272. , 175. , 271.8]), > array([ 180. , 271.6, 185. , 271. , 190. , 270.3, > 195. , 269.5]), array([ 200. , 268.5, 205. , 267.4, > 210. , 266.1, 215. , 263.5]), array([ 220. , 260.1, > 225. , 256.1, 230. , 249.9, 235. , 239.3]), array([ > 238.7, 186.2, 240., 160. , 245. , 119.7, 250. , > 111.3])] > > newdata=array([ 40. , 285.6, 45. , 285.3, 50. , > 285.1, 55. , 284.8, 60. , 284.5, 65. , 282.8, ..., > 111.3]) Try np.concatenate: In [9]: a = np.arange(10) In [10]: b = np.arange(10,20) In [11]: np.concatenate(l) Out[11]: array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) Hope that helps, -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'. In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. Dykstra "On the cruelty of really teaching computer science" From aisaac at american.edu Mon May 11 06:54:45 2009 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 11 May 2009 06:54:45 -0400 Subject: [Numpy-discussion] List of arrays In-Reply-To: References: Message-ID: <4A0803F5.8010807@american.edu> On 5/11/2009 6:28 AM Nils Wagner apparently wrote: > How can I convert a list of arrays into one array ? Do you mean one long array, so that ``concatenate`` is appropriate, or a 2d array, in which case you can just use ``array``. But your example looks like you should preallocate the larger array and fill it as the data arrive, if that's possible. Alan Isaac From nwagner at iam.uni-stuttgart.de Mon May 11 07:00:12 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 13:00:12 +0200 Subject: [Numpy-discussion] List of arrays In-Reply-To: <4A0803F5.8010807@american.edu> References: <4A0803F5.8010807@american.edu> Message-ID: On Mon, 11 May 2009 06:54:45 -0400 Alan G Isaac wrote: > On 5/11/2009 6:28 AM Nils Wagner apparently wrote: >> How can I convert a list of arrays into one array ? > > Do you mean one long array, so that ``concatenate`` > is appropriate, or a 2d array, in which case you > can just use ``array``. > > But your example looks like you should preallocate the > larger array and fill it as the data arrive, > if that's possible. > > Alan Isaac > Hi Alan, concatenate works fine for me. The problem is that the arrays within the list vary in length. Thank you very much. 
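For reference, a tiny self-contained check along the same lines, with pieces of different lengths (the values here are made up and shortened from the data above, so this is only a sketch):

import numpy as np

pieces = [np.array([40.0, 285.6, 45.0, 285.3]),
          np.array([60.0, 284.5]),
          np.array([238.7, 186.2, 240.0, 160.0])]

newdata = np.concatenate(pieces)
# newdata is a single 1-D array of length 10; the pieces are joined in order
# and do not need to have equal lengths.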
Nils From nwagner at iam.uni-stuttgart.de Mon May 11 08:03:45 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 14:03:45 +0200 Subject: [Numpy-discussion] String manipulation Message-ID: Hi all, Please consider two strings >>> line_a '12345678abcdefgh12345678' >>> line_b '12345678 abcdefgh 12345678' >>> line_b.split() ['12345678', 'abcdefgh', '12345678'] Is it possible to split line_a such that the output is ['12345678', 'abcdefgh', '12345678'] Nils From nwagner at iam.uni-stuttgart.de Mon May 11 08:06:07 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 14:06:07 +0200 Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays Message-ID: Hi all, Can someone reproduce the following failure ? I am using >>> numpy.__version__ '1.4.0.dev6983' ====================================================================== FAIL: Test bug in reduceat with structured arrays copied for speed. ---------------------------------------------------------------------- Traceback (most recent call last): File "/data/home/nwagner/local/lib/python2.5/site-packages/nose-0.10.4-py2.5.egg/nose/case.py", line 182, in runTest self.test(*self.arg) File "/data/home/nwagner/local/lib/python2.5/site-packages/numpy/core/tests/test_umath.py", line 664, in test_reduceat assert np.all(h1 == h2) AssertionError Nils From faltet at pytables.org Mon May 11 08:25:46 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 11 May 2009 14:25:46 +0200 Subject: [Numpy-discussion] String manipulation In-Reply-To: References: Message-ID: <200905111425.46475.faltet@pytables.org> A Monday 11 May 2009, Nils Wagner escrigu?: > Hi all, > > Please consider two strings > > >>> line_a > > '12345678abcdefgh12345678' > > >>> line_b > > '12345678 abcdefgh 12345678' > > >>> line_b.split() > > ['12345678', 'abcdefgh', '12345678'] > > Is it possible to split line_a such that the output > is > > ['12345678', 'abcdefgh', '12345678'] Mmh, your question is a bit too generic. If what you want is to separate the strings made of digits and the ones made of letters, it is worth to use regular expressions: In [22]: re.split("(\d*)", line_a)[1:-1] Out[22]: ['12345678', 'abcdefgh', '12345678'] Although regular expressions seems a bit thought to learn, they will payoff your effort in many occasions. Cheers, -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'. In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. Dykstra "On the cruelty of really teaching computer science" From faltet at pytables.org Mon May 11 08:28:47 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 11 May 2009 14:28:47 +0200 Subject: [Numpy-discussion] String manipulation In-Reply-To: <200905111425.46475.faltet@pytables.org> References: <200905111425.46475.faltet@pytables.org> Message-ID: <200905111428.47887.faltet@pytables.org> A Monday 11 May 2009, Francesc Alted escrigu?: > Although regular expressions seems a bit thought to learn, they will ^^^^^^^ --> tough :-\ -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'. In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. 
Dykstra "On the cruelty of really teaching computer science" From nwagner at iam.uni-stuttgart.de Mon May 11 08:36:17 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 14:36:17 +0200 Subject: [Numpy-discussion] String manipulation In-Reply-To: <200905111425.46475.faltet@pytables.org> References: <200905111425.46475.faltet@pytables.org> Message-ID: On Mon, 11 May 2009 14:25:46 +0200 Francesc Alted wrote: > A Monday 11 May 2009, Nils Wagner escrigu?: >> Hi all, >> >> Please consider two strings >> >> >>> line_a >> >> '12345678abcdefgh12345678' >> >> >>> line_b >> >> '12345678 abcdefgh 12345678' >> >> >>> line_b.split() >> >> ['12345678', 'abcdefgh', '12345678'] >> >> Is it possible to split line_a such that the output >> is >> >> ['12345678', 'abcdefgh', '12345678'] > > Mmh, your question is a bit too generic. Indeed. I would like to split strings made of digits after eight characters each. >>> line_a '111111.1222222.2333333.3' >>> line_b '111111.1 222222.2 333333.3' >>> line_b.split() ['111111.1', '222222.2', '333333.3'] How can I accomplish that ? Nils From seb.binet at gmail.com Mon May 11 09:03:02 2009 From: seb.binet at gmail.com (Sebastien Binet) Date: Mon, 11 May 2009 15:03:02 +0200 Subject: [Numpy-discussion] String manipulation Message-ID: <200905111503.02782.binet@cern.ch> On Monday 11 May 2009 14:36:17 Nils Wagner wrote: > On Mon, 11 May 2009 14:25:46 +0200 > > Francesc Alted wrote: > > A Monday 11 May 2009, Nils Wagner escrigu?: > >> Hi all, > >> > >> Please consider two strings > >> > >> >>> line_a > >> > >> '12345678abcdefgh12345678' > >> > >> >>> line_b > >> > >> '12345678 abcdefgh 12345678' > >> > >> >>> line_b.split() > >> > >> ['12345678', 'abcdefgh', '12345678'] > >> > >> Is it possible to split line_a such that the output > >> is > >> > >> ['12345678', 'abcdefgh', '12345678'] > > > > Mmh, your question is a bit too generic. > > Indeed. > I would like to split strings made of digits after eight > characters each. > > >>> line_a > > '111111.1222222.2333333.3' > > >>> line_b > > '111111.1 222222.2 333333.3' > > >>> line_b.split() > > ['111111.1', '222222.2', '333333.3'] > > How can I accomplish that ? would this suit you ? >>> np.asarray(line_b,dtype=[('hdr','|S8'),('mid','|S8'),('tail','|S8')]) array(('111111.1', '222222.2', '333333.3'), dtype=[('hdr', '|S8'), ('mid', '|S8'), ('tail', '|S8')]) hth, sebastien. -- ######################################### # Dr. 
Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay ######################################### From nwagner at iam.uni-stuttgart.de Mon May 11 09:53:39 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 15:53:39 +0200 Subject: [Numpy-discussion] String manipulation In-Reply-To: <200905111503.02782.binet@cern.ch> References: <200905111503.02782.binet@cern.ch> Message-ID: On Mon, 11 May 2009 15:03:02 +0200 Sebastien Binet wrote: > On Monday 11 May 2009 14:36:17 Nils Wagner wrote: >> On Mon, 11 May 2009 14:25:46 +0200 >> >> Francesc Alted wrote: >> > A Monday 11 May 2009, Nils Wagner escrigu?: >> >> Hi all, >> >> >> >> Please consider two strings >> >> >> >> >>> line_a >> >> >> >> '12345678abcdefgh12345678' >> >> >> >> >>> line_b >> >> >> >> '12345678 abcdefgh 12345678' >> >> >> >> >>> line_b.split() >> >> >> >> ['12345678', 'abcdefgh', '12345678'] >> >> >> >> Is it possible to split line_a such that the output >> >> is >> >> >> >> ['12345678', 'abcdefgh', '12345678'] >> > >> > Mmh, your question is a bit too generic. >> >> Indeed. >> I would like to split strings made of digits after eight >> characters each. >> >> >>> line_a >> >> '111111.1222222.2333333.3' >> >> >>> line_b >> >> '111111.1 222222.2 333333.3' >> >> >>> line_b.split() >> >> ['111111.1', '222222.2', '333333.3'] >> >> How can I accomplish that ? > would this suit you ? >>>> np.asarray(line_b,dtype=[('hdr','|S8'),('mid','|S8'),('tail','|S8')]) > array(('111111.1', '222222.2', '333333.3'), > dtype=[('hdr', '|S8'), ('mid', '|S8'), ('tail', >'|S8')]) > > hth, > sebastien. > -- > ######################################### > # Dr. Sebastien Binet > # Laboratoire de l'Accelerateur Lineaire > # Universite Paris-Sud XI > # Batiment 200 > # 91898 Orsay > ######################################### > > here is my workaround. from numpy import arange line_a = '111111.1222222.2333333.3' # without separator line_b = '111111.1 222222.2 333333.3' # including space as a delimiter div, mod = divmod(len(line_a),8) liste = [] for j in arange(0,div): liste.append(line_a[j*8:(j+1)*8]) print liste print line_b.split() # Works for line_b but not for line_a Cheers, Nils From pav at iki.fi Mon May 11 10:05:13 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 May 2009 14:05:13 +0000 (UTC) Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays References: Message-ID: Mon, 11 May 2009 14:06:07 +0200, Nils Wagner kirjoitti: > Can someone reproduce the following failure ? I am using >>>> numpy.__version__ > '1.4.0.dev6983' > > ====================================================================== > FAIL: Test bug in reduceat with structured arrays copied for speed. > ---------------------------------------------------------------------- [clip] Buildbot can't. I'd suggest removing your build/ directory and rebuilding, to see if it's caused by some file not rebuilding properly. Otherwise, what is the platform you are using? -- Pauli Virtanen From seb.binet at gmail.com Mon May 11 10:12:10 2009 From: seb.binet at gmail.com (Sebastien Binet) Date: Mon, 11 May 2009 16:12:10 +0200 Subject: [Numpy-discussion] String manipulation In-Reply-To: References: <200905111503.02782.binet@cern.ch> Message-ID: <200905111612.10884.binet@cern.ch> hi, > here is my workaround. 
> > from numpy import arange > line_a = '111111.1222222.2333333.3' # without > separator > line_b = '111111.1 222222.2 333333.3' # including space > as a delimiter > > div, mod = divmod(len(line_a),8) > liste = [] > for j in arange(0,div): > liste.append(line_a[j*8:(j+1)*8]) > > print liste > > > print line_b.split() # Works for line_b > but not for line_a how about this, then: import numpy as np def massage(data): fmt = np.dtype([('hdr', '|S8'), ('mid', '|S8'), ('tail','|S8')]) data = data.replace(' ','') assert len(data)==3*8, "contract failed or invalid assumption" return np.asarray(data,dtype=fmt).tolist() assert(massage(line_a) == massage(line_b)) cheers, sebastien. -- ######################################### # Dr. Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay ######################################### From nwagner at iam.uni-stuttgart.de Mon May 11 10:22:37 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Mon, 11 May 2009 16:22:37 +0200 Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays In-Reply-To: References: Message-ID: On Mon, 11 May 2009 14:05:13 +0000 (UTC) Pauli Virtanen wrote: > Mon, 11 May 2009 14:06:07 +0200, Nils Wagner kirjoitti: >> Can someone reproduce the following failure ? I am using >>>>> numpy.__version__ >> '1.4.0.dev6983' >> >> ====================================================================== >> FAIL: Test bug in reduceat with structured arrays copied >>for speed. >> ---------------------------------------------------------------------- > [clip] > > Buildbot can't. I'd suggest removing your build/ >directory and > rebuilding, to see if it's caused by some file not >rebuilding properly. > > Otherwise, what is the platform you are using? > > -- > Pauli Virtanen > Everytime I rebuild numpy I remove the build directory before. CentOS release 4.6 x86_64 Python 2.5.1 Nils From aisaac at american.edu Mon May 11 10:41:11 2009 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 11 May 2009 10:41:11 -0400 Subject: [Numpy-discussion] String manipulation In-Reply-To: References: Message-ID: <4A083907.1060906@american.edu> On 5/11/2009 8:03 AM Nils Wagner apparently wrote: >>>> line_a > '12345678abcdefgh12345678' > Is it possible to split line_a such that the output > is > > ['12345678', 'abcdefgh', '12345678'] More of a comp.lang.python question, I think: out = list() for k, g in groupby('123abc456',lambda x: x.isalpha()): out.append( ''.join(g) ) fwiw, Alan Isaac From pav at iki.fi Mon May 11 10:55:40 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 11 May 2009 14:55:40 +0000 (UTC) Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays References: Message-ID: Mon, 11 May 2009 16:22:37 +0200, Nils Wagner kirjoitti: > On Mon, 11 May 2009 14:05:13 +0000 (UTC) > Pauli Virtanen wrote: >> Mon, 11 May 2009 14:06:07 +0200, Nils Wagner kirjoitti: >>> Can someone reproduce the following failure ? I am using >>>>>> numpy.__version__ >>> '1.4.0.dev6983' >>> >>> ====================================================================== >>> FAIL: Test bug in reduceat with structured arrays copied >>>for speed. >>> ---------------------------------------------------------------------- >> [clip] > Everytime I rebuild numpy I remove the build directory > before. > > CentOS release 4.6 x86_64 Python 2.5.1 Ok, I can reproduce this, too. x86_64 Debian etch, Python 2.5. Probably connected to r6977, Travis's changes in reduceat. 
Don't know if the test or code is buggy, though... Wonder why buildbot's 64-bit SPARC boxes don't see this if it's something connected to 64-bitness... -- Pauli Virtanen From aisaac at american.edu Mon May 11 10:48:14 2009 From: aisaac at american.edu (Alan G Isaac) Date: Mon, 11 May 2009 10:48:14 -0400 Subject: [Numpy-discussion] String manipulation In-Reply-To: References: <200905111425.46475.faltet@pytables.org> Message-ID: <4A083AAE.1010209@american.edu> On 5/11/2009 8:36 AM Nils Wagner apparently wrote: > I would like to split strings made of digits after eight > characters each. [l[i*8:(i+1)*8] for i in range(len(l)/8)] Alan Isaac From sccolbert at gmail.com Mon May 11 11:40:35 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 11 May 2009 11:40:35 -0400 Subject: [Numpy-discussion] strange behavior convolving via fft Message-ID: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> at least I think this is strange behavior. When convolving an image with a large kernel, its know that its faster to perform the operation as multiplication in the frequency domain. The below code example shows that the results of my 2d filtering are shifted from the expected value a distance 1/2 the width of the filter in both the x and y directions. Can anyone explain why this occurs? I have been able to find the answer in any of my image processing books. The code sample below is an artificial image of size (100, 100) full of zeros, the center of the image is populated by a (10, 10) square of 1's. The filter kernel is also a (10,10) square of 1's. The expected result of the convolution would therefore be a peak at location (50,50) in the image. Instead, I get (54, 54). The same shifting occurs regardless of the image and filter (assuming the filter is symetric, so flipping isnt necessary). I came across this behavior when filtering actual images, so this is not a byproduct of this example. The same effect also occurs using the full FFT as opposed to RFFT. I have links to the images produced by this process below. Thanks for any insight anyone can give! Chris In [12]: a = np.zeros((100,100)) In [13]: a[45:55,45:55] = 1 In [15]: k = np.ones((10,10)) In [16]: afft = np.fft.rfft2(a, s=(256,256)) In [19]: kfft = np.fft.rfft2(k, s=(256,256)) In [21]: result = np.fft.irfft2(afft*kfft).real[0:100,0:100] In [23]: result.argmax() Out[23]: 5454 www.therealstevencolbert.com/dump/a.png www.therealstevencolbert.com/dump/afft.png www.therealstevencolbert.com/dump/kfft.png www.therealstevencolbert.com/dump/result.png -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Mon May 11 14:03:26 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 11 May 2009 11:03:26 -0700 Subject: [Numpy-discussion] OS-X binary name... In-Reply-To: <5b8d13220905092355v40875665y89eb0372f4692a8d@mail.gmail.com> References: <4A01284E.6070104@noaa.gov> <5b8d13220905092355v40875665y89eb0372f4692a8d@mail.gmail.com> Message-ID: <4A08686E.9090506@noaa.gov> David Cournapeau wrote: > On Wed, May 6, 2009 at 3:03 PM, Christopher Barker >> The binary for OS-X on sourceforge is called: >> >> numpy-1.3.0-py2.5-macosx10.5.dmg >> >> However, as far as I can tell, it works just fine on OS-X 10.4, and >> maybe even 10.3.9. > > I have to confess I don't understand mac os x backward compatibility > story. Are you sure they are compatible ? Or is it just a happy > accident ? I can't be sure without knowing how it was built, but probably. 
The python.org python was built for 10.3.9 and above, and one of the points of distutils is to pass all the same flags along, so, unless it depends on other libraries built only for 10.5, it should be compatible. $otool -L multiarray.so multiarray.so: /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 88.3.9) I'm pretty sure libSystem.B is shipped with 10.3.9 (or maybe earlier...) I think a coincidence is unlikely. >> numpy-1.3.0-py2.5-macosx-python.org.dmg >> >> to indicate that it's for the python.org build of python2.5, though I'v >> never seen anyone use that convention. > > At that point, we could just drop macosx altogether I think. I guess a dmg wouldn't be anything else, so yes. > I changed > the name convention for scipy build scripts. great, thanks! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From charlesr.harris at gmail.com Mon May 11 14:41:59 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 May 2009 12:41:59 -0600 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> Message-ID: On Mon, May 11, 2009 at 9:40 AM, Chris Colbert wrote: > at least I think this is strange behavior. > > When convolving an image with a large kernel, its know that its faster to > perform the operation as multiplication in the frequency domain. The below > code example shows that the results of my 2d filtering are shifted from the > expected value a distance 1/2 the width of the filter in both the x and y > directions. Can anyone explain why this occurs? I have been able to find the > answer in any of my image processing books. > > The code sample below is an artificial image of size (100, 100) full of > zeros, the center of the image is populated by a (10, 10) square of 1's. The > filter kernel is also a (10,10) square of 1's. The expected result of the > convolution would therefore be a peak at location (50,50) in the image. > Instead, I get (54, 54). The same shifting occurs regardless of the image > and filter (assuming the filter is symetric, so flipping isnt necessary). > Your kernel is offset and the result is expected. The kernel needs to be centered on the origin, aliasing will then put parts of it in all four corners of the array *before* you transform it. If you want to keep it simple you can phase shift the transform instead. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From sccolbert at gmail.com Mon May 11 14:45:27 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 11 May 2009 14:45:27 -0400 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> Message-ID: <7f014ea60905111145m259fe185m6a255d509c7e0cfa@mail.gmail.com> Ok, that makes sense. Thanks Chuck. On Mon, May 11, 2009 at 2:41 PM, Charles R Harris wrote: > > > On Mon, May 11, 2009 at 9:40 AM, Chris Colbert wrote: > >> at least I think this is strange behavior. >> >> When convolving an image with a large kernel, its know that its faster to >> perform the operation as multiplication in the frequency domain. 
The below >> code example shows that the results of my 2d filtering are shifted from the >> expected value a distance 1/2 the width of the filter in both the x and y >> directions. Can anyone explain why this occurs? I have been able to find the >> answer in any of my image processing books. >> >> The code sample below is an artificial image of size (100, 100) full of >> zeros, the center of the image is populated by a (10, 10) square of 1's. The >> filter kernel is also a (10,10) square of 1's. The expected result of the >> convolution would therefore be a peak at location (50,50) in the image. >> Instead, I get (54, 54). The same shifting occurs regardless of the image >> and filter (assuming the filter is symetric, so flipping isnt necessary). >> > > Your kernel is offset and the result is expected. The kernel needs to be > centered on the origin, aliasing will then put parts of it in all four > corners of the array *before* you transform it. If you want to keep it > simple you can phase shift the transform instead. > > Chuck > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon May 11 16:06:33 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 11 May 2009 22:06:33 +0200 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> Message-ID: <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> Hi Chris 2009/5/11 Chris Colbert : > When convolving an image with a large kernel, its know that its faster to > perform the operation as multiplication in the frequency domain. The below > code example shows that the results of my 2d filtering are shifted from the > expected value a distance 1/2 the width of the filter in both the x and y > directions. Can anyone explain why this occurs? I have been able to find the > answer in any of my image processing books. Just as a reminder, when doing this kind of filtering always pad correctly. Scipy does this in scipy.signal.fftconvolve I've also got some filtering implemented in http://mentat.za.net/cgi-bin/hgwebdir.cgi/filter/file/e97c0a6dd0ea/lpi_filter.py Regards St?fan From sccolbert at gmail.com Mon May 11 16:15:18 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 11 May 2009 16:15:18 -0400 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> Message-ID: <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> Stefan, Did I pad my example incorrectly? Both images were upped to the larger nearest power of 2 (256)... Does the scipy implementation do this differently? I thought that since FFTW support has been dropped, that scipy and numpy use the same routines... Thanks! Chris 2009/5/11 St?fan van der Walt > Hi Chris > > 2009/5/11 Chris Colbert : > > When convolving an image with a large kernel, its know that its faster to > > perform the operation as multiplication in the frequency domain. 
The > below > > code example shows that the results of my 2d filtering are shifted from > the > > expected value a distance 1/2 the width of the filter in both the x and y > > directions. Can anyone explain why this occurs? I have been able to find > the > > answer in any of my image processing books. > > Just as a reminder, when doing this kind of filtering always pad > correctly. Scipy does this in > > scipy.signal.fftconvolve > > I've also got some filtering implemented in > > > http://mentat.za.net/cgi-bin/hgwebdir.cgi/filter/file/e97c0a6dd0ea/lpi_filter.py > > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From stefan at sun.ac.za Mon May 11 16:25:53 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 11 May 2009 22:25:53 +0200 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> Message-ID: <9457e7c80905111325y34f867f9x806d37aba40f703d@mail.gmail.com> Hi Chris, If you have MxN and PxQ signals, you must pad them to shape M+P-1 x N+Q-1, in order to prevent circular convolution (i.e. values on the one end sliding back in at the other). Regards St?fan 2009/5/11 Chris Colbert : > Stefan, > > Did I pad my example incorrectly? Both images were upped to the larger > nearest power of 2 (256)... > > Does the scipy implementation do this differently? I thought that since FFTW > support has been dropped, that scipy and numpy use the same routines... > > Thanks! > > Chris From stefan at sun.ac.za Mon May 11 16:27:16 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 11 May 2009 22:27:16 +0200 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> Message-ID: <9457e7c80905111327h10e3e3f0u57a83825da68d68a@mail.gmail.com> 2009/5/11 Chris Colbert : > Does the scipy implementation do this differently? I thought that since FFTW > support has been dropped, that scipy and numpy use the same routines... Just to be clear, I was referring to scipy.signal.fftconvolve, not scipy's FFT (which is the same as NumPy's). Regards St?fan From sccolbert at gmail.com Mon May 11 17:21:22 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Mon, 11 May 2009 17:21:22 -0400 Subject: [Numpy-discussion] strange behavior convolving via fft In-Reply-To: <9457e7c80905111327h10e3e3f0u57a83825da68d68a@mail.gmail.com> References: <7f014ea60905110840n31188285w4b051225034e8e3@mail.gmail.com> <9457e7c80905111306o7ef34302x2f2f6760400d4050@mail.gmail.com> <7f014ea60905111315t4446c6f4r50e0cbd55877962f@mail.gmail.com> <9457e7c80905111327h10e3e3f0u57a83825da68d68a@mail.gmail.com> Message-ID: <7f014ea60905111421j6afebc2ewd89ad3111aad55de@mail.gmail.com> Thanks Stefan. 2009/5/11 St?fan van der Walt > 2009/5/11 Chris Colbert : > > Does the scipy implementation do this differently? 
I thought that since > FFTW > > support has been dropped, that scipy and numpy use the same routines... > > Just to be clear, I was referring to scipy.signal.fftconvolve, not > scipy's FFT (which is the same as NumPy's). > > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From taste_of_r at yahoo.com Mon May 11 17:33:10 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Mon, 11 May 2009 14:33:10 -0700 (PDT) Subject: [Numpy-discussion] List of arrays Message-ID: <705433.40748.qm@web43506.mail.sp1.yahoo.com> Hi, Francesc: ? The codes do not work. Guess you forgot something there. ? Thanks. ? Wei Su --- On Mon, 5/11/09, Francesc Alted wrote: From: Francesc Alted Subject: Re: [Numpy-discussion] List of arrays To: "Discussion of Numerical Python" Date: Monday, May 11, 2009, 10:40 AM A Monday 11 May 2009, Nils Wagner escrigu?: > Hi all, > > How can I convert a list of arrays into one array ? > > Nils > > >>> data > > [array([? 40. ,? 285.6,???45. ,? 285.3,???50. ,? 285.1, >???55. ,? 284.8]), array([? 60. ,? 284.5,???65. ,? 282..8, >???70. ,? 281.1,???75. ,? 280. ]), array([? 80. ,? 278..8, >???85. ,? 278.1,???90. ,? 277.4,???95. ,? 276.9]), array([ > 100. ,? 276.3,? 105. ,? 276.1,? 110. ,? 275.9,? 115. , >? 275.7]), array([ 120. ,? 275.5,? 125. ,? 275.2,? 130. , >? 274.8,? 135. ,? 274.5]), array([ 140. ,? 274.1,? 145. , >? 273.7,? 150. ,? 273.2,? 155. ,? 272.7]), array([ 160. , >? 272.2,? 165. ,? 272.1,? 170. ,? 272. ,? 175. ,? 271.8]), > array([ 180. ,? 271.6,? 185. ,? 271. ,? 190. ,? 270.3, >? 195. ,? 269.5]), array([ 200. ,? 268.5,? 205. ,? 267.4, >? 210. ,? 266.1,? 215. ,? 263.5]), array([ 220. ,? 260.1, >? 225. ,? 256.1,? 230. ,? 249.9,? 235. ,? 239.3]), array([ > 238.7,? 186.2,? 240.,? 160. ,? 245. ,? 119.7,? 250. , >? 111.3])] > > newdata=array([ 40. ,? 285.6,???45. ,? 285.3,???50. , >? 285.1, 55. ,? 284.8, 60. ,? 284.5,???65. ,? 282.8, ..., >? 111.3]) Try np.concatenate: In [9]: a = np.arange(10) In [10]: b = np.arange(10,20) In [11]: np.concatenate(l) Out[11]: array([ 0,? 1,? 2,? 3,? 4,? 5,? 6,? 7,? 8,? 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]) Hope that helps, -- Francesc Alted "One would expect people to feel threatened by the 'giant brains or machines that think'.? In fact, the frightening computer becomes less frightening if it is used only to simulate a familiar noncomputer." -- Edsger W. Dykstra ???"On the cruelty of really teaching computer science" _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From taste_of_r at yahoo.com Mon May 11 17:44:22 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Mon, 11 May 2009 14:44:22 -0700 (PDT) Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? Message-ID: <386820.30846.qm@web43514.mail.sp1.yahoo.com> ? ? Hi, All, ? Coming from SAS and R, this is probably the first thing I want to do now that I can convert my data into record arrays. But I could not find any clues after googling for a while. Any hint or suggestions will be great! ? Thanks a lot. ? Wei Su -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Chris.Barker at noaa.gov Mon May 11 17:52:29 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 11 May 2009 14:52:29 -0700 Subject: [Numpy-discussion] List of arrays In-Reply-To: <705433.40748.qm@web43506.mail.sp1.yahoo.com> References: <705433.40748.qm@web43506.mail.sp1.yahoo.com> Message-ID: <4A089E1D.7040900@noaa.gov> Wei Su wrote: > The codes do not work. Guess you forgot something there. l wasn't defined: In [16]: a = np.arange(10) In [17]: b = np.arange(5) In [20]: l = [a,b] In [21]: l Out[21]: [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4])] In [22]: np.concatenate(l) Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Mon May 11 18:03:23 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 11 May 2009 18:03:23 -0400 Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? In-Reply-To: <386820.30846.qm@web43514.mail.sp1.yahoo.com> References: <386820.30846.qm@web43514.mail.sp1.yahoo.com> Message-ID: On May 11, 2009, at 5:44 PM, Wei Su wrote: > > Coming from SAS and R, this is probably the first thing I want to do > now that I can convert my data into record arrays. But I could not > find any clues after googling for a while. Any hint or suggestions > will be great! That depends what you want, actually, ut this should get you started http://docs.scipy.org/doc/numpy/user/basics.rec.html Note the slight difference between a structured array (fields accessible as items) and a recarray (fields accessible as items and attributes). From taste_of_r at yahoo.com Mon May 11 18:18:03 2009 From: taste_of_r at yahoo.com (Wei Su) Date: Mon, 11 May 2009 15:18:03 -0700 (PDT) Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? Message-ID: <495569.40950.qm@web43501.mail.sp1.yahoo.com> ? Hi, Pierre: ? Thanks for the reply. I can now actually turn a big list into a record array. My question is actually how to join related record arrays in Python. This is done in SAS by MERGE and PROC SQL and by merge() in R. But I have no idea how to do it in Python. ? Thanks. ? Wei Su --- On Mon, 5/11/09, Pierre GM wrote: From: Pierre GM Subject: Re: [Numpy-discussion] How to merge or SQL join record arrays in Python? To: "Discussion of Numerical Python" Date: Monday, May 11, 2009, 10:03 PM On May 11, 2009, at 5:44 PM, Wei Su wrote: > > Coming from SAS and R, this is probably the first thing I want to do? > now that I can convert my data into record arrays. But I could not? > find any clues after googling for a while. Any hint or suggestions? > will be great! That depends what you want, actually, ut this should get you started http://docs.scipy.org/doc/numpy/user/basics.rec.html Note the slight difference between a structured array (fields? accessible as items) and a recarray (fields accessible as items and? attributes). _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From kxroberto at googlemail.com Mon May 11 18:22:22 2009 From: kxroberto at googlemail.com (Robert) Date: Tue, 12 May 2009 00:22:22 +0200 Subject: [Numpy-discussion] minimal numpy ? 
Message-ID: for use in binary distribution where I need only basics and fast startup/low memory footprint, I try to isolate the minimal ndarray type and what I need.. with "import numpy" or "import numpy.core.multiarray" almost the whole numpy package tree is imported, _dotblas etc. cxFreeze produces some 10MB numpy baggage (4MB zipped) yet when copying and using the multiarray DLL only, I can create arrays, but most things fail: >>> import multiarray >>> x=multiarray.array([5,6]) >>> x+x Traceback (most recent call last): File "", line 1, in TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'numpy.ndarray' while this works: >>> b=numpy.core.multiarray.array([3,9]) >>> b+b array([ 6, 18]) I added some things from core.__init__.py like this: import umath import _internal # for freeze programs import numerictypes as nt multiarray.set_typeDict(nt.sctypeDict) .. but the problem of failed type self-recognition remains. What is this? What to do? From pgmdevlist at gmail.com Mon May 11 18:36:08 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 11 May 2009 18:36:08 -0400 Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? In-Reply-To: <495569.40950.qm@web43501.mail.sp1.yahoo.com> References: <495569.40950.qm@web43501.mail.sp1.yahoo.com> Message-ID: <1A95C113-130D-4257-8EB7-9743A9D11514@gmail.com> On May 11, 2009, at 6:18 PM, Wei Su wrote: > > Thanks for the reply. I can now actually turn a big list into a > record array. My question is actually how to join related record > arrays in Python. This is done in SAS by MERGE and PROC SQL and by > merge() in R. But I have no idea how to do it in Python. OK. Try numpy.lib.recfunctions.join_by, and let me know if you have any problem. It's a rewritten version of an equivalent function in matplotlib (matplotlib.mlab.rec_join), that should work (maybe not, there hasn't been enough testing feedback to judge...) From jsseabold at gmail.com Mon May 11 18:36:14 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 11 May 2009 18:36:14 -0400 Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? In-Reply-To: <495569.40950.qm@web43501.mail.sp1.yahoo.com> References: <495569.40950.qm@web43501.mail.sp1.yahoo.com> Message-ID: On Mon, May 11, 2009 at 6:18 PM, Wei Su wrote: > > Hi, Pierre: > > Thanks for the reply. I can now actually turn a big list into a record > array. My question is actually how to join related record arrays in Python. > This is done in SAS by MERGE and PROC SQL and by merge() in R. But I have no > idea how to do it in Python. > > Thanks. > > Wei Su > Does merge_arrays in numpy.lib.recfunctions do what you want? Skipper From pwang at enthought.com Mon May 11 18:49:02 2009 From: pwang at enthought.com (Peter Wang) Date: Mon, 11 May 2009 17:49:02 -0500 Subject: [Numpy-discussion] How to include numpy headers in C across versions 1.1, 1.2, and 1.3 Message-ID: Hey guys, I've got a small C extension that uses isnan() and (in numpy 1.1) had been importing it from ufuncobject.h. I see that it has now moved into npy_math.h in 1.3. What is the best way to ensure that I can reliably include this function across versions 1.1, 1.2, and 1.3? (Checking NPY_FEATURE_VERSION won't work, since it did not change from 1.2 to 1.3, although the location of the function definition did.) My best idea right now is to simply do a numpy version check in my setup.py, and hard-code some macros at the top of my C extension to #include the appropriate headers for each version.
Any help or suggestions would be appreciated! Thanks, Peter From pgmdevlist at gmail.com Mon May 11 18:56:00 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 11 May 2009 18:56:00 -0400 Subject: [Numpy-discussion] How to merge or SQL join record arrays in Python? In-Reply-To: References: <495569.40950.qm@web43501.mail.sp1.yahoo.com> Message-ID: <2EB52889-49FF-4C07-BE31-8F1D614D3408@gmail.com> On May 11, 2009, at 6:36 PM, Skipper Seabold wrote: > On Mon, May 11, 2009 at 6:18 PM, Wei Su wrote: >> >> Hi, Pierre: >> >> Thanks for the reply. I can now actually turn a big list into a >> record >> array. My question is actually how to join related record arrays in >> Python.. >> This is done in SAS by MERGE and PROC SQL and by merge() in R. But >> I have no >> idea how to do it in Python. >> >> Thanks. >> >> Wei Su >> > > Does merge_arrays in numpy.lib.recfunctions do what you want? Probably not. merge_arrays is close to concatenate, and will raise an exception if 2 fields have the same name (in the flattened version). Testing R's merge(), join_by looks like the corresponding function. From charlesr.harris at gmail.com Mon May 11 20:11:30 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 11 May 2009 18:11:30 -0600 Subject: [Numpy-discussion] How to include numpy headers in C across versions 1.1, 1.2, and 1.3 In-Reply-To: References: Message-ID: On Mon, May 11, 2009 at 4:49 PM, Peter Wang wrote: > Hey guys, > > I've got a small C extension that uses isnan() and (in numpy 1.1) had > been importing it from ufuncobject.h. I see that it has now moved > into npy_math.h in 1.3. > > What is the best way to ensure that I can reliably include this > function across versions 1.1, 1.2, and 1.3? (Checking > NPY_FEATURE_VERSION won't work, since it did not change from 1.2 to > 1.3, although the location of the function definition did.) > > My best idea right now is to simply do a numpy version check in my > setup.py, and hard-code some macros at the top of my C extension to > #include the appropriate headers for each version. > > Any help or suggestions would be appreciated! > Oops, looks like we broke the ABI ;) For numpy itself we should fix things by including npy_math in ufuncobject.h. Looks like a fixup release might be in offing here. Otherwise there might be a workaround that would work. In 1.1.x it looks like isnan is defined in ufuncobject iff it is compiled on windows. Try #include ufuncobject.h #ifdef _MSC_VER #ifndef isnan #define isnan(x) ((x) != (x)) #endif #endif Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Mon May 11 23:00:58 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 12 May 2009 12:00:58 +0900 Subject: [Numpy-discussion] How to include numpy headers in C across versions 1.1, 1.2, and 1.3 In-Reply-To: References: Message-ID: <4A08E66A.9020202@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, May 11, 2009 at 4:49 PM, Peter Wang > wrote: > > Hey guys, > > I've got a small C extension that uses isnan() and (in numpy 1.1) had > been importing it from ufuncobject.h. I see that it has now moved > into npy_math.h in 1.3. > isnan is a C99 function (more exactly a macro), so we should not have defined it in the first place in public header, strictly speaking. The replacement in numpy 1.3 is npy_nan (and for every math function, replaced with the npy_ prefix). 
> My best idea right now is to simply do a numpy version check in my > setup.py, and hard-code some macros at the top of my C extension to > #include the appropriate headers for each version. > > Any help or suggestions would be appreciated! > You could just reproduce the logic used for numpy 1.3: check whether isnan is declared in math.h, and if not, use a replacement (the replacement are in npy_math.h - they are guaranteed to work on most platforms where numpy runs). It avoids hardcoding versions, which is often problematic if you need to support many platforms. cheers, David From david at ar.media.kyoto-u.ac.jp Mon May 11 23:42:06 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 12 May 2009 12:42:06 +0900 Subject: [Numpy-discussion] minimal numpy ? In-Reply-To: References: Message-ID: <4A08F00E.1040605@ar.media.kyoto-u.ac.jp> Hi Robert, Robert wrote: > for use in binary distribution where I need only basics and fast > startup/low memory footprint, I try to isolate the minimal ndarray > type and what I need.. > > with "import numpy" or "import numpy.core.multiarray" almost the > whole numpy package tree is imported, _dotblas etc. > cxFreeze produces some 10MB numpy baggage (4MB zipped) > Yes, we have some circular import going on. Ideally, numpy.core should be totally independent from the rest of the package. When built with -Os (or the equivalent with non posix compilers), numpy/core is ~ 800 kb zip compressed (2.4 Mb, uncompressed). > yet when copying and using the multiarray DLL only, I can create > arrays, but he most things fail: > > >>> import multiarray > >>> x=multiarray.array([5,6]) > >>> x+x > Traceback (most recent call last): > File "", line 1, in > TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and > 'numpy.ndarray' > I think you need at least umath to make this work: when doing import numpy.core.multiarray, you pull out the whole numpy (because import foo.bar induces import foo I believe), whereas import multiarray just imports the multiarray C extension. So my suggestion would be to modify numpy such as you can do import numpy after having removed most directories inside numpy. The big ones are distutils and f2py, which should already save 2.5 Mb and are not used at all in numpy itself. IIRC, the only problematic package is numpy.lib (we import numpy.lib in numpy.core IIRC). cheers, David From faltet at pytables.org Tue May 12 03:55:38 2009 From: faltet at pytables.org (Francesc Alted) Date: Tue, 12 May 2009 09:55:38 +0200 Subject: [Numpy-discussion] List of arrays In-Reply-To: <4A089E1D.7040900@noaa.gov> References: <705433.40748.qm@web43506.mail.sp1.yahoo.com> <4A089E1D.7040900@noaa.gov> Message-ID: <200905120955.38707.faltet@pytables.org> On Monday 11 May 2009 23:52:29 Christopher Barker wrote: > Wei Su wrote: > > The codes do not work. Guess you forgot something there. > > l wasn't defined: > > In [16]: a = np.arange(10) > > In [17]: b = np.arange(5) > > In [20]: l = [a,b] > > In [21]: l > Out[21]: [array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9]), array([0, 1, 2, 3, 4])] > > In [22]: np.concatenate(l) > Out[22]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 1, 2, 3, 4]) Oops. That's it. Thanks Chris! 
-- Francesc Alted From robince at gmail.com Tue May 12 09:12:12 2009 From: robince at gmail.com (Robin) Date: Tue, 12 May 2009 14:12:12 +0100 Subject: [Numpy-discussion] copy and paste arrays from matlab Message-ID: [crossposted to numpy-discussion and mlabwrap-user] Hi, I wrote a little utility class in Matlab that inherits from double and overloads the display function so you can easily print matlab arrays of arbitrary dimension in Numpy format for easy copy and pasting. I have to work a lot with other peoples code - and while mlabwrap and reading and writing is great, sometimes I find it easier and quicker just to copy and paste smaller arrays between interactive sessions. Anyway you put it in your Matlab path then you can do x = rand(2,3,4,5); a = array(x) You can specify the fprintf style format string either in the constructor or after: a = array(x,'%2.6f') a.format = '%2.2f' eg: >> x = rand(4,3,2); >> array(x) ans = array([[[2.071566461449581e-01, 3.501602151029837e-02], [1.589135260727248e-01, 3.766891927380323e-01], [8.757206127846399e-01, 7.259276565938600e-01]], [[7.570839415557700e-01, 3.974969411279816e-02], [8.109207856487061e-01, 5.043242527988604e-01], [6.351863794630047e-01, 7.013280585980169e-01]], [[8.863281096304466e-01, 9.885678912262633e-01], [4.765077527169480e-01, 7.634956792870943e-01], [9.728134909163066e-02, 4.588908258125032e-01]], [[4.722298594969571e-01, 6.861815984603373e-01], [1.162875322461844e-01, 4.887479677951201e-02], [9.084394562396312e-01, 5.822948089552498e-01]]]) It's a while since I've tried to do anything like this in Matlab and I must admit I found it pretty painful, so I hope it can be useful to someone else! I will try and do one for Python for copying and pasting to Matlab, but I'm expecting that to be a lot easier! Cheers Robin -------------- next part -------------- A non-text attachment was scrubbed... Name: array.m Type: application/octet-stream Size: 2104 bytes Desc: not available URL: From craig at brechmos.org Tue May 12 15:51:21 2009 From: craig at brechmos.org (brechmos) Date: Tue, 12 May 2009 12:51:21 -0700 (PDT) Subject: [Numpy-discussion] Matlab/Numpy index order Message-ID: <23509178.post@talk.nabble.com> I am very new to Numpy and relatively new to Python. I have used Matlab for 15+ years now. But, I am starting to lean toward using Numpy for all my work. One thing that I am not understanding is the order of data when read in from a file. Let's say I have a 256x256x150 uint16 dataset (MRI, 150 slices). In Matlab I would read it in as: >> fp=fopen(); >> d = fread(fp,256*256*150, 'int16'); >> fclose(fp); >> c = reshape(d, [256 256 150]); >> imagesc(c(:,:,1)); I am very used to having it read it in and doing the reshaping such that it is 256 rows by 256 columns by 150 slices. Now, Numpy, I can read in the binary data using fromfile (or open, read, close): In [85]: a = fromfile(, dtype='int16') In [86]: b = array(struct.unpack('<%dH'%(256*256*150), a)).reshape(150,256,256) So, in Numpy I have to reshape it so the "slices" are in the first dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look like Matlab, but... I don't understand why the index ordering is different between Matlab and Numpy. (It isn't a C/Fortran ordering thing, I don' think). Is the data access faster if I have b without the tranpose, or can I transpose it so it "looks" like Matlab without taking a hit when I do imshow( b[:,:,0] ). Any help for a Numpy newbie would be appreciated. 
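One small point on the reading step itself, before the ordering question taken up in the replies below: if the file really is raw little-endian int16 samples, the struct.unpack pass should not be needed, since np.fromfile already returns the decoded values. A minimal sketch (file name invented here) of the whole read:

In [1]: import numpy as np

In [2]: vol = np.fromfile('mri_volume.raw', dtype='<i2').reshape(150, 256, 256)

In [3]: vol.shape
Out[3]: (150, 256, 256)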
-- View this message in context: http://www.nabble.com/Matlab-Numpy-index-order-tp23509178p23509178.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From rmay31 at gmail.com Tue May 12 15:55:16 2009 From: rmay31 at gmail.com (Ryan May) Date: Tue, 12 May 2009 14:55:16 -0500 Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: <23509178.post@talk.nabble.com> References: <23509178.post@talk.nabble.com> Message-ID: On Tue, May 12, 2009 at 2:51 PM, brechmos wrote: > So, in Numpy I have to reshape it so the "slices" are in the first > dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to look > like Matlab, but... > > I don't understand why the index ordering is different between Matlab and > Numpy. (It isn't a C/Fortran ordering thing, I don' think). Actually, that's precisely the reason. > Is the data access faster if I have b without the tranpose, or can I > transpose it so it "looks" like Matlab without taking a hit when I do > imshow( b[:,:,0] ). > It's going to be faster to do it without the transpose. Besides, for numpy, that imshow becomes: imshow(b[0]) Which, IMHO, looks better than Matlab. Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma -------------- next part -------------- An HTML attachment was scrubbed... URL: From craig at brechmos.org Tue May 12 16:02:00 2009 From: craig at brechmos.org (brechmos) Date: Tue, 12 May 2009 13:02:00 -0700 (PDT) Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: References: <23509178.post@talk.nabble.com> Message-ID: <23509345.post@talk.nabble.com> Ah, hah. In [3]: c = b.reshape((256,256,150), order='F') Ok, I needed more coffee. If I do it this way (without the transpose), it should be as fast as c=b.reshape((150,256,256)), right? It is just changing the stride (or something like that)? Or is it going to be faster without changing the order? Thanks for the help. Ryan May-3 wrote: > > On Tue, May 12, 2009 at 2:51 PM, brechmos wrote: > >> So, in Numpy I have to reshape it so the "slices" are in the first >> dimension. Obviously, I can do a b.transpose( (1,2,0) ) to get it to >> look >> like Matlab, but... >> >> I don't understand why the index ordering is different between Matlab and >> Numpy. (It isn't a C/Fortran ordering thing, I don' think). > > > Actually, that's precisely the reason. > > >> Is the data access faster if I have b without the tranpose, or can I >> transpose it so it "looks" like Matlab without taking a hit when I do >> imshow( b[:,:,0] ). >> > > It's going to be faster to do it without the transpose. Besides, for > numpy, > that imshow becomes: > > imshow(b[0]) > > Which, IMHO, looks better than Matlab. > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- View this message in context: http://www.nabble.com/Matlab-Numpy-index-order-tp23509178p23509345.html Sent from the Numpy-discussion mailing list archive at Nabble.com. 
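On whether the order='F' reshape costs anything: reshaping the freshly read 1-D buffer does not copy, it only sets different strides. A small sketch with toy shapes (2x3x4 in place of 256x256x150) that can be checked interactively:

>>> import numpy as np
>>> raw = np.arange(24, dtype=np.int16)      # stand-in for the flat data read from disk
>>> a = raw.reshape(2, 3, 4)                 # C order: last axis varies fastest
>>> b = raw.reshape(4, 3, 2, order='F')      # Fortran order, Matlab-style layout
>>> np.all(a.transpose(2, 1, 0) == b)
True
>>> b.flags['OWNDATA'], b.strides            # a view on the same buffer, just new strides
(False, (2, 8, 24))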
From robert.kern at gmail.com Tue May 12 16:02:24 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 May 2009 15:02:24 -0500 Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: References: <23509178.post@talk.nabble.com> Message-ID: <3d375d730905121302k54d4472dh39568e6833f2caaa@mail.gmail.com> On Tue, May 12, 2009 at 14:55, Ryan May wrote: > On Tue, May 12, 2009 at 2:51 PM, brechmos wrote: >> >> So, in Numpy I have to reshape it so the "slices" are in the first >> dimension. ?Obviously, I can do a b.transpose( (1,2,0) ) to get it to look >> like Matlab, but... >> >> I don't understand why the index ordering is different between Matlab and >> Numpy. ?(It isn't a C/Fortran ordering thing, I don' think). > > Actually, that's precisely the reason. To expand on this comment, when Matlab was first released, it was basically just an interactive shell on top of FORTRAN routines from LAPACK and other linear algebra *PACKs. Consequently, it standardized on FORTRAN's column-major format. While numpy isn't really beholden to C's ordering for multidimensional arrays (numpy arrays are just blocks of strided memory, not x[i][j][k] arrays of pointers to arrays of pointers to arrays), we do want consistency with the equivalent nested Python lists, and that does imply row-major formatting by default. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Tue May 12 16:14:48 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 12 May 2009 16:14:48 -0400 Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: References: <23509178.post@talk.nabble.com> Message-ID: <030F44C8-36D1-4C82-ADEC-98D6C508AD83@cs.toronto.edu> On 12-May-09, at 3:55 PM, Ryan May wrote: > > It's going to be faster to do it without the transpose. Besides, > for numpy, > that imshow becomes: > > imshow(b[0]) > > Which, IMHO, looks better than Matlab. You're right, that is better, odd how I never thought of doing it like that. I've been stuck in my Matlab-esque world with dstack() as my default mental model of how images/matrices ought to be stacked. Am I right in thinking that b[0] is stored in a big contiguous block of memory, thus making the read marginally faster than slicing on the third? David From sccolbert at gmail.com Tue May 12 16:32:40 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Tue, 12 May 2009 16:32:40 -0400 Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: <030F44C8-36D1-4C82-ADEC-98D6C508AD83@cs.toronto.edu> References: <23509178.post@talk.nabble.com> <030F44C8-36D1-4C82-ADEC-98D6C508AD83@cs.toronto.edu> Message-ID: <7f014ea60905121332u3e1af243u9d645ecde4a7219c@mail.gmail.com> This is interesting. I have always done RGB imaging with numpy using arrays of shape (height, width, 3). In fact, this is the form that PIL gives when calling np.asarray() on a PIL image. It does seem more efficient to be able to do a[0],a[1],a[2] to get the R, G, and B channels respectively. This, obviously is not currently the case. Would it be better for me to switch to this way of doing things and/or work a patch for PIL so that the array is built in the form (3, height, width)? Chris On Tue, May 12, 2009 at 4:14 PM, David Warde-Farley wrote: > > On 12-May-09, at 3:55 PM, Ryan May wrote: > > > > It's going to be faster to do it without the transpose. 
Besides, > > for numpy, > > that imshow becomes: > > > > imshow(b[0]) > > > > Which, IMHO, looks better than Matlab. > > You're right, that is better, odd how I never thought of doing it like > that. I've been stuck in my Matlab-esque world with dstack() as my > default mental model of how images/matrices ought to be stacked. > > Am I right in thinking that b[0] is stored in a big contiguous block > of memory, thus making the read marginally faster than slicing on the > third? > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Tue May 12 16:47:51 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 12 May 2009 15:47:51 -0500 Subject: [Numpy-discussion] Matlab/Numpy index order In-Reply-To: <7f014ea60905121332u3e1af243u9d645ecde4a7219c@mail.gmail.com> References: <23509178.post@talk.nabble.com> <030F44C8-36D1-4C82-ADEC-98D6C508AD83@cs.toronto.edu> <7f014ea60905121332u3e1af243u9d645ecde4a7219c@mail.gmail.com> Message-ID: <3d375d730905121347w5d5940f5w8ad87a638cc4373d@mail.gmail.com> On Tue, May 12, 2009 at 15:32, Chris Colbert wrote: > This is interesting. > > I have always done RGB imaging with numpy using arrays of shape (height, > width, 3). In fact, this is the form that PIL gives when calling > np.asarray() on a PIL image. > > It does seem more efficient to be able to do a[0],a[1],a[2] to get the R, G, > and B channels respectively. This, obviously is not currently the case. It's not *that* much more efficient. > Would it be better for me to switch to this way of doing things? and/or work > a patch for PIL so that the array is built in the form (3, height, width)? Submitting a patch for PIL would neither be successful, nor worth your time. Not to mention breaking existing code. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From james.jackson at cern.ch Tue May 12 17:14:05 2009 From: james.jackson at cern.ch (James Jackson) Date: Tue, 12 May 2009 22:14:05 +0100 Subject: [Numpy-discussion] Problem building 1.3.0 on x86_64 platform Message-ID: <5DE54C2B-B7DD-4A6B-B870-66C0ACB4CB01@cern.ch> Hi, I am attempting (and failing...) to build numpy on a Scientific Linux 4.6 x86_64 (essentially RHEL I believe) with Python 2.4 (i386). The machine has the following Python RPM installed: python2.4-2.4-1pydotorg.i386 python2.4-tools-2.4-1pydotorg.i386 python2.4-devel-2.4-1pydotorg.i386 And also has gcc, g++ and f77 installed. Running python setup.py config appears to be successful, with lots of warnings abougt ATLAS, BLAS etc not being available (this is fine, I just want numpy for the array handling features used in matplotlib). 
However, the build fails (following doesn't show missing library warnings, as above): ------------------------------------------------------------------------------------------------------ running build running config_cc unifing config_cc, config, build_clib, build_ext, build commands -- compiler options running config_fc unifing config_fc, config, build_clib, build_ext, build commands -- fcompiler options running build_src building py_modules sources building library "npymath" sources building extension "numpy.core._sort" sources Generating build/src.linux-x86_64-2.4/numpy/core/include/numpy/config.h customize GnuFCompiler Found executable /usr/bin/g77 gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler gnu: no Fortran 90 compiler found gnu: no Fortran 90 compiler found customize GnuFCompiler using config C compiler: gcc -pthread -fno-strict-aliasing -DNDEBUG -g -O3 -Wall - Wstrict-prototypes -fPIC compile options: '-Inumpy/core/src -Inumpy/core/include -I/usr/include/ python2.4 -c' gcc: _configtest.c In file included from /usr/include/python2.4/Python.h:55, from _configtest.c:1: /usr/include/python2.4/pyport.h:612:2: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)." In file included from /usr/include/python2.4/Python.h:55, from _configtest.c:1: /usr/include/python2.4/pyport.h:612:2: #error "LONG_BIT definition appears wrong for platform (bad gcc/glibc config?)." failure. removing: _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 172, in ? setup_package() File "setup.py", line 165, in setup_package configuration=configuration ) File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ core.py", line 184, in setup return old_setup(**new_attr) File "/var/tmp/python2.4-2.4-root/usr/lib/python2.4/distutils/ core.py", line 149, in setup File "/var/tmp/python2.4-2.4-root/usr/lib/python2.4/distutils/ dist.py", line 946, in run_commands File "/var/tmp/python2.4-2.4-root/usr/lib/python2.4/distutils/ dist.py", line 966, in run_command File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ command/build.py", line 37, in run old_build.run(self) File "/var/tmp/python2.4-2.4-root/usr/lib/python2.4/distutils/ command/build.py", line 112, in run File "/usr/lib/python2.4/cmd.py", line 333, in run_command del help[cmd] File "/var/tmp/python2.4-2.4-root/usr/lib/python2.4/distutils/ dist.py", line 966, in run_command File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ command/build_src.py", line 130, in run self.build_sources() File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ command/build_src.py", line 147, in build_sources self.build_extension_sources(ext) File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ command/build_src.py", line 250, in build_extension_sources sources = self.generate_sources(sources, ext) File "/home/castormonitor/PythonLibs/numpy-1.3.0/numpy/distutils/ command/build_src.py", line 307, in generate_sources source = func(extension, build_dir) File "numpy/core/setup.py", line 286, in generate_config_h moredefs, ignored = cocache.check_types(config_cmd, ext, build_dir) File "numpy/core/setup.py", line 30, in check_types out = check_types(*a, **kw) File "numpy/core/setup.py", line 185, in check_types raise SystemError( SystemError: Cannot compiler 'Python.h'. Perhaps you need to install python-dev|python-devel. 
------------------------------------------------------------------------------------------------------ I note that the distribution directory being created is build/ src.linux-x86_64-2.4 - not i386. Can I force the architecture in the configure step, as it appears this would be the problem (hinted at by LONG_BIG wrong for platform error). Any hints gratefully received! Regards, James. From eads at soe.ucsc.edu Tue May 12 21:44:56 2009 From: eads at soe.ucsc.edu (Damian Eads) Date: Tue, 12 May 2009 18:44:56 -0700 Subject: [Numpy-discussion] Distance Formula on an Array In-Reply-To: References: Message-ID: <91b4b1ab0905121844i7b6a345em91abb0ee9f6c041f@mail.gmail.com> Hi Ian, Sorry for responding so late. I've been traveling and I'm just catching up on my e-mail now. This is easily accomplished with the cdist function, which computes the pairwise distances between two sets of vectors. In your case, one of the sets contains only a single vector. In [6]: scipy.spatial.distance.cdist([[0,4,0]],[[0,0,0],[0,1,0],[0,0,3]]) Out[6]: array([[ 4., 3., 5.]]) I hope this helps. Cheers, Damian On Sat, Apr 25, 2009 at 11:50 AM, Ian Mallett wrote: > Hi, > > I have an array sized n*3. Each three-component is a 3D position. Given > another 3D position, how is the distance between it and every > three-component in the array found with NumPy? > > So, for example, if the array is: > [[0,0,0],[0,1,0],[0,0,3]] > And the position is: > [0,4,0] > I need this array out: > [4,3,5] > (Just a simple Pythagorean Distance Formula) > > Ideas? > Thanks, > Ian > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > ----------------------------------------------------- Damian Eads Ph.D. Candidate Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From geometrian at gmail.com Tue May 12 21:46:36 2009 From: geometrian at gmail.com (Ian Mallett) Date: Tue, 12 May 2009 18:46:36 -0700 Subject: [Numpy-discussion] Distance Formula on an Array In-Reply-To: <91b4b1ab0905121844i7b6a345em91abb0ee9f6c041f@mail.gmail.com> References: <91b4b1ab0905121844i7b6a345em91abb0ee9f6c041f@mail.gmail.com> Message-ID: Thanks, but I don't want to make SciPy a dependency. NumPy is ok though. -------------- next part -------------- An HTML attachment was scrubbed... URL: From eads at soe.ucsc.edu Tue May 12 21:54:57 2009 From: eads at soe.ucsc.edu (Damian Eads) Date: Tue, 12 May 2009 18:54:57 -0700 Subject: [Numpy-discussion] Distance Formula on an Array In-Reply-To: References: <91b4b1ab0905121844i7b6a345em91abb0ee9f6c041f@mail.gmail.com> Message-ID: <91b4b1ab0905121854x2eae74besc46da280c5c7544f@mail.gmail.com> If you want the distance functionality without the rest of SciPy, you can download the scipy-cluster package (http://scipy-cluster.googlecode.com), which I still maintain. It does not depend on any other libraries except NumPy and is very easy to build. I understand if that's not an option for you. Cheers, Damian On Tue, May 12, 2009 at 6:46 PM, Ian Mallett wrote: > Thanks, but I don't want to make SciPy a dependency. NumPy is ok though. > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion ----------------------------------------------------- Damian Eads Ph.D. 
Candidate Jack Baskin School of Engineering, UCSC E2-489 1156 High Street Machine Learning Lab Santa Cruz, CA 95064 http://www.soe.ucsc.edu/~eads From geometrian at gmail.com Tue May 12 21:59:00 2009 From: geometrian at gmail.com (Ian Mallett) Date: Tue, 12 May 2009 18:59:00 -0700 Subject: [Numpy-discussion] Distance Formula on an Array In-Reply-To: <91b4b1ab0905121854x2eae74besc46da280c5c7544f@mail.gmail.com> References: <91b4b1ab0905121844i7b6a345em91abb0ee9f6c041f@mail.gmail.com> <91b4b1ab0905121854x2eae74besc46da280c5c7544f@mail.gmail.com> Message-ID: Hey, this looks cool! I may use it in the future. The problem has already been solved, though, and I don't think changing it is necessary. I'd also like to keep the dependencies (even packaged ones) to a minimum. -------------- next part -------------- An HTML attachment was scrubbed... URL: From cournape at gmail.com Tue May 12 22:06:21 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 13 May 2009 11:06:21 +0900 Subject: [Numpy-discussion] Problem building 1.3.0 on x86_64 platform In-Reply-To: <5DE54C2B-B7DD-4A6B-B870-66C0ACB4CB01@cern.ch> References: <5DE54C2B-B7DD-4A6B-B870-66C0ACB4CB01@cern.ch> Message-ID: <5b8d13220905121906t387695a5t1f669e55cbd872d@mail.gmail.com> On Wed, May 13, 2009 at 6:14 AM, James Jackson wrote: > > > I note that the distribution directory being created is build/ > src.linux-x86_64-2.4 - not i386. Can I force the architecture in the > configure step, as it appears this would be the problem (hinted at by > LONG_BIG wrong for platform error). You should make sure you are using the 32 bits python, so that 32 bits headers will be used. You could use the following to check: python -c "import platform; print platform.architecture()" cheers, David From cournape at gmail.com Tue May 12 23:02:02 2009 From: cournape at gmail.com (David Cournapeau) Date: Wed, 13 May 2009 12:02:02 +0900 Subject: [Numpy-discussion] Detecting C API mismatch (was Managing Python with NumPy and many external libraries on multiple Windows machines) In-Reply-To: <9457e7c80905100515t1f1b7a4ie97360f6f070a053@mail.gmail.com> References: <5b8d13220905092245j3cb89b77wa7794ab138678937@mail.gmail.com> <9457e7c80905100515t1f1b7a4ie97360f6f070a053@mail.gmail.com> Message-ID: <5b8d13220905122002h71f36476h44ef041ae2fad101@mail.gmail.com> 2009/5/10 St?fan van der Walt : > > I think the message "ABI version %%x of C-API" is unclear, maybe > simply use "ABI version %%x" on its own. > > The hash file can be loaded in one line with > > np.loadtxt('/tmp/dat.dat', usecols=(0, 2), dtype=[('api', 'S10'), > ('hash', 'S32')]) > > The rest looks good. Ok, I committed the branch to numpy trunk. thanks for the review, David From cimrman3 at ntc.zcu.cz Wed May 13 08:48:37 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Wed, 13 May 2009 14:48:37 +0200 Subject: [Numpy-discussion] building inplace with numpy.distutils? Message-ID: <4A0AC1A5.20404@ntc.zcu.cz> Hi (David)! I am evaluating numpy.distutils as a build/install system for my project - is it possible to build the extension modules in-place so that the project can be used without installing it? A pointer to documentation concerning this would be handy... Currently I use a regular Makefile for the build, which works quite well, but is not very portable and does not solve the package installation. Otherwise let me say that numpy.distutils work very well, much better than the plain old distutils. Best regards, r. 
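Back on the distance-formula thread for a moment: since Ian wants to stay NumPy-only, the same result cdist gives can be had with plain broadcasting. A small sketch using the numbers from his original example:

>>> import numpy as np
>>> pts = np.array([[0., 0., 0.], [0., 1., 0.], [0., 0., 3.]])
>>> pos = np.array([0., 4., 0.])
>>> np.sqrt(((pts - pos) ** 2).sum(axis=1))
array([ 4.,  3.,  5.])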
From robince at gmail.com Wed May 13 12:39:01 2009 From: robince at gmail.com (Robin) Date: Wed, 13 May 2009 17:39:01 +0100 Subject: [Numpy-discussion] copy and paste arrays from matlab In-Reply-To: References: Message-ID: [crossposted to numpy-discussion and mlabwrap-user] Hi, Please find attached Python code for the opposite direction - ie format Python arrays for copy and pasting into an interactive Matlab session. It doesn't look as nice because newlines are row seperators in matlab so I put everything on one line. Also theres no way to input >2D arrays in Matlab that I know of without using reshape. In [286]: from mmat import mmat In [289]: x = rand(4,2) In [290]: mmat(x,'%2.3f') [ 0.897 0.074 ; 0.005 0.174 ; 0.207 0.736 ; 0.453 0.111 ] In [287]: mmat(x,'%2.3f') reshape([ [ 0.405 0.361 0.609 ; 0.249 0.275 0.620 ; 0.740 0.754 0.699 ; 0.280 0.053 0.181 ] [ 0.796 0.114 0.720 ; 0.296 0.692 0.352 ; 0.218 0.894 0.818 ; 0.709 0.946 0.860 ] ],[ 4 3 2 ]) In [288]: mmat(x) reshape([ [ 4.046905655728e-01 3.605995195844e-01 6.089653771166e-01 ; 2.491999503702e-01 2.751880043180e-01 6.199629932480e-01 ; 7.401974485581e-01 7.537929345351e-01 6.991798908866e-01 ; 2.800494872019e-01 5.258468515210e-02 1.812706305994e-01 ] [ 7.957907133899e-01 1.144010574386e-01 7.203522053853e-01 ; 2.962977637560e-01 6.920657079182e-01 3.522371076632e-01 ; 2.181950954650e-01 8.936401263709e-01 8.177351741233e-01 ; 7.092517323839e-01 9.458774967489e-01 8.595104463863e-01 ] ],[ 4 3 2 ]) Hope someone else finds it useful. Cheers Robin On Tue, May 12, 2009 at 2:12 PM, Robin wrote: > [crossposted to numpy-discussion and mlabwrap-user] > > Hi, > > I wrote a little utility class in Matlab that inherits from double and > overloads the display function so you can easily print matlab arrays > of arbitrary dimension in Numpy format for easy copy and pasting. > > I have to work a lot with other peoples code - and while mlabwrap and > reading and writing is great, sometimes I find it easier and quicker > just to copy and paste smaller arrays between interactive sessions. > > Anyway you put it in your Matlab path then you can do > x = rand(2,3,4,5); > a = array(x) > > You can specify the fprintf style format string either in the > constructor or after: > a = array(x,'%2.6f') > a.format = '%2.2f' > > eg: >>> x = rand(4,3,2); >>> array(x) > ans = > > array([[[2.071566461449581e-01, 3.501602151029837e-02], > ? ? ? ?[1.589135260727248e-01, 3.766891927380323e-01], > ? ? ? ?[8.757206127846399e-01, 7.259276565938600e-01]], > > ? ? ? [[7.570839415557700e-01, 3.974969411279816e-02], > ? ? ? ?[8.109207856487061e-01, 5.043242527988604e-01], > ? ? ? ?[6.351863794630047e-01, 7.013280585980169e-01]], > > ? ? ? [[8.863281096304466e-01, 9.885678912262633e-01], > ? ? ? ?[4.765077527169480e-01, 7.634956792870943e-01], > ? ? ? ?[9.728134909163066e-02, 4.588908258125032e-01]], > > ? ? ? [[4.722298594969571e-01, 6.861815984603373e-01], > ? ? ? ?[1.162875322461844e-01, 4.887479677951201e-02], > ? ? ? ?[9.084394562396312e-01, 5.822948089552498e-01]]]) > > It's a while since I've tried to do anything like this in Matlab and I > must admit I found it pretty painful, so I hope it can be useful to > someone else! > > I will try and do one for Python for copying and pasting to Matlab, > but I'm expecting that to be a lot easier! > > Cheers > > Robin > -------------- next part -------------- A non-text attachment was scrubbed... 
Name: mmat.py Type: application/octet-stream Size: 1363 bytes Desc: not available URL: From josef.pktd at gmail.com Wed May 13 13:08:36 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 13 May 2009 13:08:36 -0400 Subject: [Numpy-discussion] copy and paste arrays from matlab In-Reply-To: References: Message-ID: <1cd32cbb0905131008j542e6201o25cd77ba097055f3@mail.gmail.com> On Wed, May 13, 2009 at 12:39 PM, Robin wrote: > [crossposted to numpy-discussion and mlabwrap-user] > > Hi, > > Please find attached Python code for the opposite direction - ie > format Python arrays for copy and pasting into an interactive Matlab > session. > > It doesn't look as nice because newlines are row seperators in matlab > so I put everything on one line. Also theres no way to input >2D > arrays in Matlab that I know of without using reshape. You could use ``...`` as row continuation, and the matlab help mentions ``cat`` to build multi dimensional arrays. But cat seems to require nesting for more than 3 dimensions, so is not really an improvement to reshape. >> C = cat(4, cat(3,[1,1;2,3],[1,2;3,3]),cat(3,[1,1;2,3],[1,2;3,3])); >> size(C) ans = 2 2 2 2 Thanks, it will be useful. Josef > > In [286]: from mmat import mmat > In [289]: x = rand(4,2) > In [290]: mmat(x,'%2.3f') > [ 0.897 0.074 ; ? 0.005 0.174 ; ? 0.207 0.736 ; ? 0.453 0.111 ] > In [287]: mmat(x,'%2.3f') > reshape([ ?[ 0.405 0.361 0.609 ; ? 0.249 0.275 0.620 ; ? 0.740 0.754 > 0.699 ; ? 0.280 0.053 0.181 ] [ 0.796 0.114 0.720 ; ? 0.296 0.692 > 0.352 ; ? 0.218 0.894 0.818 ; ? 0.709 0.946 0.860 ] ],[ 4 3 2 ]) > In [288]: mmat(x) > reshape([ ?[ 4.046905655728e-01 3.605995195844e-01 6.089653771166e-01 > ; ? 2.491999503702e-01 2.751880043180e-01 6.199629932480e-01 ; > 7.401974485581e-01 7.537929345351e-01 6.991798908866e-01 ; > 2.800494872019e-01 5.258468515210e-02 1.812706305994e-01 ] [ > 7.957907133899e-01 1.144010574386e-01 7.203522053853e-01 ; > 2.962977637560e-01 6.920657079182e-01 3.522371076632e-01 ; > 2.181950954650e-01 8.936401263709e-01 8.177351741233e-01 ; > 7.092517323839e-01 9.458774967489e-01 8.595104463863e-01 ] ],[ 4 3 2 > ]) > > Hope someone else finds it useful. > > Cheers > > Robin > > On Tue, May 12, 2009 at 2:12 PM, Robin wrote: >> [crossposted to numpy-discussion and mlabwrap-user] >> >> Hi, >> >> I wrote a little utility class in Matlab that inherits from double and >> overloads the display function so you can easily print matlab arrays >> of arbitrary dimension in Numpy format for easy copy and pasting. >> >> I have to work a lot with other peoples code - and while mlabwrap and >> reading and writing is great, sometimes I find it easier and quicker >> just to copy and paste smaller arrays between interactive sessions. >> >> Anyway you put it in your Matlab path then you can do >> x = rand(2,3,4,5); >> a = array(x) >> >> You can specify the fprintf style format string either in the >> constructor or after: >> a = array(x,'%2.6f') >> a.format = '%2.2f' >> >> eg: >>>> x = rand(4,3,2); >>>> array(x) >> ans = >> >> array([[[2.071566461449581e-01, 3.501602151029837e-02], >> ? ? ? ?[1.589135260727248e-01, 3.766891927380323e-01], >> ? ? ? ?[8.757206127846399e-01, 7.259276565938600e-01]], >> >> ? ? ? [[7.570839415557700e-01, 3.974969411279816e-02], >> ? ? ? ?[8.109207856487061e-01, 5.043242527988604e-01], >> ? ? ? ?[6.351863794630047e-01, 7.013280585980169e-01]], >> >> ? ? ? [[8.863281096304466e-01, 9.885678912262633e-01], >> ? ? ? ?[4.765077527169480e-01, 7.634956792870943e-01], >> ? ? ? 
?[9.728134909163066e-02, 4.588908258125032e-01]], >> >> ? ? ? [[4.722298594969571e-01, 6.861815984603373e-01], >> ? ? ? ?[1.162875322461844e-01, 4.887479677951201e-02], >> ? ? ? ?[9.084394562396312e-01, 5.822948089552498e-01]]]) >> >> It's a while since I've tried to do anything like this in Matlab and I >> must admit I found it pretty painful, so I hope it can be useful to >> someone else! >> >> I will try and do one for Python for copying and pasting to Matlab, >> but I'm expecting that to be a lot easier! >> >> Cheers >> >> Robin >> > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From strozzi2 at llnl.gov Wed May 13 16:18:45 2009 From: strozzi2 at llnl.gov (David J Strozzi) Date: Wed, 13 May 2009 13:18:45 -0700 Subject: [Numpy-discussion] (no subject) Message-ID: Hi, [You may want to edit the numpy homepage numpy.scipy.org to tell people they must subscribe to post, and adding a link to http://www.scipy.org/Mailing_Lists] Many of you probably know of the interpreter yorick by Dave Munro. As a Livermoron, I use it all the time. There are some built-in functions there, analogous to but above and beyond numpy's sum() and diff(), which are quite useful for common operations on gridded data. Of course one can write their own, but maybe they should be cleanly canonized? For instance: x = linspace(0,10,10) y = sin(x) It is common, say when integrating y(x), to take "point-centered" data and want to zone-center it: I = sum(zcen(y)*diff(x)) def zcen(x): return 0.5*(x[0:-1]+x[1:]) Besides zcen, yorick has builtins for "point centering", "un-zone centering," etc. Also, due to its slick syntax you can give these things as array "indexes": x(zcen), y(dif), z(:,sum,:) Just some thoughts, David Strozzi From pgmdevlist at gmail.com Wed May 13 18:35:21 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 May 2009 18:35:21 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: <4A061D33.7080704@hawaii.edu> References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> Message-ID: <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> All, I just committed (r6994) some modifications to numpy.ma.getdata (Eric Firing's patch) and to the ufunc wrappers that were too slow with large arrays. We're roughly 3 times faster than we used to, but still slower than the equivalent classic ufuncs (no surprise here). Here's the catch: it's basically cheating. I got rid of the pre- processing (where a mask was calculated depending on the domain and the input set to a filling value depending on this mask, before the actual computation). Instead, I force np.seterr(divide='ignore',invalid='ignore') before calling the ufunc on the .data part, then mask the invalid values (if any) and reset the corresponding entries in .data to the input. Finally, I reset the error status. All in all, we're still data-friendly, meaning that the value below a masked entry is the same as the input, but we can't say that values initially masked are discarded (they're used in the computation but reset to their initial value)... This playing around with the error status may (or may not, I don't know) cause some problems down the road. It's still faaar faster than computing the domain (especially _DomainSafeDivide) when the inputs are large... I'd be happy if you could give it a try and send some feedback. Cheers P. 
On May 9, 2009, at 8:17 PM, Eric Firing wrote: > Eric Firing wrote: > > Pierre, > > ... I pressed "send" too soon. There are test failures with the > patch I attached to my last message. I think the basic ideas are > correct, but evidently there are wrinkles to be worked out. Maybe > putmask() has to be used instead of where() (putmask is much faster) > to maintain the ability to do *= and similar, and maybe there are > other adjustments. Somehow, though, it should be possible to get > decent speed for simple multiplication and division; a 10x penalty > relative to ndarray operations is just too much. > > Eric > > >> Eli Bressert wrote: >>> Hi, >>> >>> I'm using masked arrays to compute large-scale standard deviation, >>> multiplication, gaussian, and weighted averages. At first I thought >>> using the masked arrays would be a great way to sidestep looping >>> (which it is), but it's still slower than expected. Here's a snippet >>> of the code that I'm using it for. >> [...] >>> # Like the spatial_weight section, this takes about 20 seconds >>> W = spatial_weight / Rho2 >>> >>> # Takes less than one second. >>> Ave = np.average(av_good,axis=1,weights=W) >>> >>> Any ideas on why it would take such a long time for processing? >> A part of the slowdown is what looks to me like unnecessary copying >> in _MaskedBinaryOperation.__call__. It is using getdata, which >> applies numpy.array to its input, forcing a copy. I think the copy >> is actually unintentional, in at least one sense, and possibly two: >> first, because the default argument of getattr is always evaluated, >> even if it is not needed; and second, because the call to np.array >> is used where np.asarray or equivalent would suffice. >> The first file attached below shows the kernprof in the case of >> multiplying two masked arrays, shape (100000,50), with no masked >> elements; 2/3 of the time is taken copying the data. >> Now, if there are actually masked elements in the arrays, it gets >> much worse: see the second attachment. The total time has >> increased by more than a factor of 3, and the culprit is >> numpy.which(), a very slow function. It looks to me like it is >> doing nothing useful at all; the numpy binary operation is still >> being executed for all elements, regardless of mask, contrary to >> the intention implied by the comment in the code. >> The third attached file has a patch that fixes the getdata problem >> and eliminates the which(). >> With this patch applied we get the profile in the 4th file, to be >> compared to the second profile. Much better. I am pretty sure it >> could still be sped up quite a bit, though. It looks like the >> masks are essentially being calculated twice for no good reason, >> but I don't completely understand all the mask considerations, so >> at this point I am not trying to fix that problem. >> Eric >>> Especially the spatial_weight and W variables? Would there be a >>> faster >>> way to do this? Or is there a way that numpy.std can process ignore >>> nan's when processing? 
>>> >>> Thanks, >>> >>> Eli Bressert >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> ------------------------------------------------------------------------ >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > From stefan at sun.ac.za Wed May 13 19:12:26 2009 From: stefan at sun.ac.za (Stéfan van der Walt) Date: Thu, 14 May 2009 01:12:26 +0200 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: <9457e7c80905131612m2cd32374gcca2b7da4e415e12@mail.gmail.com> Hi Pierre 2009/5/14 Pierre GM : > This playing around with the error status may (or may not, I don't > know) cause some problems down the road. I see the buildbot is complaining on SPARC. Not sure if it is complaining about your commit, but might be worth checking out nonetheless. Cheers Stéfan From dwf at cs.toronto.edu Wed May 13 19:18:21 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Wed, 13 May 2009 19:18:21 -0400 Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays In-Reply-To: References: Message-ID: <6D2BEA67-AB1B-4050-846D-072BA4D3118C@cs.toronto.edu> On 11-May-09, at 10:55 AM, Pauli Virtanen wrote: > Wonder why buildbot's 64-bit SPARC boxes don't see this if it's > something > connected to 64-bitness... Different endianness, maybe? That seems even weirder, honestly. David From mattknox.ca at gmail.com Wed May 13 19:36:30 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Wed, 13 May 2009 23:36:30 +0000 (UTC) Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: Hi Pierre, > Here's the catch: it's basically cheating. I got rid of the pre- > processing (where a mask was calculated depending on the domain and > the input set to a filling value depending on this mask, before the > actual computation). Instead, I force > np.seterr(divide='ignore',invalid='ignore') before calling the ufunc This isn't a thread-safe approach and could cause weird side effects in a multi-threaded application. I think modifying global options/variables inside any function where it generally wouldn't be expected by the user is a bad idea. - Matt From pgmdevlist at gmail.com Wed May 13 19:47:24 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 May 2009 19:47:24 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: On May 13, 2009, at 7:36 PM, Matt Knox wrote: >
Instead, I force >> np.seterr(divide='ignore',invalid='ignore') before calling the ufunc > > This isn't a thread safe approach and could cause wierd side effects > in a > multi-threaded application. I think modifying global options/ > variables inside > any function where it generally wouldn't be expected by the user is > a bad idea. Whine. I was afraid of something like that... 2 options, then: * We revert to computing a mask beforehand. That looks like the part that takes the most time w/ domained operations (according to Robert K's profiler. Robert, you deserve a statue for this tool). And that doesn't solve the pb of power, anyway: how do you compute the domain of power ? * We reimplement masked versions of the ufuncs in C. Won't happen from me anytime soon (this fall or winter, maybe...) Also, importing numpy.ma currently calls numpy.seterr(all='ignore') anyway... So that's a -1 from Matt. Anybody else ? From matthew.brett at gmail.com Wed May 13 19:53:23 2009 From: matthew.brett at gmail.com (Matthew Brett) Date: Wed, 13 May 2009 16:53:23 -0700 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: <1e2af89e0905131653p50d9b546l9423a7bc47a655fd@mail.gmail.com> Hi, > Whine. I was afraid of something like that... > 2 options, then: > * We revert to computing a mask beforehand. That looks like the part > that takes the most time w/ domained operations (according to Robert > K's profiler. Robert, you deserve a statue for this tool). And that > doesn't solve the pb of power, anyway: how do you compute the domain > of power ? > * We reimplement masked versions of the ufuncs in C. Won't happen from > me anytime soon (this fall or winter, maybe...) > Also, importing numpy.ma currently calls numpy.seterr(all='ignore') > anyway... I'm afraid I don't know the code at all, so count this as seems good, but I had the feeling that the change is good for speed but possibly bad for stability / readability? In that case it seems right not to do that, and wait until someone needs speed enough to write it in C or similar... Best, Matthew From robert.kern at gmail.com Wed May 13 19:53:37 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 13 May 2009 18:53:37 -0500 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: <3d375d730905131653l295eea56o8c3e49109c9bbfde@mail.gmail.com> On Wed, May 13, 2009 at 18:36, Matt Knox wrote: > Hi Pierre, > >> Here's the catch: it's basically cheating. I got rid of the pre- >> processing (where a mask was calculated depending on the domain and >> the input set to a filling value depending on this mask, before the >> actual computation). Instead, I ?force >> np.seterr(divide='ignore',invalid='ignore') before calling the ufunc > > This isn't a thread safe approach and could cause wierd side effects in a > multi-threaded application. I think modifying global options/variables inside > any function where it generally wouldn't be expected by the user is a bad idea. seterr() uses thread-local storage. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From mattknox.ca at gmail.com Wed May 13 20:07:03 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Thu, 14 May 2009 00:07:03 +0000 (UTC) Subject: [Numpy-discussion] =?utf-8?q?Are_masked_arrays_slower_for_process?= =?utf-8?q?ing_than=09ndarrays=3F?= References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> <3d375d730905131653l295eea56o8c3e49109c9bbfde@mail.gmail.com> Message-ID: > Robert Kern gmail.com> writes: > > seterr() uses thread-local storage. Oh. I stand corrected. Ignore my earlier objections then. > Pierre GM gmail.com> writes: > > Also, importing numpy.ma currently calls numpy.seterr(all='ignore') > anyway... hmm. While this doesn't affect me personally... I wonder if everyone is aware of this. Importing modules generally shouldn't have side effects either I would think. Has this always been the case for the masked array module? - Matt From pgmdevlist at gmail.com Wed May 13 20:22:42 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 13 May 2009 20:22:42 -0400 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> <3d375d730905131653l295eea56o8c3e49109c9bbfde@mail.gmail.com> Message-ID: On May 13, 2009, at 8:07 PM, Matt Knox wrote: > > hmm. While this doesn't affect me personally... I wonder if everyone > is aware of > this. Importing modules generally shouldn't have side effects either > I would > think. Has this always been the case for the masked array module? Well, can't remember, actually... I was indeed surprised to see it was there. I guess I must have added when working on the power section. I will get of rid on the next commit, this is clearly bad practice from my part. Bad, bad Pierre. From charlesr.harris at gmail.com Wed May 13 21:38:20 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 13 May 2009 19:38:20 -0600 Subject: [Numpy-discussion] FAIL: Test bug in reduceat with structured arrays In-Reply-To: <6D2BEA67-AB1B-4050-846D-072BA4D3118C@cs.toronto.edu> References: <6D2BEA67-AB1B-4050-846D-072BA4D3118C@cs.toronto.edu> Message-ID: On Wed, May 13, 2009 at 5:18 PM, David Warde-Farley wrote: > On 11-May-09, at 10:55 AM, Pauli Virtanen wrote: > > > Wonder why buildbot's 64-bit SPARC boxes don't see this if it's > > something > > connected to 64-bitness... > > Different endianness, maybe? That seems even weirder, honestly. > I managed an error on 32 bit fedora, but it was a oneoff sort of thing. I'll see if it shows again. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Wed May 13 21:42:42 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 14 May 2009 10:42:42 +0900 Subject: [Numpy-discussion] building inplace with numpy.distutils? In-Reply-To: <4A0AC1A5.20404@ntc.zcu.cz> References: <4A0AC1A5.20404@ntc.zcu.cz> Message-ID: <4A0B7712.4040600@ar.media.kyoto-u.ac.jp> Robert Cimrman wrote: > Hi (David)! > > I am evaluating numpy.distutils as a build/install system for my project > - is it possible to build the extension modules in-place so that the > project can be used without installing it? A pointer to documentation > concerning this would be handy... Currently I use a regular Makefile for > the build, which works quite well, but is not very portable and does not > solve the package installation. 
> > Otherwise let me say that numpy.distutils work very well, much better > than the plain old distutils. > In-place builds can be setup with the -i option: python setup.py build_ext -i I think it is a plain distutils option. cheers, David From glenn at tarbox.org Thu May 14 00:50:21 2009 From: glenn at tarbox.org (Glenn Tarbox, PhD) Date: Wed, 13 May 2009 21:50:21 -0700 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? Message-ID: I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy something or other (I will do more digging presently) I'm able to map large files and access all the elements unless I'm using slices so, for example: fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(10000000000,)) which is 1e10 doubles if you don't wanna count the zeros gives full access to a 75 GB memory image But when I do: fp[:] = 1.0 np.sum(fp) I get 1410065408.0 as the result Interestingly, I can do: fp[9999999999] = 3.0 and get the proper result stored and can read it back. So, it appears to me that slicing is limited to 32 bit values Trying to push it a bit, I tried making my own slice myslice = slice(1410065408, 9999999999) and using it like fp[myslice]=1.0 but it returns immediately having changed nothing. The slice creation "appears" to work in that I can get the values back out and all... but inside numpy it seems to get thrown out. My guess is that internally the python slice in 2.5 is 32 bit even on my 64 bit version of python / numpy. The good news is that it looks like the hard stuff (i.e. very large mmaped files) work... but slicing is, for some reason, limited to 32 bits. Am I missing something? -glenn -- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From cimrman3 at ntc.zcu.cz Thu May 14 01:57:20 2009 From: cimrman3 at ntc.zcu.cz (Robert Cimrman) Date: Thu, 14 May 2009 07:57:20 +0200 Subject: [Numpy-discussion] building inplace with numpy.distutils? In-Reply-To: <4A0B7712.4040600@ar.media.kyoto-u.ac.jp> References: <4A0AC1A5.20404@ntc.zcu.cz> <4A0B7712.4040600@ar.media.kyoto-u.ac.jp> Message-ID: <4A0BB2C0.4040001@ntc.zcu.cz> David Cournapeau wrote: > Robert Cimrman wrote: >> Hi (David)! >> >> I am evaluating numpy.distutils as a build/install system for my project >> - is it possible to build the extension modules in-place so that the >> project can be used without installing it? A pointer to documentation >> concerning this would be handy... Currently I use a regular Makefile for >> the build, which works quite well, but is not very portable and does not >> solve the package installation. >> >> Otherwise let me say that numpy.distutils work very well, much better >> than the plain old distutils. >> > > In-place builds can be setup with the -i option: > > python setup.py build_ext -i > > I think it is a plain distutils option. I have tried python setup.py build --inplace which did not work, and --help helped neither, that is why I asked here. But I was close :) thank you! r. From charlesr.harris at gmail.com Thu May 14 02:04:31 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 14 May 2009 00:04:31 -0600 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? 
In-Reply-To: References: Message-ID: On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD wrote: > I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy > something or other (I will do more digging presently) > > I'm able to map large files and access all the elements unless I'm using > slices > > so, for example: > > fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', > mode='r+', shape=(10000000000,)) > > which is 1e10 doubles if you don't wanna count the zeros > > gives full access to a 75 GB memory image > > But when I do: > > fp[:] = 1.0 > np.sum(fp) > > I get 1410065408.0 as the result > As doubles, that is more than 2**33 bytes, so I expect there is something else going on. How much physical memory/swap memory do you have? This could also be a python problem since python does the memmap. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From glenn at tarbox.org Thu May 14 02:22:41 2009 From: glenn at tarbox.org (Glenn Tarbox, PhD) Date: Wed, 13 May 2009 23:22:41 -0700 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: Message-ID: On Wed, May 13, 2009 at 11:04 PM, Charles R Harris < charlesr.harris at gmail.com> wrote: > > > On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD wrote: > >> I'm using the latest version of Sage (3.4.2) which is python 2.5 and numpy >> something or other (I will do more digging presently) >> >> I'm able to map large files and access all the elements unless I'm using >> slices >> >> so, for example: >> >> fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', >> mode='r+', shape=(10000000000,)) >> >> which is 1e10 doubles if you don't wanna count the zeros >> >> gives full access to a 75 GB memory image >> >> But when I do: >> >> fp[:] = 1.0 >> np.sum(fp) >> >> I get 1410065408.0 as the result >> > > As doubles, that is more than 2**33 bytes, so I expect there is something > else going on. How much physical memory/swap memory do you have? This could > also be a python problem since python does the memmap. > I've been working on some other things lately and that number seemed related to 2^32... now that I look more closely, I don't know where that number comes from. To your question, I have 32GB of RAM and virtually nothing else running... Top tells me I'm getting between 96% and 98% for this process which seems about right. Here's the thing. When I create the mmap file, I get the right number of bytes. I can, from what I can tell, update individual values within the array (I'm gonna bang on it a bit more with some other scripts) Its only when using slicing that things get strange (he says having not really done a more thorough test) Of course, I was assuming this is a 32 bit thing... but you're right... where did that result come from??? The other clue here is that when I create my own slice (as described above) it returns instantly... numpy doesn't throw an error but it doesn't do anything with the slice either. Since I'm IO bound anyways, maybe i'll just write a loop and see if I can't set all the values. The machine could use a little exercise anyways. -glenn > > Chuck > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From glenn at tarbox.org Thu May 14 04:31:45 2009 From: glenn at tarbox.org (Glenn Tarbox, PhD) Date: Thu, 14 May 2009 01:31:45 -0700 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: Message-ID: On Wed, May 13, 2009 at 11:22 PM, Glenn Tarbox, PhD wrote: > > > On Wed, May 13, 2009 at 11:04 PM, Charles R Harris < > charlesr.harris at gmail.com> wrote: > >> >> >> On Wed, May 13, 2009 at 10:50 PM, Glenn Tarbox, PhD wrote: >> >>> I'm using the latest version of Sage (3.4.2) which is python 2.5 and >>> numpy something or other (I will do more digging presently) >>> >>> I'm able to map large files and access all the elements unless I'm using >>> slices >>> >>> so, for example: >>> >>> fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', >>> mode='r+', shape=(10000000000,)) >>> >>> which is 1e10 doubles if you don't wanna count the zeros >>> >>> gives full access to a 75 GB memory image >>> >>> But when I do: >>> >>> fp[:] = 1.0 >>> np.sum(fp) >>> >>> I get 1410065408.0 as the result >>> >> >> As doubles, that is more than 2**33 bytes, so I expect there is something >> else going on. How much physical memory/swap memory do you have? This could >> also be a python problem since python does the memmap. >> > > I've been working on some other things lately and that number seemed > related to 2^32... now that I look more closely, I don't know where that > number comes from. > > To your question, I have 32GB of RAM and virtually nothing else running... > Top tells me I'm getting between 96% and 98% for this process which seems > about right. > > Here's the thing. When I create the mmap file, I get the right number of > bytes. I can, from what I can tell, update individual values within the > array (I'm gonna bang on it a bit more with some other scripts) > > Its only when using slicing that things get strange (he says having not > really done a more thorough test) > > Of course, I was assuming this is a 32 bit thing... but you're right... > where did that result come from??? > > The other clue here is that when I create my own slice (as described above) > it returns instantly... numpy doesn't throw an error but it doesn't do > anything with the slice either. > > Since I'm IO bound anyways, maybe i'll just write a loop and see if I can't > set all the values. The machine could use a little exercise anyways. > I ran the following test: import numpy as np size=10000000000 fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(size,)) for i in xrange(size): fp[i]=1.0 time np.sum(fp) 10000000000.0 Time: CPU 188.36 s, Wall: 884.33 s So, everything seems to be working and it kinda makes sense. The sum should be IO bound which it is. I didn't time the loop but it took a while (maybe 30 minutes) and it was compute bound. To make sure, I exited the program and ran everything but the initialization loop. import numpy as np size=10000000000 fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64', mode='r+', shape=(size,) time np.sum(fp) 10000000000.0 Time: CPU 180.02 s, Wall: 854.72 s I was a little surprised that it didn't take longer given almost half of the mmap'ed data should have been resident in the sum performed immediately after initialization, but since it needed to start at the beginning and only had the second half in memory, it makes sense So, it "appears" as though the mmap works but there's something strange with slices going on. 
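In the meantime, a chunked loop might be a usable workaround if it really is
a 32 bit limit somewhere in the slicing path. This is only a sketch (I
haven't run it against the full 75 GB file, and the chunk size is
arbitrary), but every individual slice stays well under 2**31 elements:

import numpy as np

size = 10000000000
chunk = 10**7   # keep every slice well below 2**31 elements

fp = np.memmap("/mnt/hdd/data/mmap/numpy1e10.mmap", dtype='float64',
               mode='r+', shape=(size,))

# fill in chunks instead of one giant fp[:] = 1.0
# (slices past the end just clip, so the last chunk needs no special casing)
for start in xrange(0, size, chunk):
    fp[start:start + chunk] = 1.0

# sum in chunks too, accumulating in a Python float
total = 0.0
for start in xrange(0, size, chunk):
    total += float(np.sum(fp[start:start + chunk]))
print total
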
-glenn > > >> >> Chuck >> >> >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > > > -- > Glenn H. Tarbox, PhD || 206-274-6919 > http://www.tarbox.org > -- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu May 14 04:43:35 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 14 May 2009 10:43:35 +0200 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: Message-ID: <20090514084335.GD32437@phare.normalesup.org> On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote: > I've been working on some other things lately and that number seemed > related to 2^32... now that I look more closely, I don't know where that > number comes from. Is your OS 64bit? Ga?l From glenn at tarbox.org Thu May 14 05:13:23 2009 From: glenn at tarbox.org (Glenn Tarbox, PhD) Date: Thu, 14 May 2009 02:13:23 -0700 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: <20090514084335.GD32437@phare.normalesup.org> References: <20090514084335.GD32437@phare.normalesup.org> Message-ID: On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote: > > I've been working on some other things lately and that number seemed > > related to 2^32... now that I look more closely, I don't know where > that > > number comes from. > > Is your OS 64bit? Yes, Ubuntu 9.04 x86_64 Linux hq2 2.6.28-11-server #42-Ubuntu SMP Fri Apr 17 02:45:36 UTC 2009 x86_64 GNU/Linux -glenn > > Ga?l > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From gael.varoquaux at normalesup.org Thu May 14 05:16:17 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 14 May 2009 11:16:17 +0200 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: <20090514084335.GD32437@phare.normalesup.org> Message-ID: <20090514091617.GF32437@phare.normalesup.org> On Thu, May 14, 2009 at 02:13:23AM -0700, Glenn Tarbox, PhD wrote: > On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux > <[1]gael.varoquaux at normalesup.org> wrote: > On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote: > > ? ? ?I've been working on some other things lately and that number > seemed > > ? ? ?related to 2^32... now that I look more closely, I don't know > where that > > ? ? ?number comes from. > Is your OS 64bit? > Yes, Ubuntu 9.04 x86_64 Hum, I am wondering: could it be that Sage has not been compiled in 64bits? That number '32' seems to me to point toward a 32bit pointer issue (I may be wrong). 
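A quick way to rule that out from the Sage prompt, using nothing but the
stdlib and numpy (plain introspection, nothing Sage specific):

import sys, struct, platform
import numpy as np

print sys.maxint                      # 9223372036854775807 on a 64-bit build, 2147483647 on 32-bit
print struct.calcsize("P") * 8        # size of a C pointer, in bits
print platform.architecture()[0]      # '64bit' or '32bit'
print np.dtype(np.intp).itemsize * 8  # width of numpy's index (intp) type

If any of those come back 32, that would explain a 2**31-ish truncation
somewhere.
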
Ga?l From sebastian.walter at gmail.com Thu May 14 05:19:03 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Thu, 14 May 2009 11:19:03 +0200 Subject: [Numpy-discussion] (no subject) In-Reply-To: References: Message-ID: On Wed, May 13, 2009 at 10:18 PM, David J Strozzi wrote: > Hi, > > [You may want to edit the numpy homepage numpy.scipy.org to tell > people they must subscribe to post, and adding a link to > http://www.scipy.org/Mailing_Lists] > > > Many of you probably know of the interpreter yorick by Dave Munro. As > a Livermoron, I use it all the time. Never heard of it... what does it do? By the sound of it, yorick is an interpreted language like Python. > There are some built-in > functions there, analogous to but above and beyond numpy's sum() and > diff(), which are quite useful for common operations on gridded data. > Of course one can write their own, but maybe they should be cleanly > canonized? > > For instance: > > x = linspace(0,10,10) > y = sin(x) > > It is common, say when integrating y(x), to take "point-centered" > data and want to zone-center it: > > I = sum(zcen(y)*diff(x)) > > def zcen(x): return 0.5*(x[0:-1]+x[1:]) > > Besides zcen, yorick has builtins for "point centering", "un-zone > centering," etc. Also, due to its slick syntax you can give these > things as array "indexes": > > x(zcen), y(dif), z(:,sum,:) > > > Just some thoughts, > David Strozzi > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From emerald.jasper at yahoo.co.uk Thu May 14 06:26:25 2009 From: emerald.jasper at yahoo.co.uk (Emerald Jasper) Date: Thu, 14 May 2009 10:26:25 +0000 (GMT) Subject: [Numpy-discussion] Non-linear optimization in python Message-ID: <342303.58705.qm@web23902.mail.ird.yahoo.com> Dear python user! Please, instruct me how to make non-linear optimization using numpy/simpy in python? Thank you very much in the advance, Emerald from Japan -------------- next part -------------- An HTML attachment was scrubbed... URL: From matthieu.brucher at gmail.com Thu May 14 06:51:29 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 14 May 2009 12:51:29 +0200 Subject: [Numpy-discussion] Non-linear optimization in python In-Reply-To: <342303.58705.qm@web23902.mail.ird.yahoo.com> References: <342303.58705.qm@web23902.mail.ird.yahoo.com> Message-ID: Hi, You have several choices: - using scipy.optimize - openopt - the old openopt scikit that contains a generic optimization framework. Did you try one of these? 2009/5/14 Emerald Jasper : > Dear python user! > Please, instruct me how to make non-linear optimization using numpy/simpy in > python? > Thank you very much in the advance, > Emerald > from Japan > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From pav at iki.fi Thu May 14 06:54:50 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 14 May 2009 10:54:50 +0000 (UTC) Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) References: Message-ID: Wed, 13 May 2009 13:18:45 -0700, David J Strozzi kirjoitti: [clip] > Many of you probably know of the interpreter yorick by Dave Munro. As a > Livermoron, I use it all the time. There are some built-in functions > there, analogous to but above and beyond numpy's sum() and diff(), which > are quite useful for common operations on gridded data. Of course one > can write their own, but maybe they should be cleanly canonized? +0 from me for zcen and other, having small functions probably won't hurt much [clip] > Besides zcen, yorick has builtins for "point centering", "un-zone > centering," etc. Also, due to its slick syntax you can give these > things as array "indexes": > > x(zcen), y(dif), z(:,sum,:) I think you can easily subclass numpy.ndarray to offer the same feature, see below. I don't know if we want to add this feature (indexing with callables) to the Numpy's fancy indexing itself. Thoughts? ----- import numpy as np import inspect class YNdarray(np.ndarray): """ A subclass of ndarray that implements Yorick-like indexing with functions. Beware: not adequately tested... """ def __getitem__(self, key_): if not isinstance(key_, tuple): key = (key_,) scalar_key = True else: key = key_ scalar_key = False key = list(key) # expand ellipsis manually while Ellipsis in key: j = key.index(Ellipsis) key[j:j+1] = [slice(None)] * (self.ndim - len(key)) # handle reducing or mutating callables arr = self new_key = [] real_axis = 0 for j, v in enumerate(key): if callable(v): arr2 = self._reduce_axis(arr, v, real_axis) new_key.extend([slice(None)] * (arr2.ndim - arr.ndim + 1)) arr = arr2 elif v is not None: real_axis += 1 new_key.append(v) else: new_key.append(v) # final get if scalar_key: return np.ndarray.__getitem__(arr, new_key[0]) else: return np.ndarray.__getitem__(arr, tuple(new_key)) def _reduce_axis(self, arr, func, axis): return func(arr, axis=axis) x = np.arange(2*3*4).reshape(2,3,4).view(YNdarray) # Now, assert np.allclose(x[np.sum,...], np.sum(x, axis=0)) assert np.allclose(x[:,np.sum,:], np.sum(x, axis=1)) assert np.allclose(x[:,:,np.sum], np.sum(x, axis=2)) assert np.allclose(x[:,np.sum,None,np.sum], x.sum(axis=1).sum(axis=1)[:,None]) def get(v, s, axis=0): """Index `v` with slice `s` along given axis""" ix = [slice(None)] * v.ndim ix[axis] = s return v[ix] def drop_last(v, axis=0): """Remove one element from given array in given dimension""" return get(v, slice(None, -1), axis) assert np.allclose(x[:,drop_last,:], x[:,:-1,:]) def zcen(v, axis=0): return .5*(get(v, slice(None,-1), axis) + get(v, slice(1,None), axis)) assert np.allclose(x[0,1,zcen], .5*(x[0,1,1:] + x[0,1,:-1])) def append_one(v, axis=0): """Append one element to the given array in given dimension, fill with ones""" new_shape = list(v.shape) new_shape[axis] += 1 v2 = np.empty(new_shape, dtype=v.dtype) get(v2, slice(None, -1), axis)[:] = v get(v2, -1, axis)[:] = 1 return v2 assert np.allclose(x[:,np.diff,0], np.diff(x.view(np.ndarray)[:,:,0], axis=1)) assert np.allclose(x[0,append_one,:], [[0,1,2,3], [4,5,6,7], [8,9,10,11], [1,1,1,1]]) assert np.allclose(x[:,append_one,0], [[0,4,8,1], [12,16,20,1]]) From emerald.jasper at 
yahoo.co.uk Thu May 14 07:06:37 2009
From: emerald.jasper at yahoo.co.uk (Emerald Jasper)
Date: Thu, 14 May 2009 11:06:37 +0000 (GMT)
Subject: [Numpy-discussion] Numpy-discussion Digest, Vol 32, Issue 39
Message-ID: <19251.97438.qm@web23908.mail.ird.yahoo.com>

Hi,
Actually, I am quite new in programming, so could you please send me the
syntax so that I can use in my research.
thank you so much.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: 

From glenn at tarbox.org Thu May 14 10:40:58 2009
From: glenn at tarbox.org (Glenn Tarbox, PhD)
Date: Thu, 14 May 2009 07:40:58 -0700
Subject: [Numpy-discussion] numpy slices limited to 32 bit values?
In-Reply-To: <20090514091617.GF32437@phare.normalesup.org>
References: <20090514084335.GD32437@phare.normalesup.org>
	<20090514091617.GF32437@phare.normalesup.org>
Message-ID: 

On Thu, May 14, 2009 at 2:16 AM, Gael Varoquaux <
gael.varoquaux at normalesup.org> wrote:

> On Thu, May 14, 2009 at 02:13:23AM -0700, Glenn Tarbox, PhD wrote:
> >    On Thu, May 14, 2009 at 1:43 AM, Gael Varoquaux
> >    <[1]gael.varoquaux at normalesup.org> wrote:
> >      On Thu, May 14, 2009 at 01:31:45AM -0700, Glenn Tarbox, PhD wrote:
> >      > I've been working on some other things lately and that number
> >      seemed
> >      > related to 2^32... now that I look more closely, I don't know
> >      where that
> >      > number comes from.
> >      Is your OS 64bit?
> >    Yes, Ubuntu 9.04 x86_64
>
> Hum, I am wondering: could it be that Sage has not been compiled in
> 64bits? That number '32' seems to me to point toward a 32bit pointer
> issue (I may be wrong).

The other tests I posted indicate everything else is working... For
example, np.sum(fp) runs over the full set of 1e10 doubles and seems to
work fine.

Also, while my first thought was about 2^32, Chuck Harris's reply kinda put
that to bed. Where 1410065408.0 comes from may involve e or PI (at least
that's how we reverse engineered answers when I was in college :-)

-glenn

>
> Gaël
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>

-- 
Glenn H. Tarbox, PhD || 206-274-6919
http://www.tarbox.org
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From gael.varoquaux at normalesup.org Thu May 14 10:54:06 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Thu, 14 May 2009 16:54:06 +0200 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: <20090514084335.GD32437@phare.normalesup.org> <20090514091617.GF32437@phare.normalesup.org> Message-ID: <20090514145406.GB3630@phare.normalesup.org> On Thu, May 14, 2009 at 07:40:58AM -0700, Glenn Tarbox, PhD wrote: > Hum, I am wondering: could it be that Sage has not been compiled in > 64bits? That number '32' seems to me to point toward a 32bit pointer > issue (I may be wrong). > The other tests I posted indicate everything else is working... For > example, np.sum(fp) runs over the full set of 1e10 doubes and seems to > work fine.? Correct. I had missed that. Ga?l From aisaac at american.edu Thu May 14 11:10:34 2009 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 14 May 2009 11:10:34 -0400 Subject: [Numpy-discussion] Non-linear optimization in python In-Reply-To: <342303.58705.qm@web23902.mail.ird.yahoo.com> References: <342303.58705.qm@web23902.mail.ird.yahoo.com> Message-ID: <4A0C346A.6010803@american.edu> On 5/14/2009 6:26 AM Emerald Jasper apparently wrote: > Please, instruct me how to make non-linear optimization using > numpy/simpy in python? http://www.scipy.org/SciPyPackages/Optimize http://www.scipy.org/Cookbook/OptimizationDemo1 hth, Alan Isaac From josef.pktd at gmail.com Thu May 14 11:45:57 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Thu, 14 May 2009 11:45:57 -0400 Subject: [Numpy-discussion] Non-linear optimization in python In-Reply-To: <4A0C346A.6010803@american.edu> References: <342303.58705.qm@web23902.mail.ird.yahoo.com> <4A0C346A.6010803@american.edu> Message-ID: <1cd32cbb0905140845j7ef6927byafb1c8b3a224707a@mail.gmail.com> On Thu, May 14, 2009 at 11:10 AM, Alan G Isaac wrote: > On 5/14/2009 6:26 AM Emerald Jasper apparently wrote: >> Please, instruct me how to make non-linear optimization using >> numpy/simpy in python? > > > http://www.scipy.org/SciPyPackages/Optimize note: the link there links to the old documentation, the current documentation for scipy.optimize is at http://docs.scipy.org/doc/scipy/reference/tutorial/optimize.html Josef > > http://www.scipy.org/Cookbook/OptimizationDemo1 > > hth, > Alan Isaac > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From strozzi2 at llnl.gov Thu May 14 14:52:17 2009 From: strozzi2 at llnl.gov (David J Strozzi) Date: Thu, 14 May 2009 11:52:17 -0700 Subject: [Numpy-discussion] yorick zcen et al In-Reply-To: References: Message-ID: Hi, Sorry to not include a subject last time, or provide more info on yorick: http://yorick.sourceforge.net/index.php official http://www.maumae.net/yorick/doc/index.php unofficial It's basically an interpreter geared toward doing numerics, with C-like syntax (but Fortran array indexing) and very elegant multi-D array syntax. It's also quite fast, has a parallel MPI package, and is used at LLNL to steer some big numerical codes where the 'guts' are in C. Also has a graphics package called gist. It's a free, open-source, bare-bones matlab, written by David Munro of LLNL starting in I think the late 1980s. 
At the risk of being glib, I find the current science tools in python (numpy, scipy, matplotlib) to be a good beta version of yorick :) Anyway, my point was there are a lot of standard grid gymnastics, of which numpy's diff() and sum() are examples, which don't seem to be in numpy, like yorick's zcen (zone centering) and pcen (point centering). Rather than everyone write their own, perhaps they could be included? Unless they're in numpy and I can't find where. Cheers Dave [Strozzi, not Munro] At 11:19 AM +0200 5/14/09, Sebastian Walter wrote: >On Wed, May 13, 2009 at 10:18 PM, David J Strozzi wrote: >> Hi, >> >> [You may want to edit the numpy homepage numpy.scipy.org to tell >> people they must subscribe to post, and adding a link to >> http:// www. scipy.org/Mailing_Lists] >> >> >> Many of you probably know of the interpreter yorick by Dave Munro. As >> a Livermoron, I use it all the time. > >Never heard of it... what does it do? By the sound of it, yorick is an >interpreted language like Python. > >> There are some built-in >> functions there, analogous to but above and beyond numpy's sum() and >> diff(), which are quite useful for common operations on gridded data. >> Of course one can write their own, but maybe they should be cleanly >> canonized? >> >> For instance: >> >> x = linspace(0,10,10) >> y = sin(x) >> >> It is common, say when integrating y(x), to take "point-centered" >> data and want to zone-center it: >> >> I = sum(zcen(y)*diff(x)) >> >> def zcen(x): return 0.5*(x[0:-1]+x[1:]) >> >> Besides zcen, yorick has builtins for "point centering", "un-zone >> centering," etc. Also, due to its slick syntax you can give these >> things as array "indexes": >> >> x(zcen), y(dif), z(:,sum,:) >> >> >> Just some thoughts, >> David Strozzi >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http:// mail.scipy.org/mailman/listinfo/numpy-discussion >> >_______________________________________________ >Numpy-discussion mailing list >Numpy-discussion at scipy.org >http:// mail.scipy.org/mailman/listinfo/numpy-discussion From aisaac at american.edu Thu May 14 18:29:48 2009 From: aisaac at american.edu (Alan G Isaac) Date: Thu, 14 May 2009 18:29:48 -0400 Subject: [Numpy-discussion] yorick zcen et al In-Reply-To: References: Message-ID: <4A0C9B5C.3050604@american.edu> On 5/14/2009 2:52 PM David J Strozzi apparently wrote: > At the risk of being glib, I find the current science tools in python > (numpy, scipy, matplotlib) to be a good beta version of yorick :) I suspect that is too glib for quite a number of reasons, but just to mention one aside from the very truncated list of science tools in Python, if you really prefer gist to Matplotlib for some reason (?), you can use Pygist. Which allows you to use it cross platform as well. Cheers, Alan Isaac From efiring at hawaii.edu Fri May 15 14:05:18 2009 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 15 May 2009 08:05:18 -1000 Subject: [Numpy-discussion] Are masked arrays slower for processing than ndarrays? In-Reply-To: References: <4A061A86.5020207@hawaii.edu> <4A061D33.7080704@hawaii.edu> <44ACCB4C-0BD6-4D82-B052-B779CEB6ED12@gmail.com> Message-ID: <4A0DAEDE.5070205@hawaii.edu> Pierre GM wrote: > On May 13, 2009, at 7:36 PM, Matt Knox wrote: >>> Here's the catch: it's basically cheating. I got rid of the pre- >>> processing (where a mask was calculated depending on the domain and >>> the input set to a filling value depending on this mask, before the >>> actual computation). 
Instead, I force >>> np.seterr(divide='ignore',invalid='ignore') before calling the ufunc >> This isn't a thread safe approach and could cause wierd side effects >> in a >> multi-threaded application. I think modifying global options/ >> variables inside >> any function where it generally wouldn't be expected by the user is >> a bad idea. > > Whine. I was afraid of something like that... > 2 options, then: > * We revert to computing a mask beforehand. That looks like the part > that takes the most time w/ domained operations (according to Robert > K's profiler. Robert, you deserve a statue for this tool). And that > doesn't solve the pb of power, anyway: how do you compute the domain > of power ? > * We reimplement masked versions of the ufuncs in C. Won't happen from > me anytime soon (this fall or winter, maybe...) Pierre, I have implemented masked versions of all binary ufuncs in C, using slight modifications of the numpy code generation machinery. I suspect that the way I have done it will not be the final method, and as of this moment I have just gotten it compiled and minimally checked (numpy imports, multiply_m(x, y, mask, out) puts x*y in out only where mask is False), but it is enough to make me think that we should be able to make it work in numpy.ma. In the present implementation, the masked versions of the ufuncs take a single mask, and they live in the same namespace as the unmasked versions. Masked versions of the unary ufuncs need to be added. Binary versions taking two masks and returning the resulting mask can also be added, but with considerably more effort, so I view that as something to be done only after all the wrinkles are worked out with the single-mask implementation. I view these masked versions of ufuncs as perfectly good standalone entities, which will enable a huge speedup in numpy.ma, but which may also be useful independently of masked arrays. I have made no attempt at this point to address domain checking, but certainly this needs to be moved into the C stage also, with separate ufuncs while we have only the single-mask binary ufuncs, but directly into the double-mask binary ufuncs whenever those are implemented. Example: In [1]:import numpy as np In [2]:x = np.arange(3) In [3]:y = np.arange(3) + 2 In [4]:x Out[4]:array([0, 1, 2]) In [5]:y Out[5]:array([2, 3, 4]) In [6]:mask = np.array([False, True, False]) In [7]:np.multiply_m(x, y, mask, x) Out[7]:array([0, 1, 8]) In [8]:x = np.arange(1000000, dtype=float) In [9]:y = np.sin(x) In [10]:mask = y > 0 In [11]:z = np.zeros_like(x) In [12]:timeit np.multiply(x,y,z) 100 loops, best of 3: 10.5 ms per loop In [13]:timeit np.multiply_m(x,y,mask,z) 100 loops, best of 3: 12 ms per loop Eric From david.huard at gmail.com Fri May 15 16:09:08 2009 From: david.huard at gmail.com (David Huard) Date: Fri, 15 May 2009 16:09:08 -0400 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: References: Message-ID: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Pauli and David, Can this indexing syntax do things that are otherwise awkward with the current syntax ? Otherwise, I'm not warm to the idea of making indexing more complex than it is. getv : this is useful but it feels a bit redundant with numpy.take. Is there a reason why take could not support slices ? Drop_last: I don't think it is worth cluttering the namespace with a one liner. append_one: A generalized stack method with broadcasting capability would be more useful in my opinion, eg. 
``np.stack(x, 1., axis=1)`` zcen: This is indeed useful, particulary in its nd form, that is, when it can be applied to multiples axes to find the center of a 2D or 3D cell in one call. I'm appending the version I use below. Cheers, David # This code is released in the public domain. import numpy as np def __midpoints_1d(a): """Return `a` linearly interpolated at the mid-points.""" return (a[:-1] + a[1:])/2. def midpoints(a, axis=None): """Return `a` linearly interpolated at the mid-points. Parameters ---------- a : array-like Input array. axis : int or None Axis along which the interpolation takes place. None stands for all axes. Returns ------- out : ndarray Input array interpolated at the midpoints along the given axis. Examples -------- >>> a = [1,2,3,4] >>> midpoints(a) array([1.5, 2.5, 3.5]) """ x = np.asarray(a) if axis is not None: return np.apply_along_axis(__midpoints_1d, axis, x) else: for i in range(x.ndim): x = midpoints(x, i) return x On Thu, May 14, 2009 at 6:54 AM, Pauli Virtanen wrote: > Wed, 13 May 2009 13:18:45 -0700, David J Strozzi kirjoitti: > [clip] > > Many of you probably know of the interpreter yorick by Dave Munro. As a > > Livermoron, I use it all the time. There are some built-in functions > > there, analogous to but above and beyond numpy's sum() and diff(), which > > are quite useful for common operations on gridded data. Of course one > > can write their own, but maybe they should be cleanly canonized? > > +0 from me for zcen and other, having small functions probably won't hurt > much > > [clip] > > Besides zcen, yorick has builtins for "point centering", "un-zone > > centering," etc. Also, due to its slick syntax you can give these > > things as array "indexes": > > > > x(zcen), y(dif), z(:,sum,:) > > I think you can easily subclass numpy.ndarray to offer the same feature, > see below. I don't know if we want to add this feature (indexing with > callables) to the Numpy's fancy indexing itself. Thoughts? > > ----- > > import numpy as np > import inspect > > class YNdarray(np.ndarray): > """ > A subclass of ndarray that implements Yorick-like indexing with > functions. > > Beware: not adequately tested... 
> """ > > def __getitem__(self, key_): > if not isinstance(key_, tuple): > key = (key_,) > scalar_key = True > else: > key = key_ > scalar_key = False > > key = list(key) > > # expand ellipsis manually > while Ellipsis in key: > j = key.index(Ellipsis) > key[j:j+1] = [slice(None)] * (self.ndim - len(key)) > > # handle reducing or mutating callables > arr = self > new_key = [] > real_axis = 0 > for j, v in enumerate(key): > if callable(v): > arr2 = self._reduce_axis(arr, v, real_axis) > new_key.extend([slice(None)] * (arr2.ndim - arr.ndim + 1)) > arr = arr2 > elif v is not None: > real_axis += 1 > new_key.append(v) > else: > new_key.append(v) > > # final get > if scalar_key: > return np.ndarray.__getitem__(arr, new_key[0]) > else: > return np.ndarray.__getitem__(arr, tuple(new_key)) > > > def _reduce_axis(self, arr, func, axis): > return func(arr, axis=axis) > > x = np.arange(2*3*4).reshape(2,3,4).view(YNdarray) > > # Now, > > assert np.allclose(x[np.sum,...], np.sum(x, axis=0)) > assert np.allclose(x[:,np.sum,:], np.sum(x, axis=1)) > assert np.allclose(x[:,:,np.sum], np.sum(x, axis=2)) > assert np.allclose(x[:,np.sum,None,np.sum], > x.sum(axis=1).sum(axis=1)[:,None]) > > def get(v, s, axis=0): > """Index `v` with slice `s` along given axis""" > ix = [slice(None)] * v.ndim > ix[axis] = s > return v[ix] > > def drop_last(v, axis=0): > """Remove one element from given array in given dimension""" > return get(v, slice(None, -1), axis) > > assert np.allclose(x[:,drop_last,:], x[:,:-1,:]) > > def zcen(v, axis=0): > return .5*(get(v, slice(None,-1), axis) + get(v, slice(1,None), axis)) > > assert np.allclose(x[0,1,zcen], .5*(x[0,1,1:] + x[0,1,:-1])) > > def append_one(v, axis=0): > """Append one element to the given array in given dimension, > fill with ones""" > new_shape = list(v.shape) > new_shape[axis] += 1 > v2 = np.empty(new_shape, dtype=v.dtype) > get(v2, slice(None, -1), axis)[:] = v > get(v2, -1, axis)[:] = 1 > return v2 > > assert np.allclose(x[:,np.diff,0], np.diff(x.view(np.ndarray)[:,:,0], > axis=1)) > assert np.allclose(x[0,append_one,:], [[0,1,2,3], > [4,5,6,7], > [8,9,10,11], > [1,1,1,1]]) > assert np.allclose(x[:,append_one,0], [[0,4,8,1], > [12,16,20,1]]) > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 15 16:47:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 May 2009 16:47:46 -0400 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Message-ID: <1cd32cbb0905151347v4e6cb201ld0f4cd0a998e3973@mail.gmail.com> On Fri, May 15, 2009 at 4:09 PM, David Huard wrote: > Pauli and David, > > Can this indexing syntax do things that are otherwise awkward with the > current syntax ? Otherwise, I'm not warm to the idea of making indexing more > complex than it is. > > getv : this is useful but it feels a bit redundant with numpy.take. Is there > a reason why take could not support slices ? > > Drop_last: I don't think it is worth cluttering the namespace with a one > liner. > > append_one: A generalized stack method with broadcasting capability would be > more useful in my opinion, eg. 
``np.stack(x, 1., axis=1)`` > > zcen: This is indeed useful, particulary in its nd form, that is, when it > can be applied to multiples axes to find the center of a 2D or 3D cell in > one call. I'm appending the version I use below. > > Cheers, > > David > > > # This code is released in the public domain. > import numpy as np > def __midpoints_1d(a): > ??? """Return `a` linearly interpolated at the mid-points.""" > ??? return (a[:-1] + a[1:])/2. > > def midpoints(a,? axis=None): > ??? """Return `a` linearly interpolated at the mid-points. > > ??? Parameters > ??? ---------- > ??? a : array-like > ????? Input array. > ??? axis : int or None > ????? Axis along which the interpolation takes place. None stands for all > axes. > > ??? Returns > ??? ------- > ??? out : ndarray > ????? Input array interpolated at the midpoints along the given axis. > > ??? Examples > ??? -------- > ??? >>> a = [1,2,3,4] > ??? >>> midpoints(a) > ??? array([1.5, 2.5, 3.5]) > ??? """ > ??? x = np.asarray(a) > ??? if axis is not None: > ??????? return np.apply_along_axis(__midpoints_1d,? axis, x) > ??? else: > ??????? for i in range(x.ndim): > ??????????? x = midpoints(x,? i) > ??????? return x > zcen is just a moving average, isn't it? For time series (1d), correlate works well, for 2d (nd?), there is >>> a= np.arange(5) >>> b = 1.0*a[:,np.newaxis]*np.arange(4) >>> ndimage.filters.correlate(b,0.5*np.ones((2,1)))[1:,1:] >>> ndimage.filters.correlate(b,0.5*np.ones((2,1)))[1:,1:] Josef From david.huard at gmail.com Fri May 15 17:39:31 2009 From: david.huard at gmail.com (David Huard) Date: Fri, 15 May 2009 17:39:31 -0400 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: <1cd32cbb0905151347v4e6cb201ld0f4cd0a998e3973@mail.gmail.com> References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> <1cd32cbb0905151347v4e6cb201ld0f4cd0a998e3973@mail.gmail.com> Message-ID: <91cf711d0905151439g6e746a7cr298bfd8b65909a0c@mail.gmail.com> Josef, You're right, you can see it as a moving average. For 1D, correlate(a, [5,.5]) yields what I expect but does not take an axis keyword. For the 2D case, I'm rather looking for >>> ndimage.filters.correlate(b,0.25*np.ones((2,2)))[1:,1:] So another one-liner... maybe not worth adding to the numpy namespace. David On Fri, May 15, 2009 at 4:47 PM, wrote: > On Fri, May 15, 2009 at 4:09 PM, David Huard > wrote: > > Pauli and David, > > > > Can this indexing syntax do things that are otherwise awkward with the > > current syntax ? Otherwise, I'm not warm to the idea of making indexing > more > > complex than it is. > > > > getv : this is useful but it feels a bit redundant with numpy.take. Is > there > > a reason why take could not support slices ? > > > > Drop_last: I don't think it is worth cluttering the namespace with a one > > liner. > > > > append_one: A generalized stack method with broadcasting capability would > be > > more useful in my opinion, eg. ``np.stack(x, 1., axis=1)`` > > > > zcen: This is indeed useful, particulary in its nd form, that is, when it > > can be applied to multiples axes to find the center of a 2D or 3D cell in > > one call. I'm appending the version I use below. > > > > Cheers, > > > > David > > > > > > # This code is released in the public domain. > > import numpy as np > > def __midpoints_1d(a): > > """Return `a` linearly interpolated at the mid-points.""" > > return (a[:-1] + a[1:])/2. > > > > def midpoints(a, axis=None): > > """Return `a` linearly interpolated at the mid-points. 
> > > > Parameters > > ---------- > > a : array-like > > Input array. > > axis : int or None > > Axis along which the interpolation takes place. None stands for all > > axes. > > > > Returns > > ------- > > out : ndarray > > Input array interpolated at the midpoints along the given axis. > > > > Examples > > -------- > > >>> a = [1,2,3,4] > > >>> midpoints(a) > > array([1.5, 2.5, 3.5]) > > """ > > x = np.asarray(a) > > if axis is not None: > > return np.apply_along_axis(__midpoints_1d, axis, x) > > else: > > for i in range(x.ndim): > > x = midpoints(x, i) > > return x > > > > zcen is just a moving average, isn't it? For time series (1d), > correlate works well, for 2d (nd?), there is > > >>> a= np.arange(5) > >>> b = 1.0*a[:,np.newaxis]*np.arange(4) > >>> ndimage.filters.correlate(b,0.5*np.ones((2,1)))[1:,1:] > >>> ndimage.filters.correlate(b,0.5*np.ones((2,1)))[1:,1:] > > Josef > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Fri May 15 19:16:21 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 15 May 2009 19:16:21 -0400 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: <91cf711d0905151439g6e746a7cr298bfd8b65909a0c@mail.gmail.com> References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> <1cd32cbb0905151347v4e6cb201ld0f4cd0a998e3973@mail.gmail.com> <91cf711d0905151439g6e746a7cr298bfd8b65909a0c@mail.gmail.com> Message-ID: <1cd32cbb0905151616i230f5b29gb3cb249e9bdb68d1@mail.gmail.com> On Fri, May 15, 2009 at 5:39 PM, David Huard wrote: > Josef, > > You're right, you can see it as a moving average. For 1D, correlate(a, > [5,.5]) yields what I expect but does not take an axis keyword. For the 2D > case, I'm rather looking for > >>>> ndimage.filters.correlate(b,0.25*np.ones((2,2)))[1:,1:] > > So another one-liner... maybe not worth adding to the numpy namespace. > I needed some practice with slice handling. This seems to work, but only minimally tested. It would be possible to extend it to axis being a tuple. ndimage is currently very fast if you give it the correct types and crashes for wrong function arguments. 
Josef def movmean(a, k=2, axis=None): '''moving average along axis for window length k''' a = np.asarray(a, dtype=float) # integers don't work because return type is also integer if axis is None: kernshape = [k]*a.ndim kern = 1/float(k)**a.ndim * np.ones(kernshape) #print kern cut = [slice(1,None,None)]*a.ndim return ndimage.filters.correlate(a,kern)[cut] else: kernshape = [1]*a.ndim kernshape[axis] = k kern = 1/float(k) * np.ones(kernshape) #print kern cut = [slice(None)]*a.ndim cut[axis] = slice(1,None,None) return ndimage.filters.correlate(a,kern)[cut] a = np.arange(5) b = 1.0*a[:,np.newaxis]*np.arange(1,6,2) c = b[:,:,np.newaxis]*a print movmean(a) print movmean(b) print "axis=1" print (b[:,:-1]+b[:,1:])/2 print movmean(b, axis=1) print "axis=0" print (b[:-1,:]+b[1:,:])/2 print movmean(b, axis=0) print (c[:-1,:,:]+c[1:,:,:])/2 print movmean(c, axis=0) From efiring at hawaii.edu Fri May 15 21:48:50 2009 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 15 May 2009 15:48:50 -1000 Subject: [Numpy-discussion] masked ufuncs in C: on github Message-ID: <4A0E1B82.9030006@hawaii.edu> http://www.mail-archive.com/numpy-discussion at scipy.org/msg17595.html Prompted by the thread above, I decided to see what it would take to implement ufuncs with masking in C. I described the result here: http://www.mail-archive.com/numpy-discussion at scipy.org/msg17698.html Now I am starting a new thread. The present state of the work is now in github: http://github.com/efiring/numpy-work/tree/cfastma I don't want to do any more until I have gotten some feedback from core developers. (And I would be delighted if someone wants to help with this, or take it over.) 1) The strategy I have started with is to make a full set of masked ufuncs alongside the existing ones, appending "_m" to their names. Only the binary ufuncs are implemented now, but the unary ufuncs can be handled similarly. Example: multiply(x, y, out) # present ufunc: no change multiply_m(x, y, mask, out) # new Where mask is True, the operation is skipped. 2) I have in mind the possibility of supporting two input masks and one output mask for binary operations. This would look like: multiply_mm(x, y, maskx, masky, out, outmask) outmask would be the logical_or of maskx and masky, and in the case of domained operations it would also be True where the arguments are outside the domain. This form would provide the fastest support for masked arrays, but would also take quite a bit more work, and would expand the namespace even more. I'm not sure it's worth it. 3) I have not yet taken any steps to modify numpy.ma to take advantage of the new ufuncs, but I think that will be quite simple. 4) Likewise, to save time, I am now just borrowing the regular ufunc docstrings. 5) No tests yet, Stefan. They can be added as soon as there is agreement on API and general strategy. 6) The present implementation is based on conceptually small modifications of the existing numpy code generation system. It required a lot of cut and paste, and yields a lot of nearly duplicated code. There may be better ways to do it--especially if it turns out it needs to be redone in some modified form. 
Eric From charlesr.harris at gmail.com Fri May 15 23:06:31 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 15 May 2009 21:06:31 -0600 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: <4A0E1B82.9030006@hawaii.edu> References: <4A0E1B82.9030006@hawaii.edu> Message-ID: On Fri, May 15, 2009 at 7:48 PM, Eric Firing wrote: > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17595.html > > Prompted by the thread above, I decided to see what it would take to > implement ufuncs with masking in C. I described the result here: > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17698.html > > Now I am starting a new thread. The present state of the work is now in > github: http://github.com/efiring/numpy-work/tree/cfastma > > I don't want to do any more until I have gotten some feedback from core > developers. (And I would be delighted if someone wants to help with > this, or take it over.) > Here the if ... continue needs to follow the declaration: if (*mp1) continue; float in1 = *(float *)ip1; float in2 = *(float *)ip2; *(float *)op1 = f(in1, in2); I think this would be better as if (!(*mp1)) { float in1 = *(float *)ip1; float in2 = *(float *)ip2; *(float *)op1 = f(in1, in2); } But since this is actually a ternary function, you could define new functions, something like double npy_add_m(double a, double b, double mask) { if (!mask) { return a + b; else { return a; } } And use the currently existing loops. Well, you would have to add one for ternary functions. Question, what about reduce? I don't think it is defined defined for ternary functions. Apart from reduce, why not just add, you already have the mask to tell you which results are invalid. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sat May 16 04:02:21 2009 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 15 May 2009 22:02:21 -1000 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: References: <4A0E1B82.9030006@hawaii.edu> Message-ID: <4A0E730D.4060902@hawaii.edu> Charles R Harris wrote: > > > On Fri, May 15, 2009 at 7:48 PM, Eric Firing > wrote: > > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17595.html > > Prompted by the thread above, I decided to see what it would take to > implement ufuncs with masking in C. I described the result here: > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17698.html > > Now I am starting a new thread. The present state of the work is now in > github: http://github.com/efiring/numpy-work/tree/cfastma > > I don't want to do any more until I have gotten some feedback from core > developers. (And I would be delighted if someone wants to help with > this, or take it over.) Chuck, Thanks very much for the quick response. > > > Here the if ... continue needs to follow the declaration: > > if (*mp1) continue; > float in1 = *(float *)ip1; > float in2 = *(float *)ip2; > *(float *)op1 = f(in1, in2); > I was surprised to see the declarations inside the loop in the first place (this certainly is not ANSI-C), and I was also pleasantly surprised that letting them be after the conditional didn't seem to bother the compiler at all. Maybe that is a gcc extension. 
> I think this would be better as > > if (!(*mp1)) { > float in1 = *(float *)ip1; > float in2 = *(float *)ip2; > *(float *)op1 = f(in1, in2); > } > I agree, and I thought of that originally--I think I did it with continue because it was easier to type it in, and it reduced the difference relative to the non-masked form. > > But since this is actually a ternary function, you could define new > functions, something like > > double npy_add_m(double a, double b, double mask) > { > if (!mask) { > return a + b; > else { > return a; > } > } > > And use the currently existing loops. Well, you would have to add one > for ternary functions. > That would incur the overhead of an extra function call for each element; I suspect it would slow it down a lot. My motivation is to make masked array overhead negligible, at least for medium to large arrays. Also your suggestion above does not handle the case where an output argument is supplied; it would modify the output under the mask. > Question, what about reduce? I don't think it is defined defined for > ternary functions. Apart from reduce, why not just add, you already have > the mask to tell you which results are invalid. > You mean just do the operation and ignore the results under the mask? This is the way Pierre originally did it, if I remember correctly, but fairly recently people started objecting that they didn't want to disturb values in an output argument under a mask. So now ma jumps through hoops to satisfy this requirement, and it is consequently slow. ufunc methods like reduce are supported only for the binary ops with one output, so they are automatically unavailable for the masked versions. To get around this would require subclassing the ufunc to make a masked version. This is probably the best way to go, but I suspect it is much more complicated than I can handle in the amount of time I can spend. So maybe my proposed masked ufuncs are a slight abuse of the ufunc concept, or at least its present implementation. Unary functions with a mask, which I have not yet tried to implement, would actually be binary, so they would have reduce etc. methods that would not make any sense. Is there a way to disable (remove) the methods in this case? Eric > Chuck > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From robert.kern at gmail.com Sat May 16 04:04:46 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 May 2009 03:04:46 -0500 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: <4A0E730D.4060902@hawaii.edu> References: <4A0E1B82.9030006@hawaii.edu> <4A0E730D.4060902@hawaii.edu> Message-ID: <3d375d730905160104g3eedb936v2d97be231b4218e0@mail.gmail.com> On Sat, May 16, 2009 at 03:02, Eric Firing wrote: > Charles R Harris wrote: >> Here the if ... continue needs to follow the declaration: >> >> ? ? ? ? if (*mp1) continue; >> ? ? ? ? float in1 = *(float *)ip1; >> ? ? ? ? float in2 = *(float *)ip2; >> ? ? ? ? *(float *)op1 = f(in1, in2); >> > > I was surprised to see the declarations inside the loop in the first > place (this certainly is not ANSI-C), and I was also pleasantly > surprised that letting them be after the conditional didn't seem to > bother the compiler at all. ?Maybe that is a gcc extension. I believe they are a part of C99. 
-- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From matthieu.brucher at gmail.com Sat May 16 04:06:26 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Sat, 16 May 2009 10:06:26 +0200 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: <3d375d730905160104g3eedb936v2d97be231b4218e0@mail.gmail.com> References: <4A0E1B82.9030006@hawaii.edu> <4A0E730D.4060902@hawaii.edu> <3d375d730905160104g3eedb936v2d97be231b4218e0@mail.gmail.com> Message-ID: 2009/5/16 Robert Kern : > On Sat, May 16, 2009 at 03:02, Eric Firing wrote: >> Charles R Harris wrote: > >>> Here the if ... continue needs to follow the declaration: >>> >>> ? ? ? ? if (*mp1) continue; >>> ? ? ? ? float in1 = *(float *)ip1; >>> ? ? ? ? float in2 = *(float *)ip2; >>> ? ? ? ? *(float *)op1 = f(in1, in2); >>> >> >> I was surprised to see the declarations inside the loop in the first >> place (this certainly is not ANSI-C), and I was also pleasantly >> surprised that letting them be after the conditional didn't seem to >> bother the compiler at all. ?Maybe that is a gcc extension. > > I believe they are a part of C99. Exactly (so not supported by Visual Studio). Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From jorgesmbox-ml at yahoo.es Sat May 16 04:42:50 2009 From: jorgesmbox-ml at yahoo.es (jorgesmbox-ml at yahoo.es) Date: Sat, 16 May 2009 08:42:50 +0000 (GMT) Subject: [Numpy-discussion] Question about slicing Message-ID: <519625.91803.qm@web27901.mail.ukl.yahoo.com> Hi, I am just starting with numpy, pyhton and related others. I work on image processing basically. Now, my question is: what is the expected behaviour when slicing a view of an array? The following example might give some background on what I tried to do and the results obatined (which I don't understand): I read an image with PIL but, for whatever reason (different conventions I suppose), it comes upside down. This doesn't change when (I don't know the exact term for this) transforming the image to ndarray with 'array(img)'. I don't feel comfortable working with upside down images, so this had to be fixed. I tried to be smart and avoid copying the whole image: aimg = array(img)[::-1] and it worked!, but I am interested actually in sub-regions of this image, so the next I did was: roi = aimg[10:20,45:50,:] And to my surprise the result was like if I was slicing the original, upside down, image instead of aimg. Can someone explain me what's going on here? I searched and looked at the documentation but I couldn't find an answer. Maybe I am not looking properly. Is the only way to turn the image to perform a copy? Thanks, Jorge From emmanuelle.gouillart at normalesup.org Sat May 16 05:22:45 2009 From: emmanuelle.gouillart at normalesup.org (Emmanuelle Gouillart) Date: Sat, 16 May 2009 11:22:45 +0200 Subject: [Numpy-discussion] Question about slicing In-Reply-To: <519625.91803.qm@web27901.mail.ukl.yahoo.com> References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> Message-ID: <20090516092245.GA27596@phare.normalesup.org> Hi Jorge, > roi = aimg[10:20,45:50,:] are you working with 3-D images? I didn't know PIL was able to handle 3D images. 
I wasn't able to reproduce the behavior you observed with a simple example: In [20]: base = np.arange(25).reshape((5,5)) In [21]: base Out[21]: array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19], [20, 21, 22, 23, 24]]) In [22]: flip = base[::-1] In [23]: flip Out[23]: array([[20, 21, 22, 23, 24], [15, 16, 17, 18, 19], [10, 11, 12, 13, 14], [ 5, 6, 7, 8, 9], [ 0, 1, 2, 3, 4]]) In [24]: flip[2:4,2:4] Out[24]: array([[12, 13], [ 7, 8]]) which is what you expect... I also tried the same manipulations as you do starting from a PIL image object, but I also got what I expected (and my image was not flipped vertically by PIL or when transformed into an array). It is quite weird BTW that your images are flipped. How do you visualize PIL image (Image.Image.show?)and arrays (pylab.imshow?) ? Hope someone can help you more than I did :D Cheers, Emmanuelle On Sat, May 16, 2009 at 08:42:50AM +0000, jorgesmbox-ml at yahoo.es wrote: > Hi, > I am just starting with numpy, pyhton and related others. I work on image processing basically. Now, my question is: what is the expected behaviour when slicing a view of an array? The following example might give some background on what I tried to do and the results obatined (which I don't understand): > I read an image with PIL but, for whatever reason (different conventions I suppose), it comes upside down. This doesn't change when (I don't know the exact term for this) transforming the image to ndarray with 'array(img)'. I don't feel comfortable working with upside down images, so this had to be fixed. I tried to be smart and avoid copying the whole image: > aimg = array(img)[::-1] > and it worked!, but I am interested actually in sub-regions of this image, so the next I did was: > roi = aimg[10:20,45:50,:] > And to my surprise the result was like if I was slicing the original, upside down, image instead of aimg. Can someone explain me what's going on here? I searched and looked at the documentation but I couldn't find an answer. Maybe I am not looking properly. Is the only way to turn the image to perform a copy? > Thanks, > Jorge > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From pav at iki.fi Sat May 16 05:29:23 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 16 May 2009 09:29:23 +0000 (UTC) Subject: [Numpy-discussion] Question about slicing References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> Message-ID: Sat, 16 May 2009 08:42:50 +0000, jorgesmbox-ml wrote: [clip] > I don't feel comfortable working with upside down images, so this had to > be fixed. I tried to be smart and avoid copying the whole image: > > aimg = array(img)[::-1] Note that here a copy is made. You can use `asarray` instead of `array` if you want to avoid making a copy. > and it worked!, but I am interested actually in sub-regions of this > image, so the next I did was: > > roi = aimg[10:20,45:50,:] > > And to my surprise the result was like if I was slicing the original, > upside down, image instead of aimg. Can someone explain me what's going > on here? Sounds impossible, and I don't see this: In [1]: import Image In [2]: img = Image.open('foo.png') In [3]: aimg = array(img) In [4]: imshow(aimg) Out[4]: In [5]: imshow(aimg[10:320,5:150]) Out[5]: The image is here right-side up, both in full and the slice (since imshow flips it). 
Also, In [6]: aimg = array(img)[::-1] In [7]: imshow(aimg[10:320,5:150]) Out[7]: Now, the image is upside down, both in full and in the slice. I think you should re-check that you are doing what you think you are doing. Preparing a self-contained code example could help here, at least this would make pinpointing where the error is more easy. -- Pauli Virtanen From pav at iki.fi Sat May 16 05:41:12 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 16 May 2009 09:41:12 +0000 (UTC) Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Message-ID: Fri, 15 May 2009 16:09:08 -0400, David Huard wrote: > Can this indexing syntax do things that are otherwise awkward with the > current syntax ? Otherwise, I'm not warm to the idea of making indexing > more complex than it is. I think the indexing with callables is more syntax sugar for nested `func(v, axis=n)` than anything else. It may be more useful interactive use than calling the functions, though. Compare: x[:,sum,mean] mean(sum(x, axis=1), axis=1) It might be useful also for broadcasting functions operating on 1D vectors to the whole array, but here the semantics start getting muddier. [clip] > getv > drop_last > append_one > zcen These, apart from zcen, were just some demo functions I pulled out from nowhere. The actual list of Yorick functions relevant here appears to be here: http://yorick.sourceforge.net/manual/yorick_46.php#SEC46 http://yorick.sourceforge.net/manual/yorick_47.php#SEC47 I must say that I don't see many functions missing in Numpy... David (Strozzi): are these the functions you meant? Are there more? -- Pauli Virtanen From david at ar.media.kyoto-u.ac.jp Sat May 16 05:23:48 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 16 May 2009 18:23:48 +0900 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: <4A0E730D.4060902@hawaii.edu> References: <4A0E1B82.9030006@hawaii.edu> <4A0E730D.4060902@hawaii.edu> Message-ID: <4A0E8624.3090001@ar.media.kyoto-u.ac.jp> Eric Firing wrote: > That would incur the overhead of an extra function call for each > element; I suspect it would slow it down a lot. My motivation is to make > masked array overhead negligible, at least for medium to large arrays. > You can use inline in that case - starting with numpy 1.3.0, inline can be used in C code in a portable way (it is a macro which points to compiler specific inline if the C99 inline is not available so it works even on broken compilers). cheers, David From kxroberto at googlemail.com Sat May 16 06:23:23 2009 From: kxroberto at googlemail.com (Robert) Date: Sat, 16 May 2009 12:23:23 +0200 Subject: [Numpy-discussion] minimal numpy ? In-Reply-To: <4A08F00E.1040605@ar.media.kyoto-u.ac.jp> References: <4A08F00E.1040605@ar.media.kyoto-u.ac.jp> Message-ID: David Cournapeau wrote: > Robert wrote: >> for use in binary distribution where I need only basics and fast >> startup/low memory footprint, I try to isolate the minimal ndarray >> type and what I need.. >> [..] > > I think you need at least umath to make this work: when doing import > numpy.core.multiarray, you pull out the whole numpy (because import > foo.bar induces import foo I believe), whereas import multiarray just > imports the multiarray C extension. > > So my suggestion would be to modify numpy such as you can do import > numpy after having removed most directories inside numpy. 
The big ones > are distutils and f2py, which should already save 2.5 Mb and are not > used at all in numpy itself. IIRC, the only problematic package is > numpy.lib (we import numpy.lib in numpy.core IIRC). > Did like this - keeping a /numpy/core folder structure. In attachment is a README-minimal-numpy.txt Maybe thats interesting for many users / inclusion somewhere in the docs. Result is: some 300kB compressed. And startup very fast. Strange: most imports in the package are relative - which is good for (flat) repackaging. Just one absolutue "from numpy.core.multiarray import ..." in a py file. Yet the 2 remaining DLL's obviously contain absolute imports of each other. Just because of that it is not possible to have the "minimal numpy" in a separate package folder with other name (without recompiling), but one needs to rename/remove the original numpy from the PYTHONPATH :-( Maybe the absolute imports could be removed out of the DLLs in future. By #ifdef or so in the C code the newer Pythons also can be forced to do precisely relative import. Robert -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: README-minimal-numpy.txt URL: From quilby at gmail.com Sat May 16 09:01:00 2009 From: quilby at gmail.com (Quilby) Date: Sat, 16 May 2009 16:01:00 +0300 Subject: [Numpy-discussion] linear algebra help Message-ID: Hi- This is what I need to do- I have this equation- Ax = y Where A is a rational m*n matrix (m<=n), and x and y are vectors of the right size. I know A and y, I don't know what x is equal to. I also know that there is no x where Ax equals exactly y. I want to find the vector x' such that Ax' is as close as possible to y. Meaning that (Ax' - y) is as close as possible to (0,0,0,...0). I know that I need to use either the lstsq function: http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#lstsq or the svd function: http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd I don't understand the documentation at all. Can someone please show me how to use these functions to solve my problem. Thanks a lot!!! -quilby From nwagner at iam.uni-stuttgart.de Sat May 16 09:15:46 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Sat, 16 May 2009 15:15:46 +0200 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: Message-ID: On Sat, 16 May 2009 16:01:00 +0300 Quilby wrote: > Hi- > This is what I need to do- > > I have this equation- > > Ax = y > > Where A is a rational m*n matrix (m<=n), and x and y are >vectors of > the right size. I know A and y, I don't know what x is >equal to. I > also know that there is no x where Ax equals exactly y. >I want to find > the vector x' such that Ax' is as close as possible to >y. Meaning that > (Ax' - y) is as close as possible to (0,0,0,...0). > > I know that I need to use either the lstsq function: > http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#lstsq > > or the svd function: > http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd > > I don't understand the documentation at all. Can someone >please show > me how to use these functions to solve my problem. > > Thanks a lot!!! 
> > -quilby > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion I guess you meant a rectangular matrix http://mathworld.wolfram.com/RectangularMatrix.html from numpy.random import rand, seed from numpy import dot, shape from numpy.linalg import lstsq, norm seed(1) m = 10 n = 20 A = rand(m,n) # random matrix b = rand(m) # rhs x,residues,rank,s = lstsq(A,b) print 'Singular values',s print 'Numerical rank of A',rank print 'Solution',x r=dot(A,x)-b print 'residual',norm(r) Cheers, Nils From josef.pktd at gmail.com Sat May 16 09:34:15 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sat, 16 May 2009 09:34:15 -0400 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: Message-ID: <1cd32cbb0905160634y209bdaa4jee0652404ec750c3@mail.gmail.com> On Sat, May 16, 2009 at 9:01 AM, Quilby wrote: > Hi- > This is what I need to do- > > I have this equation- > > Ax = y > > Where A is a rational m*n matrix (m<=n), and x and y are vectors of > the right size. I know A and y, I don't know what x is equal to. I > also know that there is no x where Ax equals exactly y. I want to find > the vector x' such that Ax' is as close as possible to y. Meaning that > (Ax' - y) is as close as possible to (0,0,0,...0). > > I know that I need to use either the lstsq function: > http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#lstsq > > or the svd function: > http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#svd > > I don't understand the documentation at all. Can someone please show > me how to use these functions to solve my problem. > Hi, The new docs are more informative and are being improved in the online editor see http://docs.scipy.org/doc/ and http://docs.scipy.org/doc/numpy/reference/generated/numpy.linalg.lstsq.html#numpy.linalg.lstsq any comments and improvement to the docs are very welcome. Josef From pav at iki.fi Sat May 16 10:02:34 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 16 May 2009 14:02:34 +0000 (UTC) Subject: [Numpy-discussion] Obsolete Endo-generated docs References: Message-ID: Hi, Sat, 16 May 2009 16:01:00 +0300, Quilby wrote: [clip] > http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#lstsq [clip] Could we take these old Endo-generated docs down, and make the URL redirect to docs.scipy.org? I believe they are more harmful than helpful... -- Pauli Virtanen From pav at iki.fi Sat May 16 10:31:40 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 16 May 2009 14:31:40 +0000 (UTC) Subject: [Numpy-discussion] Obsolete Endo-generated docs References: Message-ID: Sat, 16 May 2009 14:02:34 +0000, Pauli Virtanen wrote: > Hi, > > Sat, 16 May 2009 16:01:00 +0300, Quilby wrote: [clip] >> http://www.scipy.org/doc/numpy_api_docs/numpy.linalg.linalg.html#lstsq > [clip] > > Could we take these old Endo-generated docs down, and make the URL > redirect to docs.scipy.org? 
I went and removed the links to them from http://www.scipy.org/Documentation -- Pauli Virtanen From charlesr.harris at gmail.com Sat May 16 10:41:20 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 16 May 2009 08:41:20 -0600 Subject: [Numpy-discussion] masked ufuncs in C: on github In-Reply-To: <4A0E730D.4060902@hawaii.edu> References: <4A0E1B82.9030006@hawaii.edu> <4A0E730D.4060902@hawaii.edu> Message-ID: On Sat, May 16, 2009 at 2:02 AM, Eric Firing wrote: > Charles R Harris wrote: > > > > > > On Fri, May 15, 2009 at 7:48 PM, Eric Firing > > wrote: > > > > > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17595.html > > > > Prompted by the thread above, I decided to see what it would take to > > implement ufuncs with masking in C. I described the result here: > > > > http://www.mail-archive.com/numpy-discussion at scipy.org/msg17698.html > > > > Now I am starting a new thread. The present state of the work is now > in > > github: http://github.com/efiring/numpy-work/tree/cfastma > > > > I don't want to do any more until I have gotten some feedback from > core > > developers. (And I would be delighted if someone wants to help with > > this, or take it over.) > > Chuck, > > Thanks very much for the quick response. > > > > > > > Here the if ... continue needs to follow the declaration: > > > > if (*mp1) continue; > > float in1 = *(float *)ip1; > > float in2 = *(float *)ip2; > > *(float *)op1 = f(in1, in2); > > > > I was surprised to see the declarations inside the loop in the first > place (this certainly is not ANSI-C), and I was also pleasantly > surprised that letting them be after the conditional didn't seem to > bother the compiler at all. Maybe that is a gcc extension. > Declarations at the top of a block have always been valid C. > > > I think this would be better as > > > > if (!(*mp1)) { > > float in1 = *(float *)ip1; > > float in2 = *(float *)ip2; > > *(float *)op1 = f(in1, in2); > > } > > > > I agree, and I thought of that originally--I think I did it with > continue because it was easier to type it in, and it reduced the > difference relative to the non-masked form. > > > > > But since this is actually a ternary function, you could define new > > functions, something like > > > > double npy_add_m(double a, double b, double mask) > > { > > if (!mask) { > > return a + b; > > else { > > return a; > > } > > } > > > > And use the currently existing loops. Well, you would have to add one > > for ternary functions. > > > That would incur the overhead of an extra function call for each > element; I suspect it would slow it down a lot. My motivation is to make > masked array overhead negligible, at least for medium to large arrays. > It overhead would be the same as it is now, the generic loops all use passed function pointers for functions like sin. Some functions like addition, which is intrinsic and not part of a library, are done in their own special loops that you will find further down in that file. The difficulty I see is that with the current machinery the mask will be converted to the same type as the added numbers and that could add some overhead. > > Also your suggestion above does not handle the case where an output > argument is supplied; it would modify the output under the mask. > > > Question, what about reduce? I don't think it is defined defined for > > ternary functions. Apart from reduce, why not just add, you already have > > the mask to tell you which results are invalid. 
> > > > You mean just do the operation and ignore the results under the mask? > This is the way Pierre originally did it, if I remember correctly, but > fairly recently people started objecting that they didn't want to > disturb values in an output argument under a mask. So now ma jumps > through hoops to satisfy this requirement, and it is consequently slow. > OK. I'm not familiar with the uses of masked arrays. > > ufunc methods like reduce are supported only for the binary ops with one > output, so they are automatically unavailable for the masked versions. > To get around this would require subclassing the ufunc to make a masked > version. This is probably the best way to go, but I suspect it is much > more complicated than I can handle in the amount of time I can spend. > I think reduce could be added for ternary functions, but it is a design decision how it should operate. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jorgesmbox-ml at yahoo.es Sat May 16 18:05:16 2009 From: jorgesmbox-ml at yahoo.es (Jorge Scandaliaris) Date: Sat, 16 May 2009 22:05:16 +0000 (UTC) Subject: [Numpy-discussion] Question about slicing References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> <20090516092245.GA27596@phare.normalesup.org> Message-ID: Emmanuelle Gouillart normalesup.org> writes: > > Hi Jorge, > > > roi = aimg[10:20,45:50,:] > > are you working with 3-D images? I didn't know PIL was able to handle 3D > images. > Well, if by 3D you mean color images then yes, PIL is able to handle them > I wasn't able to reproduce the behavior you observed with a simple > example: > In [20]: base = np.arange(25).reshape((5,5)) > > In [21]: base > Out[21]: > array([[ 0, 1, 2, 3, 4], > [ 5, 6, 7, 8, 9], > [10, 11, 12, 13, 14], > [15, 16, 17, 18, 19], > [20, 21, 22, 23, 24]]) > > In [22]: flip = base[::-1] > > In [23]: flip > Out[23]: > array([[20, 21, 22, 23, 24], > [15, 16, 17, 18, 19], > [10, 11, 12, 13, 14], > [ 5, 6, 7, 8, 9], > [ 0, 1, 2, 3, 4]]) > > In [24]: flip[2:4,2:4] > Out[24]: > array([[12, 13], > [ 7, 8]]) > which is what you expect... > You're right. I should have done these tests myself. I apologize for jumping to the list so quickly. > I also tried the same manipulations as you do starting from a PIL image > object, but I also got what I expected (and my image was not flipped > vertically by PIL or when transformed into an array). It is quite weird > BTW that your images are flipped. How do you visualize PIL image > (Image.Image.show?)and arrays (pylab.imshow?) ? > It is weird indeed. But no so much that they appear upside down (I do use pylab.imshow() to display images), because at the end of the day it is just a convention, and different things can use different conventions, but because of the fact that the numpy array obtained from the PIL image is not. I downloaded the scipy logo: http://docs.scipy.org/doc/_static/scipyshiny_small.png and did the following: img = Image.open('./scipyshiny_small.png') mpl.pylab.imshow(img) # Comes upside down aimg = asarray(img) mpl.pylab.imshow(aimg) # Comes OK I guess my problem lies with PIL rather than with numpy. I am glad to find that slicing works as I would have expected it to work! > Hope someone can help you more than I did :D You did help, thanks! 
Jorge From pav at iki.fi Sat May 16 18:22:11 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 16 May 2009 22:22:11 +0000 (UTC) Subject: [Numpy-discussion] Question about slicing References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> <20090516092245.GA27596@phare.normalesup.org> Message-ID: Sat, 16 May 2009 22:05:16 +0000, Jorge Scandaliaris wrote: [clip] > I downloaded the scipy logo: > http://docs.scipy.org/doc/_static/scipyshiny_small.png and did the > following: > > img = Image.open('./scipyshiny_small.png') > > mpl.pylab.imshow(img) # Comes upside down > aimg = asarray(img) > mpl.pylab.imshow(aimg) # Comes OK Ah, that was it. Apparently, matplotlib imshow uses a different conversion mechanism if the input is a Image.Image from PIL, than if it is an array. There's a dedicated function matplotlib.image.pil_to_array which seems to work differently from asarray. This may actually be a bug in matplotlib; perhaps you should ask the people on the matplotlib lists if this is really the intended behavior. -- Pauli Virtanen From jorgesmbox-ml at yahoo.es Sat May 16 18:42:16 2009 From: jorgesmbox-ml at yahoo.es (Jorge Scandaliaris) Date: Sat, 16 May 2009 22:42:16 +0000 (UTC) Subject: [Numpy-discussion] Question about slicing References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> Message-ID: Pauli Virtanen iki.fi> writes: > > img = array(img)[::-1] > > Note that here a copy is made. You can use `asarray` instead of `array` > if you want to avoid making a copy. > Thanks, that's good info! > > and it worked!, but I am interested actually in sub-regions of this > > image, so the next I did was: > > > > roi = aimg[10:20,45:50,:] > > > > And to my surprise the result was like if I was slicing the original, > > upside down, image instead of aimg. Can someone explain me what's going > > on here? > > Sounds impossible, and I don't see this: > > In [1]: import Image > In [2]: img = Image.open('foo.png') > In [3]: aimg = array(img) > In [4]: imshow(aimg) > Out[4]: > In [5]: imshow(aimg[10:320,5:150]) > Out[5]: > > The image is here right-side up, both in full and the slice (since imshow > flips it). Also, > > In [6]: aimg = array(img)[::-1] > In [7]: imshow(aimg[10:320,5:150]) > Out[7]: > > Now, the image is upside down, both in full and in the slice. > > I think you should re-check that you are doing what you think you are > doing. Preparing a self-contained code example could help here, at least > this would make pinpointing where the error is more easy. > You're right. I was using imshow to see img (the IPL iamge, not the numpy array), and that comes upside down, at least here. That made me think the numpy array was upside down too when in fact it wasn't, so my 'fix' actually was flipping it. I'll further investigate as why the IPL image appears upside down, but my questions about slicing are answered now. Sorry for mixing things up, and thanks for helping out. Jorge From aisaac at american.edu Sat May 16 18:51:26 2009 From: aisaac at american.edu (Alan G Isaac) Date: Sat, 16 May 2009 18:51:26 -0400 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: Message-ID: <4A0F436E.4020900@american.edu> On 5/16/2009 9:01 AM Quilby apparently wrote: > Ax = y > Where A is a rational m*n matrix (m<=n), and x and y are vectors of > the right size. I know A and y, I don't know what x is equal to. I > also know that there is no x where Ax equals exactly y. If m<=n, that can only be true if there are not m linearly independent columns of A. Are you sure you have the dimensions right? 
Alan Isaac From ljhardy at gmail.com Sat May 16 20:26:05 2009 From: ljhardy at gmail.com (ljhardy) Date: Sat, 16 May 2009 17:26:05 -0700 (PDT) Subject: [Numpy-discussion] Leopard install In-Reply-To: <0EAFD24D-078E-4EAD-AEF0-BC5B55789CBD@cinci.rr.com> References: <0EAFD24D-078E-4EAD-AEF0-BC5B55789CBD@cinci.rr.com> Message-ID: <23579011.post@talk.nabble.com> I'm continuing to have this problem. I have installed Python 2.6.2 from the source that is found on www.python.org. I'm running Leopard 10.5.7. Entering "python" from the shell shows: Python 2.6.2 (r262:71600, May 16 2009, 19:04:59) [GCC 4.0.1 (Apple Inc. build 5465)] on darwin Type "help", "copyright", "credits" or "license" for more information. >>> Stuart Edwards wrote: > > Hi > > I am trying to install numpy 1.3.0 on Leopard 10.5.6 and at the point > in the install process where I select a destination, my boot disc is > excluded with the message: > > " You cannot install numpy 1.3.0 on this volume. numpy requires > System Python 2.5 to install." > > I'm not sure what 'System Python 2.5' is as compared to 'Python 2.5' > but in the terminal when I type 'python' I get: > > "Python 2.5.1 (r251:54863, Jan 13 2009, 10:26:13) [GCC 4.0.1 (Apple > Inc. build 5465)] on darwin" > > so the Python 2.5 requirement seems to be met. Any ideas on what is > happening here and why the installer can't see my python installation? > > (I notice that this issue was raised 3/28 also, but no resolution yet) > > Thanks for any assistance > > Stu > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- View this message in context: http://www.nabble.com/Leopard-install-tp23012456p23579011.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From robert.kern at gmail.com Sat May 16 20:28:52 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 16 May 2009 19:28:52 -0500 Subject: [Numpy-discussion] Leopard install In-Reply-To: <23579011.post@talk.nabble.com> References: <0EAFD24D-078E-4EAD-AEF0-BC5B55789CBD@cinci.rr.com> <23579011.post@talk.nabble.com> Message-ID: <3d375d730905161728l3f09cfa7q9e80c6b899cb3dcc@mail.gmail.com> On Sat, May 16, 2009 at 19:26, ljhardy wrote: > > I'm continuing to have this problem. ?I have installed Python 2.6.2 from the > source that is found on www.python.org. ?I'm running Leopard 10.5.7. You cannot use a binary of numpy built for Python 2.5 with your Python 2.6. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From sccolbert at gmail.com Sat May 16 23:12:42 2009 From: sccolbert at gmail.com (Chris Colbert) Date: Sat, 16 May 2009 23:12:42 -0400 Subject: [Numpy-discussion] Question about slicing In-Reply-To: References: <519625.91803.qm@web27901.mail.ukl.yahoo.com> Message-ID: <7f014ea60905162012s46755b2ehdc58d763f626f62d@mail.gmail.com> the reason for all this is that the bitmap image format specifies the image origin as the lower left corner. This is the convention used by PIL. The origin of a numpy array is the upper right corner. Matplot lib does not handle this discrepancy in the function pil_to_array, which is called internally when you invoke imshow(img) on a PIL image. Recently, PIL has implemented the array interface for PIL images. 
So if you call asarray(img) on a PIL image, you will get a (height, width, 3) array (for RGB) with the origin in the upper left corner. This is why the image appears right side up in matplotlib doing things this way. The matplot lib code should probably be updated to make use of the array interface. It just reshapes the raw string data currently. Chris On Sat, May 16, 2009 at 6:42 PM, Jorge Scandaliaris wrote: > Pauli Virtanen iki.fi> writes: > > > > > img = array(img)[::-1] > > > > Note that here a copy is made. You can use `asarray` instead of `array` > > if you want to avoid making a copy. > > > > Thanks, that's good info! > > > > and it worked!, but I am interested actually in sub-regions of this > > > image, so the next I did was: > > > > > > roi = aimg[10:20,45:50,:] > > > > > > And to my surprise the result was like if I was slicing the original, > > > upside down, image instead of aimg. Can someone explain me what's going > > > on here? > > > > Sounds impossible, and I don't see this: > > > > In [1]: import Image > > In [2]: img = Image.open('foo.png') > > In [3]: aimg = array(img) > > In [4]: imshow(aimg) > > Out[4]: > > In [5]: imshow(aimg[10:320,5:150]) > > Out[5]: > > > > The image is here right-side up, both in full and the slice (since imshow > > flips it). Also, > > > > In [6]: aimg = array(img)[::-1] > > In [7]: imshow(aimg[10:320,5:150]) > > Out[7]: > > > > Now, the image is upside down, both in full and in the slice. > > > > I think you should re-check that you are doing what you think you are > > doing. Preparing a self-contained code example could help here, at least > > this would make pinpointing where the error is more easy. > > > > You're right. I was using imshow to see img (the IPL iamge, not the numpy > array), and that comes upside down, at least here. That made me think the > numpy > array was upside down too when in fact it wasn't, so my 'fix' actually was > flipping it. > I'll further investigate as why the IPL image appears upside down, but my > questions about slicing are answered now. Sorry for mixing things up, and > thanks > for helping out. > > Jorge > > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -------------- next part -------------- An HTML attachment was scrubbed... URL: From glenn at tarbox.org Sun May 17 01:24:34 2009 From: glenn at tarbox.org (Glenn Tarbox, PhD) Date: Sat, 16 May 2009 22:24:34 -0700 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: <20090514145406.GB3630@phare.normalesup.org> References: <20090514084335.GD32437@phare.normalesup.org> <20090514091617.GF32437@phare.normalesup.org> <20090514145406.GB3630@phare.normalesup.org> Message-ID: Today at Sage Days we tried slices on a few large arrays (no mmap) and found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 elements. The failure mode is the same, no error thrown, basically nothing happens This was on one of the big sage machines. I don't know the specific OS / CPU but it was definitely 64 bit and lots of available memory etc. -glenn On Thu, May 14, 2009 at 7:54 AM, Gael Varoquaux < gael.varoquaux at normalesup.org> wrote: > On Thu, May 14, 2009 at 07:40:58AM -0700, Glenn Tarbox, PhD wrote: > > Hum, I am wondering: could it be that Sage has not been compiled in > > 64bits? That number '32' seems to me to point toward a 32bit pointer > > issue (I may be wrong). 
> > > The other tests I posted indicate everything else is working... For > > example, np.sum(fp) runs over the full set of 1e10 doubes and seems to > > work fine. > > Correct. I had missed that. > > Ga?l > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -- Glenn H. Tarbox, PhD || 206-274-6919 http://www.tarbox.org -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun May 17 06:32:39 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 17 May 2009 10:32:39 +0000 (UTC) Subject: [Numpy-discussion] numpy slices limited to 32 bit values? References: <20090514084335.GD32437@phare.normalesup.org> <20090514091617.GF32437@phare.normalesup.org> <20090514145406.GB3630@phare.normalesup.org> Message-ID: Hi, Sat, 16 May 2009 22:24:34 -0700, Glenn Tarbox, PhD wrote: > Today at Sage Days we tried slices on a few large arrays (no mmap) and > found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 > elements. The failure mode is the same, no error thrown, basically > nothing happens > > This was on one of the big sage machines. I don't know the specific OS / > CPU but it was definitely 64 bit and lots of available memory etc. Could you file a bug ticket in the Numpy Trac, http://projects.scipy.org/numpy so that there's a better chance that this doesn't get forgotten. Thanks, -- Pauli Virtanen From charlesr.harris at gmail.com Sun May 17 10:51:27 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 17 May 2009 08:51:27 -0600 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: <20090514084335.GD32437@phare.normalesup.org> <20090514091617.GF32437@phare.normalesup.org> <20090514145406.GB3630@phare.normalesup.org> Message-ID: Hi Glen, On Sat, May 16, 2009 at 11:24 PM, Glenn Tarbox, PhD wrote: > Today at Sage Days we tried slices on a few large arrays (no mmap) and > found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 > elements. The failure mode is the same, no error thrown, basically nothing > happens > > This was on one of the big sage machines. I don't know the specific OS / > CPU but it was definitely 64 bit and lots of available memory etc. > Can you try slicing with an explicit upper bound? Something like a[:n] = 1, where n is the size of the array. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sun May 17 11:14:40 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 17 May 2009 09:14:40 -0600 Subject: [Numpy-discussion] numpy slices limited to 32 bit values? In-Reply-To: References: <20090514084335.GD32437@phare.normalesup.org> <20090514091617.GF32437@phare.normalesup.org> <20090514145406.GB3630@phare.normalesup.org> Message-ID: On Sun, May 17, 2009 at 8:51 AM, Charles R Harris wrote: > Hi Glen, > > On Sat, May 16, 2009 at 11:24 PM, Glenn Tarbox, PhD wrote: > >> Today at Sage Days we tried slices on a few large arrays (no mmap) and >> found that slicing breaks on arrays somewhere between 2.0e9 and 2.5e9 >> elements. The failure mode is the same, no error thrown, basically nothing >> happens >> >> This was on one of the big sage machines. I don't know the specific OS / >> CPU but it was definitely 64 bit and lots of available memory etc. >> > > Can you try slicing with an explicit upper bound? Something like a[:n] = 1, > where n is the size of the array. 
> And maybe some things like a[n:n+1] = 1, which should only set a single element and might save some time ;) Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From quilby at gmail.com Sun May 17 12:14:05 2009 From: quilby at gmail.com (Quilby) Date: Sun, 17 May 2009 19:14:05 +0300 Subject: [Numpy-discussion] linear algebra help In-Reply-To: <4A0F436E.4020900@american.edu> References: <4A0F436E.4020900@american.edu> Message-ID: Right the dimensions I gave were wrong. What do I need to do for m>=n (more rows than columns)? Can I use the same function? When I run the script written by Nils (thanks!) I get: from numpy.random import rand, seed ImportError: No module named random But importing numpy works ok. What do I need to install? Thanks again! On Sun, May 17, 2009 at 1:51 AM, Alan G Isaac wrote: > On 5/16/2009 9:01 AM Quilby apparently wrote: >> Ax = y >> Where A is a rational m*n matrix (m<=n), and x and y are vectors of >> the right size. I know A and y, I don't know what x is equal to. I >> also know that there is no x where Ax equals exactly y. > > If m<=n, that can only be true if there are not > m linearly independent columns of A. Are you > sure you have the dimensions right? > > Alan Isaac > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Sun May 17 13:21:16 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 17 May 2009 13:21:16 -0400 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: <4A0F436E.4020900@american.edu> Message-ID: <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> On Sun, May 17, 2009 at 12:14 PM, Quilby wrote: > Right the dimensions I gave were wrong. > What do I need to do for m>=n (more rows than columns)? ?Can I use the > same function? > > When I run the script written by Nils (thanks!) I get: > ? ?from numpy.random import rand, seed > ImportError: No module named random > > But importing numpy works ok. What do I need to install? This should be working without extra install. You could run the test suite, numpy.test(), to see whether your install is ok. Otherwise, you would need to provide more information, numpy version, .... np.lstsq works for m>n, mn (more observations than parameters) is the standard least squares estimation problem. Josef > > Thanks again! > > On Sun, May 17, 2009 at 1:51 AM, Alan G Isaac wrote: >> On 5/16/2009 9:01 AM Quilby apparently wrote: >>> Ax = y >>> Where A is a rational m*n matrix (m<=n), and x and y are vectors of >>> the right size. I know A and y, I don't know what x is equal to. I >>> also know that there is no x where Ax equals exactly y. >> >> If m<=n, that can only be true if there are not >> m linearly independent columns of A. ?Are you >> sure you have the dimensions right? >> >> Alan Isaac >> From gokhansever at gmail.com Sun May 17 19:54:33 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_SEVER?=) Date: Sun, 17 May 2009 18:54:33 -0500 Subject: [Numpy-discussion] Savetxt usage question Message-ID: <49d6b3500905171654g6abd4b93x386c9b046ed87bc3@mail.gmail.com> Hello, Is there a way to write a header information to a text file using savetxt command besides dumping arrays in the same file? In little detailed fashion: I have to write a few long column of arrays into a text file. While doing that I need to put some information regarding to the context of the file. 
Like variable names, project date, missing value equivalent etc... So far, I couldn't see that this could be achieved with one savetxt command. However there might be an easy point that I am missing. Thank you. G?khan -------------- next part -------------- An HTML attachment was scrubbed... URL: From michael.s.gilbert at gmail.com Sun May 17 19:57:37 2009 From: michael.s.gilbert at gmail.com (Michael S. Gilbert) Date: Sun, 17 May 2009 19:57:37 -0400 Subject: [Numpy-discussion] Savetxt usage question In-Reply-To: <49d6b3500905171654g6abd4b93x386c9b046ed87bc3@mail.gmail.com> References: <49d6b3500905171654g6abd4b93x386c9b046ed87bc3@mail.gmail.com> Message-ID: <20090517195737.fc42172d.michael.s.gilbert@gmail.com> fid = open( 'file' , 'w' ) fid.write( 'header\n' ) savetxt( fid , data ) fid.close() On Sun, 17 May 2009 18:54:33 -0500 G?khan SEVER wrote: > Hello, > > Is there a way to write a header information to a text file using savetxt > command besides dumping arrays in the same file? > > In little detailed fashion: I have to write a few long column of arrays into > a text file. While doing that I need to put some information regarding to > the context of the file. Like variable names, project date, missing value > equivalent etc... > > So far, I couldn't see that this could be achieved with one savetxt command. > However there might be an easy point that I am missing. > > Thank you. > > G?khan > From gokhansever at gmail.com Sun May 17 20:06:29 2009 From: gokhansever at gmail.com (=?UTF-8?Q?G=C3=B6khan_SEVER?=) Date: Sun, 17 May 2009 19:06:29 -0500 Subject: [Numpy-discussion] Savetxt usage question In-Reply-To: <20090517195737.fc42172d.michael.s.gilbert@gmail.com> References: <49d6b3500905171654g6abd4b93x386c9b046ed87bc3@mail.gmail.com> <20090517195737.fc42172d.michael.s.gilbert@gmail.com> Message-ID: <49d6b3500905171706p796c9f71j2438d22a333c6d3f@mail.gmail.com> Thanks for the quick reply. Exact solution ! G?khan On Sun, May 17, 2009 at 6:57 PM, Michael S. Gilbert < michael.s.gilbert at gmail.com> wrote: > fid = open( 'file' , 'w' ) > fid.write( 'header\n' ) > savetxt( fid , data ) > fid.close() > > On Sun, 17 May 2009 18:54:33 -0500 G?khan SEVER wrote: > > > Hello, > > > > Is there a way to write a header information to a text file using savetxt > > command besides dumping arrays in the same file? > > > > In little detailed fashion: I have to write a few long column of arrays > into > > a text file. While doing that I need to put some information regarding to > > the context of the file. Like variable names, project date, missing value > > equivalent etc... > > > > So far, I couldn't see that this could be achieved with one savetxt > command. > > However there might be an easy point that I am missing. > > > > Thank you. > > > > G?khan > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From sebastian.walter at gmail.com Mon May 18 04:05:21 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 18 May 2009 10:05:21 +0200 Subject: [Numpy-discussion] linear algebra help In-Reply-To: <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> Message-ID: Alternatively, to solve A x = b you could do import numpy import numpy.linalg B = numpy.dot(A.T, A) c = numpy.dot(A.T, b) x = numpy.linalg(B,c) This is not the most efficient way to do it but at least you know exactly what's going on in your code. 
On Sun, May 17, 2009 at 7:21 PM, wrote: > On Sun, May 17, 2009 at 12:14 PM, Quilby wrote: >> Right the dimensions I gave were wrong. >> What do I need to do for m>=n (more rows than columns)? Can I use the >> same function? >> >> When I run the script written by Nils (thanks!) I get: >> from numpy.random import rand, seed >> ImportError: No module named random >> >> But importing numpy works ok. What do I need to install? > > This should be working without extra install. You could run the test > suite, numpy.test(), to see whether your install is ok. > > Otherwise, you would need to provide more information, numpy version, .... > > np.lstsq works for m>n, m solution is different in the 3 cases. > m>n (more observations than parameters) is the standard least squares > estimation problem. > > Josef > > >> >> Thanks again! >> >> On Sun, May 17, 2009 at 1:51 AM, Alan G Isaac wrote: >>> On 5/16/2009 9:01 AM Quilby apparently wrote: >>>> Ax = y >>>> Where A is a rational m*n matrix (m<=n), and x and y are vectors of >>>> the right size. I know A and y, I don't know what x is equal to. I >>>> also know that there is no x where Ax equals exactly y. >>> >>> If m<=n, that can only be true if there are not >>> m linearly independent columns of A. Are you >>> sure you have the dimensions right? >>> >>> Alan Isaac >>> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From stefan at sun.ac.za Mon May 18 04:21:44 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 18 May 2009 10:21:44 +0200 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> Message-ID: <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> 2009/5/18 Sebastian Walter : > B = numpy.dot(A.T, A) This multiplication should be avoided whenever possible -- you are effectively squaring your condition number. In the case where you have more rows than columns, use least squares. For square matrices use solve. For large sparse matrices, use GMRES or any of the others available in scipy.sparse.linalg. Regards St?fan From sebastian.walter at gmail.com Mon May 18 04:35:07 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Mon, 18 May 2009 10:35:07 +0200 Subject: [Numpy-discussion] linear algebra help In-Reply-To: <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> Message-ID: 2009/5/18 St?fan van der Walt : > 2009/5/18 Sebastian Walter : >> B = numpy.dot(A.T, A) > > This multiplication should be avoided whenever possible -- you are > effectively squaring your condition number. Indeed. > > In the case where you have more rows than columns, use least squares. > For square matrices use solve. For large sparse matrices, use GMRES > or any of the others available in scipy.sparse.linalg. It is my impression that this is a linear algebra and not a numerics question. 
> > Regards > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From rjsteed at talk21.com Mon May 18 06:48:04 2009 From: rjsteed at talk21.com (rob steed) Date: Mon, 18 May 2009 10:48:04 +0000 (GMT) Subject: [Numpy-discussion] Problem with correlate Message-ID: <294586.51207.qm@web86006.mail.ird.yahoo.com> Hi all, I have been using numpy.correlate and was finding something weird. I now think that there might be a bug. Correlations should be order dependent eg. correlate(x,y) != correlate(y,x) in general (whereas convolutions are symmetric) >>> import numpy as N >>> x = N.array([1,0,0]) >>> y = N.array([0,0,1]) >>> N.correlate(x,y,'full') array([1, 0, 0, 0, 0]) >>> N.correlate(y,x,'full') array([0, 0, 0, 0, 1]) This works fine. However, if the arrays have different lengths, we get a problem. >>> y2=N.array([0,0,0,1]) >>> N.correlate(x,y2,'full') array([0, 0, 0, 0, 0, 1]) >>> N.correlate(y2,x,'full') array([0, 0, 0, 0, 0, 1]) I believe that somewhere in the code, the arrays are re-ordered by their length. Initially I thought that this was because correlate was deriving from convolution but looking at numpy.core, I can see that in fact convolution derives from correlate. After that, it becomes C code which I haven't managed to look at yet. Am I correct, is this a bug? regards Rob Steed From stefan at sun.ac.za Mon May 18 08:38:37 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Mon, 18 May 2009 14:38:37 +0200 Subject: [Numpy-discussion] Problem with correlate In-Reply-To: <294586.51207.qm@web86006.mail.ird.yahoo.com> References: <294586.51207.qm@web86006.mail.ird.yahoo.com> Message-ID: <9457e7c80905180538j287740eak2f30f6381992b4c8@mail.gmail.com> 2009/5/18 rob steed : > This works fine. However, if the arrays have different lengths, we get a problem. > >>>> y2=N.array([0,0,0,1]) >>>> N.correlate(x,y2,'full') This looks like a bug to me. In [54]: N.correlate([1, 0, 0, 0], [0, 0, 0, 1],'full') Out[54]: array([1, 0, 0, 0, 0, 0, 0]) In [55]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 1],'full') Out[55]: array([1, 0, 0, 0, 0, 0, 0, 0]) In [56]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 0, 1],'full') Out[56]: array([1, 0, 0, 0, 0, 0, 0, 0, 0]) In [57]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1],'full') Out[57]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]) Regards St?fan From darkgl0w at yahoo.com Mon May 18 08:37:09 2009 From: darkgl0w at yahoo.com (Cristi Constantin) Date: Mon, 18 May 2009 05:37:09 -0700 (PDT) Subject: [Numpy-discussion] Overlap arrays with "transparency" Message-ID: <739378.72909.qm@web52104.mail.re2.yahoo.com> Good day. I am working on this algorithm for a few weeks now, so i tried almost everything... I want to overlap / overwrite 2 matrices, but completely ignore some values (in this case ignore 0) Let me explain: a = [ [1, 2, 3, 4, 5], [9,7], [0,0,0,0,0], [5,5,5] ] b = [ [0,0,9,9], [1,1,1,1], [2,2,2,2] ] Then, we have: a over b = [ [1,2,3,4,5], [9,7,1,1], [1,1,1,1,0], [5,5,5,2] ] b over a = [ [0,0,9,9,5], 1,1,1,1], 2,2,2,2,0], 5,5,5] ] That means, completely overwrite one list of arrays over the other, not matter what values one has, not matter the size, just ignore 0 values on overwriting. I checked the documentation, i just need some tips. TempA = [[]] # One For Cicle in here to get the Element data... ??? Data = vElem.data???????????????? # This is a list of numpy ndarrays. ??? # ??? 
for nr_row in range( len(Data) ): # For each numpy ndarray (row) in Data. ??????? # ??????? NData = Data[nr_row]?????????????????? # New data, to be written over old data. ??????? OData = TempA[nr_row:nr_row+1] or [[]] # This is old data. Can be numpy ndarray, or empty list. ??????? OData = OData[0] ??????? # ??????? # NData must completely eliminate transparent pixels... here comes the algorithm... No algorithm yet. ??????? # ??????? if len(NData) >= len(OData): ??????????? # If new data is longer than old data, old data will be completely overwritten. ??????????? TempA[nr_row:nr_row+1] = [NData] ??????? else: # Old data is longer than new data ; old data cannot be null. ??????????? TempB = np.copy(OData) ??????????? TempB.put( range(len(NData)), NData ) ??????????? #TempB[0:len(NData)-1] = NData # This returns "ValueError: shape mismatch: objects cannot be broadcast to a single shape" ??????????? TempA[nr_row:nr_row+1] = [TempB] ??????????? del TempB ??????? # ??? # # The result is stored inside TempA as list of numpy arrays. I would use 2D arrays, but they are slower than Python Lists containing Numpy arrays. I need to do this overwrite in a very big loop and every delay is very important. I tried to create a masked array where all "zero" values are ignored on overlap, but it doesn't work. Masked or not, the "transparent" values are still overwritten. Please, any suggestion is useful. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 18 10:21:20 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 May 2009 10:21:20 -0400 Subject: [Numpy-discussion] Problem with correlate In-Reply-To: <9457e7c80905180538j287740eak2f30f6381992b4c8@mail.gmail.com> References: <294586.51207.qm@web86006.mail.ird.yahoo.com> <9457e7c80905180538j287740eak2f30f6381992b4c8@mail.gmail.com> Message-ID: <1cd32cbb0905180721u5426cea2ibc79cc4655f679fc@mail.gmail.com> 2009/5/18 St?fan van der Walt : > 2009/5/18 rob steed : >> This works fine. However, if the arrays have different lengths, we get a problem. >> >>>>> y2=N.array([0,0,0,1]) >>>>> N.correlate(x,y2,'full') > > This looks like a bug to me. > > In [54]: N.correlate([1, 0, 0, 0], [0, 0, 0, 1],'full') > Out[54]: array([1, 0, 0, 0, 0, 0, 0]) > > In [55]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 1],'full') > Out[55]: array([1, 0, 0, 0, 0, 0, 0, 0]) > > In [56]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 0, 1],'full') > Out[56]: array([1, 0, 0, 0, 0, 0, 0, 0, 0]) > > In [57]: N.correlate([1, 0, 0, 0, 0], [0, 0, 0, 0, 0, 1],'full') > Out[57]: array([0, 0, 0, 0, 0, 0, 0, 0, 0, 1]) > comparing with scipy: signal.correlate behaves the same "flipping" way as np.correlate, ndimage.correlate keeps the orientation. 
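
One way to sidestep the length-dependent reordering while the bug is open, assuming (as Rob suspects) the swap only happens when the inputs have different lengths: zero-pad the shorter input so both sequences are equally long before calling correlate. The interactive comparisons that follow continue the report above; the sketch below is only an illustration of the padding idea.

import numpy as np

x = np.array([1, 0, 0])
y2 = np.array([0, 0, 0, 1])

# pad the shorter input with zeros so both have the same length;
# with equal lengths np.correlate keeps the argument order
n = max(len(x), len(y2))
xp = np.concatenate((x, np.zeros(n - len(x), dtype=x.dtype)))
y2p = np.concatenate((y2, np.zeros(n - len(y2), dtype=y2.dtype)))

c_xy = np.correlate(xp, y2p, 'full')   # order-dependent again
c_yx = np.correlate(y2p, xp, 'full')
# the padding only adds zero-valued lags at the ends of the output
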
>>> np.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0,0],'same') array([0, 0, 0, 2, 1, 0]) >>> np.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0],'same') array([1, 2, 0, 0, 0]) >>> np.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0, 0],'full') array([0, 0, 0, 0, 0, 2, 1, 0, 0, 0]) >>> np.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0],'full') array([0, 0, 1, 2, 0, 0, 0, 0, 0]) >>> >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0, 0]) array([0, 0, 0, 0, 0, 2, 1, 0, 0, 0]) >>> signal.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0]) array([0, 0, 1, 2, 0, 0, 0, 0, 0]) >>> ndimage.filters.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0, 0],mode='constant') array([0, 1, 2, 0, 0]) >>> ndimage.filters.correlate([1, 2, 0, 0, 0], [0, 0, 1, 0, 0],mode='constant') array([1, 2, 0, 0, 0]) Josef From millman at berkeley.edu Mon May 18 10:47:45 2009 From: millman at berkeley.edu (Jarrod Millman) Date: Mon, 18 May 2009 07:47:45 -0700 Subject: [Numpy-discussion] SciPy 2009 Call for Papers Message-ID: ========================== SciPy 2009 Call for Papers ========================== SciPy 2009, the 8th Python in Science conference, will be held from August 18-23, 2009 at Caltech in Pasadena, CA, USA. Each year SciPy attracts leading figures in research and scientific software development with Python from a wide range of scientific and engineering disciplines. The focus of the conference is both on scientific libraries and tools developed with Python and on scientific or engineering achievements using Python. We welcome contributions from the industry as well as the academic world. Indeed, industrial research and development as well academic research face the challenge of mastering IT tools for exploration, modeling and analysis. We look forward to hearing your recent breakthroughs using Python! Submission of Papers ==================== The program features tutorials, contributed papers, lightning talks, and bird-of-a-feather sessions. We are soliciting talks and accompanying papers (either formal academic or magazine-style articles) that discuss topics which center around scientific computing using Python. These include applications, teaching, future development directions, and research. A collection of peer-reviewed articles will be published as part of the proceedings. Proposals for talks are submitted as extended abstracts. There are two categories of talks: Paper presentations These talks are 35 minutes in duration (including questions). A one page abstract of no less than 500 words (excluding figures and references) should give an outline of the final paper. Proceeding papers are due two weeks after the conference, and may be in a formal academic style, or in a more relaxed magazine-style format. Rapid presentations These talks are 10 minutes in duration. An abstract of between 300 and 700 words should describe the topic and motivate its relevance to scientific computing. In addition, there will be an open session for lightning talks during which any attendee willing to do so is invited to do a couple-of-minutes-long presentation. If you wish to present a talk at the conference, please create an account on the website (http://conference.scipy.org). You may then submit an abstract by logging in, clicking on your profile and following the "Submit an abstract" link. Submission Guidelines --------------------- * Submissions should be uploaded via the online form. * Submissions whose main purpose is to promote a commercial product or service will be refused. 
* All accepted proposals must be presented at the SciPy conference by at least one author. * Authors of an accepted proposal can provide a final paper for publication in the conference proceedings. Final papers are limited to 7 pages, including diagrams, figures, references, and appendices. The papers will be reviewed to help ensure the high-quality of the proceedings. For further information, please visit the conference homepage: http://conference.scipy.org. Important Dates =============== * Friday, June 26: Abstracts Due * Saturday, July 4: Announce accepted talks, post schedule * Friday, July 10: Early Registration ends * Tuesday-Wednesday, August 18-19: Tutorials * Thursday-Friday, August 20-21: Conference * Saturday-Sunday, August 22-23: Sprints * Friday, September 4: Papers for proceedings due Tutorials ========= Two days of tutorials to the scientific Python tools will precede the conference. There will be two tracks: one for introduction of the basic tools to beginners and one for more advanced tools. Tutorials will be announced later. Birds of a Feather Sessions =========================== If you wish to organize a birds-of-a-feather session to discuss some specific area of scientific development with Python, please contact the organizing committee. Executive Committee =================== * Jarrod Millman, UC Berkeley, USA (Conference Chair) * Ga?l Varoquaux, INRIA Saclay, France (Program Co-Chair) * St?fan van der Walt, University of Stellenbosch, South Africa (Program Co-Chair) * Fernando P?rez, UC Berkeley, USA (Tutorial Chair) From charlesr.harris at gmail.com Mon May 18 10:55:30 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 May 2009 08:55:30 -0600 Subject: [Numpy-discussion] linear algebra help In-Reply-To: <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> Message-ID: 2009/5/18 St?fan van der Walt > 2009/5/18 Sebastian Walter : > > B = numpy.dot(A.T, A) > > This multiplication should be avoided whenever possible -- you are > effectively squaring your condition number. > Although the condition number doesn't mean much unless the columns are normalized. Having badly scaled columns can lead to problems with lstsq because of its default cutoff based on the condition number. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Mon May 18 11:35:41 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 May 2009 11:35:41 -0400 Subject: [Numpy-discussion] linear algebra help In-Reply-To: References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> Message-ID: <1cd32cbb0905180835x7002255ev3fe53ae9ff1b2f0b@mail.gmail.com> On Mon, May 18, 2009 at 10:55 AM, Charles R Harris wrote: > > > 2009/5/18 St?fan van der Walt >> >> 2009/5/18 Sebastian Walter : >> > B = numpy.dot(A.T, A) >> >> This multiplication should be avoided whenever possible -- you are >> effectively squaring your condition number. > > Although the condition number doesn't mean much unless the columns are > normalized. Having badly scaled columns can lead to problems with lstsq > because of its default cutoff based on the condition number. 
> > Chuck Do you know if any of the linalg methods, np.linalg.lstsq or scipy.linalg.lstsq, do any normalization internally to improve numerical accuracy? I saw automatic internal normalization (e.g. rescaling) for some econometrics methods, and was wondering whether we should do this also in stats.models or whether scipy.linalg is already taking care of this. I have only vague knowledge of the numerical precision of different linear algebra methods. Thanks, Josef From michael.s.gilbert at gmail.com Mon May 18 11:38:25 2009 From: michael.s.gilbert at gmail.com (Michael S. Gilbert) Date: Mon, 18 May 2009 11:38:25 -0400 Subject: [Numpy-discussion] Overlap arrays with "transparency" In-Reply-To: <739378.72909.qm@web52104.mail.re2.yahoo.com> References: <739378.72909.qm@web52104.mail.re2.yahoo.com> Message-ID: <20090518113825.2e360839.michael.s.gilbert@gmail.com> On Mon, 18 May 2009 05:37:09 -0700 (PDT), Cristi Constantin wrote: > Good day. > I am working on this algorithm for a few weeks now, so i tried almost everything... > I want to overlap / overwrite 2 matrices, but completely ignore some values (in this case ignore 0) > Let me explain: > > a = [ > [1, 2, 3, 4, 5], > [9,7], > [0,0,0,0,0], > [5,5,5] ] > > b = [ > [0,0,9,9], > [1,1,1,1], > [2,2,2,2] ] > > Then, we have: > > a over b = [ > [1,2,3,4,5], > [9,7,1,1], > [1,1,1,1,0], > [5,5,5,2] ] > > b over a = [ > [0,0,9,9,5], > 1,1,1,1], > 2,2,2,2,0], > 5,5,5] ] > > That means, completely overwrite one list of arrays over the other, not matter what values one has, not matter the size, just ignore 0 values on overwriting. > I checked the documentation, i just need some tips. > > TempA = [[]] > # > One For Cicle in here to get the Element data... > ??? Data = vElem.data???????????????? # This is a list of numpy ndarrays. > ??? # > ??? for nr_row in range( len(Data) ): # For each numpy ndarray (row) in Data. > ??????? # > ??????? NData = Data[nr_row]?????????????????? # New data, to be written over old data. > ??????? OData = TempA[nr_row:nr_row+1] or [[]] # This is old data. Can be numpy ndarray, or empty list. > ??????? OData = OData[0] > ??????? # > ??????? # NData must completely eliminate transparent pixels... here comes the algorithm... No algorithm yet. > ??????? # > ??????? if len(NData) >= len(OData): > ??????????? # If new data is longer than old data, old data will be completely overwritten. > ??????????? TempA[nr_row:nr_row+1] = [NData] > ??????? else: # Old data is longer than new data ; old data cannot be null. > ??????????? TempB = np.copy(OData) > ??????????? TempB.put( range(len(NData)), NData ) > ??????????? #TempB[0:len(NData)-1] = NData # This returns "ValueError: shape mismatch: objects cannot be broadcast to a single shape" > ??????????? TempA[nr_row:nr_row+1] = [TempB] > ??????????? del TempB > ??????? # > ??? # > # > The result is stored inside TempA as list of numpy arrays. > > I would use 2D arrays, but they are slower than Python Lists containing Numpy arrays. I need to do this overwrite in a very big loop and every delay is very important. > I tried to create a masked array where all "zero" values are ignored on overlap, but it doesn't work. Masked or not, the "transparent" values are still overwritten. > Please, any suggestion is useful. your code will certainly be slow if you do no preallocate memory for your arrays. and i would suggest using numpy's array class instead of lists. 
a = numpy.array( a ) b = numpy.array( b ) c = numpy.zeros( ( max( ( len(a[:,0]) , len(b[:,0]) ) ) , max( ( len(a[0,:]) , len(b[0,:]) ) ) ) , int ) From charlesr.harris at gmail.com Mon May 18 11:49:17 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 18 May 2009 09:49:17 -0600 Subject: [Numpy-discussion] linear algebra help In-Reply-To: <1cd32cbb0905180835x7002255ev3fe53ae9ff1b2f0b@mail.gmail.com> References: <4A0F436E.4020900@american.edu> <1cd32cbb0905171021x6bbb31a2n818bc25e091a4c9e@mail.gmail.com> <9457e7c80905180121w68216bb4xdfe0ac2da1ff7bff@mail.gmail.com> <1cd32cbb0905180835x7002255ev3fe53ae9ff1b2f0b@mail.gmail.com> Message-ID: On Mon, May 18, 2009 at 9:35 AM, wrote: > On Mon, May 18, 2009 at 10:55 AM, Charles R Harris > wrote: > > > > > > 2009/5/18 Stéfan van der Walt > >> > >> 2009/5/18 Sebastian Walter : > >> > B = numpy.dot(A.T, A) > >> > >> This multiplication should be avoided whenever possible -- you are > >> effectively squaring your condition number. > > > > Although the condition number doesn't mean much unless the columns are > > normalized. Having badly scaled columns can lead to problems with lstsq > > because of its default cutoff based on the condition number. > > > > Chuck > > Do you know if any of the linalg methods, np.linalg.lstsq or > scipy.linalg.lstsq, do any normalization internally to improve > numerical accuracy? > - They don't. Although, IIRC, lapack provides routines for doing so. Maybe there is another least squares routine that does the scaling. > > I saw automatic internal normalization (e.g. rescaling) for some > econometrics methods, and was wondering whether we should do this also > in stats.models or whether scipy.linalg is already taking care of > this. I have only vague knowledge of the numerical precision of > different linear algebra methods. > It's a good idea. Otherwise the condition number depends on choice of units and other such extraneous things. Chuck From josef.pktd at gmail.com Mon May 18 11:50:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 May 2009 11:50:50 -0400 Subject: [Numpy-discussion] Overlap arrays with "transparency" In-Reply-To: <20090518113825.2e360839.michael.s.gilbert@gmail.com> References: <739378.72909.qm@web52104.mail.re2.yahoo.com> <20090518113825.2e360839.michael.s.gilbert@gmail.com> Message-ID: <1cd32cbb0905180850v2b0b2dedr71051f855463ddac@mail.gmail.com> On Mon, May 18, 2009 at 11:38 AM, Michael S. Gilbert wrote: > On Mon, 18 May 2009 05:37:09 -0700 (PDT), Cristi Constantin wrote: >> Good day. >> I am working on this algorithm for a few weeks now, so i tried almost everything... >> I want to overlap / overwrite 2 matrices, but completely ignore some values (in this case ignore 0) >> Let me explain: >> >> a = [ >> [1, 2, 3, 4, 5], >> [9,7], >> [0,0,0,0,0], >> [5,5,5] ] >> >> b = [ >> [0,0,9,9], >> [1,1,1,1], >> [2,2,2,2] ] >> >> Then, we have: >> >> a over b = [ >> [1,2,3,4,5], >> [9,7,1,1], >> [1,1,1,1,0], >> [5,5,5,2] ] >> >> b over a = [ >> [0,0,9,9,5], >> 1,1,1,1], >> 2,2,2,2,0], >> 5,5,5] ] >> If you can convert the list of lists to a common rectangular shape (masking missing values or assigning nans), then conditional overwriting is very easy, something like mask = a>0 a[mask] = b[mask] but for lists of lists with unequal shape, there might not be anything faster than looping.
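
A sketch combining the two suggestions above: pad the ragged rows into one rectangular array (the zeros() snippet earlier in the thread looks truncated) and then overwrite through a boolean mask, as josef describes. The value 0 is treated as the "transparent" value, following the original post; whether this matches the poster's exact "a over b" semantics for rows of different length is an assumption.

import numpy as np

a_rows = [[1, 2, 3, 4, 5], [9, 7], [0, 0, 0, 0, 0], [5, 5, 5]]
b_rows = [[0, 0, 9, 9], [1, 1, 1, 1], [2, 2, 2, 2]]

def to_rect(rows, shape):
    # pad ragged rows with the transparent value 0 into a single 2-D array
    out = np.zeros(shape, dtype=int)
    for i, row in enumerate(rows):
        out[i, :len(row)] = row
    return out

nrows = max(len(a_rows), len(b_rows))
ncols = max(max(len(r) for r in a_rows), max(len(r) for r in b_rows))
a = to_rect(a_rows, (nrows, ncols))
b = to_rect(b_rows, (nrows, ncols))

# "b over a": take b wherever b is not transparent, keep a elsewhere
mask = b != 0
result = a.copy()
result[mask] = b[mask]
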
Josef From strozzi2 at llnl.gov Mon May 18 12:21:39 2009 From: strozzi2 at llnl.gov (David J Strozzi) Date: Mon, 18 May 2009 09:21:39 -0700 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Message-ID: > >The actual list of Yorick functions relevant here appears to be here: > > http:// yorick.sourceforge.net/manual/yorick_46.php#SEC46 > http:// yorick.sourceforge.net/manual/yorick_47.php#SEC47 > >I must say that I don't see many functions missing in Numpy... > >David (Strozzi): are these the functions you meant? Are there more? > >-- >Pauli Virtanen Paul et al, I see the numpy list is quite active, and I appreciate the discussion I started - and then went silent about! foo(zcen,dif) is indeed syntax sugar, but it can be quite sweet! Haven't you seen how lab rats react to feedings of aspartame or other sweeteners?? Anyway, the list above is indeed what I had in mind. It seems a few are missing, like zcen, and perhaps it's worth it to add to numpy proper rather than have everyone write their own one-liners (and then have to deal w/ it when sharing code). I leave it to the community's wisdom. It seems enough smart people have thought about the issue. I also like pointing out that Yorick was a fast, free environment developed by ~1990, when matlab/IDL were probably the only comparable games in town, but very few people ever used it. I think this is a case study in the triumph of marketing over substance. It looks like num/sci py are gaining enough momentum and visibility. Hopefully the numerical science community won't be re-inventing this same wheel in 5 years.... Dave From pav at iki.fi Mon May 18 14:23:02 2009 From: pav at iki.fi (Pauli Virtanen) Date: Mon, 18 May 2009 18:23:02 +0000 (UTC) Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Message-ID: Mon, 18 May 2009 09:21:39 -0700, David J Strozzi wrote: [clip] > I also like pointing out that Yorick was a fast, free environment > developed by ~1990, when matlab/IDL were probably the only comparable > games in town, but very few people ever used it. I think this is a case > study in the triumph of marketing over substance. It looks like num/sci > py are gaining enough momentum and visibility. Hopefully the numerical > science community won't be re-inventing this same wheel in 5 years.... Well, GNU Octave has been around about the same time, and the same for Scilab. Curiously enough, first public version >= 1.0 of all the three seem to have appeared around 1994. [1,2,3] (Maybe something was in the air that year...) So I'd claim this particular wheel has already been reinvented pretty thoroughly :) .. [1] http://ftp.lanet.lv/ftp/mirror/x2ftp/msdos/programming/news/yorick.10 .. [2] http://www.scilab.org/platform/index_platform.php?page=history .. 
[3] http://en.wikipedia.org/wiki/GNU_Octave#History -- Pauli Virtanen From robert.kern at gmail.com Mon May 18 18:22:35 2009 From: robert.kern at gmail.com (Robert Kern) Date: Mon, 18 May 2009 17:22:35 -0500 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> Message-ID: <3d375d730905181522oc1a868dq5e8f8bef1676990a@mail.gmail.com> On Mon, May 18, 2009 at 13:23, Pauli Virtanen wrote: > Mon, 18 May 2009 09:21:39 -0700, David J Strozzi wrote: > [clip] >> I also like pointing out that Yorick was a fast, free environment >> developed by ~1990, when matlab/IDL were probably the only comparable >> games in town, but very few people ever used it. ?I think this is a case >> study in the triumph of marketing over substance. ?It looks like num/sci >> py are gaining enough momentum and visibility. ?Hopefully the numerical >> science community won't be re-inventing this same wheel in 5 years.... > > Well, GNU Octave has been around about the same time, and the same for > Scilab. Curiously enough, first public version >= 1.0 of all the three > seem to have appeared around 1994. [1,2,3] (Maybe something was in > the air that year...) > > So I'd claim this particular wheel has already been reinvented pretty > thoroughly :) It's worth noting that most of numpy's indexing functionality was stol^H^H^H^Hborrowed from Yorick in ages past: http://mail.python.org/pipermail/matrix-sig/1995-November/000143.html -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Mon May 18 21:12:04 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 18 May 2009 21:12:04 -0400 Subject: [Numpy-discussion] Indexing with callables (was: Yorick-like functionality) In-Reply-To: <3d375d730905181522oc1a868dq5e8f8bef1676990a@mail.gmail.com> References: <91cf711d0905151309v5ab033f9r7c3c3e472e74bfff@mail.gmail.com> <3d375d730905181522oc1a868dq5e8f8bef1676990a@mail.gmail.com> Message-ID: <1cd32cbb0905181812l276d5db7lf1254139188cf22@mail.gmail.com> On Mon, May 18, 2009 at 6:22 PM, Robert Kern wrote: > On Mon, May 18, 2009 at 13:23, Pauli Virtanen wrote: >> Mon, 18 May 2009 09:21:39 -0700, David J Strozzi wrote: >> [clip] >>> I also like pointing out that Yorick was a fast, free environment >>> developed by ~1990, when matlab/IDL were probably the only comparable >>> games in town, but very few people ever used it. ?I think this is a case >>> study in the triumph of marketing over substance. ?It looks like num/sci >>> py are gaining enough momentum and visibility. ?Hopefully the numerical >>> science community won't be re-inventing this same wheel in 5 years.... >> >> Well, GNU Octave has been around about the same time, and the same for >> Scilab. Curiously enough, first public version >= 1.0 of all the three >> seem to have appeared around 1994. [1,2,3] (Maybe something was in >> the air that year...) >> >> So I'd claim this particular wheel has already been reinvented pretty >> thoroughly :) > > It's worth noting that most of numpy's indexing functionality was > stol^H^H^H^Hborrowed from Yorick in ages past: > > ?http://mail.python.org/pipermail/matrix-sig/1995-November/000143.html > Thanks for the link, an interesting discussion on the origin of array/matrices in python. 
also the end of matrix-sig is interesting http://mail.python.org/pipermail/matrix-sig/2000-February/003292.html I needed to check some history: Gauss and Matlab are more than 10 years older, and S is ancient, way ahead of Python. Josef From Klaus.Noekel at gmx.de Tue May 19 13:07:45 2009 From: Klaus.Noekel at gmx.de (Klaus Noekel) Date: Tue, 19 May 2009 19:07:45 +0200 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit Message-ID: <4A12E761.5020801@gmx.de> Today I wanted to experiment with the AMD64 version of numpy. I am using Windows Vista 64 bit. I downloaded and installed today's Python 2.6.2 AMD64 and then numpy 1.3.0 AMD64. Executing "import numpy" (nothing else) yielded the following error message: IDLE 2.6.2 >>> import numpy Warning (from warnings module): File "C:\Python26\lib\site-packages\numpy\core\__init__.py", line 5 import multiarray Warning: Windows 64 bits support is experimental, and only available for testing. You are advised not to use it for production. CRASHES ARE TO BE EXPECTED - PLEASE REPORT THEM TO NUMPY DEVELOPERS Traceback (most recent call last): File "", line 1, in import numpy File "C:\Python26\lib\site-packages\numpy\__init__.py", line 130, in import add_newdocs File "C:\Python26\lib\site-packages\numpy\add_newdocs.py", line 9, in from lib import add_newdoc File "C:\Python26\lib\site-packages\numpy\lib\__init__.py", line 13, in from polynomial import * File "C:\Python26\lib\site-packages\numpy\lib\polynomial.py", line 18, in from numpy.linalg import eigvals, lstsq File "C:\Python26\lib\site-packages\numpy\linalg\__init__.py", line 47, in from linalg import * File "C:\Python26\lib\site-packages\numpy\linalg\linalg.py", line 22, in from numpy.linalg import lapack_lite ImportError: DLL load failed: Das angegebene Modul wurde nicht gefunden. I doubt that the DLL was not physically present and rather suspect a dependency on some other DLL that was missing. The INSTALL.TXT unfortunately was not helpful. Can anybody please explain what other dependencies exist? Anything else I need to install? Thanks a lot! Klaus Noekel Karlsruhe, Germany From david at ar.media.kyoto-u.ac.jp Tue May 19 13:02:42 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 20 May 2009 02:02:42 +0900 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit In-Reply-To: <4A12E761.5020801@gmx.de> References: <4A12E761.5020801@gmx.de> Message-ID: <4A12E632.5010109@ar.media.kyoto-u.ac.jp> Klaus Noekel wrote: > I doubt that the DLL was not physically present and rather suspect a > dependency on some other DLL that was missing. The INSTALL.TXT > unfortunately was not helpful. Can anybody please explain what other > dependencies exist? Anything else I need to install? > This exact problem is specific to IDLE - I don't know what triggers it. Today, the best solution for a 64 bits numpy on windows is to built it yourself with MS compilers - the distributed one is built with mingw compilers, and there still seems to be some stability problems with those. Unfortunately, as the mingw debugger does not work either on 64 bits archs, finding the problem is quite hard. cheers, David From dwf at cs.toronto.edu Tue May 19 20:26:02 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 19 May 2009 20:26:02 -0400 Subject: [Numpy-discussion] build failure with macosx-10.5-fat64-2.6 Message-ID: I've just tried a "fat 64" build (with a Python 2.6.2 that had been built similarly), and I'm getting this weird behaviour. 
The command I used was: CFLAGS="-O3 -Wall -DNDEBUG -g -fwrapv -Wstrict-prototypes -arch x86_64 -arch ppc64" python setup.py build It looks as though for some reason, numpy distutils is executing gcc without any -arch flags, and falling back to the guessed architecture ppc (which in my cases is not even partially correct, as I was building x86_64 and ppc64 only). Building with CC="gcc -arch x86_64 -arch ppc64" fixes things, but I guess this is a bug in numpy distutils if it's not respecting CFLAGS during these config tests. Output is below. Cheers, David ------------ C compiler: gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -O3 - Wall -DNDEBUG -g -fwrapv -Wstrict-prototypes -arch x86_64 -arch ppc64 compile options: '-Inumpy/core/src -Inumpy/core/src/multiarray -Inumpy/ core/src/umath -Inumpy/core/include -I/Library/Frameworks/ Python64.framework/Versions/2.6/include/python2.6 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function ?exp? _configtest.c:1: warning: conflicting types for built-in function ?exp? gcc _configtest.o -o _configtest ld warning: in _configtest.o, missing required architecture ppc in file Undefined symbols: "_main", referenced from: start in crt1.10.5.o ld: symbol(s) not found collect2: ld returned 1 exit status ld warning: in _configtest.o, missing required architecture ppc in file Undefined symbols: "_main", referenced from: start in crt1.10.5.o ld: symbol(s) not found collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -O3 - Wall -DNDEBUG -g -fwrapv -Wstrict-prototypes -arch x86_64 -arch ppc64 compile options: '-Inumpy/core/src -Inumpy/core/src/multiarray -Inumpy/ core/src/umath -Inumpy/core/include -I/Library/Frameworks/ Python64.framework/Versions/2.6/include/python2.6 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function ?exp? _configtest.c:1: warning: conflicting types for built-in function ?exp? gcc _configtest.o -lm -o _configtest ld warning: in _configtest.o, missing required architecture ppc in file Undefined symbols: "_main", referenced from: start in crt1.10.5.o ld: symbol(s) not found collect2: ld returned 1 exit status ld warning: in _configtest.o, missing required architecture ppc in file Undefined symbols: "_main", referenced from: start in crt1.10.5.o ld: symbol(s) not found collect2: ld returned 1 exit status failure. removing: _configtest.c _configtest.o C compiler: gcc -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -O3 - Wall -DNDEBUG -g -fwrapv -Wstrict-prototypes -arch x86_64 -arch ppc64 compile options: '-Inumpy/core/src -Inumpy/core/src/multiarray -Inumpy/ core/src/umath -Inumpy/core/include -I/Library/Frameworks/ Python64.framework/Versions/2.6/include/python2.6 -c' gcc: _configtest.c _configtest.c:1: warning: conflicting types for built-in function ?exp? _configtest.c:1: warning: conflicting types for built-in function ?exp? gcc _configtest.o -lcpml -o _configtest ld: library not found for -lcpml collect2: ld returned 1 exit status ld: library not found for -lcpml collect2: ld returned 1 exit status failure. 
removing: _configtest.c _configtest.o Traceback (most recent call last): File "setup.py", line 172, in setup_package() File "setup.py", line 165, in setup_package configuration=configuration ) File "/Users/dwf/src/numpy-svn/numpy/distutils/core.py", line 184, in setup return old_setup(**new_attr) File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/core.py", line 152, in setup dist.run_commands() File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/dist.py", line 975, in run_commands self.run_command(cmd) File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "/Users/dwf/src/numpy-svn/numpy/distutils/command/build.py", line 37, in run old_build.run(self) File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/command/build.py", line 134, in run self.run_command(cmd_name) File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/cmd.py", line 333, in run_command self.distribution.run_command(command) File "/Library/Frameworks/Python64.framework/Versions/2.6/lib/ python2.6/distutils/dist.py", line 995, in run_command cmd_obj.run() File "/Users/dwf/src/numpy-svn/numpy/distutils/command/ build_src.py", line 130, in run self.build_sources() File "/Users/dwf/src/numpy-svn/numpy/distutils/command/ build_src.py", line 147, in build_sources self.build_extension_sources(ext) File "/Users/dwf/src/numpy-svn/numpy/distutils/command/ build_src.py", line 250, in build_extension_sources sources = self.generate_sources(sources, ext) File "/Users/dwf/src/numpy-svn/numpy/distutils/command/ build_src.py", line 307, in generate_sources source = func(extension, build_dir) File "numpy/core/setup.py", line 337, in generate_config_h mathlibs = check_mathlib(config_cmd) File "numpy/core/setup.py", line 281, in check_mathlib raise EnvironmentError("math library missing; rerun " EnvironmentError: math library missing; rerun setup.py after setting the MATHLIB env variable From darkgl0w at yahoo.com Wed May 20 07:24:43 2009 From: darkgl0w at yahoo.com (Cristi Constantin) Date: Wed, 20 May 2009 04:24:43 -0700 (PDT) Subject: [Numpy-discussion] Overlap arrays with "transparency" Message-ID: <307808.38061.qm@web52101.mail.re2.yahoo.com> Thank you for your help. :) I used this : try: NData[ (NData==transparent)[:len(OData)] ] = OData[ (NData==transparent)[:len(OData)] ] except: pass That means overwrite all "transparent" data from NData with valid data from OData. I am sure it's not the best method yet, but it's the only one that works. --- On Mon, 5/18/09, Cristi Constantin wrote: From: Cristi Constantin Subject: [Numpy-discussion] Overlap arrays with "transparency" To: "Numpy Discussion" Date: Monday, May 18, 2009, 5:37 AM Good day. I am working on this algorithm for a few weeks now, so i tried almost everything... I want to overlap / overwrite 2 matrices, but completely ignore some values (in this case ignore 0) Let me explain: a = [ [1, 2, 3, 4, 5], [9,7], [0,0,0,0,0], [5,5,5] ] b = [ [0,0,9,9], [1,1,1,1], [2,2,2,2] ] Then, we have: a over b = [ [1,2,3,4,5], [9,7,1,1], [1,1,1,1,0], [5,5,5,2] ] b over a = [ [0,0,9,9,5], 1,1,1,1], 2,2,2,2,0], 5,5,5] ] That means, completely overwrite one list of arrays over the other, not matter what values one has, not matter the size, just ignore 0 values on overwriting. I checked the documentation, i just need some tips. TempA = [[]] # One For Cicle in here to get the Element data... ??? 
Data = vElem.data???????????????? # This is a list of numpy ndarrays. ??? # ??? for nr_row in range( len(Data) ): # For each numpy ndarray (row) in Data. ??????? # ??????? NData = Data[nr_row]?????????????????? # New data, to be written over old data. ??????? OData = TempA[nr_row:nr_row+1] or [[]] # This is old data. Can be numpy ndarray, or empty list. ??????? OData = OData[0] ??????? # ??????? # NData must completely eliminate transparent pixels... here comes the algorithm... No algorithm yet. ??????? # ??????? if len(NData) >= len(OData): ??????????? # If new data is longer than old data, old data will be completely overwritten. ??????????? TempA[nr_row:nr_row+1] = [NData] ??????? else: # Old data is longer than new data ; old data cannot be null. ??????????? TempB = np.copy(OData) ??????????? TempB.put( range(len(NData)), NData ) ??????????? #TempB[0:len(NData)-1] = NData # This returns "ValueError: shape mismatch: objects cannot be broadcast to a single shape" ??????????? TempA[nr_row:nr_row+1] = [TempB] ??????????? del TempB ??????? # ??? # # The result is stored inside TempA as list of numpy arrays. I would use 2D arrays, but they are slower than Python Lists containing Numpy arrays. I need to do this overwrite in a very big loop and every delay is very important. I tried to create a masked array where all "zero" values are ignored on overlap, but it doesn't work. Masked or not, the "transparent" values are still overwritten. Please, any suggestion is useful. Thank you. -------------- next part -------------- An HTML attachment was scrubbed... URL: From darkgl0w at yahoo.com Wed May 20 07:53:25 2009 From: darkgl0w at yahoo.com (Cristi Constantin) Date: Wed, 20 May 2009 04:53:25 -0700 (PDT) Subject: [Numpy-discussion] Rotate Left and Rotate Right Message-ID: <381631.79275.qm@web52107.mail.re2.yahoo.com> Good day, me again. I have this string of data : String = 'i want\nto go\nto the\nbeach'. I want to rotate this data to left, or to right, after is split it after '\n'. Note that it's important to use 'U' array, because i might have unicode characters in this string. So normal, the text is: i want to go to the beach Rotated right would be: btti eoo a? w ctga hhon ?e t Rotated left would be: t e nohh agtc w? a ?ooe ittb There are a few methods i guess could be used. Split like: [np.array([j for j in i],'U') for i in String.split('\n')] => ??? [ array([u'i', u' ', u'w', u'a', u'n', u't'], dtype=' array([u'i want', u'to go', u'to the', u'beach'], dtype=' From josef.pktd at gmail.com Wed May 20 09:11:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 20 May 2009 09:11:53 -0400 Subject: [Numpy-discussion] Rotate Left and Rotate Right In-Reply-To: <381631.79275.qm@web52107.mail.re2.yahoo.com> References: <381631.79275.qm@web52107.mail.re2.yahoo.com> Message-ID: <1cd32cbb0905200611k7cce660ao2869eaaf74b598a9@mail.gmail.com> On Wed, May 20, 2009 at 7:53 AM, Cristi Constantin wrote: > Good day, me again. > I have this string of data : > String = 'i want\nto go\nto the\nbeach'. > > I want to rotate this data to left, or to right, after is split it after > '\n'. > Note that it's important to use 'U' array, because i might have unicode > characters in this string. > > So normal, the text is: > i want > to go > to the > beach > > Rotated right would be: > btti > eoo > a? w > ctga > hhon > ?e t > > Rotated left would be: > t e > nohh > agtc > w? a > ?ooe > ittb > > There are a few methods i guess could be used. 
> > Split like: [np.array([j for j in i],'U') for i in String.split('\n')] => > ??? [ array([u'i', u' ', u'w', u'a', u'n', u't'], dtype=' ??? array([u't', u'o', u' ', u'g', u'o'], dtype=' ??? array([u't', u'o', u' ', u't', u'h', u'e'], dtype=' ??? array([u'b', u'e', u'a', u'c', u'h'], dtype=' > Or split like: np.array( String.split('\n'), 'U' ), => array([u'i want', > u'to go', u'to the', u'beach'], dtype=' Both methods are impossible to use with my knowledge. > > Without Numpy, if i want to rotate to right: i should reverse a list of > splited by '\n' strings, save max length for all lines, align all lines to > max len, rotate the square with map( lambda *row: [elem for elem in row], > *Content ), then join the resulted sub-lists, unite with '\n' and return. > > I need to know if there is a faster Numpy approach... maybe explode the > string by '\n' and rotate fast, or something? > > Thank you very much, in advance. > If you have many lines of different length, then using sparse matrices might be useful. scipy\maxentropy\examples might be useful to look at. I don't know much about language processing but the following seems to work, using 1 character arrays: Josef >>> ss = u'i want\nto go\nto the\nbeach' >>> lines = ss.split('\n') >>> max(len(line) for line in lines) 6 >>> maxl = max(len(line) for line in lines) >>> uarr = np.zeros((len(lines),maxl),dtype='>> uarr array([[u'', u'', u'', u'', u'', u''], [u'', u'', u'', u'', u'', u''], [u'', u'', u'', u'', u'', u''], [u'', u'', u'', u'', u'', u'']], dtype='>> for i,line in enumerate(lines): uarr[i,:len(line)] = [j for j in line] >>> uarr array([[u'i', u' ', u'w', u'a', u'n', u't'], [u't', u'o', u' ', u'g', u'o', u''], [u't', u'o', u' ', u't', u'h', u'e'], [u'b', u'e', u'a', u'c', u'h', u'']], dtype='>> uarr[::-1,::-1] array([[u'', u'h', u'c', u'a', u'e', u'b'], [u'e', u'h', u't', u' ', u'o', u't'], [u'', u'o', u'g', u' ', u'o', u't'], [u't', u'n', u'a', u'w', u' ', u'i']], dtype='>> uarr[:,::-1].T # rotate left ? array([[u't', u'', u'e', u''], [u'n', u'o', u'h', u'h'], [u'a', u'g', u't', u'c'], [u'w', u' ', u' ', u'a'], [u' ', u'o', u'o', u'e'], [u'i', u't', u't', u'b']], dtype='>> uarr[::-1,:].T # rotate right ? array([[u'b', u't', u't', u'i'], [u'e', u'o', u'o', u' '], [u'a', u' ', u' ', u'w'], [u'c', u't', u'g', u'a'], [u'h', u'h', u'o', u'n'], [u'', u'e', u'', u't']], dtype=' References: <307808.38061.qm@web52101.mail.re2.yahoo.com> Message-ID: <1cd32cbb0905200624o458dfa4ao7c1f77dd390bffcb@mail.gmail.com> On Wed, May 20, 2009 at 7:24 AM, Cristi Constantin wrote: > Thank you for your help. :) > > I used this : > try: NData[ (NData==transparent)[:len(OData)] ] = OData[ (NData==transparent)[:len(OData)] ] > except: pass > > That means overwrite all "transparent" data from NData with valid data from > OData. > I am sure it's not the best method yet, but it's the only one that works. > just some quick comments: If you assign (NData==transparent)[:len(OData)] to a temporary variable, then you don't need to calculate this twice. It would save a bit of time. Catching all exceptions with ``except: pass`` is definitely discouraged. 
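
A small sketch of the two comments above, assuming NData and OData are 1-D numpy arrays and transparent is the value to ignore; the boolean mask is built once, and the bare except is replaced by an explicit length check.

import numpy as np

# placeholder arrays standing in for the poster's NData / OData
NData = np.array([0, 5, 0, 7, 0, 9])
OData = np.array([1, 2, 3, 4])
transparent = 0

n = min(len(NData), len(OData))        # guard against unequal lengths
mask = NData[:n] == transparent        # build the mask only once
NData[:n][mask] = OData[:n][mask]      # old values survive only where new data is transparent
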
Josef From klaus.noekel at gmx.de Wed May 20 09:36:27 2009 From: klaus.noekel at gmx.de (=?iso-8859-1?Q?=22Klaus_N=F6kel=22?=) Date: Wed, 20 May 2009 15:36:27 +0200 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit In-Reply-To: References: Message-ID: <20090520133627.148820@gmx.net> David, > > Klaus Noekel wrote: > > I doubt that the DLL was not physically present and rather suspect a > > dependency on some other DLL that was missing. The INSTALL.TXT > > unfortunately was not helpful. Can anybody please explain what other > > dependencies exist? Anything else I need to install? > > > > This exact problem is specific to IDLE - I don't know what triggers it. > Today, the best solution for a 64 bits numpy on windows is to built it > yourself with MS compilers - the distributed one is built with mingw > compilers, and there still seems to be some stability problems with > those. Unfortunately, as the mingw debugger does not work either on 64 > bits archs, finding the problem is quite hard. > I don't believe that the problem is specific to IDLE. Python also crashes when I put nothing but "import numpy" in a file and execute it with python.exe. Regarding the note on building numpy myself: the discussion in this forum scared me a little, because of the challenge to build LAPACK with a compatible Fortran compiler etc. That and the fact that I do not have MSVC 2008 (only 2005) keeps me from trying it. Any chance that a MS-based installer will materialize soon? Or are there any mingw-specific runtime libraries that I need to install so that the mingw-based numpy works? Thanks for your help! Klaus Noekel -- Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss f?r nur 17,95 Euro/mtl.!* http://dslspecial.gmx.de/freedsl-aktionspreis/?ac=OM.AD.PD003K11308T4569a From bsouthey at gmail.com Wed May 20 09:42:21 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 20 May 2009 08:42:21 -0500 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit In-Reply-To: <20090520133627.148820@gmx.net> References: <20090520133627.148820@gmx.net> Message-ID: <4A1408BD.8080105@gmail.com> Klaus N?kel wrote: > David, > > >> Klaus Noekel wrote: >> >>> I doubt that the DLL was not physically present and rather suspect a >>> dependency on some other DLL that was missing. The INSTALL.TXT >>> unfortunately was not helpful. Can anybody please explain what other >>> dependencies exist? Anything else I need to install? >>> >>> >> This exact problem is specific to IDLE - I don't know what triggers it. >> Today, the best solution for a 64 bits numpy on windows is to built it >> yourself with MS compilers - the distributed one is built with mingw >> compilers, and there still seems to be some stability problems with >> those. Unfortunately, as the mingw debugger does not work either on 64 >> bits archs, finding the problem is quite hard. >> >> > > I don't believe that the problem is specific to IDLE. Python also crashes when I put nothing but "import numpy" in a file and execute it with python.exe. > > Regarding the note on building numpy myself: the discussion in this forum scared me a little, because of the challenge to build LAPACK with a compatible Fortran compiler etc. That and the fact that I do not have MSVC 2008 (only 2005) keeps me from trying it. Any chance that a MS-based installer will materialize soon? Or are there any mingw-specific runtime libraries that I need to install so that the mingw-based numpy works? > > Thanks for your help! 
> Klaus Noekel > > > > Hi, I also see this. What version of Python specific of Python are you using? I got the same with Python 2.6.1 so perhaps I need to downgrade to Python 2.6.0? Bruce From oliphant at enthought.com Wed May 20 10:45:12 2009 From: oliphant at enthought.com (Travis Oliphant) Date: Wed, 20 May 2009 09:45:12 -0500 Subject: [Numpy-discussion] Join us for "Scientific Computing with Python Webinar" References: <1437076956.5204661242825355676.JavaMail.root@g2mp1br2.las.expertcity.com> Message-ID: <2355F1D0-DD01-4BD1-8482-FDDC6FEE6C91@enthought.com> Hello all Python users: I am pleased to announce the beginning of a free Webinar series that discusses using Python for scientific computing. Enthought will host this free series which will take place once a month for 30-45 minutes. The schedule and length may change based on participation feedback, but for now it is scheduled for the fourth Friday of every month. This free webinar should not be confused with the EPD webinar on the first Friday of each month which is open only to subscribers to the Enthought Python Distribution. I (Travis Oliphant) will be the first speaker at this continuing series. I plan to present a brief (10-15) minute talk on reading binary files with NumPy using memory mapped arrays and structured data- types. This talk will be followed by a demonstration of Chaco for interactive 2-d visualization and Mayavi for interactive 3-d visualization. Both Chaco and Mayavi are open-source tools and part of the Enthought Tool Suite. They can be conveniently installed using the Enthought Python Distribution. Topics for future webinars will be chosen later based on participant feedback. This event will take place on Friday at 3:00pm CDT and will last 30 to 45 minutes depending on questions asked. Space is limited at this event. If you would like to participate, please register by going to https://www1.gotomeeting.com/register/422340144 or by clicking on the appropriate link in the attached announcement. There will be a 10 minute technical help session prior to the on-line meeting which you should plan to use if you have never participated in a GoToWebinar previously. During this time you can test your connection and audio equipment as well as familiarize yourself with the GoTo Meeting software. I am looking forward to interacting with many of you this Friday. Best regards, Travis Oliphant Enthought, Inc. Enthought is the company that sponsored the creation of SciPy and the Enthought Tool Suite. It continues to sponsor the SciPy community by hosting the SciPy mailing list and website and participating in the development of SciPy and NumPy. Enthought creates custom scientific and technical software applications and provides training on using Python for technical computing. Enthought also provides the Enthought Python Distribution. Learn more at http://www.enthought.com Travis Oliphant's bio can be read at http://www.enthought.com/company/executive-team.php > > > > > > Scientific Computing with Python Webinar > > > > > > Each webinar in this continuing series will demonstrate the use of > some aspect of Python to assist with scientific, engineering, and > technical computing. 
Enthought will host each meeting and select a > specific topic based on feedback from participants > Register for a session now by clicking a date below: > Fri, May 22, 2009 3:00 PM - 3:30 PM CDT > Fri, Jun 19, 2009 1:00 PM - 1:30 PM CDT > Fri, Jul 17, 2009 1:00 PM - 1:30 PM CDT > Once registered you will receive an email confirming your registration > with information you need to join the Webinar. > System Requirements > PC-based attendees > Required: Windows? 2000, XP Home, XP Pro, 2003 Server, Vista > Macintosh?-based attendees > Required: Mac OS? X 10.4 (Tiger?) or newer > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Wed May 20 11:04:20 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 20 May 2009 17:04:20 +0200 Subject: [Numpy-discussion] skiprows option in loadtxt Message-ID: Hi all, Is the value of skiprows in loadtxt restricted to values in [0-10] ? It doesn't work for skiprows=11. Nils From klaus.noekel at gmx.de Wed May 20 11:09:01 2009 From: klaus.noekel at gmx.de (=?iso-8859-1?Q?=22Klaus_N=F6kel=22?=) Date: Wed, 20 May 2009 17:09:01 +0200 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit In-Reply-To: References: Message-ID: <20090520150901.123590@gmx.net> > > > > I don't believe that the problem is specific to IDLE. Python also > crashes when I put nothing but "import numpy" in a file and execute it with > python.exe. > > > > Regarding the note on building numpy myself: the discussion in this > forum scared me a little, because of the challenge to build LAPACK with a > compatible Fortran compiler etc. That and the fact that I do not have MSVC 2008 > (only 2005) keeps me from trying it. Any chance that a MS-based installer > will materialize soon? Or are there any mingw-specific runtime libraries > that I need to install so that the mingw-based numpy works? > > > > Thanks for your help! > > Klaus Noekel > > > > > > > > > Hi, > I also see this. > What version of Python specific of Python are you using? > I got the same with Python 2.6.1 so perhaps I need to downgrade to > Python 2.6.0? > > Bruce I got it with a fresh 2.6.2 install on Tuesday. Klaus -- Neu: GMX FreeDSL Komplettanschluss mit DSL 6.000 Flatrate + Telefonanschluss f?r nur 17,95 Euro/mtl.!* http://dslspecial.gmx.de/freedsl-aktionspreis/?ac=OM.AD.PD003K11308T4569a From pgmdevlist at gmail.com Wed May 20 11:14:37 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 20 May 2009 11:14:37 -0400 Subject: [Numpy-discussion] skiprows option in loadtxt In-Reply-To: References: Message-ID: <955153E7-F7E7-4385-90D2-3ECFECECFDD6@gmail.com> On May 20, 2009, at 11:04 AM, Nils Wagner wrote: > Hi all, > > Is the value of skiprows in loadtxt restricted to values > in [0-10] ? > > It doesn't work for skiprows=11. Please post an example From stefan at sun.ac.za Wed May 20 11:15:49 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 20 May 2009 17:15:49 +0200 Subject: [Numpy-discussion] skiprows option in loadtxt In-Reply-To: References: Message-ID: <9457e7c80905200815m158f2861kc6d14bc633b0349f@mail.gmail.com> Hi Nils 2009/5/20 Nils Wagner : > Is the value of skiprows in loadtxt restricted to values > in [0-10] ? > > It doesn't work for skiprows=11. I don't see this behaviour. Could you provide a code snippet? 
Thanks St?fan From rmay31 at gmail.com Wed May 20 11:16:08 2009 From: rmay31 at gmail.com (Ryan May) Date: Wed, 20 May 2009 10:16:08 -0500 Subject: [Numpy-discussion] skiprows option in loadtxt In-Reply-To: References: Message-ID: On Wed, May 20, 2009 at 10:04 AM, Nils Wagner wrote: > Hi all, > > Is the value of skiprows in loadtxt restricted to values > in [0-10] ? > > It doesn't work for skiprows=11. Works for me: s = '\n'.join(map(str,range(20))) from StringIO import StringIO np.loadtxt(StringIO(s), skiprows=11) The last line yields, as expected: array([ 11., 12., 13., 14., 15., 16., 17., 18., 19.]) This is with 1.4.0.dev6983. Can we see code and data file? Ryan -- Ryan May Graduate Research Assistant School of Meteorology University of Oklahoma Sent from Norman, Oklahoma, United States -------------- next part -------------- An HTML attachment was scrubbed... URL: From nwagner at iam.uni-stuttgart.de Wed May 20 11:31:38 2009 From: nwagner at iam.uni-stuttgart.de (Nils Wagner) Date: Wed, 20 May 2009 17:31:38 +0200 Subject: [Numpy-discussion] skiprows option in loadtxt In-Reply-To: References: Message-ID: On Wed, 20 May 2009 10:16:08 -0500 Ryan May wrote: > On Wed, May 20, 2009 at 10:04 AM, Nils Wagner > wrote: > >> Hi all, >> >> Is the value of skiprows in loadtxt restricted to values >> in [0-10] ? >> >> It doesn't work for skiprows=11. > > > Works for me: > > s = '\n'.join(map(str,range(20))) > from StringIO import StringIO > np.loadtxt(StringIO(s), skiprows=11) > > The last line yields, as expected: > array([ 11., 12., 13., 14., 15., 16., 17., 18., > 19.]) > > This is with 1.4.0.dev6983. Can we see code and data >file? > > Ryan > > -- > Ryan May > Graduate Research Assistant > School of Meteorology > University of Oklahoma > Sent from Norman, Oklahoma, United States Hi all, My fault. Sorry for the noise. Nils From dagss at student.matnat.uio.no Wed May 20 11:47:05 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Wed, 20 May 2009 17:47:05 +0200 Subject: [Numpy-discussion] ANN: Cython 0.11.2 released Message-ID: <4A1425F9.9030100@student.matnat.uio.no> I'm happy to announce the release of Cython 0.11.2. http://sage.math.washington.edu/home/dagss/Cython-0.11.2.tar.gz http://sage.math.washington.edu/home/dagss/Cython-0.11.2.zip (Will be present on the Cython front page in a few days.) New features: * There's now native complex floating point support! C99 complex will be used if complex.h is included, otherwise explicit complex arithmetic working on all C compilers is used. [Robert Bradshaw] cdef double complex a = 1 + 0.3j cdef np.ndarray[np.complex128_t, ndim=2] arr = \ np.zeros(10, np.complex128) * Cython can now generate a main()-method for embedding of the Python interpreter into an executable (see #289) [Robert Bradshaw] * @wraparound directive (another way to disable arr[idx] for negative idx) [Dag Sverre Seljebotn] * Correct support for NumPy record dtypes with different alignments, and "cdef packed struct" support [Dag Sverre Seljebotn] * @callspec directive, allowing custom calling convention macros [Lisandro Dalcin] * Bug fixes and smaller improvements. For the full list, see [1]. Contributors to this release: - Stefan Behnel - Robert Bradshaw - Lisandro Dalcin - Dag Sverre Seljebotn Thanks also to everybody who's helping us out in our discussions on the mailing list. 
[1] -- Dag Sverre From jdh2358 at gmail.com Wed May 20 15:07:43 2009 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 20 May 2009 14:07:43 -0500 Subject: [Numpy-discussion] binary builds against older numpys Message-ID: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> We are trying to build and test mpl installers for python2.4, 2.5 and 2.6. What we are finding is that if we build mpl against a more recent numpy than the installed numpy on a test machine, the import of mpl extension modules which depend on numpy trigger a segfault. Eg, on python2.5 and python2.6, we build the mpl installers against the numpy-1.3.0-win32.superpack installation, and if I test the installer on a python2.5 machine with numpy-1.2.1-win32.superpack installed, I get the segfault. If I install numpy-1.3.0-win32.superpack on the test machine, then the mpl binaries work fine. Is there an known binary incompatibly between 1.2.1 and 1.3.0? One solution we may consider is building our 2.5 binaries against 1.2.1 and seeing if they work with both 1.2.1 and 1.3.0 installations, but wanted to check in here to see if there were known issues or solutions we should be considering. Our test installers are at http://drop.io/rlck8ph if you are interested. Thanks, JDH From gkelly at gmail.com Wed May 20 15:08:46 2009 From: gkelly at gmail.com (Grant Kelly) Date: Wed, 20 May 2009 12:08:46 -0700 Subject: [Numpy-discussion] wiki page correction Message-ID: <646de7490905201208r273d1acch1868f70f9929b482@mail.gmail.com> I believe there is an error on this wiki page: http://www.scipy.org/NumPy_for_Matlab_Users MATLAB y=x(2,:) PYTHON y = x[2,:].copy() shouldn't the Python version be: y = x[1,:].copy() If not, please advise. Thanks, Grant From josef.pktd at gmail.com Wed May 20 15:17:50 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Wed, 20 May 2009 15:17:50 -0400 Subject: [Numpy-discussion] binary builds against older numpys In-Reply-To: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> References: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> Message-ID: <1cd32cbb0905201217g69fa750dqaa8eb7a10ab5450a@mail.gmail.com> On Wed, May 20, 2009 at 3:07 PM, John Hunter wrote: > We are trying to build and test mpl installers for python2.4, 2.5 and > 2.6. ?What we are finding is that if we build mpl against a more > recent numpy than the installed numpy on a test machine, the import of > mpl extension modules which depend on numpy trigger a segfault. > > Eg, on python2.5 and python2.6, we build the mpl installers against > the numpy-1.3.0-win32.superpack installation, and if I test the > installer on a python2.5 machine with numpy-1.2.1-win32.superpack > installed, I get the segfault. ?If I install > numpy-1.3.0-win32.superpack on the test machine, then the mpl binaries > work fine. > > Is there an known binary incompatibly between 1.2.1 and 1.3.0? ?One > solution we may consider is building our 2.5 binaries against 1.2.1 > and seeing if they work with both 1.2.1 and 1.3.0 installations, but > wanted to check in here to see if there were known issues or solutions > we should be considering. > > Our test installers are at http://drop.io/rlck8ph if you are interested. > > Thanks, > JDH A few days ago, I asked the same question for scipy in scipy-dev, since I got segfaults using scipy against an older numpy version than the one it was build against. Davids reply is that there is no forward compatibility. 
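
For packages shipping binaries built against a newer numpy, an import-time guard can at least turn the segfault into a readable error. This is only a sketch of such a check (it catches version mismatches, not every possible ABI problem); the 1.2.1 minimum mirrors the plan discussed above, the rest is illustrative.

import numpy
from distutils.version import LooseVersion

_MIN_NUMPY = '1.2.1'   # oldest numpy the extension modules are known to work with

if LooseVersion(numpy.__version__) < LooseVersion(_MIN_NUMPY):
    raise ImportError("this package requires numpy >= %s, found %s"
                      % (_MIN_NUMPY, numpy.__version__))
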
Josef From dmitrey.kroshko at scipy.org Wed May 20 15:24:57 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Wed, 20 May 2009 12:24:57 -0700 (PDT) Subject: [Numpy-discussion] binary shift for ndarray Message-ID: <65509afd-d77c-49b6-be29-49bdb3573415@n4g2000vba.googlegroups.com> hi all, suppose I have A that is numpy ndarray of floats, with shape n x n. I want to obtain dot(A, b), b is vector of length n and norm(b)=1, but instead of exact multiplication I want to approximate b as a vector [+/- 2^m0, ? 2^m1, ? 2^m2 ,,, ? 2^m_n], m_i are integers, and then invoke left_shift(vector_m) for rows of A. So, what is the simplest way to do it, without cycles of course? Or it cannot be implemented w/o cycles with current numpy version? Thank you in advance, D. From robert.kern at gmail.com Wed May 20 15:34:55 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 May 2009 14:34:55 -0500 Subject: [Numpy-discussion] binary shift for ndarray In-Reply-To: <65509afd-d77c-49b6-be29-49bdb3573415@n4g2000vba.googlegroups.com> References: <65509afd-d77c-49b6-be29-49bdb3573415@n4g2000vba.googlegroups.com> Message-ID: <3d375d730905201234q5632a9efhd3c4d45ed8a68025@mail.gmail.com> On Wed, May 20, 2009 at 14:24, dmitrey wrote: > hi all, > > suppose I have A that is numpy ndarray of floats, with shape n x n. > > I want to obtain dot(A, b), b is vector of length n and norm(b)=1, but > instead of exact multiplication I want to approximate b as a vector > [+/- 2^m0, ? 2^m1, ? 2^m2 ,,, ? 2^m_n], m_i are integers, and then > invoke left_shift(vector_m) for rows of A. You don't shift floats. You only shift integers. For floats, multiplying by an integer power of 2 should be fast because of the floating point representation (the exponent just gets incremented or decremented), so just do the multiplication. > So, what is the simplest way to do it, without cycles of course? Or it > cannot be implemented w/o cycles with current numpy version? It might help if you showed us an example of an actual b vector decomposed the way you describe. Your description is ambiguous. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dmitrey.kroshko at scipy.org Wed May 20 15:46:22 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Wed, 20 May 2009 12:46:22 -0700 (PDT) Subject: [Numpy-discussion] binary shift for ndarray In-Reply-To: <3d375d730905201234q5632a9efhd3c4d45ed8a68025@mail.gmail.com> References: <65509afd-d77c-49b6-be29-49bdb3573415@n4g2000vba.googlegroups.com> <3d375d730905201234q5632a9efhd3c4d45ed8a68025@mail.gmail.com> Message-ID: <2c23d401-0f2c-4a98-9d9b-1b485482bd8b@q2g2000vbr.googlegroups.com> On May 20, 10:34?pm, Robert Kern wrote: > On Wed, May 20, 2009 at 14:24, dmitrey wrote: > > hi all, > > > suppose I have A that is numpy ndarray of floats, with shape n x n. > > > I want to obtain dot(A, b), b is vector of length n and norm(b)=1, but > > instead of exact multiplication I want to approximate b as a vector > > [+/- 2^m0, ? 2^m1, ? 2^m2 ,,, ? 2^m_n], m_i are integers, and then > > invoke left_shift(vector_m) for rows of A. > > You don't shift floats. You only shift integers. For floats, > multiplying by an integer power of 2 should be fast because of the > floating point representation (the exponent just gets incremented or > decremented), so just do the multiplication. > > > So, what is the simplest way to do it, without cycles of course? 
Or it > > cannot be implemented w/o cycles with current numpy version? > > It might help if you showed us an example of an actual b vector > decomposed the way you describe. Your description is ambiguous. > > -- > Robert Kern For the task involved (I intend to try using it for speed up ralg solver) it doesn't matter essentially (using ceil, floor or round), but for example let m_i is floor(log2(b_i)) for b_i > 1e-15, ceil(log2(-b_i)) for b_i < - 1e-15, for - 1e-15 <= b_i <= 1e-15 - don't modify the elements of A related to the b_i at all. D. From stefan at sun.ac.za Wed May 20 16:10:24 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 20 May 2009 22:10:24 +0200 Subject: [Numpy-discussion] binary builds against older numpys In-Reply-To: <1cd32cbb0905201217g69fa750dqaa8eb7a10ab5450a@mail.gmail.com> References: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> <1cd32cbb0905201217g69fa750dqaa8eb7a10ab5450a@mail.gmail.com> Message-ID: <9457e7c80905201310u5db00eaaja65efab6496dd2e@mail.gmail.com> Hi John 2009/5/20 : > On Wed, May 20, 2009 at 3:07 PM, John Hunter wrote: >> We are trying to build and test mpl installers for python2.4, 2.5 and >> 2.6. ?What we are finding is that if we build mpl against a more >> recent numpy than the installed numpy on a test machine, the import of >> mpl extension modules which depend on numpy trigger a segfault. I think we accidentally forgot to increase the API version at some stage (bad), but we now have checks in place to catch these mismatches. Specifically, import_array makes sure that the ABI versions agree, and that the API is the same or newer: if (NPY_VERSION != PyArray_GetNDArrayCVersion()) { PyErr_Format(PyExc_RuntimeError, "module compiled against "\ "ABI version %%x but this version of numpy is %%x", \ (int) NPY_VERSION, (int) PyArray_GetNDArrayCVersion()); return -1; } if (NPY_FEATURE_VERSION > PyArray_GetNDArrayCFeatureVersion()) { PyErr_Format(PyExc_RuntimeError, "module compiled against "\ "API version %%x but this version of numpy is %%x", \ (int) NPY_FEATURE_VERSION, (int) PyArray_GetNDArrayCFeatureVersion()); return -1; } David Cournapeau also put a check in place so that the NumPy build will break if we forget to update the API version again. So, while we can't change the releases of NumPy out there already, we can at least ensure that this won't happen again. Regards St?fan From jdh2358 at gmail.com Wed May 20 16:17:42 2009 From: jdh2358 at gmail.com (John Hunter) Date: Wed, 20 May 2009 15:17:42 -0500 Subject: [Numpy-discussion] binary builds against older numpys In-Reply-To: <9457e7c80905201310u5db00eaaja65efab6496dd2e@mail.gmail.com> References: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> <1cd32cbb0905201217g69fa750dqaa8eb7a10ab5450a@mail.gmail.com> <9457e7c80905201310u5db00eaaja65efab6496dd2e@mail.gmail.com> Message-ID: <88e473830905201317x2acb7f5ew293e59a7bae075bd@mail.gmail.com> 2009/5/20 St?fan van der Walt : > David Cournapeau also put a check in place so that the NumPy build > will break if we forget to update the API version again. > > So, while we can't change the releases of NumPy out there already, we > can at least ensure that this won't happen again. OK, great -- thanks for th info. From reading David's comments in the earlier thread: David > - Backward compatibility means that you can build something against David > numpy version M, later update numpy to version N >M, and it still works. 
David > numpy 1.3.0 is backward compatible with 1.2.1 it looks like our best bet will be to build our python2.4 and python2.5 binaries against 1.2.1 and our python2.6 binaries against 1.3.0 (since there are no older python2.6 numpy builds on the sf site anyhow). I'll post on the mpl list and site that anyone using the new mpl installers needs to be on numpy 1.2.1 or later. JDH From robert.kern at gmail.com Wed May 20 16:18:49 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 20 May 2009 15:18:49 -0500 Subject: [Numpy-discussion] binary shift for ndarray In-Reply-To: <2c23d401-0f2c-4a98-9d9b-1b485482bd8b@q2g2000vbr.googlegroups.com> References: <65509afd-d77c-49b6-be29-49bdb3573415@n4g2000vba.googlegroups.com> <3d375d730905201234q5632a9efhd3c4d45ed8a68025@mail.gmail.com> <2c23d401-0f2c-4a98-9d9b-1b485482bd8b@q2g2000vbr.googlegroups.com> Message-ID: <3d375d730905201318l78a64fddybe47ce4464b410cd@mail.gmail.com> On Wed, May 20, 2009 at 14:46, dmitrey wrote: > On May 20, 10:34?pm, Robert Kern wrote: >> On Wed, May 20, 2009 at 14:24, dmitrey wrote: >> > hi all, >> >> > suppose I have A that is numpy ndarray of floats, with shape n x n. >> >> > I want to obtain dot(A, b), b is vector of length n and norm(b)=1, but >> > instead of exact multiplication I want to approximate b as a vector >> > [+/- 2^m0, ? 2^m1, ? 2^m2 ,,, ? 2^m_n], m_i are integers, and then >> > invoke left_shift(vector_m) for rows of A. >> >> You don't shift floats. You only shift integers. For floats, >> multiplying by an integer power of 2 should be fast because of the >> floating point representation (the exponent just gets incremented or >> decremented), so just do the multiplication. >> >> > So, what is the simplest way to do it, without cycles of course? Or it >> > cannot be implemented w/o cycles with current numpy version? >> >> It might help if you showed us an example of an actual b vector >> decomposed the way you describe. Your description is ambiguous. >> >> -- >> Robert Kern > > For the task involved (I intend to try using it for speed up ralg > solver) it doesn't matter essentially (using ceil, floor or round), > but for example let m_i is > floor(log2(b_i)) for b_i > 1e-15, > ceil(log2(-b_i)) for b_i < - 1e-15, > for - 1e-15 <= b_i <= 1e-15 - don't modify the elements of A related > to the b_i at all. I strongly suspect that a plain dot(A, b) will be faster than doing all of that. With a little bit of work with frexp() and ldexp(), you could probably do those floor(log2())'s cheaply, but ultimately, you will still need to do a dot(A, b_prime) at the end. There is no bit-shift operation available for floats (anywhere, not just numpy); you have to form the float corresponding to +-2**m and multiply. If you had many A-matrices and a static b-vector, you might see a tiny improvement because all of the b_prime elements were exactly of the form +-2**m, but I doubt it. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From pav at iki.fi Wed May 20 17:36:43 2009 From: pav at iki.fi (Pauli Virtanen) Date: Wed, 20 May 2009 21:36:43 +0000 (UTC) Subject: [Numpy-discussion] wiki page correction References: <646de7490905201208r273d1acch1868f70f9929b482@mail.gmail.com> Message-ID: Wed, 20 May 2009 12:08:46 -0700, Grant Kelly wrote: > I believe there is an error on this wiki page: > > http://www.scipy.org/NumPy_for_Matlab_Users > > > MATLAB > y=x(2,:) > PYTHON > y = x[2,:].copy() > > shouldn't the Python version be: > y = x[1,:].copy() > > If not, please advise. Yes, it should be x[1,:].copy(). Please feel free to correct it. -- Pauli Virtanen From cycomanic at gmail.com Wed May 20 17:51:25 2009 From: cycomanic at gmail.com (Jochen Schroeder) Date: Thu, 21 May 2009 09:51:25 +1200 Subject: [Numpy-discussion] view takes no keyword arguments exception Message-ID: <20090520215124.GA6548@jochen-laptop> Hi all, I'm trying to help someone out with some problems with pyfftw (pyfftw.berlios.de). He is seeing an exception, TypeError: view() takes no keyword arguments This doesn't only happen when he uses pyfftw but also when he does the following: >>> import numpy as np >>> a=np.arange(10) >>> print a.view(dtype='float') Traceback (most recent call last): File "", line 1, in TypeError: view() takes no keyword arguments I he's on Windows and sees this error both with numpy 1.1.1 and 1.3. I'm a bit lost anybody have an idea what could be the problem? Cheers Jochen From stefan at sun.ac.za Wed May 20 18:20:42 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 21 May 2009 00:20:42 +0200 Subject: [Numpy-discussion] view takes no keyword arguments exception In-Reply-To: <20090520215124.GA6548@jochen-laptop> References: <20090520215124.GA6548@jochen-laptop> Message-ID: <9457e7c80905201520o171f05dfr49da938d281861d8@mail.gmail.com> Hi Jochen 2009/5/20 Jochen Schroeder : > I'm trying to help someone out with some problems with pyfftw > (pyfftw.berlios.de). He is seeing an exception, > > TypeError: view() takes no keyword arguments > > This doesn't only happen when he uses pyfftw but also when he does the > following: > >>>> import numpy as np >>>> a=np.arange(10) >>>> print a.view(dtype='float') > Traceback (most recent call last): > ?File "", line 1, in > TypeError: view() takes no keyword arguments > > I he's on Windows and sees this error both with numpy 1.1.1 and 1.3. > I'm a bit lost anybody have an idea what could be the problem? In the older versions of numpy, a.view(float) should work (float is preferable above 'float' as well), but I would guess that you are really looking for a.astype(float) Regards St?fan From stefan at sun.ac.za Wed May 20 20:02:02 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 21 May 2009 02:02:02 +0200 Subject: [Numpy-discussion] Implementing ufuncs in Cython Message-ID: <9457e7c80905201702pd5c7697u5b4c8d888bf399a8@mail.gmail.com> Hi all, Mark Lodato outlines how to write ufuncs in Cython at http://wiki.cython.org/MarkLodato/CreatingUfuncs This is also a great way of adding generalised ufuncs: http://projects.scipy.org/numpy/wiki/GeneralLoopingFunctions Super useful! 
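For anyone who wants the flavour without clicking through, here is a stripped-down, untested sketch of the pattern (the wiki page is the authoritative version; the ufunc name and loop below are made up, and under Cython 3 the inner loop would also need a trailing 'noexcept'):

# build with numpy.get_include() on the include path
cimport numpy as cnp

# declare the bits of the ufunc C API we need straight from the header
cdef extern from "numpy/ufuncobject.h":
    ctypedef void (*PyUFuncGenericFunction)(char **args,
                                            cnp.npy_intp *dimensions,
                                            cnp.npy_intp *steps,
                                            void *data)
    object PyUFunc_FromFuncAndData(PyUFuncGenericFunction *func,
                                   void **data, char *types, int ntypes,
                                   int nin, int nout, int identity,
                                   char *name, char *doc, int unused)
    int PyUFunc_None

cnp.import_array()
cnp.import_ufunc()

# inner loop: walk the (possibly strided) input and double each value
cdef void double_loop(char **args, cnp.npy_intp *dimensions,
                      cnp.npy_intp *steps, void *data):
    cdef cnp.npy_intp i
    cdef char *ip = args[0]
    cdef char *op = args[1]
    for i in range(dimensions[0]):
        (<double *> op)[0] = 2.0 * (<double *> ip)[0]
        ip += steps[0]
        op += steps[1]

cdef PyUFuncGenericFunction loops[1]
cdef char sigs[2]
loops[0] = double_loop
sigs[0] = <char> cnp.NPY_DOUBLE   # input: float64
sigs[1] = <char> cnp.NPY_DOUBLE   # output: float64

# one registered loop, no user data, 1 input, 1 output, no identity
double_it = PyUFunc_FromFuncAndData(loops, NULL, sigs, 1, 1, 1,
                                    PyUFunc_None, b"double_it",
                                    b"double each element", 0)

Once compiled, double_it(arr) broadcasts, honours out= and so on, just like the builtin ufuncs.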
Regards St?fan From charlesr.harris at gmail.com Wed May 20 20:48:41 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 May 2009 18:48:41 -0600 Subject: [Numpy-discussion] Implementing ufuncs in Cython In-Reply-To: <9457e7c80905201702pd5c7697u5b4c8d888bf399a8@mail.gmail.com> References: <9457e7c80905201702pd5c7697u5b4c8d888bf399a8@mail.gmail.com> Message-ID: 2009/5/20 St?fan van der Walt > Hi all, > > Mark Lodato outlines how to write ufuncs in Cython at > > http://wiki.cython.org/MarkLodato/CreatingUfuncs > > This is also a great way of adding generalised ufuncs: > > http://projects.scipy.org/numpy/wiki/GeneralLoopingFunctions > > Super useful! > That's cool! Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ndbecker2 at gmail.com Wed May 20 21:20:06 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Wed, 20 May 2009 21:20:06 -0400 Subject: [Numpy-discussion] ANN: Cython 0.11.2 released References: <4A1425F9.9030100@student.matnat.uio.no> Message-ID: Where can I find release notes? (It would be helpful if I can point to a URL as part of the fedora release) From cycomanic at gmail.com Wed May 20 21:57:55 2009 From: cycomanic at gmail.com (Jochen Schroeder) Date: Thu, 21 May 2009 13:57:55 +1200 Subject: [Numpy-discussion] view takes no keyword arguments exception In-Reply-To: <9457e7c80905201520o171f05dfr49da938d281861d8@mail.gmail.com> References: <20090520215124.GA6548@jochen-laptop> <9457e7c80905201520o171f05dfr49da938d281861d8@mail.gmail.com> Message-ID: <20090521015754.GA28026@jochen.schroeder.phy.auckland.ac.nz> On 21/05/09 00:20, St?fan van der Walt wrote: > Hi Jochen > > 2009/5/20 Jochen Schroeder : > > I'm trying to help someone out with some problems with pyfftw > > (pyfftw.berlios.de). He is seeing an exception, > > > > TypeError: view() takes no keyword arguments > > > > This doesn't only happen when he uses pyfftw but also when he does the > > following: > > > >>>> import numpy as np > >>>> a=np.arange(10) > >>>> print a.view(dtype='float') > > Traceback (most recent call last): > > ?File "", line 1, in > > TypeError: view() takes no keyword arguments > > > > I he's on Windows and sees this error both with numpy 1.1.1 and 1.3. > > I'm a bit lost anybody have an idea what could be the problem? > > In the older versions of numpy, a.view(float) should work (float is > preferable above 'float' as well), but I would guess that you are > really looking for > > a.astype(float) Sorry maybe I phrased my question wrongly. I don't want to change the code (This was just a short example). I just want to know why it is failing on his system and what he can do so that a.view(dtype='...') is working. I suspected it was an old numpy installation but the person is saying that he installed a new version and is still seeing the same problem (or does he just have an old version of numpy floating around). Cheers Jochen From pgmdevlist at gmail.com Wed May 20 22:09:08 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 20 May 2009 22:09:08 -0400 Subject: [Numpy-discussion] view takes no keyword arguments exception In-Reply-To: <20090521015754.GA28026@jochen.schroeder.phy.auckland.ac.nz> References: <20090520215124.GA6548@jochen-laptop> <9457e7c80905201520o171f05dfr49da938d281861d8@mail.gmail.com> <20090521015754.GA28026@jochen.schroeder.phy.auckland.ac.nz> Message-ID: On May 20, 2009, at 9:57 PM, Jochen Schroeder wrote: > Sorry maybe I phrased my question wrongly. I don't want to change > the code (This was just a short example). 
> I just want to know why it is failing on his system and what he > can do so that a.view(dtype='...') is working. I suspected it was an > old > numpy installation but the person is saying that he installed a new > version and is still seeing the same problem (or does he just have an > old version of numpy floating around). Likely to be the second possibiity, the ghost of a previous installation. AFAIR, the keywords in .view were introduced in 1.2 or just after. A safe way to check would be to install numpy 1.3 in a virtualenv and check that it works. If it does (expected), then you may want to ask your user to start afresh (remove 1.1.1 and 1.3 and then reinstall 1.3 from a clean slate). My 2c. P. From charlesr.harris at gmail.com Wed May 20 22:10:32 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 20 May 2009 20:10:32 -0600 Subject: [Numpy-discussion] view takes no keyword arguments exception In-Reply-To: <20090520215124.GA6548@jochen-laptop> References: <20090520215124.GA6548@jochen-laptop> Message-ID: On Wed, May 20, 2009 at 3:51 PM, Jochen Schroeder wrote: > Hi all, > > I'm trying to help someone out with some problems with pyfftw > (pyfftw.berlios.de). He is seeing an exception, > > TypeError: view() takes no keyword arguments > > This doesn't only happen when he uses pyfftw but also when he does the > following: > > >>> import numpy as np > >>> a=np.arange(10) > >>> print a.view(dtype='float') > Traceback (most recent call last): > File "", line 1, in > TypeError: view() takes no keyword arguments > > I he's on Windows and sees this error both with numpy 1.1.1 and 1.3. > I'm a bit lost anybody have an idea what could be the problem? > I don't see this error on linux: In [3]: a.view(dtype=double) Out[3]: array([ 1., 1., 1., 1., 1., 1., 1., 1., 1., 1.]) What version of python do you have installed? Did try deleting the previous version of numpy from site-packages before install? Windows 32 or 64 bit? Etc. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From david at ar.media.kyoto-u.ac.jp Thu May 21 01:11:49 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 21 May 2009 14:11:49 +0900 Subject: [Numpy-discussion] numpy failure under Windows Vista 64 bit In-Reply-To: <20090520133627.148820@gmx.net> References: <20090520133627.148820@gmx.net> Message-ID: <4A14E295.2060509@ar.media.kyoto-u.ac.jp> Klaus N?kel wrote: > David, > > >> Klaus Noekel wrote: >> >>> I doubt that the DLL was not physically present and rather suspect a >>> dependency on some other DLL that was missing. The INSTALL.TXT >>> unfortunately was not helpful. Can anybody please explain what other >>> dependencies exist? Anything else I need to install? >>> >>> >> This exact problem is specific to IDLE - I don't know what triggers it. >> Today, the best solution for a 64 bits numpy on windows is to built it >> yourself with MS compilers - the distributed one is built with mingw >> compilers, and there still seems to be some stability problems with >> those. Unfortunately, as the mingw debugger does not work either on 64 >> bits archs, finding the problem is quite hard. >> >> > > I don't believe that the problem is specific to IDLE. Python also crashes when I put nothing but "import numpy" in a file and execute it with python.exe. > That's not the same problem - in one case, you have a dll not found, and in another case, a crash. I am sorry I can't tell more, but I have no idea about what's going on: sometimes, it works, sometimes, it does not. 
When it works, it runs the full test, and when it does not, it crashes at import - but before even initializing the first numpy extension ! The crash always happen in some conditions, and seldom in others (executing in a cmd shell vs being executed by nosetests, for example). The problem is difficult to track without a debugger, I am afraid (mingw compilers do not seem to generate debugging symbols usable by MS debugger). > Regarding the note on building numpy myself: the discussion in this forum scared me a little, because of the challenge to build LAPACK with a compatible Fortran compiler etc. That and the fact that I do not have MSVC 2008 (only 2005) keeps me from trying it. Any chance that a MS-based installer will materialize soon? I don't intend on doing one myself, no. Note that you don't need blas/lapack to build numpy - it is required for scipy. That's why I am interested in making numpy work with the mingw toolchain: once it works reliably, it will give scipy as well. Actually, I managed to build scipy for windows 64, but as for numpy, it sometimes crash. > Or are there any mingw-specific runtime libraries that I need to install so that the mingw-based numpy works? > No, there should not be anything else to install. There is a bug somewhere, which needs to be found. David From david at ar.media.kyoto-u.ac.jp Thu May 21 01:15:12 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 21 May 2009 14:15:12 +0900 Subject: [Numpy-discussion] binary builds against older numpys In-Reply-To: <9457e7c80905201310u5db00eaaja65efab6496dd2e@mail.gmail.com> References: <88e473830905201207hb37d94ehef2ed12cedca50b@mail.gmail.com> <1cd32cbb0905201217g69fa750dqaa8eb7a10ab5450a@mail.gmail.com> <9457e7c80905201310u5db00eaaja65efab6496dd2e@mail.gmail.com> Message-ID: <4A14E360.2000806@ar.media.kyoto-u.ac.jp> St?fan van der Walt wrote: > Hi John > > 2009/5/20 : > >> On Wed, May 20, 2009 at 3:07 PM, John Hunter wrote: >> >>> We are trying to build and test mpl installers for python2.4, 2.5 and >>> 2.6. What we are finding is that if we build mpl against a more >>> recent numpy than the installed numpy on a test machine, the import of >>> mpl extension modules which depend on numpy trigger a segfault. >>> > > I think we accidentally forgot to increase the API version at some > stage (bad), but we now have checks in place to catch these > mismatches. > Yes, we did forget this, but it would not have solve the problem. Building against a version N of numpy and running under a version M < N is not supported at all. We explicitly check for it starting for numpy 1.4.0, but it does not affect the solution, that is building against the oldest version possible. This rule is the same for almost any library, and not specific to numpy, or even python extensions. David From dmitrey.kroshko at scipy.org Thu May 21 03:50:45 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 21 May 2009 00:50:45 -0700 (PDT) Subject: [Numpy-discussion] how to use ldexp? 
Message-ID: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> hi all, I have tried the example from numpy/add_newdocs.py np.ldexp(5., 2) but instead of the 20 declared there it yields TypeError: function not supported for these types, and can't coerce safely to supported types I have tried arrays but it yields same error >>> np.ldexp(np.array([5., 2.]), np.array([2, 1])) Traceback (innermost last): File "", line 1, in TypeError: function not supported for these types, and can't coerce safely to supported types So, how can I use ldexp? np.__version__ = '1.4.0.dev6972' Thank you in advance, D. From dmitrey.kroshko at scipy.org Thu May 21 03:57:15 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 21 May 2009 00:57:15 -0700 (PDT) Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? Message-ID: hi all, has anyone already tried to compare using an ordinary numpy ufunc vs that one from corepy, first of all I mean the project http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t124024628235 It would be interesting to know what is speedup for (eg) vec ** 0.5 or (if it's possible - it isn't pure ufunc) numpy.dot(Matrix, vec). Or any another example. From stefan at sun.ac.za Thu May 21 04:35:54 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 21 May 2009 10:35:54 +0200 Subject: [Numpy-discussion] how to use ldexp? In-Reply-To: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> Message-ID: <9457e7c80905210135g20c0826av1768f8193b44bfc3@mail.gmail.com> Hi Dmitrey 2009/5/21 dmitrey : > hi all, > I have tried the example from numpy/add_newdocs.py > > np.ldexp(5., 2) > but instead of the 20 declared there it yields > TypeError: function not supported for these types, and can't coerce > safely to supported types I could not reproduce the problem on current SVN: In [6]: np.ldexp(5., 2) Out[6]: 20.0 In [7]: np.ldexp(np.array([5., 2.]), np.array([2, 1])) Out[7]: array([ 20., 4.]) Regards St?fan From david at ar.media.kyoto-u.ac.jp Thu May 21 04:21:06 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 21 May 2009 17:21:06 +0900 Subject: [Numpy-discussion] how to use ldexp? In-Reply-To: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> Message-ID: <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> dmitrey wrote: > hi all, > I have tried the example from numpy/add_newdocs.py > > np.ldexp(5., 2) > but instead of the 20 declared there it yields > TypeError: function not supported for these types, and can't coerce > safely to supported types > > Which OS/Compiler are you using ? David From dmitrey.kroshko at scipy.org Thu May 21 04:45:54 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 21 May 2009 01:45:54 -0700 (PDT) Subject: [Numpy-discussion] how to use ldexp? In-Reply-To: <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> Message-ID: I have updated numpy to latest '1.4.0.dev7008', but the bug still remains. I use KUBUNTU 9.04, compilers - gcc (using build-essential), gfortran. D. 
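Assuming the failure comes from the plain Python int exponent being promoted to a 64-bit integer, for which ldexp has no loop, passing an explicit int32 exponent may work around it:

>>> import numpy as np
>>> np.ldexp(5., np.int32(2))
20.0
>>> np.ldexp(np.array([5., 2.]), np.array([2, 1], dtype=np.int32))
array([ 20.,   4.])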
On May 21, 11:21?am, David Cournapeau wrote: > dmitrey wrote: > > hi all, > > I have tried the example from numpy/add_newdocs.py > > > np.ldexp(5., 2) > > but instead of the 20 declared there it yields > > TypeError: function not supported for these types, and can't coerce > > safely to supported types > > Which OS/Compiler are you using ? > > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discuss... at scipy.orghttp://mail.scipy.org/mailman/listinfo/numpy-discussion From david at ar.media.kyoto-u.ac.jp Thu May 21 04:29:34 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 21 May 2009 17:29:34 +0900 Subject: [Numpy-discussion] how to use ldexp? In-Reply-To: References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> Message-ID: <4A1510EE.1020601@ar.media.kyoto-u.ac.jp> dmitrey wrote: > I have updated numpy to latest '1.4.0.dev7008', but the bug still > remains. > I use KUBUNTU 9.04, compilers - gcc (using build-essential), gfortran. > D. > Can you post the build output (after having removed the build directory : rm -rf build && python setup.py build &> build.log) ? David From dmitrey.kroshko at scipy.org Thu May 21 04:55:59 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 21 May 2009 01:55:59 -0700 (PDT) Subject: [Numpy-discussion] how to use ldexp? In-Reply-To: <4A1510EE.1020601@ar.media.kyoto-u.ac.jp> References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> <4A1510EE.1020601@ar.media.kyoto-u.ac.jp> Message-ID: <5351f1fc-5e25-46aa-915a-e674dd914f11@g19g2000vbi.googlegroups.com> On May 21, 11:29?am, David Cournapeau wrote: > dmitrey wrote: > > I have updated numpy to latest '1.4.0.dev7008', but the bug still > > remains. > > I use KUBUNTU 9.04, compilers - gcc (using build-essential), gfortran. > > D. > > Can you post the build output (after having removed the build directory > : rm -rf build && python setup.py build &> build.log) ? > > David ok, it's here http://pastebin.com/mb021e11 D. From pav at iki.fi Thu May 21 05:26:18 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 May 2009 09:26:18 +0000 (UTC) Subject: [Numpy-discussion] how to use ldexp? References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> Message-ID: Thu, 21 May 2009 01:45:54 -0700, dmitrey wrote: > I have updated numpy to latest '1.4.0.dev7008', but the bug still > remains. > I use KUBUNTU 9.04, compilers - gcc (using build-essential), gfortran. Worksforme on Ubuntu 9.04, on python2.6 and python2.5. Should be the same platform. -- Pauli Virtanen From pav at iki.fi Thu May 21 05:41:06 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 May 2009 09:41:06 +0000 (UTC) Subject: [Numpy-discussion] how to use ldexp? References: <3a43192a-3907-4f62-8434-bf27cd243b76@r34g2000vba.googlegroups.com> <4A150EF2.4030804@ar.media.kyoto-u.ac.jp> Message-ID: Thu, 21 May 2009 09:26:18 +0000, Pauli Virtanen wrote: > Thu, 21 May 2009 01:45:54 -0700, dmitrey wrote: > >> I have updated numpy to latest '1.4.0.dev7008', but the bug still >> remains. >> I use KUBUNTU 9.04, compilers - gcc (using build-essential), gfortran. > > Worksforme on Ubuntu 9.04, on python2.6 and python2.5. Should be the > same platform. This was on 32-bit machine. 
I can reproduce this on a 64-bit platform, current SVN head: >>> np.ldexp(5, 2) Traceback (most recent call last): File "", line 1, in TypeError: function not supported for these types, and can't coerce safely to supported types >>> np.ldexp(5, np.int32(2)) 20.0 >>> np.ldexp.types ['fi->f', 'di->d', 'gi->g'] So for some reason the second argument tries to cast Python int to int64, and there's no loop to handle this. -- Pauli Virtanen From jrennie at gmail.com Thu May 21 09:10:14 2009 From: jrennie at gmail.com (Jason Rennie) Date: Thu, 21 May 2009 09:10:14 -0400 Subject: [Numpy-discussion] matrix default to column vector? Message-ID: <75c31b2a0905210610x1a321264r5b6f93d327ef2b36@mail.gmail.com> By default, it looks like a 1-dim ndarray gets converted to a row vector by the matrix constructor. This seems to lead to some odd behavior such as a[1] yielding the 2nd element as an ndarray and throwing an IndexError as a matrix. Is it possible to set a flag to make the default be a column vector? Thanks, Jason -- Jason Rennie Research Scientist, ITA Software http://www.itasoftware.com/ -------------- next part -------------- An HTML attachment was scrubbed... URL: From dmitrey.kroshko at scipy.org Thu May 21 11:26:45 2009 From: dmitrey.kroshko at scipy.org (dmitrey) Date: Thu, 21 May 2009 08:26:45 -0700 (PDT) Subject: [Numpy-discussion] where are the benefits of ldexp and/or "array times 2"? Message-ID: Hi all, I expected to have some speedup via using ldexp or multiplying an array by a power of 2 (doesn't it have to perform a simple shift of mantissa?), but I don't see the one. Have I done something wrong? See the code below. from scipy import rand from numpy import dot, ones, zeros, array, ldexp from time import time N = 1500 A = rand(N, N) b = rand(N) b2 = 2*ones(A.shape, 'int32') I = 100 t = time() for i in xrange(I): dot(A, b) # N^2 multiplications + some sum operations #A * 2.1 # N^2 multiplications, so it should consume no greater than 1st line time #ldexp(A, b2) # it should consume no greater than prev line time, isn't it? print 'time elapsed:', time() - t # 1st case: 0.62811088562 # 2nd case: 2.00850605965 # 3rd case: 6.79027700424 # Let me also note - # 1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup vs b = rand() # 2) using A * 2.0 (or mere 2) instead of 2.1 doesn't yield any speedup, despite it is exact integer power of 2. From robert.kern at gmail.com Thu May 21 11:46:12 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 21 May 2009 10:46:12 -0500 Subject: [Numpy-discussion] where are the benefits of ldexp and/or "array times 2"? In-Reply-To: References: Message-ID: <3d375d730905210846n5393a22emc2596501231d6e0f@mail.gmail.com> On Thu, May 21, 2009 at 10:26, dmitrey wrote: > Hi all, > I expected to have some speedup via using ldexp or multiplying an > array by a power of 2 (doesn't it have to perform a simple shift of > mantissa?), Addition of the exponent, not shift of the mantissa. > but I don't see the one. I said there *might* be a speedup, but it was probably going to be insignificant. The overhead of using frexp and ldexp probably outweighs any benefits. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." 
-- Umberto Eco From dagss at student.matnat.uio.no Thu May 21 11:59:32 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 May 2009 17:59:32 +0200 (CEST) Subject: [Numpy-discussion] ANN: Cython 0.11.2 released In-Reply-To: References: <4A1425F9.9030100@student.matnat.uio.no> Message-ID: <2418c278951e6461a58d1baea226bf95.squirrel@webmail.uio.no> > Where can I find release notes? (It would be helpful if I can point to a > URL as part of the fedora release) > OK, I put my email announcement up here: http://wiki.cython.org/ReleaseNotes-0.11.2 Tell me if you need something else (different format or level of detail -- the list of tickets on trac is always the most accurate thing). Dag Sverre From mhearne at usgs.gov Thu May 21 12:31:28 2009 From: mhearne at usgs.gov (Michael Hearne) Date: Thu, 21 May 2009 10:31:28 -0600 Subject: [Numpy-discussion] memoryerror with numpy.fromfile Message-ID: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> I am getting a MemoryError from a numpy.fromfile() call in an application I am trying to deploy. Normally I would assume that this would mean that I don't have enough memory available on the system. However, if I run vmstat (Linux) at the same time as my process, I see that I have 3+ Gigabytes of memory free, and no swap space being used. I can't think of a way to track down this problem, so I'm punting to the list. The only thing I can imagine is that someone Python has been allocated X amount of space (very small relative to the memory actually available), and is asking for more than X. I don't know if this is true, or if there is even a way to check it. Sigh. Version info: Linux RedHat 5 kernel 2.6.18-128.1.10.el5PAE Python 2.5.4 -- EPD_Py25 4.3.0 numpy 1.3.0 Can anyone suggest more tests that I can do? Thanks, Mike From pav at iki.fi Thu May 21 13:23:17 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 May 2009 17:23:17 +0000 (UTC) Subject: [Numpy-discussion] memoryerror with numpy.fromfile References: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> Message-ID: Thu, 21 May 2009 10:31:28 -0600, Michael Hearne wrote: > I am getting a MemoryError from a numpy.fromfile() call in an > application I am trying to deploy. Normally I would assume that this > would mean that I don't have enough memory available on the system. > However, if I run vmstat (Linux) at the same time as my process, I see > that I have 3+ Gigabytes of memory free, and no swap space being used. If you are on a 32-bit platform, the maximum addressable memory for a single process is limited to 3 GB, and what can be allocated can be less than this because of memory fragmentation. Also, you should check that you don't have an ulimit set for virtual/RSS memory. -- Pauli Virtanen From charlesr.harris at gmail.com Thu May 21 13:46:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Thu, 21 May 2009 11:46:11 -0600 Subject: [Numpy-discussion] memoryerror with numpy.fromfile In-Reply-To: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> References: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> Message-ID: On Thu, May 21, 2009 at 10:31 AM, Michael Hearne wrote: > I am getting a MemoryError from a numpy.fromfile() call in an > application I am trying to deploy. Normally I would assume that this > would mean that I don't have enough memory available on the system. > However, if I run vmstat (Linux) at the same time as my process, I see > that I have 3+ Gigabytes of memory free, and no swap space being > used. 
I can't think of a way to track down this problem, so I'm > punting to the list. The only thing I can imagine is that someone > Python has been allocated X amount of space (very small relative to > the memory actually available), and is asking for more than X. I > don't know if this is true, or if there is even a way to check it. > Sigh. > > Version info: > Linux RedHat 5 kernel 2.6.18-128.1.10.el5PAE > Python 2.5.4 -- EPD_Py25 4.3.0 > numpy 1.3.0 > > Can anyone suggest more tests that I can do? > How big is the file and what type are you importing to? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From gkelly at gmail.com Thu May 21 13:54:21 2009 From: gkelly at gmail.com (Grant Kelly) Date: Thu, 21 May 2009 10:54:21 -0700 Subject: [Numpy-discussion] wiki page correction In-Reply-To: References: <646de7490905201208r273d1acch1868f70f9929b482@mail.gmail.com> Message-ID: <646de7490905211054q59f8e216t1ae925be56043034@mail.gmail.com> It's an immutable page. Can someone who already has access make the edit? On Wed, May 20, 2009 at 2:36 PM, Pauli Virtanen wrote: > Wed, 20 May 2009 12:08:46 -0700, Grant Kelly wrote: > >> I believe there is an error on this wiki page: >> >> http://www.scipy.org/NumPy_for_Matlab_Users >> >> >> MATLAB >> ? y=x(2,:) >> PYTHON >> ? y = x[2,:].copy() >> >> shouldn't the Python version be: >> ? y = x[1,:].copy() >> >> If not, please advise. > > Yes, it should be x[1,:].copy(). Please feel free to correct it. > > -- > Pauli Virtanen > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mhearne at usgs.gov Thu May 21 14:14:18 2009 From: mhearne at usgs.gov (Michael Hearne) Date: Thu, 21 May 2009 12:14:18 -0600 Subject: [Numpy-discussion] memoryerror with numpy.fromfile In-Reply-To: References: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> Message-ID: <072ED70C-6EA9-4AB0-88AE-946E96C107AF@usgs.gov> All: Never mind! The file I was attempting to read was part of an (apparently) silently incomplete download, and as such, the file size didn't match the metadata in the header file describing the data file, and I was reading beyond the end of the file. I would submit that MemoryError is perhaps a little misleading for this particular case, but oh well. Thanks for the comments! --Mike On May 21, 2009, at 11:46 AM, Charles R Harris wrote: > > > On Thu, May 21, 2009 at 10:31 AM, Michael Hearne > wrote: > I am getting a MemoryError from a numpy.fromfile() call in an > application I am trying to deploy. Normally I would assume that this > would mean that I don't have enough memory available on the system. > However, if I run vmstat (Linux) at the same time as my process, I see > that I have 3+ Gigabytes of memory free, and no swap space being > used. I can't think of a way to track down this problem, so I'm > punting to the list. The only thing I can imagine is that someone > Python has been allocated X amount of space (very small relative to > the memory actually available), and is asking for more than X. I > don't know if this is true, or if there is even a way to check it. > Sigh. > > Version info: > Linux RedHat 5 kernel 2.6.18-128.1.10.el5PAE > Python 2.5.4 -- EPD_Py25 4.3.0 > numpy 1.3.0 > > Can anyone suggest more tests that I can do? > > How big is the file and what type are you importing to? 
> > Chuck > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Thu May 21 14:36:24 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 May 2009 18:36:24 +0000 (UTC) Subject: [Numpy-discussion] memoryerror with numpy.fromfile References: <6876BB83-8DF6-4588-89F9-9BB2E1CCCDB3@usgs.gov> <072ED70C-6EA9-4AB0-88AE-946E96C107AF@usgs.gov> Message-ID: Thu, 21 May 2009 12:14:18 -0600, Michael Hearne wrote: > All: Never mind! The file I was attempting to read was part of an > (apparently) silently incomplete download, and as such, the file size > didn't match the metadata in the header file describing the data file, > and I was reading beyond the end of the file. > > I would submit that MemoryError is perhaps a little misleading for this > particular case, but oh well. Well, that it raises MemoryError in this case is a bug, so maybe a ticket should be filed. There's another bug associated with reading empty files: http://projects.scipy.org/numpy/ticket/1115 which is maybe related. -- Pauli Virtanen From pav at iki.fi Thu May 21 14:39:12 2009 From: pav at iki.fi (Pauli Virtanen) Date: Thu, 21 May 2009 18:39:12 +0000 (UTC) Subject: [Numpy-discussion] wiki page correction References: <646de7490905201208r273d1acch1868f70f9929b482@mail.gmail.com> <646de7490905211054q59f8e216t1ae925be56043034@mail.gmail.com> Message-ID: Thu, 21 May 2009 10:54:21 -0700, Grant Kelly wrote: > It's an immutable page. Can someone who already has access make the > edit? Umm, are you sure? I don't see any ACL's on the page. (Though you need to register an account on the wiki before editing.) -- Pauli Virtanen From albert.thuswaldner at gmail.com Thu May 21 16:04:00 2009 From: albert.thuswaldner at gmail.com (albert.thuswaldner at gmail.com) Date: Thu, 21 May 2009 20:04:00 +0000 Subject: [Numpy-discussion] Home for pyhdf5io? Message-ID: <000e0cd2a07010d73d046a71a597@google.com> Dear list, I'm writing this because i have developed a small python module that might be of interest to you the readers of this list: http://code.google.com/p/pyhdf5io/ It basically implements load/save functions that mimic the behaviour of those found in Matlab, ie with them you can store your variables from within the interactive shell (IPython, python) or from within a function, and then load them back in again. One important difference is that the hdf5 format is used to store the variables, which comes with aa number of benefits: - a open standard file format which is supported by many applications. - completely portable file format across different platforms. Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html And now to the question: I think that this module is to small to be developed and maintained on its on, I think It would be better if it could be part of some larger project. So where would pyhdf5io fit in? Any tips and ideas are highly appreciated. Thanks. /Albert -------------- next part -------------- An HTML attachment was scrubbed... URL: From dagss at student.matnat.uio.no Thu May 21 16:09:27 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 May 2009 22:09:27 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? 
In-Reply-To: <000e0cd2a07010d73d046a71a597@google.com> References: <000e0cd2a07010d73d046a71a597@google.com> Message-ID: <4A15B4F7.4040906@student.matnat.uio.no> albert.thuswaldner at gmail.com wrote: > Dear list, > I'm writing this because i have developed a small python module that > might be of interest to you the readers of this list: > > http://code.google.com/p/pyhdf5io/ > > It basically implements load/save functions that mimic the behaviour of > those found in Matlab, i.e. with them you can store your variables from > within the interactive shell (IPython, python) or from within a > function, and then load them back in again. One important difference is > that the hdf5 format is used to store the variables, which comes with a > a number of benefits: > - a open standard file format which is supported by many applications. > - completely portable file format across different platforms. > > Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html > > And now to the question: > > I think that this module is to small to be developed and maintained on > its on, I think It would be better if it could be part of some larger > project. So where would pyhdf5io fit in? > Any tips and ideas are highly appreciate I'd expect to find it in http://h5py.alfven.org/ I think... -- Dag Sverre From robert.kern at gmail.com Thu May 21 16:07:34 2009 From: robert.kern at gmail.com (Robert Kern) Date: Thu, 21 May 2009 15:07:34 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <000e0cd2a07010d73d046a71a597@google.com> References: <000e0cd2a07010d73d046a71a597@google.com> Message-ID: <3d375d730905211307q45450ffn7ebbf8c78b6f3713@mail.gmail.com> On Thu, May 21, 2009 at 15:04, wrote: > Dear list, > I'm writing this because i have developed a small python module that might > be of interest to you the readers of this list: > > http://code.google.com/p/pyhdf5io/ > > It basically implements load/save functions that mimic the behaviour of > those found in Matlab, i.e. with them you can store your variables from > within the interactive shell (IPython, python) or from within a function, > and then load them back in again. One important difference is that the hdf5 > format is used to store the variables, which comes with a a number of > benefits: > - a open standard file format which is supported by many applications. > - completely portable file format across different platforms. > > Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html > > And now to the question: > > I think that this module is to small to be developed and maintained on its > on, I think It would be better if it could be part of some larger project. > So where would pyhdf5io fit in? PyTables, probably. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dagss at student.matnat.uio.no Thu May 21 16:18:26 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Thu, 21 May 2009 22:18:26 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? 
In-Reply-To: <4A15B4F7.4040906@student.matnat.uio.no> References: <000e0cd2a07010d73d046a71a597@google.com> <4A15B4F7.4040906@student.matnat.uio.no> Message-ID: <4A15B712.7010901@student.matnat.uio.no> Dag Sverre Seljebotn wrote: > albert.thuswaldner at gmail.com wrote: >> Dear list, >> I'm writing this because i have developed a small python module that >> might be of interest to you the readers of this list: >> >> http://code.google.com/p/pyhdf5io/ >> >> It basically implements load/save functions that mimic the behaviour of >> those found in Matlab, i.e. with them you can store your variables from >> within the interactive shell (IPython, python) or from within a >> function, and then load them back in again. One important difference is >> that the hdf5 format is used to store the variables, which comes with a >> a number of benefits: >> - a open standard file format which is supported by many applications. >> - completely portable file format across different platforms. >> >> Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html >> >> And now to the question: >> >> I think that this module is to small to be developed and maintained on >> its on, I think It would be better if it could be part of some larger >> project. So where would pyhdf5io fit in? >> Any tips and ideas are highly appreciate > > I'd expect to find it in > > http://h5py.alfven.org/ > > I think... > Please disregard this, I didn't notice you had a PyTable dependency. -- Dag Sverre From dwf at cs.toronto.edu Thu May 21 16:38:23 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Thu, 21 May 2009 16:38:23 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <000e0cd2a07010d73d046a71a597@google.com> References: <000e0cd2a07010d73d046a71a597@google.com> Message-ID: <8A557145-6C11-4843-A75C-B46ED8F6C496@cs.toronto.edu> Hi Albert, So this is a wrapper on top of PyTables to implement load() and save()? Neat. Obviously if you're installing PyTables, you can do a lot better and organize your data hierarchically without the messiness of Matlab structures, walk the node tree, all kinds of fun stuff, but if you're an expatriate matlab user and just want to save some matrices... this is great. Notably, that was one of my gripes about ipython+numpy+scipy +matplotlib when I first came from Matlab. I think you should send a message to the PyTables list, ask Francesc if he thinks it has a place in PyTables for it as a 'lite' wrapper or something, for people who need to save data but don't need/are intimidated by all the features that PyTables provides. David On 21-May-09, at 4:04 PM, albert.thuswaldner at gmail.com wrote: > Dear list, > I'm writing this because i have developed a small python module that > might be of interest to you the readers of this list: > > http://code.google.com/p/pyhdf5io/ > > It basically implements load/save functions that mimic the behaviour > of those found in Matlab, ie with them you can store your variables > from within the interactive shell (IPython, python) or from within a > function, and then load them back in again. One important difference > is that the hdf5 format is used to store the variables, which comes > with aa number of benefits: > - a open standard file format which is supported by many applications. > - completely portable file format across different platforms. 
> > Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html > > And now to the question: > > I think that this module is to small to be developed and maintained > on its on, I think It would be better if it could be part of some > larger project. So where would pyhdf5io fit in? > Any tips and ideas are highly appreciated. > > Thanks. > > /Albert > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From albert.thuswaldner at gmail.com Thu May 21 16:54:08 2009 From: albert.thuswaldner at gmail.com (Albert Thuswaldner) Date: Thu, 21 May 2009 22:54:08 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <8A557145-6C11-4843-A75C-B46ED8F6C496@cs.toronto.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <8A557145-6C11-4843-A75C-B46ED8F6C496@cs.toronto.edu> Message-ID: On Thu, May 21, 2009 at 22:38, David Warde-Farley wrote: > Hi Albert, > > So this is a wrapper on top of PyTables to implement load() and > save()? Neat. Yes, you got the idea. in its most simplest form you can type: hdf5save() And all your local variables are saved to a file with the default file name "hdf5io.h5". Of course it also allows you to specify a file name and what variables you would like to save. As it is based on hdf5 you can also store the variables to a certain group within the file (If you know how hdf5 works, you probably know what I'm talking about). Appending data to existing hdf5-files is also possible. > Obviously if you're installing PyTables, you can do a lot better and > organize your data hierarchically without the messiness of Matlab > structures, walk the node tree, all kinds of fun stuff, but if ?you're > an expatriate matlab user and just want to save some matrices... this > is great. Notably, that was one of my gripes about ipython+numpy+scipy > +matplotlib when I first came from Matlab. Exactly! > I think you should send a message to the PyTables list, ask Francesc > if he thinks it has a place in PyTables for it as a 'lite' wrapper or > something, for people who need to save data but don't need/are > intimidated by all the features that PyTables provides. Actually, I just e-mail Francesc, see what he thinks. Thanks for your reply. Also thanks to the others who also have replied /Albert > David > > On 21-May-09, at 4:04 PM, albert.thuswaldner at gmail.com wrote: > >> Dear list, >> I'm writing this because i have developed a small python module that >> might be of interest to you the readers of this list: >> >> http://code.google.com/p/pyhdf5io/ >> >> It basically implements load/save functions that mimic the behaviour >> of those found in Matlab, ie with them you can store your variables >> from within the interactive shell (IPython, python) or from within a >> function, and then load them back in again. One important difference >> is that the hdf5 format is used to store the variables, which comes >> with aa number of benefits: >> - a open standard file format which is supported by many applications. >> - completely portable file format across different platforms. >> >> Read more here: http://www.hdfgroup.org/HDF5/whatishdf5.html >> >> And now to the question: >> >> I think that this module is to small to be developed and maintained >> on its on, I think It would be better if it could be part of some >> larger project. So where would pyhdf5io fit in? >> Any tips and ideas are highly appreciated. >> >> Thanks. 
>> >> /Albert >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From faltet at pytables.org Fri May 22 04:00:56 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 10:00:56 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> Message-ID: <200905221000.56593.faltet@pytables.org> Hello Albert, A Thursday 21 May 2009 22:32:10 escrigu?reu: > Hi, > First of all thanks for your work on PyTables! I think it is excellent > and it has been really nice working with it. > > I'm writing this because i have developed a small python module that > uses pyTables: > > http://code.google.com/p/pyhdf5io/ > > It basically implements load/save functions that mimic the behaviour > of those found in Matlab, i.e. with them you can store your variables > from within the interactive shell (IPython, python) or from within a > function, and then load them back in again. I've been having a look at your module and seems pretty cute. Incidentally, there is another module module that does similar things: http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html However, I do like your package better in the sense that it adds more 'magic' to the load/save routines. But maybe you want to have a look at the above: it can give you more ideas, like for example, using CArrays and compression for very large arrays, or Tables for structured arrays. > And now to the question: > > I think that this module is to small to be developed and maintained on > its own. I think It would be better if it could be part of some larger > project, maybe pyTables, I don't know. Sure. I think it could fit perfectly as a module inside PyTables, in the same wave than 'filenode' and 'netcdf3'. Most of your module can be dropped as-is into the PyTables module hierarchy. However, it would be nice if you can write some documentation following the format of User's Guide chapters like the ones about 'filenode' or 'netcdf3' modules. Please, let's continue the discussion in the PyTables list in case we need to. Thanks for contributing! -- Francesc Alted From gregor.thalhammer at gmail.com Fri May 22 05:42:56 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Fri, 22 May 2009 11:42:56 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: References: Message-ID: <4A1673A0.2080503@googlemail.com> dmitrey schrieb: > hi all, > has anyone already tried to compare using an ordinary numpy ufunc vs > that one from corepy, first of all I mean the project > http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t124024628235 > > It would be interesting to know what is speedup for (eg) vec ** 0.5 or > (if it's possible - it isn't pure ufunc) numpy.dot(Matrix, vec). Or > any another example. > I have no experience with the mentioned CorePy, but recently I was playing around with accelerated ufuncs using Intels Math Kernel Library (MKL). These improvements are now part of the numexpr package http://code.google.com/p/numexpr/ Some remarks on possible speed improvements on recent Intel x86 processors. 1) basic arithmetic ufuncs (add, sub, mul, ...) 
in standard numpy are fast (SSE is used) and speed is limited by memory bandwidth. 2) the speed of many transcendental functions (exp, sin, cos, pow, ...) can be improved by _roughly_ a factor of five (single core) by using the MKL. Most of the improvements stem from using faster algorithms with a vectorized implementation. Note: the speed improvement depends on a _lot_ of other circumstances. 3) Improving performance by using multi cores is much more difficult. Only for sufficiently large (>1e5) arrays a significant speedup is possible. Where a speed gain is possible, the MKL uses several cores. Some experimentation showed that adding a few OpenMP constructs you could get a similar speedup with numpy. 4) numpy.dot uses optimized implementations. Gregor From gregor.thalhammer at gmail.com Fri May 22 05:55:31 2009 From: gregor.thalhammer at gmail.com (Gregor Thalhammer) Date: Fri, 22 May 2009 11:55:31 +0200 Subject: [Numpy-discussion] where are the benefits of ldexp and/or "array times 2"? In-Reply-To: References: Message-ID: <4A167693.1090901@googlemail.com> dmitrey schrieb: > Hi all, > I expected to have some speedup via using ldexp or multiplying an > array by a power of 2 (doesn't it have to perform a simple shift of > mantissa?), but I don't see the one. > > # Let me also note - > # 1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup > vs b = rand() > # 2) using A * 2.0 (or mere 2) instead of 2.1 doesn't yield any > speedup, despite it is exact integer power of 2. > On recent processors multiplication is very fast and takes 1.5 clock cycles (float, double precision), independent of the values. There is very little gain by using bit shift operators. Gregor From faltet at pytables.org Fri May 22 06:08:21 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 12:08:21 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A1673A0.2080503@googlemail.com> References: <4A1673A0.2080503@googlemail.com> Message-ID: <200905221208.21567.faltet@pytables.org> A Friday 22 May 2009 11:42:56 Gregor Thalhammer escrigu?: > dmitrey schrieb: > > hi all, > > has anyone already tried to compare using an ordinary numpy ufunc vs > > that one from corepy, first of all I mean the project > > http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t1 > >24024628235 > > > > It would be interesting to know what is speedup for (eg) vec ** 0.5 or > > (if it's possible - it isn't pure ufunc) numpy.dot(Matrix, vec). Or > > any another example. > > I have no experience with the mentioned CorePy, but recently I was > playing around with accelerated ufuncs using Intels Math Kernel Library > (MKL). These improvements are now part of the numexpr package > http://code.google.com/p/numexpr/ > Some remarks on possible speed improvements on recent Intel x86 processors. > 1) basic arithmetic ufuncs (add, sub, mul, ...) in standard numpy are > fast (SSE is used) and speed is limited by memory bandwidth. > 2) the speed of many transcendental functions (exp, sin, cos, pow, ...) > can be improved by _roughly_ a factor of five (single core) by using the > MKL. Most of the improvements stem from using faster algorithms with a > vectorized implementation. Note: the speed improvement depends on a > _lot_ of other circumstances. > 3) Improving performance by using multi cores is much more difficult. > Only for sufficiently large (>1e5) arrays a significant speedup is > possible. Where a speed gain is possible, the MKL uses several cores. 
> Some experimentation showed that adding a few OpenMP constructs you > could get a similar speedup with numpy. > 4) numpy.dot uses optimized implementations. Good points Gregor. However, I wouldn't say that improving performance by using multi cores is *that* difficult, but rather that multi cores can only be used efficiently *whenever* the memory bandwith is not a limitation. An example of this is the computation of transcendental functions, where, even using vectorized implementations, the computation speed is still CPU-bounded in many cases. And you have experimented yourself very good speed-ups for these cases with your implementation of numexpr/MKL :) Cheers, -- Francesc Alted From faltet at pytables.org Fri May 22 06:16:55 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 12:16:55 +0200 Subject: [Numpy-discussion] where are the benefits of ldexp and/or "array times 2"? In-Reply-To: <4A167693.1090901@googlemail.com> References: <4A167693.1090901@googlemail.com> Message-ID: <200905221216.55371.faltet@pytables.org> A Friday 22 May 2009 11:55:31 Gregor Thalhammer escrigu?: > dmitrey schrieb: > > Hi all, > > I expected to have some speedup via using ldexp or multiplying an > > array by a power of 2 (doesn't it have to perform a simple shift of > > mantissa?), but I don't see the one. > > > > # Let me also note - > > # 1) using b = 2 * ones(N) or b = zeros(N) doesn't yield any speedup > > vs b = rand() > > # 2) using A * 2.0 (or mere 2) instead of 2.1 doesn't yield any > > speedup, despite it is exact integer power of 2. > > On recent processors multiplication is very fast and takes 1.5 clock > cycles (float, double precision), independent of the values. There is > very little gain by using bit shift operators. ...unless you use the vectorization capabilities of modern Intel-compatible processors and shift data in bunches of up to 4 elements (i.e. the number of floats that fits on a 128-bit SSE2 register), in which case you can perform operations up to a speed of 0.25 cycles/element. Indeed, that requires dealing with SSE2 instructions in your code, but using latest GCC, ICC or MSVC implementations, this is not that difficult. Cheers, -- Francesc Alted From pav at iki.fi Fri May 22 06:24:50 2009 From: pav at iki.fi (Pauli Virtanen) Date: Fri, 22 May 2009 10:24:50 +0000 (UTC) Subject: [Numpy-discussion] Home for pyhdf5io? References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: Fri, 22 May 2009 10:00:56 +0200, Francesc Alted kirjoitti: [clip: pyhdf5io] > I've been having a look at your module and seems pretty cute. > Incidentally, there is another module module that does similar things: > > http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html > > However, I do like your package better in the sense that it adds more > 'magic' to the load/save routines. But maybe you want to have a look at > the above: it can give you more ideas, like for example, using CArrays > and compression for very large arrays, or Tables for structured arrays. I don't think these two are really comparable. The significant difference appears to be that pyhdf5io is a thin wrapper for File.createArray, so when it encounters non-array objects, it will pickle them to strings, and save the strings to the HDF5 file. Hdf5pickle, OTOH, implements the pickle protocol, and will unwrap non- array objects so that all their attributes etc. are exposed in the hdf5 file and can be read by non-Python applications. 
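Roughly, the thin-wrapper approach amounts to something like the following (only a sketch of the idea, not the actual pyhdf5io code; it assumes PyTables' openFile/createArray is happy to store a plain pickled string):

import cPickle
import numpy as np
import tables

def save_object(h5file, name, obj):
    if isinstance(obj, np.ndarray):
        h5file.createArray(h5file.root, name, obj)
    else:
        # non-array objects end up as one opaque pickled string per node
        h5file.createArray(h5file.root, name, cPickle.dumps(obj))

h5file = tables.openFile('data.h5', 'w')
save_object(h5file, 'x', np.arange(10))
save_object(h5file, 'meta', {'gain': 1.5, 'label': 'run 1'})
h5file.close()

A non-Python (or non-pickle-aware) reader then sees 'meta' only as an opaque string blob, whereas hdf5pickle would expose the dictionary contents as separate, individually readable pieces of the HDF5 file.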
-- Pauli Virtanen From afriedle at indiana.edu Fri May 22 07:52:46 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Fri, 22 May 2009 07:52:46 -0400 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: References: Message-ID: <4A16920E.8060805@indiana.edu> (sending again) Hi, I'm the student doing the project. I have a blog here, which contains some initial performance numbers for a couple test ufuncs I did: http://numcorepy.blogspot.com It's really too early yet to give definitive results though; GSoC officially starts in two days :) What I'm finding is that the existing ufuncs are already pretty fast; it appears right now that the main limitation is memory bandwidth. If that's really the case, the performance gains I'll get will be through cache tricks (non-temporal loads/stores), reducing memory accesses and using multiple cores to get more bandwidth. Another alternative we've talked about, and I (more and more likely) may look into is composing multiple operations together into a single ufunc. Again the main idea being that memory accesses can be reduced/eliminated. Andrew dmitrey wrote: > hi all, > has anyone already tried to compare using an ordinary numpy ufunc vs > that one from corepy, first of all I mean the project > http://socghop.appspot.com/student_project/show/google/gsoc2009/python/t124024628235 > > It would be interesting to know what is speedup for (eg) vec ** 0.5 or > (if it's possible - it isn't pure ufunc) numpy.dot(Matrix, vec). Or > any another example. > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From afriedle at indiana.edu Fri May 22 07:59:17 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Fri, 22 May 2009 07:59:17 -0400 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <200905221208.21567.faltet@pytables.org> References: <4A1673A0.2080503@googlemail.com> <200905221208.21567.faltet@pytables.org> Message-ID: <4A169395.6020604@indiana.edu> Francesc Alted wrote: > A Friday 22 May 2009 11:42:56 Gregor Thalhammer escrigu?: >> dmitrey schrieb: >> 3) Improving performance by using multi cores is much more difficult. >> Only for sufficiently large (>1e5) arrays a significant speedup is >> possible. Where a speed gain is possible, the MKL uses several cores. >> Some experimentation showed that adding a few OpenMP constructs you >> could get a similar speedup with numpy. >> 4) numpy.dot uses optimized implementations. > > Good points Gregor. However, I wouldn't say that improving performance by > using multi cores is *that* difficult, but rather that multi cores can only be > used efficiently *whenever* the memory bandwith is not a limitation. An > example of this is the computation of transcendental functions, where, even > using vectorized implementations, the computation speed is still CPU-bounded > in many cases. And you have experimented yourself very good speed-ups for > these cases with your implementation of numexpr/MKL :) Using multiple cores is pretty easy for element-wise ufuncs; no communication needs to occur and the work partitioning is trivial. And actually I've found with some initial testing that multiple cores does still help when you are memory bound. I don't fully understand why yet, though I have some ideas. One reason is multiple memory controllers due to multiple sockets (ie opteron). 
Another is that each thread is pulling memory from a different bank, utilizing more bandwidth than a single sequential thread could. However if that's the case, we could possibly come up with code for a single thread that achieves (nearly) the same additional throughput.. Andrew From faltet at pytables.org Fri May 22 08:17:50 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 14:17:50 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A169395.6020604@indiana.edu> References: <200905221208.21567.faltet@pytables.org> <4A169395.6020604@indiana.edu> Message-ID: <200905221417.50613.faltet@pytables.org> A Friday 22 May 2009 13:59:17 Andrew Friedley escrigu?: > Using multiple cores is pretty easy for element-wise ufuncs; no > communication needs to occur and the work partitioning is trivial. And > actually I've found with some initial testing that multiple cores does > still help when you are memory bound. I don't fully understand why yet, > though I have some ideas. One reason is multiple memory controllers due > to multiple sockets (ie opteron). Yeah. I think this must likely be the reason. If, as in your case, you have several independent paths from different processors to your data, then you can achieve speed-ups even if you are having a memory bound in a one-processor scenario. > Another is that each thread is > pulling memory from a different bank, utilizing more bandwidth than a > single sequential thread could. However if that's the case, we could > possibly come up with code for a single thread that achieves (nearly) > the same additional throughput.. Well, I don't think you can achieve important speed-ups in this case, but experimenting never hurts :) Good luck! -- Francesc Alted From faltet at pytables.org Fri May 22 08:33:18 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 14:33:18 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A16920E.8060805@indiana.edu> References: <4A16920E.8060805@indiana.edu> Message-ID: <200905221433.18551.faltet@pytables.org> A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: > (sending again) > > Hi, > > I'm the student doing the project. I have a blog here, which contains > some initial performance numbers for a couple test ufuncs I did: > > http://numcorepy.blogspot.com > > It's really too early yet to give definitive results though; GSoC > officially starts in two days :) What I'm finding is that the existing > ufuncs are already pretty fast; it appears right now that the main > limitation is memory bandwidth. If that's really the case, the > performance gains I'll get will be through cache tricks (non-temporal > loads/stores), reducing memory accesses and using multiple cores to get > more bandwidth. > > Another alternative we've talked about, and I (more and more likely) may > look into is composing multiple operations together into a single ufunc. > Again the main idea being that memory accesses can be reduced/eliminated. IMHO, composing multiple operations together is the most promising venue for leveraging current multicore systems. Another interesting approach is to implement costly operations (from the point of view of CPU resources), namely, transcendental functions like sin, cos or tan, but also others like sqrt or pow) in a parallel way. 
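To see what composing buys you, here is a toy comparison using numexpr (just an illustration; the same idea would apply to a CorePy-generated kernel). The whole expression is evaluated block-wise in a single pass over the operands, instead of materializing a temporary array for every intermediate ufunc:

import numpy as np
import numexpr as ne

a = np.random.rand(1000000)
b = np.random.rand(1000000)

# plain numpy: temporaries are allocated for a**2, b**2 and their sum
c1 = np.sqrt(a**2 + b**2)

# composed expression: one blocked pass over a and b, no large temporaries
c2 = ne.evaluate("sqrt(a**2 + b**2)")

assert np.allclose(c1, c2)

And for the costly transcendental functions, implementing them in a parallel way is, as said, the other promising route.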
If besides, you can combine this with vectorized versions of them (by using the well spread SSE2 instruction set, see [1] for an example), then you would be able to achieve really good results for sure (at least Intel did with its VML library ;) [1] http://gruntthepeon.free.fr/ssemath/ Cheers, -- Francesc Alted From andrea.gavana at gmail.com Fri May 22 12:31:19 2009 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Fri, 22 May 2009 17:31:19 +0100 Subject: [Numpy-discussion] List/location of consecutive integers Message-ID: Hi All, this should be a very easy question but I am trying to make a script run as fast as possible, so please bear with me if the solution is easy and I just overlooked it. I have a list of integers, like this one: indices = [1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004] >From this list, I would like to find out which values are consecutive and store them in another list of tuples (begin_consecutive, end_consecutive) or a simple list: as an example, the previous list will become: new_list = [(1, 9), (255, 258), (10001, 10004)] I can do it with for loops, but I am trying to speed up a fotran-based routine which I wrap with f2py (ideally I would like to do this step in Fortran too, so if you have a suggestion on how to do it also in Fortran it would be more than welcome). Do you have any suggestions? Thank you for your time. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ http://thedoomedcity.blogspot.com/ From josef.pktd at gmail.com Fri May 22 12:53:19 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 May 2009 12:53:19 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: References: Message-ID: <1cd32cbb0905220953u279f0303m59da1535921f8544@mail.gmail.com> On Fri, May 22, 2009 at 12:31 PM, Andrea Gavana wrote: > Hi All, > > ? ?this should be a very easy question but I am trying to make a > script run as fast as possible, so please bear with me if the solution > is easy and I just overlooked it. > > I have a list of integers, like this one: > > indices = [1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004] > > >From this list, I would like to find out which values are consecutive > and store them in another list of tuples (begin_consecutive, > end_consecutive) or a simple list: as an example, the previous list > will become: > > new_list = [(1, 9), (255, 258), (10001, 10004)] > > I can do it with for loops, but I am trying to speed up a fotran-based > routine which I wrap with f2py (ideally I would like to do this step > in Fortran too, so if you have a suggestion on how to do it also in > Fortran it would be more than welcome). Do you have any suggestions? > > Thank you for your time. 
> something along the line of: >>> indices = np.array([1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004]) >>> idx = (np.diff(indices) != 1).nonzero()[0] >>> idx array([ 8, 12]) >>> idxf = np.hstack((-1,idx,len(indices)-1)) >>> vmin = indices[idxf[:-1]+1] >>> vmax = indices[idxf[1:]] >>> zip(vmin,vmax) [(1, 9), (255, 258), (10001, 10004)] Josef From Chris.Barker at noaa.gov Fri May 22 13:03:05 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 22 May 2009 10:03:05 -0700 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: References: Message-ID: <4A16DAC9.2090004@noaa.gov> Andrea Gavana wrote: > I have a list of integers, like this one: > > indices = [1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004] > >>From this list, I would like to find out which values are consecutive > and store them in another list of tuples (begin_consecutive, > end_consecutive) or a simple list: as an example, the previous list > will become: > > new_list = [(1, 9), (255, 258), (10001, 10004)] Is this faster? In [102]: indices = np.array([1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004,sys.maxint]) In [103]: breaks = np.diff(indices) != 1 In [104]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) Out[104]: [(1, 9), (255, 258), (10001, 10004)] -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From dwf at cs.toronto.edu Fri May 22 15:59:48 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 22 May 2009 15:59:48 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <4A16DAC9.2090004@noaa.gov> References: <4A16DAC9.2090004@noaa.gov> Message-ID: <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> On 22-May-09, at 1:03 PM, Christopher Barker wrote: > In [104]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) I don't think this is very general: In [53]: indices Out[53]: array([ -3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 256, 257, 258, 10001, 10002, 10003, 10004]) In [54]: breaks = diff(indices) != 1 In [55]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) Out[55]: [(-3, -3), (1, 9), (255, 258)] From faltet at pytables.org Fri May 22 16:04:38 2009 From: faltet at pytables.org (Francesc Alted) Date: Fri, 22 May 2009 22:04:38 +0200 Subject: [Numpy-discussion] Join us for "Scientific Computing with Python Webinar" In-Reply-To: <2355F1D0-DD01-4BD1-8482-FDDC6FEE6C91@enthought.com> References: <1437076956.5204661242825355676.JavaMail.root@g2mp1br2.las.expertcity.com> <2355F1D0-DD01-4BD1-8482-FDDC6FEE6C91@enthought.com> Message-ID: <200905222204.38273.faltet@pytables.org> A Wednesday 20 May 2009 16:45:12 Travis Oliphant escrigu?: > Hello all Python users: > > I am pleased to announce the beginning of a free Webinar series that > discusses using Python for scientific computing. Enthought will host > this free series which will take place once a month for 30-45 > minutes. The schedule and length may change based on participation > feedback, but for now it is scheduled for the fourth Friday of every > month. This free webinar should not be confused with the EPD > webinar on the first Friday of each month which is open only to > subscribers to the Enthought Python Distribution. 
Mmh, I'm trying to connect, but it seems that Linux is not supported for this sort of webinars: To join the Webinar, please use one of the following supported operating systems: ? Windows? 2000, XP Pro, XP Home, 2003 Server, Vista ? Mac OS? X, Panther? 10.3.9, Tiger? 10.4.5 or higher Too bad :-/ -- Francesc Alted From efiring at hawaii.edu Fri May 22 16:09:52 2009 From: efiring at hawaii.edu (Eric Firing) Date: Fri, 22 May 2009 10:09:52 -1000 Subject: [Numpy-discussion] Join us for "Scientific Computing with Python Webinar" In-Reply-To: <200905222204.38273.faltet@pytables.org> References: <1437076956.5204661242825355676.JavaMail.root@g2mp1br2.las.expertcity.com> <2355F1D0-DD01-4BD1-8482-FDDC6FEE6C91@enthought.com> <200905222204.38273.faltet@pytables.org> Message-ID: <4A170690.1000500@hawaii.edu> Francesc Alted wrote: > A Wednesday 20 May 2009 16:45:12 Travis Oliphant escrigu?: >> Hello all Python users: >> >> I am pleased to announce the beginning of a free Webinar series that >> discusses using Python for scientific computing. Enthought will host >> this free series which will take place once a month for 30-45 >> minutes. The schedule and length may change based on participation >> feedback, but for now it is scheduled for the fourth Friday of every >> month. This free webinar should not be confused with the EPD >> webinar on the first Friday of each month which is open only to >> subscribers to the Enthought Python Distribution. > > Mmh, I'm trying to connect, but it seems that Linux is not supported for this > sort of webinars: > > To join the Webinar, please use one of the following supported operating > systems: > ? Windows? 2000, XP Pro, XP Home, 2003 Server, Vista > ? Mac OS? X, Panther? 10.3.9, Tiger? 10.4.5 or higher > > Too bad :-/ > See comments 3, 4, and 5 in the blog: http://blog.enthought.com/?p=116 Eric From pgmdevlist at gmail.com Fri May 22 16:15:19 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 22 May 2009 16:15:19 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: References: Message-ID: <841DC85A-3B49-4D8A-8D73-BF2AAD18D645@gmail.com> On May 22, 2009, at 12:31 PM, Andrea Gavana wrote: > Hi All, > > this should be a very easy question but I am trying to make a > script run as fast as possible, so please bear with me if the solution > is easy and I just overlooked it. > > I have a list of integers, like this one: > > indices = [1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004] > >> From this list, I would like to find out which values are consecutive > and store them in another list of tuples (begin_consecutive, > end_consecutive) or a simple list: as an example, the previous list > will become: > > new_list = [(1, 9), (255, 258), (10001, 10004)] Josef's and Chris's solutions are pretty neat in this case. I've been recently working on a more generic case where integers are grouped depending on some condition (equals, differing by 1 or 2...). A version in pure Python/numpy, the `Cluster` class is available in scikits.hydroclimpy.core.tools (hydroclimpy.sourceforge.net). Otherwise, here's a Cython version of the same class. Let me know if it works. And I'm not ultra happy with the name, so if you have any suggestions... cdef class Brackets: """ Groups consecutive data from an array according to a clustering condition. A cluster is defined as a group of consecutive values differing by at most the increment value. Missing values are **not** handled: the input sequence must therefore be free of missing values. 
Parameters ---------- darray : ndarray Input data array to clusterize. increment : {float}, optional Increment between two consecutive values to group. By default, use a value of 1. operator : {function}, optional Comparison operator for the definition of clusters. By default, use :func:`numpy.less_equal`. Attributes ---------- inishape Shape of the argument array (stored for resizing). inisize Size of the argument array. uniques : sequence List of unique cluster values, as they appear in chronological order. slices : sequence List of the slices corresponding to each cluster of data. starts : ndarray Lists of the indices at which the clusters start. ends : ndarray Lists of the indices at which the clusters end. clustered : list List of clustered data. Examples -------- >>> A = [0, 0, 1, 2, 2, 2, 3, 4, 3, 4, 4, 4] >>> klust = cluster(A,0) >>> [list(_) for _ in klust.clustered] [[0, 0], [1], [2, 2, 2], [3], [4], [3], [4, 4, 4]] >>> klust.uniques array([0, 1, 2, 3, 4, 3, 4]) >>> x = [ 1.8, 1.3, 2.4, 1.2, 2.5, 3.9, 1. , 3.8, 4.2, 3.3, ... 1.2, 0.2, 0.9, 2.7, 2.4, 2.8, 2.7, 4.7, 4.2, 0.4] >>> Brackets(x,1).starts array([ 0, 2, 3, 4, 5, 6, 7, 10, 11, 13, 17, 19]) >>> Brackets(x,1.5).starts array([ 0, 6, 7, 10, 13, 17, 19]) >>> Brackets(x,2.5).starts array([ 0, 6, 7, 19]) >>> Brackets(x,2.5,greater).starts array([ 0, 1, 2, 3, 4, 5, 8, 9, 10, ... 11, 12, 13, 14, 15, 16, 17, 18]) >>> y = [ 0, -1, 0, 0, 0, 1, 1, -1, -1, -1, 1, 1, 0, 0, 0, 0, 1, 1, 0, 0] >>> Brackets(y,1).starts array([ 0, 1, 2, 5, 7, 10, 12, 16, 18]) """ cdef readonly double increment cdef readonly np.ndarray data cdef readonly list _starts cdef readonly list _ends def __init__(Brackets self, object data, double increment=1, object operator=np.less_equal): """ """ cdef int i, n, ifirst, ilast, test cdef double last cdef list starts, ends # self.increment = increment self.data = np.asanyarray(data) data = np.asarray(data) # n = len(data) starts = [] ends = [] # last = data[0] ifirst = 0 ilast = 0 for 1 <= i < n: test = operator(abs(data[i] - last), increment) ilast = i if not test: starts.append(ifirst) ends.append(ilast-1) ifirst = i last = data[i] starts.append(ifirst) ends.append(n-1) self._starts = starts self._ends = ends def __len__(self): return len(self.starts) property starts: # def __get__(Brackets self): return np.asarray(self._starts) property ends: # def __get__(Brackets self): return np.asarray(self._ends) property sizes: # def __get__(Brackets self): return np.asarray(self._ends) - np.asarray(self._firsts) property slices: # def __get__(Brackets self): cdef int i cdef list starts = self._starts, ends = self._ends cdef list slices = [] for 0 <= i < len(starts): slices.append(slice(starts[i], ends[i]+1)) return slices property clustered: # def __get__(self): cdef int i cdef list starts = self._starts, ends = self._ends cdef list groups = [] cdef np.ndarray data = self.data for 0 <= i < len(starts): groups.append(data[starts[i]:ends[i]+1]) return groups property uniques: def __get__(self): return self.data[self.starts] def grouped_slices(self): """ Returns a dictionary with the unique values of ``self`` as keys, and a list of slices for the corresponding values. 
See Also -------- Brackets.grouped_limits that does the same thing """ # Define shortcuts cdef int i, ifirst, n = len(self) cdef list starts = self._starts, ends = self._ends cdef np.ndarray data = self.data # Define new variables cdef list seen = [] cdef double value cdef dict grouped = {} for 0 <= i < n: ifirst = starts[i] value = data[ifirst] if not (value in seen): grouped[value] = [] seen.append(value) grouped[value].append(slice(ifirst, ends[i]+1)) return grouped def grouped_limits(self): """ Returns a dictionary with the unique values of ``self`` as keys, and a list of tuples (starting index, ending index) for the corresponding values. See Also -------- Cluster.grouped_slices """ # Define shortcuts cdef int i, ifirst, n = len(self) cdef list starts = self.starts, ends = self.ends cdef np.ndarray data = self.data # Define new variables cdef list seen = [] cdef double value cdef dict grouped = {} for 0 <= i < n: ifirst = starts[i] value = data[ifirst] if not (value in seen): grouped[value] = [] seen.append(value) grouped[value].append((ifirst, ends[i]+1)) for k in grouped: grouped[k] = np.asarray(grouped[k]) return grouped From josef.pktd at gmail.com Fri May 22 16:27:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Fri, 22 May 2009 16:27:38 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> References: <4A16DAC9.2090004@noaa.gov> <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> Message-ID: <1cd32cbb0905221327m13295edk199eaa4aa3978870@mail.gmail.com> On Fri, May 22, 2009 at 3:59 PM, David Warde-Farley wrote: > On 22-May-09, at 1:03 PM, Christopher Barker wrote: > >> In [104]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) > > > > I don't think this is very general: > > In [53]: indices > Out[53]: > array([ ? -3, ? ? 1, ? ? 2, ? ? 3, ? ? 4, ? ? 5, ? ? 6, ? ? 7, ? ? 8, > ? ? ? ? ? ?9, ? 255, ? 256, ? 257, ? 258, 10001, 10002, 10003, 10004]) > > In [54]: breaks = diff(indices) != 1 > > In [55]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) > Out[55]: [(-3, -3), (1, 9), (255, 258)] > this still works: >>> indices = np.array([-5,-4,-3,1,1,2,3,4,5,6,7,8,9,255,256,257,258,10001,10002,10003,10004]) >>> idx = (np.diff(indices) != 1).nonzero()[0] >>> idxf = np.hstack((-1,idx,len(indices)-1)) >>> vmin = indices[idxf[:-1]+1] >>> vmax = indices[idxf[1:]] >>> zip(vmin,vmax) [(-5, -3), (1, 1), (1, 9), (255, 258), (10001, 10004)] Josef From Chris.Barker at noaa.gov Fri May 22 18:13:00 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 22 May 2009 15:13:00 -0700 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> References: <4A16DAC9.2090004@noaa.gov> <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> Message-ID: <4A17236C.2020000@noaa.gov> David Warde-Farley wrote: > I don't think this is very general: > > In [53]: indices > Out[53]: > array([ -3, 1, 2, 3, 4, 5, 6, 7, 8, > 9, 255, 256, 257, 258, 10001, 10002, 10003, 10004]) > > In [54]: breaks = diff(indices) != 1 > > In [55]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) > Out[55]: [(-3, -3), (1, 9), (255, 258)] that's why I put a sys.maxint at the end of the series... 
In [13]: indices = np.array([ -3, 1, 2, 3, 4, 5, 6, 7, 8, 9, 255, 256, 257, 258, 10001, 10002, 10003, 10004, sys.maxint]) In [15]: breaks = np.diff(indices) != 1 In [16]: zip(indices[np.r_[True, breaks[:-1]]], indices[breaks]) Out[16]: [(-3, -3), (1, 9), (255, 258), (10001, 10004)] Though that's probably not very robust! -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Fri May 22 18:15:15 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Fri, 22 May 2009 15:15:15 -0700 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <841DC85A-3B49-4D8A-8D73-BF2AAD18D645@gmail.com> References: <841DC85A-3B49-4D8A-8D73-BF2AAD18D645@gmail.com> Message-ID: <4A1723F3.7010905@noaa.gov> Pierre GM wrote: > scikits.hydroclimpy.core.tools (hydroclimpy.sourceforge.net). whoa! Why didn't I ever see that before. Here I am , witting a whole bunch of my own code to deal with time series of meteorological data.... argg! Now I need to go dig into that more. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pgmdevlist at gmail.com Fri May 22 18:17:36 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Fri, 22 May 2009 18:17:36 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <4A1723F3.7010905@noaa.gov> References: <841DC85A-3B49-4D8A-8D73-BF2AAD18D645@gmail.com> <4A1723F3.7010905@noaa.gov> Message-ID: <517D5658-B7D6-46D8-A202-C0B2B2DB6513@gmail.com> On May 22, 2009, at 6:15 PM, Christopher Barker wrote: > Pierre GM wrote: >> scikits.hydroclimpy.core.tools (hydroclimpy.sourceforge.net). > > whoa! Why didn't I ever see that before. Here I am , witting a whole > bunch of my own code to deal with time series of meteorological > data.... > argg! > > Now I need to go dig into that more. You def'ny need to send me some feedback as well. Let's chat about it offline and whip up something... I'm sure that the Powers-That-Pay-Me would be happy to know that the code I've been writing is used by the Powers-That-Pay-Them... From dwf at cs.toronto.edu Fri May 22 18:28:05 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Fri, 22 May 2009 18:28:05 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <4A17236C.2020000@noaa.gov> References: <4A16DAC9.2090004@noaa.gov> <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> <4A17236C.2020000@noaa.gov> Message-ID: <37C782A0-2BF8-442F-9741-003536F1FB7F@cs.toronto.edu> On 22-May-09, at 6:13 PM, Christopher Barker wrote: > that's why I put a sys.maxint at the end of the series... Oops! I foolishly assumed the sequence was unaltered. That makes a lot more sense. David From charlesr.harris at gmail.com Fri May 22 23:19:26 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Fri, 22 May 2009 21:19:26 -0600 Subject: [Numpy-discussion] Inconsistent error messages. 
Message-ID: Hi All, Currently fromfile prints a message and raises a MemoryError when more items are requested than can be read, but fromstring raises a ValueError: In [8]: fromstring("", count=10) --------------------------------------------------------------------------- ValueError Traceback (most recent call last) /home/charris/ in () ValueError: string is smaller than requested size In [9]: fromfile("empty.dat", count=10) 10 items requested but only 0 read --------------------------------------------------------------------------- MemoryError Traceback (most recent call last) /home/charris/ in () MemoryError: I think fromfile should also raise a ValueError in this case. Thoughts? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From albert.thuswaldner at gmail.com Sat May 23 05:36:23 2009 From: albert.thuswaldner at gmail.com (Albert Thuswaldner) Date: Sat, 23 May 2009 11:36:23 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: Thank you, Pauli, for your input. I agree with you that our projects have different goals, even if they touch on the same subject. The goal of pyhdf5io is to provide a very simple interface, so that the user can save his/her data. The reason for picking HDF5 was of course also to enable the use of the data in other programs. Here I was mostly thinking of simple numerical data, arrays etc., and not so much of the complex datatypes that Python provides. So I guess in the long term I will also have to add pickling support. In the short term I will add warnings for the data types that are not supported. /Albert 2009/5/22, Pauli Virtanen : > Fri, 22 May 2009 10:00:56 +0200, Francesc Alted kirjoitti: > [clip: pyhdf5io] >> I've been having a look at your module and seems pretty cute. >> Incidentally, there is another module module that does similar things: >> >> http://www.elisanet.fi/ptvirtan/software/hdf5pickle/index.html >> >> However, I do like your package better in the sense that it adds more >> 'magic' to the load/save routines. But maybe you want to have a look at >> the above: it can give you more ideas, like for example, using CArrays >> and compression for very large arrays, or Tables for structured arrays. > > I don't think these two are really comparable. The significant difference > appears to be that pyhdf5io is a thin wrapper for File.createArray, so > when it encounters non-array objects, it will pickle them to strings, and > save the strings to the HDF5 file. > > Hdf5pickle, OTOH, implements the pickle protocol, and will unwrap non- > array objects so that all their attributes etc. are exposed in the hdf5 > file and can be read by non-Python applications. > > -- > Pauli Virtanen > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > -- Sent from my mobile device From stefan at sun.ac.za Sat May 23 05:49:00 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sat, 23 May 2009 11:49:00 +0200 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: Message-ID: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> 2009/5/23 Charles R Harris : > In [9]: fromfile("empty.dat", count=10) > 10 items requested but only 0 read > --------------------------------------------------------------------------- > MemoryError
Traceback (most recent call last) > > /home/charris/ in () > > MemoryError: > > I think fromfile should also raise a ValueError in this case. Thoughts? I'm also wondering why the MemoryError has an empty message string -- that should be fixed. Instead of throwing errors in these scenarios, we could just return the elements read and raise a warning? This is consistent with most other file APIs I know and allows you to read blocks of data until the data runs out. Regards St?fan From dwf at cs.toronto.edu Sat May 23 07:47:46 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 07:47:46 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote: > So i guess in the long term i have to also add pickling support. In > the short term i will add warnings for the data types that are not > supported. In order to ensure optimal division of labour, I'd suggest simply basing your pickling support on hdf5pickle, and including it as an optional dependency, that you detect at runtime (just put the import in a try block and catch the ImportError). If you have hdf5pickle installed, pyhdf5io will pickle any objects you try to use save() with, etc. Otherwise it will just work the way it does now. I think that satisfies the goals of your project as being a thin wrapper that provides a simple interface, rather than reinventing the wheel by re-implementing hdf5 pickling. It also means that there aren't two, maybe-incompatible ways to pickle an object in HDF5 -- just one (even if you write your implementation to be compatible with Pauli's, there's opportunity for the codebases to diverge over time). David From dwf at cs.toronto.edu Sat May 23 08:00:27 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 08:00:27 -0400 Subject: [Numpy-discussion] Ticket #1113, change title? Message-ID: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> Can someone with the requisite permissions change the title of ticket #1113 to reflect the fact that it affects both ppc and ppc64? Alternately, if you know why the bug is happening, you could file a patch ;) David From david at ar.media.kyoto-u.ac.jp Sat May 23 08:54:41 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Sat, 23 May 2009 21:54:41 +0900 Subject: [Numpy-discussion] Ticket #1113, change title? In-Reply-To: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> References: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> Message-ID: <4A17F211.9030209@ar.media.kyoto-u.ac.jp> David Warde-Farley wrote: > Can someone with the requisite permissions change the title of ticket > #1113 to reflect the fact that it affects both ppc and ppc64? > Done. > Alternately, if you know why the bug is happening, you could file a > patch ;) > I have not looked at the code, but if the precision is indeed single precision, a tolerance of 1e-15 may not make much sense (single precision has 7 significant digits in normal representation) cheers, David From dwf at cs.toronto.edu Sat May 23 09:28:08 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 09:28:08 -0400 Subject: [Numpy-discussion] Ticket #1113, change title? 
In-Reply-To: <4A17F211.9030209@ar.media.kyoto-u.ac.jp> References: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> <4A17F211.9030209@ar.media.kyoto-u.ac.jp> Message-ID: <39664C87-125A-40C3-8627-24812DE2F92B@cs.toronto.edu> On 23-May-09, at 8:54 AM, David Cournapeau wrote: > I have not looked at the code, but if the precision is indeed single > precision, a tolerance of 1e-15 may not make much sense (single > precision has 7 significant digits in normal representation) Yes, I was wondering about that too, though notably the tests pass on x86, and in fact the result on ppc was nowhere near 0 when I checked it. David From mark.wendell at gmail.com Sat May 23 11:14:04 2009 From: mark.wendell at gmail.com (Mark Wendell) Date: Sat, 23 May 2009 09:14:04 -0600 Subject: [Numpy-discussion] numpy.choose question Message-ID: I have a question about the numpy.choose method. I'm working with rgb image arrays (converted using PIL's 'asarray'), and would like to combine data from multiple images. The choose method seemed just the thing for what i want to do: creating new images from multiple source images and an index mask. However, it looks like the choose method chokes on anything but single-value data arrays. As soon as I give it source arrays made up of RGB int triplets (e.g., [0,128,255]), it complains with a "ValueError: too many dimensions" error. My hope was that the choose method would return an array made up of the same kind of elements that the source arrays are made up of (in this case, RGB triplets), but it seems to only like scalars. I guess it sees my image arrays at 3D arrays, and doesn't know what to do with them. Is that right? I suppose I could pre-split my images into separate R, G, and B arrays, run them through np.choose, and then recombine the results, but that seems clunky. Any advice welcome. Thanks Mark -- -- Mark Wendell From pav at iki.fi Sat May 23 14:33:37 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sat, 23 May 2009 18:33:37 +0000 (UTC) Subject: [Numpy-discussion] Ticket #1113, change title? References: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> <4A17F211.9030209@ar.media.kyoto-u.ac.jp> <39664C87-125A-40C3-8627-24812DE2F92B@cs.toronto.edu> Message-ID: Sat, 23 May 2009 09:28:08 -0400, David Warde-Farley wrote: > On 23-May-09, at 8:54 AM, David Cournapeau wrote: > >> I have not looked at the code, but if the precision is indeed single >> precision, a tolerance of 1e-15 may not make much sense (single >> precision has 7 significant digits in normal representation) Yes, it should be `max(5*eps, 1e-15)`, and not 1e-15. It just happens that on x86 the code computes the correct value down to machine precision for complex64. Actually, I'm a bit perplexed about why this doesn't upcast: >>> p = np.complex128(9.999999999333333333e-6 + 1.000000000066666666e-5j) >>> np.arctanh(np.array([1e-5 + 1e-5j], dtype=np.complex64))/p array([ 1.+0.j], dtype=complex64) > Yes, I was wondering about that too, though notably the tests pass on > x86, and in fact the result on ppc was nowhere near 0 when I checked it. What do you mean by "nowhere near"? What does the following output for you: >>> np.arctanh(np.array([1e-5 + 1e-5j], np.complex64)) array([ 9.99999975e-06 +9.99999975e-06j], dtype=complex64) -- Pauli Virtanen From dwf at cs.toronto.edu Sat May 23 15:37:19 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 15:37:19 -0400 Subject: [Numpy-discussion] Ticket #1113, change title? 
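For reference, the check I would put in the test is something along these lines (a sketch; p is the high-precision reference value quoted above):

import numpy as np

z = np.array([1e-5 + 1e-5j], dtype=np.complex64)
p = 9.999999999333333333e-6 + 1.000000000066666666e-5j  # reference for arctanh(1e-5 + 1e-5j)
err = np.absolute(1 - np.arctanh(z)/p)
tol = max(5*np.finfo(np.complex64).eps, 1e-15)
assert err[0] < tol

On x86 err happens to come out around machine precision even for complex64, which is why the bare 1e-15 tolerance passes there.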
In-Reply-To: References: <98BDD1CA-65DE-4300-98AE-54F2D9A6EDFA@cs.toronto.edu> <4A17F211.9030209@ar.media.kyoto-u.ac.jp> <39664C87-125A-40C3-8627-24812DE2F92B@cs.toronto.edu> Message-ID: <249F193A-0369-4BA5-A899-B6007F5BF6C6@cs.toronto.edu> On 23-May-09, at 2:33 PM, Pauli Virtanen wrote: >> Yes, I was wondering about that too, though notably the tests pass on >> x86, and in fact the result on ppc was nowhere near 0 when I >> checked it. > > What do you mean by "nowhere near"? What does the following output for > you: > >>>> np.arctanh(np.array([1e-5 + 1e-5j], np.complex64)) > array([ 9.99999975e-06 +9.99999975e-06j], dtype=complex64) I must've been hallucinating: this is on ppc64. I remembered it being closer to 1e-1, >>> dtype = np.complex64 >>> z = np.array([1e-5*(1+1j)], dtype=dtype) >>> p = 9.999999999333333333e-6 + 1.000000000066666666e-5j >>> d = np.absolute(1-np.arctanh(z)/p) >>> d array([ 2.75662915e-09], dtype=float32) >>> np.finfo('complex64').eps 1.1920929e-07 >>> _ * 5 5.9604644775390625e-07 So I guess it is pretty small, and small enough to pass the revised test condition you gave above. David From Chris.Barker at noaa.gov Sat May 23 15:51:24 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 23 May 2009 12:51:24 -0700 Subject: [Numpy-discussion] numpy.choose question In-Reply-To: References: Message-ID: <4A1853BC.1000105@noaa.gov> Mark Wendell wrote: > I'm working with rgb image arrays (converted using PIL's 'asarray'), > and would like to combine data from multiple images. The choose method > seemed just the thing for what i want to do: creating new images from > multiple source images and an index mask. However, it looks like the > choose method chokes on anything but single-value data arrays. As soon > as I give it source arrays made up of RGB int triplets (e.g., > [0,128,255]), it complains with a "ValueError: too many dimensions" > error. you might try making a 2d array of a a datatype that fits the rgb triples: >>> rgb = np.dtype([('r', np.uint8),('g', np.uint8),('b',np.uint8)]) >>> rgb dtype([('r', '|u1'), ('g', '|u1'), ('b', '|u1')]) >>> image = np.ones((5,6,3),dtype=np.uint8) (that's a 5x3 rgb image, as PIL should have created it...) >>> image2 = image.view(dtype=rgb).reshape((5,6)) >>> image2.shape (5, 6) now it's a 2-d array, and you may be able to use choose, like you expected. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From Chris.Barker at noaa.gov Sat May 23 16:02:04 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Sat, 23 May 2009 13:02:04 -0700 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> Message-ID: <4A18563C.3010705@noaa.gov> St?fan van der Walt wrote: > 2009/5/23 Charles R Harris : >> In [9]: fromfile("empty.dat", count=10) > Instead of throwing errors in these scenarios, we could just return > the elements read and raise a warning? This is consistent with most > other file APIs I know and allows you to read blocks of data until the > data runs out. 
This was just discussed a week or two ago (look for messages by me and Robert Kern). fromfile needs some attention in general, but I think Robert and I (the only two that I know were paying attention to that discussion) agreed that there should be an API that says: "read up to n items from the file". Robert thought that should be the default, but I think that means everyone would be forced to check how many items they got every time they read, which is too much code and likely to be forgotten and lead to errors. So I think that an exception should be raised if you ask for n and get less than n, but that there should be a flag that says something like "max_items=n", that indicates that you'll take what you get. I don't like a warning -- it's unconventional to catch those like exceptions -- if you want n items, and you haven't written code to handle less than n, you should get an exception. If you have written code to handle that, then you can use the flag. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From albert.thuswaldner at gmail.com Sat May 23 16:25:37 2009 From: albert.thuswaldner at gmail.com (Albert Thuswaldner) Date: Sat, 23 May 2009 22:25:37 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: Actually, my vision with pyhdf5io is to have HDF5 replace numpy's own binary file format (.npy, .npz). Pyhdf5io (or an incarnation of it) should be the standard (binary) way to store data in scipy/numpy. A bold statement, I know, but I think that it would be an improvement, especially for those users who are replacing Matlab with scipy/numpy. I don't know if this vision of mine is possible to realize, or even something that is shared by anyone else in the community. So, list, what are your thoughts? Are there some people out there who like the idea and want to cooperate in realizing it? /Albert On Sat, May 23, 2009 at 13:47, David Warde-Farley wrote: > On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote: > >> So i guess in the long term i have to also add pickling support. In >> the short term i will add warnings for the data types that are not >> supported. > > In order to ensure optimal division of labour, I'd suggest simply > basing your pickling support on hdf5pickle, and including it as an > optional dependency, that you detect at runtime (just put the import > in a try block and catch the ImportError). If you have hdf5pickle > installed, pyhdf5io will pickle any objects you try to use save() > with, etc. Otherwise it will just work the way it does now. > > I think that satisfies the goals of your project as being a thin > wrapper that provides a simple interface, rather than reinventing the > wheel by re-implementing hdf5 pickling. It also means that there > aren't two, maybe-incompatible ways to pickle an object in HDF5 -- > just one (even if you write your implementation to be compatible with > Pauli's, there's opportunity for the codebases to diverge over time).
> > David > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From charlesr.harris at gmail.com Sat May 23 16:27:12 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 14:27:12 -0600 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: <4A18563C.3010705@noaa.gov> References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> Message-ID: On Sat, May 23, 2009 at 2:02 PM, Christopher Barker wrote: > St?fan van der Walt wrote: > > 2009/5/23 Charles R Harris : > >> In [9]: fromfile("empty.dat", count=10) > > > Instead of throwing errors in these scenarios, we could just return > > the elements read and raise a warning? This is consistent with most > > other file APIs I know and allows you to read blocks of data until the > > data runs out. > > This was just discussed a week or two ago (look for messaged by me and > Robert Kern) > > fromfile needs some attention in general, but I think Robert an I (the > only two that I know were paying attention to that discussion) agreed > that there should be an API that says: > > read up to n items from the file > > Robert thought that should be the default, but I think that means > everyone would be forced to check how many items they got every time > they read, which is too much code and likely to be forgotten and lead to > errors. So I think that an exception should be raised if you ask for n > and get less than n, but that there should be a flag that says something > like "max_items=n", that indicates that you'll take what you get. > > I don't like a warning -- it's unconventional to catch those like > exceptions -- if you want n items, and you haven't written code to > handle less than n, you should get an exception. If you have written > code to handle that, that you can use the flag. > I don't like the idea of a warning here either. How about adding a keyword 'strict' so that strict=1 means an error is raised if the count isn't reached, and strict=0 means any count is acceptable? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From charlesr.harris at gmail.com Sat May 23 16:29:24 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 14:29:24 -0600 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: On Sat, May 23, 2009 at 2:25 PM, Albert Thuswaldner < albert.thuswaldner at gmail.com> wrote: > Actually my vision with pyhdf5io is to have hdf5 to replace numpy's > own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it) > should be the standard (binary) way to store data in scipy/numpy. A > bold statement, I know, but I think that it would be an improvement, > especially for those users how are replacing Matlab with sicpy/numpy. > > I don't know if this vision of mine is possible to realize, or even > something that is shared by anyone else in the community. So list what > are your thoughts? Are there some people out there, who like the idea > and want to cooperate in realizing it? > I rather like the idea, but having a dependency on hdf5/pytables might be a bit much. Chuck -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dwf at cs.toronto.edu Sat May 23 16:41:40 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 16:41:40 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote: > Actually my vision with pyhdf5io is to have hdf5 to replace numpy's > own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it) > should be the standard (binary) way to store data in scipy/numpy. A > bold statement, I know, but I think that it would be an improvement, > especially for those users how are replacing Matlab with sicpy/numpy. In that it introduces a dependency on pytables (and the hdf5 C library) I doubt it would be something the numpy core developers would be eager to adopt. The npy and npz formats (as best I can gather) exist so that there is _some_ way of persisting data to disk that ships with numpy. It's not meant necessarily as the best way, or as an interchange format, just as something that works "out of the box", the code for which is completely contained within numpy. It might be worth mentioning the limitations of numpy's built-in save(), savez() and load() in the docstrings and recommending more portable alternatives, though. David From robert.kern at gmail.com Sat May 23 16:59:00 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 15:59:00 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: <3d375d730905231359g4595bf65mb54d551d9c8a1c86@mail.gmail.com> On Sat, May 23, 2009 at 06:47, David Warde-Farley wrote: > On 23-May-09, at 5:36 AM, Albert Thuswaldner wrote: > >> So i guess in the long term i have to also add pickling support. In >> the short term i will add warnings for the data types that are not >> supported. > > In order to ensure optimal division of labour, I'd suggest simply > basing your pickling support on hdf5pickle, and including it as an > optional dependency, that you detect at runtime ?(just put the import > in a try block and catch the ImportError). If you have hdf5pickle > installed, pyhdf5io will pickle any objects you try to use save() > with, etc. Otherwise it will just work the way it does now. That would cause difficulties. Now the format of your data depends on whether or not you have a package installed. That's not a very good level of control. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sat May 23 17:02:48 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 16:02:48 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: <3d375d730905231402l6799c9ffjf5131995f11b85aa@mail.gmail.com> On Sat, May 23, 2009 at 15:41, David Warde-Farley wrote: > On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote: > >> Actually my vision with pyhdf5io is to have hdf5 to replace numpy's >> own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it) >> should be the standard (binary) way to store data in scipy/numpy. 
A >> bold statement, I know, but I think that it would be an improvement, >> especially for those users how are replacing Matlab with sicpy/numpy. > > In that it introduces a dependency on pytables (and the hdf5 C > library) I doubt it would be something the numpy core developers would > be eager to adopt. > > The npy and npz formats (as best I can gather) exist so that there is > _some_ way of persisting data to disk that ships with numpy. It's not > meant necessarily as the best way, or as an interchange format, just > as something that works "out of the box", the code for which is > completely contained within numpy. Yes. The full set of use cases and design constraints are considered here: http://svn.scipy.org/svn/numpy/trunk/doc/neps/npy-format.txt -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sat May 23 17:09:25 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 16:09:25 -0500 Subject: [Numpy-discussion] numpy.choose question In-Reply-To: References: Message-ID: <3d375d730905231409h634e4ed2ue6f498ae77f62ea4@mail.gmail.com> On Sat, May 23, 2009 at 10:14, Mark Wendell wrote: > I have a question about the numpy.choose method. > > I'm working with rgb image arrays (converted using PIL's 'asarray'), > and would like to combine data from multiple images. The choose method > seemed just the thing for what i want to do: creating new images from > multiple source images and an index mask. However, it looks like the > choose method chokes on anything but single-value data arrays. As soon > as I give it source arrays made up of RGB int triplets (e.g., > [0,128,255]), it complains with a "ValueError: too many dimensions" > error. > > My hope was that the choose method would return an array made up of > the same kind of elements that the source arrays are made up of (in > this case, RGB triplets), but it seems to only like scalars. I guess > it sees my image arrays at 3D arrays, and doesn't know what to do with > them. Is that right? Yes. They are 3D arrays. Try not to think of them as 2D arrays of RGB triplets. You can check these things by looking at the .shape attribute of the array. > I suppose I could pre-split my images into > separate R, G, and B arrays, run them through np.choose, and then > recombine the results, but that seems clunky. Make your index array the same shape as the image array. dstack([indices] * 3) should do the trick. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From dwf at cs.toronto.edu Sat May 23 17:47:16 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sat, 23 May 2009 17:47:16 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <3d375d730905231359g4595bf65mb54d551d9c8a1c86@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <3d375d730905231359g4595bf65mb54d551d9c8a1c86@mail.gmail.com> Message-ID: On 23-May-09, at 4:59 PM, Robert Kern wrote: >> Otherwise it will just work the way it does now. > > That would cause difficulties. Now the format of your data depends on > whether or not you have a package installed. That's not a very good > level of control. Sorry, I wasn't clear. 
What I meant was, if hdf5pickle isn't detected you could just refuse to save anything that's not a numpy array. David From robert.kern at gmail.com Sat May 23 17:56:10 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 16:56:10 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <3d375d730905231359g4595bf65mb54d551d9c8a1c86@mail.gmail.com> Message-ID: <3d375d730905231456p25eb1a15jc4022ad0ba7e54ed@mail.gmail.com> On Sat, May 23, 2009 at 16:47, David Warde-Farley wrote: > > On 23-May-09, at 4:59 PM, Robert Kern wrote: > >>> ?Otherwise it will just work the way it does now. >> >> That would cause difficulties. Now the format of your data depends on >> whether or not you have a package installed. That's not a very good >> level of control. > > Sorry, I wasn't clear. What I meant was, if hdf5pickle isn't detected > you could just refuse to save anything that's not a numpy array. Ah, good. That makes much more sense. :-) -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Sat May 23 18:56:36 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 24 May 2009 00:56:36 +0200 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> Message-ID: <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> Hi Chris and Charles 2009/5/23 Charles R Harris : >> Robert thought that should be the default, but I think that means >> everyone would be forced to check how many items they got every time >> they read, which is too much code and likely to be forgotten and lead to >> errors. So I think that an exception should be raised if you ask for n >> and get less than n, but that there should be a flag that says something >> like "max_items=n", that indicates that you'll take what you get. >> >> I don't like a warning -- it's unconventional to catch those like >> exceptions -- if you want n items, and you haven't written code to >> handle less than n, you should get an exception. If you have written >> code to handle that, that you can use the flag. > > I don't like the idea of a warning here either. How about adding a keyword > 'strict' so that strict=1 means an error is raised if the count isn't > reached, and strict=0 means any count is acceptable? The reason I much prefer a warning is that you always get data back, whether things went wrong or not. If you throw an error, then you can't get hold of the last read blocks at all. I guess a strict flag is OK, but why, if you've got a warning in place? Warnings are easy to catch (and this can be documented in fromfile's docstring): warnings.simplefilter('error', np.lib.IOWarning) In Python 2.6 you can use "catch_warnings": with warnings.catch_warnings(True) as w: np.fromfile(...) if w: print "Error trying to load file" Regards St?fan From charlesr.harris at gmail.com Sat May 23 19:43:13 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 17:43:13 -0600 Subject: [Numpy-discussion] Inconsistent error messages. 
In-Reply-To: <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> Message-ID: 2009/5/23 St?fan van der Walt > Hi Chris and Charles > > 2009/5/23 Charles R Harris : > >> Robert thought that should be the default, but I think that means > >> everyone would be forced to check how many items they got every time > >> they read, which is too much code and likely to be forgotten and lead to > >> errors. So I think that an exception should be raised if you ask for n > >> and get less than n, but that there should be a flag that says something > >> like "max_items=n", that indicates that you'll take what you get. > >> > >> I don't like a warning -- it's unconventional to catch those like > >> exceptions -- if you want n items, and you haven't written code to > >> handle less than n, you should get an exception. If you have written > >> code to handle that, that you can use the flag. > > > > I don't like the idea of a warning here either. How about adding a > keyword > > 'strict' so that strict=1 means an error is raised if the count isn't > > reached, and strict=0 means any count is acceptable? > > The reason I much prefer a warning is that you always get data back, > whether things went wrong or not. If you throw an error, then you > can't get hold of the last read blocks at all. > > I guess a strict flag is OK, but why, if you've got a warning in > place? Warnings are easy to catch (and this can be documented in > fromfile's docstring): > > warnings.simplefilter('error', np.lib.IOWarning) > > In Python 2.6 you can use "catch_warnings": > > with warnings.catch_warnings(True) as w: > np.fromfile(...) > if w: print "Error trying to load file" > Warnings used to warn once and then never again. I once hacked the module to make it work right but I don't know if it has been officially fixed. Do you know if it's been fixed? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat May 23 19:51:48 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 18:51:48 -0500 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> Message-ID: <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> On Sat, May 23, 2009 at 18:43, Charles R Harris wrote: > Warnings used to warn once and then never again. I once hacked the module to > make it work right but I don't know if it has been officially fixed. Do you > know if it's been fixed? Warning once per location then never again is, and always has been, the documented default behavior. Are you referring to a more particular bug, like not being able to reset this, or that the location checking failed? What did you consider to be "working" behavior? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Sat May 23 19:57:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 17:57:11 -0600 Subject: [Numpy-discussion] Inconsistent error messages. 
In-Reply-To: <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> Message-ID: On Sat, May 23, 2009 at 5:51 PM, Robert Kern wrote: > On Sat, May 23, 2009 at 18:43, Charles R Harris > wrote: > > Warnings used to warn once and then never again. I once hacked the module > to > > make it work right but I don't know if it has been officially fixed. Do > you > > know if it's been fixed? > > Warning once per location then never again is, and always has been, > the documented default behavior. Are you referring to a more > particular bug, like not being able to reset this, or that the > location checking failed? What did you consider to be "working" > behavior? > You were supposed to be able to change the default behaviour, but it didn't used to work. I think if you are going to use a warning as a flag then it has to always be raised when a failure occurs, not just the first time. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sat May 23 19:59:12 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 18:59:12 -0500 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> Message-ID: <3d375d730905231659r41798868wa82fb8fe6980478b@mail.gmail.com> On Sat, May 23, 2009 at 18:57, Charles R Harris wrote: > > On Sat, May 23, 2009 at 5:51 PM, Robert Kern wrote: >> >> On Sat, May 23, 2009 at 18:43, Charles R Harris >> wrote: >> > Warnings used to warn once and then never again. I once hacked the >> > module to >> > make it work right but I don't know if it has been officially fixed. Do >> > you >> > know if it's been fixed? >> >> Warning once per location then never again is, and always has been, >> the documented default behavior. Are you referring to a more >> particular bug, like not being able to reset this, or that the >> location checking failed? What did you consider to be "working" >> behavior? > > You were supposed to be able to change the default behaviour, but it didn't > used to work. I think if you are going to use a warning as a flag then it > has to always be raised when a failure occurs, not just the first time. Did you ever report this bug? -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Sat May 23 20:03:14 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sat, 23 May 2009 19:03:14 -0500 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> Message-ID: <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> On Sat, May 23, 2009 at 18:57, Charles R Harris wrote: > You were supposed to be able to change the default behaviour, but it didn't > used to work. 
I think if you are going to use a warning as a flag then it > has to always be raised when a failure occurs, not just the first time. A brief test suggest that in Python 2.5.4, at least, as long as you set the action to be 'always' before the warning is first issued, it works. We can do this just after the IOWarning (or whatever) gets defined. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Sat May 23 20:25:09 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 18:25:09 -0600 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> Message-ID: On Sat, May 23, 2009 at 6:03 PM, Robert Kern wrote: > On Sat, May 23, 2009 at 18:57, Charles R Harris > wrote: > > You were supposed to be able to change the default behaviour, but it > didn't > > used to work. I think if you are going to use a warning as a flag then it > > has to always be raised when a failure occurs, not just the first time. > > A brief test suggest that in Python 2.5.4, at least, as long as you > set the action to be 'always' before the warning is first issued, it > works. We can do this just after the IOWarning (or whatever) gets > defined. > OK, that would work. Although I think a named argument might be a more transparent way to specify behaviour than setting the warnings. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From efiring at hawaii.edu Sat May 23 20:37:25 2009 From: efiring at hawaii.edu (Eric Firing) Date: Sat, 23 May 2009 14:37:25 -1000 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> Message-ID: <4A1896C5.2080408@hawaii.edu> Charles R Harris wrote: > > > On Sat, May 23, 2009 at 6:03 PM, Robert Kern > wrote: > > On Sat, May 23, 2009 at 18:57, Charles R Harris > > wrote: > > You were supposed to be able to change the default behaviour, but > it didn't > > used to work. I think if you are going to use a warning as a flag > then it > > has to always be raised when a failure occurs, not just the first > time. > > A brief test suggest that in Python 2.5.4, at least, as long as you > set the action to be 'always' before the warning is first issued, it > works. We can do this just after the IOWarning (or whatever) gets > defined. > > > OK, that would work. Although I think a named argument might be a more > transparent way to specify behaviour than setting the warnings. I agree; using a warning strikes me as an abuse of the warnings mechanism. Instead of a "strict" flag, which I find not particularly expressive--what is it being "strict" about?--how about a "min_count" kwarg to go with the existing "count" kwarg? 
min_count=None # default; raise ValueError instead of the present warning if fewer than count are found. min_count=0 # Accept whatever you get; no warning, no error. min_count=N # raise ValueError if fewer than N are found. This is more flexible than using a "strict" flag, and the kwarg is more descriptive. Eric > > Chuck > > > > ------------------------------------------------------------------------ > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From stefan at sun.ac.za Sat May 23 21:06:28 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 24 May 2009 03:06:28 +0200 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: <4A1896C5.2080408@hawaii.edu> References: <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> <4A1896C5.2080408@hawaii.edu> Message-ID: <9457e7c80905231806vb6b887aia582a3a7db8dc23a@mail.gmail.com> 2009/5/24 Eric Firing : >> OK, that would work. Although I think a named argument might be a more >> transparent way to specify behaviour than setting the warnings. > > I agree; using a warning strikes me as an abuse of the warnings > mechanism. ?Instead of a "strict" flag, which I find not particularly > expressive--what is it being "strict" about?--how about a "min_count" > kwarg to go with the existing "count" kwarg? Warnings are a great way of telling the user that a non-fatal problem cropped up. It isn't easy to send information along with an exception, so I don't think raising an error here is ever particularly useful. Maybe we should provide tools in NumPy to handle warnings more easily? Something like with no_warnings: np.fromfile('x') or with raise_warnings: np.fromfile('x') ? Regards St?fan From charlesr.harris at gmail.com Sat May 23 21:06:59 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sat, 23 May 2009 19:06:59 -0600 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: <4A1896C5.2080408@hawaii.edu> References: <4A18563C.3010705@noaa.gov> <9457e7c80905231556o45eba72eu39b555f687ed2fb5@mail.gmail.com> <3d375d730905231651j2804850bj1849cb0f2b583212@mail.gmail.com> <3d375d730905231703r3eb6c0baxbad49df2e9dd7caa@mail.gmail.com> <4A1896C5.2080408@hawaii.edu> Message-ID: On Sat, May 23, 2009 at 6:37 PM, Eric Firing wrote: > Charles R Harris wrote: > > > > > > On Sat, May 23, 2009 at 6:03 PM, Robert Kern > > wrote: > > > > On Sat, May 23, 2009 at 18:57, Charles R Harris > > > > wrote: > > > You were supposed to be able to change the default behaviour, but > > it didn't > > > used to work. I think if you are going to use a warning as a flag > > then it > > > has to always be raised when a failure occurs, not just the first > > time. > > > > A brief test suggest that in Python 2.5.4, at least, as long as you > > set the action to be 'always' before the warning is first issued, it > > works. We can do this just after the IOWarning (or whatever) gets > > defined. > > > > > > OK, that would work. Although I think a named argument might be a more > > transparent way to specify behaviour than setting the warnings. > > I agree; using a warning strikes me as an abuse of the warnings > mechanism. 
Instead of a "strict" flag, which I find not particularly > expressive--what is it being "strict" about?--how about a "min_count" > kwarg to go with the existing "count" kwarg? > I didn't like the fact that it overlaps with count. Although I suppose it could be the minimum and count the maximum if we enforce min_count <= count. But that still seems a bit clumsy. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From albert.thuswaldner at gmail.com Sun May 24 04:53:50 2009 From: albert.thuswaldner at gmail.com (Albert Thuswaldner) Date: Sun, 24 May 2009 10:53:50 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <3d375d730905231456p25eb1a15jc4022ad0ba7e54ed@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <3d375d730905231359g4595bf65mb54d551d9c8a1c86@mail.gmail.com> <3d375d730905231456p25eb1a15jc4022ad0ba7e54ed@mail.gmail.com> Message-ID: Ok, I understand that my thought on making hdf5 the standard save/load format for numpy was a bit naive. If it would have been easy it would already have been done. Thanks for the insights Robert. Well anyhow, I will continue with my little module and see where it goes. I will start a new thread in the pyTables list to discuss the steps needed to be taken to add pyhdf5io to the pyTables project. Thanks to everyone who took part in this discussion. /Albert On Sat, May 23, 2009 at 23:56, Robert Kern wrote: > On Sat, May 23, 2009 at 16:47, David Warde-Farley wrote: >> >> On 23-May-09, at 4:59 PM, Robert Kern wrote: >> >>>> ?Otherwise it will just work the way it does now. >>> >>> That would cause difficulties. Now the format of your data depends on >>> whether or not you have a package installed. That's not a very good >>> level of control. >> >> Sorry, I wasn't clear. What I meant was, if hdf5pickle isn't detected >> you could just refuse to save anything that's not a numpy array. > > Ah, good. That makes much more sense. :-) > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > ?-- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From mail at stevesimmons.com Sun May 24 08:23:22 2009 From: mail at stevesimmons.com (Stephen Simmons) Date: Sun, 24 May 2009 14:23:22 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> Message-ID: <4A193C3A.1080309@stevesimmons.com> David Warde-Farley wrote: > On 23-May-09, at 4:25 PM, Albert Thuswaldner wrote: >> Actually my vision with pyhdf5io is to have hdf5 to replace numpy's >> own binary file format (.npy, npz). Pyhdf5io (or an incarnation of it) >> should be the standard (binary) way to store data in scipy/numpy. A >> bold statement, I know, but I think that it would be an improvement, >> especially for those users how are replacing Matlab with sicpy/numpy. >> > In that it introduces a dependency on pytables (and the hdf5 C > library) I doubt it would be something the numpy core developers would > be eager to adopt. > > The npy and npz formats (as best I can gather) exist so that there is > _some_ way of persisting data to disk that ships with numpy. 
It's not > meant necessarily as the best way, or as an interchange format, just > as something that works "out of the box", the code for which is > completely contained within numpy. > > It might be worth mentioning the limitations of numpy's built-in > save(), savez() and load() in the docstrings and recommending more > portable alternatives, though. > > David > I tend to agree with David that PyTables is too big a dependency for inclusion in core Numpy. It does a lot more than simply loading and saving arrays. While I haven't tried Andrew Collette's h5py (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), memmap() could be enhanced so that saving/loading files with HDF5-like file extensions used the HDF5 format, with code based on h5py and pyhdf5io. This could, I imagine, be a relatively small/simple addition to numpy, with the only external dependency being the HDF5 libraries themselves. Stephen From tpk at kraussfamily.org Sun May 24 08:32:22 2009 From: tpk at kraussfamily.org (Tom K.) Date: Sun, 24 May 2009 05:32:22 -0700 (PDT) Subject: [Numpy-discussion] matrix default to column vector? In-Reply-To: <75c31b2a0905210610x1a321264r5b6f93d327ef2b36@mail.gmail.com> References: <75c31b2a0905210610x1a321264r5b6f93d327ef2b36@mail.gmail.com> Message-ID: <23693116.post@talk.nabble.com> Jason Rennie-2 wrote: > > By default, it looks like a 1-dim ndarray gets converted to a row vector > by > the matrix constructor. This seems to lead to some odd behavior such as > a[1] yielding the 2nd element as an ndarray and throwing an IndexError as > a > matrix. Is it possible to set a flag to make the default be a column > vector? > I'm not aware of any option that changes the construction behavior of "matrix". I honestly don't work with matrices very often - I just stick with arrays. ".T" will do what you want of course: x = matrix(range(10)).T As for odd behavior, can you say what in particular is odd about it? For the example that you mention, let's consider it in greater depth: x=np.array(range(5)) x[1] --> 1 x=np.matrix(range(5)) x[1] --> raises IndexError "index out of bounds" So it appears that with *matrices* as opposed to arrays, indexing with just 1 index has an implicit ":" as a 2nd index. This is distinctly different than the behavior for *arrays* where indexing with just 1 index returns a new nd-array with one fewer dimensions. I think it boils down to this: a matrix always has 2 dimensions, so indexing into it (whether with one or both indices specified) returns another 2D matrix; with arrays which are ND, indexing with just a single (integer) index returns an N-1 D array. When I look at it like this, np's indexing error is just exactly what I would expect (given that it creates a row vector). If you are going to use matrix in numpy, you must realize that x[n] for matrix x is equivalent to x[n,:]. Maybe my reluctance to work with matrices stems from this kind of inconsistency. It seems like your code has to be all matrix, or all array - and if you mix them, you need to be very careful about which is which. It is just easier for me to work only with array, and when needing to do matrix stuff just call "dot". Come to think of it, why doesn't 1/x for matrix x invert the matrix like in MATLAB? How about x\1? 
(Yeah, I know I'm pushing it now :-) As for adding an option that changes to behavior of matrix creation to be "transposed" I don't like it since the code will now do different things depending on whether the option is set. So, to write "safe" code you would have to check the option each time. -- View this message in context: http://www.nabble.com/matrix-default-to-column-vector--tp23652920p23693116.html Sent from the Numpy-discussion mailing list archive at Nabble.com. From ralf.gommers at googlemail.com Sun May 24 14:29:30 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 24 May 2009 14:29:30 -0400 Subject: [Numpy-discussion] documenting index_tricks Message-ID: Hi all, I'm documenting the index_tricks module and am unsure how to handle r_ and related objects. r_ is an instance of RClass, therefore ipython shows the RClass docstring for "r_?". RClass in the doc editor is also marked as "needs editing", and the page for r_ is completely empty (see http://docs.scipy.org/numpy/docs/numpy.r_/#numpy-r). It seems like only one of the two should be edited and then the docstring should be copied over. In Travis Oliphant's Guide to Numpy r_ is extensively documented, should I integrate part of that with one of the docstrings? About the code itself, is there a use case for having two RClass instances at the same time? If not maybe RClass should be a private class. Also a name like RowConcatenator (it is a subclass of AxisConcatenator) would make more sense. I'm not sure this change is worth the trouble though. Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From pav at iki.fi Sun May 24 15:21:59 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 24 May 2009 19:21:59 +0000 (UTC) Subject: [Numpy-discussion] documenting index_tricks References: Message-ID: Sun, 24 May 2009 14:29:30 -0400, Ralf Gommers wrote: > Hi all, > > I'm documenting the index_tricks module and am unsure how to handle r_ > and related objects. r_ is an instance of RClass, therefore ipython > shows the RClass docstring for "r_?". RClass in the doc editor is also > marked as "needs editing", and the page for r_ is completely empty (see > http://docs.scipy.org/numpy/docs/numpy.r_/#numpy-r). It seems like only > one of the two should be edited and then the docstring should be copied > over. In Travis Oliphant's Guide to Numpy r_ is extensively documented, > should I integrate part of that with one of the docstrings? Yes, please reuse the r_ documentation, and write it on the numpy.lib.index_tricks.r_ page. We already do special tricks to get the mgrid and ogrid docs to work as they should, and it's simple to do the same for r_ and c_. There are many of the r_ object pages in the doc wiki: this should be fixed, but I don't have time to do much this in the following couple of weeks. > About the code itself, is there a use case for having two RClass > instances at the same time? If not maybe RClass should be a private > class. Also a name like RowConcatenator (it is a subclass of > AxisConcatenator) would make more sense. I'm not sure this change is > worth the trouble though. I think the RClass is a private implementation detail, and shouldn't be documented in the reference guide. 
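For context, the kind of usage that docstring has to describe is just the behaviour of the existing np.r_ instance (two standard examples, shown purely for illustration):

>>> import numpy as np
>>> np.r_[1, 2, 3, np.array([4, 5])]    # scalars and arrays concatenated along the first axis
array([1, 2, 3, 4, 5])
>>> np.r_[0:5]                          # slice notation expands like arange
array([0, 1, 2, 3, 4])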
-- Pauli Virtanen From josef.pktd at gmail.com Sun May 24 15:22:37 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 24 May 2009 15:22:37 -0400 Subject: [Numpy-discussion] documenting index_tricks In-Reply-To: References: Message-ID: <1cd32cbb0905241222n518ea288s7bdddce1eea1e553@mail.gmail.com> On Sun, May 24, 2009 at 2:29 PM, Ralf Gommers wrote: > Hi all, > > I'm documenting the index_tricks module and am unsure how to handle r_ and > related objects. r_ is an instance of RClass, therefore ipython shows the > RClass docstring for "r_?". RClass in the doc editor is also marked as > "needs editing", and the page for r_ is completely empty (see > http://docs.scipy.org/numpy/docs/numpy.r_/#numpy-r). It seems like only one > of the two should be edited and then the docstring should be copied over. it looks like this is already the case, the htmlhelp and the sphinx generated docs http://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html#numpy.r_ show the docstring that I get with help(np.r_) for RClass on the python prompt, even though in the doc editor it is empty. So editing RClass seems to be the right thing to do. Josef >In > Travis Oliphant's Guide to Numpy r_ is extensively documented, should I > integrate part of that with one of the docstrings? > > About the code itself, is there a use case for having two RClass instances > at the same time? If not maybe RClass should be a private class. Also a name > like RowConcatenator (it is a subclass of AxisConcatenator) would make more > sense. I'm not sure this change is worth the trouble though. > > Cheers, > Ralf > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > From pav at iki.fi Sun May 24 15:48:48 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 24 May 2009 19:48:48 +0000 (UTC) Subject: [Numpy-discussion] documenting index_tricks References: Message-ID: Sun, 24 May 2009 19:21:59 +0000, Pauli Virtanen wrote: > Sun, 24 May 2009 14:29:30 -0400, Ralf Gommers wrote: > >> Hi all, >> >> I'm documenting the index_tricks module and am unsure how to handle r_ >> and related objects. r_ is an instance of RClass, therefore ipython >> shows the RClass docstring for "r_?". RClass in the doc editor is also >> marked as "needs editing", and the page for r_ is completely empty (see >> http://docs.scipy.org/numpy/docs/numpy.r_/#numpy-r). It seems like only >> one of the two should be edited and then the docstring should be copied >> over. In Travis Oliphant's Guide to Numpy r_ is extensively documented, >> should I integrate part of that with one of the docstrings? > > Yes, please reuse the r_ documentation, and write it on the > numpy.lib.index_tricks.r_ page. We already do special tricks to get the > mgrid and ogrid docs to work as they should, and it's simple to do the > same for r_ and c_. I'm wrong: the correct place is RClass, since its only instance is r_. -- Pauli Virtanen From bryan at cole.uklinux.net Sun May 24 15:55:50 2009 From: bryan at cole.uklinux.net (Bryan Cole) Date: Sun, 24 May 2009 20:55:50 +0100 Subject: [Numpy-discussion] Generalised Ufunc list Message-ID: <1243194950.6207.5.camel@pc2.cole.uklinux.net> Which (if any) existing ufuncs support the new generalised looping system? I'm particularly interested in a "vectorised" matrix multiply. 
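To be concrete, the behaviour being asked about is what this Python-level loop does, but as a single generalised-ufunc call (just a sketch of the intended semantics, not an existing numpy function):

import numpy as np
a = np.random.rand(100, 3, 3)    # a stack of one hundred 3x3 matrices
b = np.random.rand(100, 3, 3)
c = np.empty_like(a)
for i in xrange(len(a)):         # the loop a (m,n),(n,p)->(m,p) gufunc would push into C
    c[i] = np.dot(a[i], b[i])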
BC From pav at iki.fi Sun May 24 16:02:48 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 24 May 2009 20:02:48 +0000 (UTC) Subject: [Numpy-discussion] Generalised Ufunc list References: <1243194950.6207.5.camel@pc2.cole.uklinux.net> Message-ID: Sun, 24 May 2009 20:55:50 +0100, Bryan Cole wrote: > Which (if any) existing ufuncs support the new generalised looping > system? I'm particularly interested in a "vectorised" matrix multiply. None so far, to my knowledge. Also, a Python API has not been decided on, although the C-side stuff is in place. -- Pauli Virtanen From charlesr.harris at gmail.com Sun May 24 16:29:42 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 24 May 2009 14:29:42 -0600 Subject: [Numpy-discussion] parsing text strings/files in fromfile, fromstring Message-ID: Hi All, I am trying to put together some rule for parsing text strings/files in fromfile, fromstring so that the two are consistent. Tickets relevant to this are #1116 and #883. The question here is the interpretation of the separators, not the parsing of the numbers themselves. Below is the current behavior of fromstring, fromfile, and python split for content of "", "1", "1 1", " " respectively. fromstring : In [5]: fromstring("", sep=" ") Out[5]: array([ 0.]) In [6]: fromstring("1", sep=" ") Out[6]: array([ 1.]) In [7]: fromstring("1 1", sep=" ") Out[7]: array([ 1., 1.]) In [8]: fromstring(" ", sep=" ") Out[8]: array([ 0.]) fromfile: In [1]: fromfile("tmp", sep=" ") Out[1]: array([], dtype=float64) In [2]: fromfile("tmp", sep=" ") Out[2]: array([ 1.]) In [3]: fromfile("tmp", sep=" ") Out[3]: array([ 1., 1.]) In [4]: fromfile("tmp", sep=" ") Out[4]: array([ 0.]) split: In [9]: "".split(" ") Out[9]: [''] In [10]: "1".split(" ") Out[10]: ['1'] In [11]: "1 1".split(" ") Out[11]: ['1', '1'] In [12]: " ".split(" ") Out[12]: ['', ''] Differences: 1) When the string/file is empty fromfile returns and empty array, split returns an empty string, and fromstring converts the empty string to a default value. Which should we use? 2) When the string/file contains only a single seperator fromfile/fromstring both return a single value, while split returns two empty strings. Which should we use? My preferences would be to return empty arrays whenever the string/file is empty, but I don't feel strongly about that. I think the single separator should definitely produce two values. Also, wouldn't a missing value be better interpreted as nan than zero in the float case? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From jh at physics.ucf.edu Sun May 24 16:33:07 2009 From: jh at physics.ucf.edu (Joe Harrington) Date: Sun, 24 May 2009 16:33:07 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? Message-ID: I hate to ask for another function in numpy, but there's an obvious one missing in the financial group: xirr. It could be done as a new function or as an extension to the existing np.irr. The internal rate of return (np.irr) is defined as the growth rate that would give you a zero balance at the end of a period of investment given a series of cash flows into or out of the investment at regular intervals (the first and last cash flows are usually an initial deposit and a withdrawal of the current balance). 
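In code, that evenly spaced case is what the existing function already handles (an ordinary np.irr call, shown purely for illustration):

import numpy as np
# deposit 100 now, get 60 back after one period and 55 after two;
# np.irr solves -100 + 60/(1+r) + 55/(1+r)**2 == 0 for r
print np.irr([-100.0, 60.0, 55.0])    # roughly 0.1, i.e. 10% per period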
This is useful in academics, but if you're tracking a real investment, you don't just withdraw or add money on a perfectly annual basis, nor do you want a calc with thousands of days of zero entries just so you can handle the uneven intervals by evening them out. Both excel and openoffice define a "xirr" function that pairs each cash flow with a date. Would there be an objection to either a xirr or adding an optional second arg (or a keyword arg) to np.irr in numpy? Who writes the code is a different question, but that part isn't hard. --jh-- From stefan at sun.ac.za Sun May 24 16:43:46 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Sun, 24 May 2009 22:43:46 +0200 Subject: [Numpy-discussion] parsing text strings/files in fromfile, fromstring In-Reply-To: References: Message-ID: <9457e7c80905241343k14a3dd4bia438f7b6bfb2cbd2@mail.gmail.com> 2009/5/24 Charles R Harris : > fromstring : > > In [5]: fromstring("", sep=" ") > Out[5]: array([ 0.]) I would expect an empty array here. St?fan From ralf.gommers at googlemail.com Sun May 24 16:47:41 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 24 May 2009 16:47:41 -0400 Subject: [Numpy-discussion] documenting index_tricks In-Reply-To: References: Message-ID: On Sun, May 24, 2009 at 3:48 PM, Pauli Virtanen wrote: > Sun, 24 May 2009 19:21:59 +0000, Pauli Virtanen wrote: > > > Sun, 24 May 2009 14:29:30 -0400, Ralf Gommers wrote: > > > >> Hi all, > >> > >> I'm documenting the index_tricks module and am unsure how to handle r_ > >> and related objects. r_ is an instance of RClass, therefore ipython > >> shows the RClass docstring for "r_?". RClass in the doc editor is also > >> marked as "needs editing", and the page for r_ is completely empty (see > >> http://docs.scipy.org/numpy/docs/numpy.r_/#numpy-r). It seems like only > >> one of the two should be edited and then the docstring should be copied > >> over. In Travis Oliphant's Guide to Numpy r_ is extensively documented, > >> should I integrate part of that with one of the docstrings? > > > > Yes, please reuse the r_ documentation, and write it on the > > numpy.lib.index_tricks.r_ page. We already do special tricks to get the > > mgrid and ogrid docs to work as they should, and it's simple to do the > > same for r_ and c_. > > I'm wrong: the correct place is RClass, since its only instance is r_. > Thanks Pauli and Josef. I'll edit RClass and put a comment on the r_ page that it should not be edited and has to be removed from the doc editor at some point. Josef is right about the r_ page already showing up correctly in the sphinx generated docs. Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From robert.kern at gmail.com Sun May 24 17:22:23 2009 From: robert.kern at gmail.com (Robert Kern) Date: Sun, 24 May 2009 16:22:23 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4A193C3A.1080309@stevesimmons.com> References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <4A193C3A.1080309@stevesimmons.com> Message-ID: <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> On Sun, May 24, 2009 at 07:23, Stephen Simmons wrote: > While I haven't tried Andrew Collette's h5py > (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper > around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), > memmap() could be enhanced so that saving/loading files with HDF5-like > file extensions used the HDF5 format, with code based on h5py and > pyhdf5io. 
This could, I imagine, be a relatively small/simple addition > to numpy, with the only external dependency being the HDF5 libraries > themselves. *libhdf5* is too big, not PyTables. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From josef.pktd at gmail.com Sun May 24 18:14:42 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Sun, 24 May 2009 18:14:42 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <1cd32cbb0905241514i20d8191fv1f9453f97ab58ec3@mail.gmail.com> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: > I hate to ask for another function in numpy, but there's an obvious > one missing in the financial group: xirr. It could be done as a new > function or as an extension to the existing np.irr. > > The internal rate of return (np.irr) is defined as the growth rate > that would give you a zero balance at the end of a period of > investment given a series of cash flows into or out of the investment > at regular intervals (the first and last cash flows are usually an > initial deposit and a withdrawal of the current balance). > > This is useful in academics, but if you're tracking a real investment, > you don't just withdraw or add money on a perfectly annual basis, nor > do you want a calc with thousands of days of zero entries just so you > can handle the uneven intervals by evening them out. Both excel and > openoffice define a "xirr" function that pairs each cash flow with a > date. Would there be an objection to either a xirr or adding an > optional second arg (or a keyword arg) to np.irr in numpy? Who writes > the code is a different question, but that part isn't hard. >

3 comments:

* open office has also the other function in an x??? version, so it might be good to add it consistently to all functions

* date type: scikits.timeseries and the gsoc for implementing a date type would be useful to have a clear date type, or would you want to base it only on python standard library

* real life accuracy: given that there are large differences in the definition of a year for financial calculations, any simple implementation would be only approximately accurate. for example in the open office help, oddlyield list the following option

Basis is chosen from a list of options and indicates how the year is to be calculated.

Basis          Calculation
0 or missing   US method (NASD), 12 months of 30 days each
1              Exact number of days in months, exact number of days in year
2              Exact number of days in month, year has 360 days
3              Exact number of days in month, year has 365 days
4              European method, 12 months of 30 days each

So, my question: what's the purpose of the financial function in numpy? Currently it provides convenient functions for (approximate) interest calculations. If they get expanded to a "serious" implementation of, for example, the main financial functions listed in the open office help (just for reference) then maybe numpy is not the right location for it. I started to do something similar in matlab, and once I tried to use real dates instead of just counting months, the accounting rules get quickly very messy. Using dates as you propose would be very convenient, but the users shouldn't be surprised that their actual payments at the end of the year don't fully match up with what numpy told them.
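To make the day-count issue concrete, here is a minimal sketch of what a date-aware xirr could look like; the name, the signature and the actual/365 convention are all illustrative assumptions, not an existing numpy (or scipy) API:

import datetime
import numpy as np

def xirr(cashflows, dates, guess=0.1):
    # Hypothetical helper, not an existing numpy function: find the rate r
    # with sum(cf / (1 + r)**t) == 0, where t is measured in years from the
    # first date using an assumed actual/365 day count.
    t = np.array([(d - dates[0]).days for d in dates]) / 365.0
    cf = np.asarray(cashflows, dtype=float)
    r = guess
    for _ in range(50):                               # plain Newton iteration
        f = np.sum(cf / (1.0 + r) ** t)
        df = np.sum(-t * cf / (1.0 + r) ** (t + 1.0))
        r, r_old = r - f / df, r
        if abs(r - r_old) < 1e-10:
            break
    return r

flows = [-1000.0, 300.0, 200.0, 600.0]
when = [datetime.date(2009, 1, 1), datetime.date(2009, 7, 15),
        datetime.date(2010, 3, 1), datetime.date(2011, 1, 1)]
print xirr(flows, when)   # result shifts if 365.0 above is swapped for a 360-day basis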
my 3cents Josef From dwf at cs.toronto.edu Sun May 24 18:31:43 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sun, 24 May 2009 18:31:43 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> Message-ID: <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> On 24-May-09, at 5:22 PM, Robert Kern wrote: >> While I haven't tried Andrew Collette's h5py >> (http://code.google.com/p/h5py), it looks like a very 'thin' wrapper >> around the HDF5 C libraries. Maybe numpy's save(), savez(), load(), >> memmap() could be enhanced so that saving/loading files with HDF5- >> like >> file extensions used the HDF5 format, with code based on h5py and >> pyhdf5io. This could, I imagine, be a relatively small/simple >> addition >> to numpy, with the only external dependency being the HDF5 libraries >> themselves. > > *libhdf5* is too big, not PyTables. Yup. According to sloccount, numpy is roughly ~210,000 lines of code. The hdf5 library is ~385,000 lines. Including even a small part of libhdf5 would grow the code base significantly, and requiring it as a dependency isn't a good idea since libhdf5 can be tricky to build right. As Robert's design document for the NPY format says, one option would be to implement a minimal subset of the HDF5 protocol *from scratch* (that would be required for saving NumPy arrays as top-level leaf nodes, for example). This would also sidestep any tricky licensing issues (I don't know what the HDF5 license is in particular, I know it's fairly permissive but still might not be suitable for including any of it in NumPy). David From dwf at cs.toronto.edu Sun May 24 18:45:46 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Sun, 24 May 2009 18:45:46 -0400 Subject: [Numpy-discussion] matrix default to column vector? In-Reply-To: <23693116.post@talk.nabble.com> References: <75c31b2a0905210610x1a321264r5b6f93d327ef2b36@mail.gmail.com> <23693116.post@talk.nabble.com> Message-ID: On 24-May-09, at 8:32 AM, Tom K. wrote: > Maybe my reluctance to work with matrices stems from this kind of > inconsistency. It seems like your code has to be all matrix, or all > array - > and if you mix them, you need to be very careful about which is which. Also, functions called on things of type matrix may not return a matrix as expected, but rather an array. Anecdotally, it seems to me that lots of people (myself included) seem to go through a phase early in their use of NumPy where they try to use matrix(), but most seem to end up switching to using 2D arrays for all the aforementioned reasons. David From pav at iki.fi Sun May 24 19:28:05 2009 From: pav at iki.fi (Pauli Virtanen) Date: Sun, 24 May 2009 23:28:05 +0000 (UTC) Subject: [Numpy-discussion] parsing text strings/files in fromfile, fromstring References: Message-ID: Sun, 24 May 2009 14:29:42 -0600, Charles R Harris wrote: > I am trying to put together some rule for parsing text strings/files in > fromfile, fromstring so that the two are consistent. Tickets relevant to > this are #1116 and > #883. The question here is > the interpretation of the separators, not the parsing of the numbers > themselves. Below is the current behavior of fromstring, fromfile, and > python split for content of "", "1", "1 1", " " respectively. 
It should return only the data that's in the file, no extra elements. The current behavior is a bug, IMHO, especially so since the default value is uninitialized IIRC. So, fromstring("", sep=" ") -> [] fromstring(" ", sep=" ") -> [] fromstring("1 ", sep=" ") -> [1] fromfile should behave identically. Another question is perhaps what to do with malformed input: whether to try best-efforts parsing, or bail out. I'd suggest bailing out when encountering bad data rather than guessing: fromstring("1,2,,3", sep=",") -> [1,2] or ValueError Currently, something horrible happens: >>> np.fromstring('1,2,,3,,,6', sep=',') array([ 1., 2., -1., 3., -1., -1., 6.]) Also, on second thoughts, the idea about raising a warning on malformed input seems more repulsive the more I think about it. Warnings are a bit nasty to catch, spam stderr if uncaught, and IMHO should not be a part of "business as usual" code paths. Having malformed input is business as usual :) In some sense, it would be simpler if `fromfile` and `fromstring` would be defined so that they read *at most* `count` entries, and return what they got by parsing the leftmost valid part. This could be implemented by fixing the current bugs and removing the fprintf that currently prints to stderr there. As an addition, a flag could be added that forces them to raise a ValueError on malformed input (eg. EOF when `count` was given, or bad separator encountered). Ideally, the exceptions flag would be True by default both for fromfile and fromstring, but I guess some legacy applications might rely on the current behavior... Also, one could envision a "default" value that would denote a batch of malformed input... *** So, I see a couple of alternatives (some already suggested): a) fromstring("1,2,x,4", sep=",") -> [1,2] fromstring("1,2,x,4", sep=",", strict=True) -> ValueError fromstring("1,2,x,4", sep=",", count=5) -> [1,2] fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError b) fromstring("1,2,x,4", sep=",") -> [1,2] fromstring("1,2,x,4", sep=",", strict=True) -> ValueError fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] fromstring("1,2,x,4", sep=",", count=5) -> [1,2] fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError c) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning fromstring("1,2,x,4", sep=",", count=5) -> [1,2] + SomeWarning d) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] fromstring("1,2,x,4", sep=",", default=3, count=5) -> [1,2,3,4] + SomeWarning e) fromstring("1,2,x,4", sep=",") -> ValueError fromstring("1,2,x,4", sep=",", strict=False) -> [1,2] fromstring("1,2,x,4", sep=",", count=5) -> ValueError fromstring("1,2,x,4", sep=",", count=5, strict=False) -> [1,2] Fromfile would always behave the same way as `fromstring(file.read())`. In the above, " " in sep would equal the regexp \w+, and binary data implied by sep='' would be interpreted in the same way it would if first converted to comma-separated text. Can you think of any other alternatives? (Let's forget the names of the new keyword arguments for the present, and assume they have perfectly fitting names.) 
I'd vote for (e) if the slate was clean, but since it's not: +1 for (a) or (b) -- Pauli Virtanen From charlesr.harris at gmail.com Sun May 24 23:07:11 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 24 May 2009 21:07:11 -0600 Subject: [Numpy-discussion] parsing text strings/files in fromfile, fromstring In-Reply-To: References: Message-ID: On Sun, May 24, 2009 at 5:28 PM, Pauli Virtanen wrote: > Sun, 24 May 2009 14:29:42 -0600, Charles R Harris wrote: > > I am trying to put together some rule for parsing text strings/files in > > fromfile, fromstring so that the two are consistent. Tickets relevant to > > this are #1116 and > > #883. The question here is > > the interpretation of the separators, not the parsing of the numbers > > themselves. Below is the current behavior of fromstring, fromfile, and > > python split for content of "", "1", "1 1", " " respectively. > > It should return only the data that's in the file, no extra elements. The > current behavior is a bug, IMHO, especially so since the default value is > uninitialized IIRC. > > So, > > fromstring("", sep=" ") -> [] > fromstring(" ", sep=" ") -> [] > fromstring("1 ", sep=" ") -> [1] > > fromfile should behave identically. > > Another question is perhaps what to do with malformed input: whether > to try best-efforts parsing, or bail out. I'd suggest bailing out > when encountering bad data rather than guessing: > > fromstring("1,2,,3", sep=",") -> [1,2] or ValueError > > Currently, something horrible happens: > > >>> np.fromstring('1,2,,3,,,6', sep=',') > array([ 1., 2., -1., 3., -1., -1., 6.]) > > > Also, on second thoughts, the idea about raising a warning on malformed > input seems more repulsive the more I think about it. Warnings are a bit > nasty to catch, spam stderr if uncaught, and IMHO should not be a part of > "business as usual" code paths. Having malformed input is business as > usual :) > > In some sense, it would be simpler if `fromfile` and `fromstring` would > be defined so that they read *at most* `count` entries, and return what > they got by parsing the leftmost valid part. This could be implemented by > fixing the current bugs and removing the fprintf that currently prints to > stderr there. > > As an addition, a flag could be added that forces them to raise a > ValueError on malformed input (eg. EOF when `count` was given, or bad > separator encountered). Ideally, the exceptions flag would be True by > default both for fromfile and fromstring, but I guess some legacy > applications might rely on the current behavior... > > Also, one could envision a "default" value that would denote a batch of > malformed input... 
> > *** > > So, I see a couple of alternatives (some already suggested): > > a) fromstring("1,2,x,4", sep=",") -> [1,2] > fromstring("1,2,x,4", sep=",", strict=True) -> ValueError > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError > > b) fromstring("1,2,x,4", sep=",") -> [1,2] > fromstring("1,2,x,4", sep=",", strict=True) -> ValueError > fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError > > c) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] + SomeWarning > > d) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning > fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] > fromstring("1,2,x,4", sep=",", default=3, count=5) -> [1,2,3,4] + > SomeWarning > > e) fromstring("1,2,x,4", sep=",") -> ValueError > fromstring("1,2,x,4", sep=",", strict=False) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5) -> ValueError > fromstring("1,2,x,4", sep=",", count=5, strict=False) -> [1,2] > > Fromfile would always behave the same way as `fromstring(file.read())`. I think a common behavior is basic to whatever we end up with. > > In the above, " " in sep would equal the regexp \w+, and binary data > implied by sep='' would be interpreted in the same way it would if first > converted to comma-separated text. > > Can you think of any other alternatives? (Let's forget the names of > the new keyword arguments for the present, and assume they have > perfectly fitting names.) > > > I'd vote for (e) if the slate was clean, but since it's not: > > +1 for (a) or (b) > (a) and (e) are the simplest and just differ in the default, so that would be the shortest path. OTOH, (b) is the most general and the default is a nice idea. Hmm... Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Mon May 25 02:57:47 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 25 May 2009 08:57:47 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> Message-ID: <200905250857.47279.faltet@pytables.org> A Monday 25 May 2009 00:31:43 David Warde-Farley escrigu?: > As Robert's design document for the NPY format says, one option would > be to implement a minimal subset of the HDF5 protocol *from scratch* > (that would be required for saving NumPy arrays as top-level leaf > nodes, for example). This would also sidestep any tricky licensing > issues (I don't know what the HDF5 license is in particular, I know > it's fairly permissive but still might not be suitable for including > any of it in NumPy). The license for HDF5 is BSD-based and apparently permissive enough, as can be seen in: http://www.hdfgroup.org/HDF5/doc/Copyright.html The problem is to select such a desired minimal protocol subset. In addition, this implementation may require quite a bit of work (but I've never had an in- deep look at the guts of the HDF5 library, so I may be wrong). 
Cheers, -- Francesc Alted From andrea.gavana at gmail.com Mon May 25 04:34:11 2009 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Mon, 25 May 2009 09:34:11 +0100 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: <37C782A0-2BF8-442F-9741-003536F1FB7F@cs.toronto.edu> References: <4A16DAC9.2090004@noaa.gov> <237A4B0E-E8EC-401F-A8C9-D7A24F9DC47A@cs.toronto.edu> <4A17236C.2020000@noaa.gov> <37C782A0-2BF8-442F-9741-003536F1FB7F@cs.toronto.edu> Message-ID: Hi All, On Fri, May 22, 2009 at 11:28 PM, David Warde-Farley wrote: > On 22-May-09, at 6:13 PM, Christopher Barker wrote: > >> that's why I put a sys.maxint at the end of the series... > > Oops! I foolishly assumed the sequence was unaltered. That makes a lot > more sense. Thank you guys for your help, I'll implement your solutions and will report back about the timings :-D Thank you again. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ http://thedoomedcity.blogspot.com/ From afriedle at indiana.edu Mon May 25 06:59:31 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Mon, 25 May 2009 06:59:31 -0400 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <200905221433.18551.faltet@pytables.org> References: <4A16920E.8060805@indiana.edu> <200905221433.18551.faltet@pytables.org> Message-ID: <4A1A7A13.6080405@indiana.edu> For some reason the list seems to occasionally drop my messages... Francesc Alted wrote: > A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: >> I'm the student doing the project. I have a blog here, which contains >> some initial performance numbers for a couple test ufuncs I did: >> >> http://numcorepy.blogspot.com >> Another alternative we've talked about, and I (more and more likely) may >> look into is composing multiple operations together into a single ufunc. >> Again the main idea being that memory accesses can be reduced/eliminated. > > IMHO, composing multiple operations together is the most promising venue for > leveraging current multicore systems. Agreed -- our concern when considering for the project was to keep the scope reasonable so I can complete it in the GSoC timeframe. If I have time I'll definitely be looking into this over the summer; if not later. > Another interesting approach is to implement costly operations (from the point > of view of CPU resources), namely, transcendental functions like sin, cos or > tan, but also others like sqrt or pow) in a parallel way. If besides, you can > combine this with vectorized versions of them (by using the well spread SSE2 > instruction set, see [1] for an example), then you would be able to achieve > really good results for sure (at least Intel did with its VML library ;) > > [1] http://gruntthepeon.free.fr/ssemath/ I've seen that page before. Using another source [1] I came up with a quick/dirty cos ufunc. Performance is crazy good compared to NumPy (100x); see the latest post on my blog for a little more info. I'll look at the source myself when I get time again, but is NumPy using a Python-based cos function, a C implementation, or something else? As I wrote in my blog, the performance gain is almost too good to believe. [1] http://www.devmaster.net/forums/showthread.php?t=5784 Andrew From charlesr.harris at gmail.com Mon May 25 10:51:26 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 May 2009 08:51:26 -0600 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? 
In-Reply-To: <4A1A7A13.6080405@indiana.edu> References: <4A16920E.8060805@indiana.edu> <200905221433.18551.faltet@pytables.org> <4A1A7A13.6080405@indiana.edu> Message-ID: On Mon, May 25, 2009 at 4:59 AM, Andrew Friedley wrote: > For some reason the list seems to occasionally drop my messages... > > Francesc Alted wrote: > > A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: > >> I'm the student doing the project. I have a blog here, which contains > >> some initial performance numbers for a couple test ufuncs I did: > >> > >> http://numcorepy.blogspot.com > > >> Another alternative we've talked about, and I (more and more likely) may > >> look into is composing multiple operations together into a single ufunc. > >> Again the main idea being that memory accesses can be > reduced/eliminated. > > > > IMHO, composing multiple operations together is the most promising venue > for > > leveraging current multicore systems. > > Agreed -- our concern when considering for the project was to keep the > scope reasonable so I can complete it in the GSoC timeframe. If I have > time I'll definitely be looking into this over the summer; if not later. > > > Another interesting approach is to implement costly operations (from the > point > > of view of CPU resources), namely, transcendental functions like sin, cos > or > > tan, but also others like sqrt or pow) in a parallel way. If besides, > you can > > combine this with vectorized versions of them (by using the well spread > SSE2 > > instruction set, see [1] for an example), then you would be able to > achieve > > really good results for sure (at least Intel did with its VML library ;) > > > > [1] http://gruntthepeon.free.fr/ssemath/ > > I've seen that page before. Using another source [1] I came up with a > quick/dirty cos ufunc. Performance is crazy good compared to NumPy > (100x); see the latest post on my blog for a little more info. I'll > look at the source myself when I get time again, but is NumPy using a > Python-based cos function, a C implementation, or something else? As I > wrote in my blog, the performance gain is almost too good to believe. > Numpy uses the C library version. If long double and float aren't available the double version is used with number conversions, but that shouldn't give a factor of 100x. Something else is going on. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Mon May 25 11:10:05 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 25 May 2009 17:10:05 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A1A7A13.6080405@indiana.edu> References: <200905221433.18551.faltet@pytables.org> <4A1A7A13.6080405@indiana.edu> Message-ID: <200905251710.05135.faltet@pytables.org> A Monday 25 May 2009 12:59:31 Andrew Friedley escrigu?: > For some reason the list seems to occasionally drop my messages... > > Francesc Alted wrote: > > A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: > >> I'm the student doing the project. I have a blog here, which contains > >> some initial performance numbers for a couple test ufuncs I did: > >> > >> http://numcorepy.blogspot.com > >> > >> Another alternative we've talked about, and I (more and more likely) may > >> look into is composing multiple operations together into a single ufunc. > >> Again the main idea being that memory accesses can be > >> reduced/eliminated. 
> > > > IMHO, composing multiple operations together is the most promising venue > > for leveraging current multicore systems. > > Agreed -- our concern when considering for the project was to keep the > scope reasonable so I can complete it in the GSoC timeframe. If I have > time I'll definitely be looking into this over the summer; if not later. You should know that Numexpr has already started this path for some time now. The fact that it already can evaluate complex array expressions like 'a+b*cos(c)' without using temporaries (like NumPy does) should allow it to use multiple cores without stressing the memory bus too much. I'm planning to implement such parallelism in Numexpr for some time now, but not there yet. > I've seen that page before. Using another source [1] I came up with a > quick/dirty cos ufunc. Performance is crazy good compared to NumPy > (100x); see the latest post on my blog for a little more info. I'll > look at the source myself when I get time again, but is NumPy using a > Python-based cos function, a C implementation, or something else? As I > wrote in my blog, the performance gain is almost too good to believe. > > [1] http://www.devmaster.net/forums/showthread.php?t=5784 100x? Uh, sounds really impressing... -- Francesc Alted From jh at physics.ucf.edu Mon May 25 11:50:16 2009 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 25 May 2009 11:50:16 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: > On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: > > I hate to ask for another function in numpy, but there's an obvious > > one missing in the financial group: xirr. ?It could be done as a new > > function or as an extension to the existing np.irr. > > > > The internal rate of return (np.irr) is defined as the growth rate > > that would give you a zero balance at the end of a period of > > investment given a series of cash flows into or out of the investment > > at regular intervals (the first and last cash flows are usually an > > initial deposit and a withdrawal of the current balance). > > > > This is useful in academics, but if you're tracking a real investment, > > you don't just withdraw or add money on a perfectly annual basis, nor > > do you want a calc with thousands of days of zero entries just so you > > can handle the uneven intervals by evening them out. ?Both excel and > > openoffice define a "xirr" function that pairs each cash flow with a > > date. ?Would there be an objection to either a xirr or adding an > > optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes > > the code is a different question, but that part isn't hard. > > > > > > 3 comments: > > * open office has also the other function in an x??? version, so it > might be good to add it consistently to all functions > > * date type: scikits.timeseries and the gsoc for implementing a date > type would be useful to have a clear date type, or would you want to > base it only on python standard library > > * real life accuracy: given that there are large differences in the > definition of a year for financial calculations, any simple > implementation would be only approximately accurate. for example in > the open office help, oddlyield list the following option > > Basis is chosen from a list of options and indicates how the year is > to be calculated. 
> Basis Calculation > 0 or missing US method (NASD), 12 months of 30 days each > 1 Exact number of days in months, exact number of days in year > 2 Exact number of days in month, year has 360 days > 3 Exact number of days in month, year has 365 days > 4 European method, 12 months of 30 days each > > So, my question: what's the purpose of the financial function in numpy? > Currently it provides convenient functions for (approximate) interest > calculations. > If they get expanded to a "serious" implementation of, for example, > the main financial functions listed in the open office help (just for > reference) then maybe numpy is not the right location for it. > > I started to do something similar in matlab, and once I tried to use > real dates instead of just counting months, the accounting rules get > quickly very messy. > > Using dates as you propose would be very convenient, but the users > shouldn't be surprised that their actual payments at the end of the > year don't fully match up with what numpy told them. > > my 3cents > > Josef First point: agreed. I wish this community had a design review process for numpy and scipy, so that these things could get properly hashed out, and not just one person (even Travis) suggesting something and everyone else saying yeah-sure-whatever. Does anyone on the list have the financial background to suggest what functions "should" be included in a basic set of financial routines? xirr is the only one I've ever used in a spreadsheet, myself. Other points: Yuk. You're right. When these first came up for discussion, I had a Han Solo moment ("I've got a baaad feeling about this...") but I couldn't put my finger on why. They seemed like simple and limited functions with high utility. Certainly anything as open-ended as financial-industry rules should go elsewhere (scikits, scipy, monpy, whatever). But, that doesn't prevent a user-supplied, floating-point time array from going into a function in numpy. The rate of return would be in units of that array. Functions that convert date/time in some format (or many) and following some rule (or one of many) to such a floating array can still go elsewhere, maintained by people who know the definitions, if they have interest (pun intended). That would make the functions in numpy much more useful without bloating them or making them a maintenance nightmare. --jh-- From jsseabold at gmail.com Mon May 25 12:29:55 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 25 May 2009 12:29:55 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: On Mon, May 25, 2009 at 11:50 AM, Joe Harrington wrote: > On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: >> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: >> > I hate to ask for another function in numpy, but there's an obvious >> > one missing in the financial group: xirr. ?It could be done as a new >> > function or as an extension to the existing np.irr. >> > >> > The internal rate of return (np.irr) is defined as the growth rate >> > that would give you a zero balance at the end of a period of >> > investment given a series of cash flows into or out of the investment >> > at regular intervals (the first and last cash flows are usually an >> > initial deposit and a withdrawal of the current balance). 
>> > >> > This is useful in academics, but if you're tracking a real investment, >> > you don't just withdraw or add money on a perfectly annual basis, nor >> > do you want a calc with thousands of days of zero entries just so you >> > can handle the uneven intervals by evening them out. ?Both excel and >> > openoffice define a "xirr" function that pairs each cash flow with a >> > date. ?Would there be an objection to either a xirr or adding an >> > optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes >> > the code is a different question, but that part isn't hard. >> > >> >> >> >> 3 comments: >> >> * open office has also the other function in an x??? version, so it >> might be good to add it consistently to all functions >> >> * date type: scikits.timeseries and the gsoc for implementing a date >> type would be useful to have a clear date type, or would you want to >> base it only on python standard library >> >> * real life accuracy: given that there are large differences in the >> definition of a year for financial calculations, any simple >> implementation would be only approximately accurate. for example in >> the open office help, oddlyield list the following option >> >> Basis is chosen from a list of options and indicates how the year is >> to be calculated. >> Basis Calculation >> 0 or missing US method (NASD), 12 months of 30 days each >> 1 Exact number of days in months, exact number of days in year >> 2 Exact number of days in month, year has 360 days >> 3 Exact number of days in month, year has 365 days >> 4 European method, 12 months of 30 days each >> >> So, my question: what's the purpose of the financial function in numpy? >> Currently it provides convenient functions for (approximate) interest >> calculations. >> If they get expanded to a "serious" implementation of, for example, >> the main financial functions listed in the open office help (just for >> reference) then maybe numpy is not the right location for it. >> >> I started to do something similar in matlab, and once I tried to use >> real dates instead of just counting months, the accounting rules get >> quickly very messy. >> >> Using dates as you propose would be very convenient, but the users >> shouldn't be surprised that their actual payments at the end of the >> year don't fully match up with what numpy told them. >> >> my 3cents >> >> Josef > > First point: agreed. ?I wish this community had a design review > process for numpy and scipy, so that these things could get properly > hashed out, and not just one person (even Travis) suggesting something > and everyone else saying yeah-sure-whatever. > > Does anyone on the list have the financial background to suggest what > functions "should" be included in a basic set of financial routines? > xirr is the only one I've ever used in a spreadsheet, myself. > Again, I think it depends on what exactly you want to do. While I've certainly never worked in a quant shop, I am familiar with some of the academic/CFA-type usages. On my todo list for the summer is to provide some Cookbook examples for some options pricing and yield curve models (some of which will be based on past work), so I might be in a somewhat better position to answer this later. There is always quantlib over here , which is certainly a good place to look for what *could* be included... but this is of course much too field-specific to go into Numpy or Scipy. > Other points: Yuk. ?You're right. 
> > When these first came up for discussion, I had a Han Solo moment > ("I've got a baaad feeling about this...") but I couldn't put my > finger on why. ?They seemed like simple and limited functions with > high utility. ?Certainly anything as open-ended as financial-industry > rules should go elsewhere (scikits, scipy, monpy, whatever). > But remember. Han shot first ;) > But, that doesn't prevent a user-supplied, floating-point time array > from going into a function in numpy. ?The rate of return would be in > units of that array. ?Functions that convert date/time in some format > (or many) and following some rule (or one of many) to such a floating > array can still go elsewhere, maintained by people who know the > definitions, if they have interest (pun intended). ?That would make > the functions in numpy much more useful without bloating them or > making them a maintenance nightmare. > This seems like a good direction, if I understand you correctly. That way the user could just supply a list of trading days or whatever for the instrument they're interested in. Anything else could be maintained elsewhere, and I think this would be an interesting project personally. Cheers, Skipper From efiring at hawaii.edu Mon May 25 13:19:50 2009 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 25 May 2009 07:19:50 -1000 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <200905250857.47279.faltet@pytables.org> References: <000e0cd2a07010d73d046a71a597@google.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <200905250857.47279.faltet@pytables.org> Message-ID: <4A1AD336.6040705@hawaii.edu> Francesc Alted wrote: > A Monday 25 May 2009 00:31:43 David Warde-Farley escrigu?: >> As Robert's design document for the NPY format says, one option would >> be to implement a minimal subset of the HDF5 protocol *from scratch* >> (that would be required for saving NumPy arrays as top-level leaf >> nodes, for example). This would also sidestep any tricky licensing >> issues (I don't know what the HDF5 license is in particular, I know >> it's fairly permissive but still might not be suitable for including >> any of it in NumPy). > > The license for HDF5 is BSD-based and apparently permissive enough, as can be > seen in: > > http://www.hdfgroup.org/HDF5/doc/Copyright.html > > The problem is to select such a desired minimal protocol subset. In addition, > this implementation may require quite a bit of work (but I've never had an in- > deep look at the guts of the HDF5 library, so I may be wrong). > > Cheers, > If the aim is to come up with a method of saving numpy arrays that uses a standard protocol and does not introduce large dependencies, then could this be accomplished using netcdf instead of hdf5, specifically Roberto De Almeida's pupynere, which is already in scipy.io as netcdf.py? Or does hdf5 have essential characteristics for this purpose that netcdf lacks? Eric From albert.thuswaldner at gmail.com Mon May 25 13:39:52 2009 From: albert.thuswaldner at gmail.com (Albert Thuswaldner) Date: Mon, 25 May 2009 19:39:52 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? 
In-Reply-To: <4A1AD336.6040705@hawaii.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <200905250857.47279.faltet@pytables.org> <4A1AD336.6040705@hawaii.edu> Message-ID: >From what I understand, netCFD is based on on HDF5, at least as of the version 4 release. On Mon, May 25, 2009 at 19:19, Eric Firing wrote: > Francesc Alted wrote: >> A Monday 25 May 2009 00:31:43 David Warde-Farley escrigu?: >>> As Robert's design document for the NPY format says, one option would >>> be to implement a minimal subset of the HDF5 protocol *from scratch* >>> (that would be required for saving NumPy arrays as top-level leaf >>> nodes, for example). This would also sidestep any tricky licensing >>> issues (I don't know what the HDF5 license is in particular, I know >>> it's fairly permissive but still might not be suitable for including >>> any of it in NumPy). >> >> The license for HDF5 is BSD-based and apparently permissive enough, as can be >> seen in: >> >> http://www.hdfgroup.org/HDF5/doc/Copyright.html >> >> The problem is to select such a desired minimal protocol subset. In addition, >> this implementation may require quite a bit of work (but I've never had an in- >> deep look at the guts of the HDF5 library, so I may be wrong). >> >> Cheers, >> > > If the aim is to come up with a method of saving numpy arrays that uses > a standard protocol and does not introduce large dependencies, then > could this be accomplished using netcdf instead of hdf5, specifically > Roberto De Almeida's pupynere, which is already in scipy.io as > netcdf.py? ?Or does hdf5 have essential characteristics for this purpose > that netcdf lacks? > > Eric > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From josef.pktd at gmail.com Mon May 25 13:51:38 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 13:51:38 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <1cd32cbb0905251051r7c2669ceqafaaa04373d242f5@mail.gmail.com> On Mon, May 25, 2009 at 11:50 AM, Joe Harrington wrote: > On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: >> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: >> > I hate to ask for another function in numpy, but there's an obvious >> > one missing in the financial group: xirr. ?It could be done as a new >> > function or as an extension to the existing np.irr. >> > >> > The internal rate of return (np.irr) is defined as the growth rate >> > that would give you a zero balance at the end of a period of >> > investment given a series of cash flows into or out of the investment >> > at regular intervals (the first and last cash flows are usually an >> > initial deposit and a withdrawal of the current balance). >> > >> > This is useful in academics, but if you're tracking a real investment, >> > you don't just withdraw or add money on a perfectly annual basis, nor >> > do you want a calc with thousands of days of zero entries just so you >> > can handle the uneven intervals by evening them out. ?Both excel and >> > openoffice define a "xirr" function that pairs each cash flow with a >> > date. ?Would there be an objection to either a xirr or adding an >> > optional second arg (or a keyword arg) to np.irr in numpy? 
?Who writes >> > the code is a different question, but that part isn't hard. >> > >> >> >> >> 3 comments: >> >> * open office has also the other function in an x??? version, so it >> might be good to add it consistently to all functions >> >> * date type: scikits.timeseries and the gsoc for implementing a date >> type would be useful to have a clear date type, or would you want to >> base it only on python standard library >> >> * real life accuracy: given that there are large differences in the >> definition of a year for financial calculations, any simple >> implementation would be only approximately accurate. for example in >> the open office help, oddlyield list the following option >> >> Basis is chosen from a list of options and indicates how the year is >> to be calculated. >> Basis Calculation >> 0 or missing US method (NASD), 12 months of 30 days each >> 1 Exact number of days in months, exact number of days in year >> 2 Exact number of days in month, year has 360 days >> 3 Exact number of days in month, year has 365 days >> 4 European method, 12 months of 30 days each >> >> So, my question: what's the purpose of the financial function in numpy? >> Currently it provides convenient functions for (approximate) interest >> calculations. >> If they get expanded to a "serious" implementation of, for example, >> the main financial functions listed in the open office help (just for >> reference) then maybe numpy is not the right location for it. >> >> I started to do something similar in matlab, and once I tried to use >> real dates instead of just counting months, the accounting rules get >> quickly very messy. >> >> Using dates as you propose would be very convenient, but the users >> shouldn't be surprised that their actual payments at the end of the >> year don't fully match up with what numpy told them. >> >> my 3cents >> >> Josef > > First point: agreed. ?I wish this community had a design review > process for numpy and scipy, so that these things could get properly > hashed out, and not just one person (even Travis) suggesting something > and everyone else saying yeah-sure-whatever. > > Does anyone on the list have the financial background to suggest what > functions "should" be included in a basic set of financial routines? > xirr is the only one I've ever used in a spreadsheet, myself. > > Other points: Yuk. ?You're right. > > When these first came up for discussion, I had a Han Solo moment > ("I've got a baaad feeling about this...") but I couldn't put my > finger on why. ?They seemed like simple and limited functions with > high utility. ?Certainly anything as open-ended as financial-industry > rules should go elsewhere (scikits, scipy, monpy, whatever). > > But, that doesn't prevent a user-supplied, floating-point time array > from going into a function in numpy. ?The rate of return would be in > units of that array. ?Functions that convert date/time in some format > (or many) and following some rule (or one of many) to such a floating > array can still go elsewhere, maintained by people who know the > definitions, if they have interest (pun intended). ?That would make > the functions in numpy much more useful without bloating them or > making them a maintenance nightmare. > If you think of time just as a regularly spaced, e.g. days, but with sparse points on it, or as a continuous variable, then extending the current functions should be relatively easy. 
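As an illustration of that, a bare-bones version taking a floating-point time array could look something like the sketch below (the names npv_t and irr_t are made up, it uses the same (1+rate)**t discounting that np.irr solves for, and it ignores every day-count convention):

import numpy as np
from scipy.optimize import newton

def npv_t(rate, values, times):
    # net present value of cash flows 'values' occurring at (float) 'times'
    values = np.asarray(values, dtype=float)
    times = np.asarray(times, dtype=float)
    return np.sum(values / (1.0 + rate)**times)

def irr_t(values, times, guess=0.1):
    # rate per time unit that makes the net present value zero
    return newton(lambda r: npv_t(r, values, times), guess)

print irr_t([-100., 39., 59., 55., 20.], [0., 1., 2., 3., 4.])   # ~0.281, same root as np.irr

With evenly spaced integer times this should reproduce np.irr, and with irregular times it is essentially xirr in whatever unit the times are given in.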
I guess the only questions are compounding, annual, quarterly or at each payment, and whether the annual rate is calculated as real compounded annualized rate or as accounting annual rate, e.g. quarterlyrate*4. This leaves "What is the present value, if you get 100 Dollars at the 10th day of each month (or at the next working day if the 10th day is a holiday or a weekend) for the next 5 years and the monthly interest rate is 5/12%?" for another day. Initially I understood you wanted the date as a string or date type as in e.g open office. What would be the units of the user-supplied, floating-point time array? It is still necessary to know the time units to provide an annualized rate, unless the rate is in continuous time, exp(r*t). I don't know whether this would apply to all functions in numpy.finance, it's a while since I looked at the code. Maybe there are some standard simplifications in open office or excel. I briefly skimmed the list of function in the open office help, and it would be useful to have them available, e.g. as a package in scipy. But my google searches in the past for applications in finance with a compatible license didn't provide much useful code that could form the basis of a finance package. Adding more convenience and functionality to numpy.finance is useful, but if they get extended with slow feature creep, then another location (scipy) might be more appropriate and would be more expandable, even if it happens only slowly. That's just my opinion (obviously), I'm a relative newbie to numpy/scipy and still working my way through all the different subpackages. Josef From efiring at hawaii.edu Mon May 25 13:55:28 2009 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 25 May 2009 07:55:28 -1000 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: References: <000e0cd2a07010d73d046a71a597@google.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <200905250857.47279.faltet@pytables.org> <4A1AD336.6040705@hawaii.edu> Message-ID: <4A1ADB90.5020509@hawaii.edu> Albert Thuswaldner wrote: >>From what I understand, netCFD is based on on HDF5, at least as of the > version 4 release. Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format is likely to stick around for a *very* long time. The netcdf4 library is backwards-compatible with netcdf3. Eric > > On Mon, May 25, 2009 at 19:19, Eric Firing wrote: >> Francesc Alted wrote: >>> A Monday 25 May 2009 00:31:43 David Warde-Farley escrigu?: >>>> As Robert's design document for the NPY format says, one option would >>>> be to implement a minimal subset of the HDF5 protocol *from scratch* >>>> (that would be required for saving NumPy arrays as top-level leaf >>>> nodes, for example). This would also sidestep any tricky licensing >>>> issues (I don't know what the HDF5 license is in particular, I know >>>> it's fairly permissive but still might not be suitable for including >>>> any of it in NumPy). >>> The license for HDF5 is BSD-based and apparently permissive enough, as can be >>> seen in: >>> >>> http://www.hdfgroup.org/HDF5/doc/Copyright.html >>> >>> The problem is to select such a desired minimal protocol subset. In addition, >>> this implementation may require quite a bit of work (but I've never had an in- >>> deep look at the guts of the HDF5 library, so I may be wrong). 
>>> >>> Cheers, >>> >> If the aim is to come up with a method of saving numpy arrays that uses >> a standard protocol and does not introduce large dependencies, then >> could this be accomplished using netcdf instead of hdf5, specifically >> Roberto De Almeida's pupynere, which is already in scipy.io as >> netcdf.py? Or does hdf5 have essential characteristics for this purpose >> that netcdf lacks? >> >> Eric >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From neilcrighton at gmail.com Mon May 25 13:57:56 2009 From: neilcrighton at gmail.com (Neil Crighton) Date: Mon, 25 May 2009 17:57:56 +0000 (UTC) Subject: [Numpy-discussion] List/location of consecutive integers References: Message-ID: Andrea Gavana gmail.com> writes: > this should be a very easy question but I am trying to make a > script run as fast as possible, so please bear with me if the solution > is easy and I just overlooked it. That's weird, I was trying to solve exactly the same problem a couple of weeks ago for a program I was working on. It must come up a lot. I ended up with a similar solution to Josef's, but it took me more than an hour to work it out - I should have asked here first! Neil From faltet at pytables.org Mon May 25 14:05:16 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 25 May 2009 20:05:16 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4A1ADB90.5020509@hawaii.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <4A1ADB90.5020509@hawaii.edu> Message-ID: <200905252005.17149.faltet@pytables.org> A Monday 25 May 2009 19:55:28 Eric Firing escrigu?: > >> If the aim is to come up with a method of saving numpy arrays that uses > >> a standard protocol and does not introduce large dependencies, then > >> could this be accomplished using netcdf instead of hdf5, specifically > >> Roberto De Almeida's pupynere, which is already in scipy.io as > >> netcdf.py? Or does hdf5 have essential characteristics for this purpose > >> that netcdf lacks? After looking a bit at the code of pupynere, there is the next line: assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % self.filename So, the current version of pupynere is definitely for version 3 of NetCDF, not version 4. > >>From what I understand, netCFD is based on on HDF5, at least as of the > > > > version 4 release. > > Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format > is likely to stick around for a *very* long time. The netcdf4 library > is backwards-compatible with netcdf3. NetCDF4 is backwards-compatible with NetCDF3 just at API level, not the file format. NetCDF3 has a much more simple format, and completely different from NetCDF4, which is based on HDF5. Cheers, -- Francesc Alted From josef.pktd at gmail.com Mon May 25 14:13:53 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 14:13:53 -0400 Subject: [Numpy-discussion] List/location of consecutive integers In-Reply-To: References: Message-ID: <1cd32cbb0905251113q649ab0f8i40c918907497e928@mail.gmail.com> On Mon, May 25, 2009 at 1:57 PM, Neil Crighton wrote: > Andrea Gavana gmail.com> writes: > >> ? ? 
this should be a very easy question but I am trying to make a >> script run as fast as possible, so please bear with me if the solution >> is easy and I just overlooked it. > > That's weird, I was trying to solve exactly the same problem a couple of weeks > ago for a program I was working on. It must come up a lot. > > I ended up with a similar solution to Josef's, but it took me more than an hour > to work it out - I should have asked here first! Actually, I got the hint for something similar from Ray Jones earlier in the discussion on the ndimage.measurement rewrite (find min, max per label). It took me also more time to get the solution to work correctly in the first place. And reading the (python part of the) numpy source can be very instructive. Josef From efiring at hawaii.edu Mon May 25 14:29:25 2009 From: efiring at hawaii.edu (Eric Firing) Date: Mon, 25 May 2009 08:29:25 -1000 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <200905252005.17149.faltet@pytables.org> References: <000e0cd2a07010d73d046a71a597@google.com> <4A1ADB90.5020509@hawaii.edu> <200905252005.17149.faltet@pytables.org> Message-ID: <4A1AE385.60908@hawaii.edu> Francesc Alted wrote: > A Monday 25 May 2009 19:55:28 Eric Firing escrigu?: >>>> If the aim is to come up with a method of saving numpy arrays that uses >>>> a standard protocol and does not introduce large dependencies, then >>>> could this be accomplished using netcdf instead of hdf5, specifically >>>> Roberto De Almeida's pupynere, which is already in scipy.io as >>>> netcdf.py? Or does hdf5 have essential characteristics for this purpose >>>> that netcdf lacks? > > After looking a bit at the code of pupynere, there is the next line: > > assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % > self.filename > > So, the current version of pupynere is definitely for version 3 of NetCDF, not > version 4. Yes, and I presume it will stay that way--which is fine for the question I am asking above. I should have said "netcdf3" explicitly. Its simplicity compared to hdf5 and netcdf4 is potentially a virtue. The question is, is it *too* simple for the intended purpose? > >>> >From what I understand, netCFD is based on on HDF5, at least as of the >>> >>> version 4 release. >> Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format >> is likely to stick around for a *very* long time. The netcdf4 library >> is backwards-compatible with netcdf3. > > NetCDF4 is backwards-compatible with NetCDF3 just at API level, not the file > format. NetCDF3 has a much more simple format, and completely different from > NetCDF4, which is based on HDF5. Yes, but the netcdf4 *library* includes full netcdf3 compatibility; you can read and write netcdf3 using the netcdf4 library. For example, you can build Jeff Whitaker's http://code.google.com/p/netcdf4-python/ with all the hdf5 bells and whistles, and it will still happily read and, upon request, write netcdf3 files. Eric > > Cheers, > From faltet at pytables.org Mon May 25 15:17:50 2009 From: faltet at pytables.org (Francesc Alted) Date: Mon, 25 May 2009 21:17:50 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? 
In-Reply-To: <4A1AE385.60908@hawaii.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <200905252005.17149.faltet@pytables.org> <4A1AE385.60908@hawaii.edu> Message-ID: <200905252117.50796.faltet@pytables.org> A Monday 25 May 2009 20:29:25 Eric Firing escrigu?: > Francesc Alted wrote: > > A Monday 25 May 2009 19:55:28 Eric Firing escrigu?: > >>>> If the aim is to come up with a method of saving numpy arrays that > >>>> uses a standard protocol and does not introduce large dependencies, > >>>> then could this be accomplished using netcdf instead of hdf5, > >>>> specifically Roberto De Almeida's pupynere, which is already in > >>>> scipy.io as netcdf.py? Or does hdf5 have essential characteristics > >>>> for this purpose that netcdf lacks? > > > > After looking a bit at the code of pupynere, there is the next line: > > > > assert magic == 'CDF', "Error: %s is not a valid NetCDF 3 file" % > > self.filename > > > > So, the current version of pupynere is definitely for version 3 of > > NetCDF, not version 4. > > Yes, and I presume it will stay that way--which is fine for the question > I am asking above. I should have said "netcdf3" explicitly. Its > simplicity compared to hdf5 and netcdf4 is potentially a virtue. > > The question is, is it *too* simple for the intended purpose? I don't think the question is whether a format would be too simple or not, but rather about file compatibility. In that sense HDF5 is emerging as a standard de facto, and many tools are acquiring the capability to read/write this format (e.g. Matlab, IDL, Octave, Mathematica, R, NetCDF4-based apps and many others). Having this interchange capability is what should be seen as desirable, IMO. > > >>> >From what I understand, netCFD is based on on HDF5, at least as of the > >>> > >>> version 4 release. > >> > >> Netcdf4 is indeed built on hdf5, but netcdf3 is not, and netcdf3 format > >> is likely to stick around for a *very* long time. The netcdf4 library > >> is backwards-compatible with netcdf3. > > > > NetCDF4 is backwards-compatible with NetCDF3 just at API level, not the > > file format. NetCDF3 has a much more simple format, and completely > > different from NetCDF4, which is based on HDF5. > > Yes, but the netcdf4 *library* includes full netcdf3 compatibility; you > can read and write netcdf3 using the netcdf4 library. For example, you > can build Jeff Whitaker's http://code.google.com/p/netcdf4-python/ with > all the hdf5 bells and whistles, and it will still happily read and, > upon request, write netcdf3 files. Again, I think that the issue is compatibility with other tools, not just between NetCDF3/NetCDF4 worlds. -- Francesc Alted From jh at physics.ucf.edu Mon May 25 15:40:09 2009 From: jh at physics.ucf.edu (Joe Harrington) Date: Mon, 25 May 2009 15:40:09 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: On Mon, 25 May 2009 13:51:38 -0400, josef.pktd at gmail.com wrote: > On Mon, May 25, 2009 at 11:50 AM, Joe Harrington wrote: > > On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: > >> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: > >> > I hate to ask for another function in numpy, but there's an obvious > >> > one missing in the financial group: xirr. ?It could be done as a new > >> > function or as an extension to the existing np.irr. 
> >> > > >> > The internal rate of return (np.irr) is defined as the growth rate > >> > that would give you a zero balance at the end of a period of > >> > investment given a series of cash flows into or out of the investment > >> > at regular intervals (the first and last cash flows are usually an > >> > initial deposit and a withdrawal of the current balance). > >> > > >> > This is useful in academics, but if you're tracking a real investment, > >> > you don't just withdraw or add money on a perfectly annual basis, nor > >> > do you want a calc with thousands of days of zero entries just so you > >> > can handle the uneven intervals by evening them out. ?Both excel and > >> > openoffice define a "xirr" function that pairs each cash flow with a > >> > date. ?Would there be an objection to either a xirr or adding an > >> > optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes > >> > the code is a different question, but that part isn't hard. > >> > > >> > >> > >> > >> 3 comments: > >> > >> * open office has also the other function in an x??? version, so it > >> might be good to add it consistently to all functions > >> > >> * date type: scikits.timeseries and the gsoc for implementing a date > >> type would be useful to have a clear date type, or would you want to > >> base it only on python standard library > >> > >> * real life accuracy: given that there are large differences in the > >> definition of a year for financial calculations, any simple > >> implementation would be only approximately accurate. for example in > >> the open office help, oddlyield list the following option > >> > >> Basis is chosen from a list of options and indicates how the year is > >> to be calculated. > >> Basis Calculation > >> 0 or missing US method (NASD), 12 months of 30 days each > >> 1 Exact number of days in months, exact number of days in year > >> 2 Exact number of days in month, year has 360 days > >> 3 Exact number of days in month, year has 365 days > >> 4 European method, 12 months of 30 days each > >> > >> So, my question: what's the purpose of the financial function in numpy? > >> Currently it provides convenient functions for (approximate) interest > >> calculations. > >> If they get expanded to a "serious" implementation of, for example, > >> the main financial functions listed in the open office help (just for > >> reference) then maybe numpy is not the right location for it. > >> > >> I started to do something similar in matlab, and once I tried to use > >> real dates instead of just counting months, the accounting rules get > >> quickly very messy. > >> > >> Using dates as you propose would be very convenient, but the users > >> shouldn't be surprised that their actual payments at the end of the > >> year don't fully match up with what numpy told them. > >> > >> my 3cents > >> > >> Josef > > > > First point: agreed. ?I wish this community had a design review > > process for numpy and scipy, so that these things could get properly > > hashed out, and not just one person (even Travis) suggesting something > > and everyone else saying yeah-sure-whatever. > > > > Does anyone on the list have the financial background to suggest what > > functions "should" be included in a basic set of financial routines? > > xirr is the only one I've ever used in a spreadsheet, myself. > > > > Other points: Yuk. ?You're right. > > > > When these first came up for discussion, I had a Han Solo moment > > ("I've got a baaad feeling about this...") but I couldn't put my > > finger on why. 
?They seemed like simple and limited functions with > > high utility. ?Certainly anything as open-ended as financial-industry > > rules should go elsewhere (scikits, scipy, monpy, whatever). > > > > But, that doesn't prevent a user-supplied, floating-point time array > > from going into a function in numpy. ?The rate of return would be in > > units of that array. ?Functions that convert date/time in some format > > (or many) and following some rule (or one of many) to such a floating > > array can still go elsewhere, maintained by people who know the > > definitions, if they have interest (pun intended). ?That would make > > the functions in numpy much more useful without bloating them or > > making them a maintenance nightmare. > > > > If you think of time just as a regularly spaced, e.g. days, but with > sparse points on it, or as a continuous variable, then extending the > current functions should be relatively easy. I guess the only > questions are compounding, annual, quarterly or at each payment, and > whether the annual rate is calculated as real compounded annualized > rate or as accounting annual rate, e.g. quarterlyrate*4. > > This leaves "What is the present value, if you get 100 Dollars at the > 10th day of each month (or at the next working day if the 10th day is > a holiday or a weekend) for the next 5 years and the monthly interest > rate is 5/12%?" for another day. > > Initially I understood you wanted the date as a string or date type as > in e.g open office. What would be the units of the user-supplied, > floating-point time array? > It is still necessary to know the time units to provide an annualized > rate, unless the rate is in continuous time, exp(r*t). I don't know > whether this would apply to all functions in numpy.finance, it's a > while since I looked at the code. Maybe there are some standard > simplifications in open office or excel. > > I briefly skimmed the list of function in the open office help, and it > would be useful to have them available, e.g. as a package in scipy. > But my google searches in the past for applications in finance with a > compatible license didn't provide much useful code that could form the > basis of a finance package. > > Adding more convenience and functionality to numpy.finance is useful, > but if they get extended with slow feature creep, then another > location (scipy) might be more appropriate and would be more > expandable, even if it happens only slowly. > > That's just my opinion (obviously), I'm a relative newbie to > numpy/scipy and still working my way through all the different > subpackages. np.irr is defined on (anonymous) constant time intervals and gives you the growth per time interval. The code is very short, basically a call to np.roots(values): def irr(values): """ Return the Internal Rate of Return (IRR). This is the rate of return that gives a net present value of 0.0. Parameters ---------- values : array_like, shape(N,) Input cash flows per time period. At least the first value would be negative to represent the investment in the project. Returns ------- out : float Internal Rate of Return for periodic input values. 
Examples -------- >>> np.irr([-100, 39, 59, 55, 20]) 0.2809484211599611 """ res = np.roots(values[::-1]) # Find the root(s) between 0 and 1 mask = (res.imag == 0) & (res.real > 0) & (res.real <= 1) res = res[mask].real if res.size == 0: return np.nan rate = 1.0/res - 1 if rate.size == 1: rate = rate.item() return rate So, I think this is a continuous definition of growth, not some periodic compounding. I'd propose the time array would be in anonymous units, and the result would be in terms of those units. For example, if an interval of 1.0 in the time array were one fortnight, it would give interest in units of continuous growth per fortnight, etc. Anything with many more options than that does not belong in numpy (but it would be interesting to have elsewhere). --jh-- From jsseabold at gmail.com Mon May 25 16:27:49 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Mon, 25 May 2009 16:27:49 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: On Mon, May 25, 2009 at 3:40 PM, Joe Harrington wrote: > On Mon, 25 May 2009 13:51:38 -0400, josef.pktd at gmail.com wrote: >> On Mon, May 25, 2009 at 11:50 AM, Joe Harrington wrote: >> > On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: >> >> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: >> >> > I hate to ask for another function in numpy, but there's an obvious >> >> > one missing in the financial group: xirr. ?It could be done as a new >> >> > function or as an extension to the existing np.irr. >> >> > >> >> > The internal rate of return (np.irr) is defined as the growth rate >> >> > that would give you a zero balance at the end of a period of >> >> > investment given a series of cash flows into or out of the investment >> >> > at regular intervals (the first and last cash flows are usually an >> >> > initial deposit and a withdrawal of the current balance). >> >> > >> >> > This is useful in academics, but if you're tracking a real investment, >> >> > you don't just withdraw or add money on a perfectly annual basis, nor >> >> > do you want a calc with thousands of days of zero entries just so you >> >> > can handle the uneven intervals by evening them out. ?Both excel and >> >> > openoffice define a "xirr" function that pairs each cash flow with a >> >> > date. ?Would there be an objection to either a xirr or adding an >> >> > optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes >> >> > the code is a different question, but that part isn't hard. >> >> > >> >> >> >> >> >> >> >> 3 comments: >> >> >> >> * open office has also the other function in an x??? version, so it >> >> might be good to add it consistently to all functions >> >> >> >> * date type: scikits.timeseries and the gsoc for implementing a date >> >> type would be useful to have a clear date type, or would you want to >> >> base it only on python standard library >> >> >> >> * real life accuracy: given that there are large differences in the >> >> definition of a year for financial calculations, any simple >> >> implementation would be only approximately accurate. for example in >> >> the open office help, oddlyield list the following option >> >> >> >> Basis is chosen from a list of options and indicates how the year is >> >> to be calculated. 
>> >> Basis Calculation >> >> 0 or missing US method (NASD), 12 months of 30 days each >> >> 1 Exact number of days in months, exact number of days in year >> >> 2 Exact number of days in month, year has 360 days >> >> 3 Exact number of days in month, year has 365 days >> >> 4 European method, 12 months of 30 days each >> >> >> >> So, my question: what's the purpose of the financial function in numpy? >> >> Currently it provides convenient functions for (approximate) interest >> >> calculations. >> >> If they get expanded to a "serious" implementation of, for example, >> >> the main financial functions listed in the open office help (just for >> >> reference) then maybe numpy is not the right location for it. >> >> >> >> I started to do something similar in matlab, and once I tried to use >> >> real dates instead of just counting months, the accounting rules get >> >> quickly very messy. >> >> >> >> Using dates as you propose would be very convenient, but the users >> >> shouldn't be surprised that their actual payments at the end of the >> >> year don't fully match up with what numpy told them. >> >> >> >> my 3cents >> >> >> >> Josef >> > >> > First point: agreed. ?I wish this community had a design review >> > process for numpy and scipy, so that these things could get properly >> > hashed out, and not just one person (even Travis) suggesting something >> > and everyone else saying yeah-sure-whatever. >> > >> > Does anyone on the list have the financial background to suggest what >> > functions "should" be included in a basic set of financial routines? >> > xirr is the only one I've ever used in a spreadsheet, myself. >> > >> > Other points: Yuk. ?You're right. >> > >> > When these first came up for discussion, I had a Han Solo moment >> > ("I've got a baaad feeling about this...") but I couldn't put my >> > finger on why. ?They seemed like simple and limited functions with >> > high utility. ?Certainly anything as open-ended as financial-industry >> > rules should go elsewhere (scikits, scipy, monpy, whatever). >> > >> > But, that doesn't prevent a user-supplied, floating-point time array >> > from going into a function in numpy. ?The rate of return would be in >> > units of that array. ?Functions that convert date/time in some format >> > (or many) and following some rule (or one of many) to such a floating >> > array can still go elsewhere, maintained by people who know the >> > definitions, if they have interest (pun intended). ?That would make >> > the functions in numpy much more useful without bloating them or >> > making them a maintenance nightmare. >> > >> >> If you think of time just as a regularly spaced, e.g. days, but with >> sparse points on it, or as a continuous variable, then extending the >> current functions should be relatively easy. I guess the only >> questions are compounding, annual, quarterly or at each payment, and >> whether the annual rate is calculated as real compounded annualized >> rate or as accounting annual rate, e.g. quarterlyrate*4. >> >> This leaves "What is the present value, if you get 100 Dollars at the >> 10th day of each month (or at the next working day if the 10th day is >> a holiday or a weekend) for the next 5 years and the monthly interest >> rate is 5/12%?" ? for another day. >> >> Initially I understood you wanted the date as a string or date type as >> in e.g open office. What would be the units of the user-supplied, >> floating-point time array? 
>> It is still necessary to know the time units to provide an annualized >> rate, unless the rate is in continuous time, exp(r*t). I don't know >> whether this would apply to all functions in numpy.finance, it's a >> while since I looked at the code. Maybe there are some standard >> simplifications in open office or excel. >> >> I briefly skimmed the list of function in the open office help, and it >> would be useful to have them available, e.g. as a package in scipy. >> But my google searches in the past for applications in finance with a >> compatible license didn't provide much useful code that could form the >> basis of a finance package. >> >> Adding more convenience and functionality to numpy.finance is useful, >> but if they get extended with slow feature creep, then another >> location (scipy) might be more appropriate and would be more >> expandable, even if it happens only slowly. >> >> That's just my opinion (obviously), I'm a relative newbie to >> numpy/scipy and still working my way through all the different >> subpackages. > > np.irr is defined on (anonymous) constant time intervals and gives you > the growth per time interval. ?The code is very short, basically a > call to np.roots(values): > > def irr(values): > ? ?""" > ? ?Return the Internal Rate of Return (IRR). > > ? ?This is the rate of return that gives a net present value of 0.0. > > ? ?Parameters > ? ?---------- > ? ?values : array_like, shape(N,) > ? ? ? ?Input cash flows per time period. ?At least the first value would be > ? ? ? ?negative to represent the investment in the project. > > ? ?Returns > ? ?------- > ? ?out : float > ? ? ? ?Internal Rate of Return for periodic input values. > > ? ?Examples > ? ?-------- > ? ?>>> np.irr([-100, 39, 59, 55, 20]) > ? ?0.2809484211599611 > > ? ?""" > ? ?res = np.roots(values[::-1]) > ? ?# Find the root(s) between 0 and 1 > ? ?mask = (res.imag == 0) & (res.real > 0) & (res.real <= 1) > ? ?res = res[mask].real > ? ?if res.size == 0: > ? ? ? ?return np.nan > ? ?rate = 1.0/res - 1 > ? ?if rate.size == 1: > ? ? ? ?rate = rate.item() > ? ?return rate > > So, I think this is a continuous definition of growth, not some > periodic compounding. > > I'd propose the time array would be in anonymous units, and the result > would be in terms of those units. ?For example, if an interval of 1.0 > in the time array were one fortnight, it would give interest in units > of continuous growth per fortnight, etc. ?Anything with many more > options than that does not belong in numpy (but it would be > interesting to have elsewhere). > Here is my stab at xirr. It depends on the python datetime module and the Newton - Raphson algorithm in scipy.optimize, but it could be taken as a starting point if someone wants to get rid of the dependencies (I haven't worked too much with dates or NR before). The reference for the open office version is here , and it performs in exactly the same way (assumes 365 days a year). It also doesn't take a 'begin' or 'end' argument for when the payments are made. but this is already in the numpy.financial and could be added easily. 
def _discf(rate, pmts, dates): import numpy as np dcf=[] for i,cf in enumerate(pmts): d=dates[i]-dates[0] dcf.append(cf*(1+rate)**(-d.days/365.)) return np.add.reduce(dcf) def xirr(pmts, dates, guess=.10): ''' IRR function that accepts irregularly spaced cash flows Parameters ---------- values: array_like Contains the cash flows including the initial investment dates: array_like Contains the dates of payments as in the form (year, month, day) Returns: Float Internal Rate of Return Notes ---------- In general the xirr is the solution to .. math:: \sum_{t=0}^M{\frac{v_t}{(1+xirr)^{(date_t-date_0)/365}}} = 0 Examples -------------- dates=[[2008,2,5],[2008,7,5],[2009,1,5]] pmts=[-2750,1000,2000] print xirr(pmts,dates) ''' from datetime import date from scipy.optimize import newton for i,dt in enumerate(dates): dates[i]=date(*dt) f = lambda x: _discf(x, pmts, dates) return newton(f, guess) if __name__=="__main__": dates=[[2008,2,5],[2008,7,5],[2009,1,5]] pmts=[-2750,1000,2000] print xirr(pmts,dates) Cheers, Skipper From josef.pktd at gmail.com Mon May 25 18:29:55 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 18:29:55 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> On Mon, May 25, 2009 at 4:27 PM, Skipper Seabold wrote: > On Mon, May 25, 2009 at 3:40 PM, Joe Harrington wrote: >> On Mon, 25 May 2009 13:51:38 -0400, josef.pktd at gmail.com wrote: >>> On Mon, May 25, 2009 at 11:50 AM, Joe Harrington wrote: >>> > On Sun, 24 May 2009 18:14:42 -0400 josef.pktd at gmail.com wrote: >>> >> On Sun, May 24, 2009 at 4:33 PM, Joe Harrington wrote: >>> >> > I hate to ask for another function in numpy, but there's an obvious >>> >> > one missing in the financial group: xirr. ?It could be done as a new >>> >> > function or as an extension to the existing np.irr. >>> >> > >>> >> > The internal rate of return (np.irr) is defined as the growth rate >>> >> > that would give you a zero balance at the end of a period of >>> >> > investment given a series of cash flows into or out of the investment >>> >> > at regular intervals (the first and last cash flows are usually an >>> >> > initial deposit and a withdrawal of the current balance). >>> >> > >>> >> > This is useful in academics, but if you're tracking a real investment, >>> >> > you don't just withdraw or add money on a perfectly annual basis, nor >>> >> > do you want a calc with thousands of days of zero entries just so you >>> >> > can handle the uneven intervals by evening them out. ?Both excel and >>> >> > openoffice define a "xirr" function that pairs each cash flow with a >>> >> > date. ?Would there be an objection to either a xirr or adding an >>> >> > optional second arg (or a keyword arg) to np.irr in numpy? ?Who writes >>> >> > the code is a different question, but that part isn't hard. >>> >> > >>> >> >>> >> >>> >> >>> >> 3 comments: >>> >> >>> >> * open office has also the other function in an x??? version, so it >>> >> might be good to add it consistently to all functions >>> >> >>> >> * date type: scikits.timeseries and the gsoc for implementing a date >>> >> type would be useful to have a clear date type, or would you want to >>> >> base it only on python standard library >>> >> >>> >> * real life accuracy: given that there are large differences in the >>> >> definition of a year for financial calculations, any simple >>> >> implementation would be only approximately accurate. 
for example in >>> >> the open office help, oddlyield list the following option >>> >> >>> >> Basis is chosen from a list of options and indicates how the year is >>> >> to be calculated. >>> >> Basis Calculation >>> >> 0 or missing US method (NASD), 12 months of 30 days each >>> >> 1 Exact number of days in months, exact number of days in year >>> >> 2 Exact number of days in month, year has 360 days >>> >> 3 Exact number of days in month, year has 365 days >>> >> 4 European method, 12 months of 30 days each >>> >> >>> >> So, my question: what's the purpose of the financial function in numpy? >>> >> Currently it provides convenient functions for (approximate) interest >>> >> calculations. >>> >> If they get expanded to a "serious" implementation of, for example, >>> >> the main financial functions listed in the open office help (just for >>> >> reference) then maybe numpy is not the right location for it. >>> >> >>> >> I started to do something similar in matlab, and once I tried to use >>> >> real dates instead of just counting months, the accounting rules get >>> >> quickly very messy. >>> >> >>> >> Using dates as you propose would be very convenient, but the users >>> >> shouldn't be surprised that their actual payments at the end of the >>> >> year don't fully match up with what numpy told them. >>> >> >>> >> my 3cents >>> >> >>> >> Josef >>> > >>> > First point: agreed. ?I wish this community had a design review >>> > process for numpy and scipy, so that these things could get properly >>> > hashed out, and not just one person (even Travis) suggesting something >>> > and everyone else saying yeah-sure-whatever. >>> > >>> > Does anyone on the list have the financial background to suggest what >>> > functions "should" be included in a basic set of financial routines? >>> > xirr is the only one I've ever used in a spreadsheet, myself. >>> > >>> > Other points: Yuk. ?You're right. >>> > >>> > When these first came up for discussion, I had a Han Solo moment >>> > ("I've got a baaad feeling about this...") but I couldn't put my >>> > finger on why. ?They seemed like simple and limited functions with >>> > high utility. ?Certainly anything as open-ended as financial-industry >>> > rules should go elsewhere (scikits, scipy, monpy, whatever). >>> > >>> > But, that doesn't prevent a user-supplied, floating-point time array >>> > from going into a function in numpy. ?The rate of return would be in >>> > units of that array. ?Functions that convert date/time in some format >>> > (or many) and following some rule (or one of many) to such a floating >>> > array can still go elsewhere, maintained by people who know the >>> > definitions, if they have interest (pun intended). ?That would make >>> > the functions in numpy much more useful without bloating them or >>> > making them a maintenance nightmare. >>> > >>> >>> If you think of time just as a regularly spaced, e.g. days, but with >>> sparse points on it, or as a continuous variable, then extending the >>> current functions should be relatively easy. I guess the only >>> questions are compounding, annual, quarterly or at each payment, and >>> whether the annual rate is calculated as real compounded annualized >>> rate or as accounting annual rate, e.g. quarterlyrate*4. >>> >>> This leaves "What is the present value, if you get 100 Dollars at the >>> 10th day of each month (or at the next working day if the 10th day is >>> a holiday or a weekend) for the next 5 years and the monthly interest >>> rate is 5/12%?" ? for another day. 
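(For a sense of scale, the arithmetic behind that present-value question is a one-liner once the weekend/holiday rule is dropped; the calendar handling is the part that gets messy. A rough sketch, assuming exactly 60 equal monthly periods, not part of the quoted message:)

import numpy as np

r = 0.05 / 12                      # "5/12 %" per month as a decimal
k = np.arange(1, 5 * 12 + 1)       # 60 monthly payments
pv = np.sum(100.0 / (1.0 + r)**k)
print(pv)                          # roughly 5300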
>>> >>> Initially I understood you wanted the date as a string or date type as >>> in e.g open office. What would be the units of the user-supplied, >>> floating-point time array? >>> It is still necessary to know the time units to provide an annualized >>> rate, unless the rate is in continuous time, exp(r*t). I don't know >>> whether this would apply to all functions in numpy.finance, it's a >>> while since I looked at the code. Maybe there are some standard >>> simplifications in open office or excel. >>> >>> I briefly skimmed the list of function in the open office help, and it >>> would be useful to have them available, e.g. as a package in scipy. >>> But my google searches in the past for applications in finance with a >>> compatible license didn't provide much useful code that could form the >>> basis of a finance package. >>> >>> Adding more convenience and functionality to numpy.finance is useful, >>> but if they get extended with slow feature creep, then another >>> location (scipy) might be more appropriate and would be more >>> expandable, even if it happens only slowly. >>> >>> That's just my opinion (obviously), I'm a relative newbie to >>> numpy/scipy and still working my way through all the different >>> subpackages. >> >> np.irr is defined on (anonymous) constant time intervals and gives you >> the growth per time interval. ?The code is very short, basically a >> call to np.roots(values): >> >> def irr(values): >> ? ?""" >> ? ?Return the Internal Rate of Return (IRR). >> >> ? ?This is the rate of return that gives a net present value of 0.0. >> >> ? ?Parameters >> ? ?---------- >> ? ?values : array_like, shape(N,) >> ? ? ? ?Input cash flows per time period. ?At least the first value would be >> ? ? ? ?negative to represent the investment in the project. >> >> ? ?Returns >> ? ?------- >> ? ?out : float >> ? ? ? ?Internal Rate of Return for periodic input values. >> >> ? ?Examples >> ? ?-------- >> ? ?>>> np.irr([-100, 39, 59, 55, 20]) >> ? ?0.2809484211599611 >> >> ? ?""" >> ? ?res = np.roots(values[::-1]) >> ? ?# Find the root(s) between 0 and 1 >> ? ?mask = (res.imag == 0) & (res.real > 0) & (res.real <= 1) >> ? ?res = res[mask].real >> ? ?if res.size == 0: >> ? ? ? ?return np.nan >> ? ?rate = 1.0/res - 1 >> ? ?if rate.size == 1: >> ? ? ? ?rate = rate.item() >> ? ?return rate >> >> So, I think this is a continuous definition of growth, not some >> periodic compounding. >> >> I'd propose the time array would be in anonymous units, and the result >> would be in terms of those units. ?For example, if an interval of 1.0 >> in the time array were one fortnight, it would give interest in units >> of continuous growth per fortnight, etc. ?Anything with many more >> options than that does not belong in numpy (but it would be >> interesting to have elsewhere). >> > > Here is my stab at xirr. ?It depends on the python datetime module and > the Newton - Raphson algorithm in scipy.optimize, but it could be > taken as a starting point if someone wants to get rid of the > dependencies (I haven't worked too much with dates or NR before). ?The > reference for the open office version is here > , > and it performs in exactly the same way (assumes 365 days a year). ?It > also doesn't take a 'begin' or 'end' argument for when the payments > are made. but this is already in the numpy.financial and could be > added easily. > > def _discf(rate, pmts, dates): > ? ?import numpy as np > ? ?dcf=[] > ? ?for i,cf in enumerate(pmts): > ? ? ? ?d=dates[i]-dates[0] > ? ? ? 
?dcf.append(cf*(1+rate)**(-d.days/365.)) > ? ?return np.add.reduce(dcf) > > def xirr(pmts, dates, guess=.10): > ? ?''' > ? ?IRR function that accepts irregularly spaced cash flows > > ? ?Parameters > ? ?---------- > ? ?values: array_like > ? ? ? ? ?Contains the cash flows including the initial investment > ? ?dates: array_like > ? ? ? ? ?Contains the dates of payments as in the form (year, month, day) > > ? ?Returns: Float > ? ? ? ? ?Internal Rate of Return > > ? ?Notes > ? ?---------- > ? ?In general the xirr is the solution to > > ? ?.. math:: \sum_{t=0}^M{\frac{v_t}{(1+xirr)^{(date_t-date_0)/365}}} = 0 > > > ? ?Examples > ? ?-------------- > ? ?dates=[[2008,2,5],[2008,7,5],[2009,1,5]] > ? ?pmts=[-2750,1000,2000] > ? ?print xirr(pmts,dates) > ? ?''' > ? ?from datetime import date > ? ?from scipy.optimize import newton > > ? ?for i,dt in enumerate(dates): > ? ? ? ?dates[i]=date(*dt) > > ? ?f = lambda x: _discf(x, pmts, dates) > > ? ?return newton(f, guess) > > if __name__=="__main__": > ? ?dates=[[2008,2,5],[2008,7,5],[2009,1,5]] > ? ?pmts=[-2750,1000,2000] > ? ?print xirr(pmts,dates) While I was still trying to think about the general problem, Skipper already implemented a solution. The advantage of Skippers implementation using actual dates instead of just an array of numbers is that it is possible to directly calculate the annual irr, since the time units are well specified. The only problem is the need for an equation solver in numpy. Just using a date tuple would remove the problem of string parsing, and it might be possible to extend it later to a date array. So, I think it would be possible to include Skippers solution, with some cleanup and testing, if an equation solver can be found or if np.roots can handle high order (sparse) polynomials. Below is my original message, which is based on the assumption of a date array that is just an array of numbers without any time units associated with it. Josef """ >From my reading of the current irr, you have compounding of the interest rate at the given time interval. So if your data is daily data, you would get a daily interest rate with daily compounded rates, which might not be the most interesting number that the user wants. For the iir function it would still be very easy for the user to convert the daily or monthly rate to the annualized rate, (1+r_d)**365 -1 (1+r_m)**12 -1 (?). For the implementation, would np.roots still work if you have 1000 days for example, or 360 months, or a few hundred fortnights? What would be the alternative in numpy for finding the root? equation solvers are in scipy. For arbitrary time units with possible large numbers, working with exp should be easier . In this case the exponent would be floats and not integers, so not a polynomial. I think in the continuous time version, we need to solve for r in sum(values*exp(-r*dates)) = 0 Can this be done in numpy? If dates are floats where the unit is one year, then this would give the continuously compounded annual rate, I think. Another property of the current function, that I just realized, is, that it doesn't allow for negative interest rates. This might not be a problem for the intended use, but if you look at real, i.e. inflation adjusted, interest rates then it happens often enough. Other options that might work if np.roots can handle it, would be to use integer time internally but fractional time from the user where the integer unit would be the reference period and the fractions would be for example 2/12 for the second month. 
I never tried this but using fractional units has a long enough tradition in finance. Or that the user optionally specifies the time units ("y" or "m" or "d") or number of periods per year (365, 12, 52, 26) """ From pgmdevlist at gmail.com Mon May 25 18:36:19 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 25 May 2009 18:36:19 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: Sorry to jump in a conversation I haven't followed too deep in details, but I'm sure you're all aware of the scikits.timeseries package by now. This should at least help you manage the dates operations in a straightforward manner. I think that could be a nice extension to the package: after all, half of the core developers is a financial analyst... From josef.pktd at gmail.com Mon May 25 19:02:09 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 19:02:09 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: <1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com> On Mon, May 25, 2009 at 6:36 PM, Pierre GM wrote: > Sorry to jump in a conversation I haven't followed too deep in > details, but I'm sure you're all aware of the scikits.timeseries > package by now. This should at least help you manage the dates > operations in a straightforward manner. I think that could be a nice > extension to the package: after all, half of the core developers is a > financial analyst... The problem is, if the functions are enhanced in the current numpy, then scikits.timeseries is not (yet) available. I agree that for any more extended finance package, the handling of "time"series (in calender time) should make use of scikits.timeseries (and possibly the new datetime array type.) Pierre, your not already hiding by chance any finance code in your timeseries scikit? :) Josef BTW: here are the formulas for the NotImplementedError http://wiki.services.openoffice.org/wiki/Documentation/How_Tos/Calc:_Derivation_of_Financial_Formulas#IMPT.2C_PPMT From josef.pktd at gmail.com Mon May 25 19:27:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 19:27:46 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: <1cd32cbb0905251627t2ea151ddye095df8c775bbee@mail.gmail.com> > The advantage of Skippers implementation using actual dates instead of > just an array of numbers is that it is possible to directly calculate > the annual irr, since the time units are well specified. The only > problem is the need for an equation solver in numpy. Just using a date > tuple would remove the problem of string parsing, and it might be > possible to extend it later to a date array. > > So, I think it would be possible to include Skippers solution, with > some cleanup and testing, if an equation solver can be found or if > np.roots can handle high order (sparse) polynomials. > I looked a bit more: the current implementation of ``rate`` uses it's own iterative (Newton) solver, and in a similar way this could be done for a more general xirr. 
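To make that concrete, a rough sketch of what a self-contained Newton iteration for an irregular-cashflow IRR could look like, along the lines of the iteration inside ``rate`` (the function name, the 365-day year and the convergence settings are illustrative assumptions, not a proposed API):

from datetime import date
import numpy as np

def xirr_newton(values, dates, guess=0.1, tol=1e-10, maxiter=100):
    # sketch only: dates are (year, month, day) tuples, times measured
    # in years of 365 days from the first cash flow
    values = np.asarray(values, dtype=float)
    d0 = date(*dates[0])
    t = np.array([(date(*d) - d0).days for d in dates]) / 365.0
    r = guess
    for _ in range(maxiter):
        f = np.sum(values / (1.0 + r)**t)                 # net present value
        df = np.sum(-t * values / (1.0 + r)**(t + 1.0))   # d(NPV)/d(rate)
        step = f / df
        r = r - step
        if abs(step) < tol:
            return r
    return np.nan   # no convergence

print(xirr_newton([-2750, 1000, 2000],
                  [(2008, 2, 5), (2008, 7, 5), (2009, 1, 5)]))   # about 0.12

Unlike the np.roots-based irr, nothing here restricts the result to positive rates, although Newton can of course fail if the starting guess is poor.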
So with a bit of work this doesn't seem to be a problem and the only question that remains is the specification of the dates. Josef From pgmdevlist at gmail.com Mon May 25 19:37:33 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 25 May 2009 19:37:33 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com> Message-ID: On May 25, 2009, at 7:02 PM, josef.pktd at gmail.com wrote: > On Mon, May 25, 2009 at 6:36 PM, Pierre GM > wrote: >> Sorry to jump in a conversation I haven't followed too deep in >> details, but I'm sure you're all aware of the scikits.timeseries >> package by now. This should at least help you manage the dates >> operations in a straightforward manner. I think that could be a nice >> extension to the package: after all, half of the core developers is a >> financial analyst... > > The problem is, if the functions are enhanced in the current numpy, > then scikits.timeseries is not (yet) available. Mmh, I'm not following you here... > > Pierre, your not already hiding by chance any finance code in your > timeseries scikit? :) Ah, you should ask Matt, he's the financial analyst, I'm the hydrologist... Would moving_funcs.mov_average_expw do something you'd find useful ? Anyhow, if the pb you have are just to specify dates, I really think you should give the scikits a try. And send feedback, of course... From josef.pktd at gmail.com Mon May 25 20:06:13 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 20:06:13 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com> Message-ID: <1cd32cbb0905251706j41f6be2eg21802c32412e6c1e@mail.gmail.com> On Mon, May 25, 2009 at 7:37 PM, Pierre GM wrote: > > On May 25, 2009, at 7:02 PM, josef.pktd at gmail.com wrote: > >> On Mon, May 25, 2009 at 6:36 PM, Pierre GM >> wrote: >>> Sorry to jump in a conversation I haven't followed too deep in >>> details, but I'm sure you're all aware of the scikits.timeseries >>> package by now. This should at least help you manage the dates >>> operations in a straightforward manner. I think that could be a nice >>> extension to the package: after all, half of the core developers is a >>> financial analyst... >> >> The problem is, if the functions are enhanced in the current numpy, >> then scikits.timeseries is not (yet) available. > > Mmh, I'm not following you here... The original question was how we can enhance numpy.financial, eg. np.irr So we are restricted to use only what is available in numpy and in standard python. > >> >> Pierre, your not already hiding by chance any finance code in your >> timeseries scikit? :) > > Ah, you should ask Matt, he's the financial analyst, I'm the > hydrologist... Would moving_funcs.mov_average_expw do something you'd > find useful ? I looked at your moving functions, autocorrelation function and so on a while ago. That's were I learned how to use np.correlate or the scipy versions of it, and the filter functions. I've written the standard array versions for the moving functions and acf, ccf, in one of my experiments. 
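For readers following along, the kind of "plain ndarray" helpers being referred to can be quite small; a rough sketch (illustrative only, not the scikits.timeseries code nor the experimental code mentioned above):

import numpy as np

def mov_average(x, n):
    # trailing moving average; returns len(x) - n + 1 values
    return np.convolve(x, np.ones(n) / n, mode='valid')

def acf(x, maxlag):
    # sample autocorrelation for lags 0..maxlag of a 1-d array
    x = np.asarray(x, dtype=float) - np.mean(x)
    c = np.correlate(x, x, mode='full')[len(x) - 1:]
    return c[:maxlag + 1] / c[0]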
If Skipper has enough time in his google summer of code, we would like to include some basic timeseries econometrics (ARMA, VAR, ...?) however most likely only for regularly spaced data. > Anyhow, if the pb you have are just to specify dates, I really think > you should give the scikits a try. And send feedback, of course... Skipper intends to write some examples to show how to work with the extensions to scipy.stats, which, I think, will include examples using time series, besides recarrays, and other array types. Is there a time line for including the timeseries scikits in numpy/scipy? With code that is intended for incorporation in numpy/scipy, we are restricted in our external dependencies. Josef From pgmdevlist at gmail.com Mon May 25 20:30:10 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Mon, 25 May 2009 20:30:10 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251706j41f6be2eg21802c32412e6c1e@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com> <1cd32cbb0905251706j41f6be2eg21802c32412e6c1e@mail.gmail.com> Message-ID: <2B446CC8-7960-4274-8A0E-B4E69D8C5A59@gmail.com> On May 25, 2009, at 8:06 PM, josef.pktd at gmail.com wrote: >>> >>> The problem is, if the functions are enhanced in the current numpy, >>> then scikits.timeseries is not (yet) available. >> >> Mmh, I'm not following you here... > > The original question was how we can enhance numpy.financial, eg. > np.irr > So we are restricted to use only what is available in numpy and in > standard python. Ah OK. But it seems that you're now running into a pb w/ dates handling, which might be a bit too specialized for numpy. Anyway, the call isn't mine. >> > I looked at your moving functions, autocorrelation function and so on > a while ago. That's were I learned how to use np.correlate or the > scipy versions of it, and the filter functions. I've written the > standard array versions for the moving functions and acf, ccf, in one > of my experiments. The moving functions were written in C and they work even w/ timeseries (they work quite OK w/ pure MaskedArraysP. We put them in scikits.timeseries because it was easier to have them there than in scipy, for example. > If Skipper has enough time in his google summer of code, we would like > to include some basic timeseries econometrics (ARMA, VAR, ...?) > however most likely only for regularly spaced data. Well, we can easily restrict the functions to the case were there's no missing data nor missing dates. Checking the mask is easy, and we have a method to chek the dates (is_valid) >> Anyhow, if the pb you have are just to specify dates, I really think >> you should give the scikits a try. And send feedback, of course... > > Skipper intends to write some examples to show how to work with the > extensions to scipy.stats, which, I think, will include examples using > time series, besides recarrays, and other array types. Dealing with TimeSeries is pretty much the same thing as dealing with MaskedArray, with the extra convenience of converting from one frequency to another and so forth.... Quite often, an analysis can be performed by dropping the .dates part, working on the .series part (the underlying MaskedArray), and repatching the dates at the end... > > Is there a time line for including the timeseries scikits in numpy/ > scipy? 
> With code that is intended for incorporation in numpy/scipy, we are
> restricted in our external dependencies.

I can't tell, because the decision is not mine. For what I understood,
there could be an inclusion in scipy if there's a need for it. For
that, we need more users and more feedback.... If you catch my drift...

> Josef
>

From josef.pktd at gmail.com Mon May 25 20:55:04 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 25 May 2009 20:55:04 -0400
Subject: [Numpy-discussion] add xirr to numpy financial functions?
In-Reply-To: <1cd32cbb0905251627t2ea151ddye095df8c775bbee@mail.gmail.com>
References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com>
	<1cd32cbb0905251627t2ea151ddye095df8c775bbee@mail.gmail.com>
Message-ID: <1cd32cbb0905251755w41239410he4dd8053e898c8ea@mail.gmail.com>

On Mon, May 25, 2009 at 7:27 PM, wrote:
>> The advantage of Skippers implementation using actual dates instead of
>> just an array of numbers is that it is possible to directly calculate
>> the annual irr, since the time units are well specified. The only
>> problem is the need for an equation solver in numpy. Just using a date
>> tuple would remove the problem of string parsing, and it might be
>> possible to extend it later to a date array.
>>
>> So, I think it would be possible to include Skippers solution, with
>> some cleanup and testing, if an equation solver can be found or if
>> np.roots can handle high order (sparse) polynomials.
>>
>
> I looked a bit more: the current implementation of ``rate`` uses it's
> own iterative (Newton) solver, and in a similar way this could be done
> for a more general xirr.
>
> So with a bit of work this doesn't seem to be a problem and the only
> question that remains is the specification of the dates.

Here is a solver using the polynomial class, or is there something
like this already in numpy

Josef

'''
Newton solver for value of a polynomial equal to zero
works also for negative rate of return
'''
import numpy as np

nper = 30 #Number of periods
freq = 5 #frequency of payment
val = np.zeros(nper)
val[1:nper+1:freq] = 1 # periodic payment
val[0]=-4 # initial investment

p = np.poly1d(val[::-1])
#print p.roots # very slow for array with 1000 periods
pd1 = np.polyder(p)
#print p(0.95) # net present value
#print pd1(0.95) # derivative of polynomial

rv = np.linspace(0.9,1.05,16)
for v,i in zip(rv, p(rv)):print v,i
for v,i in zip(rv, pd1(rv)):print v,i

# Newton iteration
r = 0.95 # starting value, find polynomial root in neighborhood
for i in range(10):
    r = r - p(r)/pd1(r)
    print r, p(r)

print 'interest rate irr is', 1/r - 1

From josef.pktd at gmail.com Mon May 25 21:06:48 2009
From: josef.pktd at gmail.com (josef.pktd at gmail.com)
Date: Mon, 25 May 2009 21:06:48 -0400
Subject: [Numpy-discussion] add xirr to numpy financial functions?
In-Reply-To: <2B446CC8-7960-4274-8A0E-B4E69D8C5A59@gmail.com>
References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com>
	<1cd32cbb0905251602n1f8bfff3l71a8d2e4bd470769@mail.gmail.com>
	<1cd32cbb0905251706j41f6be2eg21802c32412e6c1e@mail.gmail.com>
	<2B446CC8-7960-4274-8A0E-B4E69D8C5A59@gmail.com>
Message-ID: <1cd32cbb0905251806p6b208ac9xbee39478eb0dca67@mail.gmail.com>

On Mon, May 25, 2009 at 8:30 PM, Pierre GM wrote:
>
> On May 25, 2009, at 8:06 PM, josef.pktd at gmail.com wrote:
>>>>
>>>> The problem is, if the functions are enhanced in the current numpy,
>>>> then scikits.timeseries is not (yet) available.
>>>
>>> Mmh, I'm not following you here...
>> >> The original question was how we can enhance numpy.financial, eg. >> np.irr >> So we are restricted to use only what is available in numpy and in >> standard python. > > Ah OK. But it seems that you're now running into a pb w/ dates > handling, which might be a bit too specialized for numpy. Anyway, the > call isn't mine. > >>> >> I looked at your moving functions, autocorrelation function and so on >> a while ago. That's were I learned how to use np.correlate or the >> scipy versions of it, and the filter functions. I've written the >> standard array versions for the moving functions and acf, ccf, in one >> of my experiments. > > The moving functions were written in C and they work even w/ > timeseries (they work quite OK w/ pure MaskedArraysP. We put them in > scikits.timeseries because it was easier to have them there than in > scipy, for example. > > >> If Skipper has enough time in his google summer of code, we would like >> to include some basic timeseries econometrics (ARMA, VAR, ...?) >> however most likely only for regularly spaced data. > > Well, we can easily restrict the functions to the case were there's no > missing data nor missing dates. Checking the mask is easy, and we have > a method to chek the dates (is_valid) > > >>> Anyhow, if the pb you have are just to specify dates, I really think >>> you should give the scikits a try. And send feedback, of course... >> >> Skipper intends to write some examples to show how to work with the >> extensions to scipy.stats, which, I think, will include examples using >> time series, besides recarrays, and other array types. > > > Dealing with TimeSeries is pretty much the same thing as dealing with > MaskedArray, with the extra convenience of converting from one > frequency to another and so forth.... Quite often, an analysis can be > performed by dropping the .dates part, ?working on the .series part > (the underlying MaskedArray), and repatching the dates at the end... > > >> >> Is there a time line for including the timeseries scikits in numpy/ >> scipy? >> With code that is intended for incorporation in numpy/scipy, we are >> restricted in our external dependencies. > > I can't tell, because the decision is not mine. For what I understood, > there could be an inclusion in scipy if there's a need for it. For > that, we need more users end more feedback.... If you catch my drift... Thanks for the info, we will keep this in mind. Personally, I still think of data just as an array or matrix of numbers, when they still have dates and units attached to them, they are usually a pain. And I'm only slowly getting used to the possibility that it doesn't necessarily need to be so painful. (I didn't know you moved the moving functions to C, I thought I saw them in python.) Josef From mattknox.ca at gmail.com Mon May 25 21:18:25 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 26 May 2009 01:18:25 +0000 (UTC) Subject: [Numpy-discussion] add xirr to numpy financial functions? References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: forgive me for jumping in on this thread and playing devil's advocate here, but I am a natural pessimist so please bear with me :) ... I think as this discussion has already demonstrated, it is *extremely* difficult to build a solid general purpose API for financial functions (even seemingly simple ones like an IRR calculation) because of the endless amount of possible permutations and interpretations. 
I think it would be a big mistake to add more financial functions to numpy directly without having them mature independently in a separate (scikits) package first. It is virtually guaranteed that you won't get the API right on the first try and adding the functions to numpy locks you into an API commitment because numpy is supposed to be a stable package with certain guarantees for backwards compatibility. And as for a more fully featured finance/quant module in Python... someone has already mentioned the C++ library, QuantLib - which I use extensively at work - and I think any serious effort to improve Python's capabilities in this area would be best spent on building a good Python/numpy interface to QuantLib rather than reimplementing its very substantial functionality (which is probably an impossible task realistically). - Matt From david at ar.media.kyoto-u.ac.jp Mon May 25 21:11:56 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 26 May 2009 10:11:56 +0900 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: References: <4A16920E.8060805@indiana.edu> <200905221433.18551.faltet@pytables.org> <4A1A7A13.6080405@indiana.edu> Message-ID: <4A1B41DC.4060006@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Mon, May 25, 2009 at 4:59 AM, Andrew Friedley > wrote: > > For some reason the list seems to occasionally drop my messages... > > Francesc Alted wrote: > > A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: > >> I'm the student doing the project. I have a blog here, which > contains > >> some initial performance numbers for a couple test ufuncs I did: > >> > >> http://numcorepy.blogspot.com > > >> Another alternative we've talked about, and I (more and more > likely) may > >> look into is composing multiple operations together into a > single ufunc. > >> Again the main idea being that memory accesses can be > reduced/eliminated. > > > > IMHO, composing multiple operations together is the most > promising venue for > > leveraging current multicore systems. > > Agreed -- our concern when considering for the project was to keep the > scope reasonable so I can complete it in the GSoC timeframe. If I > have > time I'll definitely be looking into this over the summer; if not > later. > > > Another interesting approach is to implement costly operations > (from the point > > of view of CPU resources), namely, transcendental functions like > sin, cos or > > tan, but also others like sqrt or pow) in a parallel way. If > besides, you can > > combine this with vectorized versions of them (by using the well > spread SSE2 > > instruction set, see [1] for an example), then you would be able > to achieve > > really good results for sure (at least Intel did with its VML > library ;) > > > > [1] http://gruntthepeon.free.fr/ssemath/ > > I've seen that page before. Using another source [1] I came up with a > quick/dirty cos ufunc. Performance is crazy good compared to NumPy > (100x); see the latest post on my blog for a little more info. I'll > look at the source myself when I get time again, but is NumPy using a > Python-based cos function, a C implementation, or something else? > As I > wrote in my blog, the performance gain is almost too good to believe. > > > Numpy uses the C library version. If long double and float aren't > available the double version is used with number conversions, but that > shouldn't give a factor of 100x. Something else is going on. 
I think something is wrong with the measurement method - on my machine, computing the cos of an array of double takes roughly ~400 cycles/item for arrays with a reasonable size (> 1e3 items). Taking 4 cycles/item for cos would be very impressive :) David From josef.pktd at gmail.com Mon May 25 22:00:46 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Mon, 25 May 2009 22:00:46 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: <1cd32cbb0905251900g4fe889c7mec35a2f30cf4536@mail.gmail.com> On Mon, May 25, 2009 at 9:18 PM, Matt Knox wrote: > forgive me for jumping in on this thread and playing devil's advocate here, but > I am a natural pessimist so please bear with me :) ... It's good to hear from a real finance person. > > I think as this discussion has already demonstrated, it is *extremely* > difficult to build a solid general purpose API for financial functions (even > seemingly simple ones like an IRR calculation) because of the endless amount > of possible permutations and interpretations. I think it would be a big > mistake to add more financial functions to numpy directly without having them > mature independently in a separate (scikits) package first. It is virtually > guaranteed that you won't get the API right on the first try and adding the > functions to numpy locks you into an API commitment because numpy is supposed > to be a stable package with certain guarantees for backwards compatibility. > > And as for a more fully featured finance/quant module in Python... someone has > already mentioned the C++ library, QuantLib - which I use extensively at work > - and I think any serious effort to improve Python's capabilities in this area > would be best spent on building a good Python/numpy interface to QuantLib > rather than reimplementing its very substantial functionality (which is > probably an impossible task realistically). > Quantlib might be good for heavy duty work, but when I looked at their code, I wouldn't know where to start if I want to rewrite any algorithm. My benchmark is more scripting with matlab, where maybe some pieces are readily available, but where the code needs also to be strongly adjusted, or we want to implement a new method or prototype for one. I hadn't tried very hard, but I didn't manage to get Boost and quantlib correctly compiled with the python bindings with MingW. So, while python won't get any "industrial strength" finance package, a more modest "designer package" would be feasible, if there were any interest in it (which I haven't seen). It is similar with statistics, there is no way to achieve the same coverage of statistics as R for example, but still I find in many different python packages many of the basic statistics functions are implemented, without running immediately to R, not to mention the multitude (and multiplicity) of available machine learning packages in python. The other group of python packages cover very specialized requirement of the statistical analysis, as for example the neuroimaging groups. The even more modest question is whether we would want to match open office in it's finance part. These are pretty different use cases from those use cases where you have quantlib all set up and running. (I also saw a book announcement for Finance with Python, I don't remember the exact title.) 
Josef From mattknox.ca at gmail.com Mon May 25 23:15:41 2009 From: mattknox.ca at gmail.com (Matt Knox) Date: Tue, 26 May 2009 03:15:41 +0000 (UTC) Subject: [Numpy-discussion] add xirr to numpy financial functions? References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251900g4fe889c7mec35a2f30cf4536@mail.gmail.com> Message-ID: gmail.com> writes: > So, while python won't get any "industrial strength" finance package, > a more modest "designer package" would be feasible, if there were any > interest in it (which I haven't seen). > > ... > > The even more modest question is whether we would want to match open > office in it's finance part. > > These are pretty different use cases from those use cases where you > have quantlib all set up and running. > As you have hinted, the scope of what will/should be covered with numpy financial functions needs to be defined better before putting more such functions into numpy. If that scope turns out to be something comparable to what excel or openoffice offers, that's fine, but I think a maturation period outside the numpy core (in the form of a scikit or otherwise) would be still be a good idea to avoid getting stuck with a poorly thought out API. As for my personal feelings on how much financial functionality numpy/scipy should offer... I would agree that QuantLib-like functionality is far beyond what numpy can/should try to achieve. More basic functionality like OpenOffice or Excel probably seems about right. Although maybe it is more appropriate for scipy than numpy. - Matt From ferrell at diablotech.com Mon May 25 23:29:02 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 25 May 2009 21:29:02 -0600 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> Message-ID: <7F532E89-6260-4FB3-8D95-AA69849865D4@diablotech.com> I haven't read all the messages in detail, and I'm a consumer not a producer, but I'll comment anyways. I'd love to see additional "financial" functionality, but I'd like to see them in a scikit, not in numpy. I think to be useful they are too complicated to go into numpy. A couple of my many reasons: 1. Doing a precise, bang-up job with dates is paramount to any interesting implementation of many financial functions. I've found timeseries to be a great package - there are some things I'd like to see, but overall it is at the foundation of all of my financial analysis. Any moderately interesting extension of the current capabilities would rapidly end up trying to duplicate much of the timeseries functionality, IMO. Rather than partially re-implement the wheel in numpy, as a consumer I'd like to see financial stuff built on a common basis, and timeseries would be a great start. 2. I've read enough of this discussion to hear a requirement for both good date handling and capable solvers - just for xirr. To do a really interesting job on an interesting amount of capability requires even more dependencies, I think. Although it might be tempting to include a few more "lightweight" financial functions in numpy, I doubt they will be that useful. Most of the lightweight ones are easy enough to whip up when you need them. Also, an approximation that's good today isn't the right one tomorrow - only the really robust stuff seems to survive the test of time, in my limited experience. 
A start on a really solid scikits financial package would be awesome, though. A few months ago, when the open source software for pricing CDS's was released (http://www.cdsmodel.com/information/cds-model) I took a look and noticed that it had a ton of code for dealing with dates. (I also didn't see any tests in the code. I wonder what that means. Scary for anybody that might want to modify it.) I thought if I had an extra 100 hours in every day it would be fun to re-write that code in numpy/scipy and release it. -r From ferrell at diablotech.com Mon May 25 23:33:13 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Mon, 25 May 2009 21:33:13 -0600 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251900g4fe889c7mec35a2f30cf4536@mail.gmail.com> Message-ID: <59184CA1-45AC-4B38-96DF-09311608C13A@diablotech.com> On May 25, 2009, at 9:15 PM, Matt Knox wrote: > gmail.com> writes: > >> So, while python won't get any "industrial strength" finance package, >> a more modest "designer package" would be feasible, if there were any >> interest in it (which I haven't seen). >> >> ... >> >> The even more modest question is whether we would want to match open >> office in it's finance part. >> >> These are pretty different use cases from those use cases where you >> have quantlib all set up and running. >> > > As you have hinted, the scope of what will/should be covered with > numpy > financial functions needs to be defined better before putting more > such > functions into numpy. If that scope turns out to be something > comparable to > what excel or openoffice offers, that's fine, but I think a > maturation period > outside the numpy core (in the form of a scikit or otherwise) would > be still > be a good idea to avoid getting stuck with a poorly thought out API. +1 for a maturation period outside the numpy core. > > > As for my personal feelings on how much financial functionality > numpy/scipy > should offer... I would agree that QuantLib-like functionality is > far beyond > what numpy can/should try to achieve. More basic functionality like > OpenOffice > or Excel probably seems about right. Although maybe it is more > appropriate for > scipy than numpy. +1 for something outside numpy. Even OpenOffice or Excel financial capability might, perhaps, go into scipy, but why not have it optional? -r From josef.pktd at gmail.com Tue May 26 00:40:01 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 May 2009 00:40:01 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <7F532E89-6260-4FB3-8D95-AA69849865D4@diablotech.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <7F532E89-6260-4FB3-8D95-AA69849865D4@diablotech.com> Message-ID: <1cd32cbb0905252140u23364c9dh7869ca4e07c92e17@mail.gmail.com> On Mon, May 25, 2009 at 11:29 PM, Robert Ferrell wrote: > I haven't read all the messages in detail, and I'm a consumer not a > producer, but I'll comment anyways. > > I'd love to see additional "financial" functionality, but I'd like to > see them in a scikit, not in numpy. ?I think to be useful they are too > complicated to go into numpy. ?A couple of my many reasons: > > 1. Doing a precise, bang-up job with dates is paramount to any > interesting implementation of many financial functions. 
?I've found > timeseries to be a great package - there are some things I'd like to > see, but overall it is at the foundation of all of my financial > analysis. ?Any moderately interesting extension of the current > capabilities would rapidly end up trying to duplicate much of the > timeseries functionality, IMO. ?Rather than partially re-implement the > wheel in numpy, as a consumer I'd like to see financial stuff built on > a common basis, and timeseries would be a great start. > > 2. I've read enough of this discussion to hear a requirement for both > good date handling and capable solvers - just for xirr. ?To do a > really interesting job on an interesting amount of capability requires > even more dependencies, I think. > > Although it might be tempting to include a few more "lightweight" > financial functions in numpy, I doubt they will be that useful. ?Most > of the lightweight ones are easy enough to whip up when you need > them. ?Also, an approximation that's good today isn't the right one > tomorrow - only the really robust stuff seems to survive the test of > time, in my limited experience. ?A start on a really solid scikits > financial package would be awesome, though. > > A few months ago, when the open source software for pricing CDS's was > released (http://www.cdsmodel.com/information/cds-model) I took a look > and noticed that it had a ton of code for dealing with dates. ?(I also > didn't see any tests in the code. ?I wonder what that means. ?Scary > for anybody that might want to modify it.) ?I thought if I had an > extra 100 hours in every day it would be fun to re-write that code in > numpy/scipy and release it. > I was looking at mortgage backed securities before the current crisis hit, and I realized that when I use real dates and real payment schedules then taking actual accounting rules into account, my work and code size would strongly increase. Since it was a semi-theoretic application, sticking to months and ignoring actual calender dates was a useful simplification. As Matt argued it is not possible (or maybe just unrealistic) to write a full finance package in python from scratch. As far as I understand, for example the time series scikits cannot handle business holidays. So some simplification will be necessary. But, I agree, that even for an "approximate" finance package, handling dates and timeseries without a corresponding array type will soon get very tedious or duplicative. One additional advantage of a scikits, besides more freedom for dependencies, would be that models can be incrementally added as contributers find time and interest, and gain more experience with the API and the appropriate abstraction, and to collect hacked up scripts before they get a common structure and implementation. If the only crucial dependency is the time series package, it could go possibly into scipy together with the time series scikits. Also targeting scipy, makes a lot of code available, e.g. the problem with the solver and including statistics. "A sparrow in the hand is better than a pigeon on the roof." (German Proverb) On the other hand, I have seen many plans on the mailing list for great new packages or extensions to existing packages without many results. So maybe an incremental inclusion of the functions and API of open office, excel or similar, now, instead of hoping for a "real" finance package is the more realistic approach, especially, because I haven't found any source where we could "steal" wholesale. 
(for example http://www.cdsmodel.com/information/cds-model doesn't look compatible with BSD) Josef From jh at physics.ucf.edu Tue May 26 00:59:23 2009 From: jh at physics.ucf.edu (Joe Harrington) Date: Tue, 26 May 2009 00:59:23 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: (numpy-discussion-request@scipy.org) References: Message-ID: Let's keep this thread focussed on the original issue: just add a floating array of times to irr or a new xirr continuous interest no more Anyone can use the timeseries package to produce a floating array of times from normal dates, if those are the dates they want. If they want some specialized financial date, they may want a different conversion, however. All we should provide in NumPy would be the simplest tool. Specialized dates and date-time conversion belong elsewhere. If we're *not* skipping dates, there is no need for xirr, just use irr, which exists. scikits.financial seems like a great idea, and then knock yourselves out for date conversions and definitions of compounding. Just think big and design it first. But let's keep this thread on the simple question for NumPy. --jh-- From Chris.Barker at noaa.gov Tue May 26 01:11:21 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 25 May 2009 22:11:21 -0700 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <200905221000.56593.faltet@pytables.org> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> Message-ID: <4A1B79F9.8010203@noaa.gov> David Warde-Farley wrote: > As Robert's design document for the NPY format says, one option would > be to implement a minimal subset of the HDF5 protocol *from scratch* That would be really cool -- I wonder how hard it would be to implement just the current NPY features? Judging from this: http://www.hdfgroup.org/HDF5/doc/H5.format.html It's far from trivial! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From robert.kern at gmail.com Tue May 26 01:12:33 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 May 2009 00:12:33 -0500 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <3d375d730905252212q6df32cddy66e5dad9a1ccae8c@mail.gmail.com> On Mon, May 25, 2009 at 23:59, Joe Harrington wrote: > Let's keep this thread focussed on the original issue: > > just add a floating array of times to irr or a new xirr > continuous interest > no more > > Anyone can use the timeseries package to produce a floating array of > times from normal dates, if those are the dates they want. ?If they > want some specialized financial date, they may want a different > conversion, however. ?All we should provide in NumPy would be the > simplest tool. ?Specialized dates and date-time conversion belong > elsewhere. > > If we're *not* skipping dates, there is no need for xirr, just use > irr, which exists. > > scikits.financial seems like a great idea, and then knock yourselves > out for date conversions and definitions of compounding. ?Just think > big and design it first. ?But let's keep this thread on the simple > question for NumPy. Then let's just say "No" and move on. 
I see no compelling reason to extend numpy's financial capabilities (of course, I spoke against their original addition in the first place, so take that as you will). Handling this by asking, "here are the constraints for numpy; what can we shoehorn in there?" is the wrong approach. Figure out what you want to achieve, then figure out what you need to solve the problem best. I don't think that including xirr in numpy, with its constraints, serves the problem best. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue May 26 01:15:12 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 May 2009 00:15:12 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4A1B79F9.8010203@noaa.gov> References: <000e0cd2a07010d73d046a71a597@google.com> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <4A1B79F9.8010203@noaa.gov> Message-ID: <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> On Tue, May 26, 2009 at 00:11, Christopher Barker wrote: > David Warde-Farley wrote: >> As Robert's design document for the NPY format says, one option would >> be to implement a minimal subset of the HDF5 protocol *from scratch* > > That would be really cool -- I wonder how hard it would be to implement > just the current NPY features? Judging from this: > > http://www.hdfgroup.org/HDF5/doc/H5.format.html > > It's far from trivial! Yes. That's why I wrote the NPY format instead. I *did* do some due diligence before I designed a new binary format. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From jsseabold at gmail.com Tue May 26 01:20:58 2009 From: jsseabold at gmail.com (Skipper Seabold) Date: Tue, 26 May 2009 01:20:58 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <3d375d730905252212q6df32cddy66e5dad9a1ccae8c@mail.gmail.com> References: <3d375d730905252212q6df32cddy66e5dad9a1ccae8c@mail.gmail.com> Message-ID: On Tue, May 26, 2009 at 1:12 AM, Robert Kern wrote: > On Mon, May 25, 2009 at 23:59, Joe Harrington wrote: >> Let's keep this thread focussed on the original issue: >> >> just add a floating array of times to irr or a new xirr >> continuous interest >> no more >> >> Anyone can use the timeseries package to produce a floating array of >> times from normal dates, if those are the dates they want. ?If they >> want some specialized financial date, they may want a different >> conversion, however. ?All we should provide in NumPy would be the >> simplest tool. ?Specialized dates and date-time conversion belong >> elsewhere. >> >> If we're *not* skipping dates, there is no need for xirr, just use >> irr, which exists. >> >> scikits.financial seems like a great idea, and then knock yourselves >> out for date conversions and definitions of compounding. ?Just think >> big and design it first. ?But let's keep this thread on the simple >> question for NumPy. > > Then let's just say "No" and move on. I see no compelling reason to > extend numpy's financial capabilities (of course, I spoke against > their original addition in the first place, so take that as you will). 
> Handling this by asking, "here are the constraints for numpy; what can > we shoehorn in there?" is the wrong approach. Figure out what you want > to achieve, then figure out what you need to solve the problem best. I > don't think that including xirr in numpy, with its constraints, serves > the problem best. > My only question then would be why have numpy.financials in the first place? I was pretty surprised to find it. Maybe it should be in scipy.financials, so it can take advantage of the solvers? There's already the three line Newton method implementation that can only be used for rates(), which did seem "shoehorned" to me already. I changed mine to rely on a few lines of the Newton secant method (that could be general enough for rates() ) and to work without any internal date dependencies, but I too get the sense that it shouldn't be there, but these type of "expected" spreadsheet-like functions (with a set API and accepted usage behavior) seem to be of interest to some. Not trying to beat a dead horse. Just a thought, because I do have some interest in expanding these kinds of functions wherever they could end up. -Skipper From Chris.Barker at noaa.gov Tue May 26 01:23:14 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 25 May 2009 22:23:14 -0700 Subject: [Numpy-discussion] Inconsistent error messages. In-Reply-To: References: <9457e7c80905230249k39cd9965q784b80b5afa0f7dc@mail.gmail.com> <4A18563C.3010705@noaa.gov> Message-ID: <4A1B7CC2.3060904@noaa.gov> Charles R Harris wrote: > I don't like the idea of a warning here either. How about adding a > keyword 'strict' so that strict=1 means an error is raised if the count > isn't reached, and strict=0 means any count is acceptable? I'd prefer a more meaningful name than "strict" -- you'd have absolutely no idea what that meant without reading the docs -- maybe allow_partial_read? (and please, True and False, rather than 1 or 0) I also STRONGLY prefer that it be default to raise an exception. I am convinced that a Warning is NOT the right way to handle this. While it does appear that a warning can be caught if need be (though it's not totally clear whether the "raise this every time" option works..), it's not default behavior, and it's not a well know feature. This discussion makes that absolutely clear. I may or may not be representative, but I am no python newbie, and I had no idea before this discussion how one would handle a warning in this case. As far as I can see, there are only two possibilities: 1) The writer of the code has anticipated that not all the items requested might be read in, and has written the code to handle that case. In this case s/he would use strict=False 2) The code does not handle that case, either because writer of the code did not anticipate that it would ever happen, or because it really would be a failure. In that case, an Exception is the only correct result -- anything else is inviting hidden bugs. I completely fail to see the logic on a warning here, and I have never seen warnings used it this way. I also don't see what the objection to an exception is if there is a flag that can turn it off. St?fan van der Walt wrote: > The reason I much prefer a warning is that you always get data back, > whether things went wrong or not. If you throw an error, then you > can't get hold of the last read blocks at all. > > I guess a strict flag is OK exactly. > but why, if you've got a warning in > place? 
Warnings are easy to catch (and this can be documented in > fromfile's docstring): so are exceptions -- and even more so, so are flags. > warnings.simplefilter('error', np.lib.IOWarning) a pain, and no well known. > In Python 2.6 you can use "catch_warnings": much better, but still not used much. The key issue here is that a lot of folks WILL NOT HAVE THOUGHT about what they want to do if not all the items are read -- in that case, it is an error. If they have thought about it, they can turn off the strict (or whatever) flag. St?fan van der Walt wrote: > Warnings are a great way of telling the user that a non-fatal problem > cropped up. but this is only a non-fatal problem if the code has planned for it -- if not it is likely to be fatal. > Maybe we should provide tools in NumPy to handle warnings more easily? > Something like > > with no_warnings: > np.fromfile('x') nice, but only if this becomes a standard, common paradigm that is widely known -- IT IS NOT NOW. It would be interesting to survey how warnings are used in python these days -- most of the ones I've seen are deprecation warnings -- that's the kind of thing they should be used for -- you want people to know, but it doesn't indicate anything even possibly fatal at this point. Sorry to be so strong in my opinion here, but I know this is something I'm going to screw up given the chance. I really want that exception! -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From robert.kern at gmail.com Tue May 26 01:22:24 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 May 2009 00:22:24 -0500 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: <3d375d730905252212q6df32cddy66e5dad9a1ccae8c@mail.gmail.com> Message-ID: <3d375d730905252222nae68c3fl647bf42c6b4fa93a@mail.gmail.com> On Tue, May 26, 2009 at 00:20, Skipper Seabold wrote: > My only question then would be why have numpy.financials in the first > place? You can go through the old threads for the arguments. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From charlesr.harris at gmail.com Tue May 26 01:38:26 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Mon, 25 May 2009 23:38:26 -0600 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <1cd32cbb0905251755w41239410he4dd8053e898c8ea@mail.gmail.com> References: <1cd32cbb0905251529y4b042216he726dcbf2982d29d@mail.gmail.com> <1cd32cbb0905251627t2ea151ddye095df8c775bbee@mail.gmail.com> <1cd32cbb0905251755w41239410he4dd8053e898c8ea@mail.gmail.com> Message-ID: On Mon, May 25, 2009 at 6:55 PM, wrote: > On Mon, May 25, 2009 at 7:27 PM, wrote: > >> The advantage of Skippers implementation using actual dates instead of > >> just an array of numbers is that it is possible to directly calculate > >> the annual irr, since the time units are well specified. The only > >> problem is the need for an equation solver in numpy. Just using a date > >> tuple would remove the problem of string parsing, and it might be > >> possible to extend it later to a date array. > >> > >> So, I think it would be possible to include Skippers solution, with > >> some cleanup and testing, if an equation solver can be found or if > >> np.roots can handle high order (sparse) polynomials. 
> >> > > > > I looked a bit more: the current implementation of ``rate`` uses it's > > own iterative (Newton) solver, and in a similar way this could be done > > for a more general xirr. > > > > So with a bit of work this doesn't seem to be a problem and the only > > question that remains is the specification of the dates. > > > Here is a solver using the polynomial class, or is there something > like this already in numpy > No. But I think numpy might be a good place for one of the simple 1D solvers. The Brent one would be a good choice as it includes bisection as a fallback strategy. Simple bisection might also be worth adding. The current location of these solvers in scipy.optimize is somewhat obscure and they are the sort of function that gets used often. They don't really fit if we stick to an "arrays only" straight jacket in numpy, but polynomials and financial functions seem to me even further from the core. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Tue May 26 01:50:00 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 25 May 2009 22:50:00 -0700 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <4A1B79F9.8010203@noaa.gov> <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> Message-ID: <4A1B8308.4050604@noaa.gov> Robert Kern wrote: > Yes. That's why I wrote the NPY format instead. I *did* do some due > diligence before I designed a new binary format. I assumed so, and I also assume you took a look at netcdf3, but since it's been brought up here, I take it it dint fit the bill? Even if it did, while it will be around for a LONG time, it is an out-of-date format. -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From dwf at cs.toronto.edu Tue May 26 02:02:43 2009 From: dwf at cs.toronto.edu (David Warde-Farley) Date: Tue, 26 May 2009 02:02:43 -0400 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <4A1B79F9.8010203@noaa.gov> <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> Message-ID: <610EC657-E59D-4D8B-9F78-2EB6E63E814E@cs.toronto.edu> On 26-May-09, at 1:15 AM, Robert Kern wrote: > I *did* do some due diligence before I designed a new binary format. Uh oh, I feel this might've taken a sharp turn towards another "of course Robert is right, Robert is always right" threads. :) David From Chris.Barker at noaa.gov Tue May 26 02:30:09 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Mon, 25 May 2009 23:30:09 -0700 Subject: [Numpy-discussion] parsing text strings/files in fromfile, fromstring In-Reply-To: References: Message-ID: <4A1B8C71.2000206@noaa.gov> Charles R Harris wrote: > I am trying to put together some rule for parsing text strings/files in > fromfile, fromstring so that the two are consistent. Thanks for giving these some attention -- they've needed it for a while! 
> 1) When the string/file is empty fromfile returns and empty array, split > returns an empty string, I think the behavior of split() is irrelevant here -- fromstring/file is about reading numbers from text -- while split()- is very helpful for that, it's not what it's specifically for. > and fromstring converts the empty string to a > default value. Which should we use? they should NEVER return a number when there isn't one in the source. > 2) When the string/file contains only a single separator > fromfile/fromstring both return a single value, while split returns two > empty strings. Which should we use? neither -- see above. > My preferences would be to return empty arrays whenever the string/file > is empty, but I don't feel strongly about that. yup. > Also, wouldn't a missing value be better interpreted as nan than zero in > the float case? yes, but since I don't think missing values should be returned at all, it doesn't matter. I do think the more interesting case might be a csv or tab-delimited file with a line like: 34, 5, 4.6, , , 45, 32 In this case, I suppose it is clear that this is a row in a table that is supposed to be 7 items long. With floats, it would be pretty rational to put NaNs in there, but without an equivalent for integers, I'd say go with an error. Two other options: 1) a "missing_value" keyword -- the user explicitly says what they want put in for missing values. 2) return a masked array -- also as a keyword option. Masked arrays are supposed to be the numpy way to express missing values. and yes, fromfile( a_file ) should always return the same thing as fromstring( a_file.read() ) Pauli Virtanen wrote: > a) fromstring("1,2,x,4", sep=",") -> [1,2] > fromstring("1,2,x,4", sep=",", strict=True) -> ValueError > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError > > b) fromstring("1,2,x,4", sep=",") -> [1,2] > fromstring("1,2,x,4", sep=",", strict=True) -> ValueError > fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5, strict=True) -> ValueError > > c) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning > fromstring("1,2,x,4", sep=",", count=5) -> [1,2] + SomeWarning > > d) fromstring("1,2,x,4", sep=",") -> [1,2] + SomeWarning > fromstring("1,2,x,4", sep=",", default=3) -> [1,2,3,4] > fromstring("1,2,x,4", sep=",", default=3, count=5) -> [1,2,3,4] + SomeWarning > > e) fromstring("1,2,x,4", sep=",") -> ValueError > fromstring("1,2,x,4", sep=",", strict=False) -> [1,2] > fromstring("1,2,x,4", sep=",", count=5) -> ValueError > fromstring("1,2,x,4", sep=",", count=5, strict=False) -> [1,2] (c) and (d) are out, as I don't think Warnings are the right thing here (see my earlier rant). I don't like (a) and (b), as I think "strict" (with a better name...) should be True be default. What I want is (b) with a different default, which would be (e) with a "default" (or, maybe "missing"). Those seem to have defined "strict" in two ways: both number of elements, and what to do with non-numerical input, I wonder if those should be merged? Also, I wonder if setting a "missing" should work for any non-numerical entires, or only empty space? 
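For concreteness, here is a minimal pure-Python sketch of the whitespace-only reading -- illustration only, using a hypothetical ``missing=`` keyword, not proposed implementation code. An empty field between separators takes the default, while any other non-numeric text still raises:

import numpy as np

def fromstring_sketch(s, sep=',', dtype=float, missing=None):
    # Illustration: empty fields use `missing`; anything else that
    # fails to parse raises ValueError.
    out = []
    for field in s.split(sep):
        field = field.strip()
        if not field:
            if missing is None:
                raise ValueError("empty field and no missing= given")
            out.append(missing)
        else:
            out.append(dtype(field))   # dtype('x') raises ValueError
    return np.array(out, dtype=dtype)

# fromstring_sketch("1,2, ,4", missing=3)  ->  array([ 1.,  2.,  3.,  4.])
# fromstring_sketch("1,2,x,4", missing=3)  ->  ValueError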
I think I'd go with: f) fromstring("1,2,x,4", sep=",") -> [1,2] fromstring("1,2,x,4", sep=",", count=4) -> ValueError fromstring("1,2,3,4", sep=",", count=5) -> ValueError fromstring("1,2,3,4", sep=",", count=5, strict=False) -> [1,2,3,4] fromstring("1,2, ,4", sep=",", missing=3) -> [1,2,3,4] fromstring("1,2,x,4", sep=",", missing=3) -> ValueError I THINK we can break it down into these distinct questions: (1) What should be returned if there is a non-number between separators and there is no default value specified? a) ValueError b) a default value (2) If a default value was specified: a) the default value b) if it is whitespace: the default else: ValueError (3) What should be returned if EOF is reached before count is reached? (a) a warning (b) just the numbers read so far (c) if strict: an exception else: just the numbers read so far (4) Should any non-numeric text behave the same as EOF when count is not specified? (a) yes (b) no (5) what should "strict" default to? (a) True (b) False (6) Should \n be interpreted as a sep along with the specified sep? (a) yes (b) no [OK, I added that one as my pet desire...) I vote: (1) a (2) b (3) c (4) b (5) a (6) a Does that cover it? > and binary data implied by sep='' would be interpreted in the same > way it would if first converted to comma-separated text. Only with regard to less than count numbers read -- I don't think any of the rest applies -- though I'm still for splitting binary and text file reading anyway. > I'd vote for (e) if the slate was clean, but since it's not: I think the slate is clean enough, given that the current implementation is buggy. While you are digging into this code, we did have a discussion a while back, captured in this ticket: http://projects.scipy.org/numpy/ticket/909 Any chance you could address any of that, too? -Chris -- Christopher Barker, Ph.D. Oceanographer NOAA/OR&R/HAZMAT (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception From faltet at pytables.org Tue May 26 02:32:10 2009 From: faltet at pytables.org (Francesc Alted) Date: Tue, 26 May 2009 08:32:10 +0200 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <610EC657-E59D-4D8B-9F78-2EB6E63E814E@cs.toronto.edu> References: <000e0cd2a07010d73d046a71a597@google.com> <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> <610EC657-E59D-4D8B-9F78-2EB6E63E814E@cs.toronto.edu> Message-ID: <200905260832.10338.faltet@pytables.org> A Tuesday 26 May 2009 08:02:43 David Warde-Farley escrigu?: > On 26-May-09, at 1:15 AM, Robert Kern wrote: > > I *did* do some due diligence before I designed a new binary format. > > Uh oh, I feel this might've taken a sharp turn towards another "of > course Robert is right, Robert is always right" threads. :) Agreed :) -- Francesc Alted From brennan.williams at visualreservoir.com Tue May 26 02:47:25 2009 From: brennan.williams at visualreservoir.com (Brennan Williams) Date: Tue, 26 May 2009 18:47:25 +1200 Subject: [Numpy-discussion] CUDA Message-ID: <4A1B907D.8040807@visualreservoir.com> Not a question really but just for discussion/pie-in-the-sky etc.... This is a news item on vizworld about getting Matlab code to run on a CUDA enabled GPU. http://www.vizworld.com/2009/05/cuda-enable-matlab-with-gpumat/ If the use of GPU's for numerical tasks takes off (has it already?) then Id be interested to know the views of the numpy experts out there. 
Cheers Brennan From david at ar.media.kyoto-u.ac.jp Tue May 26 02:35:28 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 26 May 2009 15:35:28 +0900 Subject: [Numpy-discussion] CUDA In-Reply-To: <4A1B907D.8040807@visualreservoir.com> References: <4A1B907D.8040807@visualreservoir.com> Message-ID: <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> Brennan Williams wrote: > Not a question really but just for discussion/pie-in-the-sky etc.... > > This is a news item on vizworld about getting Matlab code to run on a > CUDA enabled GPU. > > http://www.vizworld.com/2009/05/cuda-enable-matlab-with-gpumat/ > There is this which looks similar for numpy: http://kered.org/blog/2009-04-13/easy-python-numpy-cuda-cublas/ I have never used it, just saw it mentioned somewhere, cheers, David From faltet at pytables.org Tue May 26 02:56:25 2009 From: faltet at pytables.org (Francesc Alted) Date: Tue, 26 May 2009 08:56:25 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A1B41DC.4060006@ar.media.kyoto-u.ac.jp> References: <4A1B41DC.4060006@ar.media.kyoto-u.ac.jp> Message-ID: <200905260856.26001.faltet@pytables.org> A Tuesday 26 May 2009 03:11:56 David Cournapeau escrigu?: > Charles R Harris wrote: > > On Mon, May 25, 2009 at 4:59 AM, Andrew Friedley > > wrote: > > > > For some reason the list seems to occasionally drop my messages... > > > > Francesc Alted wrote: > > > A Friday 22 May 2009 13:52:46 Andrew Friedley escrigu?: > > >> I'm the student doing the project. I have a blog here, which > > > > contains > > > > >> some initial performance numbers for a couple test ufuncs I did: > > >> > > >> http://numcorepy.blogspot.com > > >> > > >> Another alternative we've talked about, and I (more and more > > > > likely) may > > > > >> look into is composing multiple operations together into a > > > > single ufunc. > > > > >> Again the main idea being that memory accesses can be > > > > reduced/eliminated. > > > > > IMHO, composing multiple operations together is the most > > > > promising venue for > > > > > leveraging current multicore systems. > > > > Agreed -- our concern when considering for the project was to keep > > the scope reasonable so I can complete it in the GSoC timeframe. If I > > have > > time I'll definitely be looking into this over the summer; if not > > later. > > > > > Another interesting approach is to implement costly operations > > > > (from the point > > > > > of view of CPU resources), namely, transcendental functions like > > > > sin, cos or > > > > > tan, but also others like sqrt or pow) in a parallel way. If > > > > besides, you can > > > > > combine this with vectorized versions of them (by using the well > > > > spread SSE2 > > > > > instruction set, see [1] for an example), then you would be able > > > > to achieve > > > > > really good results for sure (at least Intel did with its VML > > > > library ;) > > > > > [1] http://gruntthepeon.free.fr/ssemath/ > > > > I've seen that page before. Using another source [1] I came up with > > a quick/dirty cos ufunc. Performance is crazy good compared to NumPy > > (100x); see the latest post on my blog for a little more info. I'll look > > at the source myself when I get time again, but is NumPy using a > > Python-based cos function, a C implementation, or something else? As I > > wrote in my blog, the performance gain is almost too good to believe. > > > > > > Numpy uses the C library version. 
If long double and float aren't > > available the double version is used with number conversions, but that > > shouldn't give a factor of 100x. Something else is going on. > > I think something is wrong with the measurement method - on my machine, > computing the cos of an array of double takes roughly ~400 cycles/item > for arrays with a reasonable size (> 1e3 items). Taking 4 cycles/item > for cos would be very impressive :) Well, it is Andrew who should demonstrate that his measurement is correct, but in principle, 4 cycles/item *should* be feasible when using 8 cores in parallel. In [1] one can see how Intel achieves (with his VML kernel) to compute a cos() in less than 23 cycles in one single core. Having 8 cores in parallel would allow, in theory, reach 3 cycles/item. [1]http://www.intel.com/software/products/mkl/data/vml/functions/_performanceall.html -- Francesc Alted From david at ar.media.kyoto-u.ac.jp Tue May 26 02:58:52 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Tue, 26 May 2009 15:58:52 +0900 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <200905260856.26001.faltet@pytables.org> References: <4A1B41DC.4060006@ar.media.kyoto-u.ac.jp> <200905260856.26001.faltet@pytables.org> Message-ID: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> Francesc Alted wrote: > > Well, it is Andrew who should demonstrate that his measurement is correct, but > in principle, 4 cycles/item *should* be feasible when using 8 cores in > parallel. But the 100x speed increase is for one core only unless I misread the table. And I should have mentioned that 400 cycles/item for cos is on a pentium 4, which has dreadful performances (defective L1). On a much better core duo extreme something, I get 100 cycles / item (on a 64 bits machines, though, and not same compiler, although I guess the libm version is what matters the most here). And let's not forget that there is the python wrapping cost: by doing everything in C, I got ~ 200 cycle/cos on the PIV, and ~60 cycles/cos on the core 2 duo (for double), using the rdtsc performance counter. All this for 1024 items in the array, so very optimistic usecase (everything in cache 2 if not 1). This shows that python wrapping cost is not so high, making the 100x claim a bit doubtful without more details on the way to measure speed. cheers, David From olivier.grisel at ensta.org Tue May 26 03:31:24 2009 From: olivier.grisel at ensta.org (Olivier Grisel) Date: Tue, 26 May 2009 09:31:24 +0200 Subject: [Numpy-discussion] CUDA In-Reply-To: <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> References: <4A1B907D.8040807@visualreservoir.com> <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> Message-ID: Also note: nvidia is about to release the first implementation of an OpenCL runtime based on cuda. OpenCL is an open standard such as OpenGL but for numerical computing on stream platforms (GPUs, Cell BE, Larrabee, ...). -- Olivier On May 26, 2009 8:54 AM, "David Cournapeau" wrote: Brennan Williams wrote: > Not a question really but just for discussion/pie-in-the-sky etc.... > > T... There is this which looks similar for numpy: http://kered.org/blog/2009-04-13/easy-python-numpy-cuda-cublas/ I have never used it, just saw it mentioned somewhere, cheers, David _______________________________________________ Numpy-discussion mailing list Numpy-discussion at scipy... -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Nicolas.Rougier at loria.fr Tue May 26 03:55:58 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Tue, 26 May 2009 09:55:58 +0200 Subject: [Numpy-discussion] Segmentation fault on large arrays Message-ID: <1243324558.14931.14.camel@sulfur.loria.fr> Hello, I've come across what is probably a bug in size check for large arrays: >>> import numpy >>> z1 = numpy.zeros((255*256,256*256)) Traceback (most recent call last): File "", line 1, in ValueError: dimensions too large. >>> z2 = numpy.zeros((256*256,256*256)) >>> z2.shape (65536, 65536) >>> z2[0] = 0 Segmentation fault Note that z1 size is smaller than z2 but z2 is not told that its dimensions are too large. This has been tested with numpy 1.3.0. Nicolas From ndbecker2 at gmail.com Tue May 26 07:43:02 2009 From: ndbecker2 at gmail.com (Neal Becker) Date: Tue, 26 May 2009 07:43:02 -0400 Subject: [Numpy-discussion] CUDA References: <4A1B907D.8040807@visualreservoir.com> <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> Message-ID: Olivier Grisel wrote: > Also note: nvidia is about to release the first implementation of an > OpenCL runtime based on cuda. OpenCL is an open standard such as OpenGL > but for numerical computing on stream platforms (GPUs, Cell BE, Larrabee, > ...). > You might be interested in pycuda. From gael.varoquaux at normalesup.org Tue May 26 08:04:19 2009 From: gael.varoquaux at normalesup.org (Gael Varoquaux) Date: Tue, 26 May 2009 14:04:19 +0200 Subject: [Numpy-discussion] CUDA In-Reply-To: References: <4A1B907D.8040807@visualreservoir.com> <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> Message-ID: <20090526120419.GA6537@phare.normalesup.org> On Tue, May 26, 2009 at 07:43:02AM -0400, Neal Becker wrote: > Olivier Grisel wrote: > > Also note: nvidia is about to release the first implementation of an > > OpenCL runtime based on cuda. OpenCL is an open standard such as OpenGL > > but for numerical computing on stream platforms (GPUs, Cell BE, Larrabee, > > ...). > You might be interested in pycuda. I am sure Olivier knows about pycuda :). However, the big deal with OpenCL, compared to CUDA, is that it is an open standard. With CUDA, you are bound to nvidia's future policies. Ga?l From matthieu.brucher at gmail.com Tue May 26 08:08:32 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 26 May 2009 14:08:32 +0200 Subject: [Numpy-discussion] CUDA In-Reply-To: <20090526120419.GA6537@phare.normalesup.org> References: <4A1B907D.8040807@visualreservoir.com> <4A1B8DB0.3050309@ar.media.kyoto-u.ac.jp> <20090526120419.GA6537@phare.normalesup.org> Message-ID: 2009/5/26 Gael Varoquaux : > On Tue, May 26, 2009 at 07:43:02AM -0400, Neal Becker wrote: >> Olivier Grisel wrote: > >> > Also note: nvidia is about to release the first implementation of an >> > OpenCL runtime based on cuda. OpenCL is an open standard such as OpenGL >> > but for numerical computing on stream platforms (GPUs, Cell BE, Larrabee, >> > ...). > > >> You might be interested in pycuda. > > I am sure Olivier knows about pycuda :). However, the big deal with > OpenCL, compared to CUDA, is that it is an open standard. With CUDA, you > are bound to nvidia's future policies. > > Ga?l The issue with OpenCL is that there will be some extensions for each supported architecture, which means that the generic OpenCL will never be very fast or more exactly near the optimum. Matthieu -- Information System Engineer, Ph.D. 
Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From seb.binet at gmail.com Tue May 26 08:17:58 2009 From: seb.binet at gmail.com (Sebastien Binet) Date: Tue, 26 May 2009 14:17:58 +0200 Subject: [Numpy-discussion] CUDA In-Reply-To: References: <4A1B907D.8040807@visualreservoir.com> <20090526120419.GA6537@phare.normalesup.org> Message-ID: <200905261417.58631.binet@cern.ch> On Tuesday 26 May 2009 14:08:32 Matthieu Brucher wrote: > 2009/5/26 Gael Varoquaux : > > On Tue, May 26, 2009 at 07:43:02AM -0400, Neal Becker wrote: > >> Olivier Grisel wrote: > >> > Also note: nvidia is about to release the first implementation of an > >> > OpenCL runtime based on cuda. OpenCL is an open standard such as > >> > OpenGL but for numerical computing on stream platforms (GPUs, Cell BE, > >> > Larrabee, ...). > >> > >> You might be interested in pycuda. > > > > I am sure Olivier knows about pycuda :). However, the big deal with > > OpenCL, compared to CUDA, is that it is an open standard. With CUDA, you > > are bound to nvidia's future policies. > > > > Ga?l > > The issue with OpenCL is that there will be some extensions for each > supported architecture, which means that the generic OpenCL will never > be very fast or more exactly near the optimum. what's the difference w/ OpenGL ? i.e. isn't the job of the "underlying" library to provide the best algorithm- freakingly-optimized-bare-to-the-metal-whatever-opcode, hidden away from the user's face ? OpenCL is just an API (modeled after the CUDA one AFAICT) so implementers can use whatever trick they want, right ? my 2 euro-cents. cheers, sebastien. -- ######################################### # Dr. Sebastien Binet # Laboratoire de l'Accelerateur Lineaire # Universite Paris-Sud XI # Batiment 200 # 91898 Orsay ######################################### From matthieu.brucher at gmail.com Tue May 26 08:29:51 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Tue, 26 May 2009 14:29:51 +0200 Subject: [Numpy-discussion] CUDA In-Reply-To: <200905261417.58631.binet@cern.ch> References: <4A1B907D.8040807@visualreservoir.com> <20090526120419.GA6537@phare.normalesup.org> <200905261417.58631.binet@cern.ch> Message-ID: >> The issue with OpenCL is that there will be some extensions for each >> supported architecture, which means that the generic OpenCL will never >> be very fast or more exactly near the optimum. > > what's the difference w/ OpenGL ? > i.e. isn't the job of the "underlying" library to provide the best algorithm- > freakingly-optimized-bare-to-the-metal-whatever-opcode, hidden away from the > user's face ? It's like OpenGL: you have to fall back to more simple functions if you want to support every platform. If you target only one specific platform, you can use custom optimized functions. > OpenCL is just an API (modeled after the CUDA one AFAICT) so implementers can > use whatever trick they want, right ? Implementers can't know for instance how the data-domain must be split (1D, 2D, 3D, ... ? what if the underlying tool doesn't provide all of them?). OpenCL will have ways to tell that some data must be stored in the local or shared memory (for the GPU), ... There are some companies that provide ways to do this with pragmas ion C and Fortran (i.e. CAPS), but even if there are pragmas dedicated to CUDA, the generated code is not optimal. 
So I don't think it is reasonable to expect the implementers to provide in the common API the tools to make a really optimal code. You will have to use additional, manufacturer-related API, like what you do for state-of-the-art OpenGL. > my 2 euro-cents. my 2 euro-cents ;) Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From afriedle at indiana.edu Tue May 26 09:14:39 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Tue, 26 May 2009 09:14:39 -0400 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> References: <4A1B41DC.4060006@ar.media.kyoto-u.ac.jp> <200905260856.26001.faltet@pytables.org> <4A1B932C.30504@ar.media.kyoto-u.ac.jp> Message-ID: <4A1BEB3F.2020809@indiana.edu> David Cournapeau wrote: > Francesc Alted wrote: >> Well, it is Andrew who should demonstrate that his measurement is correct, but >> in principle, 4 cycles/item *should* be feasible when using 8 cores in >> parallel. > > But the 100x speed increase is for one core only unless I misread the > table. And I should have mentioned that 400 cycles/item for cos is on a > pentium 4, which has dreadful performances (defective L1). On a much > better core duo extreme something, I get 100 cycles / item (on a 64 bits > machines, though, and not same compiler, although I guess the libm > version is what matters the most here). > > And let's not forget that there is the python wrapping cost: by doing > everything in C, I got ~ 200 cycle/cos on the PIV, and ~60 cycles/cos on > the core 2 duo (for double), using the rdtsc performance counter. All > this for 1024 items in the array, so very optimistic usecase (everything > in cache 2 if not 1). > > This shows that python wrapping cost is not so high, making the 100x > claim a bit doubtful without more details on the way to measure speed. I appreciate all the discussion this is creating. I wish I could work on this more right now; I have a big paper deadline coming up June 1 that I need to focus on. Yes, you're reading the table right. I should have been more clear on what my implementation is doing. It's using SIMD, so performing 4 cosine's at a time where a libm cosine is only doing one. Also I don't think libm trancendentals are known for being fast; I'm also likely gaining performance by using a well-optimized but less accurate approximation. In fact a little more inspection shows my accuracy decreases as the input values increase; I will probably need to take a performance hit to fix this. I went and wrote code to use the libm fcos() routine instead of my cos code. Performance is equivalent to numpy, plus an overhead: inp sizes 1024 10240 102400 1024000 3072000 numpy 0.7282 9.6278 115.5976 993.5738 3017.3680 lmcos 1 0.7594 9.7579 116.7135 1039.5783 3156.8371 lmcos 2 0.5274 5.7885 61.8052 537.8451 1576.2057 lmcos 4 0.5172 5.1240 40.5018 313.2487 791.9730 corepy 1 0.0142 0.0880 0.9566 9.6162 28.4972 corepy 2 0.0342 0.0754 0.6991 6.1647 15.3545 corepy 4 0.0596 0.0963 0.5671 4.9499 13.8784 The times I show are in milliseconds; the system used is a dual-socket dual-core 2ghz opteron. I'm testing at the ufunc level, like this: def benchmark(fn, args): avgtime = 0 fn(*args) for i in xrange(7): t1 = time.time() fn(*args) t2 = time.time() tm = t2 - t1 avgtime += tm return avgtime / 7 Where fn is a ufunc, ie numpy.cos. 
So I prime the execution once, then do 7 timings and take the average. I always appreciate suggestions on better way to benchmark things. Andrew From aisaac at american.edu Tue May 26 09:22:24 2009 From: aisaac at american.edu (Alan G Isaac) Date: Tue, 26 May 2009 09:22:24 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <4A1BED10.7050801@american.edu> Would you like to put xirr in econpy until it finds a home in SciPy? (Might as well make it available.) Cheers, Alan Isaac From charlesr.harris at gmail.com Tue May 26 09:22:54 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Tue, 26 May 2009 07:22:54 -0600 Subject: [Numpy-discussion] Segmentation fault on large arrays In-Reply-To: <1243324558.14931.14.camel@sulfur.loria.fr> References: <1243324558.14931.14.camel@sulfur.loria.fr> Message-ID: On Tue, May 26, 2009 at 1:55 AM, Nicolas Rougier wrote: > > Hello, > > I've come across what is probably a bug in size check for large arrays: > > >>> import numpy > >>> z1 = numpy.zeros((255*256,256*256)) > Traceback (most recent call last): > File "", line 1, in > ValueError: dimensions too large. > >>> z2 = numpy.zeros((256*256,256*256)) > >>> z2.shape > (65536, 65536) > >>> z2[0] = 0 > Segmentation fault > This one has been fixed. See ticket #1080 . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From josef.pktd at gmail.com Tue May 26 11:07:21 2009 From: josef.pktd at gmail.com (josef.pktd at gmail.com) Date: Tue, 26 May 2009 11:07:21 -0400 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: <4A1BED10.7050801@american.edu> References: <4A1BED10.7050801@american.edu> Message-ID: <1cd32cbb0905260807y63d6f999r33c3182658f18ed7@mail.gmail.com> I rewrote irr to use the iterative solver instead of polynomial roots so that it can also handle large arrays. For 3000 values, I had to kill the current np.irr since I didn't want to wait longer than 10 minutes When writing the test, I found that npv is missing a "when" keyword, for the case when the first payment is immediate, i.e. in the present, and that broadcasting has problems: >>> np.npv(0.05, np.array([[1,1],[1,1]])) array([ 1.9047619 , 1.81405896]) >>> np.npv(0.05, np.array([[1,1],[1,1],[1,1]])) Traceback (most recent call last): File "", line 1, in np.npv(0.05, np.array([[1,1],[1,1],[1,1]])) File "C:\Programs\Python25\Lib\site-packages\numpy\lib\financial.py", line 449, in npv return (values / (1+rate)**np.arange(1,len(values)+1)).sum(axis=0) ValueError: shape mismatch: objects cannot be broadcast to a single shape -------------------------- Here is the changed version, that only looks for one root. I added an optional starting value as keyword argument (as in open office) but didn't make any other changes: def irr(values, start=None): """ Return the Internal Rate of Return (IRR). This is the rate of return that gives a net present value of 0.0. Parameters ---------- values : array_like, shape(N,) Input cash flows per time period. At least the first value would be negative to represent the investment in the project. Returns ------- out : float Internal Rate of Return for periodic input values. 
Examples -------- >>> np.irr([-100, 39, 59, 55, 20]) 0.2809484211599611 """ p = np.poly1d(values[::-1]) pd1 = np.polyder(p) if start is None: r = 0.99 # starting value, find polynomial root in neighborhood else: r = start # iterative solver for discount factor for i in range(10): r = r - p(r)/pd1(r) ## #res = np.roots(values[::-1]) ## # Find the root(s) between 0 and 1 ## mask = (res.imag == 0) & (res.real > 0) & (res.real <= 1) ## res = res[mask].real ## if res.size == 0: ## return np.nan rate = 1.0/r - 1 if rate.size == 1: rate = rate.item() return rate def test_irr(): v = [-150000, 15000, 25000, 35000, 45000, 60000] assert_almost_equal(irr(v), 0.0524, 2) nper = 300 #Number of periods freq = 5 #frequency of payment v = np.zeros(nper) v[1:nper+1:freq] = 1 # periodic payment v[0] = -4.3995180296393199 assert_almost_equal(irr(v), 0.05, 10) nper = 3000 #Number of periods freq = 5 #frequency of payment v = np.zeros(nper) v[1:nper+1:freq] = 1 # periodic payment v[0] = -4.3995199643479603 assert_almost_equal(irr(v), 0.05, 10) If this looks ok, I can write a proper patch. Josef From ferrell at diablotech.com Tue May 26 12:28:31 2009 From: ferrell at diablotech.com (Robert Ferrell) Date: Tue, 26 May 2009 10:28:31 -0600 Subject: [Numpy-discussion] add xirr to numpy financial functions? In-Reply-To: References: Message-ID: <3E23FC31-07B2-4373-9CE7-BB0412D9F281@diablotech.com> On May 25, 2009, at 10:59 PM, Joe Harrington wrote: > Let's keep this thread focussed on the original issue: > > just add a floating array of times to irr or a new xirr > continuous interest > no more > > Anyone can use the timeseries package to produce a floating array of > times from normal dates, if those are the dates they want. If they > want some specialized financial date, they may want a different > conversion, however. All we should provide in NumPy would be the > simplest tool. Specialized dates and date-time conversion belong > elsewhere. > > If we're *not* skipping dates, there is no need for xirr, just use > irr, which exists. > > scikits.financial seems like a great idea, and then knock yourselves > out for date conversions and definitions of compounding. Just think > big and design it first. But let's keep this thread on the simple > question for NumPy. My vote is against adding xirr to NumPy. In my experience, if you want internal rate of return, then you also want time weighted return, for instance, and all of sudden it becomes surprising that NumPy tantalizes with a some of the needed capability but not all of it. I read in an old thread that irr was included partly because OLPC was including NumPy and it was great that kids would have a tool to help them understand the present value of money. In my opinion, cumprod() is an even better teaching tool for that. I'm not advocating reducing functionality in NumPy, but I prefer the idea of keeping NumPy as an array core, and having higher-level capability available as add-ons (scipy, scikit, etc...) -r From andrea.gavana at gmail.com Tue May 26 15:27:01 2009 From: andrea.gavana at gmail.com (Andrea Gavana) Date: Tue, 26 May 2009 20:27:01 +0100 Subject: [Numpy-discussion] List/location of consecutive integers (2) Message-ID: Hi All, I have tried the solutions proposed in the previous thread and it looks like Chris' one is the fastest for my purposes. Now, I have a question which is probably more conceptual than implementation-related. 
I started this little thread as my task is to read medium to (relatively) big unformatted binary files written by another (black-box) software (which is written in Fortran). These files can range from 10 MB to 200 MB, more or less, and I read them using a f2py-wrapped Fortran subroutine. I got a stupendous speed improvement when I switched from Compaq Visual Fortran to G95 with "STREAM" access (from 8% to 90% faster, depending on the infamous "indices" I was talking about). Now, I was thinking about using the multiprocessing module in Python, as we have 4-cpus PCs at work and I could try to call my subroutine using multiple Python processes. I *really* should do this in Fortran directly but I haven't found any reference on how to do file I/O in parallel in Fortran and I haven't got any help from comp.lang.fortran in that sense (only a warning that I may slow down everything by using multiple processes). Splitting the reading process between 4 processes will require the exchange of 5-20 MB from the child processes to the main one: do you think my script will benefit from using multiprocessing? Is there any drawback in using Numpy arrays in multiple processes? If using multiprocessing in Python will create too much overhead, does anyone have any suggestion/reference/link/code on how to handle parallel I/O in Fortran directly? Should I try another approach? Thank you a lot for your suggestions. Andrea. "Imagination Is The Only Weapon In The War Against Reality." http://xoomer.alice.it/infinity77/ http://thedoomedcity.blogspot.com/ From doutriaux1 at llnl.gov Tue May 26 19:17:08 2009 From: doutriaux1 at llnl.gov (=?UTF-8?Q?Charles_=D8=B3=D9=85=D9=8A=D8=B1_Doutriaux?=) Date: Tue, 26 May 2009 16:17:08 -0700 Subject: [Numpy-discussion] casting bug Message-ID: <91B9B859-AB9D-4097-A5D2-236E10C9D308@llnl.gov> Hi there, One of our users just found a bug in numpy that has to do with casting. Consider the attached example. The difference at the end should be 0 (zero) everywhere. But it's not by default. Casting the data to 'float64' at reading and assiging to the arrays works Defining the arrays "at creation time" as "float32" works But simply reading in the data (flaot32) and putting them into the default arrays (float64) leads to differences! Thanks for looking into this, C. -------------- next part -------------- A non-text attachment was scrubbed... Name: squares.py Type: text/x-python-script Size: 507 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: numpy_test_data.nc Type: application/octet-stream Size: 1580 bytes Desc: not available URL: From robert.kern at gmail.com Tue May 26 19:28:08 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 May 2009 18:28:08 -0500 Subject: [Numpy-discussion] Home for pyhdf5io? In-Reply-To: <4A1B8308.4050604@noaa.gov> References: <000e0cd2a07010d73d046a71a597@google.com> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <4A1B79F9.8010203@noaa.gov> <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> <4A1B8308.4050604@noaa.gov> Message-ID: <3d375d730905261628o989afd3ia1383dd5f2fdbb77@mail.gmail.com> On Tue, May 26, 2009 at 00:50, Christopher Barker wrote: > Robert Kern wrote: >> Yes. That's why I wrote the NPY format instead. I *did* do some due >> diligence before I designed a new binary format. 
> > I assumed so, and I also assume you took a look at netcdf3, but since > it's been brought up here, I take it it dint fit the bill? Even if it > did, while it will be around for a LONG time, it is an out-of-date format. Lack of unsigned and 64-bit integers for the most part. But even if they were supported, I didn't see much point in using a standard that is being replaced by its own community. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From robert.kern at gmail.com Tue May 26 19:50:09 2009 From: robert.kern at gmail.com (Robert Kern) Date: Tue, 26 May 2009 18:50:09 -0500 Subject: [Numpy-discussion] casting bug In-Reply-To: <91B9B859-AB9D-4097-A5D2-236E10C9D308@llnl.gov> References: <91B9B859-AB9D-4097-A5D2-236E10C9D308@llnl.gov> Message-ID: <3d375d730905261650ob777821ob0a9a64495043e6e@mail.gmail.com> 2009/5/26 Charles ???? Doutriaux : > Hi there, > > One of our users just found a bug in numpy that has to do with casting. > > Consider the attached example. > > The difference at the end should be ?0 (zero) everywhere. > > But it's not by default. > > Casting the data to 'float64' at reading and assiging to the arrays works > Defining the arrays "at creation time" as "float32" works > > But simply reading in the data (flaot32) and putting them into the default > arrays (float64) leads to differences! That is probably not a good characterization of the code. You are doing a summation of squares of floats two different ways. The data is stored in the file with dtype=float32. You are reading in the data a "row" at a time (with the name "tmp" for future reference). You are keeping an accumulator ("ex2" with dtype=float64) and storing each row into an array shaped like the whole data in the file ("a" with dtype=float64). You calculate the sum of squares one way by "ex2 = ex2 + power(tmp, 2.0)". In this case, the square is computed, then stuffed back into a float32 intermediate result, *then* added to "ex2". You calculate the sum of squares the other way by "numpy.sum(power(a, 2.0))". In this case, the square is computed in full float64 precision, then added up. The difference between the two results is just that of using a low precision intermediate for the square. You have fairly large inputs so you are losing a good number of digits by using a float32 intermediate before the summation. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From Chris.Barker at noaa.gov Tue May 26 20:05:34 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 26 May 2009 17:05:34 -0700 Subject: [Numpy-discussion] Home for pyhdf5io? 
In-Reply-To: <3d375d730905261628o989afd3ia1383dd5f2fdbb77@mail.gmail.com> References: <000e0cd2a07010d73d046a71a597@google.com> <4A193C3A.1080309@stevesimmons.com> <3d375d730905241422j29b11d6dn46212a78c7ddfd56@mail.gmail.com> <4488200F-9E58-4844-88D5-08C03CDAB713@cs.toronto.edu> <4A1B79F9.8010203@noaa.gov> <3d375d730905252215p4ccf309ard7367fe7d39ef7d7@mail.gmail.com> <4A1B8308.4050604@noaa.gov> <3d375d730905261628o989afd3ia1383dd5f2fdbb77@mail.gmail.com> Message-ID: <4A1C83CE.3000506@noaa.gov> Robert Kern wrote: > On Tue, May 26, 2009 at 00:50, Christopher Barker wrote: >> I assumed so, and I also assume you took a look at netcdf3, but since >> it's been brought up here, I take it it didn't fit the bill? > Lack of unsigned and 64-bit integers for the most part. But even if > they were supported, I didn't see much point in using a standard that > is being replaced by its own community. I agree -- I, and many others, are using netcdf4 libs to work with netcdf3 files, but the change will come. So we have a technical and social reason not to use it. Good enough for me. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Chris.Barker at noaa.gov Tue May 26 20:07:39 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Tue, 26 May 2009 17:07:39 -0700 Subject: [Numpy-discussion] List/location of consecutive integers (2) In-Reply-To: References: Message-ID: <4A1C844B.8040307@noaa.gov> Andrea Gavana wrote: > I have tried the solutions proposed in the previous thread and it > looks like Chris' one is the fastest for my purposes. whoo hoo! What do I win? ;-) > Splitting the reading process between 4 processes will require the > exchange of 5-20 MB from the child processes to the main one: do you > think my script will benefit from using multiprocessing? If you are talking about multiprocessing to read the data in -- I don't think so -- that's probably IO bound anyway. You can't make your disks faster with multiple processors. > Should I try another approach? I don't know it will do anything for performance, but you might want to look at memory mapped arrays -- it's a very cool way to work with data files too big to want to bring into memory all at once. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From pchaow at gmail.com Wed May 27 01:55:09 2009 From: pchaow at gmail.com (chaow porkaew) Date: Wed, 27 May 2009 12:55:09 +0700 Subject: [Numpy-discussion] Can you teach me how to used array api in C/C++? Message-ID: Can you teach me how to used array api in C/C++? 1.How to get a data in co-ordinate i,j , example a = array([[1,2,3],[4,5,6]]) how do i get the value of 5 in c/c++ or 2.How i sum all of data in arrays in c/c++ -------------- next part -------------- An HTML attachment was scrubbed... URL: From pchaow at gmail.com Wed May 27 02:14:49 2009 From: pchaow at gmail.com (chaow porkaew) Date: Wed, 27 May 2009 13:14:49 +0700 Subject: [Numpy-discussion] help Can you teach me how to used array api in C/C+ Message-ID: Can you teach me how to used array api in C/C++? 1.How to get a data in co-ordinate i,j , example a = array([[1,2,3],[4,5,6]]) how do i get the value of 5 in c/c++ or 2.How i sum all of data in arrays in c/c++ Best regards. 
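For reference, a minimal sketch (not from the original thread) of one way to do both with the NumPy C API. It assumes the input has already been converted to a 2-D double array, e.g. with PyArray_FROM_OTF(obj, NPY_DOUBLE, NPY_IN_ARRAY), and that import_array() was called during module initialization:

#include <Python.h>
#include <numpy/arrayobject.h>

/* Read element (i, j); PyArray_GETPTR2 accounts for strides. */
static double
get_element(PyArrayObject *arr, npy_intp i, npy_intp j)
{
    return *(double *)PyArray_GETPTR2(arr, i, j);
}

/* Sum every element of a 2-D array with a plain double loop. */
static double
sum_all(PyArrayObject *arr)
{
    double total = 0.0;
    npy_intp i, j;
    for (i = 0; i < PyArray_DIM(arr, 0); i++) {
        for (j = 0; j < PyArray_DIM(arr, 1); j++) {
            total += *(double *)PyArray_GETPTR2(arr, i, j);
        }
    }
    return total;
}

For a = array([[1., 2., 3.], [4., 5., 6.]]), get_element(a, 1, 1) returns 5.0 and sum_all(a) returns 21.0.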
-------------- next part -------------- An HTML attachment was scrubbed... URL: From bsouthey at gmail.com Wed May 27 09:24:06 2009 From: bsouthey at gmail.com (Bruce Southey) Date: Wed, 27 May 2009 08:24:06 -0500 Subject: [Numpy-discussion] List/location of consecutive integers (2) In-Reply-To: <4A1C844B.8040307@noaa.gov> References: <4A1C844B.8040307@noaa.gov> Message-ID: <4A1D3EF6.70601@gmail.com> Christopher Barker wrote: > Andrea Gavana wrote: > >> I have tried the solutions proposed in the previous thread and it >> looks like Chris' one is the fastest for my purposes. >> > > whoo hoo! What do I win? ;-) > > >> Splitting the reading process between 4 processes will require the >> exchange of 5-20 MB from the child processes to the main one: do you >> think my script will benefit from using multiprocessing? >> > > If you are talking about multiprocessing to read the data in -- I don't > think so -- that's probably IO bound anyway. You can't make your disks > faster with multiple processors. > > >> Should I try another approach? >> > > I don't know it will do anything for performance, but you might want to > look at memory mapped arrays -- it's a very cool way to work with data > files too big to want to bring into memory all at once. > > -Chris > > > Depending on your system and OS, I would agree with Chris that you are most likely to be I/O bound. If so, you have to look at a different approach to overcome that barrier. If you are not I/O bound then you need to find out what is the limiting your performance (like using Robert Kern's line_profiler http://pypi.python.org/pypi/line_profiler/). If you find it CPU-bound then you might you gain benefits from multiple cpu's - of which has been addressed in multiple times on the list. Cython is probably a very viable option for what you have described. Bruce From david at ar.media.kyoto-u.ac.jp Wed May 27 09:08:00 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Wed, 27 May 2009 22:08:00 +0900 Subject: [Numpy-discussion] Best way to inherit from PyArrayIterObject at the C level ? Message-ID: <4A1D3B30.80304@ar.media.kyoto-u.ac.jp> Hi, I have been scratching my head on the following problem. I am designing a new array iterator, in C, to walk into a neighborhood of an array. I would like this iterator to 'inherit' from PyArrayIterObject, so that I can design some API which accept both PyArrayIterObject and PyArrayNeighIterObject (through pointer casting). I have tried something like: typedef struct { /* first item is the base class, so that casting a PyArrayNeighIterObject* to PyArrayIterObject* works */ PyArrayIterObject base; /* PyArrayNeighIterObject specific members */ .... } PyArrayNeighIterObject; But this forces me to cast a PyArrayNeighIterObject* to PyArrayIterObject whenever I want to access members of the base instance. The alternative is to copy the PyArrayIterObject members by hand, as is currently done in numpy itself (for broadcasting iterator PyArrayMapIterObject). But since my iterator lives outside numpy, this is really error-prone IMHO - there will be crashes whenever the PyArrayIterObject struct changes in an ABI incompatible way, and this may be quite hard to debug Is there a better way ? 
David From matthieu.brucher at gmail.com Wed May 27 09:51:15 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Wed, 27 May 2009 15:51:15 +0200 Subject: [Numpy-discussion] Failure with 1.3 Message-ID: Hi, I've just tested the latest numpy with my new configuration (Opteron 2220, 64bits with RH5.2, compiled with ICC 10.1.018) and I got this failure. ====================================================================== FAIL: test_umath.TestLogAddExp2.test_logaddexp2_values [...] assert_almost_equal(np.logaddexp2(xf, yf), zf, decimal=dec) [...] raise AssertionError(msg) AssertionError: Arrays are not almost equal (mismatch 80.0%) x: array([ 2.32090838, 2.00127574, 2.5849625 , 2.00127574, 2.32090838]) y: array([ 2.5849625, 2.5849625, 2.5849625, 2.5849625, 2.5849625]) Is it fixed in the current trunk? Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From charlesr.harris at gmail.com Wed May 27 10:32:15 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 May 2009 08:32:15 -0600 Subject: [Numpy-discussion] Failure with 1.3 In-Reply-To: References: Message-ID: On Wed, May 27, 2009 at 7:51 AM, Matthieu Brucher < matthieu.brucher at gmail.com> wrote: > Hi, > > I've just tested the latest numpy with my new configuration (Opteron > 2220, 64bits with RH5.2, compiled with ICC 10.1.018) and I got this > failure. > > ====================================================================== > FAIL: test_umath.TestLogAddExp2.test_logaddexp2_values > [...] > assert_almost_equal(np.logaddexp2(xf, yf), zf, decimal=dec) > [...] > raise AssertionError(msg) > AssertionError: > Arrays are not almost equal > > (mismatch 80.0%) > x: array([ 2.32090838, 2.00127574, 2.5849625 , 2.00127574, > 2.32090838]) > y: array([ 2.5849625, 2.5849625, 2.5849625, 2.5849625, 2.5849625]) > > Is it fixed in the current trunk? > This is the first report. I'll guess it is related to icc. What happens if you use gcc? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From doutriaux1 at llnl.gov Wed May 27 11:14:53 2009 From: doutriaux1 at llnl.gov (=?UTF-8?Q?Charles_=D8=B3=D9=85=D9=8A=D8=B1_Doutriaux?=) Date: Wed, 27 May 2009 08:14:53 -0700 Subject: [Numpy-discussion] casting bug In-Reply-To: <3d375d730905261650ob777821ob0a9a64495043e6e@mail.gmail.com> References: <91B9B859-AB9D-4097-A5D2-236E10C9D308@llnl.gov> <3d375d730905261650ob777821ob0a9a64495043e6e@mail.gmail.com> Message-ID: <1619CC0D-DFE7-4167-856C-72D563D61EAB@llnl.gov> Thanks Robert, I thought it was something like that but couldn't figure it out. C. On May 26, 2009, at 4:50 PM, Robert Kern wrote: > 2009/5/26 Charles ???? Doutriaux : >> Hi there, >> >> One of our users just found a bug in numpy that has to do with >> casting. >> >> Consider the attached example. >> >> The difference at the end should be 0 (zero) everywhere. >> >> But it's not by default. >> >> Casting the data to 'float64' at reading and assiging to the arrays >> works >> Defining the arrays "at creation time" as "float32" works >> >> But simply reading in the data (flaot32) and putting them into the >> default >> arrays (float64) leads to differences! > > That is probably not a good characterization of the code. You are > doing a summation of squares of floats two different ways. The data is > stored in the file with dtype=float32. 
You are reading in the data a > "row" at a time (with the name "tmp" for future reference). You are > keeping an accumulator ("ex2" with dtype=float64) and storing each row > into an array shaped like the whole data in the file ("a" with > dtype=float64). > > You calculate the sum of squares one way by "ex2 = ex2 + power(tmp, > 2.0)". In this case, the square is computed, then stuffed back into a > float32 intermediate result, *then* added to "ex2". > > You calculate the sum of squares the other way by "numpy.sum(power(a, > 2.0))". In this case, the square is computed in full float64 > precision, then added up. > > The difference between the two results is just that of using a low > precision intermediate for the square. You have fairly large inputs so > you are losing a good number of digits by using a float32 intermediate > before the summation. > > -- > Robert Kern > > "I have come to believe that the whole world is an enigma, a harmless > enigma that is made terrible by our own mad attempt to interpret it as > though it had an underlying truth." > -- Umberto Eco > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http:// mail.scipy.org/mailman/listinfo/numpy-discussion From lubensch.proletariat.inc at gmail.com Wed May 27 11:12:44 2009 From: lubensch.proletariat.inc at gmail.com (cp) Date: Wed, 27 May 2009 15:12:44 +0000 (UTC) Subject: [Numpy-discussion] asarray() and PIL Message-ID: Hi, I'm using PIL for image processing, but lately I also try numpy for the flexibility and superior speed it offers. The first thing I noticed is that for an RGB image with height=1600 and width=1900 while img=Image.open('something.tif') img.size (1900,1600) then arr=asarray(img) arr.shape (1600,1900,3) This means that the array-image has 1600 color channels, 1900 image pixel rows and 3 image pixel columns. Why is that? if I reshape with arr.reshape(3,1900,1600) will there be a mix-up in pixel values and coordinates when compared to the initial PIL image? From stefan at sun.ac.za Wed May 27 11:19:20 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Wed, 27 May 2009 17:19:20 +0200 Subject: [Numpy-discussion] asarray() and PIL In-Reply-To: References: Message-ID: <9457e7c80905270819j362f1d55y81402df38267c169@mail.gmail.com> 2009/5/27 cp : > img=Image.open('something.tif') > img.size > (1900,1600) > > then > > arr=asarray(img) > arr.shape > (1600,1900,3) > > This means that the array-image has 1600 color channels, 1900 image pixel rows > and 3 image pixel columns. Why is that? No, it means that you have 1600 rows, 1900 columns and 3 colour channels. > if I reshape with > > arr.reshape(3,1900,1600) > > will there be a mix-up in pixel values and coordinates when compared to the > initial PIL image? You'll have to use np.rollaxis(img, -1) to get the shape you want. Regards St?fan From seb.haase at gmail.com Wed May 27 11:20:46 2009 From: seb.haase at gmail.com (Sebastian Haase) Date: Wed, 27 May 2009 17:20:46 +0200 Subject: [Numpy-discussion] asarray() and PIL In-Reply-To: References: Message-ID: On Wed, May 27, 2009 at 5:12 PM, cp wrote: > Hi, > I'm using PIL for image processing, but lately I also try numpy for the > flexibility and superior speed it offers. 
The first thing I noticed is that for > an RGB image with height=1600 and width=1900 while > > img=Image.open('something.tif') > img.size > (1900,1600) > > then > > arr=asarray(img) > arr.shape > (1600,1900,3) > > This means that the array-image has 1600 color channels, 1900 image pixel rows > and 3 image pixel columns. Why is that? > if I reshape with > > arr.reshape(3,1900,1600) > You should look into transpose if you prefer the the colors to be on the first axis instead of the last one -- that's what I like to do. Regards, Sebastian Haase From dsdale24 at gmail.com Wed May 27 11:30:52 2009 From: dsdale24 at gmail.com (Darren Dale) Date: Wed, 27 May 2009 11:30:52 -0400 Subject: [Numpy-discussion] suggestion for generalizing numpy functions In-Reply-To: References: Message-ID: Now that numpy-1.3 has been released, I was hoping I could engage the numpy developers and community concerning my suggestion to improve the ufunc wrapping mechanism. Currently, ufuncs call, on the way out, the __array_wrap__ method of the input array with the highest __array_priority__. There are use cases, like masked arrays or arrays with units, where it is imperative to run some code on the way in to the ufunc as well. MaskedArrays do this by reimplementing or wrapping ufuncs, but this approach puts some pretty severe constraints on subclassing. For example, in my Quantities package I have a Quantity object that derives from ndarray. It has been suggested that in order to make ufuncs work with Quantity, I should wrap numpy's built-in ufuncs. But I intend to make a MaskedQuantity object as well, deriving from MaskedArray, and would therefore have to wrap the MaskedArray ufuncs as well. If ufuncs would simply call a method both on the way in and on the way out, I think this would go a long way to improving this situation. I whipped up a simple proof of concept and posted it in this thread a while back. For example, a MaskedQuantity would implement a method like __gfunc_pre__ to check the validity of the units operation etc, and would then call MaskedArray.__gfunc_pre__ (if defined) to determine the domain etc. __gfunc_pre__ would return a dict containing any metadata the subclasses wish to provide based on the inputs, and that dict would be passed along with the inputs, output and context to __gfunc_post__, so postprocessing can be done (__gfunc_post__ replacing __array_wrap__). Of course, packages like MaskedArray may still wish to reimplement ufuncs, like Eric Firing is investigating right now. The point is that classes that dont care about the implementation of ufuncs, that only need to provide metadata based on the inputs and the output, can do so using this mechanism and can build upon other specialized arrays. I would really appreciate input from numpy developers and other interested parties. I would like to continue developing the Quantities package this summer, and have been approached by numerous people interested in using Quantities with sage, sympy, matplotlib. But I would prefer to improve the ufunc mechanism (or establish that there is no interest among the community to do so) so I can improve the package (or limit its scope) before making an official announcement. Thank you, Darren On Mon, Mar 9, 2009 at 5:37 PM, Darren Dale wrote: > On Mon, Mar 9, 2009 at 9:50 AM, Darren Dale wrote: > >> I spent some time over the weekend fixing a few bugs in numpy that were >> exposed when attempting to use ufuncs with ndarray subclasses. 
It got me >> thinking that, with relatively little work, numpy's functions could be made >> to be more general. For example, the numpy.ma module redefines many of >> the standard ufuncs in order to do some preprocessing before the builtin >> ufunc is called. Likewise, in the units/quantities package I have been >> working on, I would like to perform a dimensional analysis to make sure an >> operation is allowed before I call a ufunc that might change data in place. >> >> Imagine an ndarray subclass with methods like __gfunc_pre__ and >> __gfunc_post__. __gfunc_pre__ could accept the context that is currently >> provided to __array_wrap__ (the inputs and the function called), perform >> whatever preprocessing is desired, and maybe return a dictionary containing >> metadata. Numpy functions could then be wrapped with a decorator that 1) >> calls __gfunc_pre__ and obtain any metadata that is returned 2) calls the >> wrapped functions, and then 3) calls __gfunc_post__, which might be very >> similar to __array_wrap__ except that it would also accept the metadata >> created by __gfunc_pre__. >> >> In cases where the routines to be called by __gfunc_pre__ and _post__ >> depend on what function is called, the the subclass could implement routines >> and store them in a dictionary-like object that is keyed using the function >> called. I have been exploring this approach with Quantities and it seems to >> work well. For example: >> >> def __gfunc_pre__(self, gfunc, *args): >> try: >> return gfunc_pre_registry[gfunc](*args) >> except KeyError: >> return {} >> >> I think such an approach for generalizing numpy's functions could be >> implemented without being disruptive to the existing __array_wrap__ >> framework. The decorator would attempt to identify an input or output array >> to use to call __gfunc_pre__ and _post__. If it finds them, it uses them. If >> it doesnt find them, no harm done, the existing __array_wrap__ mechanisms >> are still in place if the wrapped function is a ufunc. >> >> One other nice feature: the metadata that is returned by __gfunc_pre__ >> could contain an optional flag that the decorator attempts to pass to the >> wrapped function so that __gfunc_pre__ and _post are not called for any >> decorated internal functions. That way the subclass could specify that >> __gfunc_pre__ and _post should be called only for the outer-most function. >> >> Comments? >> > > I'm attaching a proof of concept script, maybe it will better illustrate > what I am talking about. > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From Nicolas.Rougier at loria.fr Wed May 27 11:31:20 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Wed, 27 May 2009 17:31:20 +0200 Subject: [Numpy-discussion] Benchmak on record arrays Message-ID: <1243438280.14931.30.camel@sulfur.loria.fr> Hi, I've written a very simple benchmark on recarrays: import numpy, time Z = numpy.zeros((100,100), dtype=numpy.float64) Z_fast = numpy.zeros((100,100), dtype=[('x',numpy.float64), ('y',numpy.int32)]) Z_slow = numpy.zeros((100,100), dtype=[('x',numpy.float64), ('y',numpy.bool)]) t = time.clock() for i in range(10000): Z*Z print time.clock()-t t = time.clock() for i in range(10000): Z_fast['x']*Z_fast['x'] print time.clock()-t t = time.clock() for i in range(10000): Z_slow['x']*Z_slow['x'] print time.clock()-t And got the following results: 0.23 0.37 3.96 Am I right in thinking that the last case is quite slow because of some memory misalignment between float64 and bool or is there some machinery behind that makes things slow in this case ? Should this be mentioned somewhere in the recarray documentation ? Nicolas From lubensch.proletariat.inc at gmail.com Wed May 27 11:33:56 2009 From: lubensch.proletariat.inc at gmail.com (cp) Date: Wed, 27 May 2009 15:33:56 +0000 (UTC) Subject: [Numpy-discussion] Numpy vs PIL in image statistics Message-ID: Testing the PIL vs numpy in calculating the mean value of each color channel of an image I timed the following. impil = Image.open("10.tif") imnum = asarray(impil) #in PIL for i in range(1,10): stats = ImageStat.Stat(impil) stats.mean # for numpy for i in range(1,10): imnum.reshape(-1,3).mean(axis=0) The image I tested initially is 2000x2000 RGB tif ~11mb in size. I set a timer in each for loop and measured the performance of numpy 7 times slower than PIL. When I did the the same with an 10x10 RGB tif and with 1000 cycles in for, numpy was 25 times faster than PIL. Why is that? Does mean or reshape, make a copy? From lubensch.proletariat.inc at gmail.com Wed May 27 11:50:13 2009 From: lubensch.proletariat.inc at gmail.com (cp) Date: Wed, 27 May 2009 15:50:13 +0000 (UTC) Subject: [Numpy-discussion] asarray() and PIL References: <9457e7c80905270819j362f1d55y81402df38267c169@mail.gmail.com> Message-ID: > > arr=asarray(img) > > arr.shape > > (1600,1900,3) > No, it means that you have 1600 rows, 1900 columns and 3 colour channels. According to scipy documentation at http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html you are right. In this case I import numpy where according to http://www.scipy.org/Tentative_NumPy_Tutorial and the Printing Arrays paragraph (also in http://www.scipy.org/Numpy_Example_List#reshape reshape example) the first number is the layer, the second the rows and the last the columns. Are all the above valid or am I missing something? 
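For reference, a minimal self-contained sketch (using a small synthetic array in place of the TIFF, so none of the values below come from the files discussed) of the distinction drawn in this thread: transpose/rollaxis only relabel the axes of the (rows, columns, channels) array, while reshape reinterprets the flat buffer and mixes pixel values:

import numpy as np

# stand-in for asarray(img): 2 rows, 3 columns, 3 colour channels
arr = np.arange(2*3*3).reshape(2, 3, 3)

# red value of the pixel at row 1, column 2
print arr[1, 2, 0]

# rollaxis (or transpose) moves the channel axis to the front without
# touching which value belongs to which pixel
chan_first = np.rollaxis(arr, -1)          # shape (3, 2, 3)
print chan_first[0, 1, 2] == arr[1, 2, 0]  # True

# reshape keeps the same flat buffer and just reinterprets it, so the
# "red plane" no longer corresponds to the red channel
wrong = arr.reshape(3, 2, 3)
print wrong[0, 1, 2] == arr[1, 2, 0]       # False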
Thanks From charlesr.harris at gmail.com Wed May 27 12:01:51 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 May 2009 10:01:51 -0600 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <1243438280.14931.30.camel@sulfur.loria.fr> References: <1243438280.14931.30.camel@sulfur.loria.fr> Message-ID: On Wed, May 27, 2009 at 9:31 AM, Nicolas Rougier wrote: > > Hi, > > I've written a very simple benchmark on recarrays: > > import numpy, time > > Z = numpy.zeros((100,100), dtype=numpy.float64) > Z_fast = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.int32)]) > Z_slow = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.bool)]) > > t = time.clock() > for i in range(10000): Z*Z > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_fast['x']*Z_fast['x'] > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_slow['x']*Z_slow['x'] > print time.clock()-t > > > And got the following results: > 0.23 > 0.37 > 3.96 > > Am I right in thinking that the last case is quite slow because of some > memory misalignment between float64 and bool or is there some machinery > behind that makes things slow in this case ? Probably. Record arrays are stored like packed c structures and need to be unpacked by copying the bytes to aligned data types. > Should this be mentioned somewhere in the recarray documentation ? A note would be appropriate, yes. You should be able to do that, do you have edit permissions for the documentation? Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From Chris.Barker at noaa.gov Wed May 27 12:43:55 2009 From: Chris.Barker at noaa.gov (Christopher Barker) Date: Wed, 27 May 2009 09:43:55 -0700 Subject: [Numpy-discussion] asarray() and PIL In-Reply-To: References: <9457e7c80905270819j362f1d55y81402df38267c169@mail.gmail.com> Message-ID: <4A1D6DCB.6030404@noaa.gov> cp wrote: >>> arr=asarray(img) >>> arr.shape >>> (1600,1900,3) > >> No, it means that you have 1600 rows, 1900 columns and 3 colour channels. > > According to scipy documentation at > http://pages.physics.cornell.edu/~myers/teaching/ComputationalMethods/python/arrays.html > you are right. > > In this case I import numpy where according to > http://www.scipy.org/Tentative_NumPy_Tutorial and the Printing Arrays paragraph > (also in http://www.scipy.org/Numpy_Example_List#reshape reshape example) the > first number is the layer, the second the rows and the last the columns. > > Are all the above valid or am I missing something? I'm not sure what part of those docs you're referring to -- but they are probably both right. What you are missing is that numpy doesn't define for you what the axis mean, they just are: the zeroth axis is of length 1600 elements the first axis is of length 1900 elements the second axis is of length 3 elements what they represent is up to your application. In the case of importing from PIL, it is (height, width, rgb) (I think height and width get swapped due to how memory is laid out in PIL vs numpy) by default, numpy arrays are stored and worked with in C-array order, so that array layout has the pixels together in memory as rgb triples. Depending on what you are doing you may not want that. 
You can work with them any way you want: sub_region = arr[r1:r2, c1:c2, :] all_red = arr[:,:,0] but if you tend to work with all the red, or all the blue, etc, then it might be easier to re-arrange it to: (rgb, height, width) If you google a bit, you'll find various notes about working with image data in numpy. HTH, -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker at noaa.gov From Nicolas.Rougier at loria.fr Wed May 27 15:21:37 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Wed, 27 May 2009 21:21:37 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: References: <1243438280.14931.30.camel@sulfur.loria.fr> Message-ID: <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> No, I don't have permission to edit. Nicolas On 27 May, 2009, at 18:01 , Charles R Harris wrote: > > > On Wed, May 27, 2009 at 9:31 AM, Nicolas Rougier > wrote: > > Hi, > > I've written a very simple benchmark on recarrays: > > import numpy, time > > Z = numpy.zeros((100,100), dtype=numpy.float64) > Z_fast = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.int32)]) > Z_slow = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.bool)]) > > t = time.clock() > for i in range(10000): Z*Z > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_fast['x']*Z_fast['x'] > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_slow['x']*Z_slow['x'] > print time.clock()-t > > > And got the following results: > 0.23 > 0.37 > 3.96 > > Am I right in thinking that the last case is quite slow because of > some > memory misalignment between float64 and bool or is there some > machinery > behind that makes things slow in this case ? > > Probably. Record arrays are stored like packed c structures and need > to be unpacked by copying the bytes to aligned data types. > > Should this be mentioned somewhere in the recarray documentation ? > > A note would be appropriate, yes. You should be able to do that, do > you have edit permissions for the documentation? > > Chuck > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at loria.fr Wed May 27 15:26:08 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Wed, 27 May 2009 21:26:08 +0200 Subject: [Numpy-discussion] arrray/matrix nonzero() type Message-ID: <71AA343B-BC43-4A39-BF46-AE8D4A8746C7@loria.fr> Hi again, I have a problem with the nonzero() function for matrix. 
The following test program: import numpy, scipy.sparse Z = numpy.zeros((10,10)) Z[0,0] = Z[1,1] = 1 i = Z.nonzero() print i Zc = scipy.sparse.coo_matrix((Z[i],i)) Z = numpy.matrix(Z) i = Z.nonzero() print i Zc = scipy.sparse.coo_matrix((Z[i],i)) gives me: (array([0, 1]), array([0, 1])) (matrix([[0, 1]]), matrix([[0, 1]])) Traceback (most recent call last): File "test.py", line 13, in Zc = scipy.sparse.coo_matrix((Z[i],i)) File "/Volumes/Data/Local/lib/python2.6/site-packages/scipy/sparse/ coo.py", line 179, in __init__ self._check() File "/Volumes/Data/Local/lib/python2.6/site-packages/scipy/sparse/ coo.py", line 194, in _check nnz = self.nnz File "/Volumes/Data/Local/lib/python2.6/site-packages/scipy/sparse/ coo.py", line 187, in getnnz raise ValueError('row, column, and data arrays must have rank 1') ValueError: row, column, and data arrays must have rank 1 Is that the intended behavior ? How can I use nonzero with matrix to build the coo one ? Nicolas From wnbell at gmail.com Wed May 27 17:47:21 2009 From: wnbell at gmail.com (Nathan Bell) Date: Wed, 27 May 2009 17:47:21 -0400 Subject: [Numpy-discussion] arrray/matrix nonzero() type In-Reply-To: <71AA343B-BC43-4A39-BF46-AE8D4A8746C7@loria.fr> References: <71AA343B-BC43-4A39-BF46-AE8D4A8746C7@loria.fr> Message-ID: On Wed, May 27, 2009 at 3:26 PM, Nicolas Rougier wrote: > > Hi again, > > I ?have a problem with the nonzero() function for matrix. > > The following test program: > > import numpy, scipy.sparse > > Z = numpy.zeros((10,10)) > > i = Z.nonzero() > print i > Zc = scipy.sparse.coo_matrix((Z[i],i)) > > Z = numpy.matrix(Z) > i = Z.nonzero() > print i > Zc = scipy.sparse.coo_matrix((Z[i],i)) > > > Is that the intended behavior ? How can I use nonzero with matrix to > build the coo one ? > Even simpler, just do Zc = scipy.sparse.coo_matrix(Z) As of SciPy 0.7, all the sparse matrix constructors accept dense matrices and array-like objects. The problem with the matrix case is that Z[i] is rank-2 when a rank-1 array is expected. -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From fperez.net at gmail.com Wed May 27 17:53:20 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 27 May 2009 14:53:20 -0700 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? Message-ID: Howdy, I'm wondering if the code below illustrates a bug in loadtxt, or just a 'live with it' limitation. I'm inlining it for ease of discussion, but the same code is attached to ensure that anyone willing to look at this can just download and run without pasting/whitespace issues. The code is, I hope, sufficiently commented to explain the problem in full detail. Thanks for any input! Cheers, f ### """Simple illustration of nested record arrays. Note: possible numpy.loadtxt bug?""" from StringIO import StringIO import numpy as np from numpy import array, dtype, loadtxt, recarray # Consider the task of loading data that is stored in plain text in a file such # as the string below, where the last block of numbers is meant to be # interpreted as a single 2x3 int array, whose field name in the resulting # structured array will be 'block'. 
txtdata = StringIO(""" # name x y block - 2x3 ints aaaa 1.0 8.0 1 2 3 4 5 6 aaaa 2.0 7.4 2 11 22 3 4 5 6 bbbb 3.5 8.5 3 0 22 44 5 6 aaaa 6.4 4.0 4 1 3 33 54 65 aaaa 8.8 4.1 5 5 3 4 44 77 bbbb 5.5 9.1 6 3 4 5 0 55 bbbb 7.7 8.5 7 2 3 4 5 66 """) # We make the dtype for it: dt = dtype(dict(names=['name','x','y','block'], formats=['S4',float,float,(int,(2,3))])) # And we load it with loadtxt and make a recarray version for convenience data = loadtxt(txtdata,dt) rdata = data.view(recarray) # Unfortunately, if we look at the block data, it repeats the first number # found. This seems to be a loadtxt bug: # In [176]: rdata.block[0,1] # Out[176]: array([1, 1, 1]) # we'd expect array([4, 5, 6]) if np.any(rdata.block[0,1] != array([4, 5, 6])): print 'WARNING: loadtxt bug??' # A workaround can be used by doing a second pass on the file, loading the # columns corresponding to the block as plain ints and doing a reassignment of # that data into the original data. # Rewind the data and reload only the 'block' of ints: txtdata.seek(0) block_data = loadtxt(txtdata,int,usecols=range(3,9)) # Let's work with a copy of the original so we can compare interactively... rdata2 = rdata.copy() # We assign to the block field in our real array the block_data one, # appropriately reshaped rdata2.block[:] = block_data.reshape(rdata.block.shape) # Same check as before, with the new one if np.any(rdata2.block[0,1] != array([4, 5, 6])): print 'WARNING: loadtxt bug??' else: print 'Second pass - data loaded OK.' -------------- next part -------------- A non-text attachment was scrubbed... Name: rec_nested.py Type: text/x-python Size: 2001 bytes Desc: not available URL: From pgmdevlist at gmail.com Wed May 27 18:01:30 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 May 2009 18:01:30 -0400 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: References: Message-ID: <59D32033-6501-4837-97D0-DE5EFCB03FDE@gmail.com> On May 27, 2009, at 5:53 PM, Fernando Perez wrote: > Howdy, > > I'm wondering if the code below illustrates a bug in loadtxt, or just > a 'live with it' limitation. Have you tried np.lib.io.genfromtxt ? dt = dtype(dict(names=['name','x','y','block'], formats=['S4',float,float,(int,(2,3))])) txtdata = StringIO(""" # name x y block - 2x3 ints aaaa 1.0 8.0 1 2 3 4 5 6 aaaa 2.0 7.4 2 11 22 3 4 5 6 bbbb 3.5 8.5 3 0 22 44 5 6 aaaa 6.4 4.0 4 1 3 33 54 65 aaaa 8.8 4.1 5 5 3 4 44 77 bbbb 5.5 9.1 6 3 4 5 0 55 bbbb 7.7 8.5 7 2 3 4 5 66 """) alt_data = np.lib.io.genfromtxt(txtdata,dtype=dt) array([('aaaa', 1.0, 8.0, [[1, 1, 1], [1, 1, 1]]), ('aaaa', 2.0, 7.4000000000000004, [[2, 2, 2], [2, 2, 2]]), ('bbbb', 3.5, 8.5, [[3, 3, 3], [3, 3, 3]]), ('aaaa', 6.4000000000000004, 4.0, [[4, 4, 4], [4, 4, 4]]), ('aaaa', 8.8000000000000007, 4.0999999999999996, [[5, 5, 5], [5, 5, 5]]), ('bbbb', 5.5, 9.0999999999999996, [[6, 6, 6], [6, 6, 6]]), ('bbbb', 7.7000000000000002, 8.5, [[7, 7, 7], [7, 7, 7]])], dtype=[('name', '|S4'), ('x', ' References: <59D32033-6501-4837-97D0-DE5EFCB03FDE@gmail.com> Message-ID: Hi Pierre, On Wed, May 27, 2009 at 3:01 PM, Pierre GM wrote: > Have you tried np.lib.io.genfromtxt ? > I didn't know about it, but it has the same problem as loadtxt: In [5]: rdata.block[0,1] # incorrect Out[5]: array([1, 1, 1]) In [6]: alt_data.block[0,1] # same thing, still wrong Out[6]: array([1, 1, 1]) In [7]: rdata2.block[0,1] # with my manual workaround, this is right Out[7]: array([4, 5, 6]) The data is: # name x y block - 2x3 ints aaaa 1.0 8.0 1 2 3 4 5 6 ... 
so only rdata2 is correct, the others are repeating the first '1' throughout the entire block, which is the problem. Cheers, f From robert.kern at gmail.com Wed May 27 18:56:39 2009 From: robert.kern at gmail.com (Robert Kern) Date: Wed, 27 May 2009 17:56:39 -0500 Subject: [Numpy-discussion] Numpy vs PIL in image statistics In-Reply-To: References: Message-ID: <3d375d730905271556n3f5b1b9aw551e485a0ab0192@mail.gmail.com> On Wed, May 27, 2009 at 10:33, cp wrote: > Testing the PIL vs numpy in calculating the mean value of each color channel of > an image I timed the following. > > impil = Image.open("10.tif") > imnum = asarray(impil) > > #in PIL > for i in range(1,10): > ? ?stats = ImageStat.Stat(impil) > ? ?stats.mean > > # for numpy > for i in range(1,10): > ? ?imnum.reshape(-1,3).mean(axis=0) > > The image I tested initially is 2000x2000 RGB tif ~11mb in size. I set a timer > in each for loop and measured the performance of numpy 7 times slower than PIL. > When I did the the same with an 10x10 RGB tif and with 1000 cycles in for, numpy > was 25 times faster than PIL. Why is that? Does mean or reshape, make a copy? reshape() might if the array wasn't contiguous. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco From stefan at sun.ac.za Wed May 27 18:57:12 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 May 2009 00:57:12 +0200 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: References: Message-ID: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> Hi Fernando 2009/5/27 Fernando Perez : > I'm wondering if the code below illustrates a bug in loadtxt, or just > a 'live with it' limitation. I'm not sure whether this is a bug or not. By specifying the dtype > dt = dtype(dict(names=['name','x','y','block'], > formats=['S4',float,float,(int,(2,3))])) you are saying "column four contains 6 integers", which is a bit of a strange notion. If you want this to be interpreted as "the last 6 columns should be stored in block", then a simple modification to flatten_dtype should do the trick. Cheers St?fan From pgmdevlist at gmail.com Wed May 27 19:03:11 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 May 2009 19:03:11 -0400 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: References: <59D32033-6501-4837-97D0-DE5EFCB03FDE@gmail.com> Message-ID: <736CE0B3-2F42-4861-9745-131EAC5DB5EA@gmail.com> On May 27, 2009, at 6:15 PM, Fernando Perez wrote: > Hi Pierre, > > On Wed, May 27, 2009 at 3:01 PM, Pierre GM > wrote: >> Have you tried np.lib.io.genfromtxt ? >> > > I didn't know about it, but it has the same problem as loadtxt: Oh yes indeed. Yet another case of "I-opened-my-mouth-too-soon'... OK, so there's a trick. Kinda: * Define a specific converter: def block_converter(values): # Convert the strings to int val = [int(_) for _ in values.split()] new = np.array(val, dtype=int).reshape(2,3) out = tuple([tuple(_) for _ in new]) return out * Now, make sure that the column-delimiter is set to '\t' and use the new converter data = genfromtxt(txtdata,dt, delimiter="\t", converters={3:block_converter}) That works if your second line is "aaaa 2.0 7.4 2 11 22 3 4 56" instead of "aaaa 2.0 7.4 2 11 22 3 4 5 6" (that is, if you have exactly 6 ints in the last entry, not 7). 
Note that you could modify the converter to deal with that if needed. From fperez.net at gmail.com Wed May 27 19:09:08 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 27 May 2009 16:09:08 -0700 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> References: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> Message-ID: Hi Stefan, 2009/5/27 Stéfan van der Walt : > Hi Fernando > > 2009/5/27 Fernando Perez : >> I'm wondering if the code below illustrates a bug in loadtxt, or just >> a 'live with it' limitation. > > I'm not sure whether this is a bug or not. > > By specifying the dtype > >> dt = dtype(dict(names=['name','x','y','block'], >> formats=['S4',float,float,(int,(2,3))])) > > you are saying "column four contains 6 integers", which is a bit of a > strange notion. If you want this to be interpreted as "the last 6 > columns should be stored in block", then a simple modification to > flatten_dtype should do the trick. Well, since dtypes allow for nesting full arrays in this fashion, where I can say that the 'block' field can have (2,3) shape, it seems like it would be nice to be able to express this nesting into loading of plain text files as well. The idea would be that any nested dtype like the above would be expanded out for reading purposes into columns, so that the dt spec is interpreted in the second form you provided. So I'd give it a mild +0.5 for this modification if it's indeed easy, since it seems to make loadtxt more convenient to use for this class of uses. But if people feel it's stretching things too far, there's always either the two-pass hack I used or the custom converter Pierre suggested... Thanks for the feedback! Cheers, f From fperez.net at gmail.com Wed May 27 19:10:16 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 27 May 2009 16:10:16 -0700 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: <736CE0B3-2F42-4861-9745-131EAC5DB5EA@gmail.com> References: <59D32033-6501-4837-97D0-DE5EFCB03FDE@gmail.com> <736CE0B3-2F42-4861-9745-131EAC5DB5EA@gmail.com> Message-ID: Hi Pierre, On Wed, May 27, 2009 at 4:03 PM, Pierre GM wrote: > Oh yes indeed. Yet another case of "I-opened-my-mouth-too-soon'... > > OK, so there's a trick. Kinda: > * Define a specific converter: > Thanks, that's an alternative, though I think I prefer my two-pass hack, though I can't quite really say why... Cheers, f From pgmdevlist at gmail.com Wed May 27 19:15:40 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Wed, 27 May 2009 19:15:40 -0400 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: References: <59D32033-6501-4837-97D0-DE5EFCB03FDE@gmail.com> <736CE0B3-2F42-4861-9745-131EAC5DB5EA@gmail.com> Message-ID: <7679EB5C-D5B8-480D-BDF2-EBAC9347A0BA@gmail.com> On May 27, 2009, at 7:10 PM, Fernando Perez wrote: > Hi Pierre, > > On Wed, May 27, 2009 at 4:03 PM, Pierre GM > wrote: >> Oh yes indeed. Yet another case of "I-opened-my-mouth-too-soon'... >> >> OK, so there's a trick. Kinda: >> * Define a specific converter: >> > > Thanks, that's an alternative, though I think I prefer my two-pass > hack, though I can't quite really say why... Funny, I prefer mine ;) Seriously: there might be some overhead in your 2-pass method that might be inconvenient. Some timing would be needed...
From stefan at sun.ac.za Wed May 27 19:29:14 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 May 2009 01:29:14 +0200 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: References: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> Message-ID: <9457e7c80905271629x3ef89819s163f7f700966721d@mail.gmail.com> Hi Fernando 2009/5/28 Fernando Perez : > Well, since dtypes allow for nesting full arrays in this fashion, > where I can say that the 'block' field can have (2,3) shape, it seems > like it would be nice to be able to express this nesting into loading > of plain text files as well. I think that would be very useful. Please verify whether http://projects.scipy.org/numpy/changeset/7022 does the trick! Cheers St?fan From fperez.net at gmail.com Wed May 27 19:41:01 2009 From: fperez.net at gmail.com (Fernando Perez) Date: Wed, 27 May 2009 16:41:01 -0700 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: <9457e7c80905271629x3ef89819s163f7f700966721d@mail.gmail.com> References: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> <9457e7c80905271629x3ef89819s163f7f700966721d@mail.gmail.com> Message-ID: 2009/5/27 St?fan van der Walt : > Hi Fernando > > 2009/5/28 Fernando Perez : >> Well, since dtypes allow for nesting full arrays in this fashion, >> where I can say that the 'block' field can have (2,3) shape, it seems >> like it would be nice to be able to express this nesting into loading >> of plain text files as well. > > I think that would be very useful. ?Please verify whether > > http://projects.scipy.org/numpy/changeset/7022 > > does the trick! beeooteefool! No warnings now: uqbar[recarray]> python rec_nested.py Second pass - data loaded OK. This is great, many thanks :) Cheers, f From charlesr.harris at gmail.com Wed May 27 21:38:21 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Wed, 27 May 2009 19:38:21 -0600 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> References: <1243438280.14931.30.camel@sulfur.loria.fr> <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> Message-ID: On Wed, May 27, 2009 at 1:21 PM, Nicolas Rougier wrote: > > > No, I don't have permission to edit. > Nicolas > You should ask for it then. Email stephan at . The docs are here . Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From faltet at pytables.org Thu May 28 02:44:48 2009 From: faltet at pytables.org (Francesc Alted) Date: Thu, 28 May 2009 08:44:48 +0200 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <4A1BEB3F.2020809@indiana.edu> References: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> <4A1BEB3F.2020809@indiana.edu> Message-ID: <200905280844.48853.faltet@pytables.org> A Tuesday 26 May 2009 15:14:39 Andrew Friedley escrigu?: > David Cournapeau wrote: > > Francesc Alted wrote: > >> Well, it is Andrew who should demonstrate that his measurement is > >> correct, but in principle, 4 cycles/item *should* be feasible when using > >> 8 cores in parallel. > > > > But the 100x speed increase is for one core only unless I misread the > > table. And I should have mentioned that 400 cycles/item for cos is on a > > pentium 4, which has dreadful performances (defective L1). 
On a much > > better core duo extreme something, I get 100 cycles / item (on a 64 bits > > machines, though, and not same compiler, although I guess the libm > > version is what matters the most here). > > > > And let's not forget that there is the python wrapping cost: by doing > > everything in C, I got ~ 200 cycle/cos on the PIV, and ~60 cycles/cos on > > the core 2 duo (for double), using the rdtsc performance counter. All > > this for 1024 items in the array, so very optimistic usecase (everything > > in cache 2 if not 1). > > > > This shows that python wrapping cost is not so high, making the 100x > > claim a bit doubtful without more details on the way to measure speed. > > I appreciate all the discussion this is creating. I wish I could work > on this more right now; I have a big paper deadline coming up June 1 > that I need to focus on. > > Yes, you're reading the table right. I should have been more clear on > what my implementation is doing. It's using SIMD, so performing 4 > cosine's at a time where a libm cosine is only doing one. Also I don't > think libm trancendentals are known for being fast; I'm also likely > gaining performance by using a well-optimized but less accurate > approximation. In fact a little more inspection shows my accuracy > decreases as the input values increase; I will probably need to take a > performance hit to fix this. > > I went and wrote code to use the libm fcos() routine instead of my cos > code. Performance is equivalent to numpy, plus an overhead: > > inp sizes 1024 10240 102400 1024000 3072000 > numpy 0.7282 9.6278 115.5976 993.5738 3017.3680 > > lmcos 1 0.7594 9.7579 116.7135 1039.5783 3156.8371 > lmcos 2 0.5274 5.7885 61.8052 537.8451 1576.2057 > lmcos 4 0.5172 5.1240 40.5018 313.2487 791.9730 > > corepy 1 0.0142 0.0880 0.9566 9.6162 28.4972 > corepy 2 0.0342 0.0754 0.6991 6.1647 15.3545 > corepy 4 0.0596 0.0963 0.5671 4.9499 13.8784 > > > The times I show are in milliseconds; the system used is a dual-socket > dual-core 2ghz opteron. I'm testing at the ufunc level, like this: > > def benchmark(fn, args): > avgtime = 0 > fn(*args) > > for i in xrange(7): > t1 = time.time() > fn(*args) > t2 = time.time() > > tm = t2 - t1 > avgtime += tm > > return avgtime / 7 > > Where fn is a ufunc, ie numpy.cos. So I prime the execution once, then > do 7 timings and take the average. I always appreciate suggestions on > better way to benchmark things. No, that seems good enough. But maybe you can present results in cycles/item. This is a relatively common unit and has the advantage that it does not depend on the frequency of your cores. -- Francesc Alted From david at ar.media.kyoto-u.ac.jp Thu May 28 03:02:12 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 28 May 2009 16:02:12 +0900 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? In-Reply-To: <200905280844.48853.faltet@pytables.org> References: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> <4A1BEB3F.2020809@indiana.edu> <200905280844.48853.faltet@pytables.org> Message-ID: <4A1E36F4.2010209@ar.media.kyoto-u.ac.jp> Francesc Alted wrote: > A Tuesday 26 May 2009 15:14:39 Andrew Friedley escrigu?: > >> David Cournapeau wrote: >> >>> Francesc Alted wrote: >>> >>>> Well, it is Andrew who should demonstrate that his measurement is >>>> correct, but in principle, 4 cycles/item *should* be feasible when using >>>> 8 cores in parallel. >>>> >>> But the 100x speed increase is for one core only unless I misread the >>> table. 
And I should have mentioned that 400 cycles/item for cos is on a >>> pentium 4, which has dreadful performances (defective L1). On a much >>> better core duo extreme something, I get 100 cycles / item (on a 64 bits >>> machines, though, and not same compiler, although I guess the libm >>> version is what matters the most here). >>> >>> And let's not forget that there is the python wrapping cost: by doing >>> everything in C, I got ~ 200 cycle/cos on the PIV, and ~60 cycles/cos on >>> the core 2 duo (for double), using the rdtsc performance counter. All >>> this for 1024 items in the array, so very optimistic usecase (everything >>> in cache 2 if not 1). >>> >>> This shows that python wrapping cost is not so high, making the 100x >>> claim a bit doubtful without more details on the way to measure speed. >>> >> I appreciate all the discussion this is creating. I wish I could work >> on this more right now; I have a big paper deadline coming up June 1 >> that I need to focus on. >> >> Yes, you're reading the table right. I should have been more clear on >> what my implementation is doing. It's using SIMD, so performing 4 >> cosine's at a time where a libm cosine is only doing one. Also I don't >> think libm trancendentals are known for being fast; I'm also likely >> gaining performance by using a well-optimized but less accurate >> approximation. In fact a little more inspection shows my accuracy >> decreases as the input values increase; I will probably need to take a >> performance hit to fix this. >> >> I went and wrote code to use the libm fcos() routine instead of my cos >> code. Performance is equivalent to numpy, plus an overhead: >> >> inp sizes 1024 10240 102400 1024000 3072000 >> numpy 0.7282 9.6278 115.5976 993.5738 3017.3680 >> >> lmcos 1 0.7594 9.7579 116.7135 1039.5783 3156.8371 >> lmcos 2 0.5274 5.7885 61.8052 537.8451 1576.2057 >> lmcos 4 0.5172 5.1240 40.5018 313.2487 791.9730 >> >> corepy 1 0.0142 0.0880 0.9566 9.6162 28.4972 >> corepy 2 0.0342 0.0754 0.6991 6.1647 15.3545 >> corepy 4 0.0596 0.0963 0.5671 4.9499 13.8784 >> >> >> The times I show are in milliseconds; the system used is a dual-socket >> dual-core 2ghz opteron. I'm testing at the ufunc level, like this: >> >> def benchmark(fn, args): >> avgtime = 0 >> fn(*args) >> >> for i in xrange(7): >> t1 = time.time() >> fn(*args) >> t2 = time.time() >> >> tm = t2 - t1 >> avgtime += tm >> >> return avgtime / 7 >> >> Where fn is a ufunc, ie numpy.cos. So I prime the execution once, then >> do 7 timings and take the average. I always appreciate suggestions on >> better way to benchmark things. >> > > No, that seems good enough. But maybe you can present results in cycles/item. > This is a relatively common unit and has the advantage that it does not depend > on the frequency of your cores. > (it seems that I do not receive all emails - I never get the emails from Andrew ?) Concerning the timing: I think generally, you should report the minimum, not the average. The numbers for numpy are strange: 3s to compute 3e6 cos on a 2Ghz core duo (~2000 cycles/item) is very slow. In that sense, taking 20 cycles/item for your optimized version is much more believable, though :) I know the usual libm functions are not super fast, specially if high accuracy is not needed. Music softwares and games usually go away with approximations which are quite fast (.e.g using cos+sin evaluation at the same time), but those are generally unacceptable for scientific usage. 
I think it is critical to always check the result of your implementation, because getting something fast but wrong can waste a lot of your time :) One thing which may be hard to do is correct nan/inf handling. I don't know how SIMD extensions handle this. cheers, David From matthieu.brucher at gmail.com Thu May 28 03:49:31 2009 From: matthieu.brucher at gmail.com (Matthieu Brucher) Date: Thu, 28 May 2009 09:49:31 +0200 Subject: [Numpy-discussion] Failure with 1.3 In-Reply-To: References: Message-ID: > This is the first report. I'll guess it is related to icc. What happens if > you use gcc? Indeed, with gcc4.1, the error isn't there. Matthieu -- Information System Engineer, Ph.D. Website: http://matthieu-brucher.developpez.com/ Blogs: http://matt.eifelle.com and http://blog.developpez.com/?blog=92 LinkedIn: http://www.linkedin.com/in/matthieubrucher From lubensch.proletariat.inc at gmail.com Thu May 28 04:51:09 2009 From: lubensch.proletariat.inc at gmail.com (cp) Date: Thu, 28 May 2009 08:51:09 +0000 (UTC) Subject: [Numpy-discussion] Numpy vs PIL in image statistics References: <3d375d730905271556n3f5b1b9aw551e485a0ab0192@mail.gmail.com> Message-ID: >> The image I tested initially is 2000x2000 RGB tif ~11mb in size. I continued testing, with the initial PIL approach and 3 alternative numpy scripts: #Script 1 - indexing for i in range(10): imarr[:,:,0].mean() imarr[:,:,1].mean() imarr[:,:,2].mean() #Script 2 - slicing for i in range(10): imarr[:,:,0:1].mean() imarr[:,:,1:2].mean() imarr[:,:,2:3].mean() #Script 3 - reshape for i in range(10): imarr.reshape(-1,3).mean(axis=0) #Script 4 - PIL for i in range(10): stats = ImageStat.stat(img) stats.mean After profiling the four scripts separately I got the following script 1: 5.432sec script 2: 10.234sec script 3: 4.980sec script 4: 0.741sec when I profiled scripts 1-3 without calculating the mean, I got similar results of about 0.45sec for 1000 cycles, meaning that even if there is a copy involved the time required is only a small fraction of the whole procedure.Getting back to my initial statement I cannot explain why PIL is very fast in calculations for whole images, but very slow in calculations of small sub-images. From david at ar.media.kyoto-u.ac.jp Thu May 28 04:40:04 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Thu, 28 May 2009 17:40:04 +0900 Subject: [Numpy-discussion] Numpy vs PIL in image statistics In-Reply-To: References: <3d375d730905271556n3f5b1b9aw551e485a0ab0192@mail.gmail.com> Message-ID: <4A1E4DE4.6090200@ar.media.kyoto-u.ac.jp> cp wrote: >>> The image I tested initially is 2000x2000 RGB tif ~11mb in size. 
>>> > I continued testing, with the initial PIL approach > and 3 alternative numpy scripts: > > #Script 1 - indexing > for i in range(10): > imarr[:,:,0].mean() > imarr[:,:,1].mean() > imarr[:,:,2].mean() > > #Script 2 - slicing > for i in range(10): > imarr[:,:,0:1].mean() > imarr[:,:,1:2].mean() > imarr[:,:,2:3].mean() > > #Script 3 - reshape > for i in range(10): > imarr.reshape(-1,3).mean(axis=0) > > #Script 4 - PIL > for i in range(10): > stats = ImageStat.stat(img) > stats.mean > > After profiling the four scripts separately I got the following > script 1: 5.432sec > script 2: 10.234sec > script 3: 4.980sec > script 4: 0.741sec > > when I profiled scripts 1-3 without calculating the mean, I got similar > results of about 0.45sec for 1000 cycles, meaning that even if there > is a copy involved the time required is only a small fraction of the whole > procedure.Getting back to my initial statement I cannot explain why PIL > is very fast in calculations for whole images, but very slow in > calculations of small sub-images. > I don't know anything about PIL and its implementation, but I would not be surprised if the cost is mostly accessing items which are not contiguous in memory and bounds checking ( to check where you are in the subimage). Conditional inside loops often kills performances, and the actual computation (one addition/item for naive average implementation) is negligeable in this case. cheers, David From lubensch.proletariat.inc at gmail.com Thu May 28 05:10:28 2009 From: lubensch.proletariat.inc at gmail.com (cp) Date: Thu, 28 May 2009 09:10:28 +0000 (UTC) Subject: [Numpy-discussion] Numpy vs PIL in image statistics References: <3d375d730905271556n3f5b1b9aw551e485a0ab0192@mail.gmail.com> <4A1E4DE4.6090200@ar.media.kyoto-u.ac.jp> Message-ID: > I don't know anything about PIL and its implementation, but I would not > be surprised if the cost is mostly accessing items which are not > contiguous in memory and bounds checking ( to check where you are in the > subimage). Conditional inside loops often kills performances, and the > actual computation (one addition/item for naive average implementation) > is negligeable in this case. This would definitely be the case in sub-images. However, coming back to my original question, how do you explain that although PIL is extremely fast in large images (2000x2000), numpy is much faster when it comes down to very small images (I tested with 10x10 image files)? From stefan at sun.ac.za Thu May 28 05:21:32 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Thu, 28 May 2009 11:21:32 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> References: <1243438280.14931.30.camel@sulfur.loria.fr> <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> Message-ID: <9457e7c80905280221p75b8d3d6m9fdf96e908e2b5ff@mail.gmail.com> Hi Nicolas 2009/5/27 Nicolas Rougier : > No, I don't have permission to edit. Thanks for helping out with the docs! Please create an account on docs.scipy.org and give me a shout when you're done. Cheers St?fan From afriedle at indiana.edu Thu May 28 08:34:11 2009 From: afriedle at indiana.edu (Andrew Friedley) Date: Thu, 28 May 2009 08:34:11 -0400 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? 
In-Reply-To: <4A1E36F4.2010209@ar.media.kyoto-u.ac.jp> References: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> <4A1BEB3F.2020809@indiana.edu> <200905280844.48853.faltet@pytables.org> <4A1E36F4.2010209@ar.media.kyoto-u.ac.jp> Message-ID: <4A1E84C3.1040909@indiana.edu> David Cournapeau wrote: > Francesc Alted wrote: >> No, that seems good enough. But maybe you can present results in cycles/item. >> This is a relatively common unit and has the advantage that it does not depend >> on the frequency of your cores. Sure, cycles is fine, but I'll argue that in this case the number still does depend on the frequency of the cores, particularly as it relates to the frequency of the memory bus/controllers. A processor with a higher clock rate and higher multiplier may show lower performance when measuring in cycles because the memory bandwidth has not necessarily increased, only the CPU clock rate. Plus between say a xeon and opteron you will have different SSE performance characteristics. So really, any sole number/unit is not sufficient without also describing the system it was obtained on :) > (it seems that I do not receive all emails - I never get the emails from > Andrew ?) I seem to have issues with my emails just disappearing; sometimes they never appear on the list and I have to re-send them. > Concerning the timing: I think generally, you should report the minimum, > not the average. The numbers for numpy are strange: 3s to compute 3e6 > cos on a 2Ghz core duo (~2000 cycles/item) is very slow. In that sense, > taking 20 cycles/item for your optimized version is much more > believable, though :) I can do minimum. My motivation for average was to show a common-case performance an application might see. If that application executes the ufunc many times, the performance will tend towards the average. > I know the usual libm functions are not super fast, specially if high > accuracy is not needed. Music softwares and games usually go away with > approximations which are quite fast (.e.g using cos+sin evaluation at > the same time), but those are generally unacceptable for scientific > usage. I think it is critical to always check the result of your > implementation, because getting something fast but wrong can waste a lot > of your time :) One thing which may be hard to do is correct nan/inf > handling. I don't know how SIMD extensions handle this. I was waiting for someone to bring this up :) I used an implementation that I'm now thinking is not accurate enough for scientific use. But the question is, what is a concrete measure for determining whether some cosine (or other function) implementation is accurate enough? I guess we have precedent in the form of libm's implementation/accuracy tradeoffs, but is that precedent correct? Really answering that question, and coming up with the best possible implementations that meet the requirements, is probably a GSoC project on its own. 
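For concreteness, a small self-contained sketch of both points raised in this thread: take the minimum over repetitions when timing, and attach a concrete accuracy figure to any fast approximation by comparing it against numpy.cos. The fast_cos below is a deliberately crude truncated-Taylor stand-in, not the CorePy implementation:

import time
import numpy as np

def fast_cos(x):
    # crude 4th-order Taylor approximation after range reduction;
    # only a placeholder for a real SIMD/approximate implementation
    x = np.mod(x + np.pi, 2*np.pi) - np.pi
    x2 = x*x
    return 1 - x2/2 + x2*x2/24

def benchmark(fn, args, reps=7):
    fn(*args)                      # prime the first call
    times = []
    for i in xrange(reps):
        t1 = time.time()
        fn(*args)
        times.append(time.time() - t1)
    return min(times)              # report the minimum, not the average

x = np.linspace(0, 2*np.pi, 1024000)
print 'numpy.cos:', benchmark(np.cos, (x,))
print 'fast_cos :', benchmark(fast_cos, (x,))

# accuracy figure relative to the libm-backed reference
err = np.abs(fast_cos(x) - np.cos(x))
print 'max abs error:', err.max()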
Andrew From Nicolas.Rougier at loria.fr Thu May 28 10:14:12 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Thu, 28 May 2009 16:14:12 +0200 Subject: [Numpy-discussion] sparse matrix dot product Message-ID: <1243520052.14931.36.camel@sulfur.loria.fr> Hi, I'm now testing dot product and using the following: import numpy as np, scipy.sparse as sp A = np.matrix(np.zeros((5,10))) B = np.zeros((10,1)) print (A*B).shape print np.dot(A,B).shape A = sp.csr_matrix(np.zeros((5,10))) B = sp.csr_matrix((10,1)) print (A*B).shape print np.dot(A,B).shape A = sp.csr_matrix(np.zeros((5,10))) B = np.zeros((10,1)) print (A*B).shape print np.dot(A,B).shape I got: (5, 1) (5, 1) (5, 1) (5, 1) (5, 1) (10, 1) Obviously, the last computation is not a dot product, but I got no warning at all. Is that the expected behavior ? By the way, I wrote a speed benchmark for dot product using the different flavors of sparse matrices and I wonder if it should go somewhere in documentation (in anycase, if anyone interested, I can post the benchmark program and result). Nicolas From Nicolas.Rougier at loria.fr Thu May 28 10:14:57 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Thu, 28 May 2009 16:14:57 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <9457e7c80905280221p75b8d3d6m9fdf96e908e2b5ff@mail.gmail.com> References: <1243438280.14931.30.camel@sulfur.loria.fr> <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> <9457e7c80905280221p75b8d3d6m9fdf96e908e2b5ff@mail.gmail.com> Message-ID: <1243520097.14931.37.camel@sulfur.loria.fr> I just created the account. Nicolas On Thu, 2009-05-28 at 11:21 +0200, St?fan van der Walt wrote: > Hi Nicolas > > 2009/5/27 Nicolas Rougier : > > No, I don't have permission to edit. > > Thanks for helping out with the docs! Please create an account on > docs.scipy.org and give me a shout when you're done. > > Cheers > St?fan > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion From sebastian.walter at gmail.com Thu May 28 10:26:28 2009 From: sebastian.walter at gmail.com (Sebastian Walter) Date: Thu, 28 May 2009 16:26:28 +0200 Subject: [Numpy-discussion] sparse matrix dot product In-Reply-To: <1243520052.14931.36.camel@sulfur.loria.fr> References: <1243520052.14931.36.camel@sulfur.loria.fr> Message-ID: I'd be interested to see the benchmark ;) On Thu, May 28, 2009 at 4:14 PM, Nicolas Rougier wrote: > > Hi, > > I'm now testing dot product and using the following: > > import numpy as np, scipy.sparse as sp > > A = np.matrix(np.zeros((5,10))) > B = np.zeros((10,1)) > print (A*B).shape > print np.dot(A,B).shape > > A = sp.csr_matrix(np.zeros((5,10))) > B = sp.csr_matrix((10,1)) > print (A*B).shape > print np.dot(A,B).shape > > A = sp.csr_matrix(np.zeros((5,10))) > B = np.zeros((10,1)) > print (A*B).shape > print np.dot(A,B).shape > > > I got: > > (5, 1) > (5, 1) > (5, 1) > (5, 1) > (5, 1) > (10, 1) > > > Obviously, the last computation is not a dot product, but I got no > warning at all. Is that the expected behavior ? > > > By the way, I wrote a speed benchmark for dot product using the > different flavors of sparse matrices and I wonder if it should go > somewhere in documentation (in anycase, if anyone interested, I can post > the benchmark program and result). 
> > Nicolas > > > > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > From david.froger.info at gmail.com Thu May 28 10:32:49 2009 From: david.froger.info at gmail.com (David Froger) Date: Thu, 28 May 2009 16:32:49 +0200 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> Message-ID: Hy Neil Martinsen-Burrell, I'm trying the FortranFile class, http://www.scipy.org/Cookbook/FortranIO/FortranFile It looks like there are some bug in the last revision (7): * There are errors cause by lines 60,61,63 in * There are indentation errors on lines 97 and 113. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rogpeppe at gmail.com Thu May 28 10:39:01 2009 From: rogpeppe at gmail.com (roger peppe) Date: Thu, 28 May 2009 15:39:01 +0100 Subject: [Numpy-discussion] convert between structure arrays with different record orderings? Message-ID: hi, sorry, i'm new to the list, and if this is a frequently asked question, please point me in the right direction. say, for some reason i've got two numpy structure arrays that both contain the same fields with the same types but in a different order, is there a simple way to convert one to the other type? i'd have thought that astype() might do the job, but it seems to ignore the field names entirely, thus values are lost in the following transcript: > t1 = np.dtype([('foo', np.float64), ('bar', '?')]) > t2 = np.dtype([('bar', '?'), ('foo', np.float64)]) > a1 = np.array([(123, False), (654, True)], dtype=t1) > a1.astype(t2) array([(True, 0.0), (True, 1.0)], dtype=[('bar', '|b1'), ('foo', ' thanks for any help, rog. From nmb at wartburg.edu Thu May 28 12:27:49 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Thu, 28 May 2009 11:27:49 -0500 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> Message-ID: <4A1EBB85.5080706@wartburg.edu> On 2009-05-28 09:32 , David Froger wrote: > Hy Neil Martinsen-Burrell, > > I'm trying the FortranFile class, > http://www.scipy.org/Cookbook/FortranIO/FortranFile > > It looks like there are some bug in the last revision (7): > > * There are errors cause by lines 60,61,63 in > > * There are indentation errors on lines 97 and 113. There seem to have been some problems in putting the file on the wiki ("Proxy-Connection: keep-alive\nCache-Control: max-age=0" seems to come from an HTML communication). I've attached my current version of the file to this email. Let me know if you have problems with this. I will try to get the working version up on the wiki. Peace, -Neil -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... 
Name: fortranfile.py URL: From david.froger.info at gmail.com Thu May 28 13:11:31 2009 From: david.froger.info at gmail.com (David Froger) Date: Thu, 28 May 2009 19:11:31 +0200 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: <4A1EBB85.5080706@wartburg.edu> References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> Message-ID: Thank you very much :-) 2009/5/28 Neil Martinsen-Burrell > On 2009-05-28 09:32 , David Froger wrote: > >> Hy Neil Martinsen-Burrell, >> >> I'm trying the FortranFile class, >> http://www.scipy.org/Cookbook/FortranIO/FortranFile >> >> It looks like there are some bug in the last revision (7): >> >> * There are errors cause by lines 60,61,63 in >> >> * There are indentation errors on lines 97 and 113. >> > > There seem to have been some problems in putting the file on the wiki > ("Proxy-Connection: keep-alive\nCache-Control: max-age=0" seems to come from > an HTML communication). I've attached my current version of the file to > this email. Let me know if you have problems with this. I will try to get > the working version up on the wiki. Peace, > > -Neil > > _______________________________________________ > Numpy-discussion mailing list > Numpy-discussion at scipy.org > http://mail.scipy.org/mailman/listinfo/numpy-discussion > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nmb at wartburg.edu Thu May 28 13:13:04 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Thu, 28 May 2009 12:13:04 -0500 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> Message-ID: <4A1EC620.3070407@wartburg.edu> On 2009-05-28 12:11 , David Froger wrote: > Thank you very much :-) Things should be cleared up now on the wiki as well. Peace, -Neil From faltet at pytables.org Thu May 28 13:25:42 2009 From: faltet at pytables.org (Francesc Alted) Date: Thu, 28 May 2009 19:25:42 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <1243438280.14931.30.camel@sulfur.loria.fr> References: <1243438280.14931.30.camel@sulfur.loria.fr> Message-ID: <200905281925.42827.faltet@pytables.org> A Wednesday 27 May 2009 17:31:20 Nicolas Rougier escrigu?: > Hi, > > I've written a very simple benchmark on recarrays: > > import numpy, time > > Z = numpy.zeros((100,100), dtype=numpy.float64) > Z_fast = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.int32)]) > Z_slow = numpy.zeros((100,100), dtype=[('x',numpy.float64), > ('y',numpy.bool)]) > > t = time.clock() > for i in range(10000): Z*Z > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_fast['x']*Z_fast['x'] > print time.clock()-t > > t = time.clock() > for i in range(10000): Z_slow['x']*Z_slow['x'] > print time.clock()-t > > > And got the following results: > 0.23 > 0.37 > 3.96 > > Am I right in thinking that the last case is quite slow because of some > memory misalignment between float64 and bool or is there some machinery > behind that makes things slow in this case ? Should this be mentioned > somewhere in the recarray documentation ? Yes, I can reproduce your results, and I must admit that a 10x slowdown is a lot. However, I think that this affects mostly to small record arrays (i.e. 
those that fit in CPU cache), and mainly in benchmarks (precisely because they fit well in cache). You can simulate a more real-life scenario by defining a large recarray that do not fit in CPU's cache. For example: In [17]: Z = np.zeros((1000,1000), dtype=np.float64) # 8 MB object In [18]: Z_fast = np.zeros((1000,1000), dtype=[('x',np.float64), ('y',np.int64)]) # 16 MB object In [19]: Z_slow = np.zeros((1000,1000), dtype=[('x',np.float64), ('y',np.bool)]) # 9 MB object In [20]: x_fast = Z_fast['x'] In [21]: timeit x_fast * x_fast 100 loops, best of 3: 5.48 ms per loop In [22]: x_slow = Z_slow['x'] In [23]: timeit x_slow * x_slow 100 loops, best of 3: 14.4 ms per loop So, the slowdown is less than 3x, which is a more reasonable figure. If you need optimal speed for operating with unaligned columns, you can use numexpr. Here it is an example of what you can expect from it: In [24]: import numexpr as nx In [25]: timeit nx.evaluate('x_slow * x_slow') 100 loops, best of 3: 11.1 ms per loop So, the slowdown is just 2x instead of 3x, which is near optimal for the unaligned case. Numexpr also seems to help for small recarrays that fits in cache (i.e. for benchmarking purposes ;) : # Create a 160 KB object In [26]: Z_fast = np.zeros((100,100), dtype=[('x',np.float64),('y',np.int64)]) # Create a 110 KB object In [27]: Z_slow = np.zeros((100,100), dtype=[('x',np.float64),('y',np.bool)]) In [28]: x_fast = Z_fast['x'] In [29]: timeit x_fast * x_fast 10000 loops, best of 3: 20.7 ?s per loop In [30]: x_slow = Z_slow['x'] In [31]: timeit x_slow * x_slow 10000 loops, best of 3: 149 ?s per loop In [32]: timeit nx.evaluate('x_slow * x_slow') 10000 loops, best of 3: 45.3 ?s per loop Hope that helps, -- Francesc Alted From david.froger.info at gmail.com Thu May 28 13:48:51 2009 From: david.froger.info at gmail.com (David Froger) Date: Thu, 28 May 2009 19:48:51 +0200 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> Message-ID: Sorry, I still don't understand how to use FortranFile ... ============ The fortran code ============ program writeArray implicit none integer,parameter:: nx=2,ny=5 real(4),dimension(nx,ny):: ux,uy,p integer :: i,j do i = 1,nx do j = 1,ny ux(i,j) = 100. + j+(i-1.)*10. uy(i,j) = 200. + j+(i-1.)*10. p(i,j) = 300. + j+(i-1.)*10. enddo enddo open(11,file='uxuyp.bin',form='unformatted') write(11) ux,uy write(11) p close(11) end program writeArray ============= The Python script ============= from fortranfile import FortranFile f = FortranFile('uxuyp.bin') x = f.readReals() ============= The output ============= Traceback (most recent call last): File "readArray.py", line 5, in x = f.readReals() File "/home/users/redone/file2/froger/travail/codes/lib/Tests/fortranread/fortranfile.py", line 181, in readReals data_str = self.readRecord() File "/home/users/redone/file2/froger/travail/codes/lib/Tests/fortranread/fortranfile.py", line 128, in readRecord raise IOError('Could not read enough data') IOError: Could not read enough dat => How to read the file 'uxuyp.bin' ? 
2009/5/28 David Froger > Thank you very much :-) > > 2009/5/28 Neil Martinsen-Burrell > >> On 2009-05-28 09:32 , David Froger wrote: >> >>> Hy Neil Martinsen-Burrell, >>> >>> I'm trying the FortranFile class, >>> http://www.scipy.org/Cookbook/FortranIO/FortranFile >>> >>> It looks like there are some bug in the last revision (7): >>> >>> * There are errors cause by lines 60,61,63 in >>> >>> * There are indentation errors on lines 97 and 113. >>> >> >> There seem to have been some problems in putting the file on the wiki >> ("Proxy-Connection: keep-alive\nCache-Control: max-age=0" seems to come from >> an HTML communication). I've attached my current version of the file to >> this email. Let me know if you have problems with this. I will try to get >> the working version up on the wiki. Peace, >> >> -Neil >> >> _______________________________________________ >> Numpy-discussion mailing list >> Numpy-discussion at scipy.org >> http://mail.scipy.org/mailman/listinfo/numpy-discussion >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wnbell at gmail.com Thu May 28 16:45:01 2009 From: wnbell at gmail.com (Nathan Bell) Date: Thu, 28 May 2009 16:45:01 -0400 Subject: [Numpy-discussion] sparse matrix dot product In-Reply-To: <1243520052.14931.36.camel@sulfur.loria.fr> References: <1243520052.14931.36.camel@sulfur.loria.fr> Message-ID: On Thu, May 28, 2009 at 10:14 AM, Nicolas Rougier wrote: > > Obviously, the last computation is not a dot product, but I got no > warning at all. Is that the expected behavior ? > Sparse matrices make no attempt to work with numpy functions like dot(), so I'm not sure what is happening there. > > By the way, I wrote a speed benchmark for dot product using the > different flavors of sparse matrices and I wonder if it should go > somewhere in documentation (in anycase, if anyone interested, I can post > the benchmark program and result). > Sparse dot products will ultimately map to sparse matrix multiplication, so I'd imagine your best bet is to use A.T * B (for column matrices A and B in csc_matrix format). -- Nathan Bell wnbell at gmail.com http://graphics.cs.uiuc.edu/~wnbell/ From pgmdevlist at gmail.com Thu May 28 18:15:44 2009 From: pgmdevlist at gmail.com (Pierre GM) Date: Thu, 28 May 2009 18:15:44 -0400 Subject: [Numpy-discussion] Nested recarrays with subarrays and loadtxt: a bug in loadtxt? In-Reply-To: <9457e7c80905271629x3ef89819s163f7f700966721d@mail.gmail.com> References: <9457e7c80905271557l4eb5e413t966d9f6beee05538@mail.gmail.com> <9457e7c80905271629x3ef89819s163f7f700966721d@mail.gmail.com> Message-ID: On May 27, 2009, at 7:29 PM, St?fan van der Walt wrote: > Hi Fernando > > 2009/5/28 Fernando Perez : >> Well, since dtypes allow for nesting full arrays in this fashion, >> where I can say that the 'block' field can have (2,3) shape, it seems >> like it would be nice to be able to express this nesting into loading >> of plain text files as well. > > I think that would be very useful. Please verify whether > > http://projects.scipy.org/numpy/changeset/7022 I fixed it for genfromtxt as well (r7023). Should we backport the changes ? From david at ar.media.kyoto-u.ac.jp Thu May 28 21:56:15 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Fri, 29 May 2009 10:56:15 +0900 Subject: [Numpy-discussion] numpy ufuncs and COREPY - any info? 
In-Reply-To: <4A1E84C3.1040909@indiana.edu> References: <4A1B932C.30504@ar.media.kyoto-u.ac.jp> <4A1BEB3F.2020809@indiana.edu> <200905280844.48853.faltet@pytables.org> <4A1E36F4.2010209@ar.media.kyoto-u.ac.jp> <4A1E84C3.1040909@indiana.edu> Message-ID: <4A1F40BF.8080807@ar.media.kyoto-u.ac.jp> Andrew Friedley wrote: > David Cournapeau wrote: > >> Francesc Alted wrote: >> >>> No, that seems good enough. But maybe you can present results in cycles/item. >>> This is a relatively common unit and has the advantage that it does not depend >>> on the frequency of your cores. >>> > > Sure, cycles is fine, but I'll argue that in this case the number still > does depend on the frequency of the cores, particularly as it relates to > the frequency of the memory bus/controllers. A processor with a higher > clock rate and higher multiplier may show lower performance when > measuring in cycles because the memory bandwidth has not necessarily > increased, only the CPU clock rate. Plus between say a xeon and opteron > you will have different SSE performance characteristics. So really, any > sole number/unit is not sufficient without also describing the system it > was obtained on :) > Yes, that's why people usually add the CPU type with the cycles/operation count :) It makes comparison easier. Sure, the comparison is not accurate because differences in CPU may make a difference. But with cycles/computation, we could see right away that something was strange with the numpy timing, so I think it is a better representation for discussion/comoparison. > > I can do minimum. My motivation for average was to show a common-case > performance an application might see. If that application executes the > ufunc many times, the performance will tend towards the average. > The rationale for minimum is to remove external factors like other tasks taking CPU, etc... > > I was waiting for someone to bring this up :) I used an implementation > that I'm now thinking is not accurate enough for scientific use. But > the question is, what is a concrete measure for determining whether some > cosine (or other function) implementation is accurate enough? Nan/inf/zero handling should be tested for every function (the exact behavior for standard functions is part of the C standard), and then, the particular values depend on the function and implementation. If your implementation has several codepath, each codepath should be tested. But really, most implementations just test for a few more or less random known values. I know the GNU libc has some tests for the math library, for example. For single precision, brute force testing against a reference implementation for every possible input is actually feasible, too :) David From stefan at sun.ac.za Fri May 29 03:14:15 2009 From: stefan at sun.ac.za (=?ISO-8859-1?Q?St=E9fan_van_der_Walt?=) Date: Fri, 29 May 2009 09:14:15 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <1243520097.14931.37.camel@sulfur.loria.fr> References: <1243438280.14931.30.camel@sulfur.loria.fr> <90DEC74F-9C1A-4E12-8416-B16827934A1F@loria.fr> <9457e7c80905280221p75b8d3d6m9fdf96e908e2b5ff@mail.gmail.com> <1243520097.14931.37.camel@sulfur.loria.fr> Message-ID: <9457e7c80905290014q700fa0acm34826a1e22c896a1@mail.gmail.com> Thanks, Nicolas. Your username has been changed to "NicolasRougier" and you can now edit the docs. Regards St?fan 2009/5/28 Nicolas Rougier : > > > I just created the account. 
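Coming back to the brute-force accuracy test mentioned above: a rough sketch of sweeping every finite float32 input and comparing a candidate cosine against numpy's double-precision cos as the reference (the function name and chunk size are arbitrary; this measures maximum absolute error rather than ulps, and it takes a while in pure numpy):

import numpy as np

def brute_force_max_err(fast_cos, chunk=1 << 22):
    # sweep all ~4.3e9 float32 bit patterns in chunks, skip inf/nan inputs,
    # and compare against numpy's float64 cos as the reference
    worst = 0.0
    for start in range(0, 1 << 32, chunk):
        bits = np.arange(start, start + chunk, dtype=np.int64).astype(np.uint32)
        x = bits.view(np.float32)
        x = x[np.isfinite(x)]
        if x.size == 0:
            continue
        ref = np.cos(x.astype(np.float64))
        worst = max(worst, float(np.max(np.abs(fast_cos(x) - ref))))
    return worst

# e.g. measure numpy's own single-precision cos against its double-precision one
print(brute_force_max_err(np.cos))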
> > Nicolas From dagss at student.matnat.uio.no Fri May 29 04:36:41 2009 From: dagss at student.matnat.uio.no (Dag Sverre Seljebotn) Date: Fri, 29 May 2009 10:36:41 +0200 Subject: [Numpy-discussion] Numpy vs PIL in image statistics In-Reply-To: References: <3d375d730905271556n3f5b1b9aw551e485a0ab0192@mail.gmail.com> <4A1E4DE4.6090200@ar.media.kyoto-u.ac.jp> Message-ID: <4A1F9E99.6040206@student.matnat.uio.no> cp wrote: >> I don't know anything about PIL and its implementation, but I would not >> be surprised if the cost is mostly accessing items which are not >> contiguous in memory and bounds checking ( to check where you are in the >> subimage). Conditional inside loops often kills performances, and the >> actual computation (one addition/item for naive average implementation) >> is negligeable in this case. >> > > This would definitely be the case in sub-images. However, coming back to my > original question, how do you explain that although PIL is extremely fast in > large images (2000x2000), numpy is much faster when it comes down to very small > images (I tested with 10x10 image files)? > I've heard rumors that PIL stores it's images as pointers-to-rows (or cols), i.e. to access a new row you need to dereference a pointer. NumPy on the other hand always stores its memory in blocks. When N grows larger, the N pointer lookups needed in PIL doesn't matter, but they do for low N. Dag Sverre From Nicolas.Rougier at loria.fr Fri May 29 07:52:16 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Fri, 29 May 2009 13:52:16 +0200 Subject: [Numpy-discussion] sparse matrix dot product In-Reply-To: References: <1243520052.14931.36.camel@sulfur.loria.fr> Message-ID: <1243597936.14931.66.camel@sulfur.loria.fr> Hi, I tried to post results but the file is too big, anyway, here is the benchmark program if you want to run it: Nicolas ----- import time import numpy from scipy import sparse def benchmark(xtype = 'numpy.array', xdensity = 0.1, ytype = 'numpy.array', ydensity = 1.0, n = 1000): x = numpy.zeros((n,n), dtype = numpy.float64) xi = int(n*n*xdensity) x.reshape(n*n)[0:xi] = numpy.random.random((xi,)) y = numpy.zeros((n,1), dtype = numpy.float64) yi = int(n*ydensity) y.reshape(n)[0:yi] = numpy.random.random((yi,)) x = eval('%s(x)' % xtype) y = eval('%s(y)' % ytype) t0 = time.clock() if xtype == 'numpy.array' and ytype == 'numpy.array': for i in range(1000): z = numpy.dot(x,y) else: for i in range(1000): z = x*y tf = time.clock() - t0 text = '' text += (xtype + ' '*20)[0:20] text += (ytype + ' '*20)[0:20] text += '%4dx%4d %4dx%4d %.2f %.2f %.2f' % (n,n,n,1,xdensity, ydensity, tf) return text xtypes = ['numpy.array', 'numpy.matrix', 'sparse.lil_matrix', 'sparse.csr_matrix', 'sparse.csc_matrix'] ytypes = ['numpy.array', 'numpy.matrix', 'sparse.lil_matrix', 'sparse.csr_matrix', 'sparse.csc_matrix'] xdensities = [0.01, 0.10, 0.25, 0.50, 1.00] ydensities = [1.00] print '=================== =================== =========== =========== =========== =========== =======' print 'X type Y type X size Y size X density Y density Time ' print '------------------- ------------------- ----------- ----------- ----------- ----------- -------' n = 100 for xdensity in xdensities: for ydensity in ydensities: for xtype in xtypes: for ytype in ytypes: print benchmark(xtype, xdensity, ytype, ydensity, n) print '------------------- ------------------- ----------- ----------- ----------- ----------- -------' n = 1000 for xdensity in xdensities: for ydensity in ydensities: for xtype in xtypes: for ytype in ytypes: 
print benchmark(xtype, xdensity, ytype, ydensity, n) print '------------------- ------------------- ----------- ----------- ----------- ----------- -------' print '=================== =================== =========== =========== =========== =========== =======' -------------- next part -------------- An HTML attachment was scrubbed... URL: From Nicolas.Rougier at loria.fr Fri May 29 07:53:12 2009 From: Nicolas.Rougier at loria.fr (Nicolas Rougier) Date: Fri, 29 May 2009 13:53:12 +0200 Subject: [Numpy-discussion] Benchmak on record arrays In-Reply-To: <200905281925.42827.faltet@pytables.org> References: <1243438280.14931.30.camel@sulfur.loria.fr> <200905281925.42827.faltet@pytables.org> Message-ID: <1243597992.14931.67.camel@sulfur.loria.fr> Thank for the clear answer, it definitely helps. Nicolas On Thu, 2009-05-28 at 19:25 +0200, Francesc Alted wrote: > A Wednesday 27 May 2009 17:31:20 Nicolas Rougier escrigu?: > > Hi, > > > > I've written a very simple benchmark on recarrays: > > > > import numpy, time > > > > Z = numpy.zeros((100,100), dtype=numpy.float64) > > Z_fast = numpy.zeros((100,100), dtype=[('x',numpy.float64), > > ('y',numpy.int32)]) > > Z_slow = numpy.zeros((100,100), dtype=[('x',numpy.float64), > > ('y',numpy.bool)]) > > > > t = time.clock() > > for i in range(10000): Z*Z > > print time.clock()-t > > > > t = time.clock() > > for i in range(10000): Z_fast['x']*Z_fast['x'] > > print time.clock()-t > > > > t = time.clock() > > for i in range(10000): Z_slow['x']*Z_slow['x'] > > print time.clock()-t > > > > > > And got the following results: > > 0.23 > > 0.37 > > 3.96 > > > > Am I right in thinking that the last case is quite slow because of some > > memory misalignment between float64 and bool or is there some machinery > > behind that makes things slow in this case ? Should this be mentioned > > somewhere in the recarray documentation ? > > Yes, I can reproduce your results, and I must admit that a 10x slowdown is a > lot. However, I think that this affects mostly to small record arrays (i.e. > those that fit in CPU cache), and mainly in benchmarks (precisely because they > fit well in cache). You can simulate a more real-life scenario by defining a > large recarray that do not fit in CPU's cache. For example: > > In [17]: Z = np.zeros((1000,1000), dtype=np.float64) # 8 MB object > > In [18]: Z_fast = np.zeros((1000,1000), dtype=[('x',np.float64), > ('y',np.int64)]) # 16 MB object > > In [19]: Z_slow = np.zeros((1000,1000), dtype=[('x',np.float64), > ('y',np.bool)]) # 9 MB object > > In [20]: x_fast = Z_fast['x'] > In [21]: timeit x_fast * x_fast > 100 loops, best of 3: 5.48 ms per loop > > In [22]: x_slow = Z_slow['x'] > > In [23]: timeit x_slow * x_slow > 100 loops, best of 3: 14.4 ms per loop > > So, the slowdown is less than 3x, which is a more reasonable figure. If you > need optimal speed for operating with unaligned columns, you can use numexpr. > Here it is an example of what you can expect from it: > > In [24]: import numexpr as nx > > In [25]: timeit nx.evaluate('x_slow * x_slow') > 100 loops, best of 3: 11.1 ms per loop > > So, the slowdown is just 2x instead of 3x, which is near optimal for the > unaligned case. > > Numexpr also seems to help for small recarrays that fits in cache (i.e. 
for > benchmarking purposes ;) : > > # Create a 160 KB object > In [26]: Z_fast = np.zeros((100,100), dtype=[('x',np.float64),('y',np.int64)]) > # Create a 110 KB object > In [27]: Z_slow = np.zeros((100,100), dtype=[('x',np.float64),('y',np.bool)]) > > In [28]: x_fast = Z_fast['x'] > > In [29]: timeit x_fast * x_fast > 10000 loops, best of 3: 20.7 ?s per loop > > In [30]: x_slow = Z_slow['x'] > > In [31]: timeit x_slow * x_slow > 10000 loops, best of 3: 149 ?s per loop > > In [32]: timeit nx.evaluate('x_slow * x_slow') > 10000 loops, best of 3: 45.3 ?s per loop > > Hope that helps, > From david.froger.info at gmail.com Fri May 29 11:12:06 2009 From: david.froger.info at gmail.com (David Froger) Date: Fri, 29 May 2009 17:12:06 +0200 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> Message-ID: I think the FortranFile class is not intended to read arrays written with the syntax 'write(11) array1, array2, array3' (correct me if I'm wrong). This is the use in the laboratory where I'm currently completing a phd. I'm going to dive into struc, FotranFile etc.. to propose something convenient for people who have to read unformatted binary fortran file very often. 2009/5/28 David Froger > Sorry, I still don't understand how to use FortranFile ... > > ============ > The fortran code > ============ > > program writeArray > > implicit none > integer,parameter:: nx=2,ny=5 > real(4),dimension(nx,ny):: ux,uy,p > integer :: i,j > > do i = 1,nx > do j = 1,ny > ux(i,j) = 100. + j+(i-1.)*10. > uy(i,j) = 200. + j+(i-1.)*10. > p(i,j) = 300. + j+(i-1.)*10. > enddo > enddo > > open(11,file='uxuyp.bin',form='unformatted') > write(11) ux,uy > write(11) p > close(11) > > end program writeArray > > ============= > The Python script > ============= > > from fortranfile import FortranFile > > f = FortranFile('uxuyp.bin') > > x = f.readReals() > > ============= > The output > ============= > > Traceback (most recent call last): > File "readArray.py", line 5, in > x = f.readReals() > File > "/home/users/redone/file2/froger/travail/codes/lib/Tests/fortranread/fortranfile.py", > line 181, in readReals > data_str = self.readRecord() > File > "/home/users/redone/file2/froger/travail/codes/lib/Tests/fortranread/fortranfile.py", > line 128, in readRecord > raise IOError('Could not read enough data') > IOError: Could not read enough dat > > > => How to read the file 'uxuyp.bin' ? > > > 2009/5/28 David Froger > > Thank you very much :-) >> >> 2009/5/28 Neil Martinsen-Burrell >> >>> On 2009-05-28 09:32 , David Froger wrote: >>> >>>> Hy Neil Martinsen-Burrell, >>>> >>>> I'm trying the FortranFile class, >>>> http://www.scipy.org/Cookbook/FortranIO/FortranFile >>>> >>>> It looks like there are some bug in the last revision (7): >>>> >>>> * There are errors cause by lines 60,61,63 in >>>> >>>> * There are indentation errors on lines 97 and 113. >>>> >>> >>> There seem to have been some problems in putting the file on the wiki >>> ("Proxy-Connection: keep-alive\nCache-Control: max-age=0" seems to come from >>> an HTML communication). I've attached my current version of the file to >>> this email. Let me know if you have problems with this. I will try to get >>> the working version up on the wiki. 
Peace, >>> >>> -Neil >>> >>> _______________________________________________ >>> Numpy-discussion mailing list >>> Numpy-discussion at scipy.org >>> http://mail.scipy.org/mailman/listinfo/numpy-discussion >>> >>> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dineshbvadhia at hotmail.com Fri May 29 19:02:06 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Fri, 29 May 2009 16:02:06 -0700 Subject: [Numpy-discussion] Installation problem with Numpy 1.3 on Windows AMD64 Message-ID: Hi! I just upgraded to Python 2.6.2 (from 2.5) on Windows AMD64 in order to use Numpy 1.3 for AMD64 and got the following error: - pythonw.exe has stopped working Numpy was installed both per-machine and per-user but the error persists. Python 2.6.2 works without Numpy. Any ideas? Dinesh -------------- next part -------------- An HTML attachment was scrubbed... URL: From nmb at wartburg.edu Fri May 29 22:55:50 2009 From: nmb at wartburg.edu (Neil Martinsen-Burrell) Date: Fri, 29 May 2009 21:55:50 -0500 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: References: <6df541a5c8d6ecf26b6e38f404958401.squirrel@webmail.uio.no> <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> Message-ID: <4A20A036.3030804@wartburg.edu> On 2009-05-29 10:12 , David Froger wrote: > I think the FortranFile class is not intended to read arrays written > with the syntax 'write(11) array1, array2, array3' (correct me if I'm > wrong). This is the use in the laboratory where I'm currently > completing a phd. You're half wrong. FortranFile can read arrays written as above, but it sees them as a single real array. So, with the attached Fortran program:: In [1]: from fortranfile import FortranFile In [2]: f = FortranFile('uxuyp.bin', endian='<') # Original bug was incorrect byte order In [3]: u = f.readReals() In [4]: u.shape Out[4]: (20,) In [5]: u Out[5]: array([ 101., 111., 102., 112., 103., 113., 104., 114., 105., 115., 201., 211., 202., 212., 203., 213., 204., 214., 205., 215.], dtype=float32) In [6]: ux = u[:10].reshape(2,5); uy = u[10:].reshape(2,5) In [7]: p = f.readReals().reshape(2,5) In [8]: ux, uy, p Out[8]: (array([[ 101., 111., 102., 112., 103.], [ 113., 104., 114., 105., 115.]], dtype=float32), array([[ 201., 211., 202., 212., 203.], [ 213., 204., 214., 205., 215.]], dtype=float32), array([[ 301., 311., 302., 312., 303.], [ 313., 304., 314., 305., 315.]], dtype=float32)) What doesn't currently work is to have arrays of mixed types in the same write statement, e.g. integer :: index(10) real :: x(10,10) ... write(13) x, index To address the original problem, I've changed the code to default to the native byte-ordering (f.ENDIAN='@') and to be more informative about what happened in the error. In the latest version (attached): In [1]: from fortranfile import FortranFile In [2]: f = FortranFile('uxuyp.bin', endian='>') # incorrect endian-ness In [3]: u = f.readReals() IOError: Could not read enough data. Wanted 1342177280 bytes, got 132 and hopefully when people see crazy big numbers like 1.34e9 they will think of byte order problems. > I'm going to dive into struc, FotranFile etc.. to propose something > convenient for people who have to read unformatted binary fortran file > very often. Awesome! 
The thoughts banging around in my head right now are that some sort of mini-language that encapsulates the content of the declarations and the write statements should allow one to tease out exactly which struct call will unpack the right information. f2py has some fortran parsing capabilities, so you might be able to use the fortran itself as the mini-language. Something like spec = fortranfile.OutputSpecification(\ """real(4),dimension(2,5):: ux,uy write(11) ux,uy""") ux, uy = fortranfile.FortranFile('uxuyp.bin').readSpec(spec) Best of luck. Peace, -Neil -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: fortranfile.py URL: From cournape at gmail.com Sat May 30 05:14:52 2009 From: cournape at gmail.com (David Cournapeau) Date: Sat, 30 May 2009 18:14:52 +0900 Subject: [Numpy-discussion] Installation problem with Numpy 1.3 on Windows AMD64 In-Reply-To: References: Message-ID: <5b8d13220905300214k1ad98fbbj8a2d2bdb69958b38@mail.gmail.com> On Sat, May 30, 2009 at 8:02 AM, Dinesh B Vadhia wrote: > Hi!? I just upgraded to Python 2.6.2 (from 2.5) on Windows AMD64 in order to > use Numpy 1.3 for AMD64 and got the following error: > > - pythonw.exe has stopped working > > Numpy was installed both per-machine and per-user but the error persists. > Python 2.6.2 works without Numpy. The 64 bits build is experimental on windows - there is one fundamental issue which I have not been able to nail down yet. It may be a numpy bug, a python bug, or a mingw-w64 bug, I am not sure yet. But it means that under some conditions, numpy crashes before import. If you need a working 64 bits numpy, the best bet is to build it by yourself using the MS compilers (the 64 bits compilers are available for free if you install the platform SDK 6.0a or later). But then you won't be able to build scipy on top of it unless you manage to use a 64 bits fortran compiler. David From dineshbvadhia at hotmail.com Sat May 30 16:36:52 2009 From: dineshbvadhia at hotmail.com (Dinesh B Vadhia) Date: Sat, 30 May 2009 13:36:52 -0700 Subject: [Numpy-discussion] Installation problem with Numpy 1.3 onWindows AMD64 In-Reply-To: <5b8d13220905300214k1ad98fbbj8a2d2bdb69958b38@mail.gmail.com> References: <5b8d13220905300214k1ad98fbbj8a2d2bdb69958b38@mail.gmail.com> Message-ID: re: "But it means that under some conditions, numpy crashes before import." It it helps the debugging, I have a standard Windows 64-bit configuration. Please let me know when this build is fixed ... Cheers Dinesh From: David Cournapeau Sent: Saturday, May 30, 2009 2:14 AM To: Discussion of Numerical Python Subject: Re: [Numpy-discussion] Installation problem with Numpy 1.3 onWindows AMD64 On Sat, May 30, 2009 at 8:02 AM, Dinesh B Vadhia wrote: > Hi! I just upgraded to Python 2.6.2 (from 2.5) on Windows AMD64 in order to > use Numpy 1.3 for AMD64 and got the following error: > > - pythonw.exe has stopped working > > Numpy was installed both per-machine and per-user but the error persists. > Python 2.6.2 works without Numpy. The 64 bits build is experimental on windows - there is one fundamental issue which I have not been able to nail down yet. It may be a numpy bug, a python bug, or a mingw-w64 bug, I am not sure yet. But it means that under some conditions, numpy crashes before import. If you need a working 64 bits numpy, the best bet is to build it by yourself using the MS compilers (the 64 bits compilers are available for free if you install the platform SDK 6.0a or later). 
But then you won't be able to build scipy on top of it unless you manage to use a 64 bits fortran compiler. David -------------- next part -------------- An HTML attachment was scrubbed... URL: From david.froger.info at gmail.com Sat May 30 18:49:00 2009 From: david.froger.info at gmail.com (David Froger) Date: Sun, 31 May 2009 00:49:00 +0200 Subject: [Numpy-discussion] example reading binary Fortran file In-Reply-To: <4A20A036.3030804@wartburg.edu> References: <200902031015.17490.faltet@pytables.org> <4A1EBB85.5080706@wartburg.edu> <4A20A036.3030804@wartburg.edu> Message-ID: > > You're half wrong. FortranFile can read arrays written as above, but it > sees them as a single real array. So, with the attached Fortran program:: > > In [1]: from fortranfile import FortranFile > > In [2]: f = FortranFile('uxuyp.bin', endian='<') # Original bug was > incorrect byte order > > In [3]: u = f.readReals() > > In [4]: u.shape > Out[4]: (20,) > > In [5]: u > Out[5]: > array([ 101., 111., 102., 112., 103., 113., 104., 114., 105., > 115., 201., 211., 202., 212., 203., 213., 204., 214., > 205., 215.], dtype=float32) > > In [6]: ux = u[:10].reshape(2,5); uy = u[10:].reshape(2,5) > > In [7]: p = f.readReals().reshape(2,5) > > In [8]: ux, uy, p > Out[8]: > (array([[ 101., 111., 102., 112., 103.], > [ 113., 104., 114., 105., 115.]], dtype=float32), > array([[ 201., 211., 202., 212., 203.], > [ 213., 204., 214., 205., 215.]], dtype=float32), > array([[ 301., 311., 302., 312., 303.], > [ 313., 304., 314., 305., 315.]], dtype=float32)) ok! That's exactlly what I was looking for, thank you. Awesome! The thoughts banging around in my head right now are that some > sort of mini-language that encapsulates the content of the declarations and > the write statements should allow one to tease out exactly which struct call > will unpack the right information. f2py has some fortran parsing > capabilities, so you might be able to use the fortran itself as the > mini-language. Something like > > spec = fortranfile.OutputSpecification(\ > """real(4),dimension(2,5):: ux,uy > write(11) ux,uy""") > ux, uy = fortranfile.FortranFile('uxuyp.bin').readSpec(spec) > whouhou, I'm really enthusiastic, I love this solution!!! I begin to code it... I'll give news around 1 june, (somethings to finish before). One more time, thanks for this help! best, David -------------- next part -------------- An HTML attachment was scrubbed... URL: From rjsteed at talk21.com Sun May 31 13:54:48 2009 From: rjsteed at talk21.com (rob steed) Date: Sun, 31 May 2009 17:54:48 +0000 (GMT) Subject: [Numpy-discussion] Problem with correlate In-Reply-To: References: Message-ID: <871697.92371.qm@web86003.mail.ird.yahoo.com> Hi, After my previous email, I have opened a ticket #1117 (correlate not order dependent) I have found that the correlate function is defined in multiarraymodule.c and that inputs are being swapped using the following code n1 = ap1->dimensions[0]; n2 = ap2->dimensions[0]; if (n1 < n2) { ret = ap1; ap1 = ap2; ap2 = ret; ret = NULL; i = n1; n1 = n2; n2 = i; } I do not know the code well enough to see whether this could just be removed (I don't know c either). Maybe the algorithmn requires the inputs to be length ordered? I will try to work it out. 
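For what it's worth, the asymmetry can be checked from Python without touching the C code. Mathematically, a full cross-correlation of real inputs should satisfy correlate(a, b) == correlate(b, a)[::-1], so a quick script along these lines (illustrative only; the exact output depends on the numpy version) shows whether the internal swap is being compensated for:

import numpy as np

a = np.array([1., 2., 3., 4., 5.])
b = np.array([0., 1., 0.5])   # deliberately shorter than a, to trigger the swap

ab = np.correlate(a, b, 'full')
ba = np.correlate(b, a, 'full')

print(ab)
print(ba[::-1])
print(np.allclose(ab, ba[::-1]))   # should be True if the argument order is handled correctly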
Regards Rob From charlesr.harris at gmail.com Sun May 31 15:47:55 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 31 May 2009 13:47:55 -0600 Subject: [Numpy-discussion] Problem with correlate In-Reply-To: <871697.92371.qm@web86003.mail.ird.yahoo.com> References: <871697.92371.qm@web86003.mail.ird.yahoo.com> Message-ID: On Sun, May 31, 2009 at 11:54 AM, rob steed wrote: > > Hi, > After my previous email, I have opened a ticket #1117 (correlate not order > dependent) > > I have found that the correlate function is defined in multiarraymodule.c > and > that inputs are being swapped using the following code > > n1 = ap1->dimensions[0]; > n2 = ap2->dimensions[0]; > if (n1 < n2) { > ret = ap1; > ap1 = ap2; > ap2 = ret; > ret = NULL; > i = n1; > n1 = n2; > n2 = i; > } > > I do not know the code well enough to see whether this could just be > removed (I don't know c either). > Maybe the algorithmn requires the inputs to be length ordered? I will try > to work it out. > If the correlation algorithm doesn't use an fft and is done explicitly, then the maximum overlap for any shift is the length of the shortest input. Swapping the arrays makes that logic easier to implement, but it isn't necessary. Chuck -------------- next part -------------- An HTML attachment was scrubbed... URL: From ralf.gommers at googlemail.com Sun May 31 20:19:37 2009 From: ralf.gommers at googlemail.com (Ralf Gommers) Date: Sun, 31 May 2009 20:19:37 -0400 Subject: [Numpy-discussion] resetting set_string_function Message-ID: Hi, There seems to be a bug in set_string_function when resetting the formatting function to the default. After doing that the dtype of the array that is printed is the character string, not the numpy type. Example: In [1]: a=arange(10, dtype=uint16) In [2]: a Out[2]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], dtype=uint16) In [3]: set_string_function(lambda x: str(x*2)) In [4]: a Out[4]: [ 0 2 4 6 8 10 12 14 16 18] In [5]: set_string_function(None) # reset to default In [6]: a Out[6]: array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'H') The functionality to reset to default was introduced here: http://projects.scipy.org/numpy/ticket/351 Should I open a trac ticket, or am I missing something here? Cheers, Ralf -------------- next part -------------- An HTML attachment was scrubbed... URL: From thomas.robitaille at gmail.com Sun May 31 20:39:06 2009 From: thomas.robitaille at gmail.com (Thomas Robitaille) Date: Sun, 31 May 2009 20:39:06 -0400 Subject: [Numpy-discussion] Rasterizing points onto an array Message-ID: <96F50D68-C7F4-4F27-92B4-4CFE7ED01CBA@gmail.com> Hi, I have a set of n points with real coordinates between 0 and 1, given by two numpy arrays x and y, with a value at each point represented by a third array z. I am trying to then rasterize the points onto a grid of size npix*npix. So I can start by converting x and y to integer pixel coordinates ix and iy. But my question is, is there an efficient way to add z[i] to the pixel given by (xi[i],yi[i])? Below is what I am doing at the moment, but the for loop becomes very inefficient for large n. I would imagine that there is a way to do this without using a loop? 
--- import numpy as np n = 10000000 x = np.random.random(n) y = np.random.random(n) z = np.random.random(n) npix = 100 ix = np.array(x*float(npix),int) iy = np.array(y*float(npix),int) image = np.zeros((npix,npix)) for i in range(len(ix)): image[ix[i],iy[i]] = image[ix[i],iy[i]] + z[i] --- Thanks for any advice, Thomas From david at ar.media.kyoto-u.ac.jp Sun May 31 21:18:31 2009 From: david at ar.media.kyoto-u.ac.jp (David Cournapeau) Date: Mon, 01 Jun 2009 10:18:31 +0900 Subject: [Numpy-discussion] Problem with correlate In-Reply-To: References: <871697.92371.qm@web86003.mail.ird.yahoo.com> Message-ID: <4A232C67.2080206@ar.media.kyoto-u.ac.jp> Charles R Harris wrote: > > > On Sun, May 31, 2009 at 11:54 AM, rob steed > wrote: > > > Hi, > After my previous email, I have opened a ticket #1117 (correlate > not order dependent) > > I have found that the correlate function is defined in > multiarraymodule.c and > that inputs are being swapped using the following code > > n1 = ap1->dimensions[0]; > n2 = ap2->dimensions[0]; > if (n1 < n2) { > ret = ap1; > ap1 = ap2; > ap2 = ret; > ret = NULL; > i = n1; > n1 = n2; > n2 = i; > } > > I do not know the code well enough to see whether this could just > be removed (I don't know c either). > Maybe the algorithmn requires the inputs to be length ordered? I > will try to work it out. > > > If the correlation algorithm doesn't use an fft and is done > explicitly, then the maximum overlap for any shift is the length of > the shortest input. Swapping the arrays makes that logic easier to > implement, but it isn't necessary. But this logic is also wrong if the swapping is not taken into account - as the OP mentioned, correlate(a, b) is not equal to correlate(b, a) in the general case. The output is reversed in the second case compared to the first case. cheers, David From charlesr.harris at gmail.com Sun May 31 21:45:56 2009 From: charlesr.harris at gmail.com (Charles R Harris) Date: Sun, 31 May 2009 19:45:56 -0600 Subject: [Numpy-discussion] Problem with correlate In-Reply-To: <4A232C67.2080206@ar.media.kyoto-u.ac.jp> References: <871697.92371.qm@web86003.mail.ird.yahoo.com> <4A232C67.2080206@ar.media.kyoto-u.ac.jp> Message-ID: On Sun, May 31, 2009 at 7:18 PM, David Cournapeau < david at ar.media.kyoto-u.ac.jp> wrote: > Charles R Harris wrote: > > > > > > On Sun, May 31, 2009 at 11:54 AM, rob steed > > wrote: > > > > > > Hi, > > After my previous email, I have opened a ticket #1117 (correlate > > not order dependent) > > > > I have found that the correlate function is defined in > > multiarraymodule.c and > > that inputs are being swapped using the following code > > > > n1 = ap1->dimensions[0]; > > n2 = ap2->dimensions[0]; > > if (n1 < n2) { > > ret = ap1; > > ap1 = ap2; > > ap2 = ret; > > ret = NULL; > > i = n1; > > n1 = n2; > > n2 = i; > > } > > > > I do not know the code well enough to see whether this could just > > be removed (I don't know c either). > > Maybe the algorithmn requires the inputs to be length ordered? I > > will try to work it out. > > > > > > If the correlation algorithm doesn't use an fft and is done > > explicitly, then the maximum overlap for any shift is the length of > > the shortest input. Swapping the arrays makes that logic easier to > > implement, but it isn't necessary. > > But this logic is also wrong if the swapping is not taken into account - > as the OP mentioned, correlate(a, b) is not equal to correlate(b, a) in > the general case. The output is reversed in the second case compared to > the first case. 
> I didn't say it was *correctly* implemented ;)

:) So I gave it a shot: http://github.com/cournape/numpy/commits/fix_correlate

(It took me a while to realize that PyArray_ISFLEXIBLE returns false for object arrays. Is this expected? The documentation concerning copyswap says that it is necessary for flexible arrays, but I think it is necessary for object arrays as well).

It still bothers me that correlate does not conjugate the second argument for complex arrays...

cheers,

David
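For reference, the conjugation in question is the one in the textbook definition of cross-correlation, r[k] = sum_n a[n+k] * conj(v[n]). A small sketch to compare that definition against whatever the installed numpy does (no particular output is assumed here, since the behaviour has differed between versions):

import numpy as np

a = np.array([1+2j, 3+4j, 5+6j])
v = np.array([0+1j, 1-1j])

# textbook full cross-correlation, written as a convolution with the
# reversed and conjugated second input
ref = np.convolve(a, np.conj(v)[::-1], 'full')
out = np.correlate(a, v, 'full')

print(ref)
print(out)
print(np.allclose(ref, out))   # False on versions that skip the conjugate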